[FLASH-USERS] SIGSEGV 174 error in Flash 4.5

Plechaty, Christopher cplechaty at RiversideResearch.org
Mon Mar 18 14:04:38 EDT 2019


Klaus,
I am not sure how to control the threading for copying. I did try setting every single flag in my makefile to -O0 (hoping that it would force the compiler to do the simplest things), but that made no difference.

In addition, I tried (1) controlling the threading via OpenMP and (2) another compiler combination. Unfortunately, neither attempt worked. I obtained the same error from the openmpi/gcc combination, for which I already have the relevant tools compiled (hypre, hdf, et al.).

The output I obtain for the openmpi/gcc combination contains a similar line (amr_1blk_cc_cp_remote.F90:356), and I must say that I am concerned. I hope that there is not something strange with my system. Here is the error in its full glory:

Backtrace for this error:
#0  0x2B0F0F016697
#1  0x2B0F0F016CDE
#2  0x2B0F0FCC52EF
#0  0x2B90B9C71697
#1  0x2B90B9C71CDE
#2  0x2B90BA9202EF

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x2B77EADD4697
#1  0x2B77EADD4CDE
#2  0x2B77EBA832EF
#3  0x4754F5 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
#4  0x496AAB in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753
#3  0x4754F5 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
#4  0x496AAB in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753
#3  0x4754F5 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
#0  0x2B27FFD21697
#1  0x2B27FFD21CDE
#2  0x2B28009D02EF
#4  0x496AAB in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753
#5  0x5B0F2C in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743
#6  0x5EF7FF in amr_guardcell_ at mpi_amr_guardcell.F90:301
#7  0x434870 in grid_fillguardcells_ at Grid_fillGuardCells.F90:460
#8  0x441138 in grid_markrefinederefine_ at Grid_markRefineDerefine.F90:106
#9  0x536E17 in gr_expanddomain_ at gr_expandDomain.F90:212
#10  0x440D00 in grid_initdomain_ at Grid_initDomain.F90:98
#11  0x413D55 in driver_initflash_ at Driver_initFlash.F90:166
#12  0x407021 in flash at Flash.F90:49
#13  0x2B90BA90C444
#5  0x5B0F2C in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743
#5  0x5B0F2C in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743
#6  0x5EF7FF in amr_guardcell_ at mpi_amr_guardcell.F90:301
#7  0x434870 in grid_fillguardcells_ at Grid_fillGuardCells.F90:460
#8  0x441138 in grid_markrefinederefine_ at Grid_markRefineDerefine.F90:106
#9  0x536E17 in gr_expanddomain_ at gr_expandDomain.F90:212
#10  0x440D00 in grid_initdomain_ at Grid_initDomain.F90:98
#6  0x5EF7FF in amr_guardcell_ at mpi_amr_guardcell.F90:301
#7  0x434870 in grid_fillguardcells_ at Grid_fillGuardCells.F90:460
#11  0x413D55 in driver_initflash_ at Driver_initFlash.F90:166
#8  0x441138 in grid_markrefinederefine_ at Grid_markRefineDerefine.F90:106
#3  0x4754F5 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
#12  0x407021 in flash at Flash.F90:49
#13  0x2B77EBA6F444
#9  0x536E17 in gr_expanddomain_ at gr_expandDomain.F90:212
#10  0x440D00 in grid_initdomain_ at Grid_initDomain.F90:98
#4  0x496AAB in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753
#11  0x413D55 in driver_initflash_ at Driver_initFlash.F90:166
#12  0x407021 in flash at Flash.F90:49
#13  0x2B0F0FCB1444
#5  0x5B0F2C in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743
#6  0x5EF7FF in amr_guardcell_ at mpi_amr_guardcell.F90:301
#7  0x434870 in grid_fillguardcells_ at Grid_fillGuardCells.F90:460
#8  0x441138 in grid_markrefinederefine_ at Grid_markRefineDerefine.F90:106
#9  0x536E17 in gr_expanddomain_ at gr_expandDomain.F90:212
#10  0x440D00 in grid_initdomain_ at Grid_initDomain.F90:98
#11  0x413D55 in driver_initflash_ at Driver_initFlash.F90:166
#12  0x407021 in flash at Flash.F90:49
#13  0x2B28009BC444
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 248 with PID 20782 on node node10 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
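Because several ranks write to stderr at the same time, the backtraces above are interleaved and hard to read. As a reading aid only, here is a minimal Python sketch (not part of FLASH; the names `FRAME_RE`, `unique_frames` and `SAMPLE` are made up for illustration) that keeps just the symbolized frames and collapses duplicates, so that the distinct call chain stands out. The sample text is a few lines copied from the output above.

```python
import re

# Matches symbolized frames such as
#   "#3  0x4754F5 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356"
# Frames without symbol information (e.g. "#0  0x2B0F0F016697") are skipped.
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9A-Fa-f]+ in (\S+) at (\S+):(\d+)$")


def unique_frames(text):
    """Return the sorted, de-duplicated (depth, routine, file, line) tuples."""
    frames = set()
    for raw in text.splitlines():
        match = FRAME_RE.match(raw.strip())
        if match:
            depth, routine, source, lineno = match.groups()
            frames.add((int(depth), routine, source, int(lineno)))
    return sorted(frames)


# A few lines copied from the interleaved output above.
SAMPLE = """\
#2  0x2B0F0FCC52EF
#3  0x4754F5 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
#4  0x496AAB in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753
#3  0x4754F5 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
#5  0x5B0F2C in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743
"""

if __name__ == "__main__":
    for depth, routine, source, lineno in unique_frames(SAMPLE):
        print(f"#{depth:<2} {routine} ({source}:{lineno})")
```

If two ranks had failed in different places, the same frame number would show up more than once with different routines or line numbers. A single entry per frame number suggests that every reporting rank failed along the same call chain.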

-----Original Message-----
From: Klaus Weide [mailto:klaus at flash.uchicago.edu] 
Sent: Monday, March 18, 2019 1:06 PM
To: Plechaty, Christopher
Cc: flash-users at flash.uchicago.edu
Subject: Re: [FLASH-USERS] SIGSEGV 174 error in Flash 4.5

On Mon, 18 Mar 2019, Plechaty, Christopher wrote:

> I searched other posts and found some issues with guard cells in Flash 4.3 that were solved using the -O0 optimization flag. Unfortunately, that does not work here. Has anyone seen this error before?
> 
>  
> 
> Error:
> 
> flash4debug        0000000000E4EA7D  Unknown               Unknown  Unknown
> 
> libpthread-2.17.s  00002AF5FDEFC6D0  Unknown               Unknown  Unknown

I notice that in both traces you have sent, the 2nd line has 
libpthread-2.17.s .

> flash4debug        00000000004CCF2B  amr_1blk_cc_cp_re         355  amr_1blk_cc_cp_remote.F90

In this case, line number 355 points to the statement

              unk1(ivar_next,ii,jj,kk,idest) =                         &
                       temprecv_buf(indx+ivar)

which is part of a loop nest that copies a potentially large amount of 
data from a receive-buffer.

It is possible that
 * your compiler is trying to optimize the data-copying by automatically
   using threads;
 * something goes wrong with this (occasionally); and
 * you can find some way (compiler flag, maybe env variable) to turn off 
   this behavior (even if -O0 alone does not do that), or maybe use a
   different compiler version or configuration.

Klaus