[FLASH-USERS] SIGSEGV 174 error in Flash 4.5
Plechaty, Christopher
cplechaty at RiversideResearch.org
Mon Mar 18 14:04:38 EDT 2019
Klaus,
I am not sure how to control the threading for copying. I did try setting every single flag in my makefile to -O0 (hoping that it would force the compiler to do the simplest things), but it made no difference.
In addition, I tried (1) controlling the threading via OpenMP and (2) another compiler combination. Unfortunately, neither attempt worked. I obtained the same error from my openmpi/gcc combination, with which I already have the relevant tools compiled (hypre, hdf, et al).
The output I obtain for the openmpi/gcc combination contains a similar line (amr_1blk_cc_cp_remote.F90:356), and I must say that I am concerned. I hope that there is not something strange with my system. Here is the error in its full glory:
Backtrace for this error:
#0 0x2B0F0F016697
#1 0x2B0F0F016CDE
#2 0x2B0F0FCC52EF
#0 0x2B90B9C71697
#1 0x2B90B9C71CDE
#2 0x2B90BA9202EF
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x2B77EADD4697
#1 0x2B77EADD4CDE
#2 0x2B77EBA832EF
#3 0x4754F5 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
#4 0x496AAB in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753
#3 0x4754F5 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
#4 0x496AAB in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753
#3 0x4754F5 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
#0 0x2B27FFD21697
#1 0x2B27FFD21CDE
#2 0x2B28009D02EF
#4 0x496AAB in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753
#5 0x5B0F2C in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743
#6 0x5EF7FF in amr_guardcell_ at mpi_amr_guardcell.F90:301
#7 0x434870 in grid_fillguardcells_ at Grid_fillGuardCells.F90:460
#8 0x441138 in grid_markrefinederefine_ at Grid_markRefineDerefine.F90:106
#9 0x536E17 in gr_expanddomain_ at gr_expandDomain.F90:212
#10 0x440D00 in grid_initdomain_ at Grid_initDomain.F90:98
#11 0x413D55 in driver_initflash_ at Driver_initFlash.F90:166
#12 0x407021 in flash at Flash.F90:49
#13 0x2B90BA90C444
#5 0x5B0F2C in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743
#5 0x5B0F2C in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743
#6 0x5EF7FF in amr_guardcell_ at mpi_amr_guardcell.F90:301
#7 0x434870 in grid_fillguardcells_ at Grid_fillGuardCells.F90:460
#8 0x441138 in grid_markrefinederefine_ at Grid_markRefineDerefine.F90:106
#9 0x536E17 in gr_expanddomain_ at gr_expandDomain.F90:212
#10 0x440D00 in grid_initdomain_ at Grid_initDomain.F90:98
#6 0x5EF7FF in amr_guardcell_ at mpi_amr_guardcell.F90:301
#7 0x434870 in grid_fillguardcells_ at Grid_fillGuardCells.F90:460
#11 0x413D55 in driver_initflash_ at Driver_initFlash.F90:166
#8 0x441138 in grid_markrefinederefine_ at Grid_markRefineDerefine.F90:106
#3 0x4754F5 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
#12 0x407021 in flash at Flash.F90:49
#13 0x2B77EBA6F444
#9 0x536E17 in gr_expanddomain_ at gr_expandDomain.F90:212
#10 0x440D00 in grid_initdomain_ at Grid_initDomain.F90:98
#4 0x496AAB in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753
#11 0x413D55 in driver_initflash_ at Driver_initFlash.F90:166
#12 0x407021 in flash at Flash.F90:49
#13 0x2B0F0FCB1444
#5 0x5B0F2C in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743
#6 0x5EF7FF in amr_guardcell_ at mpi_amr_guardcell.F90:301
#7 0x434870 in grid_fillguardcells_ at Grid_fillGuardCells.F90:460
#8 0x441138 in grid_markrefinederefine_ at Grid_markRefineDerefine.F90:106
#9 0x536E17 in gr_expanddomain_ at gr_expandDomain.F90:212
#10 0x440D00 in grid_initdomain_ at Grid_initDomain.F90:98
#11 0x413D55 in driver_initflash_ at Driver_initFlash.F90:166
#12 0x407021 in flash at Flash.F90:49
#13 0x2B28009BC444
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 248 with PID 20782 on node node10 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
-----Original Message-----
From: Klaus Weide [mailto:klaus at flash.uchicago.edu]
Sent: Monday, March 18, 2019 1:06 PM
To: Plechaty, Christopher
Cc: flash-users at flash.uchicago.edu
Subject: Re: [FLASH-USERS] SIGSEGV 174 error in Flash 4.5
On Mon, 18 Mar 2019, Plechaty, Christopher wrote:
> I searched other posts and found some issues with guard cells in Flash 4.3 that were solved using the -O0 optimization flag. Unfortunately, that does not work here. Has anyone seen this error before?
>
>
>
> Error:
>
> flash4debug 0000000000E4EA7D Unknown Unknown Unknown
>
> libpthread-2.17.s 00002AF5FDEFC6D0 Unknown Unknown Unknown
I notice that in both traces you have sent, the 2nd line has
libpthread-2.17.s .
> flash4debug 00000000004CCF2B amr_1blk_cc_cp_re 355 amr_1blk_cc_cp_remote.F90
In this case, line number 355 points to the statement
unk1(ivar_next,ii,jj,kk,idest) = &
temprecv_buf(indx+ivar)
which is part of a loop nest that copies a potentially large amount of
data from a receive-buffer.
It is possible that
* your compiler is trying to optimize the data-copying by automatically
using threads;
* something goes wrong with this (occasionally); and
* you can find some way (compiler flag, maybe env variable) to turn off
this behavior (even if -O0 alone does not do that), or maybe use a
different compiler version or configuration.
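
As a minimal sketch of the environment-variable route mentioned in the last point: the lines below pin the OpenMP runtime to a single thread and build an Open MPI launch command that forwards that setting to every rank. The rank count (NP) and the executable path (EXE) are hypothetical placeholders, not values taken from this thread. OMP_NUM_THREADS only affects code that goes through the OpenMP runtime, so whether it changes the behavior of the copy loop in question is an open assumption.

```shell
#!/bin/sh
# Sketch only: NP and EXE are hypothetical placeholders.
NP=4                 # hypothetical number of MPI ranks
EXE=./flash4         # hypothetical path to the FLASH executable

# OMP_NUM_THREADS is the standard OpenMP variable for the default thread count.
export OMP_NUM_THREADS=1

# Open MPI's "-x VAR" exports an environment variable to the launched ranks.
CMD="mpirun -np $NP -x OMP_NUM_THREADS $EXE"

# Print the command instead of running it, so the sketch is safe to try anywhere.
echo "$CMD"
```

Once the placeholders match the local setup, the printed command can be run directly. If the crash persists with a single thread per rank, automatic threading of the copy becomes a less likely explanation.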
Klaus