[FLASH-USERS] SIGSEGV 174 error in Flash 4.5
Plechaty, Christopher
cplechaty at RiversideResearch.org
Mon Mar 18 09:13:44 EDT 2019
To all,
I also wanted to point out that I experimented this weekend in an attempt to figure out this SIGSEGV 174 error, and I have uncovered another one that occurs in an entirely different part of the FLASH code. It appears to occur in the Guard Cell code. I am not sure if this error is related to my original error or not, but I will post about it here.
I found that the error I list below occurs whenever I attempt to start a new simulation with my lasslab 3D deck. Experimenting with this error, I have noticed that the time this error occurs in the simulation appears to be a function of the number of blocks in my problem (nblockx, nblocky, nblockz), and the refinement level.
I searched other posts and found some guard cell issues in Flash 4.3 that were solved by using the -O0 optimization flag. Unfortunately, that does not work here. Has anyone seen this error before?
Error:
flash4debug 0000000000E4EA7D Unknown Unknown Unknown
libpthread-2.17.s 00002AF5FDEFC6D0 Unknown Unknown Unknown
flash4debug 00000000004CCF2B amr_1blk_cc_cp_re 355 amr_1blk_cc_cp_remote.F90
flash4debug 00000000004FF3F1 amr_1blk_guardcel 750 amr_1blk_guardcell_srl.F90
flash4debug 000000000072278F amr_1blk_guardcel 737 mpi_amr_1blk_guardcell.F90
flash4debug 00000000007AFF91 amr_guardcell_ 298 mpi_amr_guardcell.F90
flash4debug 0000000000460866 grid_fillguardcel 459 Grid_fillGuardCells.F90
flash4debug 0000000000552D3A diff_advancetherm 194 diff_advanceTherm.F90
flash4debug 0000000000408D26 diffuse_ 76 Diffuse.F90
flash4debug 000000000041824D driver_evolveflas 302 Driver_evolveFlash.F90
flash4debug 000000000044D805 MAIN__ 51 Flash.F90
flash4debug 0000000000407E9E Unknown Unknown Unknown
libc-2.17.so 00002AF5FE808445 __libc_start_main Unknown Unknown
flash4debug 0000000000407DA9 Unknown Unknown Unknown
-Chris
From: Plechaty, Christopher
Sent: Monday, March 18, 2019 8:38 AM
To: 'Ryan Farber'
Cc: flash-users at flash.uchicago.edu
Subject: RE: [FLASH-USERS] SIGSEGV 174 error in Flash 4.5
Hi Ryan,
I have over 15 TB of free space, so unfortunately (or maybe fortunately) I do not have a space problem.
If I restart from a checkpoint, the code seems to crash only when it writes. This is highly suspicious. I will try your suggestions and get back to you.
-Chris
From: Ryan Farber [mailto:rjfarber at umich.edu]
Sent: Saturday, March 16, 2019 4:05 PM
To: Plechaty, Christopher
Cc: flash-users at flash.uchicago.edu
Subject: Re: [FLASH-USERS] SIGSEGV 174 error in Flash 4.5
Hi Chris,
I've seen plenty of segfaults, though I haven't seen one happen during I/O before. Could you check whether your cluster ran out of disk space (e.g., run "df -h")? It is very odd if you were writing checkpoint files perfectly fine earlier, nothing changed, and it suddenly stopped working. If you restart from the checkpoint before the issue, does it crash at the same point or randomly later on? If it crashes at the same point, then you could use DDT (a parallel debugging tool) or litter io_xfer_mesh_dataset.c with "printf" statements to localize the error. You might have to put "#include <stdio.h>" at the top of the file (that is, in the version of the file you'd copy from object to your problem directory to preserve the original).
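A minimal sketch of the "printf" approach, in case it helps. The helper names (trace_format, trace_print, TRACE) are hypothetical and not part of FLASH, and the rank is assumed to be something the caller already has (for example from MPI_Comm_rank). The key point is flushing right after each message, so the last line printed before the segfault is not lost in a buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of FLASH): format one trace line into buf.
 * Returns snprintf's result, i.e. the length the full line requires. */
int trace_format(char *buf, size_t size, int rank,
                 const char *file, int line, const char *msg)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "[rank %d] %s:%d: %s\n", rank, file, line, msg);
}

/* Hypothetical helper: write the trace line to stderr and flush at once,
 * so the message survives if the very next statement crashes. */
void trace_print(int rank, const char *file, int line, const char *msg)
{
    char buf[256];
    trace_format(buf, sizeof buf, rank, file, line, msg);
    fputs(buf, stderr);
    fflush(stderr);
}

/* Convenience macro that records the call site automatically.
 * `rank` is assumed to be supplied by the caller. */
#define TRACE(rank, msg) trace_print((rank), __FILE__, __LINE__, (msg))
```

In the copied file you would then put something like TRACE(rank, "before write") and TRACE(rank, "after write") around each suspect call. The last message each rank prints brackets the statement that crashes.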
Best,
--------
Ryan
On Fri, Mar 15, 2019 at 9:17 AM Plechaty, Christopher <cplechaty at riversideresearch.org> wrote:
To all,
I am experiencing a SIGSEGV 174 error that I cannot seem to figure out.
I have been running the lasslab example (in 3D) which has been modified for my purposes. The simulation runs great, and happily dumps restart and plot data files for a long time. However, after some time, FLASH decides to write a restart dump and suddenly crashes due to a SIGSEGV 174 error (placed below).
I am running:
Flash 4.5
Intel compilers and MPI (2018)
HDF5 1.8.13
Hypre 2.11.2
My cluster runs CentOS 7.
The error is as follows:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
flash4debug 0000000000E4EA7D Unknown Unknown Unknown
libpthread-2.17.s 00002ACEDECD56D0 Unknown Unknown Unknown
libmpifort.so.12. 00002ACEDDB60460 __I_MPI___intel_a Unknown Unknown
libmpi.so.12.0 00002ACEDDEFE836 Unknown Unknown Unknown
libmpi.so.12 00002ACEDDF08184 ADIOI_GEN_WriteSt Unknown Unknown
libmpi.so.12.0 00002ACEDE326ABC Unknown Unknown Unknown
libmpi.so.12 00002ACEDE327B35 PMPI_File_write_a Unknown Unknown
flash4debug 000000000098661C Unknown Unknown Unknown
flash4debug 0000000000981469 Unknown Unknown Unknown
flash4debug 000000000096F95A Unknown Unknown Unknown
flash4debug 0000000000972536 Unknown Unknown Unknown
flash4debug 000000000095036D Unknown Unknown Unknown
flash4debug 00000000009509B7 Unknown Unknown Unknown
flash4debug 000000000094D295 Unknown Unknown Unknown
flash4debug 0000000000706157 Unknown Unknown Unknown
flash4debug 00000000007152FD Unknown Unknown Unknown
flash4debug 0000000000714B10 io_xfer_mesh_data 362 io_xfer_mesh_data.F90
flash4debug 0000000000713FA1 io_writedata_ 341 io_writeData.F90
flash4debug 000000000049A5B1 io_writecheckpoin 129 IO_writeCheckpoint.F90
flash4debug 000000000049962F io_output_ 267 IO_output.F90
flash4debug 0000000000417F6F driver_evolveflas 423 Driver_evolveFlash.F90
flash4debug 000000000044D805 MAIN__ 51 Flash.F90
flash4debug 0000000000407E9E Unknown Unknown Unknown
libc-2.17.so 00002ACEDF5E1445 __libc_start_main Unknown Unknown
flash4debug 0000000000407DA9 Unknown Unknown Unknown
Has anyone seen this type of error before?
-Chris