[FLASH-USERS] SIGSEGV 174 error in Flash 4.5
Ryan Farber
rjfarber at umich.edu
Sat Mar 16 16:04:48 EDT 2019
Hi Chris,
I've seen plenty of segfaults, though I haven't seen one happen during I/O
before. Could you check whether your cluster ran out of disk space (e.g., run
"df -h")? It is very odd if checkpoint files were being written perfectly
fine earlier, nothing changed, and writing suddenly stopped working. If you
restart from the last checkpoint before the issue, does it crash at the same
point or at some random later point? If it crashes at the same point, you
could use DDT (a parallel debugger) or litter io_xfer_mesh_dataset.c with
"printf" statements to localize the error. You might have to put "#include
<stdio.h>" at the top of the file, that is, of the copy you place in your
problem directory from the object directory so that the original is preserved.
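
As a rough sketch of the "printf" approach, something like the following could
be pasted into the copied file. All names here (trace_format, trace_emit,
TRACE, transfer_stub) are hypothetical and not part of FLASH, and the rank is
assumed to be passed in by the caller. The important detail is the flush after
every message: without it, the last lines printed before a segfault can be
lost in the output buffer.

```c
#include <stdio.h>

/* Hypothetical helper (not part of FLASH): format one trace line into buf.
   Returns the number of characters snprintf would have written. */
static int trace_format(char *buf, size_t size, int rank,
                        const char *file, int line, const char *label)
{
    return snprintf(buf, size, "[rank %d] %s:%d %s", rank, file, line, label);
}

/* Print a trace line to stderr and flush immediately, so the message is
   not left in a buffer if the process segfaults right afterwards. */
static void trace_emit(int rank, const char *file, int line, const char *label)
{
    char buf[256];
    trace_format(buf, sizeof buf, rank, file, line, label);
    fprintf(stderr, "%s\n", buf);
    fflush(stderr);
}

/* Records the file and line of the call site automatically. */
#define TRACE(rank, label) trace_emit((rank), __FILE__, __LINE__, (label))

/* Stand-in for the routine being instrumented (an assumption for this
   sketch); in practice the TRACE calls would bracket the real write. */
static int transfer_stub(int rank, int nblocks)
{
    TRACE(rank, "before write");
    int written = nblocks;   /* placeholder for the actual write call */
    TRACE(rank, "after write");
    return written;
}
```

If the "before write" line shows up for a rank but the "after write" line
never does, the crash is between those two calls, and the markers can be moved
closer together on the next run.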
Best,
--------
Ryan
On Fri, Mar 15, 2019 at 9:17 AM Plechaty, Christopher <
cplechaty at riversideresearch.org> wrote:
> To all,
>
> I am experiencing a SIGSEGV 174 error that I cannot seem to figure out.
>
> I have been running the lasslab example (in 3D), which has been modified
> for my purposes. The simulation runs great and happily dumps restart and
> plot data files for a long time. However, after some time, FLASH decides to
> write a restart dump and suddenly crashes due to a SIGSEGV 174 error
> (placed below).
>
> I am running:
>
> Flash 4.5
> Intel compilers and MPI (2018)
> HDF 1.8.13
> Hypre 2.11.2
>
> My cluster runs CentOS 7.
>
> The error is as follows:
>
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>
> Image              PC                Routine            Line     Source
> flash4debug        0000000000E4EA7D  Unknown            Unknown  Unknown
> libpthread-2.17.s  00002ACEDECD56D0  Unknown            Unknown  Unknown
> libmpifort.so.12.  00002ACEDDB60460  __I_MPI___intel_a  Unknown  Unknown
> libmpi.so.12.0     00002ACEDDEFE836  Unknown            Unknown  Unknown
> libmpi.so.12       00002ACEDDF08184  ADIOI_GEN_WriteSt  Unknown  Unknown
> libmpi.so.12.0     00002ACEDE326ABC  Unknown            Unknown  Unknown
> libmpi.so.12       00002ACEDE327B35  PMPI_File_write_a  Unknown  Unknown
> flash4debug        000000000098661C  Unknown            Unknown  Unknown
> flash4debug        0000000000981469  Unknown            Unknown  Unknown
> flash4debug        000000000096F95A  Unknown            Unknown  Unknown
> flash4debug        0000000000972536  Unknown            Unknown  Unknown
> flash4debug        000000000095036D  Unknown            Unknown  Unknown
> flash4debug        00000000009509B7  Unknown            Unknown  Unknown
> flash4debug        000000000094D295  Unknown            Unknown  Unknown
> flash4debug        0000000000706157  Unknown            Unknown  Unknown
> flash4debug        00000000007152FD  Unknown            Unknown  Unknown
> flash4debug        0000000000714B10  io_xfer_mesh_data  362      io_xfer_mesh_data.F90
> flash4debug        0000000000713FA1  io_writedata_      341      io_writeData.F90
> flash4debug        000000000049A5B1  io_writecheckpoin  129      IO_writeCheckpoint.F90
> flash4debug        000000000049962F  io_output_         267      IO_output.F90
> flash4debug        0000000000417F6F  driver_evolveflas  423      Driver_evolveFlash.F90
> flash4debug        000000000044D805  MAIN__             51       Flash.F90
> flash4debug        0000000000407E9E  Unknown            Unknown  Unknown
> libc-2.17.so       00002ACEDF5E1445  __libc_start_main  Unknown  Unknown
> flash4debug        0000000000407DA9  Unknown            Unknown  Unknown
>
> Has anyone seen this type of error before?
>
> -Chris