[FLASH-USERS] LaserSlab Segmentation Fault

Descamps, Adrien adescamp at slac.stanford.edu
Thu Jul 2 15:03:02 EDT 2020


Dear Klaus,

Thank you for the response.


  *   I included memory_stat_freq = 1 in the parfile. Please find the lasslab.log attached. To me, it seems that this is not a memory problem, as I am using 8 GB of memory for each processor.
  *   How can I tell whether the segmentation fault is coming from the batch system? Please find the error message below:

*** Wrote plotfile to lasslab_hdf5_plt_cnt_0006 ****
  iteration, no. not moved =            0          39
  iteration, no. not moved =            1           0
 refined: total leaf blocks =           71
 refined: total blocks =           94
      60 3.0348E-12 3.0448E-13  ( 3.281E-04,  5.953E-03,   0.00    ) |  1.622E-12 9.868E+84 1.014E+88 0.4000000
 *** Wrote checkpoint file to lasslab_hdf5_chk_0003 ****
srun: error: sh02-01n31: task 1: Segmentation fault
srun: Terminating job step 3490332.0
slurmstepd: error: *** STEP 3490332.0 ON sh02-01n31 CANCELLED AT 2020-07-02T11:04:11 ***
srun: error: sh02-01n31: task 0: Segmentation fault


  *   It seems to run fine without I/O. I set the maximum number of steps to 100, and it runs through all of them without error.

Thank you for your help,
Adrien
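On the question of whether the batch system killed the job: Slurm usually says so explicitly, for example with a `DUE TO TIME LIMIT` suffix on the CANCELLED line or an out-of-memory (oom-kill) notice. Below is a minimal Python sketch that scans captured srun/slurmstepd output for such messages. The patterns are assumptions based on common Slurm wording, which varies between versions and sites, and `classify_failure` is a hypothetical helper, not part of FLASH or Slurm.

```python
import re

# Assumed Slurm message fragments; exact wording differs between Slurm
# versions and site configurations, so treat these as a starting point only.
BATCH_KILL_PATTERNS = {
    "time limit": re.compile(r"CANCELLED AT .* DUE TO TIME LIMIT"),
    "memory limit": re.compile(
        r"Exceeded (job|step) memory limit|oom-kill", re.IGNORECASE
    ),
    "node failure": re.compile(r"DUE TO NODE FAILURE"),
}
# srun reports a crashed rank as "task N: Segmentation fault".
TASK_CRASH = re.compile(r"task (\d+): Segmentation fault")


def classify_failure(output: str) -> str:
    """Return a rough guess at why a Slurm job step ended."""
    # A kill by the scheduler takes precedence: it normally names its reason.
    for reason, pattern in BATCH_KILL_PATTERNS.items():
        if pattern.search(output):
            return f"batch system: {reason}"
    # Otherwise look for ranks that died on their own with SIGSEGV.
    crashed = sorted({int(t) for t in TASK_CRASH.findall(output)})
    if crashed:
        return f"application crash (SIGSEGV) in task(s) {crashed}"
    return "unknown"
```

Applied to the output quoted above, it returns `application crash (SIGSEGV) in task(s) [0, 1]`. The CANCELLED line carries no `DUE TO ...` reason, which is consistent with srun tearing down the step after task 1 crashed rather than the scheduler killing it. Running `sacct -j <jobid>` and checking the State and ExitCode columns is another way to cross-check.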


________________________________
From: flash-users-bounces at flash.uchicago.edu <flash-users-bounces at flash.uchicago.edu> on behalf of Klaus Weide <klaus at flash.uchicago.edu>
Sent: Thursday, July 2, 2020 10:48 AM
To: flash-users at flash.uchicago.edu <flash-users at flash.uchicago.edu>
Subject: Re: [FLASH-USERS] LaserSlab Segmentation Fault

> I am running into a segmentation fault when running the LaserSlab example in parallel. I already posted this problem a week ago and have conducted more tests to try to figure out where the issue might be coming from. My initial guess is that the segmentation fault might be happening while writing the HDF5 files.

1. Make sure you don't have a problem with memory.
Set memory_stat_freq = 1 (or so) in the parfile,
and check the numbers in the log file to see whether memory
use grows significantly.

2. You still haven't shared the actual abort message
from the standard output or error files.
You seem to run under a batch system. Are you sure that that
isn't killing your runs?

3. I see no indications that the abort is directly caused by HDF5 IO.
But you could try setting up with +noio (you may also have to add something
like --without-unit=.../LaserIO for +noio to take effect), and see
whether the code runs successfully.

Klaus
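Point 1 above amounts to checking whether the memory figures written with memory_stat_freq trend upward over the run. Below is a minimal Python sketch of that check. It assumes the reader supplies a regular expression with one capture group around the memory number (the `mem_mb=...` pattern in the usage example is hypothetical, not FLASH's actual log layout). It also uses an arbitrary 20% threshold for "significant" growth. The helper names are illustrative only.

```python
import re
from typing import Iterable, List


def extract_memory_mb(lines: Iterable[str], pattern: str) -> List[float]:
    """Collect one memory reading (in MB) per log line matching `pattern`.

    `pattern` must have exactly one capture group around the number; adapt
    it to however the memory lines actually appear in the FLASH log.
    """
    regex = re.compile(pattern)
    values = []
    for line in lines:
        match = regex.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


def relative_growth(values: List[float]) -> float:
    """Fractional change from the first reading to the last one."""
    if len(values) < 2 or values[0] <= 0:
        return 0.0
    return (values[-1] - values[0]) / values[0]


def memory_grows(values: List[float], threshold: float = 0.2) -> bool:
    """True if usage rose by more than `threshold` (20% by default)."""
    return relative_growth(values) > threshold


# Usage with made-up log lines in the hypothetical "mem_mb=" format:
lines = ["step 1 mem_mb=1000.0", "step 2 mem_mb=1050.0", "step 3 mem_mb=1300.0"]
readings = extract_memory_mb(lines, r"mem_mb=([0-9.]+)")
print(readings, memory_grows(readings))
```

Comparing only the first and last readings keeps the sketch simple; a steady climb right up to the crash is the pattern worth looking for, whereas a flat profile supports the view that memory is not the cause.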
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lasslab.log
Type: application/octet-stream
Size: 113449 bytes
Desc: lasslab.log
URL: <http://flash.rochester.edu/pipermail/flash-users/attachments/20200702/4492e5e6/attachment.obj>

