[FLASH-USERS] Problems running at higher levels of refinement.

Alexander Sheardown A.Sheardown at 2011.hull.ac.uk
Tue Apr 11 15:03:10 EDT 2017


Hello Everyone,

I am running N-Body + Hydro galaxy cluster merger simulations, but I am running into problems when trying to run at higher levels of refinement.

My simulation has a box size of 8 Mpc x 8 Mpc, contains 2 million particles, and refines on density. If I run the simulation at a maximum refinement level of 6, it runs fine and completes. However, if I turn the maximum refinement level up to 7 or 8, the simulation only gets so far (the stopping point varies; it does not fail at the same place every time) and exits with the MPI error below in the output file (the relevant refinement parameters from my flash.par are sketched after the backtrace further down):


--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          c127 (PID 108285)
  MPI_COMM_WORLD rank: 414

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 429 with PID 0 on node c128 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------


...and the error file shows:

Backtrace for this error:
#0  0x7F073AAD9417
#1  0x7F073AAD9A2E
#2  0x7F0739DC124F
#3  0x454665 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
#4  0x4759AE in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:370
#5  0x582550 in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743
#6  0x5DB143 in amr_guardcell_ at mpi_amr_guardcell.F90:299
#7  0x41BFDA in grid_fillguardcells_ at Grid_fillGuardCells.F90:456
#8  0x5569A3 in hy_ppm_sweep_ at hy_ppm_sweep.F90:229
#9  0x430A3A in hydro_ at Hydro.F90:87
#10  0x409904 in driver_evolveflash_ at Driver_evolveFlash.F90:275
#11  0x404B16 in flash at Flash.F90:51
#12  0x7F0739DADB34

Since this pointed to a memory issue, I doubled the number of nodes I am running on, but the simulation then fails straight away with the following in the output file (nothing appears in the error file):


--------------------------------------------------------------------------
mpirun noticed that process rank 980 with PID 0 on node c096 exited on signal 9 (Killed).
--------------------------------------------------------------------------



As for the simulation itself, the physics in the output data I can get out looks fine, so I cannot decide whether this is a problem with my setup or with the MPI installation I am using.


Are there any parameters I could include in the simulation that would print out, say, the number of particles per processor at a given time, or any other particle-related diagnostics? One thought is that too many particles may be landing on a single processor, or something related to that.
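
In case it helps to show what I mean, below is a minimal sketch of the kind of diagnostic I am after. It assumes the local particle count on each rank is already available (e.g. from the Particles unit) and simply gathers the counts to rank 0 for printing; the routine itself is not part of FLASH:

! Sketch only: gather each rank's local particle count and print the
! spread on rank 0.  numLocalParticles is assumed to be supplied by the
! Particles unit; this routine is not part of FLASH.
subroutine print_particle_balance(numLocalParticles, comm)
  implicit none
  include 'mpif.h'
  integer, intent(in) :: numLocalParticles, comm
  integer :: myRank, numProcs, ierr
  integer, allocatable :: counts(:)

  call MPI_Comm_rank(comm, myRank, ierr)
  call MPI_Comm_size(comm, numProcs, ierr)
  allocate(counts(numProcs))

  ! collect every rank's count onto rank 0
  call MPI_Gather(numLocalParticles, 1, MPI_INTEGER, &
                  counts, 1, MPI_INTEGER, 0, comm, ierr)

  if (myRank == 0) then
     print *, 'particles per rank:  min =', minval(counts), &
              '  max =', maxval(counts), '  total =', sum(counts)
  end if

  deallocate(counts)
end subroutine print_particle_balance

Calling something like this at the end of each evolution step, with FLASH's global communicator, would at least show whether the particle load becomes badly unbalanced as the refinement level increases.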

For information, in case anyone has had related MPI problems with FLASH, the modules I am using are:

hdf5/gcc/openmpi/1.8.16
openmpi/gcc/1.10.5

I would greatly appreciate any thoughts or opinions on what could cause it to fail with higher levels of refinement.

Many Thanks,
Alex

________________________________
Mr Alex Sheardown
Postgraduate Research Student

E.A. Milne Centre for Astrophysics
University of Hull
Cottingham Road
Kingston upon Hull
HU6 7RX

www.milne.hull.ac.uk

