<div dir="ltr"><div><div><div><div><div><div><div>Hello Alex,<br><br></div> This is a strange error if you are using only native Flash. I currently control Flash from Python by forking to make threads to run Flash under, but this is purposely done at the beginning of a run. It shouldn't occur during a run (unless you have made processes/threads to handle your N-body). If you are spawning processes during the run, you can safely turn off the fork warning (which is what I do in my runs) by calling mpirun as detailed here <a href="https://www.open-mpi.org/faq/?category=tuning#setting-mca-params">https://www.open-mpi.org/faq/?category=tuning#setting-mca-params</a> which should look something like:<br><br></div>mpirun --mca <span class="inbox-inbox-m_781603564051877588s1">mpi_warn_on_fork 0 </span>-np 96 ./flash4<br><br></div>Otherwise I'd make sure you have gdb debugging on for Flash during compile and either look at the core dump file with gdb or attach gdb to a running process to investigate what MPI is doing the moment it tries to fork. Some helpful links for doing this:<br><br></div>Turn on core dumps: <a href="http://stackoverflow.com/questions/17965/how-to-generate-a-core-dump-in-linux-when-a-process-gets-a-segmentation-fault">http://stackoverflow.com/questions/17965/how-to-generate-a-core-dump-in-linux-when-a-process-gets-a-segmentation-fault</a><br><br></div>Use gdb with Open-MPI by attaching to running processes: <a href="https://www.open-mpi.org/faq/?category=debugging#serial-debuggers">https://www.open-mpi.org/faq/?category=debugging#serial-debuggers</a> <br>also: <a href="http://stackoverflow.com/questions/329259/how-do-i-debug-an-mpi-program">http://stackoverflow.com/questions/329259/how-do-i-debug-an-mpi-program</a><br><br></div>Debugging MPI programs is a bit of a black art and a bit different than usual debugging. It helps to have lot of patience (and sometimes a plush toy to toss!). Best of luck.<br><br></div><div>Cordially,<br><br></div><div>Joshua Wall<br></div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Apr 11, 2017 at 3:03 PM Alexander Sheardown <<a href="mailto:A.Sheardown@2011.hull.ac.uk">A.Sheardown@2011.hull.ac.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div style="direction:ltr;font-family:Tahoma;color:#000000;font-size:10pt">Hello Everyone,
<div><br>
</div>
<div>I am running N-body + hydro galaxy cluster merger simulations, but I am running into problems when trying to use higher levels of refinement.</div>
<div><br>
My simulation has a box size of 8 Mpc x 8 Mpc, contains 2 million particles, and refines on density. If I run the simulation at a maximum refinement level of 6, it runs fine and completes its run. However, if I turn the maximum refinement level up to 7 or 8, the simulation only gets so far (this varies; it doesn't stop at the same point every time) and exits with the following MPI error in the output file:</div>
<div><br>
</div>
<div>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">--------------------------------------------------------------------------</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">An MPI process has executed an operation involving a call to the</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">"fork()" system call to create a child process. Open MPI is currently</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">operating in a condition that could result in memory corruption or</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">other system errors; your MPI job may hang, crash, or produce silent</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">data corruption. The use of fork() (or system() or other calls that</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">create child processes) is strongly discouraged. </span></p>
<p class="m_781603564051877588p2"><span class="m_781603564051877588s1"></span><br>
</p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">The process that invoked fork was:</span></p>
<p class="m_781603564051877588p2"><span class="m_781603564051877588s1"></span><br>
</p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1"> Local host: c127 (PID 108285)</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1"> MPI_COMM_WORLD rank: 414</span></p>
<p class="m_781603564051877588p2"><span class="m_781603564051877588s1"></span><br>
</p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">If you are *absolutely sure* that your application will successfully</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">and correctly survive a call to fork(), you may disable this warning</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">by setting the mpi_warn_on_fork MCA parameter to 0.</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">--------------------------------------------------------------------------</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">--------------------------------------------------------------------------</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">mpirun noticed that process rank 429 with PID 0 on node c128 exited on signal 11 (Segmentation fault).</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1"></span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">--------------------------------------------------------------------------</span></p>
<p class="m_781603564051877588p1"><span style="color:rgb(0,0,0);font-family:Tahoma;font-size:small;font-variant-ligatures:no-common-ligatures"><br>
</span></p>
<p class="m_781603564051877588p1"><span style="color:rgb(0,0,0);font-family:Tahoma;font-size:small;font-variant-ligatures:no-common-ligatures">..and the error file shows:</span></p>
<p class="m_781603564051877588p1"><span style="font-variant-ligatures:no-common-ligatures">Backtrace for this error:</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#0 0x7F073AAD9417</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#1 0x7F073AAD9A2E</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#2 0x7F0739DC124F</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#3 0x454665 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#4 0x4759AE in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:370</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#5 0x582550 in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#6 0x5DB143 in amr_guardcell_ at mpi_amr_guardcell.F90:299</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#7 0x41BFDA in grid_fillguardcells_ at Grid_fillGuardCells.F90:456</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#8 0x5569A3 in hy_ppm_sweep_ at hy_ppm_sweep.F90:229</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#9 0x430A3A in hydro_ at Hydro.F90:87</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#10 0x409904 in driver_evolveflash_ at Driver_evolveFlash.F90:275</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#11 0x404B16 in flash at Flash.F90:51</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1"></span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#12 0x7F0739DADB34</span></p>
</div>
<div><br>
</div>
<div><span style="font-size:10pt"><br>
</span></div>
<div><span style="font-size:10pt">Since this showed a memory issue </span><span style="font-size:10pt">I doubled the number of nodes I am running on but the simulation fails straight away with this in the output file (nothing appears in the error file):</span></div>
<div>
<p class="m_781603564051877588p1"><br>
</p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">--------------------------------------------------------------------------</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">mpirun noticed that process rank 980 with PID 0 on node c096 exited on signal 9 (Killed).</span></p>
<p class="m_781603564051877588p1"></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">--------------------------------------------------------------------------</span></p>
<p class="m_781603564051877588p1"><br>
</p>
<p class="m_781603564051877588p1"><span style="color:rgb(0,0,0);font-family:Tahoma;font-size:13.3333px"><br>
</span></p>
<p class="m_781603564051877588p1"><span style="color:rgb(0,0,0);font-family:Tahoma;font-size:13.3333px">In terms of the simulation itself, looking at the output data that I can get out everything looks fine in terms of the physics, so I can't decide whether</span><span style="color:rgb(0,0,0);font-family:Tahoma;font-size:10pt"> this
is a problem with my simulation or the MPI I am using.</span></p>
<p class="m_781603564051877588p1"><span style="color:rgb(0,0,0);font-family:Tahoma;font-size:10pt"><br>
</span></p>
<p class="m_781603564051877588p1"><span style="color:rgb(0,0,0);font-family:Tahoma;font-size:10pt">Are there any parameters I could include in the simulation that would print out say the number of particles per processor at a given time? or any other diagnostics to do
with particles? One thought I am wondering is are there too many particles landing on a processor or something related.</span></p>
</div>
<div>
<div><br>
</div>
<div>For info, in case anyone has had related MPI problems with FLASH, the modules I am using are:</div>
<div><span style="color:rgb(80,80,80);font-family:Menlo;font-size:11px;font-variant-ligatures:no-common-ligatures">hdf5/gcc/openmpi/1.8.16</span></div>
<div>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">openmpi/gcc/1.10.5</span></p>
</div>
<div><br>
</div>
<div><span style="font-size:10pt">I would greatly appreciate any thoughts or opinions on what could cause it to fail with higher levels of refinement.</span></div>
<div><br>
</div>
<div>Many Thanks,</div>
<div>Alex</div>
<div><br>
<div style="font-family:Tahoma;font-size:13px">
<div style="font-family:Tahoma;font-size:13px">
<div style="font-family:Tahoma;font-size:13px">
<div style="font-size:12px">
<hr>
</div>
<div style="font-size:12px"><b><font face="Arial">Mr Alex Sheardown</font></b></div>
<div style="font-size:12px"><font face="Arial">Postgraduate Research Student</font></div>
<div style="font-size:12px"><font face="Arial"><br>
</font></div>
<div style="font-size:12px"><font face="Arial">E.A. Milne Centre for Astrophysics</font></div>
<div style="font-size:12px"><font face="Arial">University of Hull</font></div>
<div style="font-size:12px"><font face="Arial">Cottingham Road</font></div>
<div style="font-size:12px"><font face="Arial">Kingston upon Hull</font></div>
<div style="font-size:12px"><font face="Arial">HU6 7RX</font></div>
<div style="font-size:12px"><font face="Arial"><br>
</font></div>
<div style="font-size:12px"><a href="https://mail.hull.ac.uk/owa/redir.aspx?REF=_wok6-STjTeTuQlVeEE3DYaVcvKXJXINIb2ho14u7UoAceEsmknTCAFodHRwOi8vd3d3Lm1pbG5lLmh1bGwuYWMudWs." target="_blank"><font face="Arial">www.milne.hull.ac.uk</font></a></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
**************************************************<br>
To view the terms under which this email is<br>
distributed, please go to<br>
<a href="http://www2.hull.ac.uk/legal/disclaimer.aspx" rel="noreferrer" target="_blank">http://www2.hull.ac.uk/legal/disclaimer.aspx</a><br>
**************************************************</blockquote></div><div dir="ltr">-- <br></div><div data-smartmail="gmail_signature"><div dir="ltr"><div><div><div><div><div>Joshua Wall<br></div>Doctoral Candidate<br></div>Department of Physics<br></div>Drexel University<br></div>3141 Chestnut Street<br></div>Philadelphia, PA 19104<br></div></div>
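<br><div>Regarding the MCA parameter mentioned in the reply: besides passing --mca on the mpirun command line, the Open MPI FAQ linked above also allows the parameter to be set through the environment or a per-user file. A minimal sketch, assuming a default Open MPI installation that reads $HOME/.openmpi/mca-params.conf:</div>
<pre>
# Illustrative examples -- each is equivalent to:
#   mpirun --mca mpi_warn_on_fork 0 -np 96 ./flash4

# Option 1: environment-variable form of the MCA parameter
export OMPI_MCA_mpi_warn_on_fork=0

# Option 2: per-user MCA parameter file read by Open MPI at startup
mkdir -p $HOME/.openmpi
echo "mpi_warn_on_fork = 0" >> $HOME/.openmpi/mca-params.conf
</pre>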
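<div>And a minimal sketch of the core-dump / gdb-attach workflow described in the reply. Host names, PIDs, and the core-file name are illustrative (taken from the messages above), the core-file location depends on the cluster's core_pattern and batch system, and FLASH needs to be compiled with debugging symbols (-g) for the backtrace line numbers to be useful:</div>
<pre>
# in the shell or batch script that launches the job: allow core files
ulimit -c unlimited
mpirun --mca mpi_warn_on_fork 0 -np 96 ./flash4

# after a crash, open the core file with the matching executable
# (name/location of the core file depends on /proc/sys/kernel/core_pattern)
gdb ./flash4 core
(gdb) bt            # backtrace of the crashed rank

# or attach to a live rank while the job runs, on the node that owns it,
# using the PID reported in the fork warning (e.g. "Local host: c127 (PID 108285)")
ssh c127
gdb ./flash4 108285
(gdb) bt
</pre>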