<html dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" id="owaParaStyle"></style>
</head>
<body fpstyle="1" ocsi="0">
<div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;"><span style="font-size: 13.3333px;">Hello Joshua,</span>
<div style="font-size: 13.3333px;"><br>
</div>
<div style="font-size: 13.3333px;">Thanks for the reply.</div>
<div style="font-size: 13.3333px;"><br>
</div>
<div style="font-size: 13.3333px;">We appear to have a workaround for the MPI forking problem: reducing the number of CPUs per node has avoided it so far. The only remaining issue is that the simulation stops at some point and gives this error in the
output file:</div>
<div style="font-size: 13.3333px;"><br>
</div>
<div style="font-size: 13.3333px;">
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">--------------------------------------------------------------------------</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">mpirun noticed that process rank 355 with PID 0 on node c069 exited on signal 11 (Segmentation fault).</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">--------------------------------------------------------------------------</span></p>
<div><br>
</div>
<div>with the error file showing:</div>
<div><br>
</div>
<div>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">Program received signal SIGSEGV: Segmentation fault - invalid memory reference.</span></p>
<p class="p2" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80); min-height: 13px;">
<span class="s1"></span><br>
</p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">Backtrace for this error:</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#0 0x7F57EC0A7467</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#1 0x7F57EC0A7AAE</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#2 0x7F57EB39224F</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#3 0x4595CD in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#4 0x47F686 in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#5 0x59DA10 in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#6 0x5F7273 in amr_guardcell_ at mpi_amr_guardcell.F90:301</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#7 0x41CC9A in grid_fillguardcells_ at Grid_fillGuardCells.F90:460</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#8 0x56FC61 in hy_uhd_unsplit_ at hy_uhd_unsplit.F90:296</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#9 0x437645 in hydro_ at Hydro.F90:67</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#10 0x409FC1 in driver_evolveflash_ at Driver_evolveFlash.F90:290</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#11 0x404CB6 in flash at Flash.F90:51</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1">#12 0x7F57EB37EB34</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1"><br>
</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span class="s1"><font size="2" face="Tahoma"><font color="#000000">There seems to be a problem transferring guard cell information between blocks. I came across this post,</font> </font></span><a href="http://flash.uchicago.edu/pipermail/flash-users/2015-February/001637.html" target="_blank" style="font-family: Tahoma; font-size: 10pt;">http://flash.uchicago.edu/pipermail/flash-users/2015-February/001637.html</a><font size="2" face="Tahoma" color="#000000">, which
describes an error with gcc 4.9.1 when filling guard cells (although I am now using the module</font> <font face="Tahoma" size="2" color="#000000">openmpi/gcc/1.10.5</font><span style="color: rgb(0, 0, 0); font-family: Tahoma; font-size: small;">) and gives a workaround:
adding a few lines to the end of the Makefile. I have added those lines, and that is the build I have been using. Interestingly, without those lines the simulation stops straight away after producing the initial plot file.
With them, the simulation runs a lot further than the initial plot file but eventually stops with the error shown above.</span></p>
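<div style="color: rgb(0, 0, 0); font-family: Tahoma; font-size: small;">To compare this backtrace with the one from my first message, a small Python sketch like the one below pulls out only the frames that carry source information. The function name <code>parse_frames</code> and the regular expression are assumptions based on the backtrace layout shown above; they are not part of FLASH or gfortran.</div>
<pre>
```python
import re

# A frame with debug information looks like:
#   #3 0x4595CD in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
# Frames without symbols (for example "#2 0x7F57EB39224F") are skipped.
FRAME = re.compile(r"#(\d+)\s+0x[0-9A-Fa-f]+ in (\S+) at (\S+):(\d+)")


def parse_frames(text):
    """Return (index, function, file, line) for each frame naming a source line."""
    return [
        (int(index), function, source, int(line))
        for index, function, source, line in FRAME.findall(text)
    ]


if __name__ == "__main__":
    # Three lines copied from the backtrace above, used as sample input.
    sample = """
#2 0x7F57EB39224F
#3 0x4595CD in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356
#4 0x47F686 in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:753
"""
    for frame in parse_frames(sample):
        print(frame)
```
</pre>
<div style="color: rgb(0, 0, 0); font-family: Tahoma; font-size: small;">Reading the two backtraces this way, both stop at amr_1blk_cc_cp_remote.F90:356, while the hydro frame differs (hy_ppm_sweep in the first run, hy_uhd_unsplit in this one).</div>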
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span style="color: rgb(0, 0, 0); font-family: Tahoma; font-size: small;"><br>
</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<span style="color: rgb(0, 0, 0); font-family: Tahoma; font-size: small;">Have you seen similar problems with transferring guard cell information in FLASH before?</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<br>
</p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<font size="2" face="Tahoma" color="#000000">Cheers,</font></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(80, 80, 80);">
<font size="2" face="Tahoma" color="#000000">Alex</font></p>
</div>
</div>
<div><br>
<div style="font-family:Tahoma; font-size:13px">
<div style="font-family:Tahoma; font-size:13px">
<div style="font-family:Tahoma; font-size:13px">
<div style="font-size:12px">
<hr>
</div>
<div style="font-size:12px"><b><font face="Arial">Mr Alex Sheardown</font></b></div>
<div style="font-size:12px"><font face="Arial">Postgraduate Research Student</font></div>
<div style="font-size:12px"><font face="Arial"><br>
</font></div>
<div style="font-size:12px"><font face="Arial">E.A. Milne Centre for Astrophysics</font></div>
<div style="font-size:12px"><font face="Arial">University of Hull</font></div>
<div style="font-size:12px"><font face="Arial">Cottingham Road</font></div>
<div style="font-size:12px"><font face="Arial">Kingston upon Hull</font></div>
<div style="font-size:12px"><font face="Arial">HU6 7RX</font></div>
<div style="font-size:12px"><font face="Arial"><br>
</font></div>
<div style="font-size:12px"><a href="https://mail.hull.ac.uk/owa/redir.aspx?REF=_wok6-STjTeTuQlVeEE3DYaVcvKXJXINIb2ho14u7UoAceEsmknTCAFodHRwOi8vd3d3Lm1pbG5lLmh1bGwuYWMudWs." target="_blank"><font face="Arial">www.milne.hull.ac.uk</font></a></div>
</div>
</div>
</div>
</div>
<div style="font-family: Times New Roman; color: #000000; font-size: 16px">
<hr tabindex="-1">
<div id="divRpF238653" style="direction: ltr;"><font face="Tahoma" size="2" color="#000000"><b>From:</b> Joshua Wall [joshua.e.wall@gmail.com]<br>
<b>Sent:</b> 19 April 2017 22:00<br>
<b>To:</b> Alexander Sheardown; flash-users@flash.uchicago.edu<br>
<b>Subject:</b> Re: [FLASH-USERS] Problems running at higher levels of refinement.<br>
</font><br>
</div>
<div></div>
<div>
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>Hello Alex,<br>
<br>
</div>
This is a strange error if you are using only native Flash. I currently control Flash from Python by forking to make threads to run Flash under, but this is purposely done at the beginning of a run. It shouldn't occur during a run (unless you have made
processes/threads to handle your N-body). If you are spawning processes during the run, you can safely turn off the fork warning (which is what I do in my runs) by calling mpirun as detailed here
<a href="https://www.open-mpi.org/faq/?category=tuning#setting-mca-params" target="_blank">
https://www.open-mpi.org/faq/?category=tuning#setting-mca-params</a> which should look something like:<br>
<br>
</div>
mpirun --mca <span class="inbox-inbox-m_781603564051877588s1">mpi_warn_on_fork 0 </span>
-np 96 ./flash4<br>
<br>
</div>
Otherwise, I'd make sure Flash is compiled with debugging enabled for gdb, and then either look at the core dump file with gdb or attach gdb to a running process to investigate what MPI is doing at the moment it tries to fork. Some helpful links for doing this:<br>
<br>
</div>
Turn on core dumps: <a href="http://stackoverflow.com/questions/17965/how-to-generate-a-core-dump-in-linux-when-a-process-gets-a-segmentation-fault" target="_blank">
http://stackoverflow.com/questions/17965/how-to-generate-a-core-dump-in-linux-when-a-process-gets-a-segmentation-fault</a><br>
<br>
</div>
Use gdb with Open-MPI by attaching to running processes: <a href="https://www.open-mpi.org/faq/?category=debugging#serial-debuggers" target="_blank">
https://www.open-mpi.org/faq/?category=debugging#serial-debuggers</a> <br>
also: <a href="http://stackoverflow.com/questions/329259/how-do-i-debug-an-mpi-program" target="_blank">
http://stackoverflow.com/questions/329259/how-do-i-debug-an-mpi-program</a><br>
<br>
</div>
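<div>As a rough illustration of the two suggestions above (silencing the fork warning and turning on core dumps), here is a minimal Python sketch of a launcher. The function names <code>enable_core_dumps</code> and <code>build_mpirun_command</code> are hypothetical, and the rank count and executable are simply the values from the example command. Raising the limit this way only affects processes started from this script; whether ranks on other nodes inherit it depends on how the cluster launches them, so treat that as an assumption to check.<br>
<br>
</div>
<pre>
```python
import resource
import shlex


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit for this process.

    Child processes started afterwards inherit the limit.
    Returns the (soft, hard) pair now in effect.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def build_mpirun_command(num_ranks, executable, warn_on_fork=False):
    """Build the mpirun argument list, optionally silencing the fork warning."""
    command = ["mpirun"]
    if not warn_on_fork:
        # MCA parameter named in the Open MPI warning text itself.
        command += ["--mca", "mpi_warn_on_fork", "0"]
    command += ["-np", str(num_ranks), executable]
    return command


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print("core limit (soft, hard):", soft, hard)
    # Printed rather than executed, so the sketch runs without MPI installed.
    print(shlex.join(build_mpirun_command(96, "./flash4")))
```
</pre>
<div>To actually launch the job, the returned list could be passed to <code>subprocess.run</code>. If a core file is produced, it can then be opened with gdb alongside the flash4 executable as described in the links above.<br>
<br>
</div>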
Debugging MPI programs is a bit of a black art and a bit different from usual debugging. It helps to have a lot of patience (and sometimes a plush toy to toss!). Best of luck.<br>
<br>
</div>
<div>Cordially,<br>
<br>
</div>
<div>Joshua Wall<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr">On Tue, Apr 11, 2017 at 3:03 PM Alexander Sheardown <<a href="mailto:A.Sheardown@2011.hull.ac.uk" target="_blank">A.Sheardown@2011.hull.ac.uk</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">
<div>
<div style="direction:ltr; font-family:Tahoma; color:#000000; font-size:10pt">Hello Everyone,
<div><br>
</div>
<div>I am running N-Body + Hydro galaxy cluster merger simulations but I am running into problems when trying to run with higher levels of refinement.</div>
<div><br>
My simulation has a box size of 8 Mpc x 8 Mpc, contains 2 million particles, and refines on density. <span style="font-size:10pt">If I run the simulation with a maximum refinement level of 6, it runs fine and completes. However, if I turn
the maximum refinement level up to 7 or 8, the simulation only gets so far (this varies; it doesn't stop at the same point every time) and exits with this MPI error in the output file:</span></div>
<div><br>
</div>
<div>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">--------------------------------------------------------------------------</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">An MPI process has executed an operation involving a call to the</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">"fork()" system call to create a child process. Open MPI is currently</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">operating in a condition that could result in memory corruption or</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">other system errors; your MPI job may hang, crash, or produce silent</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">data corruption. The use of fork() (or system() or other calls that</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">create child processes) is strongly discouraged. </span></p>
<p class="m_781603564051877588p2"><span class="m_781603564051877588s1"></span><br>
</p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">The process that invoked fork was:</span></p>
<p class="m_781603564051877588p2"><span class="m_781603564051877588s1"></span><br>
</p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1"> Local host: c127 (PID 108285)</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1"> MPI_COMM_WORLD rank: 414</span></p>
<p class="m_781603564051877588p2"><span class="m_781603564051877588s1"></span><br>
</p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">If you are *absolutely sure* that your application will successfully</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">and correctly survive a call to fork(), you may disable this warning</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">by setting the mpi_warn_on_fork MCA parameter to 0.</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">--------------------------------------------------------------------------</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">--------------------------------------------------------------------------</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">mpirun noticed that process rank 429 with PID 0 on node c128 exited on signal 11 (Segmentation fault).</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1"></span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">--------------------------------------------------------------------------</span></p>
<p class="m_781603564051877588p1"><span style="color:rgb(0,0,0); font-family:Tahoma; font-size:small"><br>
</span></p>
<p class="m_781603564051877588p1"><span style="color:rgb(0,0,0); font-family:Tahoma; font-size:small">..and the error file shows:</span></p>
<p class="m_781603564051877588p1"><span style="">Backtrace for this error:</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#0 0x7F073AAD9417</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#1 0x7F073AAD9A2E</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#2 0x7F0739DC124F</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#3 0x454665 in amr_1blk_cc_cp_remote_ at amr_1blk_cc_cp_remote.F90:356</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#4 0x4759AE in amr_1blk_guardcell_srl_ at amr_1blk_guardcell_srl.F90:370</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#5 0x582550 in amr_1blk_guardcell_ at mpi_amr_1blk_guardcell.F90:743</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#6 0x5DB143 in amr_guardcell_ at mpi_amr_guardcell.F90:299</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#7 0x41BFDA in grid_fillguardcells_ at Grid_fillGuardCells.F90:456</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#8 0x5569A3 in hy_ppm_sweep_ at hy_ppm_sweep.F90:229</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#9 0x430A3A in hydro_ at Hydro.F90:87</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#10 0x409904 in driver_evolveflash_ at Driver_evolveFlash.F90:275</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#11 0x404B16 in flash at Flash.F90:51</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1"></span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">#12 0x7F0739DADB34</span></p>
</div>
<div><br>
</div>
<div><span style="font-size:10pt"><br>
</span></div>
<div><span style="font-size:10pt">Since this pointed to a memory issue, </span><span style="font-size:10pt">I doubled the number of nodes I am running on, but the simulation then fails straight away with this in the output file (nothing appears in the error file):</span></div>
<div>
<p class="m_781603564051877588p1"><br>
</p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">--------------------------------------------------------------------------</span></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">mpirun noticed that process rank 980 with PID 0 on node c096 exited on signal 9 (Killed).</span></p>
<p class="m_781603564051877588p1"></p>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">--------------------------------------------------------------------------</span></p>
<p class="m_781603564051877588p1"><br>
</p>
<p class="m_781603564051877588p1"><span style="color:rgb(0,0,0); font-family:Tahoma; font-size:13.3333px"><br>
</span></p>
<p class="m_781603564051877588p1"><span style="color:rgb(0,0,0); font-family:Tahoma; font-size:13.3333px">As for the simulation itself, the output data that I can get out looks fine in terms of the physics, so I can't decide whether</span><span style="color:rgb(0,0,0); font-family:Tahoma; font-size:10pt"> this
is a problem with my simulation or with the MPI I am using.</span></p>
<p class="m_781603564051877588p1"><span style="color:rgb(0,0,0); font-family:Tahoma; font-size:10pt"><br>
</span></p>
<p class="m_781603564051877588p1"><span style="color:rgb(0,0,0); font-family:Tahoma; font-size:10pt">Are there any parameters I could include in the simulation that would print out, say, the number of particles per processor at a given time, or any other particle diagnostics?
One thought is that too many particles may be landing on a single processor, or something related.</span></p>
</div>
<div>
<div><br>
</div>
<div>For reference, in case anyone has had related MPI problems with FLASH, the modules I am using are:</div>
<div><span style="color:rgb(80,80,80); font-family:Menlo; font-size:11px">hdf5/gcc/openmpi/1.8.16</span></div>
<div>
<p class="m_781603564051877588p1"><span class="m_781603564051877588s1">openmpi/gcc/1.10.5</span></p>
</div>
<div><br>
</div>
<div><span style="font-size:10pt">I would greatly appreciate any thoughts or opinions on what could cause it to fail with higher levels of refinement.</span></div>
<div><br>
</div>
<div>Many Thanks,</div>
<div>Alex</div>
</div>
</div>
</div>
</blockquote>
</div>
<div dir="ltr">-- <br>
</div>
<div>
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>Joshua Wall<br>
</div>
Doctoral Candidate<br>
</div>
Department of Physics<br>
</div>
Drexel University<br>
</div>
3141 Chestnut Street<br>
</div>
Philadelphia, PA 19104<br>
</div>
</div>
</div>
</div>
</div>
</body>
</html>