<div dir="ltr"><div>Hi Mark,</div><div><br></div><div>I had the same issue and a TACC staff helped me out. Try to load the modules using the following script.</div><div>```</div><div>export LMOD_EXPERT=1</div><div>module purge</div><div>module load intel/17.0.4</div><div>module load mvapich2</div><div>module use /opt/apps/intel17/impi17_0/modulefiles</div><div>module load petsc/3.7</div><div>module use /opt/apps/intel17/impi17_0/modulefiles</div><div>module load phdf5</div><div>```</div><div><br></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Hope this helps!</span><br></div><div><br></div><div>Yingchao</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, Jun 21, 2018 at 8:37 AM Mark Richardson <<a href="mailto:mark.richardson.work@gmail.com">mark.richardson.work@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Hi Josh,<br></div> Thanks for your suggestions. I have turned on debug with -O0 and -g, and it didn't affect the outcome. All gcc compilers on stampede2 are version > 5, so I might try and install an earlier version. Further, their mpif90 etc is built with intel, while my builds that work elsewhere are all gnu built.<br><br></div><div>I have maxblocks set to 200, but I am only trying to allocate an average of 63 blocks per processor. I have played with the number of processors per node and number of node, effectively changing both the average blocks per processor being allocated, and the memory available to each processor. Neither averted the hang up.<br><br></div><div>Thanks again,<br></div> -Mark<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On 18 June 2018 at 07:59, Joshua Wall <span dir="ltr"><<a href="mailto:joshua.e.wall@gmail.com" target="_blank">joshua.e.wall@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello Mark,<div><br></div><div> I seem to remember some issue with the code hanging that was due to optimization with newer versions of the GCC compiler suite. Indeed, I think I also implemented this to get past a hang:</div><div><br></div><div><a href="http://flash.uchicago.edu/pipermail/flash-users/2015-February/001637.html" target="_blank">http://flash.uchicago.edu/pipermail/flash-users/2015-February/001637.html</a> <br></div><div><br></div><div>Does attempting this fix (or running with -O0 ) help at all?</div><div><br></div><div>As a final note, I have also seen the code hang silently when in fact I had either 1) exceeded the maximum number of blocks per processor or 2) run out of RAM on a node. So those are things to check as well.</div><div><br></div><div>Hope that helps!</div><div><br></div><div>Josh</div></div><br><div class="gmail_quote"><div><div class="m_-3499177860968419452h5"><div dir="ltr">On Mon, Jun 18, 2018 at 1:47 AM Mark Richardson <<a href="mailto:mark.richardson.work@gmail.com" target="_blank">mark.richardson.work@gmail.com</a>> wrote:<br></div></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="m_-3499177860968419452h5"><div style="word-wrap:break-word"><div>Hello,</div><div><br></div><div> My current FLASH build worked fine on the original stampede, and on small local clusters. 
But on both KNL and SKX nodes on Stampede2, I get a hang during refinement in mpi_amr_redist_blk. If I build the initial simulation on a different cluster, the hang happens on Stampede2 the first time the grid structure changes. If I build the initial simulation on Stampede2, it hangs after triggering level 6 in that initial Simulation_initBlk loop, but still in mpi_amr_redist_blk. </div><div><br></div><div>Setup call: </div><div> ./setup -auto -3d -nxb=32 -nyb=16 -nzb=8 -maxblocks=200 species=rock,watr +uhd3tr mgd_meshgroups=1 Simulation_Buiild</div><div><br></div><div> Using: ifort (IFORT) 17.0.4 20170411 </div><div><br></div><div>The tail of the log file is in Logfile.pdf.</div><div><br></div><div>I’ve changed maxblocks and the number of nodes without getting past this issue. </div><div><br></div><div>I’ve changed the “iteration, no. not moved” output to print for each processor, and they all print identical, correct info. I’ve added per-processor print statements before the nrecv>0 waitall and the nsend>0 waitall in mpi_amr_redist_blk.F90 and see that about 25% of the processors wait indefinitely in the nrecv>0 waitall, while the other 75% complete the redist_blk subroutine and wait later for the remaining processors to finish (a minimal sketch of this diagnostic is appended below my sign-off). </div><div><br></div><div>I’ve tried adding sleep(1) inside the niter loop, as suggested in the past for someone who found niter going to 100 (note: I’m getting niter = 2 with no. not moved = 0, so all processors successfully exit that loop but hang later). This didn’t change the result.</div><div><br></div><div>Has anyone else seen a similar hang on any cluster? Any suggestions for overcoming it? </div><div><br></div><div>Thank you for your help,</div><div> -Mark</div><div><br></div><div> </div></div></div></div><div style="word-wrap:break-word"><div></div><div><br></div><div><br></div><div><br></div><div><br></div><span class="m_-3499177860968419452HOEnZb"><font color="#888888"><br><div>
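<div><br></div><div>P.S. For reference, a minimal standalone sketch of the kind of per-rank waitall diagnostic described above. The names (nrecv, recv_reqs, etc.) and the single-message exchange are illustrative only, not the actual arrays or communication pattern in mpi_amr_redist_blk.F90:</div><div><pre>
! Sketch only: mimics the "print before/after MPI_Waitall" diagnostic.
! Not FLASH code; the request arrays and message pattern are made up.
program waitall_diag
   use mpi
   implicit none
   integer :: mype, nprocs, ierr, nrecv, nsend
   integer, allocatable :: recv_reqs(:), send_reqs(:)
   integer :: rbuf(1), sbuf(1)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, mype, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

   ! One receive from the right neighbour and one send to the left,
   ! standing in for the block-redistribution traffic.
   nrecv = 1
   nsend = 1
   allocate(recv_reqs(nrecv), send_reqs(nsend))
   sbuf(1) = mype
   call MPI_Irecv(rbuf, 1, MPI_INTEGER, mod(mype+1, nprocs), 0, &
                  MPI_COMM_WORLD, recv_reqs(1), ierr)
   call MPI_Isend(sbuf, 1, MPI_INTEGER, mod(mype+nprocs-1, nprocs), 0, &
                  MPI_COMM_WORLD, send_reqs(1), ierr)

   ! Bracket each waitall with per-rank prints: a rank that prints
   ! "entering" but never "passed" is one that is stuck.
   if (nrecv > 0) then
      print *, 'rank', mype, ': entering recv waitall, nrecv =', nrecv
      flush(6)
      call MPI_Waitall(nrecv, recv_reqs, MPI_STATUSES_IGNORE, ierr)
      print *, 'rank', mype, ': passed recv waitall'
      flush(6)
   end if
   if (nsend > 0) then
      call MPI_Waitall(nsend, send_reqs, MPI_STATUSES_IGNORE, ierr)
      print *, 'rank', mype, ': passed send waitall'
      flush(6)
   end if

   call MPI_Finalize(ierr)
end program waitall_diag
</pre></div><div>Built with mpif90 and run on a few ranks, this toy version completes and every rank prints its "passed" lines; in the actual FLASH runs, about a quarter of the ranks never reach the "passed" print for the nrecv>0 waitall.</div>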
<div dir="auto" style="word-wrap:break-word"><div style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">-- <br><br>Mark Richardson<br>MAT Postdoctoral Fellow<br>Department of Astrophysics<br>American Museum of Natural History<br><a href="mailto:MRichardson@amnh.org" target="_blank">MRichardson@amnh.org</a><br><a href="https://sites.google.com/site/marklarichardson/" target="_blank">My Website</a><br><a href="tel:(212)%20496-3432" value="+12124963432" target="_blank">212 496 3432</a></div></div>
</div>
<br></font></span></div></blockquote></div><span class="m_-3499177860968419452HOEnZb"><font color="#888888">-- <br><div dir="ltr" class="m_-3499177860968419452m_5046780262454028167gmail_signature" data-smartmail="gmail_signature">Joshua Wall<br>Doctoral Candidate<br>Department of Physics<br>Drexel University<br>3141 Chestnut Street<br>Philadelphia, PA 19104<br></div>
</font></span></blockquote></div><br><br clear="all"><br>-- <br><div class="m_-3499177860968419452gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">Mark Richardson<br>MAT Postdoctoral Fellow<br>Department of Astrophysics<br>American Museum of Natural History<br><a href="mailto:Mark.Richardson.Work@gmail.com" target="_blank">Mark.Richardson.Work@gmail.com</a><br><a href="https://sites.google.com/site/marklarichardson/" target="_blank">My Website</a><br>212 496 3432</div></div>
</div>
</blockquote></div>