<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">Dear FLASH Developers and Users,
<div><br>
</div>
<div>I have encountered an MPI hang that occurs after blocks are redistributed. The last line of output before the hang is "refined: total blocks = XXXXX". After some investigation, I found two potential bugs in the code that passes particles
between processors. I document the details below and hope they help others who run into similar problems.</div>
<div><br>
</div>
<div><b>1. Hanging when the number of particles exceeds the limit</b></div>
<div>Relevant file:</div>
<div>Grid/GridParticles/gr_ptHandleExcess.F90</div>
<div><br>
</div>
<div>This subroutine is called from gr_ptLocalMatch, which is in turn called from Grid_moveParticles or gr_ptMoveSieve. If the option to remove excess particles is disabled (the default, gr_ptRemove = .false.), the subroutine is supposed to write a checkpoint
file and abort the simulation. However, when this happens inside gr_ptMoveSieve, only the processor whose particle count exceeds the limit calls IO_writeCheckpoint. The remaining processors continue the while loop and wait for MPI_ALLREDUCE
in gr_ptNextProcPair or MPI_SENDRECV in gr_ptMoveSieve to complete, which never happens because the offending processor has left the loop. As a result, the whole simulation hangs.</div>
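<div><br>
</div>
<div>The hang can be illustrated with a toy Python model. This is not FLASH code: next_calls(), hangs(), the particle counts, and the limit are all hypothetical. Each simulated rank reports the next call it would enter, and the step is treated as hung unless every rank enters the same call. The collective_check=True branch sketches one possible fix, offered as an assumption rather than a tested patch. It combines the per-processor excess flags with a logical-OR reduction (MPI_ALLREDUCE with MPI_LOR) before branching, so that all processors either write the checkpoint and abort together or continue together.</div>
<pre>
```python
from operator import gt

# Toy model, not FLASH code: next_calls() and hangs() are hypothetical
# helpers that only mimic which call each MPI rank would enter next.


def next_calls(particle_counts, max_per_proc, collective_check):
    """Return the name of the next call each rank enters after the excess check."""
    # Each rank knows only its own count: True if it is over the limit.
    local_excess = [gt(n, max_per_proc) for n in particle_counts]
    if collective_check:
        # Assumed fix: OR the flags across all ranks (MPI_ALLREDUCE with
        # MPI_LOR in real code) so every rank takes the same branch.
        flags = [any(local_excess)] * len(particle_counts)
    else:
        # Current behavior: each rank branches on its own flag only.
        flags = local_excess
    # "MPI_ALLREDUCE" stands for the collective the other ranks wait in.
    return ["IO_writeCheckpoint" if f else "MPI_ALLREDUCE" for f in flags]


def hangs(calls):
    """Treat the step as hung unless every rank enters the same call."""
    return len(set(calls)) != 1


# Hypothetical counts: only rank 1 exceeds a limit of 1000 particles.
counts = [900, 1200, 800, 950]
print(hangs(next_calls(counts, 1000, collective_check=False)))  # True
print(hangs(next_calls(counts, 1000, collective_check=True)))   # False
```
</pre>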
<div><br>
</div>
<div><br>
</div>
<div><b>2. A (minor) bug that might cause unnecessary communication</b></div>
<div>Relevant files:<br>
</div>
<div>Grid/GridParticles/GridParticlesMove/Sieve/gr_ptMoveSieve.F90<br>
</div>
<div>Grid/GridParticles/GridParticlesMove/Sieve/BlockMatch/gr_ptResetProcPair.F90<br>
</div>
<div>Grid/GridParticles/GridParticlesMove/Sieve/BlockMatch/gr_ptNextProcPair.F90</div>
<div><br>
</div>
<div>This seems to be a minor bug that likely causes extra communication between processors. Since the particle module accounts for only a small fraction of the total run time, the performance impact is probably small.</div>
<div>This bug has to do with the use of <font color="#0000ff">gr_ptSieveCheckFreq</font> and <font color="#ff00ff">gr_ptSieveFreq</font>. The former is an input parameter, but it also serves as a counter in gr_ptNextProcPair. In gr_ptMoveSieve, <font color="#ff00ff">gr_ptSieveFreq</font> is
set to <font color="#0000ff">gr_ptSieveCheckFreq</font>. Then <font color="#0000ff">gr_ptSieveCheckFreq</font> is set to <font color="#ff00ff">gr_ptSieveFreq</font>+1 in gr_ptResetProcPair and is decreased by 1 on every call to gr_ptNextProcPair. However,
if no communication is needed, gr_ptNextProcPair is never called. The next call to gr_ptMoveSieve then copies the already incremented <span style="color:rgb(0,0,255)">gr_ptSieveCheckFreq</span> into <font color="#ff00ff">gr_ptSieveFreq</font>, and gr_ptResetProcPair adds 1 again, so <span style="color:rgb(0,0,255)">gr_ptSieveCheckFreq</span> keeps growing over later timesteps of the simulation. See the following code excerpts for details. The result is
that later in the run, all processors keep communicating until timesInLoop==gr_meshNumProcs. I suspect this is not the intended behavior.</div>
<div> </div>
<div><font face="monospace, monospace"><font color="#0000ff">gr_ptSieveCheckFreq</font> = 1 (default)</font></div>
<div><font face="monospace, monospace"><br>
</font></div>
<div><font face="monospace, monospace">In gr_ptMoveSieve:</font></div>
</div>
</div>
</div>
</div>
</div>
</div>
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div><font face="monospace, monospace"><font color="#ff00ff">gr_ptSieveFreq</font>=<font color="#0000ff">gr_ptSieveCheckFreq</font></font></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div><font face="monospace, monospace">call gr_ptResetProcPair</font></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div><font face="monospace, monospace"><font color="#0000ff">gr_ptSieveCheckFreq</font>=<font color="#ff00ff">gr_ptSieveFreq</font>+1</font></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</blockquote>
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><b><font face="monospace, monospace">do while (mustCommunicate)</font></b></blockquote>
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div><font face="monospace, monospace">call gr_ptNextProcPair</font></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</blockquote>
</div>
</div>
</div>
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div><font face="monospace, monospace"><font color="#0000ff">gr_ptSieveCheckFreq</font>=<font color="#0000ff">gr_ptSieveCheckFreq</font>-1</font></div>
<div><font face="monospace, monospace">if((<font color="#0000ff">gr_ptSieveCheckFreq</font>==0).or.(timesInLoop==gr_meshNumProcs)) then<br>
</font></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div><font face="monospace, monospace"><font color="#0000ff">gr_ptSieveCheckFreq</font>=<span style="color:rgb(255,0,255)">gr_ptSieveFreq</span></font></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</blockquote>
</div>
</div>
</div>
</blockquote>
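<div><br>
</div>
<div>The counter drift can be replayed with a small toy model in Python. This is not FLASH code: first_check_iteration() and its fixed option are hypothetical. The model applies only the updates quoted above, one gr_ptMoveSieve call per timestep. It assumes that timesInLoop counts gr_ptNextProcPair calls within a single gr_ptMoveSieve call and that the exchange is complete once the first global check fires. The fixed=True variant sketches one possible fix under those assumptions: set gr_ptSieveFreq from an untouched copy of the runtime parameter instead of from the counter.</div>
<pre>
```python
# Toy replay of the quoted counter updates, not FLASH code. The function
# name and the fixed= option are hypothetical; timesInLoop is assumed to
# count gr_ptNextProcPair calls within one gr_ptMoveSieve call.


def first_check_iteration(needs_comm, num_procs, sieve_check_freq=1, fixed=False):
    """For each timestep, return the gr_ptNextProcPair call at which the
    first global check fires, or None if the step exchanges no particles.
    Assumes the exchange is complete once the first check fires."""
    param = sieve_check_freq        # runtime parameter, left untouched for fixed=True
    check_freq = sieve_check_freq   # gr_ptSieveCheckFreq, persists across steps
    result = []
    for comm in needs_comm:
        # gr_ptMoveSieve: gr_ptSieveFreq = gr_ptSieveCheckFreq
        sieve_freq = param if fixed else check_freq
        # gr_ptResetProcPair: gr_ptSieveCheckFreq = gr_ptSieveFreq + 1
        check_freq = sieve_freq + 1
        if not comm:
            # mustCommunicate is false: gr_ptNextProcPair is never called,
            # so nothing decrements the counter that was just raised.
            result.append(None)
            continue
        times_in_loop = 0
        while True:
            # gr_ptNextProcPair
            times_in_loop += 1
            check_freq -= 1
            if check_freq == 0 or times_in_loop == num_procs:
                check_freq = sieve_freq
                break
        result.append(times_in_loop)
    return result


# Five quiet steps followed by one step that has to move particles, on 4 ranks.
steps = [False] * 5 + [True]
print(first_check_iteration(steps, num_procs=4))              # [None, None, None, None, None, 4]
print(first_check_iteration(steps, num_procs=4, fixed=True))  # [None, None, None, None, None, 2]
```
</pre>
<div>In this model, with the default value of 1, each timestep without particle exchange delays the first check by one more gr_ptNextProcPair call, until the delay is capped at gr_meshNumProcs.</div>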
<div dir="ltr">
<div dir="ltr">
<div>I would appreciate any comments, and I hope the FLASH team can address these issues. Thank you very much for reading this long email.</div>
<div><br>
</div>
<div>Sincerely,</div>
<div>Yi-Hao</div>
<div dir="ltr"><br>
</div>
</div>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div><br>
</div>
<div><br>
</div>
<div>
<div>
<div dir="ltr" class="m_-6504076160181317805gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr"><a href="mailto:ychen@astro.wisc.edu" target="_blank"></a></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>