<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi Yi-Hao,</p>
<p>I'm not sure whether this is helpful, but since it is only hinted
at in the references you cite, I figured I'd say it...</p>
<p>I have, in the past, run into issues like this when pushing
close to the amount of available memory. Paramesh can fail in odd
ways when memory is tight, i.e. when there "should" be enough
memory but only a modest margin. Sometimes just running on more
processors is enough, but sometimes a decrease in maxblocks is
also required. You may have tried this already, but I figured I
would mention it.</p>
<p>It does seem that if the recv is succeeding but the corresponding
ssend is hanging indefinitely, you have confirmed that this is not
a bug in FLASH. I believe that behavior violates the MPI
standard.</p>
<p>As to the question of whether synchronous sends should be used,
I think that is probably a deep discussion about portability and
error reproducibility for paramesh that I'm certainly not
qualified to get into. From what I can tell from the MPI specs,
this is not just about whether a synchronous send is
algorithmically necessary, but about management of message
buffering. Nominally, use of ssend is a request that the system
not buffer the message. This forestalls (possibly vast)
differences between the buffering behavior of different systems.<br>
</p>
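<p>For what it's worth, here is a minimal sketch of that
distinction as I read the spec (my own toy example, not FLASH or
paramesh code); run it on two ranks. With MPI_SEND in place of
MPI_SSEND, the send may complete out of a system buffer before the
receive is even posted:</p>
<pre>
! Minimal sketch (not FLASH code): MPI_SEND may return as soon as
! the library buffers the message; MPI_SSEND blocks until rank 1
! has started the matching receive, so system buffering never
! hides a mismatched or missing receive.
program ssend_demo
  use mpi
  implicit none
  integer :: rank, ierr
  integer :: buf(1024)
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  if (rank == 0) then
     buf = 42
     ! Swap MPI_Ssend for MPI_Send here to see the buffered variant.
     call MPI_Ssend(buf, 1024, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
  else if (rank == 1) then
     call MPI_Recv(buf, 1024, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, &
                   status, ierr)
  end if

  call MPI_Finalize(ierr)
end program ssend_demo
</pre>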
<p>Good luck!</p>
<p>Dean<br>
</p>
<div class="moz-cite-prefix">On 7/16/19 6:34 PM, Yi-Hao Chen wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAO9cUhS5WspBuSdLHdNhUjxKsBGw6+szQ87R0uG_1+QiyNu-aw@mail.gmail.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">
<div>Dear all,</div>
<div><br>
</div>
<div>This is an update on this issue. I have been in conversation
with TACC staff, and one of their concerns is the use of
MPI_SSEND. What would be the motivation for using MPI_SSEND here
as opposed to MPI_SEND?</div>
<div><br>
</div>
<div>I used DDT and can see that only one process is hanging at
the MPI_SSEND, while the rest are waiting at the MPI_ALLREDUCE in
mpi_amr_redist_blk. A screenshot from DDT is attached. I printed
out the corresponding irecv calls and did see that the matching
irecv was executed. Unless there is an additional send with the
same (to, from, block#, tag), I cannot see why the send call
hangs.</div>
<div><br>
</div>
<div>
<div>I've searched the FLASH-USERS mailing list and saw that
similar problems were brought up a few times[^a][^b][^c], but
without a definitive solution. The problem seems to be associated
with Intel MPI. One suggested workaround was to use mvapich2
rather than impi, but when I compiled FLASH with mvapich2, it
hangs right at reading the checkpoint file.<br>
</div>
<div><br>
</div>
<div>[^a]: [FLASH-USERS] mpi_amr_redist_blk has some
processors hang at nrecv waitall</div>
<div><a
href="http://flash.uchicago.edu/pipermail/flash-users/2018-June/002653.html"
target="_blank" moz-do-not-send="true">http://flash.uchicago.edu/pipermail/flash-users/2018-June/002653.html</a></div>
<div>
<div dir="ltr"><br>
</div>
<div dir="ltr">[^b]: [FLASH-USERS] MPI deadlock in block
move after refinement
<div><a
href="http://flash.uchicago.edu/pipermail/flash-users/2017-September/002402.html"
target="_blank" moz-do-not-send="true">http://flash.uchicago.edu/pipermail/flash-users/2017-September/002402.html</a></div>
<div><br>
</div>
</div>
</div>
[^c]: [FLASH-USERS] FLASH crashing: "iteration, no. not
moved"
<div><a
href="http://flash.uchicago.edu/pipermail/flash-users/2017-March/002219.html"
target="_blank" moz-do-not-send="true">http://flash.uchicago.edu/pipermail/flash-users/2017-March/002219.html</a></div>
</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div>
<div>Another thought concerns the interplay of MPI_IRECV and
MPI_ALLREDUCE. The code is structured as:
</div>
<div><br>
</div>
<pre>
MPI_IRECV

while (repeat)
    MPI_SSEND
    MPI_ALLREDUCE

MPI_WAITALL
</pre>
<div><br>
</div>
<div>I am wondering if the MPI_ALLREDUCE call in the receiving
process could prevent the previously posted MPI_IRECV from
receiving the data. I suspect this might be why the MPI_SSEND
hangs. However, if that were the case, the problem would probably
happen more frequently. A standalone sketch of this pattern is
below.</div>
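<div>My understanding of the MPI progress rule is that a posted
MPI_IRECV must still be matched while the receiving rank is
blocked inside MPI_ALLREDUCE, so the MPI_SSEND should eventually
complete. A two-rank reproducer could look like the following (my
own sketch, not paramesh code; the 2-second delay and the message
size are arbitrary choices). If this hangs, it would point at the
MPI library rather than the calling code:</div>
<pre>
! Hypothetical two-rank reproducer of the paramesh pattern above
! (not paramesh code). Rank 1 posts the receive and enters the
! collective; rank 0 then issues the synchronous send while rank 1
! is already blocked in MPI_ALLREDUCE.
program redist_pattern
  use mpi
  implicit none
  integer, parameter :: n = 1000000  ! large enough to defeat eager buffering
  integer :: rank, ierr, req, total
  integer :: status(MPI_STATUS_SIZE)
  integer, allocatable :: buf(:)
  double precision :: t0

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  allocate(buf(n))

  ! Rank 1 posts the receive first, as in mpi_amr_redist_blk ...
  if (rank == 1) then
     call MPI_Irecv(buf, n, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, req, ierr)
  end if

  ! ... rank 0 delays so that rank 1 is already inside
  ! MPI_ALLREDUCE when the synchronous send is issued.
  if (rank == 0) then
     buf = 7
     t0 = MPI_Wtime()
     do while (MPI_Wtime() - t0 &lt; 2.0d0)
     end do
     call MPI_Ssend(buf, n, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
  end if

  total = 1
  call MPI_Allreduce(MPI_IN_PLACE, total, 1, MPI_INTEGER, MPI_SUM, &
                     MPI_COMM_WORLD, ierr)

  if (rank == 1) call MPI_Wait(req, status, ierr)
  if (rank == 0) print *, 'MPI_SSEND completed; sum =', total
  call MPI_Finalize(ierr)
end program redist_pattern
</pre>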
<div><br>
</div>
<div>This part of the code is from Paramesh4dev, where
send_block_data was separated out of the mpi_amr_redist_blk of
Paramesh4.0. Although I did not see a big difference, I am not
sure whether there are any significant changes between Paramesh4.0
and Paramesh4dev.<br>
</div>
<div><br>
</div>
<div>I would appreciate any thoughts you might have.<br>
</div>
<div><br>
</div>
<div>Thank you,</div>
<div>Yi-Hao<br>
</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Jul 11, 2019 at
12:30 PM Yi-Hao Chen <<a
href="mailto:ychen@astro.wisc.edu" target="_blank"
moz-do-not-send="true">ychen@astro.wisc.edu</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div>Dear All,</div>
<div><br>
</div>
<div>I am having an MPI deadlock right after restarting a
simulation. It happens after initialization, in the evolution
stage when refinement occurs. A few of my simulations have run
into the same problem, and it seems to be reproducible. However,
if I use a different number of MPI tasks, the run can sometimes
get past the deadlock.</div>
<div><br>
</div>
<div>I am using FLASH4.5 with AMR and USM on Stampede2, with the
modules intel/18.0.2 and impi/18.0.2.</div>
<div><br>
</div>
<div>If you have any suggestions or possible directions
to look into, please let me know. Some details are
described below.<br>
<div>
<div><br>
</div>
<div>Thank you,</div>
<div>Yi-Hao<br>
</div>
<div><br>
</div>
</div>
</div>
<br>
<div>The last few lines in the log file are<br>
</div>
<div><font size="1"><span style="font-family:courier
new,monospace"> ==============================================================================<br>
[ 07-02-2019 23:07:04.890 ] [gr_initGeometry]
checking BCs for idir: 1<br>
[ 07-02-2019 23:07:04.891 ] [gr_initGeometry]
checking BCs for idir: 2<br>
[ 07-02-2019 23:07:04.892 ] [gr_initGeometry]
checking BCs for idir: 3<br>
[ 07-02-2019 23:07:04.951 ] memory: /proc vsize
(MiB): 2475.21 (min) 2475.73 (max)
2475.21 (avg)<br>
[ 07-02-2019 23:07:04.952 ] memory: /proc rss
(MiB): 686.03 (min) 699.24 (max)
690.59 (avg)<br>
[ 07-02-2019 23:07:04.964 ] [io_readData] file
opened: type=checkpoint
name=Group_L430_hdf5_chk_0148<br>
[ 07-02-2019 23:11:04.268 ] memory: /proc vsize
(MiB): 2869.67 (min) 2928.42 (max)
2869.76 (avg)<br>
[ 07-02-2019 23:11:04.303 ] memory: /proc rss
(MiB): 1080.69 (min) 1102.95 (max)
1085.31 (avg)<br>
[ 07-02-2019 23:11:04.436 ] [GRID
amr_refine_derefine]: initiating refinement<br>
[ 07-02-2019 23:11:04.454 ] [GRID
amr_refine_derefine]: redist. phase. tot blks
requested: 177882<br>
[GRID amr_refine_derefine] min blks 230 max
blks 235 tot blks 177882<br>
[GRID amr_refine_derefine] min leaf blks 199
max leaf blks 205 tot leaf b<br>
lks 155647<br>
[ 07-02-2019 23:11:04.730 ] [GRID
amr_refine_derefine]: refinement complete<br>
INFO: Grid_fillGuardCells is ignoring masking.<br>
[Hydro_init] MHD: hy_fullRiemannStateArrays and
hy_fullSpecMsFluxHandling are b<br>
oth turned on!<br>
[ 07-02-2019 23:11:07.111 ] memory: /proc vsize
(MiB): 2858.31 (min) 2961.38 (max)
2885.72 (avg)<br>
[ 07-02-2019 23:11:07.112 ] memory: /proc rss
(MiB): 1090.02 (min) 1444.63 (max)
1121.47 (avg)<br>
[ 07-02-2019 23:11:08.532 ]
[Particles_getGlobalNum]: Number of particles now:
18389431<br>
[ 07-02-2019 23:11:08.535 ] [IO_writePlotfile]
open: type=plotfile
name=Group_L430_forced_hdf5_plt_cnt_0000<br>
[ 07-02-2019 23:11:18.449 ] [IO_writePlotfile]
close: type=plotfile
name=Group_L430_forced_hdf5_plt_cnt_0000<br>
[ 07-02-2019 23:11:18.450 ] memory: /proc vsize
(MiB): 2857.45 (min) 2977.69 (max)
2885.71 (avg)<br>
[ 07-02-2019 23:11:18.453 ] memory: /proc rss
(MiB): 1095.74 (min) 1450.62 (max)
1126.72 (avg)<br>
[ 07-02-2019 23:11:18.454 ]
[Driver_evolveFlash]: Entering evolution loop<br>
[ 07-02-2019 23:11:18.454 ] step: n=336100
t=7.805991E+14 dt=2.409380E+09<br>
[ 07-02-2019 23:11:23.199 ] [hy_uhd_unsplit]:
gcNeed(MAGI_FACE_VAR,MAG_FACE_VAR) - FACES<br>
[ 07-02-2019 23:11:27.853 ]
[Particles_getGlobalNum]: Number of particles now:
18389491<br>
[ 07-02-2019 23:11:28.830 ] [GRID
amr_refine_derefine]: initiating refinement<br>
[ 07-02-2019 23:11:28.932 ] [GRID
amr_refine_derefine]: redist. phase. tot blks
requested: 172522<br>
</span></font></div>
<div><br>
</div>
<div>The last few lines in the output are</div>
<div><br>
</div>
<div><font size="1"><span style="font-family:courier
new,monospace"> MaterialProperties initialized<br>
[io_readData] Opening Group_L430_hdf5_chk_0148
for restart<br>
Progress read 'gsurr_blks' dataset - applying
pm4dev optimization.<br>
Source terms initialized<br>
iteration, no. not moved = 0
175264<br>
iteration, no. not moved = 1
8384<br>
iteration, no. not moved = 2
0<br>
refined: total leaf blocks = 155647<br>
refined: total blocks = 177882<br>
INFO: Grid_fillGuardCells is ignoring masking.<br>
Finished with Grid_initDomain, restart<br>
Ready to call Hydro_init<br>
[Hydro_init] NOTE: hy_fullRiemannStateArrays and
hy_fullSpecMsFluxHandling are<br>
both true for MHD!<br>
Hydro initialized<br>
Gravity initialized<br>
Initial dt verified<br>
*** Wrote plotfile to
Group_L430_forced_hdf5_plt_cnt_0000 ****<br>
Initial plotfile written<br>
Driver init all done<br>
iteration, no. not moved = 0
165494</span></font></div>
<div><font size="1"><span style="font-family:courier
new,monospace">slurmstepd: error: *** JOB 3889364
ON c476-064 CANCELLED AT 2019-07-02T23:54:57 ***</span></font></div>
<div>
<div><br>
</div>
<div>I have found that the particular MPI call where it hangs is
at line 360 in <span style="font-family:courier new,monospace">send_block_data.F90</span>,
but I am not sure how to debug the problem further.</div>
<div><pre>
359       If (nvar > 0) Then
360          Call MPI_SSEND (unk(1,is_unk,js_unk,ks_unk,lb), &
361                          1,                              &
362                          unk_int_type,                   &
363                          new_loc(2,lb),                  &
364                          new_loc(1,lb),                  &
365                          amr_mpi_meshComm,               &
366                          ierr)
</pre>
</div>
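<div>One way I might debug this further (a sketch of my own, not
paramesh code: send_req, send_status, done, and t0 are new locals,
and the 60-second report interval is arbitrary) is to replace the
blocking MPI_SSEND with MPI_ISSEND plus a polling loop, which
keeps the synchronous-send semantics but reports the stuck
destination, tag, and block instead of hanging silently:</div>
<pre>
! Sketch only: MPI_ISSEND has the same synchronous completion
! semantics as MPI_SSEND, but lets us poll and report.
! New locals needed in send_block_data.F90:
!   Integer :: send_req, send_status(MPI_STATUS_SIZE)
!   Logical :: done
!   Double Precision :: t0
If (nvar > 0) Then
   Call MPI_ISSEND (unk(1,is_unk,js_unk,ks_unk,lb), 1, unk_int_type, &
                    new_loc(2,lb), new_loc(1,lb), amr_mpi_meshComm,  &
                    send_req, ierr)
   t0 = MPI_WTIME()
   done = .False.
   Do While (.Not. done)
      Call MPI_TEST (send_req, done, send_status, ierr)
      If ((.Not. done) .And. (MPI_WTIME() - t0 > 60.0d0)) Then
         Print *, 'ssend stuck: dest=', new_loc(2,lb), &
                  ' tag=', new_loc(1,lb), ' block=', lb
         t0 = MPI_WTIME()   ! report again every 60 s while stuck
      End If
   End Do
End If
</pre>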
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</body>
</html>