<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi Yi-Hao,</p>
    <p>I'm not sure whether this will help, but since it is only hinted
      at in the references you cite, I figured I'd say it outright...</p>
    <p>I have, in the past, seen issues like this when pushing close to
      the amount of available memory.  Paramesh can fail in odd ways when
      memory is tight, i.e. when there "should" be enough memory but only
      a modest margin remains.  Sometimes just running on more processors
      is enough, but sometimes a decrease in maxblocks is also required.
      You may already have tried this, but I figured I would mention it.</p>
    <p>It does seem that if the recv succeeds but the corresponding ssend
      hangs indefinitely, you have confirmed that this is not a bug in
      FLASH.  I believe that behavior violates the MPI specification.</p>
    <p>As to the question of whether synchronous sends should be used, I
      think that is probably a deep discussion about portability and error
      reproducibility for Paramesh that I'm certainly not qualified to get
      into.  From what I can tell from the MPI specs, the issue is not just
      whether a synchronous send is algorithmically necessary, but how
      message buffering is managed.  Nominally, use of ssend is a request
      that the system not buffer the message.  That forestalls (possibly
      vast) differences between the buffering behavior of different
      systems.<br>
    </p>
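    <p>For concreteness, here is a minimal two-rank sketch contrasting the
      two calls (this is not FLASH or Paramesh code; the buffer, tag, and
      program name are invented): MPI_SEND is allowed to return as soon as
      the library has buffered the message, whereas MPI_SSEND returns only
      once the matching receive has started.</p>
    <pre style="font-family:courier new,monospace">
! Hypothetical demo, not FLASH code: run with two MPI ranks.
program ssend_demo
  use mpi
  implicit none
  integer :: rank, ierr, buf(4), status(MPI_STATUS_SIZE)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  if (rank == 0) then
     buf = 42
     ! An MPI_SEND here could complete immediately if the library chooses
     ! to buffer the message; MPI_SSEND blocks until rank 1 starts its
     ! receive, so no buffering decision is left to the implementation.
     call MPI_SSEND(buf, 4, MPI_INTEGER, 1, 99, MPI_COMM_WORLD, ierr)
  else if (rank == 1) then
     call MPI_RECV(buf, 4, MPI_INTEGER, 0, 99, MPI_COMM_WORLD, status, ierr)
  end if

  call MPI_FINALIZE(ierr)
end program ssend_demo
</pre>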
    <p>Good luck!</p>
    <p>Dean<br>
    </p>
    <div class="moz-cite-prefix">On 7/16/19 6:34 PM, Yi-Hao Chen wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAO9cUhS5WspBuSdLHdNhUjxKsBGw6+szQ87R0uG_1+QiyNu-aw@mail.gmail.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">
          <div>Dear all,</div>
          <div><br>
          </div>
          <div>This is an update on this issue. I have been in
            conversation with TACC staff, and one of their concerns is the
            use of MPI_SSEND. What would be the motivation for using
            MPI_SSEND here as opposed to MPI_SEND?</div>
          <div><br>
          </div>
          <div>I used DDT and can see that only one process is hanging at
            the MPI_SSEND while the rest are waiting at MPI_ALLREDUCE in
            mpi_amr_redist_blk. A screenshot from DDT is attached. I
            printed out the corresponding irecv calls and did see that the
            matching irecv was executed. Unless there is an additional send
            with the same (to, from, block#, tag), I cannot see why the
            send call hangs.</div>
          <div><br>
          </div>
          <div>
            <div>I've searched the FLASH-USERS mailing list and saw that
              similar problems were brought up a few times[^a][^b][^c], but
              without a definite solution. The problem seems to be
              associated with Intel MPI. One suggested workaround was to
              use mvapich2 rather than impi, but when I compiled FLASH with
              mvapich2, it hung right at reading the checkpoint file.
              <br>
            </div>
            <div><br>
            </div>
            <div>[^a]: [FLASH-USERS] mpi_amr_redist_blk has some
              processors hang at nrecv waitall</div>
            <div><a
href="http://flash.uchicago.edu/pipermail/flash-users/2018-June/002653.html"
                target="_blank" moz-do-not-send="true">http://flash.uchicago.edu/pipermail/flash-users/2018-June/002653.html</a></div>
            <div>
              <div dir="ltr"><br>
              </div>
              <div dir="ltr">[^b]: [FLASH-USERS] MPI deadlock in block
                move after refinement
                <div><a
href="http://flash.uchicago.edu/pipermail/flash-users/2017-September/002402.html"
                    target="_blank" moz-do-not-send="true">http://flash.uchicago.edu/pipermail/flash-users/2017-September/002402.html</a></div>
                <div><br>
                </div>
              </div>
            </div>
            [^c]: [FLASH-USERS] FLASH crashing: "iteration, no. not
            moved"
            <div><a
href="http://flash.uchicago.edu/pipermail/flash-users/2017-March/002219.html"
                target="_blank" moz-do-not-send="true">http://flash.uchicago.edu/pipermail/flash-users/2017-March/002219.html</a></div>
          </div>
        </div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div>
          <div>Another thought is regarding the use of MPI_IRECV and
            MPI_ALLREDUCE. The code is structured as
            <br>
          </div>
          <div><br>
          </div>
          <div>
<pre style="font-family:courier new,monospace">
MPI_IRECV

while (repeat)
    MPI_SSEND

    MPI_ALLREDUCE

MPI_WAITALL
</pre>
          </div>
          <div><br>
          </div>
          <div>I am wondering whether the MPI_ALLREDUCE call in the
            receiving process could prevent the previously posted MPI_IRECV
            from receiving the data. I suspect this might be why the
            MPI_SSEND hangs. However, if that were the case, the problem
            would probably happen more frequently.</div>
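          <div>For reference, here is a minimal standalone sketch of that
            IRECV / SSEND / ALLREDUCE pattern (not FLASH or Paramesh code;
            the buffer sizes, tag, and program name are invented). Under
            MPI's progress rule, once the receiver has posted the
            MPI_IRECV, the sender's MPI_SSEND should be able to complete
            even while the receiver is blocked inside MPI_ALLREDUCE.</div>
          <pre style="font-family:courier new,monospace">
! Hypothetical sketch, not FLASH code: run with at least two ranks.
! Rank 1 posts a nonblocking receive, then every rank enters MPI_ALLREDUCE;
! rank 0's synchronous send should still match the pending irecv.
program redist_pattern
  use mpi
  implicit none
  integer :: rank, ierr, req, status(MPI_STATUS_SIZE)
  integer :: payload(8), localsum, globalsum

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  req = MPI_REQUEST_NULL
  payload = rank
  localsum = rank

  if (rank == 1) then
     ! Post the receive before the collective, as in the pattern above.
     call MPI_IRECV(payload, 8, MPI_INTEGER, 0, 7, MPI_COMM_WORLD, req, ierr)
  end if
  if (rank == 0) then
     ! Synchronous send: returns only after rank 1's receive is matched.
     call MPI_SSEND(payload, 8, MPI_INTEGER, 1, 7, MPI_COMM_WORLD, ierr)
  end if

  ! All ranks enter the collective; the pending irecv must still progress.
  call MPI_ALLREDUCE(localsum, globalsum, 1, MPI_INTEGER, MPI_SUM, &
                     MPI_COMM_WORLD, ierr)

  call MPI_WAIT(req, status, ierr)   ! no-op on ranks without a request
  call MPI_FINALIZE(ierr)
end program redist_pattern
</pre>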
          <div><br>
          </div>
          <div>This part of the code is from Paramesh4dev. The
            send_block_data routine was separated out of mpi_amr_redist_blk
            in Paramesh4.0. Although I did not see a big difference, I am
            not sure whether there are any significant changes between
            Paramesh4.0 and Paramesh4dev.
            <br>
          </div>
          <div><br>
          </div>
          <div>I would appreciate any thoughts you might have.<br>
          </div>
          <div><br>
          </div>
          <div>Thank you,</div>
          <div>Yi-Hao<br>
          </div>
        </div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Thu, Jul 11, 2019 at
            12:30 PM Yi-Hao Chen <<a
              href="mailto:ychen@astro.wisc.edu" target="_blank"
              moz-do-not-send="true">ychen@astro.wisc.edu</a>> wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <div dir="ltr">
                <div>Dear All,</div>
                <div><br>
                </div>
                <div>I am having an MPI deadlock right after the restart
                  of a simulation. It happens after the initialization, in
                  the evolution stage, when refinement occurs. A few of my
                  simulations have run into the same problem, so it seems
                  to be reproducible. However, if I use a different number
                  of MPI tasks, it can sometimes get past the deadlock.</div>
                <div><br>
                </div>
                <div>I am using FLASH4.5 with AMR and USM on Stampede2
                  using modules intel/18.0.2 and impi/18.0.2.</div>
                <div><br>
                </div>
                <div>If you have any suggestions or possible directions
                  to look into, please let me know.  Some details are
                  described below.<br>
                  <div>
                    <div><br>
                    </div>
                    <div>Thank you,</div>
                    <div>Yi-Hao<br>
                    </div>
                    <div><br>
                    </div>
                  </div>
                </div>
                <br>
                <div>The last few lines in the log file are<br>
                </div>
                <div><font size="1"><span style="font-family:courier
                      new,monospace"> ==============================================================================<br>
                       [ 07-02-2019  23:07:04.890 ] [gr_initGeometry]
                      checking BCs for idir: 1<br>
                       [ 07-02-2019  23:07:04.891 ] [gr_initGeometry]
                      checking BCs for idir: 2<br>
                       [ 07-02-2019  23:07:04.892 ] [gr_initGeometry]
                      checking BCs for idir: 3<br>
                       [ 07-02-2019  23:07:04.951 ] memory: /proc vsize
                         (MiB):     2475.21 (min)       2475.73 (max)  
                          2475.21 (avg)<br>
                       [ 07-02-2019  23:07:04.952 ] memory: /proc rss  
                         (MiB):      686.03 (min)        699.24 (max)  
                           690.59 (avg)<br>
                       [ 07-02-2019  23:07:04.964 ] [io_readData] file
                      opened: type=checkpoint
                      name=Group_L430_hdf5_chk_0148<br>
                       [ 07-02-2019  23:11:04.268 ] memory: /proc vsize
                         (MiB):     2869.67 (min)       2928.42 (max)  
                          2869.76 (avg)<br>
                       [ 07-02-2019  23:11:04.303 ] memory: /proc rss  
                         (MiB):     1080.69 (min)       1102.95 (max)  
                          1085.31 (avg)<br>
                       [ 07-02-2019  23:11:04.436 ] [GRID
                      amr_refine_derefine]: initiating refinement<br>
                       [ 07-02-2019  23:11:04.454 ] [GRID
                      amr_refine_derefine]: redist. phase.  tot blks
                      requested: 177882<br>
                       [GRID amr_refine_derefine] min blks 230    max
                      blks 235    tot blks 177882<br>
                       [GRID amr_refine_derefine] min leaf blks 199  
                       max leaf blks 205    tot leaf b<br>
                       lks 155647<br>
                       [ 07-02-2019  23:11:04.730 ] [GRID
                      amr_refine_derefine]: refinement complete<br>
                       INFO: Grid_fillGuardCells is ignoring masking.<br>
                       [Hydro_init] MHD: hy_fullRiemannStateArrays and
                      hy_fullSpecMsFluxHandling are b<br>
                       oth turned on!<br>
                       [ 07-02-2019  23:11:07.111 ] memory: /proc vsize
                         (MiB):     2858.31 (min)       2961.38 (max)  
                          2885.72 (avg)<br>
                       [ 07-02-2019  23:11:07.112 ] memory: /proc rss  
                         (MiB):     1090.02 (min)       1444.63 (max)  
                          1121.47 (avg)<br>
                       [ 07-02-2019  23:11:08.532 ]
                      [Particles_getGlobalNum]: Number of particles now:
                      18389431<br>
                       [ 07-02-2019  23:11:08.535 ] [IO_writePlotfile]
                      open: type=plotfile
                      name=Group_L430_forced_hdf5_plt_cnt_0000<br>
                       [ 07-02-2019  23:11:18.449 ] [IO_writePlotfile]
                      close: type=plotfile
                      name=Group_L430_forced_hdf5_plt_cnt_0000<br>
                       [ 07-02-2019  23:11:18.450 ] memory: /proc vsize
                         (MiB):     2857.45 (min)       2977.69 (max)  
                          2885.71 (avg)<br>
                       [ 07-02-2019  23:11:18.453 ] memory: /proc rss  
                         (MiB):     1095.74 (min)       1450.62 (max)  
                          1126.72 (avg)<br>
                       [ 07-02-2019  23:11:18.454 ]
                      [Driver_evolveFlash]: Entering evolution loop<br>
                       [ 07-02-2019  23:11:18.454 ] step: n=336100
                      t=7.805991E+14 dt=2.409380E+09<br>
                       [ 07-02-2019  23:11:23.199 ] [hy_uhd_unsplit]:
                      gcNeed(MAGI_FACE_VAR,MAG_FACE_VAR) - FACES<br>
                       [ 07-02-2019  23:11:27.853 ]
                      [Particles_getGlobalNum]: Number of particles now:
                      18389491<br>
                       [ 07-02-2019  23:11:28.830 ] [GRID
                      amr_refine_derefine]: initiating refinement<br>
                       [ 07-02-2019  23:11:28.932 ] [GRID
                      amr_refine_derefine]: redist. phase.  tot blks
                      requested: 172522<br>
                    </span></font></div>
                <div><br>
                </div>
                <div>The last few lines in the output are</div>
                <div><br>
                </div>
                <div><font size="1"><span style="font-family:courier
                      new,monospace"> MaterialProperties initialized<br>
                       [io_readData] Opening Group_L430_hdf5_chk_0148
                      for restart<br>
                          Progress read 'gsurr_blks' dataset - applying
                      pm4dev optimization.<br>
                       Source terms initialized<br>
                        iteration, no. not moved =            0    
                       175264<br>
                        iteration, no. not moved =            1      
                       8384<br>
                        iteration, no. not moved =            2        
                        0<br>
                       refined: total leaf blocks =       155647<br>
                       refined: total blocks =       177882<br>
                       INFO: Grid_fillGuardCells is ignoring masking.<br>
                        Finished with Grid_initDomain, restart<br>
                       Ready to call Hydro_init<br>
                       [Hydro_init] NOTE: hy_fullRiemannStateArrays and
                      hy_fullSpecMsFluxHandling are<br>
                       both true for MHD!<br>
                       Hydro initialized<br>
                       Gravity initialized<br>
                       Initial dt verified<br>
                       *** Wrote plotfile to
                      Group_L430_forced_hdf5_plt_cnt_0000 ****<br>
                       Initial plotfile written<br>
                       Driver init all done<br>
                        iteration, no. not moved =            0    
                       165494</span></font></div>
                <div><font size="1"><span style="font-family:courier
                      new,monospace">slurmstepd: error: *** JOB 3889364
                      ON c476-064 CANCELLED AT 2019-07-02T23:54:57 ***</span></font></div>
                <div>
                  <div><br>
                  </div>
                  <div>I have found that the particular MPI call it hangs
                    on is at line 360 of <span style="font-family:courier
                      new,monospace">send_block_data.F90</span>, but I am
                    not sure how to debug the problem further.<br>
                  </div>
                  <div>
<pre style="font-family:courier new,monospace">
359                 If (nvar > 0) Then
360                 Call MPI_SSEND (unk(1,is_unk,js_unk,ks_unk,lb),        &
361                                 1,                                     &
362                                 unk_int_type,                          &
363                                 new_loc(2,lb),                         &
364                                 new_loc(1,lb),                         &
365                                 amr_mpi_meshComm,                      &
366                                 ierr)
</pre>
                  </div>
                  <div><br>
                  </div>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
  </body>
</html>