[FLASH-USERS] MPI hangs in send_block_data
Yi-Hao Chen
ychen at astro.wisc.edu
Tue Jul 16 19:34:41 EDT 2019
Dear all,
This is an update on this issue. I have been in conversation with TACC staff, and one of their concerns is the use of MPI_SSEND. What would be the motivation for using MPI_SSEND here as opposed to MPI_SEND?
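(For reference, the difference that matters here: MPI_SSEND completes only once the matching receive has started, whereas MPI_SEND may return as soon as the library has taken over the message, which for a small message usually happens eagerly, though that is implementation dependent.) A minimal standalone sketch, not FLASH code, that makes the difference visible with two ranks:
~~~~
! Sketch only: times how long a synchronous send waits for a late receiver.
program ssend_demo
  use mpi
  implicit none
  integer :: rank, ierr, buf
  integer :: status(MPI_STATUS_SIZE)
  double precision :: t0, t1

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  if (rank == 0) then
     buf = 42
     t0 = MPI_WTIME()
     ! MPI_SSEND cannot return until rank 1 has started the matching receive;
     ! replacing it with MPI_SEND typically returns at once for a message
     ! this small (eager protocol, implementation dependent).
     call MPI_SSEND(buf, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
     t1 = MPI_WTIME()
     print *, 'send completed after', t1 - t0, 'seconds'
  else if (rank == 1) then
     ! Delay posting the receive so the difference is visible.
     t0 = MPI_WTIME()
     do while (MPI_WTIME() - t0 < 2.0d0)
     end do
     call MPI_RECV(buf, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, status, ierr)
  end if

  call MPI_FINALIZE(ierr)
end program ssend_demo
~~~~
Run with two ranks, the MPI_SSEND version reports roughly the two-second delay; the MPI_SEND variant normally does not.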
Using DDT I can see that only one process is hanging at the MPI_SSEND, while the rest are waiting at MPI_ALLREDUCE in mpi_amr_redist_blk. A screenshot from DDT is attached. I tried printing the corresponding irecv calls and confirmed that the matching irecv was executed. Unless there is an additional send with the same (to, from, block#, tag), I cannot see why the send call hangs.
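For completeness, the check amounted to printing the (rank, destination, tag, block) tuple on both sides and comparing; a rough sketch of what that looks like (recv_src, recv_tag, and the helper myrank are placeholders, not the actual Paramesh variable names):
~~~~
! Sketch only; recv_src and recv_tag stand for whatever source and tag
! arguments the MPI_IRECV in mpi_amr_redist_blk actually uses.
Call MPI_COMM_RANK (amr_mpi_meshComm, myrank, ierr)

! Immediately before the MPI_SSEND in send_block_data.F90:
Write (*,'(A,4I10)') 'SSEND src dst tag blk:', myrank, new_loc(2,lb), new_loc(1,lb), lb

! Immediately before the matching MPI_IRECV in mpi_amr_redist_blk.F90:
Write (*,'(A,4I10)') 'IRECV dst src tag blk:', myrank, recv_src, recv_tag, lb
~~~~
Sorting both output streams by (destination, tag) then makes any duplicated or unmatched pair easy to spot.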
I've searched the FLASH-USERS mailing list and saw that similar problems have been brought up a few times[^a][^b][^c], but without a definite solution. The problem seems to be associated with Intel MPI. One suggested workaround was to use mvapich2 rather than impi, but when I compiled FLASH with mvapich2, it hangs right at reading the checkpoint file.
[^a]: [FLASH-USERS] mpi_amr_redist_blk has some processors hang at nrecv waitall
http://flash.uchicago.edu/pipermail/flash-users/2018-June/002653.html
[^b]: [FLASH-USERS] MPI deadlock in block move after refinement
http://flash.uchicago.edu/pipermail/flash-users/2017-September/002402.html
[^c]: [FLASH-USERS] FLASH crashing: "iteration, no. not moved"
http://flash.uchicago.edu/pipermail/flash-users/2017-March/002219.html
Another thought concerns the use of MPI_IRECV and MPI_ALLREDUCE. The code is structured as:
~~~~
MPI_IRECV
while (repeat)
    MPI_SSEND
    MPI_ALLREDUCE
MPI_WAITALL
~~~~
I am wondering whether the MPI_ALLREDUCE call in the receiving process could prevent the previously posted MPI_IRECV from receiving the data. I suspect this might be why the MPI_SSEND hangs. However, if that were the case, the problem would probably happen more frequently.
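To check that this ordering is at least legal MPI, here is a minimal, self-contained sketch of the same pattern (not FLASH code). Because the MPI_IRECV is posted before any sends start, the MPI progress rule says the MPI_SSEND should still be able to complete while the receiving rank is busy with its own sends or with MPI_ALLREDUCE:
~~~~
! Sketch only: each rank pre-posts a receive, then ships one small "block"
! to the next rank with a synchronous send, then enters the collective.
program redist_pattern
  use mpi
  implicit none
  integer :: rank, nprocs, peer, ierr, req
  integer :: sendbuf, recvbuf, moved, total_moved
  integer :: status(MPI_STATUS_SIZE)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  peer = mod(rank + 1, nprocs)

  ! Receives are posted before the send loop, as in mpi_amr_redist_blk.
  call MPI_IRECV(recvbuf, 1, MPI_INTEGER, MPI_ANY_SOURCE, 0, &
                 MPI_COMM_WORLD, req, ierr)

  ! Stands in for the MPI_SSEND in send_block_data.
  sendbuf = rank
  call MPI_SSEND(sendbuf, 1, MPI_INTEGER, peer, 0, MPI_COMM_WORLD, ierr)

  ! The collective the other ranks were sitting in according to DDT.
  moved = 1
  call MPI_ALLREDUCE(moved, total_moved, 1, MPI_INTEGER, MPI_SUM, &
                     MPI_COMM_WORLD, ierr)

  call MPI_WAIT(req, status, ierr)
  print *, 'rank', rank, 'received', recvbuf, 'total moved', total_moved

  call MPI_FINALIZE(ierr)
end program redist_pattern
~~~~
A toy like this only moves a few bytes, though, so it probably does not exercise the same rendezvous path as a full unk block does under Intel MPI.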
This part of the code is from Paramesh4dev; send_block_data was separated out of mpi_amr_redist_blk, which contained it in Paramesh4.0. Although I did not see a big difference, I am not sure whether there are any significant changes between Paramesh4.0 and Paramesh4dev.
I would appreciate any thoughts you might have.
Thank you,
Yi-Hao
On Thu, Jul 11, 2019 at 12:30 PM Yi-Hao Chen <ychen at astro.wisc.edu> wrote:
Dear All,
I am having an MPI deadlock right after restarting a simulation. It happens after initialization, in the evolution stage, when refinement occurs. A few of my simulations ran into the same problem, and it appears to be reproducible. However, with a different number of MPI tasks it sometimes gets past the deadlock.
I am using FLASH4.5 with AMR and USM on Stampede2, with the modules intel/18.0.2 and impi/18.0.2.
If you have any suggestions or possible directions to look into, please let me know. Some details are described below.
Thank you,
Yi-Hao
The last few lines in the log file are
==============================================================================
[ 07-02-2019 23:07:04.890 ] [gr_initGeometry] checking BCs for idir: 1
[ 07-02-2019 23:07:04.891 ] [gr_initGeometry] checking BCs for idir: 2
[ 07-02-2019 23:07:04.892 ] [gr_initGeometry] checking BCs for idir: 3
[ 07-02-2019 23:07:04.951 ] memory: /proc vsize (MiB): 2475.21 (min) 2475.73 (max) 2475.21 (avg)
[ 07-02-2019 23:07:04.952 ] memory: /proc rss (MiB): 686.03 (min) 699.24 (max) 690.59 (avg)
[ 07-02-2019 23:07:04.964 ] [io_readData] file opened: type=checkpoint name=Group_L430_hdf5_chk_0148
[ 07-02-2019 23:11:04.268 ] memory: /proc vsize (MiB): 2869.67 (min) 2928.42 (max) 2869.76 (avg)
[ 07-02-2019 23:11:04.303 ] memory: /proc rss (MiB): 1080.69 (min) 1102.95 (max) 1085.31 (avg)
[ 07-02-2019 23:11:04.436 ] [GRID amr_refine_derefine]: initiating refinement
[ 07-02-2019 23:11:04.454 ] [GRID amr_refine_derefine]: redist. phase. tot blks requested: 177882
[GRID amr_refine_derefine] min blks 230 max blks 235 tot blks 177882
[GRID amr_refine_derefine] min leaf blks 199 max leaf blks 205 tot leaf blks 155647
[ 07-02-2019 23:11:04.730 ] [GRID amr_refine_derefine]: refinement complete
INFO: Grid_fillGuardCells is ignoring masking.
[Hydro_init] MHD: hy_fullRiemannStateArrays and hy_fullSpecMsFluxHandling are both turned on!
[ 07-02-2019 23:11:07.111 ] memory: /proc vsize (MiB): 2858.31 (min) 2961.38 (max) 2885.72 (avg)
[ 07-02-2019 23:11:07.112 ] memory: /proc rss (MiB): 1090.02 (min) 1444.63 (max) 1121.47 (avg)
[ 07-02-2019 23:11:08.532 ] [Particles_getGlobalNum]: Number of particles now: 18389431
[ 07-02-2019 23:11:08.535 ] [IO_writePlotfile] open: type=plotfile name=Group_L430_forced_hdf5_plt_cnt_0000
[ 07-02-2019 23:11:18.449 ] [IO_writePlotfile] close: type=plotfile name=Group_L430_forced_hdf5_plt_cnt_0000
[ 07-02-2019 23:11:18.450 ] memory: /proc vsize (MiB): 2857.45 (min) 2977.69 (max) 2885.71 (avg)
[ 07-02-2019 23:11:18.453 ] memory: /proc rss (MiB): 1095.74 (min) 1450.62 (max) 1126.72 (avg)
[ 07-02-2019 23:11:18.454 ] [Driver_evolveFlash]: Entering evolution loop
[ 07-02-2019 23:11:18.454 ] step: n=336100 t=7.805991E+14 dt=2.409380E+09
[ 07-02-2019 23:11:23.199 ] [hy_uhd_unsplit]: gcNeed(MAGI_FACE_VAR,MAG_FACE_VAR) - FACES
[ 07-02-2019 23:11:27.853 ] [Particles_getGlobalNum]: Number of particles now: 18389491
[ 07-02-2019 23:11:28.830 ] [GRID amr_refine_derefine]: initiating refinement
[ 07-02-2019 23:11:28.932 ] [GRID amr_refine_derefine]: redist. phase. tot blks requested: 172522
The last few lines in the output are
MaterialProperties initialized
[io_readData] Opening Group_L430_hdf5_chk_0148 for restart
Progress read 'gsurr_blks' dataset - applying pm4dev optimization.
Source terms initialized
iteration, no. not moved = 0 175264
iteration, no. not moved = 1 8384
iteration, no. not moved = 2 0
refined: total leaf blocks = 155647
refined: total blocks = 177882
INFO: Grid_fillGuardCells is ignoring masking.
Finished with Grid_initDomain, restart
Ready to call Hydro_init
[Hydro_init] NOTE: hy_fullRiemannStateArrays and hy_fullSpecMsFluxHandling are both true for MHD!
Hydro initialized
Gravity initialized
Initial dt verified
*** Wrote plotfile to Group_L430_forced_hdf5_plt_cnt_0000 ****
Initial plotfile written
Driver init all done
iteration, no. not moved = 0 165494
slurmstepd: error: *** JOB 3889364 ON c476-064 CANCELLED AT 2019-07-02T23:54:57 ***
I have found that the particular MPI call where it hangs is at line 360 of send_block_data.F90, but I am not sure how to debug the problem further.
~~~~
359       If (nvar > 0) Then
360          Call MPI_SSEND (unk(1,is_unk,js_unk,ks_unk,lb), &
361                          1,                              &
362                          unk_int_type,                   &
363                          new_loc(2,lb),                  &
364                          new_loc(1,lb),                  &
365                          amr_mpi_meshComm,               &
366                          ierr)
~~~~
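One experiment I am considering (only a sketch assuming the surrounding declarations in send_block_data.F90; send_req and send_status would be new local variables) is to replace the synchronous send with a standard-mode non-blocking send followed by an immediate wait. The message and its (destination, tag) stay the same; only the requirement that the receive has already started is dropped:
~~~~
      If (nvar > 0) Then
         ! Hypothetical change, not the shipped Paramesh code: send_req is an
         ! Integer and send_status an Integer array of size MPI_STATUS_SIZE.
         Call MPI_ISEND (unk(1,is_unk,js_unk,ks_unk,lb), &
                         1,                              &
                         unk_int_type,                   &
                         new_loc(2,lb),                  &
                         new_loc(1,lb),                  &
                         amr_mpi_meshComm,               &
                         send_req,                       &
                         ierr)
         Call MPI_WAIT (send_req, send_status, ierr)
      End If
~~~~
If the hang disappears with this change, that would point toward the synchronous rendezvous under Intel MPI; if it persists, an unmatched or duplicated message looks more likely.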
Attachment: Screenshot_20190715_184204.png (image/png) <http://flash.rochester.edu/pipermail/flash-users/attachments/20190716/cebf9b84/attachment-0001.png>