[FLASH-USERS] MPI deadlock in amr_refine_derefine

Klaus Weide klaus at flash.uchicago.edu
Mon Feb 25 18:08:33 EST 2019


On Sun, 24 Feb 2019, Vishal Tiwari wrote:

> Hello,
> 
> I am facing issues with my simulations on stampede2: the run gets stuck in the refinement part of the code. The code keeps refining as long as the number of blocks requested is smaller than the number of tasks, but hangs when the number of blocks > ntasks. Looking at the trace of the code using ddt suggests that there is an MPI deadlock (see the attached figure).
> 
> This issue occurs only on stampede2; the code refined fine on stampede1 and works fine on a local cluster on my campus.
> 
> Further, I found that people were facing the exact same issue in this thread [1]<http://flash.uchicago.edu/pipermail/flash-users/2017-September/002402.html>, but the thread wasn't concluded with a solution.
> 
> I would be grateful for any pointers with regards to this issue.

Vishal,

You did not say which version of FLASH you are using. It does not seem to
be the latest, since according to your stack trace, there should be a
WAITALL call on line 720 of mpi_amr_redist_blk.F90. This is the case in
 
Grid/GridMain/paramesh/paramesh4/Paramesh4dev/PM4_package/mpi_source/mpi_amr_redist_blk.F90

of the FLASH 4.4 release code, but not in the same file from the FLASH 
4.5 release. So there have been code changes in a file that plays an 
important role in your stack trace. You should check whether you get the
same problem with the most recent release, FLASH 4.5.
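
One quick way to see which variant of the file a given source tree
contains is to list the lines where it mentions MPI_WAITALL and compare
them with line 720 from the stack trace. The following is only a minimal
Python sketch and not part of FLASH; the function name
find_waitall_lines and the script name are made up for illustration, and
the match is a plain text search, so it will also report comment lines.

```python
import re
import sys
from pathlib import Path


def find_waitall_lines(source_text):
    """Return (line_number, stripped_line) pairs for lines mentioning MPI_WAITALL.

    Hypothetical helper: a case-insensitive text search, not a Fortran parser.
    """
    pattern = re.compile(r"mpi_waitall", re.IGNORECASE)
    return [
        (number, line.strip())
        for number, line in enumerate(source_text.splitlines(), start=1)
        if pattern.search(line)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name is an assumption):
    #   python find_waitall.py path/to/mpi_amr_redist_blk.F90
    text = Path(sys.argv[1]).read_text(errors="replace")
    for number, line in find_waitall_lines(text):
        print(f"{number}: {line}")
```

If no reported line number is 720, the source tree likely does not match
the one that produced the stack trace.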

Klaus



