[FLASH-USERS] MPI deadlock in amr_refine_derefine

Ryan Farber rjfarber at umich.edu
Mon Feb 25 14:57:22 EST 2019


Hi Vishal,

I've had a similar issue when I haven't allocated enough memory to a job on
Stampede2. As a test, you can try requesting one additional node.
Note that reducing maxblocks (as in the thread you linked) also reduces the
memory each process requires.
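
A rough sketch of why maxblocks matters, assuming the main solution array on
each MPI rank is dimensioned as (number of variables) x (block size padded
with guard cells) x maxblocks in double precision. The function name and all
numbers below are hypothetical placeholders, not values from this run, and the
real footprint is larger because work arrays, flux arrays, and MPI buffers are
not counted here.

```python
def unk_bytes_per_rank(nvar, nxb, nyb, nzb, nguard, maxblocks,
                       ndim=3, bytes_per_real=8):
    """Approximate size in bytes of the main solution array on one MPI rank.

    Assumes each block stores nvar variables on a grid of interior cells
    (nxb, nyb, nzb) padded by nguard guard cells on both sides of every
    active dimension, and that storage is reserved for maxblocks blocks.
    """
    nx = nxb + 2 * nguard
    ny = nyb + 2 * nguard if ndim >= 2 else 1
    nz = nzb + 2 * nguard if ndim == 3 else 1
    return nvar * nx * ny * nz * maxblocks * bytes_per_real


# Hypothetical setup: 20 variables, 8^3 blocks, 4 guard cells, maxblocks=200.
per_rank = unk_bytes_per_rank(nvar=20, nxb=8, nyb=8, nzb=8,
                              nguard=4, maxblocks=200)
print(f"solution array per rank: {per_rank / 2**20:.1f} MiB")

# Halving maxblocks halves this estimate, which is why it lowers memory use.
half = unk_bytes_per_rank(nvar=20, nxb=8, nyb=8, nzb=8,
                          nguard=4, maxblocks=100)
print(f"with maxblocks=100:      {half / 2**20:.1f} MiB")
```

Multiplying the per-rank figure by the number of ranks placed on each node
gives a lower bound to compare against the memory available on that node type.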

If you're still having trouble, could you attach your logfile and mention
how many nodes you're using, how many processors (if not the 256 suggested
by your DDT attachment), and which node type (SKX or KNL)?

Best,
--------
Ryan


On Sun, Feb 24, 2019 at 12:27 PM Vishal Tiwari <vtiwari at umassd.edu> wrote:

> Hello,
>
> I am having trouble with my simulations on Stampede2: they get stuck in
> the refinement part of the code. The code refines normally as long as the
> number of blocks requested is smaller than the number of tasks, but hangs
> when the number of blocks > ntasks. A trace of the code in DDT suggests
> that there is an MPI deadlock (see the attached figure).
>
> This issue occurs only on Stampede2: the same code refined fine on
> Stampede1 and works fine on a local cluster on my campus.
>
> Further, I found that people reported the exact same issue in this
> thread [1], but the thread ended without a solution.
>
> I would be grateful for any pointers regarding this issue.
>
> Thank you!
>
> [1]
> http://flash.uchicago.edu/pipermail/flash-users/2017-September/002402.html
>
> Regards,
> Vishal Tiwari
> Graduate Student, Physics
> University of Massachusetts, Dartmouth
>