[FLASH-USERS] MPI deadlock in block move after refinement

Jeremy S Ritter jritter at mail.utexas.edu
Wed Aug 9 16:55:08 EDT 2017


Hello,

I have been experiencing a problem with my simulations hanging at
seemingly random points while processing a refinement step (via
gr_updateRefinement). The issue seems to be related to a mismatch between
the processors sending and receiving block data in mpi_amr_redist_blk.F90.
Stack traces show a single processor still inside a call to
send_block_data() while all the others have moved on to the subsequent
MPI_ALLREDUCE() call (see excerpt below). The deadlock is repeatable: if it
happens at step 3, for example, it keeps happening at the same point unless
I change the grid structure by refining more or less. Some of my own
routines modify the logical arrays used for marking blocks, as in
refine(blockID) = .true., but I am not attempting to modify the grid
through any other means. Is it possible there is a problem with my MPI
setup? I am using FLASH4.4 on the new Stampede2 at TACC with 8 nodes of 48
processors each, 384 processors in total.
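
The failure mode described above can be modeled in a few lines: each rank
decides on its own how many blocks it will send to each peer, and each
receiver decides on its own how many blocks it expects. The sketch below
assumes a hypothetical bookkeeping of per-pair block counts; the names
send_plan, recv_plan and unmatched_transfers are illustrative only and are
not PARAMESH or FLASH routines. It flags every rank pair where the two
sides disagree.

```python
def unmatched_transfers(send_plan, recv_plan):
    """Return (src, dst, planned_sends, expected_recvs) for each disagreeing pair.

    send_plan[src] maps dst -> number of blocks src intends to send to dst.
    recv_plan[dst] maps src -> number of blocks dst expects from src.
    Both dictionaries are a hypothetical model, not PARAMESH data structures.
    """
    # Collect every (src, dst) pair mentioned by either side.
    pairs = set()
    for src, dests in send_plan.items():
        pairs.update((src, dst) for dst in dests)
    for dst, srcs in recv_plan.items():
        pairs.update((src, dst) for src in srcs)

    mismatches = []
    for src, dst in sorted(pairs):
        sent = send_plan.get(src, {}).get(dst, 0)
        expected = recv_plan.get(dst, {}).get(src, 0)
        if sent != expected:
            mismatches.append((src, dst, sent, expected))
    return mismatches


# Rank 0 plans two sends to rank 1, but rank 1 expects only one from rank 0.
print(unmatched_transfers({0: {1: 2}, 2: {1: 1}}, {1: {0: 1, 2: 1}}))
# prints [(0, 1, 2, 1)]
```

In the example, rank 0's second send to rank 1 has no matching receive. If
that is a blocking send that cannot complete without a matching receive
(the MPI standard allows MPI_Send to block until then, and large block
messages usually do), rank 0 stays in the send while every other rank
reaches the next collective, which is the pattern in the traces. Whether
mpi_amr_redist_blk.F90 actually builds its send and receive lists this way
is an assumption of the sketch.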

Thanks!
-Jeremy

flash4             000000000079E2D6  send_block_data_          269  send_block_data.F90
flash4             000000000070C3E4  amr_redist_blk_           674  mpi_amr_redist_blk.F90
flash4             00000000004FE560  amr_morton_order_         164  amr_morton_order.F90
flash4             000000000071A7DA  amr_refine_derefi         319  mpi_amr_refine_derefine.F90
flash4             00000000005D8B22  gr_updaterefineme         112  gr_updateRefinement.F90
flash4             0000000000457A10  grid_updaterefine          98  Grid_updateRefinement.F90
flash4             0000000000413F74  driver_evolveflas         390  Driver_evolveFlash.F90
flash4             000000000041DBB3  MAIN__                     51  Flash.F90

flash4             000000000070C50F  amr_redist_blk_           686  mpi_amr_redist_blk.F90
flash4             00000000004FE560  amr_morton_order_         164  amr_morton_order.F90
flash4             000000000071A7DA  amr_refine_derefi         319  mpi_amr_refine_derefine.F90
flash4             00000000005D8B22  gr_updaterefineme         112  gr_updateRefinement.F90
flash4             0000000000457A10  grid_updaterefine          98  Grid_updateRefinement.F90
flash4             0000000000413F74  driver_evolveflas         390  Driver_evolveFlash.F90
flash4             000000000041DBB3  MAIN__                     51  Flash.F90
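
With 384 ranks, spotting the one rank whose traceback differs is easier to
automate. The sketch below assumes one traceback per rank, collected
separately, in the Intel column layout used above (Image, PC, Routine,
Line, Source). The helpers parse_frames and odd_ranks_out and the rank
numbers in the example are hypothetical. It groups ranks by their innermost
frame and reports the ones that disagree with the majority.

```python
from collections import Counter


def parse_frames(text):
    """Split an Intel-style traceback into (routine, line, source) frames.

    Each frame is assumed to be five whitespace-separated fields:
    Image, PC, Routine, Line, Source. Line wrapping does not matter
    because the whole text is tokenized at once.
    """
    tokens = text.split()
    if len(tokens) % 5 != 0:
        raise ValueError("traceback does not split into 5-field frames")
    return [
        (tokens[i + 2], int(tokens[i + 3]), tokens[i + 4])
        for i in range(0, len(tokens), 5)
    ]


def odd_ranks_out(traces):
    """Return the ranks whose innermost frame differs from the most common one."""
    top = {rank: parse_frames(text)[0] for rank, text in traces.items()}
    majority, _ = Counter(top.values()).most_common(1)[0]
    return sorted(rank for rank, frame in top.items() if frame != majority)


# First two frames of each trace from this post; rank numbers are made up.
stuck = """flash4 000000000079E2D6 send_block_data_ 269 send_block_data.F90
flash4 000000000070C3E4 amr_redist_blk_ 674 mpi_amr_redist_blk.F90"""
waiting = """flash4 000000000070C50F amr_redist_blk_ 686 mpi_amr_redist_blk.F90
flash4 00000000004FE560 amr_morton_order_ 164 amr_morton_order.F90"""
print(odd_ranks_out({0: stuck, 1: waiting, 2: waiting, 3: waiting}))
# prints [0]
```

Keying on the innermost frame only is a deliberate simplification: it is
enough to separate the rank still in send_block_data() from those waiting
further along in amr_redist_blk.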