[FLASH-USERS] MPI deadlock in block move after refinement
Jeremy S Ritter
jritter at mail.utexas.edu
Wed Aug 9 16:55:08 EDT 2017
Hello,
I have been experiencing a problem with my simulations hanging at seemingly
random times while processing a refinement step (via gr_updateRefinement). The
issue seems to be related to a mismatch between the processors sending and
receiving block data in mpi_amr_redist_blk.F90. A stack trace shows that
there is a single processor still making a call to send_block_data() while
the rest have moved on to the subsequent MPI_ALLREDUCE() call (see excerpt
below). The deadlock is reproducible: e.g., if it happens at step 3, it
keeps happening at that same step unless I change the grid structure by
refining more or less. I have some routines of my own that modify the
logical structures for marking blocks, as in refine(blockID) = .true., but
I am not attempting to modify the grid by any other means. Is it possible
that there is a problem with my MPI setup? I am using FLASH4.4 on the new
Stampede2 at TACC with 8 nodes of 48 processors each, for 384 total
processors.
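
To illustrate the kind of send/receive mismatch described above, here is a minimal sketch in plain Python (no MPI required). It is not taken from FLASH or PARAMESH: the function name unmatched_transfers and the (source, destination, tag) tuples are hypothetical, and the example plan is made up. It only shows how a single send with no matching receive leaves one rank waiting while every other rank is free to move on to the next collective call. Whether a standard-mode MPI send actually blocks in this situation depends on the MPI implementation and the message size.

```python
from collections import Counter


def unmatched_transfers(sends, recvs):
    """Return (sends with no matching receive, receives with no matching send).

    Both arguments are iterables of (source_rank, dest_rank, tag) tuples.
    This layout is an assumption made for illustration only.
    """
    send_counts = Counter(sends)
    recv_counts = Counter(recvs)
    # Counter subtraction keeps only the positive differences.
    orphan_sends = sorted((send_counts - recv_counts).elements())
    orphan_recvs = sorted((recv_counts - send_counts).elements())
    return orphan_sends, orphan_recvs


# Hypothetical plan: rank 2 intends to ship a block (tag 7) to rank 5,
# but rank 5 never posts the corresponding receive.
sends = [(0, 1, 3), (2, 5, 7)]
recvs = [(0, 1, 3)]

orphan_sends, orphan_recvs = unmatched_transfers(sends, recvs)
print("sends with no receive:", orphan_sends)
print("receives with no send:", orphan_recvs)
```

In a real run, the equivalent information would have to come from each rank logging the transfers it intends to post before the redistribution step, so that the two lists can be compared offline.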
Thanks!
-Jeremy

Trace from the processor still in send_block_data():

flash4  000000000079E2D6  send_block_data_    269  send_block_data.F90
flash4  000000000070C3E4  amr_redist_blk_     674  mpi_amr_redist_blk.F90
flash4  00000000004FE560  amr_morton_order_   164  amr_morton_order.F90
flash4  000000000071A7DA  amr_refine_derefi   319  mpi_amr_refine_derefine.F90
flash4  00000000005D8B22  gr_updaterefineme   112  gr_updateRefinement.F90
flash4  0000000000457A10  grid_updaterefine    98  Grid_updateRefinement.F90
flash4  0000000000413F74  driver_evolveflas   390  Driver_evolveFlash.F90
flash4  000000000041DBB3  MAIN__               51  Flash.F90

Trace from the other processors:

flash4  000000000070C50F  amr_redist_blk_     686  mpi_amr_redist_blk.F90
flash4  00000000004FE560  amr_morton_order_   164  amr_morton_order.F90
flash4  000000000071A7DA  amr_refine_derefi   319  mpi_amr_refine_derefine.F90
flash4  00000000005D8B22  gr_updaterefineme   112  gr_updateRefinement.F90
flash4  0000000000457A10  grid_updaterefine    98  Grid_updateRefinement.F90
flash4  0000000000413F74  driver_evolveflas   390  Driver_evolveFlash.F90
flash4  000000000041DBB3  MAIN__               51  Flash.F90