[FLASH-USERS] FLASH crashing: "iteration, no. not moved"

Klaus Weide klaus at flash.uchicago.edu
Fri Mar 24 14:51:22 EDT 2017


Dominik Derigs wrote:

> 
>    45251 8.5418E-01 5.7059E-06  ( 7.080E-02,  8.350E-02,  0.000E+00) |  5.706E-06
>   iteration, no. not moved =            0        4629
>   iteration, no. not moved =            1           3
>   iteration, no. not moved =            2           1
>   iteration, no. not moved =            3           1
>   iteration, no. not moved =            4           1
> [...]
>   iteration, no. not moved =           98           1
>   iteration, no. not moved =           99           1
>   iteration, no. not moved =          100           1
>   ERROR: could not move all blocks in amr_redist_blk
>   Try increasing maxblocks or use more processors
>   nm2_old, nm2 =            1           1
>   ABORTING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


It turns out that this was a timing issue. Apparently, on some systems, 
some asynchronous MPI messages can occasionally get delayed so much that 
the block redistribution algorithm of PARAMESH fails to receive them 
within 100 or so iterations of its main loop.

The attached small patch, to be applied to source file
mpi_amr_redist_blk.F90 in
Grid/GridMain/paramesh/paramesh4/Paramesh4dev/PM4_package/mpi_source,
aims to avoid the problem by adding an extra 1-second delay to each
iteration once the iteration counter (nit) reaches 50. This appears to
solve the problem for Dominik.

The patch does nothing to address the question of why some messages are slow.

If you want to use this patch, you may want to do some fine-tuning,
perhaps of the delay, or of the iteration at which it kicks in.
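
As a rough illustration of that fine-tuning, here is a minimal standalone
sketch (not the PARAMESH code itself) that pulls the two hard-coded numbers
out into named constants. The names max_iterations, delay_start_nit,
delay_seconds and arrival_nit are made up for this example, the loop limit
of 100 is only inferred from the log above, and the "message arrives" test
is a stand-in for the real MPI logic. Note that sleep is a GNU extension
(it works with gfortran); other compilers may need a vendor-specific module
or a different routine.

```fortran
program redist_delay_sketch
  implicit none

  ! Hypothetical tunables; the attached patch hard-codes 50 and 1.
  integer, parameter :: max_iterations  = 100  ! assumed limit, inferred from the log
  integer, parameter :: delay_start_nit = 50   ! iteration at which delays begin
  integer, parameter :: delay_seconds   = 1    ! length of each delay
  ! Simulation only: pretend the late message shows up at this iteration.
  integer, parameter :: arrival_nit     = 52

  integer :: nit   ! iteration counter, as in the patch
  integer :: nm2   ! number of blocks not yet moved, as in the log

  nit = 0
  nm2 = 1
  do while (nm2 > 0 .and. nit <= max_iterations)
     print *, ' iteration, no. not moved = ', nit, nm2

     ! Same idea as the patch: slow the loop down once it has been
     ! spinning for a while, to give delayed messages time to arrive.
     if (nit >= delay_start_nit) then
        print *, '* sleeping in redist_delay_sketch'
        call sleep(delay_seconds)   ! GNU extension, not standard Fortran
     end if

     nit = nit + 1

     ! Stand-in for the real check on outstanding block transfers.
     if (nit >= arrival_nit) nm2 = 0
  end do

  if (nm2 > 0) then
     print *, 'ERROR: could not move all blocks'
  else
     print *, 'all blocks moved after ', nit, ' iterations'
  end if
end program redist_delay_sketch
```

With the values shown, the sketch sleeps twice (at nit = 50 and 51) before
the simulated message arrives. Lowering delay_start_nit makes the delays
start earlier at the cost of slower regridding whenever the loop runs long;
raising delay_seconds gives each late message more time per iteration.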

I am considering including some modification of this in future FLASH
releases.

Klaus
-------------- next part --------------
Index: source/Grid/GridMain/paramesh/paramesh4/Paramesh4dev/PM4_package/mpi_source/mpi_amr_redist_blk.F90
===================================================================
--- source/Grid/GridMain/paramesh/paramesh4/Paramesh4dev/PM4_package/mpi_source/mpi_amr_redist_blk.F90	(revision 25652)
+++ source/Grid/GridMain/paramesh/paramesh4/Paramesh4dev/PM4_package/mpi_source/mpi_amr_redist_blk.F90	(working copy)
@@ -701,6 +701,10 @@
          If (mype == 0) Then
             Print *,' iteration, no. not moved = ',nit,nm2
          End If
+         If (nit .ge. 50) Then
+            Print *,'* sleeping for one second in mpi_amr_redist_blk'
+            Call sleep(1)
+         End If
          
          nit = nit + 1
          

