[FLASH-USERS] FLASH crashing: "iteration, no. not moved"

Klaus Weide klaus at flash.uchicago.edu
Tue Mar 7 22:57:45 EST 2017


On Tue, 7 Mar 2017, Dominik Derigs wrote:

> Dear FLASH users,
> 
> I'm seeing a problem for quite some time on our local cluster which I don't
> seem able to get rid of. Whenever I run a sufficiently large simulation, it
> will fail sooner or later while paramesh writes this to the output:
> 
>    45251 8.5418E-01 5.7059E-06  ( 7.080E-02,  8.350E-02,  0.000E+00) |  5.706E-06
>   iteration, no. not moved =            0        4629
>   iteration, no. not moved =            1           3
>   iteration, no. not moved =            2           1
>   iteration, no. not moved =            3           1
>   iteration, no. not moved =            4           1
> [...]
>   iteration, no. not moved =           98           1
>   iteration, no. not moved =           99           1
>   iteration, no. not moved =          100           1
>   ERROR: could not move all blocks in amr_redist_blk
>   Try increasing maxblocks or use more processors
>   nm2_old, nm2 =            1           1
>   ABORTING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Hi Dominik,

I have not encountered this problem myself.

Please provide some more info about your setup.
Some particular questions:

Are you using face variables?

Does the same happen if you 
 - use the older Paramesh implementation (setup with +pm40) ?
 - or, alternatively, set the following runtime parameters to .false. ?
     
      use_flash_surr_blks_fill
      use_reduced_orrery
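
For the second option, the entries in flash.par would look roughly like
this (a minimal sketch, assuming both parameters are available in your
setup; the comment line is only illustrative):

      # disable the two optimizations for testing
      use_flash_surr_blks_fill = .false.
      use_reduced_orrery       = .false.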

Is Paramesh trying to increase or to decrease the number of blocks when 
this happens? The log file may show this information, perhaps in a 
message like this: 
  ... [GRID amr_refine_derefine]: redist. phase.  tot blks requested: 453

Have you made any unusual changes to Paramesh?
In particular, is it still true that
   
        maxblocks_tr = 10*maxblocks

(as per amr_initialize.F90) ?


> Do you know how to prevent this error from happening or - if not - if it is
> safe to remove the corresponding MPI_ABORT entirely and just work with one
> block not being shifted around correctly?

I do not think this would be safe. Meta-information has already been 
moved and modified at this point, under the assumption that movement of 
the contents of all blocks succeeds. It would be inconsistent if that last
block then did not actually get moved.

Klaus



