[FLASH-USERS] FLASH crashing: "iteration, no. not moved"

Dominik Derigs derigs at ph1.uni-koeln.de
Wed Mar 8 06:25:00 EST 2017


Hi Klaus,

Thank you for your messages.

Some answers to the questions you asked:

> Are you using face variables?

No.

> Does the same happen if you
>  - use the older Paramesh implementation (setup with +pm40) ?
>  - or, alternatively, set the following runtime parameters to .false. ?
>       use_flash_surr_blks_fill
>       use_reduced_orrery

I'll have to try this.

> Is Paramesh trying to increase or to decrease the number of blocks when
> this happens?

It tried to increase by 4 blocks. These are the most recent messages:
 [ 03-07-2017  01:59:00.137 ] [GRID amr_refine_derefine]: redist. phase.  tot blks requested: 26466
 [...]
 [ 03-07-2017  01:59:51.245 ] [GRID amr_refine_derefine]: redist. phase.  tot blks requested: 26470
 (end of log file)
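
For anyone who wants to pull the same number out of their own log, here is a minimal sketch in Python. It is not part of FLASH; the regular expression is an assumption based only on the two log lines quoted above, and the helper names are made up for illustration.

```python
import re

# Assumed pattern, derived from the "tot blks requested" lines quoted above.
PATTERN = re.compile(r"tot blks requested:\s*(\d+)")

def block_requests(log_text):
    """Return every requested block total, in the order it appears."""
    return [int(m.group(1)) for m in PATTERN.finditer(log_text)]

def last_change(log_text):
    """Return the difference between the last two totals, or None if
    the log holds fewer than two such messages."""
    totals = block_requests(log_text)
    if len(totals) < 2:
        return None
    return totals[-1] - totals[-2]

# Sample containing only the two messages shown in this mail.
sample = """\
 [ 03-07-2017  01:59:00.137 ] [GRID amr_refine_derefine]: redist. phase.  tot blks requested: 26466
 [ 03-07-2017  01:59:51.245 ] [GRID amr_refine_derefine]: redist. phase.  tot blks requested: 26470
"""

print(last_change(sample))
```

A positive result means the last regridding step asked for more blocks, a negative one for fewer.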

> Have you made any unusual changes to Paramesh?

No.

> In particular, is it still true that
>         maxblocks_tr = 10*maxblocks
> (as per amr_initialize.F90) ?

Yes.

> Meta-information has already been
> moved and modified at this point, under the assumption that movement of
> the contents of all blocks succeeds. It would be inconsistent if that last
> block then does not actually get moved.

I was afraid that this might have been the case.

I added the debug code you suggested in your follow-up mail and queued a
new simulation.
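
To spot the stall early in the output of a run, something like the following Python sketch could be used. It is not part of FLASH or Paramesh; the line format is assumed from the "iteration, no. not moved" output quoted further down, and the function names and the `patience` threshold are hypothetical.

```python
import re

# Assumed format of the lines printed during block redistribution,
# e.g. "  iteration, no. not moved =            2           1".
LINE = re.compile(r"iteration, no\. not moved =\s*(\d+)\s+(\d+)")

def not_moved_counts(output_text):
    """Return (iteration, blocks_not_moved) pairs in order of appearance."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in LINE.finditer(output_text)]

def stalled_at(output_text, patience=3):
    """Return the iteration at which the not-moved count has been nonzero
    and unchanged for `patience` consecutive lines, or None otherwise."""
    counts = not_moved_counts(output_text)
    run = 1
    for (_, prev), (iteration, cur) in zip(counts, counts[1:]):
        run = run + 1 if (cur == prev and cur > 0) else 1
        if run >= patience:
            return iteration
    return None

# First lines of the output quoted below in this thread.
sample = """\
  iteration, no. not moved =            0        4629
  iteration, no. not moved =            1           3
  iteration, no. not moved =            2           1
  iteration, no. not moved =            3           1
  iteration, no. not moved =            4           1
"""

print(stalled_at(sample))
```

The sketch only looks at consecutive lines, so it does not distinguish between separate redistribution phases within one output file.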

Best regards,
Dominik

2017-03-08 4:57 GMT+01:00 Klaus Weide <klaus at flash.uchicago.edu>:

> On Tue, 7 Mar 2017, Dominik Derigs wrote:
>
> > Dear FLASH users,
> >
> > I've been seeing a problem for quite some time on our local cluster which
> > I don't seem able to get rid of. Whenever I run a sufficiently large
> > simulation, it will fail sooner or later while Paramesh writes this to
> > the output:
> >
> >    45251 8.5418E-01 5.7059E-06  ( 7.080E-02,  8.350E-02,  0.000E+00) |  5.706E-06
> >   iteration, no. not moved =            0        4629
> >   iteration, no. not moved =            1           3
> >   iteration, no. not moved =            2           1
> >   iteration, no. not moved =            3           1
> >   iteration, no. not moved =            4           1
> > [...]
> >   iteration, no. not moved =           98           1
> >   iteration, no. not moved =           99           1
> >   iteration, no. not moved =          100           1
> >   ERROR: could not move all blocks in amr_redist_blk
> >   Try increasing maxblocks or use more processors
> >   nm2_old, nm2 =            1           1
> >   ABORTING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
> Hi Dominik,
>
> I have not encountered this problem myself.
>
> Please provide some more info about your setup.
> Some particular questions:
>
> Are you using face variables?
>
> Does the same happen if you
>  - use the older Paramesh implementation (setup with +pm40) ?
>  - or, alternatively, set the following runtime parameters to .false. ?
>
>       use_flash_surr_blks_fill
>       use_reduced_orrery
>
> Is Paramesh trying to increase or to decrease the number of blocks when
> this happens? (The log file may show this information, perhaps in a
> message like the one below.)
>   ... [GRID amr_refine_derefine]: redist. phase.  tot blks requested: 453
>
> Have you made any unusual changes to Paramesh?
> In particular, is it still true that
>
>         maxblocks_tr = 10*maxblocks
>
> (as per amr_initialize.F90) ?
>
>
> > Do you know how to prevent this error from happening or - if not - if it is
> > safe to remove the corresponding MPI_ABORT entirely and just work with one
> > block not being shifted around correctly?
>
> I do not think this would be safe. Meta-information has already been
> moved and modified at this point, under the assumption that movement of
> the contents of all blocks succeeds. It would be inconsistent if that last
> block then does not actually get moved.
>
> Klaus
>
>


-- 
Dominik Derigs
I. Physikalisches Institut
Universität zu Köln
Zülpicher Straße 77
50937 Köln
GERMANY

https://hera.ph1.uni-koeln.de/~derigs/

Tel. (+49|0) 221 470-8352
Fax. (+49|0) 221 470-5162
