[FLASH-USERS] Issue restarting FLASH - hangs after [GRID amr_refine_derefine]: refinement complete

Ryan Farber rjfarber at umich.edu
Wed Sep 13 11:01:06 EDT 2023


Hi Tim,

Excuse the very brief response, but historically I've found that allowing
for more memory overhead helps with this issue.
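
For a rough sense of scale, here is a back-of-envelope sketch using the
figures quoted below (150 GB checkpoint, 30 nodes, 32 cores per node). It
assumes one MPI rank per core and an even spread of the data across ranks,
neither of which is confirmed in the thread, and the function name is just
illustrative. The result is only the raw checkpoint share; the running code
needs more memory than this, so treat it as a floor rather than an estimate
of the real footprint.

```python
# Back-of-envelope sketch. Only these three numbers come from the thread;
# one rank per core and an even data distribution are assumptions.
CHECKPOINT_GB = 150.0
NODES = 30
CORES_PER_NODE = 32


def checkpoint_share_gb(checkpoint_gb, nodes, cores_per_node):
    """Return (ranks, GB per rank, GB per node) for an evenly split checkpoint."""
    ranks = nodes * cores_per_node          # assumes one MPI rank per core
    per_rank = checkpoint_gb / ranks        # raw checkpoint data held by each rank
    per_node = checkpoint_gb / nodes        # raw checkpoint data held by each node
    return ranks, per_rank, per_node


ranks, per_rank_gb, per_node_gb = checkpoint_share_gb(
    CHECKPOINT_GB, NODES, CORES_PER_NODE
)
print(f"{ranks} ranks, {per_rank_gb:.3f} GB per rank, {per_node_gb:.1f} GB per node")
# prints: 960 ranks, 0.156 GB per rank, 5.0 GB per node
```

Comparing the per-node figure against the memory actually available on each
node gives a first indication of how much headroom the restart has.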

Best,
--------
Ryan


On Tue, Sep 12, 2023 at 11:08 AM Timothy Mark Johnson <tmarkj at mit.edu>
wrote:

> Hi FLASH users,
>
>
>
> I’ve been trying to run some relatively large 3D simulations but I’m
> consistently running into issues restarting from the checkpoint files. The
> checkpoint files are about 150 GB and seem to be read in just fine. The
> code loads it in, then freezes after the AMR refinement is complete. It
> will stay here indefinitely. I’m running on 30 nodes each with 32 cores.
> All the files live in a Lustre filesystem.
>
> I’ve managed to restart it sometimes by moving the checkpoint file to
> different locations, but this has been pretty hit or miss. The
> supercomputer also might be giving me different nodes between tries so it
> might be an issue with specific nodes. Maybe the nodes it gives me are too
> far apart? I’m not sure if that’s realistic though…
>
>
>
> Has anyone else had issues restarting large simulations? I wonder how much
> of this might be a result of issues with the supercomputer. I’ve attached
> my terminal output and the .log file. Please let me know if additional
> information would be helpful.
>
>
>
> Best,
>
> Tim Johnson
> _______________________________________________
> flash-users mailing list
> flash-users at flash.rochester.edu
>
> For list info, including unsubscribe:
> https://flash.rochester.edu/mailman/listinfo/flash-users
>