[FLASH-USERS] Issue restarting FLASH - hangs after [GRID amr_refine_derefine]: refinement complete

Timothy Mark Johnson tmarkj at mit.edu
Mon Sep 18 10:47:58 EDT 2023


Hi Ryan,

I managed to get it working consistently by reducing both the memory usage and the number of nodes (went from 30 nodes to 20).

Best,
Tim

From: Ryan Farber <rjfarber at umich.edu>
Sent: Wednesday, September 13, 2023 11:01 AM
To: Timothy Mark Johnson <tmarkj at mit.edu>
Cc: flash-users at flash.rochester.edu
Subject: Re: [FLASH-USERS] Issue restarting FLASH - hangs after [GRID amr_refine_derefine]: refinement complete

Hi Tim,

Excuse the very brief response, but historically I've found that allowing for more memory overhead helps with this issue.

Best,
--------
Ryan


On Tue, Sep 12, 2023 at 11:08 AM Timothy Mark Johnson <tmarkj at mit.edu> wrote:
Hi FLASH users,

I’ve been trying to run some relatively large 3D simulations, but I’m consistently running into issues restarting from the checkpoint files. The checkpoint files are about 150 GB each and seem to be read in just fine. The code loads the checkpoint, then freezes after the AMR refinement is complete and stays there indefinitely. I’m running on 30 nodes, each with 32 cores. All the files live on a Lustre filesystem.
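
For a rough sense of scale, the figures in this thread can be turned into a per-rank share of the checkpoint. This is only a back-of-the-envelope sketch: it assumes one MPI rank per core and an even split of the 150 GB file across ranks, neither of which is stated in the thread, and it ignores FLASH's own buffers and any transient I/O memory. The function name is hypothetical.

```python
def checkpoint_share_per_rank_mib(checkpoint_gib, nodes, cores_per_node):
    """Average checkpoint data per MPI rank in MiB.

    Assumes one rank per core and a perfectly even split,
    which is an illustration rather than how FLASH lays out blocks.
    """
    ranks = nodes * cores_per_node
    return checkpoint_gib * 1024 / ranks


# Numbers from the thread: ~150 GB checkpoint, 32 cores per node,
# 30 nodes originally and 20 nodes in the run that restarted reliably.
share_30_nodes = checkpoint_share_per_rank_mib(150, 30, 32)
share_20_nodes = checkpoint_share_per_rank_mib(150, 20, 32)

print(f"30 nodes: {share_30_nodes:.0f} MiB per rank")
print(f"20 nodes: {share_20_nodes:.0f} MiB per rank")
```

The average share is only a lower bound on what each rank needs during restart; the headroom left above it on each node is what the replies in this thread point to.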

I’ve managed to restart it sometimes by moving the checkpoint file to different locations, but this has been pretty hit or miss. The supercomputer might also be giving me different nodes between tries, so it could be an issue with specific nodes. Maybe the nodes it gives me are too far apart? I’m not sure if that’s realistic though…

Has anyone else had issues restarting large simulations? I wonder how much of this might be a result of issues with the supercomputer. I’ve attached my terminal output and the .log file. Please let me know if additional information would be helpful.

Best,
Tim Johnson