[FLASH-USERS] 4000+ cpus on Franklin

Klaus Weide klaus at flash.uchicago.edu
Fri Sep 4 23:57:58 EDT 2009


On Fri, 4 Sep 2009, James Guillochon wrote:

> Etc., etc. The -1 subscript is coming from the "to_be_sent" array, which by
> default is initialized as an array with all entries = -1, but is populated by
> MPI calls in another function. So it seems like the error leading to my crash
> is somewhere in the PM3 mpi_source directory.
> 
> Anyone familiar with that part of the code? I would upgrade to 3.2 to see if
> PM4 fixes the problem, but I am not sure if I can restart from my 3.0
> checkpoint if I do that.

James,

Yes, it is a good idea to upgrade to 3.2.  Quite a bit of experience with 
running on large numbers of processors has gone into the code development 
since version 3.0 was released.  So the cause of the problem may well have 
been fixed in either the Paramesh4.0 or the Paramesh4dev variant of 
PARAMESH that comes with FLASH3.2.

The format of checkpoint files has changed slightly, but restarting 
with version 3.2 code from 3.0 checkpoint files should work fine.
(Unless the grid state was somehow already corrupt when the checkpoint
was written!)

Klaus
