[FLASH-USERS] Restarting on Stampede

Klaus Weide klaus at flash.uchicago.edu
Fri Dec 5 10:25:16 EST 2014


On Wed, 3 Dec 2014, Amit Kashi wrote:

> I am trying to restart a run on Stampede, however the run reaches the
> [io_readData] step and doesn't progress beyond it (even after many hours).
> 
> I use version 4.2.2, 256 cores, and the checkpoint file is ~1.6GB.
> The checkpoint file is perfectly OK and I was able to restart it on a
> different machine (Blacklight).

I am not on Stampede, so I cannot say anything specific to that environment.
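
One check that does not depend on the machine is whether the copy of the
checkpoint file on the new system is at least recognizable as an HDF5 file.
The following is only a minimal Python sketch and not part of FLASH; the
function name find_hdf5_signature and the file name in the usage note are
made up for illustration. It relies on the HDF5 file format rule that the
8-byte signature sits at byte offset 0 or, if a user block is present, at
offset 512, 1024, 2048, and so on.

```python
import os

# The 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(path):
    """Return the byte offset of the HDF5 signature, or None if absent.

    Hypothetical helper for illustration; it inspects only the signature
    and does not validate any data stored in the file.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        # Candidate offsets are 0, 512, 1024, 2048, ... (doubling each time).
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return offset
            offset = 512 if offset == 0 else offset * 2
    return None
```

Calling find_hdf5_signature("my_run_hdf5_chk_0010") (a hypothetical file
name) returns 0 for a file without a user block and None if no signature is
found. This does not show that FLASH can read the contents; comparing the
file size with the original copy, or listing the file with the h5ls or
h5dump tools from the HDF5 distribution, goes further.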

> The io unit I use is:
> IO/IOMain/hdf5/serial/PM
> 
> The loaded hdf module is:
> hdf5/1.8.13
> 
> Could it be that the problem is related to the parallel support
> +hdf5TypeIO I am not using? I can recompile the code with the +hdf5TypeIO
> option but I don't know whether the checkpoint file that was written
> without this option will be read correctly.

You should try it: recompile with +hdf5TypeIO and restart from the
existing checkpoint file.

> I also found these discussions in the list:
> http://flash.uchicago.edu/pipermail/flash-users/2013-April/001266.html
> http://flash.uchicago.edu/pipermail/flash-users/2013-May/001277.html
> 
> I don't know if the patch mentioned in the second discussion
> (fix_io_incompatibility.diff) is relevant for version 4.2.2 I am using.

The code added by that patch is already included in FLASH 4.2 and later,
so you do not need to apply it.  Your old checkpoint files SHOULD
work in FLASH 4.2.2 configured with +hdf5TypeIO.

Klaus


