[FLASH-USERS] Restarting on Stampede

Amit Kashi kashi at physics.umn.edu
Wed Dec 3 14:51:55 EST 2014


Hi,

I am trying to restart a run on Stampede; however, the run reaches the
[io_readData] step and does not progress beyond it (even after many hours).

I am using FLASH 4.2.2 on 256 cores, and the checkpoint file is ~1.6 GB.
The checkpoint file itself is fine: I was able to restart from it on a
different machine (Blacklight).
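Since the same file restarts on Blacklight, one quick way to rule out a corrupted transfer is to confirm that the copy on Stampede is byte-identical, by computing a checksum on both machines and comparing the results (`sha256sum` does the same from the shell). Below is a minimal sketch using only the Python standard library. The script name and how it is invoked are illustrative assumptions, not part of FLASH.

```python
import hashlib
import sys


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks
    so a multi-GB checkpoint never has to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Hypothetical usage: python checksum.py <checkpoint file> [...]
    # Run on both machines; matching digests mean the copies are identical.
    for path in sys.argv[1:]:
        print(file_sha256(path), path)
```

If the digests match, the file itself is not the problem, and attention can shift to the build and I/O configuration on Stampede.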

The IO unit I use is:
IO/IOMain/hdf5/serial/PM

The loaded HDF5 module is:
hdf5/1.8.13

Could the problem be related to the fact that I am not using the parallel
+hdf5TypeIO support? I can recompile the code with the +hdf5TypeIO option,
but I don't know whether a checkpoint file written without this option will
be read correctly.

I also found these discussions in the list:
http://flash.uchicago.edu/pipermail/flash-users/2013-April/001266.html
http://flash.uchicago.edu/pipermail/flash-users/2013-May/001277.html

I don't know whether the patch mentioned in the second discussion
(fix_io_incompatibility.diff) applies to version 4.2.2, which I am using.


Thank you,

Amit
