[FLASH-USERS] Restarting on Stampede
Amit Kashi
kashi at physics.umn.edu
Wed Dec 3 14:51:55 EST 2014
Hi,
I am trying to restart a run on Stampede, but the run reaches the
[io_readData] step and does not progress beyond it (even after many hours).
I use FLASH version 4.2.2 on 256 cores, and the checkpoint file is ~1.6 GB.
The checkpoint file itself appears to be fine: I was able to restart from it
on a different machine (Blacklight).
The IO unit I use is:
IO/IOMain/hdf5/serial/PM
The loaded HDF5 module is:
hdf5/1.8.13
Could the problem be related to the parallel I/O support (+hdf5TypeIO),
which I am not using? I can recompile the code with the +hdf5TypeIO
option, but I don't know whether a checkpoint file that was written
without this option will be read correctly.
I also found these discussions on the list:
http://flash.uchicago.edu/pipermail/flash-users/2013-April/001266.html
http://flash.uchicago.edu/pipermail/flash-users/2013-May/001277.html
I don't know whether the patch mentioned in the second discussion
(fix_io_incompatibility.diff) is relevant to version 4.2.2, which I am using.
Thank you,
Amit