[FLASH-USERS] Restart

Anshu Dubey dubey at flash.uchicago.edu
Mon Dec 17 12:29:54 EST 2007


The checkpoint and plotfile numbers are independent of each other.
The plotfile number only sets the value from which the plotfile
count starts upon restart.

But if your second checkpoint was not written correctly, I am afraid
you will have to restart from the first checkpoint and repeat all the
steps in between. You might want to increase the frequency of
checkpointing if you are running into I/O problems; that way you won't
have to roll back as far.
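
A minimal sketch of the restart settings described above, written as a small Python helper that prints the flash.par lines. It assumes FLASH2.5 reads these values from flash.par under the runtime-parameter names `restart`, `cpnumber` and `ptnumber` (the question quoted below spells them "cpunumber" and "pltnumber"); check the exact names against the Config files of your own build. The functions `checkpoint_number` and `restart_par_lines` are hypothetical, for illustration only. The point it shows is that the checkpoint number and the plotfile number are set independently.

```python
import re


def checkpoint_number(filename: str) -> int:
    """Return the trailing file number of a checkpoint name.

    Assumes the '<basename>_chk_NNNN' naming seen in this thread,
    e.g. 'Ni_Tem_hdf5_chk_0001' -> 1.
    """
    match = re.search(r"_chk_(\d+)$", filename)
    if match is None:
        raise ValueError(f"not a checkpoint file name: {filename!r}")
    return int(match.group(1))


def restart_par_lines(checkpoint_file: str, first_plot_number: int) -> str:
    """Build flash.par lines for restarting from `checkpoint_file`.

    Parameter names are assumed (restart, cpnumber, ptnumber). The plotfile
    number is unrelated to the checkpoint number: it only sets where the
    plotfile count starts after the restart.
    """
    if first_plot_number < 0:
        raise ValueError("plotfile number must be non-negative")
    lines = [
        "restart  = .true.",
        f"cpnumber = {checkpoint_number(checkpoint_file)}",
        f"ptnumber = {first_plot_number}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # chk_0002 was never completed, so restart from the last complete
    # checkpoint, chk_0001, and start the plotfile count at 57.
    print(restart_par_lines("Ni_Tem_hdf5_chk_0001", 57))
```

Because the run restarts from chk_0001, every step between chk_0001 and the failed chk_0002 is recomputed before the simulation moves past the point where it stopped.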

Anshu
>
>    I am using FLASH2.5. My programme stops while writing a checkpoint file
> after outputting several plot files, before the run has reached the
> tstop value.
>
>   [ 12-14-2007  21:45.22 ] step: n=364 t=5.630710E-09 dt=1.014811E-11
>   [ 12-14-2007  21:48.17 ] [AMR_REFINE_DEREFINE]: refinement initiated at 21:48.17
>   [ 12-14-2007  21:49.17 ] [AMR_REFINE_DEREFINE] blocks   all:  min=1326 max=1336  tot=13321
>   [ 12-14-2007  21:49.17 ] [AMR_REFINE_DEREFINE] blocks valid:  min=1163 max=1167  tot=11656
>   [ 12-14-2007  21:49.17 ] [AMR_REFINE_DEREFINE]: refinement complete
>   [ 12-14-2007  21:49.32 ] file_wr_open: type=checkpoint name=Ni_Tem_hdf5_chk_0002
>
>
> I want to restart the simulation from the previous checkpoint file
> Ni_Tem_hdf5_chk_0001. Following the FLASH manual, I changed the restart
> logical variable to .true. and specified:
> cpunumber 0001 (the last checkpoint file written)
> pltnumber 0001 (the last plotfile number written after checkpoint 0001)
>
> However, the last plotfile actually written is 0056, and there is a gap
> between checkpoint files chk_0001 and chk_0002.
>
> But with these settings I am not able to restart the simulation. How do
> I restart the run so that the next plotfile written is 0057, following
> the last plotfile generated (0056)?
> thanks,
> Mousumi
>
> On Mon, 17 Dec 2007, Anshu Dubey wrote:
>
>> People at the Center have used the parallel HDF5 module in Flash2.5
>> and still do, but on many platforms restarting with it is extremely
>> slow. We haven't been able to determine the cause. Fortunately, the
>> problem did not carry over to Flash3, so we strongly recommend
>> switching to Flash3 if that is possible for you.
>>
>> Anshu
>>
>>> Hi all,
>>>
>>>   I have a serious problem when using the parallel hdf5 module to
>>> restart from a checkpoint file. It takes about half an hour to read a
>>> single variable to each node for a 1GB file. The same problem was
>>> reported two years ago
>>> (http://flash.uchicago.edu/pipermail/flash-users/2005-April/001938.html)
>>> but I cannot find any follow-ups. Has anyone succeeded in using the
>>> parallel hdf5 module so far? Will the Flash Center continue to support
>>> such issues in Flash2.5? Thanks!
>>>
>>> Bests,
>>> Shikui
>>>
>