[FLASH-USERS] Timing problem in LaserSlab example

Klaus Weide klaus at flash.uchicago.edu
Fri Aug 16 16:47:51 EDT 2013


On Fri, 16 Aug 2013, Reem Alraddadi wrote:

> Hi all,
> 
> I am running a Modified LaserSlab example for three materials with the
> following setup line:
> 
> Slab3 -2d +pm4dev -nxb=16 -nyb=16 +mtmmmt +mgd species=cham,targ,foil
> mgd_meshgroups=6 -parfile=example.par +3t -maxblocks=1000
> 
> The run goes OK the first time with lrefine_max=6, but when I go to
> lrefine_max=9
> I get the following warning message:
> 
> Warning: The initial timestep is too large.
> 
> initial timestep = 1.000000000000000E-014
> 
> CFL timestep = 5.139420973957487E-014
> 
> Resetting dtinit to TIMESTEP_SLOW_START_FACTOR*dtcfl.

[...]

> So I reduced the initial time step, setting dtinit=dtmin=0.1e-14, and I
> increased nend to 8000, as I am really interested in the time from
> 50ps to 200ps. After doing that, the warning message no longer
> appears, but ...

Reem,

You did not necessarily have to modify dtinit, since (as the Warning 
message tries to say) the initial dt was reset to 0.5139420973957487e-14
automatically. (I believe TIMESTEP_SLOW_START_FACTOR is hardwired to 0.1.)
It is good that you decreased dtmin. It may also be good that you 
decreased dtinit to below that automatic value.
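
As a rough check of the arithmetic, here is a minimal Python sketch. It
assumes the slow-start factor really is 0.1 (which, as said above, is only
my belief) and that the reset happens whenever dtinit exceeds factor*dtcfl;
the function name is made up for illustration and is not part of FLASH.

```python
# Values copied from the warning message quoted above.
DT_CFL = 5.139420973957487e-14
DTINIT_REQUESTED = 1.0e-14

# Assumption: TIMESTEP_SLOW_START_FACTOR is 0.1 (not verified against the source).
SLOW_START_FACTOR = 0.1


def effective_initial_dt(dtinit, dtcfl, factor=SLOW_START_FACTOR):
    """Return the first-step dt under the assumed reset rule.

    Assumed rule: if dtinit is larger than factor*dtcfl, it is replaced
    by factor*dtcfl; otherwise the requested dtinit is kept.
    """
    limit = factor * dtcfl
    return limit if dtinit > limit else dtinit


if __name__ == "__main__":
    # Original request (1e-14) is above the limit, so it is reset.
    print(effective_initial_dt(DTINIT_REQUESTED, DT_CFL))
    # The reduced value (0.1e-14) is below the limit, so it is kept.
    print(effective_initial_dt(0.1e-14, DT_CFL))
```

Under these assumptions the first call gives about 0.514e-14, matching the
automatic value mentioned above, and the second keeps 0.1e-14 unchanged.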

However, those things are probably unrelated to the ultimate failure of 
your runs.

>   I found the run didn't complete: it only reached
> t=5e-11, and the step count n was only 1809. Also, I found an output file from the
> Hypre library which I didn't understand. I have attached
> to this e-mail my flash.par file, out.log, lasslab.log and the output
> messages from Hypre. I need my run to cover the time from 50ps to
> 200 ps. Could you help me with this, please?

It seems you got to ~ 53 ps. Then the run(s) failed.

You did not provide the contents of the err.err file, even though
your output.log says (twice):

   PS:
   
   Read file <err.err> for stderr output of this job.

Maybe there are some messages in that file explaining why the run(s) 
failed.

You may have exceeded the allowed CPU time in your batch system, or may
have run out of memory, or hit some similar limit.

> PS:  the flash.par is the same as example.par. Also, I wonder if the problem
> arises because I ran the same problem twice. I found that by mistake I ran the
> same file twice. Is this the reason that the run didn't complete?

I don't know. But it seems you had two instances of the same simulation
running in the same directory, overwriting each other's output files
and both writing interspersed lines to the log file. It is unnecessarily 
difficult to analyze from this output what has actually happened.

I would suggest the following:
 * Restart, making sure this time that only one instance of the code is
   running.
 * You may restart from a checkpoint instead of starting from step 1,
   to save time (you seem to have dumped checkpoints frequently).
 * If the run fails again, make sure you look at all available output
   including any err.err file.
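
To pick the checkpoint to restart from, something like the following Python
sketch may help. It assumes the checkpoints are HDF5 files named
<basenm>hdf5_chk_NNNN and that basenm is "flash_"; your flash.par may set a
different basenm, so adjust accordingly. The helper name is made up for
illustration.

```python
import os
import re


def latest_checkpoint(filenames, basenm="flash_"):
    """Return the highest checkpoint number among filenames, or None.

    Assumes names of the form <basenm>hdf5_chk_NNNN (four digits).
    """
    pattern = re.compile(re.escape(basenm) + r"hdf5_chk_(\d{4})$")
    numbers = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers) if numbers else None


if __name__ == "__main__":
    # basenm="flash_" is an assumption; use the value from your flash.par.
    number = latest_checkpoint(os.listdir("."), basenm="flash_")
    if number is None:
        print("No checkpoint files found.")
    else:
        # Runtime parameters to put in flash.par for the restart.
        print("restart = .true.")
        print("checkpointFileNumber = %d" % number)
```

Since two runs were writing into the same directory, it is worth checking
that the chosen checkpoint file is intact before relying on it.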

> And what
> does the hypre message mean?

Apparently the run (or at least one of them? - it is unclear whether
the hypre stack traces were generated by one or by both runs) died while
at least some processors were in a hypre function.  It is unclear
to me whether those processors actually caused signals that resulted in 
aborting the runs.

Btw. it isn't clear what version of FLASH you were using. I hope you are
using 4.0.1, because it includes a hypre-related fix to 4.0.

Klaus


