[FLASH-USERS] lrefine and parallel problem in flash 3.0 beta

Klaus Weide klaus at flash.uchicago.edu
Wed Mar 5 12:58:32 EST 2008


On Tue, 4 Mar 2008, turcotte wrote:

> Using flash3.0 beta, if I run a job using lrefine_max = 3 and then try running in
> parallel (on 4 cpus), the job always hangs on the refinement step; the last output
> in flash.log is
>
> ==============================================================================
> [GRID amr_refine_derefine]       initiating refinement
> [GRID amr_refine_derefine] min blks 75    max blks 75    tot blks 600
> [GRID amr_refine_derefine] min leaf blks 75    max leaf blks 75    tot leaf blks 600
> [GRID amr_refine_derefine]       refinement complete
> [ 03-03-2008  20:55.58 ] [GRID gr_expandDomain]: iteration=1, create level=2
> [GRID amr_refine_derefine]       initiating refinement

Hi Sylvain,

  It seems the above is from a run on 8 cpus, since
   600 (tot blks) = 75 (min/max blks) * 8.
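
The same check can be scripted for other log lines. This is only a sketch: the
helper name infer_process_count is made up for illustration and is not part of
FLASH, and it assumes the "min blks / max blks / tot blks" layout shown in the
log excerpt above. It only gives an answer when min and max agree, i.e. when
every process holds the same number of blocks.

```python
import re

def infer_process_count(log_line):
    """Return tot_blks / max_blks when all processes hold the same
    number of blocks (min == max); return None otherwise."""
    m = re.search(
        r"min blks\s+(\d+)\s+max blks\s+(\d+)\s+tot blks\s+(\d+)", log_line
    )
    if m is None:
        return None
    min_blks, max_blks, tot_blks = map(int, m.groups())
    # The division is only meaningful for a perfectly even distribution.
    if min_blks != max_blks or max_blks == 0 or tot_blks % max_blks != 0:
        return None
    return tot_blks // max_blks

line = "[GRID amr_refine_derefine] min blks 75    max blks 75    tot blks 600"
print(infer_process_count(line))  # 600 / 75
```

If min and max differ, the blocks are not evenly distributed and the log line
alone does not pin down the number of processes.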

You seem to be starting with Nblockx*Nblocky*Nblockz = 75 root blocks.
Do you need such a high number of initial blocks?

> If I use lrefine_max =1 then I can run on multiple processors without problem.
>
> If I use lrefine_max =3 but run on a single processor, the job runs fine.
>
> The MPI version we have on the machine is MPICH version 1.2.6.
>
> Similar jobs run without any problem in flash2.5 with arbitrary lrefine and
> number of processors.
>
> Any ideas?


Does the same problem occur with the FLASH3.0 code released last week?

Does the hang still occur when you increase MAXBLOCKS?

Can you reproduce it with fewer processors?  With fewer root blocks?
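
To get a feeling for how MAXBLOCKS, the number of processors, and the number
of root blocks interact, here is a rough worst-case estimate. It is a sketch
under assumptions that may not match your setup: a 3D grid, every block
refined all the way to lrefine_max, parent blocks counted together with leaf
blocks, and a perfectly even distribution over processes. The function names
are made up for illustration, and any extra blocks needed temporarily during
refinement are ignored.

```python
def worst_case_blocks(root_blocks, lrefine_max, ndim=3):
    """Total blocks (parents and leaves) if every block is refined
    up to lrefine_max; each refinement creates 2**ndim children."""
    children = 2 ** ndim
    return root_blocks * sum(children ** level for level in range(lrefine_max))

def worst_case_blocks_per_process(root_blocks, lrefine_max, nprocs, ndim=3):
    """Ceiling of the worst-case total divided evenly over nprocs."""
    total = worst_case_blocks(root_blocks, lrefine_max, ndim)
    return -(-total // nprocs)  # integer ceiling division

# Numbers taken from this thread: 75 root blocks, lrefine_max = 3.
for nprocs in (1, 4, 8):
    print(nprocs, worst_case_blocks_per_process(75, 3, nprocs))
```

Comparing these per-process numbers with the MAXBLOCKS value your executable
was built with shows whether the limit could be a factor in the worst case.
Real runs usually refine only part of the domain, so actual counts will be
lower.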

Also, try changing lrefine_min in addition to lrefine_max (to 2 or 3).

Can you find out which process hangs, and where, by using a debugger?


Just some ideas, as you requested.

Klaus


