[FLASH-USERS] Flash3 hanging in refinement
Seyit Hocuk
seyit at astro.rug.nl
Mon May 14 12:24:19 EDT 2012
Hi,
Wow, 173352 is quite a lot of blocks. If you're sure it isn't due to
maxblocks, then I don't know. Anyone else care to comment?
Best,
Seyit
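As a rough sanity check on the MAXBLOCKS advice in this thread, the sketch below treats MAXBLOCKS (blocks per processor) multiplied by the number of processors as the global block capacity. That is a simplifying assumption. It ignores any spare slots a processor may need while new child blocks are created and redistributed, so the results are lower bounds only. The helper names and the figures used (MAXBLOCKS = 1000, 256 processors) are hypothetical. The value 173352 is used only because it is the first "not moved" count in the log, which, as David points out, may not be the total block count.

```python
# Hypothetical capacity check. Assumes global capacity = MAXBLOCKS * nprocs
# and ignores any temporary headroom needed during refinement.

def global_capacity(maxblocks: int, nprocs: int) -> int:
    # Blocks the whole run could hold if every processor were full.
    return maxblocks * nprocs


def min_procs(total_blocks: int, maxblocks: int) -> int:
    # Smallest processor count whose combined capacity covers total_blocks
    # (integer ceiling division).
    return -(-total_blocks // maxblocks)


def min_maxblocks(total_blocks: int, nprocs: int) -> int:
    # Smallest per-processor MAXBLOCKS that covers total_blocks on nprocs.
    return -(-total_blocks // nprocs)


if __name__ == "__main__":
    blocks = 173352  # first "not moved" count from the log, not necessarily the total
    print(global_capacity(1000, 256))   # 256000
    print(min_procs(blocks, 1000))      # 174
    print(min_maxblocks(blocks, 256))   # 678
```

Because a real run needs some slack, a capacity that only just covers the block count is probably not enough. The comparison is mainly useful for spotting an order-of-magnitude shortfall.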
On 05/14/2012 06:14 PM, Dave wrote:
> Hi,
>
> I would have thought that in that case it would have triggered the
> error message and quit, but it's possible it hung while trying to
> abort. I should be able to support more than 173352 blocks with my
> settings for MAXBLOCKS and my number of processors, but I guess that
> number is just the number of blocks being moved between
> processors, not the total number of blocks in the system?
>
> -David
>
> On 14/05/2012 12:53 p.m., Seyit Hocuk wrote:
>> Hi,
>>
>> It seems like you do not have enough blocks per CPU, i.e., the run is
>> reaching the maximum number of blocks. Either increase MAXBLOCKS
>> (when compiling, or by editing Makefile and Flash.h and then
>> re-running make in the object directory), or increase the number of
>> processors you are using.
>>
>> Best,
>> Seyit
>>
>>
>>
>> On 05/14/2012 05:50 PM, Dave wrote:
>>> Hi,
>>>
>>> I am finding that FLASH hangs in the refinement step. The final
>>> output in the log file is:
>>>
>>> ----
>>> [ 05-13-2012 03:56:30.023 ] step: n=2132 t=4.078276E+12 dt=6.783726E+08
>>> [ 05-13-2012 04:00:08.019 ] [mpi_amr_comm_setup]: buffer_dim_send=1519097, buffer_dim_recv=1485373
>>> [ 05-13-2012 04:00:38.738 ] [GRID amr_refine_derefine]: initiating refinement
>>> ----
>>>
>>> and the final output in the standard dump is:
>>>
>>> ----
>>> iteration, no. not moved = 0 173352
>>> iteration, no. not moved = 1 169097
>>> iteration, no. not moved = 2 165742
>>> iteration, no. not moved = 3 162432
>>> <etc etc>
>>> iteration, no. not moved = 98 19621
>>> iteration, no. not moved = 99 18939
>>> iteration, no. not moved = 100 18282
>>> ----
>>>
>>> The code isn't crashing, and the processors still appear to be
>>> running according to qstat etc., but it hasn't produced any
>>> output at all in over 24 hours.
>>>
>>> It looks like it isn't redistributing blocks properly, but I would
>>> have expected that to trigger this error message in
>>> mpi_amr_redist_blk.F90:
>>>
>>> ----
>>> if (nm2_old.eq.nm2.and.nm2.ne.0.and.nit>=100) then
>>>    if (mype.eq.0) then
>>>       print *,' ERROR: could not move all blocks in amr_redist_blk'
>>>       print *,' Try increasing maxblocks or use more processors'
>>>       print *,' nm2_old, nm2 = ',nm2_old,nm2
>>>       print *,' ABORTING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
>>>    end if
>>>    call MPI_ABORT(MPI_COMM_WORLD,errorcode,ierr)
>>> end if
>>> ----
>>>
>>> Any clues or suggestions?
>>>
>>> Cheers,
>>> David Williamson
>>>
>>> PhD Candidate
>>> St. Mary's University
>>> Halifax, NS
>>
>
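To see why the abort branch quoted from mpi_amr_redist_blk.F90 might never fire on this run, the sketch below mirrors its condition in Python and replays the counts printed in the standard output. It assumes that nm2 is the "no. not moved" count printed for iteration nit and that nm2_old is the count from the previous iteration. The helper names are hypothetical, and this is not the PARAMESH code itself. Under that reading, the abort requires the count to stop changing entirely on some iteration at or after 100. A count that keeps shrinking slowly (18939 to 18282 between iterations 99 and 100) does not satisfy it.

```python
# Hypothetical mirror of the abort test quoted from mpi_amr_redist_blk.F90.
# Assumption: nm2 is the "no. not moved" count printed for iteration nit,
# and nm2_old is the count printed for the previous iteration.
from __future__ import annotations


def should_abort(nm2_old: int, nm2: int, nit: int) -> bool:
    # Fortran: nm2_old.eq.nm2 .and. nm2.ne.0 .and. nit>=100
    return nm2_old == nm2 and nm2 != 0 and nit >= 100


def first_abort_iteration(counts: dict[int, int]) -> int | None:
    """Return the first iteration at which the abort test fires, or None."""
    iterations = sorted(counts)
    for prev, cur in zip(iterations, iterations[1:]):
        if cur != prev + 1:
            continue  # skip gaps where the log was elided
        if should_abort(counts[prev], counts[cur], cur):
            return cur
    return None


# Counts copied from the standard output in the thread (middle iterations elided).
logged = {0: 173352, 1: 169097, 2: 165742, 3: 162432,
          98: 19621, 99: 18939, 100: 18282}

if __name__ == "__main__":
    print(first_abort_iteration(logged))                    # None: count still falling
    print(first_abort_iteration({99: 18282, 100: 18282}))   # 100: count stalled
```

This does not explain why no further output appeared after iteration 100. It only suggests that the logged counts on their own would not have triggered the quoted error message.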