[FLASH-USERS] Flash3 hanging in refinement
Dave
david.john.williamson at gmail.com
Mon May 14 12:41:44 EDT 2012
Hi,
Actually, I think you're right. The number 173352 is just the number of
blocks that are being moved. My limit was 192000, and it looks like in
the previous refinement I had hit 190392. I'm restarting from checkpoint
after cranking it up to 224000 blocks.
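
A rough sketch of the bookkeeping behind those numbers, in Python and purely illustrative. The helper name `block_headroom` is hypothetical and not part of FLASH. The limit is treated as the single global figure quoted above (presumably MAXBLOCKS times the number of processors, neither of which is stated in this thread):

```python
def block_headroom(limit, used):
    """Return (free_slots, fill_fraction) for a global block limit.

    `limit` is the total number of blocks the run can hold and `used`
    is the number currently allocated. Both are plain integers here;
    nothing is read from FLASH itself.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    free_slots = limit - used
    fill_fraction = used / limit
    return free_slots, fill_fraction


if __name__ == "__main__":
    used = 190392  # blocks reached in the previous refinement (from this thread)
    for limit in (192000, 224000):  # old and new global limits (from this thread)
        free, frac = block_headroom(limit, used)
        print(f"limit={limit}: {free} free slots, {frac:.1%} full")
```

With the old limit that leaves only 1608 free slots (about 99% full); with 224000 it leaves 33608 (about 85% full). A global figure like this is only a necessary condition: an individual processor can still run out of room if the blocks are unevenly distributed.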
Thanks!
-David
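
For what it's worth, the abort test quoted further down the thread only fires when the count of unmoved blocks has stopped changing. Below is a small Python transcription of that condition. The function name `should_abort` is made up for illustration, and how `nit` maps onto the printed iteration index, or what the loop does after iteration 100, is not shown in this thread, so treat it as a sketch under those assumptions:

```python
def should_abort(nm2_old, nm2, nit):
    """Mirror of the quoted Fortran test:
    nm2_old.eq.nm2 .and. nm2.ne.0 .and. nit>=100
    """
    return nm2_old == nm2 and nm2 != 0 and nit >= 100


# Last two counts from the standard output (iterations 99 and 100):
# the count is still decreasing, so the test is not satisfied.
print(should_abort(18939, 18282, 100))

# A count that had genuinely stalled at the same iteration would satisfy it.
print(should_abort(18282, 18282, 100))
```

The first call evaluates to False and the second to True, which may be why the run went quiet without printing the "could not move all blocks" message: the count was still shrinking when the output stopped.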
On 14/05/2012 1:24 p.m., Seyit Hocuk wrote:
> Hi,
>
> Wow, 173352, that's quite a lot of blocks. If you're sure it isn't due
> to maxblocks, then I don't know. Anyone else care to comment?
>
> Best,
> Seyit
>
>
> On 05/14/2012 06:14 PM, Dave wrote:
>> Hi,
>>
>> I would have thought in that case that it would have triggered the
>> error message and quit, but it's possible it hung while trying to
>> abort. I should be able to support more than 173352 blocks with my
>> settings for MAXBLOCKS & my number of processors, but I guess that
>> number is just the number of blocks that are being moved between
>> threads, not the total number of blocks in the system?
>>
>> -David
>>
>> On 14/05/2012 12:53 p.m., Seyit Hocuk wrote:
>>> Hi,
>>>
>>> It seems like you do not have enough blocks per CPU, i.e., the run is
>>> reaching the maximum number of blocks. Either increase MAXBLOCKS
>>> (when compiling, or by editing Makefile and Flash.h and rebuilding in
>>> that directory), or increase the number of processors you are
>>> using.
>>>
>>> Best,
>>> Seyit
>>>
>>>
>>>
>>> On 05/14/2012 05:50 PM, Dave wrote:
>>>> Hi,
>>>>
>>>> I am finding that FLASH hangs in the refinement step. The final
>>>> output in the log file is:
>>>>
>>>> ----
>>>> [ 05-13-2012 03:56:30.023 ] step: n=2132 t=4.078276E+12 dt=6.783726E+08
>>>> [ 05-13-2012 04:00:08.019 ] [mpi_amr_comm_setup]: buffer_dim_send=1519097, buffer_dim_recv=1485373
>>>> [ 05-13-2012 04:00:38.738 ] [GRID amr_refine_derefine]: initiating refinement
>>>> ----
>>>>
>>>> and the final output in the standard dump is:
>>>>
>>>> ----
>>>> iteration, no. not moved = 0 173352
>>>> iteration, no. not moved = 1 169097
>>>> iteration, no. not moved = 2 165742
>>>> iteration, no. not moved = 3 162432
>>>> <etc etc>
>>>> iteration, no. not moved = 98 19621
>>>> iteration, no. not moved = 99 18939
>>>> iteration, no. not moved = 100 18282
>>>> ----
>>>>
>>>> The code isn't crashing, and it appears that the processors are
>>>> still running according to qstat etc, but it hasn't produced any
>>>> output at all in over 24 hours.
>>>>
>>>> It looks like it isn't redistributing blocks properly, but I would
>>>> have assumed that would trigger an error message in
>>>> mpi_amr_redist_blk.F90 :
>>>>
>>>> ----
>>>> if (nm2_old.eq.nm2.and.nm2.ne.0.and.nit>=100) then
>>>>    if (mype.eq.0) then
>>>>       print *,' ERROR: could not move all blocks in amr_redist_blk'
>>>>       print *,' Try increasing maxblocks or use more processors'
>>>>       print *,' nm2_old, nm2 = ',nm2_old,nm2
>>>>       print *,' ABORTING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
>>>>    end if
>>>>    call MPI_ABORT(MPI_COMM_WORLD,errorcode,ierr)
>>>> end if
>>>> ----
>>>>
>>>> Any clues or suggestions?
>>>>
>>>> Cheers,
>>>> David Williamson
>>>>
>>>> PhD Candidate
>>>> St. Mary's University
>>>> Halifax, NS
>>>
>>
>