[FLASH-USERS] Flash3 hanging in refinement

Dave david.john.williamson at gmail.com
Mon May 14 12:41:44 EDT 2012


Hi,

Actually, I think you're right. The number 173352 is just the number of
blocks that are being moved. My limit (MAXBLOCKS times the number of
processors) was 192000, and it looks like in the previous refinement I had
hit 190392. I'm restarting from a checkpoint after cranking it up to
224000 blocks.
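
For reference, MAXBLOCKS in FLASH's PARAMESH grid is a per-processor cap, so the budget that has to hold every block in the run is MAXBLOCKS times the number of processors. A minimal sketch of that arithmetic, assuming the 192000 and 224000 limits above are that product; the module and function names are hypothetical and not part of FLASH:

```fortran
! Hypothetical helper, not part of FLASH: block-budget arithmetic only.
module block_budget
  implicit none
contains
  ! Total number of blocks the run can hold when every rank
  ! may own at most maxblocks_per_proc blocks.
  pure integer function total_capacity(maxblocks_per_proc, nprocs)
    integer, intent(in) :: maxblocks_per_proc, nprocs
    total_capacity = maxblocks_per_proc * nprocs
  end function total_capacity

  ! Fraction of the total budget occupied by a given block count.
  pure real function fraction_used(nblocks, capacity)
    integer, intent(in) :: nblocks, capacity
    fraction_used = real(nblocks) / real(capacity)
  end function fraction_used
end module block_budget
```

With the figures above, fraction_used(190392, 192000) is about 0.99, leaving only 1608 blocks of slack for the next refinement; against 224000 it drops to about 0.85.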

Thanks!
-David

On 14/05/2012 1:24 p.m., Seyit Hocuk wrote:
> Hi,
>
> Wow, 173352, that's quite a lot of blocks. If you're sure it isn't due
> to maxblocks, then I don't know. Anyone else care to comment?
>
> Best,
> Seyit
>
>
> On 05/14/2012 06:14 PM, Dave wrote:
>> Hi,
>>
>> I would have thought in that case that it would have triggered the
>> error message and quit, but it's possible it hung while trying to
>> abort. I should be able to support more than 173352 blocks with my
>> settings for MAXBLOCKS and my number of processors, but I guess that
>> number is just the number of blocks that are being moved between
>> processors, not the total number of blocks in the system?
>>
>> -David
>>
>> On 14/05/2012 12:53 p.m., Seyit Hocuk wrote:
>>> Hi,
>>>
>>> It seems like you do not have enough blocks per CPU, i.e., the run is
>>> reaching the maximum number of blocks. Either increase MAXBLOCKS (at
>>> compile time, or by editing it in the Makefile and Flash.h and then
>>> rebuilding), or increase the number of processors you are using.
>>>
>>> Best,
>>> Seyit
>>>
>>>
>>>
>>> On 05/14/2012 05:50 PM, Dave wrote:
>>>> Hi,
>>>>
>>>> I am finding that FLASH hangs in the refinement step. The final 
>>>> output in the log file is:
>>>>
>>>> ----
>>>>  [ 05-13-2012  03:56:30.023 ] step: n=2132 t=4.078276E+12 dt=6.783726E+08
>>>>  [ 05-13-2012  04:00:08.019 ] [mpi_amr_comm_setup]: buffer_dim_send=1519097, buffer_dim_recv=1485373
>>>>  [ 05-13-2012  04:00:38.738 ] [GRID amr_refine_derefine]: initiating refinement
>>>> ----
>>>>
>>>> and the final output in the standard dump is:
>>>>
>>>> ----
>>>>   iteration, no. not moved =            0      173352
>>>>   iteration, no. not moved =            1      169097
>>>>   iteration, no. not moved =            2      165742
>>>>   iteration, no. not moved =            3      162432
>>>> <etc etc>
>>>>   iteration, no. not moved =           98       19621
>>>>   iteration, no. not moved =           99       18939
>>>>   iteration, no. not moved =          100       18282
>>>> ----
>>>>
>>>> The code isn't crashing, and the processors still appear to be
>>>> running according to qstat etc., but it hasn't produced any
>>>> output at all in over 24 hours.
>>>>
>>>> It looks like it isn't redistributing blocks properly, but I would
>>>> have assumed that would trigger this error message in
>>>> mpi_amr_redist_blk.F90:
>>>>
>>>> ----
>>>>       if (nm2_old.eq.nm2.and.nm2.ne.0.and.nit>=100) then
>>>>          if (mype.eq.0) then
>>>>           print *,' ERROR: could not move all blocks in amr_redist_blk'
>>>>           print *,' Try increasing maxblocks or use more processors'
>>>>           print *,' nm2_old, nm2 = ',nm2_old,nm2
>>>>           print *,' ABORTING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
>>>>          end if
>>>>          call MPI_ABORT(MPI_COMM_WORLD,errorcode,ierr)
>>>>       end if
>>>> ----
>>>>
>>>> Any clues or suggestions?
>>>>
>>>> Cheers,
>>>> David Williamson
>>>>
>>>> PhD Candidate
>>>> St. Mary's University
>>>> Halifax, NS
>>>
>>
>

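As quoted in the original message, the abort in mpi_amr_redist_blk.F90 fires only when three conditions hold at once: the count of blocks not yet moved stopped changing between iterations (nm2_old.eq.nm2), that count is nonzero, and at least 100 iterations have run. A minimal sketch of that test, assuming nm2 is the value printed as "no. not moved" in the standard output and nm2_old is the previous iteration's count; the module and function names are hypothetical, not the FLASH source:

```fortran
! Hypothetical restatement of the stall test quoted from
! mpi_amr_redist_blk.F90; not the FLASH source itself.
module redist_stall
  implicit none
contains
  ! True only when redistribution made no progress in the last
  ! iteration, some blocks are still waiting to move, and at
  ! least 100 iterations have already run.
  pure logical function should_abort(nm2_old, nm2, nit)
    integer, intent(in) :: nm2_old  ! blocks not moved, previous iteration
    integer, intent(in) :: nm2      ! blocks not moved, this iteration
    integer, intent(in) :: nit      ! iteration counter
    should_abort = nm2_old == nm2 .and. nm2 /= 0 .and. nit >= 100
  end function should_abort
end module redist_stall
```

Under that reading, the last two lines of the dump give should_abort(18939, 18282, 100) = .false., because the count was still falling. A redistribution that is slow but still making progress never satisfies the test, so no abort message would be expected at that point.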