[FLASH-USERS] Flash3 hanging in refinement

Dave david.john.williamson at gmail.com
Mon May 14 13:58:29 EDT 2012


Nah, I didn't use that option. I think I'm just hitting the MAXBLOCKS limit.

-Dave

On 14/05/2012 2:07 p.m., James Guillochon wrote:
> Hi Dave, by chance are you using +pm4dev? If so, did you copy the
> amr_runtime_parameters file to your run directory?
>
> - James
>
> On May 14, 2012, at 9:41 AM, Dave <david.john.williamson at gmail.com> wrote:
>
>> Hi,
>>
>> Actually, I think you're right. The number 173352 is just the number of blocks that are being moved. My limit was 192000, and it looks like in the previous refinement I had hit 190392. I'm restarting from checkpoint after cranking it up to 224000 blocks.
>>
>> Thanks!
>> -David
>>
>> On 14/05/2012 1:24 p.m., Seyit Hocuk wrote:
>>> Hi,
>>>
>>> Wow, 173352 is quite a lot of blocks. If you're sure it isn't due to maxblocks, then I don't know. Anyone else care to comment?
>>>
>>> Best,
>>> Seyit
>>>
>>>
>>> On 05/14/2012 06:14 PM, Dave wrote:
>>>> Hi,
>>>>
>>>> I would have thought in that case that it would have triggered the error message and quit, but it's possible it hung while trying to abort. I should be able to support more than 173352 blocks with my settings for MAXBLOCKS and my number of processors, but I guess that number is just the number of blocks that are being moved between threads, not the total number of blocks in the system?
>>>>
>>>> -David
>>>>
>>>> On 14/05/2012 12:53 p.m., Seyit Hocuk wrote:
>>>>> Hi,
>>>>>
>>>>> It seems like you do not have enough blocks per CPU, i.e., it is reaching the maximum number of blocks. Either increase MAXBLOCKS (while compiling, or in the files Makefile and Flash.h, then remake the directory), or increase the number of processors you are using.
>>>>>
>>>>> Best,
>>>>> Seyit
>>>>>
>>>>>
>>>>>
>>>>> On 05/14/2012 05:50 PM, Dave wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am finding that FLASH hangs in the refinement step. The final output in the log file is:
>>>>>>
>>>>>> ----
>>>>>> [ 05-13-2012  03:56:30.023 ] step: n=2132 t=4.078276E+12 dt=6.783726E+08
>>>>>> [ 05-13-2012  04:00:08.019 ] [mpi_amr_comm_setup]: buffer_dim_send=1519097, buffer_dim_recv=1485373
>>>>>> [ 05-13-2012  04:00:38.738 ] [GRID amr_refine_derefine]: initiating refinement
>>>>>> ----
>>>>>>
>>>>>> and the final output in the standard dump is:
>>>>>>
>>>>>> ----
>>>>>>   iteration, no. not moved =            0      173352
>>>>>>   iteration, no. not moved =            1      169097
>>>>>>   iteration, no. not moved =            2      165742
>>>>>>   iteration, no. not moved =            3      162432
>>>>>> <etc etc>
>>>>>>   iteration, no. not moved =           98       19621
>>>>>>   iteration, no. not moved =           99       18939
>>>>>>   iteration, no. not moved =          100       18282
>>>>>> ----
>>>>>>
>>>>>> The code isn't crashing, and it appears that the processors are still running according to qstat etc, but it hasn't produced any output at all in over 24 hours.
>>>>>>
>>>>>> It looks like it isn't redistributing blocks properly, but I would have assumed that would trigger an error message in mpi_amr_redist_blk.F90:
>>>>>>
>>>>>> ----
>>>>>>       if (nm2_old.eq.nm2.and.nm2.ne.0.and.nit>=100) then
>>>>>>          if (mype.eq.0) then
>>>>>>           print *,' ERROR: could not move all blocks in amr_redist_blk'
>>>>>>           print *,' Try increasing maxblocks or use more processors'
>>>>>>           print *,' nm2_old, nm2 = ',nm2_old,nm2
>>>>>>           print *,' ABORTING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
>>>>>>          end if
>>>>>>          call MPI_ABORT(MPI_COMM_WORLD,errorcode,ierr)
>>>>>>       end if
>>>>>> ----
>>>>>>
>>>>>> Any clues or suggestions?
>>>>>>
>>>>>> Cheers,
>>>>>> David Williamson
>>>>>>
>>>>>> PhD Candidate
>>>>>> St. Mary's University
>>>>>> Halifax, NS
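
For reference, the headroom arithmetic behind the numbers in this thread can be sketched as below. This is only a rough illustration: it assumes the three figures quoted above (192000, 190392, 224000) are all global block totals, and the function name headroom is made up for the example.

```python
# Figures quoted in the thread; treated here as global block totals
# (an assumption, since the per-process MAXBLOCKS value is not given).
old_limit = 192000      # block limit before the hang
peak_blocks = 190392    # block count reached in the previous refinement
new_limit = 224000      # limit chosen for the restart from checkpoint


def headroom(limit, used):
    """Return (free blocks, fraction of the limit in use)."""
    return limit - used, used / limit


free_old, frac_old = headroom(old_limit, peak_blocks)
free_new, frac_new = headroom(new_limit, peak_blocks)

print(f"old limit: {free_old} blocks free, {frac_old:.1%} in use")
print(f"new limit: {free_new} blocks free, {frac_new:.1%} in use")
```

Under that assumption the old limit left only 1608 free blocks (about 99% in use), so a single further refinement step could plausibly exhaust it, while the new limit leaves 33608 free.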



