[FLASH-USERS] Flash3 hanging in refinement

Seyit Hocuk seyit at astro.rug.nl
Mon May 14 12:24:19 EDT 2012


Wow, 173352 is quite a lot of blocks. If you're sure it isn't due to 
maxblocks, then I don't know. Anyone else care to comment?


On 05/14/2012 06:14 PM, Dave wrote:
> Hi,
> I would have thought in that case that it would have triggered the 
> error message and quit, but it's possible it hung while trying to 
> abort. I should be able to support more than 173352 blocks with my 
> settings for MAXBLOCKS & my number of processors, but I guess that 
> number is just the number of blocks that are being moved between 
> threads, not the total number of blocks in the system?
> -David
> On 14/05/2012 12:53 p.m., Seyit Hocuk wrote:
>> Hi,
>> It seems like you do not have enough blocks per cpu, i.e., it is 
>> reaching the maximum number of blocks. Either increase 
>> MAXBLOCKS (while compiling, or in the files Makefile & Flash.h, then 
>> remake the directory), or increase the number of processors you are 
>> using.
>> Best,
>> Seyit
>> On 05/14/2012 05:50 PM, Dave wrote:
>>> Hi,
>>> I am finding that FLASH hangs in the refinement step. The final 
>>> output in the log file is:
>>> ----
>>>  [ 05-13-2012  03:56:30.023 ] step: n=2132 t=4.078276E+12 dt=6.783726E+08
>>>  [ 05-13-2012  04:00:08.019 ] [mpi_amr_comm_setup]: buffer_dim_send=1519097, buffer_dim_recv=1485373
>>>  [ 05-13-2012  04:00:38.738 ] [GRID amr_refine_derefine]: initiating refinement
>>> ----
>>> and the final output in the standard dump is:
>>> ----
>>>   iteration, no. not moved =            0      173352
>>>   iteration, no. not moved =            1      169097
>>>   iteration, no. not moved =            2      165742
>>>   iteration, no. not moved =            3      162432
>>> <etc etc>
>>>   iteration, no. not moved =           98       19621
>>>   iteration, no. not moved =           99       18939
>>>   iteration, no. not moved =          100       18282
>>> ----
>>> The code isn't crashing, and it appears that the processors are 
>>> still running according to qstat etc, but it hasn't produced any 
>>> output at all in over 24 hours.
>>> It looks like it isn't redistributing blocks properly, but I would 
>>> have assumed that would trigger an error message in 
>>> mpi_amr_redist_blk.F90 :
>>> ----
>>>       if (nm2_old.eq.nm2.and.nm2.ne.0.and.nit>=100) then
>>>          if (mype.eq.0) then
>>>           print *,' ERROR: could not move all blocks in amr_redist_blk'
>>>           print *,' Try increasing maxblocks or use more processors'
>>>           print *,' nm2_old, nm2 = ',nm2_old,nm2
>>>           print *,' ABORTING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
>>>          end if
>>>          call MPI_ABORT(MPI_COMM_WORLD,errorcode,ierr)
>>>       end if
>>> ----
>>> Any clues or suggestions?
>>> Cheers,
>>> David Williamson
>>> PhD Candidate
>>> St. Mary's University
>>> Halifax, NS
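
One possible reading of why no error message appeared: the quoted test in mpi_amr_redist_blk.F90 only fires when the count of unmoved blocks has stopped changing. The sketch below mirrors that condition in Python and evaluates it on the last two counts from the standard dump. It assumes that nm2 is the "no. not moved" value printed for the latest iteration, that nm2_old is the value from the iteration before, and that nit is 100 when the test runs; none of this is confirmed by the thread. The function name would_abort is hypothetical.

```python
def would_abort(nm2_old: int, nm2: int, nit: int) -> bool:
    """Python mirror of the quoted Fortran test:
    nm2_old.eq.nm2 .and. nm2.ne.0 .and. nit>=100
    """
    return nm2_old == nm2 and nm2 != 0 and nit >= 100


# Last two "no. not moved" counts from the standard dump quoted above.
# Mapping them onto nm2_old / nm2 is an assumption, as is nit=100.
count_iter_99 = 18939
count_iter_100 = 18282

# The count was still falling, so the equality clause fails.
print(would_abort(count_iter_99, count_iter_100, nit=100))  # False

# Hypothetical stalled case: same count twice in a row after 100 iterations.
print(would_abort(18282, 18282, nit=100))  # True
```

Under these assumptions the count was still decreasing at iteration 100, so the abort branch would not be taken. This says nothing about why the run then stopped producing output.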
