[FLASH-USERS] Flash3 hanging in refinement

Hunger, Lars Lars.Hunger at uibk.ac.at
Tue May 15 05:45:30 EDT 2012


Hi,

I had exactly the same error.
I think it is the problem you already guessed: the moving of the blocks.
If you are near the MAXBLOCKS limit, the free space available for swapping gets small, and the code cannot redistribute the blocks within the 100 iterations it allows for block redistribution.
As already suggested, it helped to increase MAXBLOCKS or the number of processors.
What I also did was change the number of iterations to a bigger number (nit>=10000), so that the code can take more steps for swapping.
That didn't take too long, since the swapping seems to be quite fast.
But I think this problem only happens when you are already close to exceeding your overall block limit, so the first two solutions are probably better.
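
To make the argument above concrete, here is a small toy model in Python. It is only a sketch under strong assumptions and is not the PARAMESH algorithm: every rank holds the same number of blocks, every block has to move to the next rank in a ring, and a rank can only accept as many blocks per iteration as it has free slots. The function name and all of the numbers are made up for illustration.

```python
def simulate_redistribution(blocks_per_proc, maxblocks, nprocs=4, max_iter=100):
    """Toy ring model: each block on rank i must move to rank (i + 1) % nprocs.

    Returns (iterations_used, blocks_not_moved). This is an illustration of
    the free-slot argument only, not the real amr_redist_blk logic.
    """
    counts = [blocks_per_proc] * nprocs    # blocks currently stored on each rank
    pending = [blocks_per_proc] * nprocs   # blocks on each rank still to be sent
    for iteration in range(max_iter):
        if sum(pending) == 0:
            return iteration, 0
        # Free slots are evaluated once, at the start of the iteration.
        free = [maxblocks - c for c in counts]
        # Rank i can send only what its neighbour has room for.
        sent = [min(pending[i], free[(i + 1) % nprocs]) for i in range(nprocs)]
        for i in range(nprocs):
            pending[i] -= sent[i]
            counts[i] += sent[i - 1] - sent[i]   # receive from i-1, send to i+1
    return max_iter, sum(pending)


if __name__ == "__main__":
    # Plenty of headroom: finishes in a few iterations.
    print(simulate_redistribution(150, 200))
    # One free slot per rank: 100 iterations are not enough.
    print(simulate_redistribution(199, 200))
    # Same case with a larger iteration limit: it does finish.
    print(simulate_redistribution(199, 200, max_iter=10000))
```

In this toy model the number of iterations needed is roughly the number of blocks to move divided by the free slots per rank, which is why both more headroom (larger MAXBLOCKS or more processors) and a larger iteration limit help.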

Best,
Lars Hunger


-----Original Message-----
From: flash-users-bounces at flash.uchicago.edu on behalf of John ZuHone
Sent: Mon 5/14/2012 8:35 PM
To: Dave
Cc: flash-users at flash.uchicago.edu
Subject: Re: [FLASH-USERS] Flash3 hanging in refinement
 
Hi Dave,

Unfortunately, the desired behavior in this case (triggering the error message) does not always happen. MAXBLOCKS is indeed the maximum number of blocks per processor. I don't always know the reason (and maybe someone else can comment), but I do see "early" crashes like this from time to time for this reason.
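
One thing that can be read directly off the check quoted further down in this thread: it only fires when the number of unmoved blocks did not change between two iterations. The Python transcription below is a sketch of that one condition, keeping the variable names from the Fortran snippet; the value nit=100 for the last logged line is an assumption, and what the surrounding loop does afterwards is not shown in the thread.

```python
def would_abort(nm2_old, nm2, nit, nit_limit=100):
    """Python transcription of the quoted Fortran test:
    nm2_old.eq.nm2 .and. nm2.ne.0 .and. nit>=100
    """
    return nm2_old == nm2 and nm2 != 0 and nit >= nit_limit


if __name__ == "__main__":
    # Last two logged values: the count is still dropping, so the
    # "no progress" part of the test is not satisfied.
    print(would_abort(18939, 18282, 100))   # False
    # Hypothetical stalled case: same count twice in a row.
    print(would_abort(18282, 18282, 100))   # True
```

So with the numbers in the log, where the count is still decreasing at iteration 100, this particular test would not trigger the abort message.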

Best,

John

On May 14, 2012, at 12:14 PM, Dave wrote:

> Hi,
> 
> I would have thought in that case that it would have triggered the error message and quit, but it's possible it hung while trying to abort. I should be able to support more than 173352 blocks with my settings for MAXBLOCKS & my number of processors, but I guess that number is just the number of blocks that are being moved between threads, not the total number of blocks in the system?
> 
> -David
> 
> On 14/05/2012 12:53 p.m., Seyit Hocuk wrote:
>> Hi,
>> 
>> It seems like you do not have enough blocks per cpu, i.e., it is reaching the maximum number of blocks. Either increase the number MAXBLOCKS (while compiling, or in the files Makefile & Flash.h and remake the directory), or increase the number of processors you are using.
>> 
>> Best,
>> Seyit
>> 
>> 
>> 
>> On 05/14/2012 05:50 PM, Dave wrote:
>>> Hi,
>>> 
>>> I am finding that FLASH hangs in the refinement step. The final output in the log file is:
>>> 
>>> ----
>>> [ 05-13-2012  03:56:30.023 ] step: n=2132 t=4.078276E+12 dt=6.783726E+08
>>> [ 05-13-2012  04:00:08.019 ] [mpi_amr_comm_setup]: buffer_dim_send=1519097, buffer_dim_recv=1485373
>>> [ 05-13-2012  04:00:38.738 ] [GRID amr_refine_derefine]: initiating refinement
>>> ----
>>> 
>>> and the final output in the standard dump is:
>>> 
>>> ----
>>>  iteration, no. not moved =            0      173352
>>>  iteration, no. not moved =            1      169097
>>>  iteration, no. not moved =            2      165742
>>>  iteration, no. not moved =            3      162432
>>> <etc etc>
>>>  iteration, no. not moved =           98       19621
>>>  iteration, no. not moved =           99       18939
>>>  iteration, no. not moved =          100       18282
>>> ----
>>> 
>>> The code isn't crashing, and it appears that the processors are still running according to qstat etc, but it hasn't produced any output at all in over 24 hours.
>>> 
>>> It looks like it isn't redistributing blocks properly, but I would have assumed that would trigger an error message in mpi_amr_redist_blk.F90 :
>>> 
>>> ----
>>>      if (nm2_old.eq.nm2.and.nm2.ne.0.and.nit>=100) then
>>>         if (mype.eq.0) then
>>>          print *,' ERROR: could not move all blocks in amr_redist_blk'
>>>          print *,' Try increasing maxblocks or use more processors'
>>>          print *,' nm2_old, nm2 = ',nm2_old,nm2
>>>          print *,' ABORTING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
>>>         end if
>>>         call MPI_ABORT(MPI_COMM_WORLD,errorcode,ierr)
>>>      end if
>>> ----
>>> 
>>> Any clues or suggestions?
>>> 
>>> Cheers,
>>> David Williamson
>>> 
>>> PhD Candidate
>>> St. Mary's University
>>> Halifax, NS
>> 

