[FLASH-USERS] Load Balancing with the Burn Unit

Sean Couch smc at flash.uchicago.edu
Fri Apr 25 15:57:16 EDT 2014


Hi James,

I’ve done this before with what I think was good success.  It’s a pretty straightforward, if hack-tastic, modification.  What I did was modify bn_burner.F90 so that the variable xoktot became a counter of the maximum number of burn sub-steps taken by any zone in a given block.  I use this as an estimate of the “work” required to advance the block, and this is what goes into the Morton curve weighting.  So around line 198 in bn_burner.F90 I had:

--- bn_burner.F90 (original)
+++ bn_burner.F90 (modified)
@@ Line 198 @@
    end if


-   xoktot  = xoktot + real(nok)
+   !!xoktot  = xoktot + real(nok)
+   !! Custom usage by SMC
+   xoktot = max(xoktot, real(nok))
    xbadtot = xbadtot + real(nbad)


I then changed Burn.F90 to simply reset this counter:

--- Burn.F90 (original)
+++ Burn.F90 (modified)
@@ Line 54 @@
    use Burn_data, ONLY:  bn_nuclearTempMin, bn_nuclearTempMax, bn_nuclearDensMin, &
         &   bn_nuclearDensMax, bn_nuclearNI56Max, bn_useShockBurn, &
-        &   bn_smallx, bn_useBurn, bn_meshMe
+        &   bn_smallx, bn_useBurn, bn_meshMe, xoktot
    use bn_interface, ONLY :  bn_mapNetworkToSpecies, bn_burner

@@ Line 114 @@
    ! start the timer ticking
    call Timers_start("burn")

+   ! Restart counter
+   xoktot = 0.0
+
    ! make sure that guardcells are up to date
    if (.NOT. bn_useShockBurn) then

       call Grid_fillGuardCells(CENTER, ALLDIR)

    endif


Now in Burn_computeDt.F90 I accessed some private Grid data (shame on me, but hey, I said it was hack-tastic):

--- Burn_computeDt.F90 (original)
+++ Burn_computeDt.F90 (modified)
@@ Line 79 @@
                             solnData,   &
                             dt_burn, dt_minloc)

-   use Burn_data, ONLY: bn_enucDtFactor, bn_useBurn, bn_meshMe
+   use Burn_data, ONLY: bn_enucDtFactor, bn_useBurn, bn_meshMe, xoktot
    use Driver_interface, ONLY : Driver_abortFlash
+   use tree, ONLY : bflags
    implicit none

 #include "constants.h"

@@ Line 130 (original) / Line 131 (modified) @@
             ! the inverse of what we want, and then only (un)invert that inverse
             ! if it is a reasonable number.
             energyRatioInv = abs(solnData(ENUC_VAR,i,j,k)) / eint_zone
+            !if (energyRatioInv > 1.0e-1) bflags(1,blockid) = 4.0
+            bflags(1,blockid) = xoktot
+#ifdef DTBN_VAR
+            solnData(DTBN_VAR,i,j,k) = xoktot
+#endif
             if (energyRatioInv > dt_tempInv) then
                dt_tempInv = energyRatioInv
                dt_temp = 1.0 / energyRatioInv

You can ignore DTBN_VAR; that was just a variable I used for diagnostics.  The key is the bflags array.  It comes from the PARAMESH tree data module, and PARAMESH does nothing else with it, so it’s a handy place to stash the per-block work estimate.

Then, finally, in source/Grid/GridMain/paramesh/paramesh4/Paramesh4dev/PM4_package/mpi_source/mpi_amr_refine_derefine.F90 I added the following:

--- mpi_amr_refine_derefine.F90 (original)
+++ mpi_amr_refine_derefine.F90 (modified)
@@ Line 309 @@
        work_block(:) = 0.
        Do i = 1,lnblocks
           if (nodetype(i).eq.1) then
-             work_block(i) = 2.  !<<< USER EDIT
+ !            work_block(i) = 2.  !<<< USER EDIT
+             work_block(i) = max(2.,float(bflags(1,i)))   !<<< by SMC
 #ifdef FLASH_DEBUG_AMR
              lnblocks_leaf = lnblocks_leaf + 1
 #endif


This sets the Morton curve weighting parameter for each leaf block to the maximum number of burn sub-steps that were required to advance it, with a floor of 2 (the default leaf weight).  In my limited experimentation, this straightened out the Morton curve in regions of rapid burning nicely, leaving fewer ‘burning’ blocks per MPI rank, and it increased the efficiency of the simulations I was running quite a bit.  Caveat emptor:  this all could use some tweaking for your particular application, and YMMV.
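
To see why the weighting matters, here is a toy, standalone sketch.  It is not PARAMESH code and does not reproduce PARAMESH's actual partitioning routine.  It assumes 16 blocks already ordered along the curve, 4 ranks, and made-up costs (blocks 5-8 "burn" and cost 8, the rest cost 2).  The names cost, uniform, efficiency, goal and iproc are hypothetical.  It splits the curve into contiguous chunks of roughly equal summed weight and reports mean load over max load:

```fortran
program morton_weight_demo
  implicit none
  integer, parameter :: nblocks = 16, nprocs = 4
  real :: cost(nblocks), uniform(nblocks)
  real :: eff_uniform, eff_weighted

  ! Hypothetical per-block cost: blocks 5-8 are "burning" and 4x as expensive.
  cost = 2.0
  cost(5:8) = 8.0
  ! Default-style weighting: every leaf block counts the same.
  uniform = 2.0

  eff_uniform  = efficiency(uniform, cost)   ! partition ignores burn cost
  eff_weighted = efficiency(cost, cost)      ! partition uses burn cost

  print '(a,f6.3)', 'efficiency, uniform weights: ', eff_uniform
  print '(a,f6.3)', 'efficiency, burn weights:    ', eff_weighted
  if (eff_weighted <= eff_uniform) error stop 'weighting did not help'

contains

  ! Split the curve-ordered blocks into nprocs contiguous chunks of roughly
  ! equal summed weight, then return mean/max of the actual cost per rank.
  real function efficiency(weight, actual)
    real, intent(in) :: weight(:), actual(:)
    real :: load(0:nprocs-1), goal, cum
    integer :: i, iproc

    goal = sum(weight) / real(nprocs)   ! ideal summed weight per rank
    load = 0.0
    cum = 0.0
    do i = 1, size(weight)
       cum = cum + weight(i)
       ! Assign by the midpoint of this block's span of cumulative weight.
       iproc = min(nprocs - 1, int((cum - 0.5*weight(i)) / goal))
       load(iproc) = load(iproc) + actual(i)
    end do
    efficiency = (sum(load) / real(nprocs)) / maxval(load)
  end function efficiency

end program morton_weight_demo
```

With these made-up numbers the uniform partition puts all four burning blocks on one rank (efficiency about 0.44), while the weighted partition spreads them over three ranks (about 0.78).  Real gains will depend on how well the sub-step count tracks the true cost.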

Best regards,
Sean


--------------------------------------------------------
Sean M. Couch
Hubble Fellow
Flash Center for Computational Science
Department of Astronomy & Astrophysics
The University of Chicago
5747 S Ellis Ave, Jo 315
Chicago, IL  60637
(773) 702-3899
www.flash.uchicago.edu/~smc




On Apr 22, 2014, at 3:00 PM, James Guillochon <jguillochon at cfa.harvard.edu> wrote:

> Hi all, I'm running a simulation using one of the burning networks, which unfortunately is leading to a runtime efficiency of 35% as only ~10% of blocks are above the burning network thresholds, but those blocks take several times longer than non-burning regions.
> 
> I noticed that the only weighting done currently is to give leaf blocks a factor of "2", and everything else "1". I think it would be relatively easy to count cells that have burned in a block, and then add an additional work factor to account for this overhead (say number of cells burned times a constant, with a block in which all cells are burning being a factor of 5-10 times more expensive). Ideally I'd want FLASH's efficiency to be 80%+ no matter what the Burn unit is doing.
> 
> My question is in implementation: Would it make sense to add to the "work_block" (which is in the "tree" module) scaling factor directly in the Burn unit? Or is this the wrong place in the code to make this change?
> 
> Thanks!
> - James
> 
> -- 
> James Guillochon
> Einstein Fellow at the Harvard-Smithsonian CfA
> jguillochon at cfa.harvard.edu
