The Uniform Grid has the same resolution in all the blocks throughout the domain, and each processor has exactly one block. The uniform grid can operate in either of two modes: fixed block size (FIXEDBLOCKSIZE) mode and non-fixed block size (NONFIXEDBLOCKSIZE) mode. The default fixed block size grid is statically defined at compile time and can therefore take advantage of compile-time optimizations. The non-fixed block size version uses dynamic memory allocation of grid variables.
In FIXEDBLOCKSIZE mode the block dimensions are fixed at compile time through the setup arguments -nxb, -nyb and -nzb. For example, the command

./setup Sod -auto -3d -nxb=12 -nyb=12 -nzb=4 +ug

configures the 3-d Sod problem on the uniform grid with blocks of 12 × 12 × 4 cells.
Since each processor holds exactly one block, running the resulting executable on 16 processors,

mpirun -np 16 flash3

requires iprocs × jprocs × kprocs = 16 in flash.par, and the global domain then contains (12 · iprocs) × (12 · jprocs) × (4 · kprocs) cells.
At Grid initialization time, the domain is created and the communication machinery is generated. This initialization includes MPI communicators and datatypes for directional guardcell exchanges. If we view the processors as arranged in a three-dimensional processor grid, then the row of processors along each dimension forms its own communicator. For each of these communicators we also define MPI datatypes, which describe the layout of the block on the processor to MPI. The communicators and datatypes, once generated, persist for the entire run of the application. Thus a single MPI send/receive with the appropriate communicator and its corresponding datatype can carry out all of the data exchange for a guardcell fill in the selected direction in one step.
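As an illustration of this machinery, the following C sketch builds the row communicator for the x-direction and an MPI derived datatype describing one x-face of a block. The names and layout are hypothetical (FLASH itself implements this in Fortran 90), and for simplicity the block carries guardcells only in x.

#include <mpi.h>

#define GUARD 4                          /* number of guardcell layers */

/* Build the x-direction row communicator and a datatype for one x-face
   of an nxb-by-nyb block stored row-major with GUARD pad cells in x.  */
void build_x_machinery(MPI_Comm domain, int iprocs,
                       int nxb, int nyb,
                       MPI_Comm *xComm, MPI_Datatype *xFace)
{
    int rank;
    MPI_Comm_rank(domain, &rank);

    /* Processors with the same row index share an x-communicator. */
    int row = rank / iprocs;             /* which row of the grid   */
    int pos = rank % iprocs;             /* position within the row */
    MPI_Comm_split(domain, row, pos, xComm);

    /* One face: GUARD contiguous cells in x for each of the nyb rows,
       with the padded block width as the stride between rows.        */
    MPI_Type_vector(nyb, GUARD, nxb + 2 * GUARD, MPI_DOUBLE, xFace);
    MPI_Type_commit(xFace);
}

Once created, these objects persist for the whole run, and a single send/receive using them moves an entire face at once.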
Since all blocks in the Uniform Grid exist at the same resolution, there is no need for interpolation while filling the guardcells; a simple exchange of the correct data between processors, plus the application of boundary conditions where needed, is sufficient. The guardcells along a face of a block are filled either with layers of interior cells from the block on the neighboring processor, if that face is shared with another block, or from the boundary conditions, if the face lies on the physical domain boundary. Also, because there are no jumps in refinement in the Uniform Grid, the flux conservation step across processor boundaries is unnecessary. For correct functioning of the Uniform Grid in FLASH, this conservation step should be explicitly turned off with the runtime parameter flux_correct, which controls whether the flux conservation step is run in the PPM Hydrodynamics implementation. AMR sets it to true by default, while UG sets it to false. Users should exercise care if they wish to override these defaults in their flash.par file.
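Continuing the sketch above, a one-direction guardcell fill then reduces to two MPI_Sendrecv calls. At a physical boundary the neighbor rank is MPI_PROC_NULL, which makes the transfer a no-op, and those guardcells are filled from the boundary conditions instead. Again, the names and data layout are illustrative, not FLASH's actual internals.

/* Fill the x guardcells of a block laid out as block[y*width + x],
   with interior cells at x = GUARD .. GUARD+nxb-1.                  */
void fill_x_guardcells(double *block, int nxb,
                       MPI_Comm xComm, MPI_Datatype xFace)
{
    int pos, iprocs;
    MPI_Comm_rank(xComm, &pos);
    MPI_Comm_size(xComm, &iprocs);

    /* MPI_PROC_NULL at the ends of the row: physical boundaries. */
    int left  = (pos > 0)          ? pos - 1 : MPI_PROC_NULL;
    int right = (pos < iprocs - 1) ? pos + 1 : MPI_PROC_NULL;

    /* Last GUARD interior columns -> right neighbor, while our own
       left guardcells are received from the left neighbor.          */
    MPI_Sendrecv(&block[nxb],   1, xFace, right, 0,
                 &block[0],     1, xFace, left,  0,
                 xComm, MPI_STATUS_IGNORE);
    /* First GUARD interior columns -> left neighbor, and vice versa. */
    MPI_Sendrecv(&block[GUARD],       1, xFace, left,  1,
                 &block[GUARD + nxb], 1, xFace, right, 1,
                 xComm, MPI_STATUS_IGNORE);

    if (left  == MPI_PROC_NULL) { /* apply boundary condition, low x  */ }
    if (right == MPI_PROC_NULL) { /* apply boundary condition, high x */ }
}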
To run the same problem in NONFIXEDBLOCKSIZE mode, the setup command becomes

./setup Sod -3d -auto -nofbs

or, using the +nofbs setup shortcut,

./setup Sod -3d -auto +nofbs
In this mode, the blocksize in UG is determined at execution time from the runtime parameters iGridSize, jGridSize and kGridSize. These parameters specify the global number of grid points in the computational domain along each dimension. The blocksize is then (iGridSize/iprocs) × (jGridSize/jprocs) × (kGridSize/kprocs).
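For example, with iGridSize = 64 and iprocs = 4 each block is 16 cells wide in x. A minimal sketch of this computation, under the assumption that each global size must divide evenly among the processors along that direction:

#include <assert.h>

/* Local extent of a block in one direction: global grid points divided
   by the number of processors along that direction.                   */
int local_blocksize(int gridSize, int procs)
{
    assert(gridSize % procs == 0);   /* assumed: even division required */
    return gridSize / procs;
}

/* e.g. local_blocksize(64, 4) == 16 */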
Unlike FIXEDBLOCKSIZE mode, where memory is allocated at compile time, in NONFIXEDBLOCKSIZE mode all memory allocation is dynamic. The global data structures are allocated when the simulation initializes and deallocated when it finalizes, whereas the local scratch space is allocated and deallocated every time a unit is invoked. There is clearly a trade-off between flexibility and performance: NONFIXEDBLOCKSIZE mode typically runs about 10-15% slower. Both modes are supported to give users the choice. The amount of memory consumed by the Grid data structure of the Uniform Grid is the same irrespective of the mode. Note that this is not the total amount of memory used by the code, since fluxes, temporary variables, coordinate information and scratch space also consume a large amount of memory.
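The distinction can be pictured in C terms as follows (a loose analogy; FLASH's grid arrays are Fortran 90): in fixed mode the extents are compile-time constants the compiler can optimize against, while in nonfixed mode the storage is sized from runtime parameters.

#include <stdlib.h>

#define NXB 12                        /* fixed mode: extent known at
                                         compile time, so loops over
                                         the block can be optimized  */
static double fixed_block[NXB];

/* Nonfixed mode: extent computed from runtime parameters, so the
   storage must be allocated dynamically at initialization.          */
double *alloc_block(int iGridSize, int iprocs)
{
    return malloc((iGridSize / iprocs) * sizeof(double));
}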
The example shown below gives two possible ways to define parameters in flash.par for a 3-d problem with a global domain of size 64 × 64 × 64, being run on 8 processors.
iprocs = 2
jprocs = 2
kprocs = 2
iGridSize = 64
jGridSize = 64
kGridSize = 64
or

iprocs = 4
jprocs = 2
kprocs = 1

In the first case each processor holds a block of 32 × 32 × 32 cells; in the second, with the same global grid size, each block is of size 16 × 32 × 64.