The GridSolvers unit groups together subunits that are used to solve particular types of differential equations. Currently, there are two types of solvers: a parallel Fast Fourier Transform package (Sec:GridSolversPfft) and various solvers for the Poisson equation (Sec:GridSolversPoisson).
Pfft has a layered architecture where the lower layer contains functions implementing basic tasks and primary data structures. The upper layer combines pieces from the lowest layer to interface with FLASH and create the parallel transforms. The computational part of Pfft is handled by sequential 1-dimensional FFT's, which can be from a native, vendor supplied scientific library, or from public domain packages. The current distribution of FLASH uses fftpack from NCAR for the 1-D FFTs, since that package also contains transforms that are useful with non-periodic boundary conditions.
The lowest layer has three distinct components. The first component redistributes data. It includes routines for distributed transposes for two and three dimensional data. The second component provides a uniform interface for FFT calls to hide the details of individual libraries. The third component is the data structures. There are global data structures to keep track of the type of transform, number of data dimensions, and physical and transform space information about each dimension. The dimensional information includes the start and end point of data (useful when the dimension is spread over more than one processor), the MPI communicator, the coordinates of the node in the processor grid etc. The structures also include pointers to the trigonometric tables and work space needed by the sequential FFT calls.
The upper layer of PFFT combines the lower layer routines to create end-to-end transforms in a variety of ways. The available one dimensional transforms include real-to-complex, complex-to-complex, sine and cosine transforms. They can be combined to create two or three dimensional tranforms for different configuration of the domain. The two dimensional transforms support parallelization of one dimension (or a one dimensional grid of processors). The three dimensional transforms support one or two dimensional grid of processors. All transforms must have at least one dimension within the processor at all times. The data distribution changes during the computation. However, a complete cycle of forward and inverse transform restores the data distribution.
The computation of a forward three dimensional FFT in parallel involves following steps :
The inverse transform can be computed by carrying out the steps described above in reverse order. The more commonly used domain decomposition in FFT based codes assumes a one dimensional processor grid:
 is the  global data size and
 is the  global data size and  is
the number of processors. Here, the first transpose is local, while
the second one is distributed. The internode communication is limited
to one distributed transpose involving all the processors. However,
there are two distinct disadvantages of this distribution  of work:
 is
the number of processors. Here, the first transpose is local, while
the second one is distributed. The internode communication is limited
to one distributed transpose involving all the processors. However,
there are two distinct disadvantages of this distribution  of work: 
 , where as in our
distribution, the upper limit on the number of processors is the
smallest of
, where as in our
distribution, the upper limit on the number of processors is the
smallest of 
 .
. 
During the course of a simulation, Pfft can be used in two different modes. In the first mode, every instance of Pfft use will be exactly identical to every other instance in terms of domain size and the type of transforms. In this mode, the user can set the runtime parameter pfft_setupOnce to true, which enables the FLASH initialization process to also create and initialize all the data structures of Pfft. The finalization of the Pfft subunit is also done automatically by the FLASH shutdown process in this mode. However, if a simulation needs to use Pfft in different configurations at different instances of its use, then each set of calls to Grid_pfft for computing the transforms must be preceded by a call to Grid_pfftInit and followed by a call to Grid_pfftFinalize. In addition, the runtime parameter pfft_setupOnce should be set to false. A few other helper routines are available in the subunit to allow the user to query Pfft about the dimensioning of the domain, and also to map the Mesh variables from the unk array to and from Pfft compatible (single dimensional) arrays. Pfft also provides the location of wave numbers in the parallel domain; information that users can utilize to develop their own customized PDE solvers using FFT based techniques.
The pencil grid processor topology is stored in an MPI communicator, and the communicator may contain fewer processors than are used in the simulation. This is to ensure the pencil grid points are never distributed too finely over the processors, and naturally handles the case where the user may wish to obtain a pencil grid at a very coarse level in the AMR grid. If there are more blocks than processors then we are safe to distribute the pencil grid over all processors, otherwise we must remove a number of processors. Currently, we eliminate those processors that own zero FLASH blocks at this level, as this is a simple calculation that can be computed locally.
Both mesh reconfiguration implementations generate a map describing the data movement before moving any grid data. The map is retained between calls to the Pfft routines and is only regenerated when the grid changes. This avoids repeating the same global communications, but means communication buffers are left allocated between calls to Pfft.
In the Collective implementation, the map coordinates are used to specify where the FLASH data is copied into a send communication buffer. Two MPI_Alltoall calls then move this data to the appropriate pencil processor at coordinates (J,K). Here, the first MPI_Alltoall moves data to processor (J,0), and the second MPI_Alltoall moves data to processor (J,K). The decision to use MPI_Alltoalls simplifies the MPI communication, but leads to very large send / receive communication buffers on each processor which consume:
Memory(bytes) = sizeof(REAL) * total grid points at solve level * 2
The PtToPt implementation consumes less memory compared to the Collective implementation, and communicates using point to point MPI messages. It is based upon using nodes in a linked list which contain metadata (a map) and a communication buffer for a single block fragment. There are two linked lists: one for the FLASH block fragments and one for Pfft block fragments. Metadata information about each FLASH block fragment is placed in separate messages and sent using MPI_Isend to the appropriate destination pencil grid processor.
Each destination pencil grid processor repeatedly invokes MPI_Iprobe using MPI_ANY_SOURCE, and creates a node in its Pfft list whenever it discovers a message. The MPI message is received into a metadata region of the freshly allocated node, and a communication buffer is also allocated according to the size specified in the metadata. The pencil processor continues probing for messages until the cumulative size of its node's communication buffers is equal to the pencil grid points it has been assigned. At this stage, grid data is communicated by traversing the Pfft list and posting MPI_Irecvs, and then traversing the FLASH list and sending block fragment using MPI_Isends. After performing MPI_Waits, the received data in the nodes of the Pfft list is copied into internal Pfft arrays.
Note, the linked list is constructed using an include file stored at flashUtilities/datastructures/linkedlist. The file is named ut_listMethods.includeF90 and is meant to be included in any Fortran90 module to create lists with nodes of a user-defined type. Please see the README file, and the unit test example at flashUtilities/datastructures/linkedlist/UnitTest.
The unit test for Pfft solver solves the following equation:
|  | (8.8) | 
 , and
, and
 . The equation satisfies periodic boundary
conditions in this formulation and FFT based poisson solve techniques
can be applied. In the unit test we initialize one variable of the
solution data with the function
. The equation satisfies periodic boundary
conditions in this formulation and FFT based poisson solve techniques
can be applied. In the unit test we initialize one variable of the
solution data with the function  , and another one
with the right hand side of  (8.7). We
compute the forward real-to-complex transform of the solution data
variable that is initialized with the right hand side of (8.7).  This variable is then divided by
, and another one
with the right hand side of  (8.7). We
compute the forward real-to-complex transform of the solution data
variable that is initialized with the right hand side of (8.7).  This variable is then divided by  
 where
 where 
 and
 and  are the
wavenumbers at any point i,j,k in the domain. An inverse complex-to-real
transform after the division should give the function
 are the
wavenumbers at any point i,j,k in the domain. An inverse complex-to-real
transform after the division should give the function  as
result. Hence the unit test is considered successful if both the
variables have matching values within the specified tolerance.
 as
result. Hence the unit test is considered successful if both the
variables have matching values within the specified tolerance.
The GridSolvers subunit contains several different algorithms
for solving the general Poisson equation for a potential 
 given a source
 given a source 
 
 
 is a constant that depends upon the application.  For example,
when the gravitational Poisson equation is being solved,
 is a constant that depends upon the application.  For example,
when the gravitational Poisson equation is being solved, 
 is
the mass density,
 is
the mass density, 
 is the gravitational potential, and
 is the gravitational potential, and
 , where
, where  is Newton's gravitational constant.
 is Newton's gravitational constant.
The multipole Poisson solver is appropriate for spherical or nearly-spherical
source distributions with isolated boundary conditions (Müller (1995)).
It currently works in 1D and 2D spherical, 2D axisymmetric cylindrical ( ), and
3D Cartesian and axisymmetric geometries. Because of the imposed symmetries,
in the 1D spherical case, only the monopole term (
), and
3D Cartesian and axisymmetric geometries. Because of the imposed symmetries,
in the 1D spherical case, only the monopole term ( ) makes sense,
while in the axisymmetric and 2D spherical cases, only the
) makes sense,
while in the axisymmetric and 2D spherical cases, only the  moments are used (i.e., the
basis functions are Legendre polynomials).
 moments are used (i.e., the
basis functions are Legendre polynomials).
The multipole algorithm consists of the following steps.
First, find the center of mass 
 
|  | (8.10) | 
 as our origin.  In integral form, Poisson's
((8.9)) is
 as our origin.  In integral form, Poisson's
((8.9)) is
 and
 and  are expressed in
spherical coordinates
 are expressed in
spherical coordinates 
 about
 about 
 , and
, and
|  |  |  | (8.13) | 
|  |  |  | 
 are the spherical harmonic functions
 are the spherical harmonic functions
|  | (8.14) | 
 are Legendre polynomials.
Substituting (8.12) into
(8.11), we obtain
 are Legendre polynomials.
Substituting (8.12) into
(8.11), we obtain
 .
By taking spherical harmonic expansions about the center of mass, we
ensure that the expansions are dominated by low-multipole terms, so
that for a given value of
.
By taking spherical harmonic expansions about the center of mass, we
ensure that the expansions are dominated by low-multipole terms, so
that for a given value of 
 , the error created by
neglecting high-multipole terms is minimized.
Note that the product of spherical harmonics in (8.15) is
real-valued
, the error created by
neglecting high-multipole terms is minimized.
Note that the product of spherical harmonics in (8.15) is
real-valued
|  |  |  | (8.16) | 
| ![$\displaystyle 2\sum_{m=1}^\ell {(\ell-m)!\over (\ell+m)!}
P_{\ell m}(\cos\theta)P_{\ell m}(\cos\theta')
\cos\left(m(\varphi-\varphi')\right)
\Biggr] .$](img263.png) | 
 (8.21))
for a given source field
 (8.21))
for a given source field 
 , and then to use these moments
in (8.17) to compute the
potential.
, and then to use these moments
in (8.17) to compute the
potential.
In practice, the above procedure must take account of the fact that the
source and the potential are assumed to be cell-averaged
quantities discretized on a block-structured mesh with varying cell size.
Also, because of the radial dependence of the multipole moments of the
source function, these moments must be tabulated as functions of distance from
 , with an implied discretization.  The solver allocates
storage for moment samples spaced a distance
, with an implied discretization.  The solver allocates
storage for moment samples spaced a distance  apart in radius
 apart in radius
|  |  | (8.22) | |
|  |  | (8.23) | 
 varies from 0 to
 varies from 0 to  (
 (
 and
 and
 are not used).  The sample spacing
 are not used).  The sample spacing  is
chosen to be one-half the geometric mean of the
 is
chosen to be one-half the geometric mean of the  ,
,  , and
, and  cell spacings
at the highest level of refinement, and
 cell spacings
at the highest level of refinement, and  is chosen to be large enough
to span the diagonal of the computational volume with samples.
 is chosen to be large enough
to span the diagonal of the computational volume with samples.
Determining the contribution of individual
cells to the tabulated moments requires some care.  To reduce the error
caused by the grid geometry, in each cell  an optional subgrid can be establish (see mpole_subSample)
consisting of
an optional subgrid can be establish (see mpole_subSample)
consisting of  points at the locations
 points at the locations
 , where
, where
|  |  | (8.24) | |
|  |  | (8.25) | |
|  |  | (8.26) | 
 is the center of cell
 is the center of cell  .  (For clarity, we have
omitted
.  (For clarity, we have
omitted  indices on
 indices on  as well as all block indices.)
For each subcell, we assume
 as well as all block indices.)
For each subcell, we assume 
 and then apply
and then apply
|  | (8.31) | 
 (8.21), because the
subgrid volume elements are not spherical. These
errors are greatest when
 (8.21), because the
subgrid volume elements are not spherical. These
errors are greatest when 
 ; hence, using a subgrid
reduces the amount of source affected by these errors.
An error of order
; hence, using a subgrid
reduces the amount of source affected by these errors.
An error of order  is also introduced by assuming the source
profile within each cell to be flat.
Note that the total source computed by this method (
 is also introduced by assuming the source
profile within each cell to be flat.
Note that the total source computed by this method (
 )
is exactly equal to the total implied by
)
is exactly equal to the total implied by 
 .
.
Another way to reduce grid geometry errors when using the multipole
solver is to modify the AMR refinement criterion to refine all blocks
containing the center of mass (in addition to other criteria that may
be used, such as the second-derivative criterion supplied with PARAMESH).
This ensures that the center-of-mass point is maximally refined at all
times, further restricting the volume which contributes errors to the
moments because 
 .
.
The default value of  is 1; note that large values of this
parameter very quickly increase the amount of time required to evaluate
the multipole moments (as
 is 1; note that large values of this
parameter very quickly increase the amount of time required to evaluate
the multipole moments (as  ). In order to speed up the moment
summations, the sines and cosines in
((8.27))
). In order to speed up the moment
summations, the sines and cosines in
((8.27))  (8.30) are evaluated using
trigonometric recurrence relations, and the factorials are pre-computed
and stored at the beginning of the run.
 (8.30) are evaluated using
trigonometric recurrence relations, and the factorials are pre-computed
and stored at the beginning of the run.
When computing the cell-averaged potential, we again employ a subgrid, but
here the subgrid points fall on cell boundaries to improve the continuity
of the result.  Using  subgrid points per dimension, we have
 subgrid points per dimension, we have
|  |  | (8.32) | |
|  |  | (8.33) | |
|  |  | (8.34) | 
 is then
 is then
|  | (8.35) | 
 .
.
The multipole Poisson solver is based on a multipolar expansion of the source (mass for gravity,
for example) distribution around a conveniently chosen center of expansion. The angular number
 entering this expansion is a measure of how detailed the description of the source
distribution will be on an angular basis. Higher
 entering this expansion is a measure of how detailed the description of the source
distribution will be on an angular basis. Higher  values mean higher angular resolution with
respect to the center of expansion. The multipole Poisson solver is thus appropriate for spherical
or nearly-spherical source distributions with isolated boundary conditions. For problems which
require high spatial resolution throughout the entire domain (like, for example, galaxy collision
simulations), the multipole Poisson solver is less suited, unless one is willing to go to extremely
(computationally unfeasible) high
 values mean higher angular resolution with
respect to the center of expansion. The multipole Poisson solver is thus appropriate for spherical
or nearly-spherical source distributions with isolated boundary conditions. For problems which
require high spatial resolution throughout the entire domain (like, for example, galaxy collision
simulations), the multipole Poisson solver is less suited, unless one is willing to go to extremely
(computationally unfeasible) high  values. For stellar evolution, however, the multipole
Poisson solver is the method of choice.
 values. For stellar evolution, however, the multipole
Poisson solver is the method of choice.
The new implementation of the multipole Poisson solver is located in the directory
 
  
source/Grid/GridSolvers/Multipole_new.
 ) domains
iii) elimination of subroutine call overhead (1 call per cell),
iv) minimization of error due to non-statistical distributions of
moments near the multipolar origin. The following paragraphs explain
the new approach to the multipole solver and an explanation of the
above improvements. Details about the theory of the new implementation of the Poisson solver
can be found in Couch et al. (2013).
) domains
iii) elimination of subroutine call overhead (1 call per cell),
iv) minimization of error due to non-statistical distributions of
moments near the multipolar origin. The following paragraphs explain
the new approach to the multipole solver and an explanation of the
above improvements. Details about the theory of the new implementation of the Poisson solver
can be found in Couch et al. (2013).
The multipole Poisson solver is appropriate for spherical or nearly-spherical
source distributions with isolated boundary conditions. It currently works in
1D spherical, 2D spherical, 2D cylindrical, 3D Cartesian and 3D cylindrical. Symmetries
can be specified for the 2D spherical and 2D cylindrical cases (a horizontal symmetry plane
along the radial axis) and the 3D Cartesian case (assumed axisymmetric property).
Because of the radial symmetry in the 1D spherical case, only the monopole term
( ) contributes, while for the 3D Cartesian axisymmetric, the 2D cylindrical
and 2D spherical cases only the
) contributes, while for the 3D Cartesian axisymmetric, the 2D cylindrical
and 2D spherical cases only the  moments need to be used (the other
 moments need to be used (the other  moments effectively cancel out).
moments effectively cancel out).
The multipole algorithm consists of the following steps. First, the center of
the multipolar expansion 
 is determined via density-squared weighted
integration over position:
 is determined via density-squared weighted
integration over position:
|  | (8.36) | 
 as our origin.  In integral form, Poisson's
equation (8.9) becomes
 as our origin.  In integral form, Poisson's
equation (8.9) becomes
 (
 ( ) indicate the smaller (larger) of the magnitudes
and
) indicate the smaller (larger) of the magnitudes
and  denotes the angle between
 denotes the angle between  and
 and  . Note, that this
definition includes those cases where both magnitudes are equal. The expansion
is always convergent if
. Note, that this
definition includes those cases where both magnitudes are equal. The expansion
is always convergent if 
 . Expansion of the Legendre polynomials
in terms of spherical harmonics gives
. Expansion of the Legendre polynomials
in terms of spherical harmonics gives
 and
 and 
 are the spherical angular components of
 are the spherical angular components of
 and
 and  about
 about 
 .
Defining now the regular
.
Defining now the regular 
 and irregular
 and irregular 
 solid
harmonic functions
 solid
harmonic functions
|  |  |  | (8.40) | 
|  |  |  | (8.41) | 
 and
 and  values. In FLASH both the source and the potential are
assumed to be cell-averaged quantities discretized on a block-structured mesh with
varying cell size. The integration must hence be replaced by a summation over all 
leaf cells
 values. In FLASH both the source and the potential are
assumed to be cell-averaged quantities discretized on a block-structured mesh with
varying cell size. The integration must hence be replaced by a summation over all 
leaf cells
 denotes the cell's mass. Note, that the symbol
 denotes the cell's mass. Note, that the symbol  stands for cell index as well
as its defining distance position from the expansion center in the computational domain.
This discrete Poisson equation is incorrect. It contains the divergent
 stands for cell index as well
as its defining distance position from the expansion center in the computational domain.
This discrete Poisson equation is incorrect. It contains the divergent  term on
the rhs. The
 term on
the rhs. The  contribution to the potential corresponds to the cell self potential
 contribution to the potential corresponds to the cell self potential
 and is divergent in our case because all the cell's mass is assumed to be
concentrated at the cell's center. The value of this divergent term can easily be calculated
from Eq.(8.38) by setting
 and is divergent in our case because all the cell's mass is assumed to be
concentrated at the cell's center. The value of this divergent term can easily be calculated
from Eq.(8.38) by setting 
 :
:
|  |  |  | (8.44) | 
 is the cell's mass,
 is the cell's mass,  the highest angular number considered in the expansion
and
 the highest angular number considered in the expansion
and  the radial distance of the cell center from the expansion center. To avoid this divergence
problem, we evaluate the potentials on each face of the cell and form the average of all
cell face potentials to get the cell center potential. Eq.(8.43)
will thus be replaced by
 the radial distance of the cell center from the expansion center. To avoid this divergence
problem, we evaluate the potentials on each face of the cell and form the average of all
cell face potentials to get the cell center potential. Eq.(8.43)
will thus be replaced by
|  | (8.45) | 
 is the cell face radial distance from the expansion center and
 is the cell face radial distance from the expansion center and 
![$ [q',x_F]_<$](img350.png) denotes the
larger of the magnitudes between the cell center radial distance
 denotes the
larger of the magnitudes between the cell center radial distance  and the cell face radial distance
 and the cell face radial distance  .
Splitting the summation over cells in two parts
.
Splitting the summation over cells in two parts
We now change from complex to real formulation. We state this for the regular solid harmonic functions, the same reasoning being applied to the irregular solid harmonic functions and all their derived moments. The regular solid harmonic functions can be split into a real and imaginary part
|  |  |  | (8.53) | 
|  |  |  | (8.54) | 
|  |  |  | (8.55) | 
|  |  |  | (8.56) | 
 matrix is a diagonal matrix whose elements
are equal to 2 for
 matrix is a diagonal matrix whose elements
are equal to 2 for  and 1 otherwise, i.e.:
 and 1 otherwise, i.e.:
 are the cartesian location coordinates of the cell face and
 are the cartesian location coordinates of the cell face and 
 .
For geometries depending on polar angles one must first calculate the corresponding cartesian
coordinates for each cell before applying the recursions. In FLASH, the order of the two cosine
and sine components for each solid harmonic vector is such that
.
For geometries depending on polar angles one must first calculate the corresponding cartesian
coordinates for each cell before applying the recursions. In FLASH, the order of the two cosine
and sine components for each solid harmonic vector is such that  precedes
 precedes  .
This allows buildup of the vectors with maximum number of unit strides. The same
applies of course for the assembly of the moments. For 2D cylindrical and 2D spherical geometries
only the
.
This allows buildup of the vectors with maximum number of unit strides. The same
applies of course for the assembly of the moments. For 2D cylindrical and 2D spherical geometries
only the  parts of both recursions are needed, involving only the cartesian
 parts of both recursions are needed, involving only the cartesian  coordinate
and
 coordinate
and  . Symmetry along the radial axes of these 2D geometries inflicts only the sign change
. Symmetry along the radial axes of these 2D geometries inflicts only the sign change
 , resulting in the symmetry relations
, resulting in the symmetry relations 
 for
even
 for
even  and
 and 
 for odd
 for odd  , the same holding for the
irregular solid harmonic vector components. Thus symmetry in 2D can effectively be treated
by halving the domain size and multiplying each even
, the same holding for the
irregular solid harmonic vector components. Thus symmetry in 2D can effectively be treated
by halving the domain size and multiplying each even  moments by a factor of 2 while setting
the odd
 moments by a factor of 2 while setting
the odd  moments equal to 0. For 3D cartesian geometries introduction of symmetry
is far more complicated since all
 moments equal to 0. For 3D cartesian geometries introduction of symmetry
is far more complicated since all  components need to be taken into account. It is not
sufficient to simply reduce the domain to the appropriate size and multiply the moments
by some factor, but rather one would have to specify the exact symmetry operations intended
(generators of the symmetry group
 components need to be taken into account. It is not
sufficient to simply reduce the domain to the appropriate size and multiply the moments
by some factor, but rather one would have to specify the exact symmetry operations intended
(generators of the symmetry group  or one of its subgroups) in terms of their effects
on the
 or one of its subgroups) in terms of their effects
on the  cartesian basis. The resulting complications in calculating symmetry adapted moments
outweighs the computational gain that can be obtained from it. Options for 3D symmetry are
thus no longer available in the improved FLASH multipole solver. The 'octant' symmetry option
from the old multipole solver, using only the monopole
 cartesian basis. The resulting complications in calculating symmetry adapted moments
outweighs the computational gain that can be obtained from it. Options for 3D symmetry are
thus no longer available in the improved FLASH multipole solver. The 'octant' symmetry option
from the old multipole solver, using only the monopole  term, was too restrictive in
its applicability (exact only for up to angular momenta
 term, was too restrictive in
its applicability (exact only for up to angular momenta  due to cancellation of the
solid harmonic vector components).
 due to cancellation of the
solid harmonic vector components).
From the above recursion relations (8.59-8.66),
the solid harmonic vector components are functions
of  monomials, where
 monomials, where 
 for the
 for the  and (formally)
 and (formally)
 for the
 for the  . For large astrophysical coordinates and large
. For large astrophysical coordinates and large  values this leads
to potential computational over- and underflow. To get independent of the size of both
the coordinates and
 values this leads
to potential computational over- and underflow. To get independent of the size of both
the coordinates and  we introduce a damping factor
 we introduce a damping factor  for the coordinates
for each solid harmonic type before entering the recursions.
 for the coordinates
for each solid harmonic type before entering the recursions.  will be chosen such that for the
highest specified
 will be chosen such that for the
highest specified  we will have approximately a value close to 1 for both
solid harmonic components:
 we will have approximately a value close to 1 for both
solid harmonic components:
 . Due to the complicated nature of the recursions, the first step is to
find solid harmonic components which have a simple structure. To do this, consider a cell
face with
. Due to the complicated nature of the recursions, the first step is to
find solid harmonic components which have a simple structure. To do this, consider a cell
face with  and
 and  . Then
. Then  ,
,  and only the
 and only the  components are
different from zero. An explicit form can be stated for the absolute values of these
components in terms of
 components are
different from zero. An explicit form can be stated for the absolute values of these
components in terms of  :
:
 , damping of the coordinates with
, damping of the coordinates with  results in a damped
radial cell face distance
 results in a damped
radial cell face distance  . Inserting this result into (8.69)
and (8.70)
and imposing conditions (8.67) and (8.68)
results in
. Inserting this result into (8.69)
and (8.70)
and imposing conditions (8.67) and (8.68)
results in
 . In FLASH only the approximate forms are computed for
. In FLASH only the approximate forms are computed for  and
 and  to avoid having to deal with factorials of large numbers.
to avoid having to deal with factorials of large numbers.
From the moment defining equations (8.48) and (8.49)
we see, that the moments are sums over subsets of cell center solid harmonic vectors multiplied
by the corresponding cell mass. From Eq.(8.57) it follows that
for highest accuracy, the moments should be calculated and stored for each possible cell face.
For high refinement levels and/or 3D simulations this would result in an unmanageable request
for computer memory. Several cell face positions have to be bundled into radial bins  defined
by lower and upper radial bounds. Once a cell center solid harmonic vector pair
 defined
by lower and upper radial bounds. Once a cell center solid harmonic vector pair 
 and
 and
 for a particular cell has been calculated, its radial bin location
 for a particular cell has been calculated, its radial bin location 
 is determined and its contribution is added to the radial bin moments
is determined and its contribution is added to the radial bin moments 
 and
 and 
 .
The computational definition of the radial bin moments is
.
The computational definition of the radial bin moments is
 means including all cells belonging to
 means including all cells belonging to  and all radial bins
with lower radial boundaries than
 and all radial bins
with lower radial boundaries than  . The two basic operations of the multipole solver
are thus: i) assembly of the radial bin moments and ii) formation of the scalar
products via Eq.(8.57) to obtain the potentials.
The memory declaration of the moment array should reflect the way the individual
moment components are addressed and the most efficient layout puts the angular momentum
indices in rows and the radial bin indices in columns.
. The two basic operations of the multipole solver
are thus: i) assembly of the radial bin moments and ii) formation of the scalar
products via Eq.(8.57) to obtain the potentials.
The memory declaration of the moment array should reflect the way the individual
moment components are addressed and the most efficient layout puts the angular momentum
indices in rows and the radial bin indices in columns.
How do we extract moments 
 and
 and 
 at any particular position
 at any particular position
 inside the domain (and, in particular, at the cell face positions
 inside the domain (and, in particular, at the cell face positions  ), which are
ultimately needed for the potential evaluation at that location? Assume that the location
), which are
ultimately needed for the potential evaluation at that location? Assume that the location
 corresponds to a particular radial bin
 corresponds to a particular radial bin 
 . Consider the
three consecutive radial bins
. Consider the
three consecutive radial bins  ,
,  and
 and  , together with their calculated moments:
, together with their calculated moments:
|  | (8.75) | 
 bin, whose lower and upper radial limits are shown
as solid vertical lines. The radial distance corresponding to
 bin, whose lower and upper radial limits are shown
as solid vertical lines. The radial distance corresponding to  splits the
 splits the
 bin into two parts: the left fractional part, denoted
 bin into two parts: the left fractional part, denoted  , and the right
fractional part, denoted
, and the right
fractional part, denoted  . Since both
. Since both 
 and
 and 
 are completely contained respectively in
are completely contained respectively in 
 and
 and 
 , the
moments at
, the
moments at  be approximately evaluated as:
 be approximately evaluated as:
While the structure of the inner radial zone is fixed due to the underlying geometrical grid,
the size of each radial bin in the outer radial zones has to be specified by the user. There is
at the moment no automatic derivation of the optimum (accuracy vs memory cost)
bin size for the outer zones. There are two types of radial bin sizes defined for the FLASH
multipole solver: i) exponentially and/or ii) logarthmically growing:
 is a small distance 'atomic' (basic unit) radial measure, defined
as half the geometric mean of appropriate cell dimensions at highest refinement level,
 is a small distance 'atomic' (basic unit) radial measure, defined
as half the geometric mean of appropriate cell dimensions at highest refinement level,  is a scalar factor
to optionally increase or decrase the atomic unit radial measure and
 is a scalar factor
to optionally increase or decrase the atomic unit radial measure and 
 is a local bin
index counter for each outer zone. The atomic radial distance
 is a local bin
index counter for each outer zone. The atomic radial distance  is calculated for each individual
domain geometry as follows:
 is calculated for each individual
domain geometry as follows:
|  | (8.80) | 
 are the usual cartesian cell dimensions and
 are the usual cartesian cell dimensions and  is
the radial cell dimension. Note, that since
 is
the radial cell dimension. Note, that since  measures a basic radial unit along the
radial distance from the expansion center (which, for approximate spherical problems, is located
close to the domain's geometrical origin), only those cell dimensions for calculating each
 measures a basic radial unit along the
radial distance from the expansion center (which, for approximate spherical problems, is located
close to the domain's geometrical origin), only those cell dimensions for calculating each
 are taken, which are directly related to radial distances from the geometrical domain origin.
For 3D cylindrical domain geometries for example, only the radial cylindrical and z-coordinate
cell dimensions determine the 3D radial distance from the 3D cylindrical domain origin. The
angular coordinate is not needed. Likewise for spherical domains only the radial cell coordinate
is of importance. Definitions (8.78) and
(8.79) define the upper limit of the radial bins. Hence in order to
obtain the true bin size for the
 are taken, which are directly related to radial distances from the geometrical domain origin.
For 3D cylindrical domain geometries for example, only the radial cylindrical and z-coordinate
cell dimensions determine the 3D radial distance from the 3D cylindrical domain origin. The
angular coordinate is not needed. Likewise for spherical domains only the radial cell coordinate
is of importance. Definitions (8.78) and
(8.79) define the upper limit of the radial bins. Hence in order to
obtain the true bin size for the  -th bin one has to subtract its upper radial limit from the
corresponding one of the
-th bin one has to subtract its upper radial limit from the
corresponding one of the  -th bin:
-th bin:
|  |  | ![$\displaystyle s\cdot \Delta r \cdot \left[Q^t-(Q-1)^t\right]$](img459.png) | (8.81) | 
|  |  |  | (8.82) | 
 .
.
Multithreading of the code is currently enabled in two parts: 1) during moment evaluation and 2) during potential evaluation. The threading in the moment evaluation section is achieved by running multiple threads over separate, non-conflicting radial bin sections. Moment evaluation is thus organized as a single loop over all relevant radial bins on each processor. Threading over the potential evaluation is done over blocks, as these will address different non-conflicting areas of the solution vector.
The improved multipole solver was extensively tested and several runs have been performed
using large domains ( ) and extremely high angular numbers up to
) and extremely high angular numbers up to  for a variety
of domain geometries. Currently, the following geometries can be handled: 3D cartesian,
3D cylindrical, 2D cylindrical, 2D spherical and 1D spherical. The structure of the code is
such that addition of new geometries, should they ever be needed by some applications,
can be done rapidly.
 for a variety
of domain geometries. Currently, the following geometries can be handled: 3D cartesian,
3D cylindrical, 2D cylindrical, 2D spherical and 1D spherical. The structure of the code is
such that addition of new geometries, should they ever be needed by some applications,
can be done rapidly. 
The first unit test for the multipole Poisson solver is based on the MacLaurin spheroid analytical gravitational solution given in section 35.3.4. The unit test sets up a spheroid with uniform unit density and determines both the analytical and numerical gravitational fields. The absolute relative error based on the analytical solution is formed for each cell and the maximum of all the absolute errors is compared to a predefined error tolerance value for a particular uniform refinement level. The multipole unit test runs in 2D cylindrical, 2D spherical, 3D cartesian and 3D cylindrical geometries and threaded multipole unit tests are also available.
The tree solver is based on the Barnes & Hut (1986, Nature, 324, 446) tree code for calculation of gravity forces in N-body simulations. However, it is more general and it includes some more modern features described for instance in Salmon & Warren (1994, J. Comp. Phys, 136, 155), Springel (2005, MNRAS, 364, 1105) and other works. It builds a global octal tree over the whole computational domain, communicates its part to all processors and uses it for calculations of various physical problems provided as separate units in source/physics directory. The global tree is an extension of the AMR mesh octal tree down to individual cells. The communication of the tree is implemented so that only parts of the tree that are needed for the calculation of the potential on a given processor are sent to it. The calculation of physical problems is done by walking the tree for each grid cell (hereafter point-of-calculation) and evaluating whether the tree node should be used for calculation or whether its children should be open.
The tree solver is connected to physical units by several wrapper subroutines that are called at specific places of the tree build and tree walk, and that call corresponding subroutines of physical units. In this way, physical units can include arbitrary quantity into the tree, and then, use it to calculate some other physical quantity by integrating contributions of all tree nodes during the tree walk. In this version, the only working unit implementation is physics/Gravity/GravityMain/Poisson/BHTree, which calculates the gravitational potential.
The tree solver algorithm consists of four parts. The first one, communication of block properties, is called only if the AMR grid changes. The other three, building of the tree, communication of the tree and calculation of the potential, are called in each time-step.
Communication of block properties. In recent version, each processor needs to know some basic information about all blocks in the simulation. It includes arrays: nodetype, lrefine and child. These arrays are distributed from each processor to all the other processors. They can occupy a substantial amount of memory on large number of processors (memory required for statically allocated arrays of the tree solver can be calculated by a script tree_mem_use.py).
Building the tree. The global tree is constructed from bottom and the process consists of three steps. In the first one, the so-called block-tree is constructed in each leaf block on a given processor. The block-trees are 1-dimensional dynamically allocated arrays (see Figure 8.12) and pointers to them are stored in array gr_bhTreeArray. In the second step, top nodes of block-trees (corresponding to whole blocks) are distributed to all processors and stored in array gr_bhTreeParentTree. In the last step, higher nodes of the parent tree are calculated by each processor and stored in the gr_bhTreeParentTree array. At the end, each processor holds information about the global tree down to the level of leaf blocks. During the whole process of tree building, 5 subroutines providing the interface to physical units are called: gr_bhFillBotNode, gr_bhAccBotNode, gr_bhAccNode, gr_bhNormalizeNode and gr_bhPostprocNode (see their auto-documentation and source code for details). Each of them calls a corresponding subroutine of all physical units with a name where the first two letters 'gr' are replaced with the name of the unit (e.g. Gravity_bhFillBotNode).
|  | 
Communication of the tree. Most of the tree data is contained on the bottom levels in individual block-trees. In order to save memory and communication time, only parts of block-trees that are needed on a given processor are sent to it. The procedure consists of three steps. In the first one, a level down to which each block-tree has to be sent to each processor is determined. For a given block-tree, it is done by evaluating the criterion for the node acceptance (traditionally called multipole acceptance criterion, shortly MAC) for all blocks on a remote processor, searching for the maximum level down to which the evaluated node will be needed on a given remote processor. In the second step, information about the block-tree levels which are going to be communicated is sent to all processors. This information is needed for allocation of arrays in which block-trees are stored on remote processors. In the third step, the block-tree arrays are allocated, all block-trees for a given processor are packed into a single message and the messages are communicated.
The MAC is implemented in subroutine gr_bhMAC which includes only a simple geometrical MAC used also by Barnes & Hut. The node is accepted for calculation if
 is the node size (defined as the largest edge of the
corresponding cuboid) and
 is the node size (defined as the largest edge of the
corresponding cuboid) and  is the distance between the node and the point-of-calculation. Additionally, gr_bhMAC checks that the point-of-calculation is not located within the node itself enlarged by factor
gr_bhTreeSafeBox. On the top of that, gr_bhMAC calls MACs of 
physical units and the node is accepted only if all criteria are fulfilled.
 is the distance between the node and the point-of-calculation. Additionally, gr_bhMAC checks that the point-of-calculation is not located within the node itself enlarged by factor
gr_bhTreeSafeBox. On the top of that, gr_bhMAC calls MACs of 
physical units and the node is accepted only if all criteria are fulfilled.
Tree walk. The tree is traversed from the top to the bottom, evaluating
MAC of each node and in case it is not fulfilled, continuing the tree walk with
its children. If the node's MAC is fulfilled, the node is accepted for the
calculation and subroutine gr_bhBotNodeContrib or
gr_bhNodeContrib is called, depending on whether it is a bottom-most
node (i.e. a single grid cell) or higher node, respectively. These subroutines
only call the corresponding subroutines of physical units (e.g.,
Gravity_bhNodeContrib). This is the most CPU-intensive part of
the tree solver, it usually takes more than  of the total tree solver
time. It is completely parallel and it does not include any communication (apart
from sending some statistics to the main processor at the end).
 of the total tree solver
time. It is completely parallel and it does not include any communication (apart
from sending some statistics to the main processor at the end).
The tree solver includes several implementations of the tree walk. The default algorithm is the Barnes-Hut like tree walk in which the whole tree is traversed from the top down to nodes fulfilling MAC for each cell separately. This algorithm is used in case the runtime parameter gr_bhUnifiedTreeWalk is true (default). If it this parameter is set to false, another algorithm is used in which instead of walking the whole tree for each cell individually, MAC is at first evaluated for whole mesh block (interacting with some node). If the node is accepted and if the node is a parent node (i.e. corresponding to whole mesh block), the node is accepted for all cells of the block and the contribution of the node is added to them. However, the node contribution is calculated separately for each cell, because the distance between the node and individual cells differs. The third tree walk algorithm is an implementation of the so called SumSquare MAC described by Salmon & Warren (1994). The tree is traversed using the priority queue, taking contribution of the most important nodes first. This algorithm provides much better error control, however, the implementation in this code version is highly experimental.
The tree solver supports isolated and periodic boundary conditions that can be set independently in each direction. In the latter case, when a node is considered for MAC evaluation and eventually calculation by calling Gravity_bhNodeContrib, periodic copies of the node are checked, and the minimum distance among the node periodic copies is taken in account. This allows for instance to calculate gravitational potential with periodic boundary conditions using the Ewald method (see description of the Gravity unit).
The unit test for the tree gravity solver calculates the gravitational potential of the Bonnor-Ebert sphere (Bonnor, W. B., 1956, MNRAS, 116, 351) and compares it to the analytical potential. The density distribution and the analytical potential are calculated by the python script bes-generator.py. The simulation setup only reads the file with radial profiles of these quantities and sets it on the grid. It also normalizes the analytical potential (adds a constant to it) so that the minimum values of the analytical and numerical potential are the same. The error of the gravitational potential calculated by the tree code is stored in the field array PERR (written into the PlotFile). The maximum absolute and relative errors are written into the log file.
This section of the User's Guide is taken from a paper by Paul Ricker, “A Direct Multigrid Poisson Solver for Oct-Tree Adaptive Meshes” (2008). Dr. Ricker wrote an original version of this multigrid algorithm for FLASH2. The Flash Center adapted it to FLASH3.
Structured adaptive mesh refinement provides some challenges for the implementation of effective, parallel multigrid methods. In the case of patch-based meshes, Huang & Greengard (2000) presents an algorithm which works by using the coarse-grid solution to impose boundary values on the fine grid. Discontinuities in the solution caused by jumps in refinement are resolved through iterative calculation of the residual and subsequent correction. While this is not a multigrid method in the standard sense, it still provides significant convergence acceleration.
The adaptation of this method to the FLASH grid structure (Ricker, 2008) requires a few modifications. The original formulation required that there be shared points between the coarse and fine patches. Contrast this with finite-volume, nested-cell, cell-averaged grids as used in FLASH(Figure 8.13). This is overcome by the exchange of guardcells from coarse to fine using monotonic interpolation (Sec:gridinterp) and external boundary extrapolation for the calculation of the residual.
|   | 
Another difference between the method of (Ricker 2008) and Huang & Greengard is that an oct-tree undoubtedly has neighboring blocks of the same refinement, while a patch-based mesh would not. This problem is solved through uniform prolongation of boundaries from coarse-to-fine, with simple relaxation done to eliminate the slight error introduced between adjacent cells.
One final change between the two methods is that the original computes new sources at the boundary between corrections, while the propagation here is done through nested solves on various levels.
The entire algorithm requires that the PARAMESH grid be reset such that all
blocks at refinement above some level  are set as temporarily nonexistent.
This is required so that guardcell filling can occur at only that level,
neglecting blocks at a higher level of refinement.  This requires some global
communication by PARAMESH.
 are set as temporarily nonexistent.
This is required so that guardcell filling can occur at only that level,
neglecting blocks at a higher level of refinement.  This requires some global
communication by PARAMESH.
The method requires three basic operators over the solution  on the grid:
taking the residual, restricting a fine-level function to coarser-level blocks,
and prolonging values from the coarse level to the faces of fine level blocks in
order to impose boundary values for the fine mesh problems.
 on the grid:
taking the residual, restricting a fine-level function to coarser-level blocks,
and prolonging values from the coarse level to the faces of fine level blocks in
order to impose boundary values for the fine mesh problems.
The residual is calculated such that:
This is accomplished through the application of the finite difference laplacian,
defined on level  with length-scales
 with length-scales 
 ,
, 
 and
 and
 .
.
|  |  |  | (8.85) | 
|  | (8.86) | 
The restriction operator 
 for block interior zones
 for block interior zones  is:
 is:
|  | (8.87) | 
 refer to the zones in block
 refer to the zones in block  that lie within
zone
 that lie within
zone  of block
 of block 
 .  We apply the restriction operator
throughout the interiors of blocks, but its opposite, the prolongation operator
.  We apply the restriction operator
throughout the interiors of blocks, but its opposite, the prolongation operator
 , need only be defined on the edges of blocks, because it is only
used to set boundary values for the direct single-block Poisson solver:
, need only be defined on the edges of blocks, because it is only
used to set boundary values for the direct single-block Poisson solver:
|  | (8.88) | 
 determine the
interpolation scheme. For the
 determine the
interpolation scheme. For the  face in 3D,
 face in 3D,
|  |  |  | (8.89) | 
|  |  |  | |
|  |  |  | 
In the case of problems with Dirichlet boundary conditions, a  -dimensional
fast sine transform is used.  The transform-space Green's Function for this is:
-dimensional
fast sine transform is used.  The transform-space Green's Function for this is:
| ![$\displaystyle G^\ell_{ijk} = -16\pi G\left[ {1\over{\Delta x_\ell^2}}\sin^2\lef...
...) + {1\over{\Delta z_\ell^2}}\sin^2\left({k\pi\over 2n_z}\right)\right]^{-1} .$](img490.png) | (8.90) | 
However, to be able to use the block solver in a general fashion, we must be
able to impose arbitrary boundary conditions per-block. In the case of
nonhomogenous Dirichlet boundary values, boundary value elimination may be used
to generalize the solver.  For instance, at the  boundary:
 boundary:
For periodic problems only the coarsest block must be handled differently; block adjacency for finer levels is handled naturally. The periodic solver uses a real-to-complex FFT with the Green's function:
|  | (8.92) | 
This solve requires that the source be zero-averaged; otherwise the solution is non-unique. Therefore the source average is subtracted from all blocks. In order to decimate error across same-refinement-level boundaries, Gauss-Seidel relaxations to the outer two layers of zones in each block are done after applying the direct solver to all blocks on a level. With all these components outlined, the overall solve may be described by the following algorithm:
 to all levels.  Subtract the
global average for the periodic case.
 to all levels.  Subtract the
global average for the periodic case.
 from 1 to
 from 1 to 
 ,
,
      
 is the maximum refinement level
 is the maximum refinement level	
 for all blocks
 for all blocks  on level
 on level  .
.
 
 on level
 on level  that has children, prolong
            face values for
 that has children, prolong
            face values for 
 onto each child block.
 onto each child block.
      
 to all levels.
 to all levels.
 norm of the residual over
      all leaf-node blocks and divide it by the discrete
 norm of the residual over
      all leaf-node blocks and divide it by the discrete  norm of the source
      over the same blocks. If the result is greater than a preset threshold
      value, proceed with a correction step: for each level
 norm of the source
      over the same blocks. If the result is greater than a preset threshold
      value, proceed with a correction step: for each level  from 1 to
 from 1 to
      
 ,
,
      
 is the maximum refinement level
 is the maximum refinement level
 for all blocks
 for all blocks  on level
 on level  .
.
 with the new residual
 with the new residual
            
 for all blocks
 for all blocks
             on level
 on level  .
.
 on level
 on level  :
:
            
 .
.
 on level
 on level  that has children, interpolate
            face boundary values of
 that has children, interpolate
            face boundary values of 
 for each child.
 for each child.
      
The above procedure requires storage for 
 ,
,  ,
,  , and
, and  on
each block, for a total storage requirement of
 on
each block, for a total storage requirement of 
 values per
block. Global communication is required in computing the tolerance-based
stopping criterion.
 values per
block. Global communication is required in computing the tolerance-based
stopping criterion.
There is load imbalance in the Multigrid solver because each processor performs single block FFTs on the blocks it owns. At the coarse levels there are relatively few blocks compared to available processors which means many processors are effectively idle during the coarse level solves. The introduction of PFFT, and creation of a hybrid solver, eliminates some of the coarse level solves.
The performance characteristics of the hybrid solver are described in “Optimization of multigrid based elliptic solver for large scale simulations in the FLASH code” (2012) which is available online at http://onlinelibrary.wiley.com/doi/10.1002/cpe.2821/pdf. Performance results are obtained using the PFFT_PoissonFD unit test.
The GridSolvers subunit solves the Poisson equation (Eq. 8.9). Two different elliptic solvers are supplied with FLASH: a multipole solver, suitable for approximately spherical source distributions, and a multigrid solver, which can be used with general source distributions. The multipole solver accepts only isolated boundary conditions, whereas the multigrid solver supports Dirichlet, given-value, Neumann, periodic, and isolated boundary conditions. Boundary conditions for the Poisson solver are specified using an argument to the Grid_solvePoisson routine, which can be set from different runtime parameters depending on the physical context in which the Poisson equation is being solved. The Grid_solvePoisson routine is the primary entry point to the Poisson solver module and has the following interface:
call Grid_solvePoisson (iSoln, iSrc, bcTypes(6), bcValues(2,6), poisfact)
where iSoln and iSrc are the integer-valued indices of the solution and source (density) variables, respectively. bcTypes(6) is an integer array specifying the type of boundary condition to employ on each of the (up to) 6 sides of the domain. Index 1 corresponds to the -x side of the domain, 2 to +x, 3 to -y, 4 to +y, 5 to -z, and 6 to +z. The following values are accepted in the array:
| bcTypes | Type of boundary condition | 
| 0 | Isolated boundaries | 
| 1 | Periodic boundaries | 
| 2 | Dirichlet boundaries | 
| 3 | Neumann boundaries | 
| 4 | Given-value boundaries | 
Finally, poisfact is a real scalar factor multiplying the source function in Eq. 8.9.
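As a sketch of how this interface is typically invoked (the variable indices GPOT_VAR and DENS_VAR and the 4*pi*G source factor are the usual choices when solving for the gravitational potential; they are assumptions of this example rather than part of the interface itself):

subroutine example_solve_gravity_potential()
  ! Illustrative sketch only: requests a Poisson solve with periodic
  ! boundaries on all six faces.  GPOT_VAR and DENS_VAR are assumed to be
  ! defined in Flash.h for this simulation.
  use Grid_interface, ONLY : Grid_solvePoisson
  implicit none
#include "Flash.h"

  integer :: bcTypes(6)
  real    :: bcValues(2,6)
  real    :: poisfact
  real, parameter :: pi     = 3.141592653589793
  real, parameter :: newton = 6.674e-8      ! G in CGS units (assumed)

  bcTypes(:)    = 1                 ! 1 = periodic boundaries (see table above)
  bcValues(:,:) = 0.0               ! unused for periodic boundaries
  poisfact      = 4.0 * pi * newton ! constant multiplying the source in Eq. 8.9

  call Grid_solvePoisson(GPOT_VAR, DENS_VAR, bcTypes, bcValues, poisfact)
end subroutine example_solve_gravity_potential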
When solutions found using the Poisson solvers are to be differenced (e.g., in computing the gravitational acceleration), it is strongly recommended that for AMR meshes, quadratic (or better) spatial interpolation at fine-coarse boundaries is chosen. (For PARAMESH, this is automatically the case by default, and is handled correctly for Cartesian as well as the supported curvilinear geometries. But note that the default interpolation implementation may be changed at configuration time with the '-gridinterpolation=...' setup option; and with the default implementation, the interpolation order may be lowered with the interpol_order runtime parameter.) If the order of the gridinterpolation of the mesh is not of at least the same order as the differencing scheme used in places like Gravity_accelOneRow, unphysical forces will be produced at refinement boundaries. Also, using constant or linear grid interpolation may cause the multigrid solver to fail to converge.
The poisson/multipole sub-module takes two runtime parameters,
listed in Table 8.3.
Note that storage and CPU costs scale roughly as the square of
mpole_lmax, so it is best to use this module only for nearly
spherical matter distributions.
To include the new multipole solver in a simulation, the best option is to use the shortcut +newMpole on the setup command line, which effectively replaces the following setup options:
-with-unit=Grid/GridSolvers/Multipole_new -with-unit=physics/Gravity/GravityMain/Poisson/Multipole -without-unit=Grid/GridSolvers/Multipole
The improved multipole solver takes several runtime parameters, whose functions are explained in detail below, together with comments about expected time and memory scaling.
mpole_Lmax: the maximum angular moment to be used for the multipole Poisson solver. Depending on the domain geometry, the memory and time scaling factors due to this variable alone are: i) roughly quadratic in mpole_Lmax for 3D cartesian and 3D cylindrical domains, ii) roughly linear in mpole_Lmax for 3D cartesian axisymmetric, 2D cylindrical, and 2D spherical domains, and iii) independent of mpole_Lmax for 1D spherical domains. Assuming no memory limitations, the multipole solver is numerically stable for very large mpole_Lmax values; runs with large values have been performed for both 3D cartesian and 2D geometries.
mpole_3DAxisymmetry: assumes rotational invariance around the main (z) axis in 3D cartesian domains. The assumed rotational invariance effectively cancels all multipole moments with nonzero azimuthal number m, and one can restrict the calculation to the m = 0 multipole moments only; the time and memory savings compared to an asymmetric 3D cartesian run are thus considerable. For 3D cylindrical domains, rotational invariance is equivalent to setting up the corresponding 2D cylindrical domain, hence this runtime parameter is not honored for 3D cylindrical domains, and the user is informed about the 3D to 2D cylindrical domain reduction possibility.
mpole_DumpMoments: when enabled, the multipole moments are dumped to the file 'basenm_dumpMoments.txt', where basenm is the base name given for the output files.
mpole_PrintRadialInfo: when enabled, detailed radial bin information is printed to the file 'basenm_printRadialInfo.txt'.
mpole_InnerZoneResolution: a sufficiently fine (small) resolution value will result in a complete separation of all inner zone radii into separate radial bins. The default value should never be surpassed, and any attempt to do so will stop the program with the appropriate information to the user; likewise for a meaningless resolution value of 0.
mpole_ZoneType_n: the outer zone type for the n-th radial zone. If set to 'exponential', the radial equation r_Q = s Δr Q^t, defining the upper bound radius of the Q-th radial bin in the n-th outer zone, is used. If set to 'logarithmic', the radial equation r_Q = s Δr (e^{Qt} − 1)/(e^t − 1) is used. In these equations Q is a local radial bin counting index for each outer zone, and s and t are parameters defining the size and growth of the outer zone radial bins (see below).
mpole_ZoneScalar_n: the scalar s in the n-th outer radial zone equations r_Q = s Δr Q^t or r_Q = s Δr (e^{Qt} − 1)/(e^t − 1). The scalar is needed to be able to increase (or decrease) the size of the first radial bin with respect to the default smallest outer zone radius Δr.
mpole_ZoneExponent_n: the exponent t in the n-th outer radial zone equations r_Q = s Δr Q^t or r_Q = s Δr (e^{Qt} − 1)/(e^t − 1). The exponent controls the growth (or shrinkage) in size of each radial bin with increasing bin index. For the exponential equation, growth occurs for t > 1, shrinkage for t < 1, and equal-size bins for t = 1. For the logarithmic equation, growth occurs for t > 0 and shrinkage for t < 0, but the equal-size option t = 0 will not work because the denominator becomes undefined; equal-size bins must hence be obtained using the exponential outer zone type.
| Parameter | Type | Default | Options | 
| mpole_Lmax | integer | 0 |  | 
| mpole_2DSymmetryPlane | logical | false | true | 
| mpole_3DAxisymmetry | logical | false | true | 
| mpole_DumpMoments | logical | false | true | 
| mpole_PrintRadialInfo | logical | false | true | 
| mpole_IgnoreInnerZone | logical | false | true | 
| mpole_InnerZoneSize | integer | 16 |  | 
| mpole_InnerZoneResolution | real | 0.1 | greater than 0 and not greater than 0.1 | 
| mpole_MaxRadialZones | integer | 1 |  | 
| mpole_ZoneRadiusFraction_n | real | 1.0 |  | 
| mpole_ZoneType_n | string | "exponential" | "logarithmic" | 
| mpole_ZoneScalar_n | real | 1.0 |  | 
| mpole_ZoneExponent_n | real | 1.0 (exponential); none (logarithmic) | any nonzero value for the logarithmic type | 
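As an illustration of how these parameters appear in a flash.par file (the values below are placeholders rather than recommendations, and the _1 suffix assumes the per-zone parameters are indexed by the outer zone number, as the _n naming in the table suggests):

mpole_Lmax                 = 8
mpole_MaxRadialZones       = 1
mpole_ZoneRadiusFraction_1 = 1.0
mpole_ZoneType_1           = "exponential"
mpole_ZoneScalar_1         = 1.0
mpole_ZoneExponent_1       = 1.0
mpole_DumpMoments          = .false.
mpole_PrintRadialInfo      = .false.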
The tree gravity solver can be included by setup or a Config file by requesting
physics/Gravity/GravityMain/Poisson/BHTree
The current implementation works only in 3D Cartesian coordinates, and blocks have to be logically cubic (i.e., nxb=nyb=nzb). Physical dimensions of blocks can be arbitrary, however, some multipole acceptance criteria can provide inaccurate error estimates with non-cubic blocks. The computational domain can have arbitrary dimensions, and there can be more blocks with lrefine=1 (i.e., nblockx, nblocky and nblockz can have different values).
Runtime parameters gr_bhPhysMACTW and gr_bhPhysMACComm control whether MACs of physical units are used in the tree walk and in communication, respectively. If one of them (or both) is set to .false., only the purely geometric MAC is used for the corresponding part of the tree solver. It is not allowed to set gr_bhPhysMACTW = .false. and gr_bhPhysMACComm = .true.
Runtime parameter gr_bhTreeLimAngle sets the limiting opening angle for the purely geometrical MAC. Another condition controlling the acceptance of a node for the calculation is that the point-of-calculation must lie outside the box obtained by enlarging the considered node by the factor gr_bhTreeSafeBox.
The parameter gr_bhUseUnifiedTW controls whether the Barnes-Hut like tree walk algorithm is used (.true.) or whether an alternative algorithm is used which checks the MAC only once per block for interactions with parent blocks (.false.; see 8.10.2.4 for more details). The latter is faster; however, it may lead to higher errors at block boundaries, in particular if the gravity module calculates the potential, which is subsequently differentiated to obtain the gravitational acceleration. The tree walk algorithm based on the priority queue is used if grv_bhMAC is set to "SumSquare".
| Variable | Type | Default | Description | 
| gr_bhPhysMACTW | logical | .false. | whether physical MAC should be used in tree walk | 
| gr_bhPhysMACComm | logical | .false. | whether physical MAC should be used in communication | 
| gr_bhTreeLimAngle | real |  | limiting opening angle | 
| gr_bhTreeSafeBox | real |  | relative size of the restricted volume around a node in which the point-of-calculation is not allowed to be located | 
| gr_bhUseUnifiedTW | logical | .true. | whether the Barnes-Hut like tree walk algorithm should be used | 
| gr_bhTWMaxQueueSize | integer |  | maximum length of the priority queue | 
The Grid/GridSolvers/Multigrid module is appropriate for general source
distributions.  It solves Poisson's equation for 1, 2, and 3 dimensional
problems with Cartesian geometries.  It only supports the PARAMESH 
Grid with one block at the coarsest level. For any other mesh
configuration it is advisable to use the hybrid solver, which switches
to a uniform grid exact solve when the specified level of coarsening
has been achieved. In most use cases for FLASH, the multigrid solver
will be used to solve for Gravity (see: chp:Gravity). It may
be included by setup or
Config by including:
 
  
physics/Gravity/GravityMain/Poisson/Multigrid
The multigrid solver may also be included stand-alone using:
 
  
Grid/GridSolvers/Multigrid
In this case the interface is as described above. The supported boundary conditions for the module are periodic, Dirichlet, given-value, and isolated. Due to the nature of the FFT block solver, the same type of boundary condition must be used in all directions. Therefore, only the value of bcTypes(1) will be considered in the call to Grid_solvePoisson.
The multigrid solver requires the use of two internally-used grid variables: isls and icor. These are used to store the calculated residual and solved-for correction, respectively. If it is used as a Gravity solver with isolated boundary conditions, then two additional grid variables, imgm and imgp, are used to store the image mass and image potential.
| Variable | Type | Default | Description | 
| mg_MaxResidualNorm | real |  | Maximum ratio of the norm of the residual to that of the right-hand side | 
| mg_maxCorrections | integer | 100 | Maximum number of correction iterations to take | 
| mg_printNorm | logical | .true. | Whether to print the norm ratio each iteration | 
| mpole_lmax | integer | 4 | The number of multipole moments used in the isolated case | 
The hybrid solver can be used in place of the Multigrid solver. For example, the first setup line below configures the PFFT_PoissonFD unit test with the hybrid solver, while the second excludes the PFFT top-level solve and uses the pure Multigrid solver instead:
 
./setup unitTest/PFFT_PoissonFD -auto -3d -parfile=flash_pm_3d.par -maxblocks=800 +noio
./setup unitTest/PFFT_PoissonFD -auto -3d -parfile=flash_pm_3d.par -maxblocks=800 +noio -without-unit=Grid/GridSolvers/Multigrid/PfftTopLevelSolve -with-unit=Grid/GridSolvers/Multigrid
It is also possible to select a different PFFT solver variant. In that case, different combinations of boundary conditions for the Poisson problem may be supported. The HomBcTrigSolver variant supports the largest set of combinations of boundary conditions. Use the PfftSolver setup variable to choose a variant. Thus, appending PfftSolver=HomBcTrigSolver to the setup chooses the HomBcTrigSolver variant. When using the hybrid solver with the PFFT variants HomBcTrigSolver or SimplePeriodicSolver, the runtime parameter gr_pfftDiffOpDiscretize should be set to 1.
The Multigrid runtime parameters given in the previous section also apply.
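For example, a setup line for the PFFT_PoissonFD unit test shown above that selects the HomBcTrigSolver variant, together with the corresponding runtime parameter, might look as follows (shown as a sketch, not a tested configuration):

./setup unitTest/PFFT_PoissonFD -auto -3d -parfile=flash_pm_3d.par -maxblocks=800 +noio PfftSolver=HomBcTrigSolver

and, in the flash.par file:

gr_pfftDiffOpDiscretize = 1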
Grid_advanceDiffusion is the API function which solves the system of equations. This API is provided by both the split and unsplit solvers. The unsplit solver uses HYPRE to solve the system of equations, whereas the split solver performs a direct inversion using the Thomas algorithm. Note that the split solver relies heavily on the PFFT infrastructure for data exchange, and a significant portion of the work in the split Grid_advanceDiffusion involves PFFT routines. In the unsplit solver the data exchange is done implicitly within HYPRE and is hidden.
The steps in unsplit Grid_advanceDiffusion are as follows:
Mapping a UG grid to a HYPRE matrix is trivial; however, mapping a PARAMESH grid to a HYPRE matrix can be quite complicated. The process is simplified by using the grid interfaces provided by HYPRE.
The choice of an interface is tightly coupled to the underlying grid on which the problem is being solved. We have chosen the SSTRUCT interface as it is the most compatible with the block structured AMR mesh in FLASH4. Two terms commonly used in HYPRE terminology are part and box; we define these terms in equivalent FLASH4 terminology. A HYPRE box object maps directly to a leaf block in FLASH4. The block is then defined by its extents; in FLASH4 this information can be computed easily using a combination of Grid_getBlkCornerID and Grid_getBlkIndexLimits. All leaf blocks at a given refinement level form a HYPRE part, so the number of parts in a typical FLASH4 grid is given by
nparts = leaf_block(lrefine_max) - leaf_block(lrefine_min) + 1
i.e., one part per refinement level that contains leaf blocks. If the grid is fully refined or UG, nparts = 1; however, there could still be more than one box object.
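A sketch of how this count could be obtained inside FLASH4 is shown below; it assumes the standard Grid_getListOfBlocks interface and PARAMESH's lrefine array, and omits the global MPI reduction that would be needed to combine the per-processor minimum and maximum levels.

subroutine example_count_hypre_parts(nparts)
  ! Sketch: one HYPRE part per refinement level that contains leaf blocks.
  use Grid_interface, ONLY : Grid_getListOfBlocks
  use tree,           ONLY : lrefine          ! PARAMESH per-block refinement level
  implicit none
#include "constants.h"
#include "Flash.h"
  integer, intent(out) :: nparts
  integer :: blkList(MAXBLOCKS), blkCount, b, lmin, lmax

  call Grid_getListOfBlocks(LEAF, blkList, blkCount)
  if (blkCount == 0) then
     nparts = 0
     return
  end if

  lmin = lrefine(blkList(1));  lmax = lmin
  do b = 2, blkCount
     lmin = min(lmin, lrefine(blkList(b)))
     lmax = max(lmax, lrefine(blkList(b)))
  end do

  nparts = lmax - lmin + 1    ! number of leaf refinement levels spanned
end subroutine example_count_hypre_parts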
Setting up the HYPRE grid object is one of the most important steps of the solution process. We use the SSTRUCT interface provided in HYPRE to set up the grid object. Since the HYPRE grid object is mapped directly onto the FLASH4 grid, the HYPRE grid object needs to be updated whenever the FLASH4 grid changes; consequently, with AMR the HYPRE grid setup might happen multiple times. Setting up a HYPRE grid object is a two-step process:
Stenciled relationships typically exist between leaf blocks at the same refinement level (intra-part), and graph relationships exist between leaf blocks at different refinement levels (inter-part). The fine-coarse boundary is handled in such a way that fluxes are conserved at the interface (see Chp:diffuse for details). UG does not require any graph relationships.
Whether a block needs a graph relationship depends on the refinement level of its neighbor. While this information is not directly available in PARAMESH, it is possible to determine whether the block neighbor is coarser or finer. Combining this information with the constraint of at most a factor of two jump in refinement at block boundaries, it is possible to compute the part number of a neighbor block, which in turn determines whether we need a graph. Creating a graph involves creating a link between all the cells on the block boundary.
Once the grid object is created, the matrix and vector objects are built on the grid object. The diffusion solve needs uninterpolated data from neighbor blocks even when there is a fine-coarse boundary; therefore it cannot rely upon the guardcell fill process. A two-step process is used to handle this situation. With the computation of the vector B (the RHS), the system can be solved using HYPRE and UNK can be updated.
| Solver | Preconditioner | 
| PCG | AMG, ILU(0) | 
| BICGSTAB | AMG, ILU(0) | 
| GMRES | AMG, ILU(0) | 
| AMG | - | 
| SPLIT | - | 
Parallel runs: One issue that has been observed is that the results produced by HYPRE can differ depending on the number of processors used. This is most likely due to the use of CG (Krylov subspace methods), which involves an MPI_SUM over the dot product of the residual. We have found this error to be dependent on the type of problem. One way to get around this problem is to use direct solvers in HYPRE such as SPLIT; however, we have noticed that direct solvers tend to be slow. The released code has an option to use the SPLIT solver, but this solver has not been extensively tested, was used only for internal debugging purposes, and its usage in FLASH4 is purely experimental.
Customizing solvers: HYPRE exposes many more parameters for tweaking the solvers and preconditioners mentioned above. We have only used those which are applicable to general diffusion problems. Although these settings are in general good enough, the set is by no means complete and might not be applicable to all classes of problems. Use of additional HYPRE parameters might require direct manipulation of the FLASH4 code.
Symmetric Positive Definite (SPD) Matrix: PCG has been observed to have convergence issues which might (though not necessarily) be related to the matrix not being perfectly symmetric positive definite.
The default settings use PCG as the solver with AMG as the preconditioner. The following parameters can be tweaked at runtime:
| Variable | Type | Default | Description | 
| gr_hyprePCType | string | "hypre_amg" | Type of preconditioner (algebraic multigrid by default) | 
| gr_hypreMaxIter | integer | 10000 | Maximum number of iterations | 
| gr_hypreRelTol | real |  | Maximum ratio of the norm of the residual to that of the initial residual | 
| gr_hypreSolverType | string | "hypre_pcg" | Type of linear solver (preconditioned conjugate gradient by default) | 
| gr_hyprePrintSolveInfo | boolean | FALSE | Enables/disables printing of basic solver information (e.g., number of iterations) | 
| gr_hypreInfoLevel | integer | 1 | Verbosity level of solver diagnostics (refer to the HYPRE manual) | 
| gr_hypreFloor | real |  | Value used to floor the end solution | 
| gr_hypreUseFloor | boolean | TRUE | Whether to apply gr_hypreFloor to floor the results from HYPRE | 
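For reference, these parameters are set in flash.par in the usual way; in the fragment below the values for gr_hypreRelTol and gr_hypreFloor are illustrative placeholders rather than documented defaults, while the remaining values match the table above.

gr_hypreSolverType     = "hypre_pcg"
gr_hyprePCType         = "hypre_amg"
gr_hypreMaxIter        = 10000
gr_hypreRelTol         = 1.0e-8
gr_hyprePrintSolveInfo = .false.
gr_hypreInfoLevel      = 1
gr_hypreUseFloor       = .true.
gr_hypreFloor          = 1.0e-12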
Often the systems of equations that we want to solve with HYPRE are some variation of parabolic PDEs that can be expressed generically as in Eq. 8.93. Typically the derivatives need to be discretized in the same manner independent of the particular system of equations, with the only differences being in the factors and transport coefficients used in the tensors of Eq. 8.93. FLASH4.8 introduces a unified HYPRE solver that constructs all the necessary HYPRE objects, solves the system of equations given the tensors in Eq. 8.93, and updates the Grid variables. This is supported for uniform and PARAMESH grids.
Previous versions of the HYPRE solvers, as described above, shared a single instance of all the HYPRE objects. The unified solver described in this section maintains separate objects for each solver, which are kept in source/Grid/GridSolvers/HYPRE/Unified/gr_uhypreData. New solvers are registered at setup time in source/Grid/GridSolvers/HYPRE/Unified/Config. This will also generate runtime parameters unique to each solver, as described in Table 8.6, with the gr_hypre prefix replaced by gr_hypreSolver_.
| Parameter | Type | Default | Description | 
| SolverType | STRING | "HYPRE_PCG" | Type of linear solver | 
| PCType | STRING | "HYPRE_AMG" | Algebraic Multigrid as Preconditioner | 
| RelTol | REAL | 1.0e-8 | Maximum ratio of the norm of the residual to that of the initial residue | 
| AbsTol | REAL | 0.0 | |
| SolverAutoAbsTolFact | REAL | 0.0 | |
| MinIter | INTEGER | 0 | Minimum number of iterations for HYPRE solve phase | 
| MaxIter | INTEGER | 500 | Maximum number of iterations for HYPRE solve phase | 
| PrintSolveInfo | BOOLEAN | FALSE | When .true. prints solve info from HYPRE | 
| InfoLevel | INTEGER | 1 | level of output when using PrintSolveInfo | 
| FloorType | INTEGER | 0 | |
| Floor | REAL | 1.0e-12 | Used to floor the end solution. | 
| Use2Norm | BOOLEAN | FALSE | |
| CfTol | REAL | 0.0 | |
| RecomputeResidual | BOOLEAN | FALSE | |
| RecomputeResidualP | INTEGER | -1 | |
| RelChange | BOOLEAN | FALSE | |
| SlopeLimType | STRING | "HYPRESL_MC" |  | 
Systems are solved with calls to Grid_advanceImplicit by passing an instance of parabolicProblemParms from the Grid_parabolicProblem module, whose parameters are detailed in Table 8.8. This type contains all the information about the system of equations described by Eq. 8.93. The various tensors are described by multidimensional arrays, such as FactorE whose elements will also be multiplied by a variable in unk through iFactorE, which should match the index in iFactorBMap.
| Variable | Type | Description | 
| iFactorBMap | integer(ncoeffs) | Unique indices in solnvec for transport coefficients | 
| iFactorA | integer(nvars) | Stores indices into solnvec for LHS term (weighting factors) | 
| FactorA | real(nvars) | Constant coefficient for LHS term (weighting factor) | 
| iFactorB | integer(nvars,nvars,NDIM,NDIM) | Indices in iFactorBMap for transport coefficients | 
| FactorB | real(nvars,nvars,NDIM,NDIM) | Constant coefficients for transport/gradient terms | 
| iFactorC | integer(nvars,nvars) | Indices in solnvec with inter variable coupling coefficients | 
| FactorC | real(nvars,nvars) | Constant coefficients for coupling coefficients | 
| iFactorD | integer(nvars) | Indices in solnvec to map to | 
| FactorD | real(nvars) | Constant coefficient for each variable to multiply constant source | 
| iFactorE | integer(nvars,nvars,NDIM) | Indices in solnvec to map to | 
| FactorE | real(nvars,nvars,NDIM) | Constant coefficients for transport/gradient terms involving geometric source terms | 
| iFactorF | integer(nvars,nvars,NDIM) | Indices in solnvec to map to | 
| FactorF | real(nvars,nvars,NDIM) | Constant coefficients for transport/gradient terms involving geometric source terms | 
| iFactorG | integer(nvars,nvars) | Indices in solnvec with inter variable coupling coefficients | 
| FactorG | real(nvars,nvars) | Constant coefficients for coupling coefficients | 
| iVar | integer(nvars) | Stores indices into solnvec for variables to solve | 
| iFluxMap | integer(nvars) | Stores unique indices into solnvec for flux limiting variable | 
| iFlux | integer(nvars) | Stores indices in iFluxMap for each variable's maximum flux | 
| Flux | real(nvars) | Maximum flux value for each variable | 
| mode | integer(nvars) | Flux limiter mode to use for each variable | 
| bcTypes | integer(NDIM,nvars,2) | Boundary condition flag | 
| bcValues | real(NDIM,nvars,2) | Value or slope at boundary - when using Neumann or Dirichlet BC's | 
| theta | real | Implicitness of solve | 
| iSolution | integer | Handle so we can keep track of what equation we are solving. Should match a system PPDEFINE from GridSolvers/HYPRE/Unified/Config | 
Eq. 8.93 is evolved with a finite-volume discretization. We integrate Eq. 8.93 over a control volume centered on a mesh point to obtain Eq. 8.94, in which the divergence theorem has been used to turn the transport (B-factor) terms into integrals over the faces of the control volume, and any remaining volume integrals are approximated by the value at the center of the control volume. The B-factors therefore require derivatives integrated over the faces of the mesh, while the E-factors need derivatives evaluated at the cell centers.
There are two classes of derivatives that may be needed at a face: derivatives normal to the face and derivatives transverse to the face. In both cases finite-difference weights on a stencil of neighboring points are used, but the two cases differ in the approximation used for the surface integration. For normal derivatives the stencil is constructed from cells that share the face of interest, and the derivative is evaluated at the center of the face, allowing a mid-point quadrature for the surface integral on that face.
A stencil that approximates derivatives transverse to a cell face requires neighboring cells that share a common edge, which introduces a new coupling between diagonally adjacent cells. If the face integrals are approximated with a mid-point quadrature, as in Eq. 8.97, this diagonal coupling through the face-centered transport coefficients is no longer symmetric when the transport coefficients vary in space (Eq. 8.96), and the resulting systems of equations can be numerically unstable. In order to symmetrize the discretization for these new edge couplings, we approximate the face integrals with a two-point trapezoid rule in the transverse direction. This way the edge coupling is always symmetric, because it acts through edge-centered transport coefficients shared by the neighboring cells.
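As a schematic 2D illustration (in generic notation, with B a transport coefficient, u the solved variable, and A_{i+1/2,j} the x-face area; these symbols are placeholders, not the tensors of Eq. 8.93), the two quadratures for a transverse-derivative contribution on an x-face are

\[
\int_{A_{i+1/2,j}} B \, \frac{\partial u}{\partial y} \, dA
\;\approx\; B_{i+1/2,\,j} \left( \frac{\partial u}{\partial y} \right)_{i+1/2,\,j} A_{i+1/2,\,j}
\qquad \text{(mid-point)},
\]
\[
\int_{A_{i+1/2,j}} B \, \frac{\partial u}{\partial y} \, dA
\;\approx\; \frac{A_{i+1/2,\,j}}{2} \left[
B_{i+1/2,\,j+1/2} \left( \frac{\partial u}{\partial y} \right)_{i+1/2,\,j+1/2}
+ B_{i+1/2,\,j-1/2} \left( \frac{\partial u}{\partial y} \right)_{i+1/2,\,j-1/2} \right]
\qquad \text{(trapezoid)},
\]

so each edge-centered coefficient, e.g. B_{i+1/2, j+1/2}, is shared identically by the two cells that meet at that edge, which is what makes the resulting edge coupling symmetric.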
For both edge-centered and face-centered derivatives the Unified HYPRE solver computes stencil weights for all possible refinement configurations at fine-coarse boundaries and handles the creation of graph objects between parts.
For scalar systems of equations the finite-volume formalism in Eq. 8.94 naturally incorporates any grid geometry through the volume and area factors. However, for vector systems of equations, such as velocity with viscosity or magnetic field with resistive MHD, the discretization needs to account for geometric scale factors introduced through the vector calculus operators. The Unified HYPRE solver currently only handles "cylindrical" geometries, where it is often convenient to re-express the generic system of parabolic equations with the cylindrical scale factors made explicit.
Just as before, the second-derivative factor terms can be integrated with the divergence theorem and evaluated on the cell faces, but the volume factor terms will include an additional geometric term arising from the cylindrical coordinates.