FLASH User's Guide
Version 1.0
October 1999
Index:
2 Algorithmic overview
2.1 Hydrodynamics
2.2 Adaptive mesh refinement
2.3 Equation of state
2.4 Thermonuclear reactions
2 Algorithmic overview
FLASH provides a general simulation framework capable of incorporating
several different types of physics module (e.g., hydrodynamics on structured
or unstructured meshes, N-body solvers, reactive source terms, etc.) and
initial or boundary conditions. However, it
has been built to solve a particular class of problems, namely reactive,
compressible flows in astrophysical environments.
Thus, FLASH 1.0 is distributed with a particular set of algorithms to
handle compressible hydrodynamics
on a block-structured adaptive mesh, together with an appropriate nuclear
equation of state and nuclear reaction network. We treat each of these
elements in turn.
2.1 Hydrodynamics
The hydrodynamic module in FLASH 1.0 is
based on the PROMETHEUS code (Fryxell, Müller, and Arnett 1989).
This code solves Euler's equations for compressible gas dynamics in
one, two, or three spatial dimensions. These equations can be written
in conservative form as
where r is the fluid density, v is the fluid
velocity, P is the pressure, E is the
sum of the internal energy e and kinetic energy per unit mass,
g is the acceleration due to gravity,
and t is the time coordinate.
The pressure is obtained from the energy and
density using the equation of state.
For the case of an ideal gas equation of state, the pressure is
given by
where g is the ratio of specific heats.
More general equations of state are discussed below.
For reactive flows, a separate advection equation must be solved
for each chemical or nuclear species:
where Xl is the mass fraction of the lth species, with the
constraint that ål Xl = 1.
The quantity rXl represents the partial density of the lth
fluid. The code does not explicitly track interfaces between the fluids, so
a small amount of numerical mixing can be expected during the course of
a calculation.
The equations are solved using a modified version of the Piecewise-Parabolic
Method (PPM), which is described in detail in Woodward and Colella (1984)
and Colella and Woodward (1984).
PPM is a higher-order version of the
method developed by Godunov (1959).
Godunov's method uses a finite-volume spatial discretization of the Euler
equations together with an explicit forward time difference.
Time-advanced fluxes at cell boundaries are computed
using the analytic solution to
Riemann's shock tube problem at each boundary. Initial conditions
for each Riemann problem are
determined by assuming the nonadvanced solution
to be piecewise constant in each cell.
Using the Riemann solution has the effect of introducing explicit
nonlinearity into the difference equations and permits the calculation of
sharp shock fronts and contact discontinuities without introducing significant
nonphysical oscillations into the flow. Since the value of each variable in
each cell is assumed to be constant, Godunov's method is limited to
first-order accuracy in both space and time.
PPM improves on Godunov's method by representing
the flow variables with piecewise parabolic functions. It also
uses a monotonicity constraint rather than artificial viscosity to
control oscillations near discontinuities, a feature shared with
the MUSCL scheme of van Leer (1979).
Although this could lead to a method which is accurate to third order, PPM is
formally accurate only to second order in both space and time, as a fully
third-order scheme proved not to be cost-effective.
Nevertheless, PPM is considerably more
accurate and efficient than most formally second-order algorithms.
PPM is particularly well-suited to flows involving discontinuities, such as shocks
and contact discontinuities. The method also performs extremely well for smooth
flows, although other schemes which do not perform the extra work necessary for
the treatment of discontinuities might be more efficient in these cases. The
high resolution and accuracy of PPM are obtained by the explicit nonlinearity of
the scheme and through the use of intelligent dissipation algorithms,
such as monotonicity enforcement, contact steepening, and interpolant
flattening. These algorithms are described in detail by Colella and Woodward
(1984).
A complete description of PPM is beyond the scope of this user's guide.
However, for comparison with other codes, we note that the implementation of
PPM in FLASH 1.0 uses the direct Eulerian formulation of PPM and
the technique for allowing nonideal equations of state described by
Colella and Glaz (1985). For multidimensional problems, FLASH 1.0
uses second-order operator splitting (Strang 1968).
FLASH also implements PPM on a block-structured
adaptive mesh, which is described in the following section.
2.2 Adaptive mesh refinement
We have used a package known as PARAMESH (MacNeice et al. 1999) for
the parallelization and adaptive mesh refinement (AMR) portion of FLASH.
PARAMESH consists of a suite of subroutines which handle
refinement/derefinement, distribution of work to processors, guard cell
filling, and flux conservation.
In this section we briefly describe this package and the
ways in which it has been modified for use with FLASH.
PARAMESH uses a
block-structured adaptive mesh refinement scheme similar
to others in the literature (e.g., Parashar 1999;
Berger & Oliger 1984; Berger & Colella 1989; DeZeeuw & Powell 1993) as well
as to schemes which refine on an individual cell basis (Khokhlov 1997).
In block-structured AMR, the fundamental data structure is a
block of uniform cells arranged in a logically Cartesian fashion.
Each cell can be specified using a block identifier (processor number
and local block number) and a coordinate triple (i,j,k), where
i = 1¼nxb, j = 1¼nyb, and k = 1¼nzb.
The complete computational grid consists of a collection of blocks with
different physical cell sizes,
related to each other in a hierarchical fashion using
a tree data structure.
The blocks at the root of the tree have the largest cells, while their
children have smaller cells and are said to be refined.
Two rules govern the establishment of refined child blocks in PARAMESH.
First, the cells of a refined child block must be one-half as large as
those of
its parent block. Second, a block's children must be nested; that is,
the child blocks must fit within their parent block and
cannot overlap one another, and the complete set of children of a block
must fill its volume.
Thus, in d dimensions a given block can have at most 2d children.
Each block contains nxb×nyb×nzb
interior cells and a set of guard cells. The guard cells
contain boundary information needed to update the interior cells.
These can be obtained from physically neighboring blocks, externally
specified boundary conditions, or both.
The number of guard cells needed depends upon the interpolation scheme
and differencing stencil used for the hydrodynamics algorithm; for the
explicit PPM algorithm distributed with FLASH, four guard cells are
needed in each direction.
PARAMESH handles the filling of guard cells with information from other
blocks or a user-specified external boundary routine.
If a block's neighbor has the same level of refinement, PARAMESH fills
its guard cells using a direct copy from the neighbor's interior cells.
If the neighbor has a different level of refinement, the neighbor's
interior cells are used to interpolate guard cell values for the block.
If the block and its neighbor are stored in the memory of different
processors, PARAMESH handles the appropriate parallel communication
(blocks are not split between processors).
PARAMESH
supports only linear interpolation for guard cell filling at jumps at
refinement, but it is easily extended to allow other interpolation
schemes; for example, for FLASH we have added a quadratic
interpolation scheme. Once each block's guard cells are
filled, it can be updated independently of the other blocks.
PARAMESH also enforces flux conservation at jumps in refinement, as
described by Berger and Colella (1989). At jumps in
refinement, the fluxes of mass, momentum, energy, and nuclear abundances
across boundary cell faces are averaged across the fine cells at that face and
provided to the corresponding coarse face on the neighboring
block. These averaged fluxes are then used to update the values of the
coarse block's local flow variables.
Each processor decides when to refine or derefine its blocks by computing a
user-defined error estimator for each block.
Refinement involves creation of between one and 2d refined child blocks,
while derefinement involves deletion of a block.
As child blocks are created, they are temporarily placed at the end of the
processor's block list.
After the refinements and derefinements are complete, the
blocks are redistributed among the processors using a work-weighted
Morton space-filling curve in a manner similar to that described by
Warren and Salmon (1987) for a parallel
treecode.
During the
distribution step each block is assigned a work value (an estimate of
the relative amount of time required to update the block).
The Morton number of the block is then
computed by interleaving the bits of its integer coordinates
as described by Warren and Salmon (1987); this determines its
location along the space-filling curve. The Morton numbers of all of
the blocks in the problem are then sorted using a parallel bitonic sort.
Finally, the list of all
blocks is partitioned among the processors using the block weights,
equalizing the estimated workload of each processor.
During the sorting step, only Morton numbers and not block data are
communicated between processors; blocks are only moved (if necessary)
after the sort.
Changes in refinement are typically performed every ten timesteps.
The refinement criterion used by PARAMESH is adapted from Löhner (1987).
Löhner's error estimator was originally
developed for finite element applications and has the advantage that
it uses an entirely local calculation.
Furthermore, the estimator is
dimensionless and can be applied with complete generality to any of the
field variables of the simulation or any combination of them
(by default, PARAMESH uses the density and pressure).
Löhner's estimator is a modified second derivative, normalized
by the average of the gradient over one computational cell. In one
dimension on a uniform mesh it is given by
Ei = |
\mid ui+1 - 2ui + ui-1 \mid \mid ui+1 - ui \mid + \mid ui - ui-1 \mid + e[ \mid ui+1 \mid - 2 \mid ui \mid + \mid ui-1 \mid ]
|
, |
| (7) |
where ui is the refinement test variable's value in the ith cell.
The last term in the denominator of this expression acts as a filter,
preventing refinement of small ripples. The constant
e is given a value of 10-4. Although PPM is formally
second-order and its leading error terms scale as the third derivative,
we have found the second derivative criterion to be very good at detecting
discontinuities in the flow variable u. When
extending this criterion to multidimensions, all cross derivatives are
computed, and the following generalization of the above expresion is
used:
Ei = |
é ê ê ê ê
ê ê ê ê ë
|
|
å
k,l
|
|
|
é ê
ë
|
(\mid |
¶u ¶xk
|
\midi+1 + \mid |
¶u ¶xk
|
\midi)/Dxl + e\mid |
¶2 u ¶xk ¶xl
|
\mid |
ù ú
û
|
2
|
|
ù ú ú ú ú
ú ú ú ú û
|
1/2
|
, |
| (8) |
where the sums are carried out over the coordinate directions. If the
maximum value of Ei in a block is larger than an adjustable
constant, CTORE, the block is marked for refinement. If the maximum
value of Ei
is less than a second constant (CTODE), the block is marked for
derefinement. PARAMESH uses the default values CTORE = 0.8 and
CTODE = 0.2. For each block, the error estimator is calculated for
all interior cells plus two layers of guard cells, helping to ensure that
blocks refine in advance of discontinuities.
2.3 Equation of state
It is common to call an equation of state routine more than 109 times when
calculating two- and three-dimensional hydrodynamic models of stellar
phenomena. Thus, it is very desirable to have an equation of state
that is as efficient as possible, yet accurately represents the
relevant physics. While FLASH is easily capable of including
any equation of state, here we discuss three equation
of state (henceforth EOS) routines which are supplied with FLASH 1.0:
the ideal-gas or gamma-law EOS, the Nadyozhin EOS, and the
Helmholtz EOS.
Each EOS routine takes as input the temperature T, density
r, mean number of nucleons per isotope
[`A], and mean charge per isotope [`Z]. Each
routine then returns as output the scalar pressure (Ptot, specific thermal energy etot, and
entropy Stot.
In addition to these quantities, the EOS routines return
partial derivatives of
the pressure, specific thermal energy, and entropy with respect to the
density and temperature:
|
¶Ptot ¶T
|
|
ê ê
ê
|
r
|
, |
¶Ptot ¶r
|
|
ê ê
ê
|
T
|
, |
¶etot ¶T
|
|
ê ê
ê
|
r
|
, |
¶etot ¶r
|
|
ê ê
ê
|
T
|
, |
¶Stot ¶T
|
|
ê ê
ê
|
r
|
, |
¶Stot ¶r
|
|
ê ê
ê
|
T
|
. |
| (9) |
Quantities such as the specific heats and the adiabatic indices can be
determined once these partial derivatives are known.
When solving the energy equation, the
temperature usually is obtained by iteration, and the thermodynamic
derivatives must be available in order to implement efficient iteration
schemes. Confirming that an EOS routine numerically satisfies thermodynamic
consistency also demands that the derivatives be available.
An EOS is thermodynamically consistent if its outputs satisfy the
Maxwell relations:
|
|
|
r2 |
¶e ¶r
|
|
ê ê
ê
|
T
|
+ T |
¶P ¶T
|
|
ê ê
ê
|
r
|
|
| (10) | |
|
| (11) | |
|
| (12) |
| |
|
Thermodynamic inconsistency can manifest itself
in the unphysical buildup (or decay) of the entropy (or temperature)
during numerical simulations of what should be an adiabatic flow.
Entropy-sensitive models (e.g.,
core-collapse supernovae) may suffer inaccuracies if thermodynamic
consistency is significantly violated over sufficiently long time
scales. The equations of state supplied with FLASH satisfy the
Maxwell relations to a good degree of precision (in some cases to the
limits of 64-bit arithmetic).
The gamma-law EOS provided with FLASH represents a simple ideal
gas with constant adiabatic index g:
P = |
Na k
|
rTe = |
1 g- 1
|
|
P r
|
S = |
P/r+ e T
|
, |
| (13) |
where Na is the Avogadro number, and k is the Boltzmann
constant. Because it requires no iteration in order to obtain
the temperature, this EOS is quite inexpensive to evaluate.
More complex equations of state are necessary for realistic models of
X-ray bursts, Type Ia supernovae, classical novae, and other stellar
phenomena, where the electrons and positrons may be relativistic
and/or degenerate and where
radiation contributes significantly to the thermodynamic state. The Nadyozhin
and Helmholtz EOS routines address this need. The Nadyozhin EOS, summarized by
Nadyozhin (1974) and explained in detail by Blinnikov et al. (1996), was
found by Timmes and Arnett (1999) in a comparison of five different analytic
EOS schemes to give the best tradeoff among accuracy, thermodynamic
consistency, and speed. The Helmholtz EOS (Timmes &
Swesty 1999) uses interpolation from a two-dimensional table of
the Helmholtz free energy F(r,T), obtaining comparable accuracy,
better thermodynamic consistency, and about twice the speed of the
Nadyozhin EOS.
For stellar phenomena, the pressure, specific
thermal energy, and entropy contain contributions from several components:
|
|
|
Prad + Pion + Pele + Ppos |
| (14) | |
|
| (15) | |
|
Srad + Sion + Sele + Spos . |
| (16) |
| |
|
Here the subscripts ``rad,'' ``ion,'' ``ele,'' and ``pos'' represent
radiation, nuclei, electrons, and positrons,
respectively.
The radiation portion is that of a
blackbody in local thermodynamic equilibrium:
Prad = |
a T4 3
|
erad = |
3 Prad r
|
Srad = |
(P/r+ e) T
|
|
| (17) |
where a is the Stefan-Boltzmann energy constant,
and c is the speed of
light. The ion contribution is that of an ideal gas with
g = 5/3. The electrons and positrons are treated as a
non-interacting Fermi gas; the
number densities of free electrons Nele and positrons Npos
are given by
|
|
|
|
8 pÖ2 h3
|
me3 c3 b3/2 [ F1/2(h,b) + F3/2(h,b) ] |
| (18) | |
|
|
8 pÖ2 h3
|
me3 c3 b3/2 [ F1/2 ( -h- 2/b, b) + b F3/2 ( -h- 2 /b, b) ] , |
| (19) |
| |
|
where h is the Planck constant, me is the electron rest mass,
b = k T / (me c2) is the relativity parameter,
h = m/ k T is the normalized chemical potential energy
m for electrons, and Fk(h,b) is the
Fermi-Dirac integral
Fk(h,b) = |
¥ ó õ 0
|
|
xk (1 + 0.5 b x)1/2 dx exp(x - h) + 1
|
. |
| (20) |
Because the electron
rest mass is not included in the chemical potential, the
positron chemical potential must have the form hpos = -h-2/b.
For complete ionization, the number density of free electrons in the
matter is
Nele,matter = |
|
Na r = |
_ Z
|
Nion , |
| (21) |
and charge neutrality requires
Nele,matter = Nele - Npos . |
| (22) |
Solving this equation with a standard one-dimensional root-finding
algorithm determines h.
Once h is known, the Fermi-Dirac integrals can be evaluated,
giving the pressure,
specific thermal energy, and entropy due to the free electrons and
positrons.
The Nadyozhin EOS routine uses polynomial or rational functions to
evaluate the thermodynamic quantities. The temperature-density plane
is decomposed into 5 regions (see Figure 12 of Blinnikov et al. 1996),
with different expansions or fitting functions applied to each
region. The methods used in the 5 regions are (1) a perfect gas
approximation with the first-order corrections for degeneracy, (2)
expansions of the half-integer Fermi-Dirac functions, (3)
Chandrasekhar's (1939) expansion for a degenerate gas, (4)
relativistic asymptotics, and (5) Gaussian quadrature for the
thermodynamic quantities. The perturbation expansions, asymptotic
relations, and fitting functions for the 5 regions are combined, with
extraordinary care given to making sure that transitions between the
regions are continuous, smooth, and thermodynamically consistent. All
of the partial derivatives are obtained analytically.
The electron-positron Helmholtz free energy table used by the
Helmholtz EOS was generated by using the Timmes EOS routine (Timmes
& Arnett 1999). The Fermi-Dirac integrals, along with
their derivatives with respect to h and b, were calculated
to at least 18 significant figures using the quadrature schemes of
Aparicio (1998). Newton-Raphson iteration was used to solve for
h to at least 15 significant
figures. The thermodynamic derivatives were computed analytically.
Searches
through the free energy
table are avoided by computing hash indices from
the values of any given (T,r[`Z]/[`A]) pair.
No computationally expensive divisions are required in
interpolating from the table; all of them can be computed
and stored
the first time the EOS routine is called.
2.4 Thermonuclear reactions
Modelling thermonuclear flashes typically requires the energy generation rate
due to
nuclear burning over a large range of temperatures, densities and
compositions. The average energy generated or lost over a period of
time is found by integrating a system of ordinary differential equations
(the nuclear reaction network)
for the abundances of important nuclei and the total energy release.
In some contexts,
such as Type II supernova models, the abundances themselves
are also of interest. In either case, the coefficients that appear in the
equations typically are extremely sensitive to temperature.
The resulting stiffness of the system of equations requires the use of
an implicit time integration scheme.
Timmes (1999) has reviewed several methods for solving stiff nuclear
reaction networks, providing the basis for the reaction network
solver included with FLASH.
The scaling properties and behavior of three semi-implicit
time integration algorithms (a traditional first-order accurate Euler
method, a fourth-order accurate Kaps-Rentrop method, and a variable
order Bader-Deuflhard method) and eight linear algebra packages (LAPACK,
LUDCMP, LEQS, GIFT, MA28, UMFPACK, and Y12M) were investigated by
running each of these 24 combinations on seven different nuclear reaction
networks (hard-wired 13- and 19-isotope
networks and table-lookup networks using 47, 76, 127, 200, and 489 isotopes).
Timmes' analysis suggested
that the best balance of accuracy, overall efficiency, memory
footprint, and ease-of-use was provided by the choice of
a 13- or 19-isotope reaction network,
the variable order Bader-Deuflhard time integration method, and
the MA28 sparse matrix package.
The default FLASH distribution uses this combination of methods,
which we describe in this section.
We begin by describing the equations solved by the nuclear burning
module. We consider material which may be described by a density
r and a single
temperature T and contains a number of isotopes i, each of
which has Zi protons and Ai nucleons (protons +
neutrons). Let ni and ri
denote the number and mass density, respectively, of the ith isotope,
and let Xi denote its mass fraction, so that
Xi = ri/r = ni Ai/(rNA) , |
| (23) |
where NA is the Avogadro number.
Let the molar abundance of the ith isotope be
Mass conservation is then expressed by
At the end of each timestep, FLASH checks that the stored abundances
satisfy equation (25) to machine precision
in order to avoid the
unphysical buildup (or decay) of the abundances or energy generation
rate. Roundoff errors in this equation can lead to significant problems
in some contexts (e.g., classical nova envelopes) where trace abundances
are important.
The general continuity equation for the ith isotope is given in
Lagrangian formulation by
|
dYi dt
|
+ Ñ·( Yi Vi ) = |
. Ri
|
. |
| (26) |
In this equation
[(Ri)\dot] is the total reaction rate due to
all binary reactions of the form i(j,k)l,
|
. Ri
|
= |
å
j,k
|
Yl Yk lkj(l) - Yi Yj ljk(i) , |
| (27) |
where lkj and ljk are the reverse (creation)
and forward (destruction) nuclear reaction rates, respectively.
Contributions from three-body reactions, such as the triple-a
reaction, are easy to append to equation (27).
The mass diffusion velocities
Vi in equation (26)
are obtained from the solution of a
multicomponent diffusion equation (Chapman & Cowling 1970; Burgers
1969; Williams 1988) and reflect the fact that mass diffusion
processes arise from pressure, temperature, and/or abundance gradients as
well as external gravitational or electrical forces.
The case Vi º 0 is important for two reasons. First, mass
diffusion is often unimportant
when compared to other transport process such as thermal or
viscous diffusion (i.e., large Lewis numbers and/or small Prandtl
numbers). Such a situation obtains, for example, in the study of
laminar flame fronts propagating through the quiescent interior of a
white dwarf. Second, this case permits the decoupling of the
reaction network solver from the hydrodynamical solver through
the use of operator splitting, greatly simplifying the algorithm.
This is the method used by the default FLASH distribution.
Setting Vi º 0 transforms equation (26)
into
which may be written in the more compact and standard form
Stated another way, in the absence of mass diffusion or advection,
any changes to the fluid composition are due to local processes.
Because of the highly nonlinear temperature dependence of
the nuclear reaction rates, and because the abundances themselves
often range over several orders of magnitude in value,
the values of the coefficients which appear in equations
(28) and (29)
can vary quite significantly.
As a result, the nuclear reaction network
equations are ``stiff.'' A system of equations is stiff when the ratio
of the maximum to the minimum eigenvalue of the Jacobian matrix
[(J)\tilde] º ¶f/¶y is large and
imaginary. This means that at least
one of the isotopic abundances changes on a much shorter timescale
than another.
Implicit or semi-implicit time integration methods are generally necessary
to avoid following this short-timescale behavior, requiring the calculation
of the Jacobian matrix.
It is instructive at this point to look at an example of how
equation (28)
and the associated Jacobian matrix are formed. Consider the
12C(a,g)16O reaction, which competes with the
triple-a reaction during helium burning in stars. The rate R at
which this reaction proceeds is critical for
evolutionary models of massive stars since it determines how much of
the core is carbon and how much of the core is oxygen after the
initial helium fuel is exhausted. This reaction sequence contributes
to the right-hand side equation (29) through the terms
where the ellipsis indicate additional terms coming from other
reaction sequences. The minus signs indicate that helium and carbon are
being destroyed, while the plus sign indicates that oxygen is being
created. Each of these three expressions contributes two
terms to the Jacobian matrix
[(J)\tilde]=¶f/¶y:
|
J(4He,4He) = - Y(12C) R + ¼ |
|
J(4He,12C) = - Y(4He) R + ¼ |
|
J(12C,4He) = - Y(12C) R + ¼ |
|
J(12C,12C) = - Y(4He) R + ¼ |
|
| (31) |
J(16O,4He) = + Y(12C) R + ¼ |
|
J(16O,12C) = + Y(4He) R + ¼ |
|
| |
|
Entries in the Jacobian matrix represent the flow, in number of nuclei
s-1, into (positive) or out of (negative) an isotope. All of the
temperature and density dependence is included in the reaction
rate R. The Jacobian matrices that arise from nuclear reaction
networks are neither positive-definite nor symmetric since the forward and
reverse reaction rates are generally not equal. However, the
magnitudes of the matrix entries change as the abundances,
temperature, or density change with time.
The default FLASH distribution includes two reaction networks.
The 13-isotope a-chain
plus heavy-ion reaction network is suitable for most
multi-dimensional simulations of stellar phenomena where having a
reasonably accurate energy generation rate is of primary concern.
The 19-isotope reaction network has the same
a-chain and heavy-ion reactions as the 13-isotope
network, but it includes additional isotopes to accommodate some types of
hydrogen burning (PP and CNO chains), along with some aspects of
photodisintegration into 54Fe. This reaction network is described
in Weaver, Zimmerman, & Woosley (1978).
Both the networks supplied with FLASH are
examples of ``hard-wired'' reaction networks, where each of the
reaction sequences are carefully entered by hand.
This approach is suitable for small
networks when minimizing the CPU time required to run the reaction
network is a primary concern, although it suffers the disadvantage
of inflexibility.
The Jacobian matrices of nuclear reaction networks tend to be
sparse, and they become more sparse as the number of isotopes increases.
Implicit time integration schemes require the inverse of the
Jacobian matrix, so we need to use a sparse linear algebra package
to compute this inverse.
To control the iteration count for a prespecified accuracy,
FLASH uses a direct sparse matrix solver.
Direct methods typically divide the solution of
[(A)\tilde] ·x = b into a symbolic LU
decomposition, numerical LU decomposition, and a backsubstitution
phase. In the symbolic LU decomposition phase the pivot order of a
matrix is determined, and a sequence of decomposition operations which
minimize the amount of fill-in is recorded. Fill-in refers to zero
matrix elements which become nonzero (e.g., a sparse matrix times a
sparse matrix is generally a denser matrix). The matrix is not
(usually) decomposed; only the steps to do so are stored. Since the
nonzero pattern of a chosen nuclear reaction network does not change,
the symbolic LU decomposition is a one-time initialization cost for
reaction networks. In the numerical LU decomposition phase, a matrix
with the same pivot order and nonzero pattern as a previously
factorized matrix is numerically decomposed into its lower-upper form.
This phase must be done only once for each staged set of linear
equations. In the backsubstitution phase, a set of linear equations
is solved with the factors calculated from a previous numerical
decomposition. The backsubstitution phase may be performed with as
many right-hand sides as needed, and not all of the right-hand sides
need to be known in advance.
The MA28 linear algebra package used by FLASH is
described by Duff, Erisman, & Reid (1986). It uses a combination of
nested dissection and frontal envelope decomposition to minimize
fill-in during the factorization stage. An approximate degree update
algorithm that is much faster (asymptotically and in practice) than
computing the exact degrees is employed. One continuous real
parameter sets the amount of searching done to locate the pivot
element. When this parameter is set to zero, no searching is done and
the diagonal element is the pivot, while when set to unity, complete
partial pivoting is done. Since the matrices generated by reaction
networks are usually diagonally dominant, the routine is set in FLASH
to use the diagonal as the pivot element. Several test cases showed
that using partial pivoting did not make a significant accuracy
difference, but were less efficient since a search for an appropriate
pivot element had to be performed. MA28 accepts the nonzero entries
of the matrix in the (i, j, ai,j) coordinate system, and
typically uses uses 70-90% less storage than storing the full dense
matrix.
The time integration method used by FLASH for evolving the reaction
networks is the variable order Bader-Deuflhard method
(e.g., Bader & Deuflhard 1983; Press et al. 1992). The reaction
network is advanced over a large time step H
from yn to yn+1 by the following sequence of
matrix equations. First,
Then from k = 1,2,¼,m-1
and closure is obtained by the last stage
This staged sequence of matrix equations is executed at least twice
with m=2 and m=6, yielding a fifth-order method. The
sequence may be executed a maximum of seven times, which yields a
fifteenth-order method. The exact number of times the staged sequence
is executed depends on the accuracy requirements (set to one part in
106 in FLASH) and the smoothness of the solution. Estimates of the
accuracy of an integration step are made by comparing the solutions
derived from different orders. The minimum cost of this method -
which applies for a single time step that met or exceeded the
specified integration accuracy - is one Jacobian evaluation, eight
evaluations of the right-hand side, two matrix decompositions, and ten
backsubstitutions. This minimum cost can be increased at a rate of
one decomposition (the expensive part) and m backsubstitutions (the
inexpensive part) for every increase in the order 2k+1. The cost of
increasing the order is compensated for, hopefully, by taking a
correspondingly larger (but accurate) time step. The controls for
order versus step size are a built-in part of the Bader-Deuflhard
method. The cost per step of this integration method is at least
twice as large as the cost per step of either a traditional
first-order accurate Euler method or a fourth-order accurate
Kaps-Rentrop (essentially an implicit Runge-Kutta) method, However, if
the Bader-Deuflhard method can take accurate time steps that are at
least twice as large, then this method will be more efficient
globally. Timmes (1999) shows that this is typically the case. Note
that in equations (32) - (34)
not all of the right-hand sides are known in advance
since the sequence of linear equations is staged. This staging feature
of the integration method will make some matrix packages, such as
MA28, a more efficient choice.
The energy generation rate is given by the sum
Note that a nuclear reaction network does not need to be evolved in
order to obtain the instantaneous energy generation rate, since only the
right hand sides of the ordinary differential equations need to be
evaluated. It is more appropriate in the FLASH program to use the
average nuclear energy generated over a time step
In this case, the nuclear reaction network does need to be evolved.
The energy generation rate, after subtraction of any neutrino losses,
is returned to the FLASH program for use with the operator splitting
technique.
Finally, the tabulation of Caughlan & Fowler (1988) is used in FLASH
for most of the key nuclear reaction rates. Modern values for some of the
reaction rates were taken from the reaction rate library of Hoffman
(1999, priv. comm.). Nuclear reaction rate screening effects as
implemented by Wallace, Woosley, & Weaver (1982), and decreases in
the energy generation rate [(e)\dot]nuc due to neutrino
losses as given by Itoh et al. (1985) are included in calculations
done with FLASH.
Go on to Section 3.