Subsections


17.4 Energy Deposition Unit

The Energy Deposition unit calculates the energy deposited by one or more laser beams incident upon the simulation domain. The function EnergyDeposition is the main control driver of the Energy Deposition unit. The current implementation treats the laser beams in the geometric optics approximation. Beams are made of a number of rays whose paths are traced through the domain based on the local refractive index of each cell. The laser power deposited in a cell is calculated based on the inverse Bremsstrahlung power in the cell and depends on the local electron number density gradient and local electron temperature gradient. Currently there are two schemes implemented into FLASH: 1) a ray tracing algorithm based on cell average quantities and 2) a more refined ray tracing method based on cubic interpolation. Both schemes are discussed in the algorithmic implementation section 17.4.4 below.


17.4.1 Ray Tracing in the Geometric Optics Limit

In the geometric optics approach, the path of a laser wave can be described as the motion of a ray of unit-mass through the potential field

$\displaystyle V({\bf r})$ $\displaystyle =$ $\displaystyle \frac{c^2}{2}\eta({\bf r})^2,$ (17.33)

where $ c$ is the speed of light in vacuum and $ \eta$ is the index of refraction of the medium, assumed to vary on a much longer spatial scale than the wavelength of the laser wave. Also $ \eta({\bf r})$ is considered constant during the time it takes for the ray to cross the domain (frozen medium). However $ \eta({\bf r})$ is allowed to vary from one time step to the next. For a non-relativistic unmagnetized plasma, the refractive index is given by
$\displaystyle \eta({\bf r})^2$ $\displaystyle =$ $\displaystyle 1 - \frac{\omega_p^2({\bf r})}{\omega^2} = 1 - \frac{n_e({\bf r})}{n_c},$ (17.34)

where $ \omega_p({\bf r})$ is the plasma frequency, $ \omega$ the laser frequency, $ n_e({\bf r})$ is the electron number density at location $ {\bf r}$ and
$\displaystyle n_c$ $\displaystyle =$ $\displaystyle \frac{m_e\pi\omega^2}{e^2} = \frac{m_e\pi c^2}{\lambda^2e^2}$ (17.35)

is the critical density at which the ray frequency and the plasma frequency are equal ($ m_e$ and $ e$ are the electron mass and charge, $ \lambda$ is the laser wavelength). The ray equation of motion is then
$\displaystyle \frac{d^2{\bf r}}{dt^2}$ $\displaystyle =$ $\displaystyle \nabla \left(-\frac{c^2}{2}\frac{n_e({\bf r})}{n_c} \right),$ (17.36)

which constitutes the basic ray tracing 2nd order ODE equation. Splitting into two 1st order ODEs leads to
$\displaystyle \frac{d{\bf r}}{dt}$ $\displaystyle =$ $\displaystyle {\bf v}$ (17.37)
$\displaystyle \frac{d{\bf v}}{dt}$ $\displaystyle =$ $\displaystyle -\frac{c^2\nabla n_e({\bf r})}{2n_c}.$ (17.38)

For short distances we can assume a first order Taylor expansion of the electron number density around a specific location $ {\bf r}_0$:
$\displaystyle n_e({\bf r})$ $\displaystyle \approx$ $\displaystyle n_e({\bf r}_0) + \nabla n_e({\bf r}_0)
\cdot ({\bf r} - {\bf r}_0),$ (17.39)

where $ \nabla n_e({\bf r}_0)$ is the electron number density gradient vector at $ {\bf r}_0$. A consequence of this short distance approximation is that the electron number density gradient vectors at $ {\bf r}$ and $ {\bf r}_0$ are equal. For separable 1st order ODEs (cartesian) we can write the following equations for the ray velocity and position as a function of time
$\displaystyle {\bf v}(t) = {\bf v}_0 - \frac{c^2}{2 n_c} \nabla n_e({\bf r}_0) t,$     (17.40)
$\displaystyle {\bf r}(t) = {\bf r}_0 + {\bf v}_0 t
- \frac{c^2}{4 n_c} \nabla n_e({\bf r}_0) t^2,$     (17.41)

where $ {\bf r}_0$ and $ {\bf v}_0$ are the initial ray position and velocity. For non-separable 1st order ODEs, like for example cylindrical or spherical coordinates (see below), we cannot state analytical equations for the ray velocity and position as a function of time, and we must solve the set of 1st order ODEs using specific numerical methods like for example Runge-Kutta (RK).


17.4.2 Laser Power Deposition

The power $ P$ of an electromagnetic wave is depleted by the inverse Bremsstrahlung (ib) process. The rate of power loss is governed by a 1st order ordinary differential equation (ODE):

$\displaystyle \frac{dP}{dt}$ $\displaystyle =$ $\displaystyle -\nu_{ib}(t)P.$ (17.42)

The inverse Bremsstrahlung frequency factor is given by the formula
$\displaystyle \nu_{ib}$ $\displaystyle =$ $\displaystyle \frac{n_e}{n_c}\nu_{ei},$ (17.43)

where $ \nu_{ei}$ is the electron-ion collision frequency
$\displaystyle \nu_{ei}$ $\displaystyle =$ $\displaystyle \frac{4}{3} \left (\frac{2 \pi}{m_e} \right)^{1/2}
\frac{n_e Z e^4 \ln \Lambda}{(k_BT_e)^{3/2}}.$ (17.44)

Here $ m_e$ is the electron mass, $ Z$ is the average ionization number of the plasma, $ e$ is the electron charge, $ \ln \Lambda$ is the Coulomb logarithm, $ k_B$ is the Boltzmann constant and $ T_e$ is the electron temperature. The Coulomb logarithm is the natural logarithm of the Debye number $ \Lambda$ and is taken here as
$\displaystyle \ln \Lambda$ $\displaystyle =$ $\displaystyle \ln \left[\frac{3}{2Ze^3}\left(\frac{k_B^3T_e^3}{\pi n_e}\right)^{1/2}\right].$ (17.45)

The inverse Bremsstrahlung frequency depends thus on the electron number density and the electron temperature, both of which are functions of the position, and, since the position changes with time, it ultimately is also a function of time
$\displaystyle \nu_{ib}({\bf r}) = \nu_{ib}(t)$ $\displaystyle =$ $\displaystyle \frac{4}{3} \left (\frac{2 \pi}{m_e} \right)^{1/2} \frac{Z e^4}{n...
...^{3/2}}
\frac{n_e[{\bf r}(t)]^2\ln \Lambda[{\bf r}(t)]}{T_e[{\bf r}(t)]^{3/2}}.$ (17.46)

Solution of the above ODE in Eq.(17.42) gives the attenuation of the ray's power from time zero to time $ t$:
$\displaystyle P_t$ $\displaystyle =$ $\displaystyle P_0 \exp \left\{ - \int_0^t \nu_{ib}(t') dt' \right\}.$ (17.47)

For a given small time step, the integral in Eq.(17.47) can be approximated in the following way. In order to remove the intermediate function $ {\bf r}(t)$ from the expression in Eq.(17.46), we first assume that the Coulomb logarithm remains constant during the incremental time step
$\displaystyle \ln \Lambda[{\bf r}(t)]$ $\displaystyle \approx$ $\displaystyle \ln \Lambda[{\bf r}_0] =
\ln \left[\frac{3}{2Ze^3}\left(\frac{k^3T_e({\bf r}_0)^3}{\pi n_e({\bf r}_0)}\right)^{1/2}\right].$ (17.48)

Using first order Taylor expansions on both $ n_e$ and $ T_e$ similar to equation 17.39
$\displaystyle n_e[{\bf r}(t)]$ $\displaystyle =$ $\displaystyle n_e({\bf r}_0) + \nabla n_e({\bf r}_0) \cdot ({\bf r}(t) - {\bf r}_0)$ (17.49)
$\displaystyle T_e[{\bf r}(t)]$ $\displaystyle =$ $\displaystyle T_e({\bf r}_0) + \nabla T_e({\bf r}_0) \cdot ({\bf r}(t) - {\bf r}_0)$ (17.50)

and inserting the ray position equation 17.41, we get
$\displaystyle n_e[{\bf r}(t)]$ $\displaystyle =$ $\displaystyle n_e({\bf r}_0) \left[1 + \frac{\nabla n_e({\bf r}_0) \cdot {\bf v...
...bla n_e({\bf r}_0) \cdot \nabla n_e({\bf r}_0)}{4n_cn_e({\bf r}_0)} t^2 \right]$ (17.51)
$\displaystyle T_e[{\bf r}(t)]$ $\displaystyle =$ $\displaystyle T_e({\bf r}_0) \left[1 + \frac{\nabla T_e({\bf r}_0) \cdot {\bf v...
...la T_e({\bf r}_0) \cdot \nabla n_e({\bf r}_0)}{4n_cT_e({\bf r}_0)} t^2 \right].$ (17.52)

The inverse-Bremsstrahlung rate can thus be written as a rational polynomial expression
$\displaystyle \nu_{ib}(t)$ $\displaystyle =$ $\displaystyle \nu_{ib}(0)\frac{(1+Ut+Rt^2)^2}{(1+Wt+St^2)^{3/2}},$ (17.53)

where $ \nu_{ib,0}$ is the inverse-Bremsstrahlung rate at the initial point (zero time)
$\displaystyle \nu_{ib}(0)$ $\displaystyle =$ $\displaystyle \frac{4}{3} \left (\frac{2 \pi}{m_e} \right)^{1/2}
\frac{Z e^4 \ln \Lambda[{\bf r}_0]}{n_ck^{3/2}}
\frac{n_e({\bf r}_0)^2}{T_e({\bf r}_0)^{3/2}}$ (17.54)

and
$\displaystyle U$ $\displaystyle =$ $\displaystyle \frac{\nabla n_e({\bf r}_0) \cdot {\bf v}_0}{n_e({\bf r}_0)}$ (17.55)
$\displaystyle W$ $\displaystyle =$ $\displaystyle \frac{\nabla T_e({\bf r}_0) \cdot {\bf v}_0}{T_e({\bf r}_0)}$ (17.56)
$\displaystyle R$ $\displaystyle =$ $\displaystyle - \frac{c^2\nabla n_e({\bf r}_0) \cdot \nabla n_e({\bf r}_0)}{4n_cn_e({\bf r}_0)}$ (17.57)
$\displaystyle S$ $\displaystyle =$ $\displaystyle - \frac{c^2\nabla T_e({\bf r}_0) \cdot \nabla n_e({\bf r}_0)}{4n_cT_e({\bf r}_0)}.$ (17.58)

The integral in Eq.(17.47) can be solved by using a second order Gaussian quadrature
$\displaystyle \int_0^t \nu_{ib}(t') dt'$ $\displaystyle =$ $\displaystyle \nu_{ib}(0) \frac{t}{2} \sum_{i=1}^2 w_i
\frac{(1+Ut_i+Rt_i^2)^2}{(1+Wt_i+St_i^2)^{3/2}},$ (17.59)

where $ t_{1,2}=(1\pm 1/\sqrt{3})t/2$ and both weights are equal to 1. The rate of energy deposition (energy deposited per unit time) during this time is then $ P_0 - P_t$.


17.4.3 Laser Energy Density

Implemented only for 3D Cartesian geometry and 2D cylindrical geometry with ed_laser3Din2D ray tracing in FLASH 4.5.

Light carries energy through space and therefore can be attributed an energy density (energy per volume). For example, light propagating at velocity $ c$ with power $ P$ through an area $ A$ has an energy density of $ U_{light} = \frac{P}{A c}$.

Splitting the laser's power across a finite number of rays is an abstraction with an interesting consequence. Because each ray's power is artificially confined to a 1D curvilinear axis, light rays in FLASH are naturally described by linear energy densities (energy per distance) rather than volumetric energy densities (energy per volume). Volumetric energy density is more physically relevant. Therefore, the contribution of each ray to volumetric laser energy density (a cell quantity, “lase”, available as an output when using certain ray-tracing modules - see 17.4.11.5) is calculated by taking the linear energy density of each ray, integrating over the ray path within the cell, and dividing by the cell volume.

The details of the laser energy density calculation are as follows. Trading path length for parametrized time, ray power $ P_t$ changes along the ray path by Eq.(17.47). For light traveling at speed $ c$ (determined by frequency and medium), ray linear energy density $ E_t$ similarly varies along ray path:


$\displaystyle E_t = \frac{P_t}{c}$     (17.60)

Volumetric energy density $ U_{cell}$ is found by integrating linear energy density $ E_t$ over the ray's path within the cell ($ t_f$ is the parametrized time value for which the ray reaches edge of cell), then dividing by cell volume $ V_{cell}$:


$\displaystyle U_{cell} = \frac{\int_{0}^{t_f} E_t dt}{V_{cell}} = \frac{1}{c V_{cell}} \int_{0}^{t_f} P_t dt$     (17.61)

For each cell, all contributions to volumetric laser energy density (all rays' $ U_{cell}$) are summed to give a total laser energy density for that cell.


17.4.4 Algorithmic Implementations of the Ray Tracing

The current implementation of the laser energy deposition assumes that rays transit the entire domain in a single time step. This is equivalent of saying that the domain's reaction time (changes of its state variables) is much longer than the time it takes for each ray to either cross the domain or get absorbed by it. For each time step it is first checked, if at least one of the laser beams is still active. If this is the case, then for each active beam a collection of rays is generated possessing the right specifications according to the beam's direction of firing, its frequency, its pulse shape and its power. The Energy Deposition unit moves all rays across all blocks until every ray has either exited the domain or has been absorbed by the domain. The rays are moved on a block by block basis and for each block the rays are traced through the interior cells. The Energy Deposition unit utilizes the infrastructure from the Grid Particles unit 8.9 to move rays between blocks and processors. The code structure of the Energy Deposition unit during one time step is as follows:
Loop over all blocks containing rays Calculate needed block data for all cells.

The inner loop over all rays in a block is conveniently handled in one subroutine to allow for compact (optional) threading. Currently there are three algorithmic options on how to trace the rays in FLASH: 1) Cell average algorithm, 2) Cubic interpolation with piecewise parabolic ray tracing and 3) Cubic interpolation using Runge Kutta integration schemes.


17.4.4.1 Cell Average (AVG) Algorithm

The AVG algorithm (Kaiser 2000) is based on tracing the rays on a cell-by-cell basis. Each cell has its own average center electron number density $ \langle n_e \rangle$ and electron temperature $ \langle T_e \rangle$ value at the center of the cell. The electron number density gradient vector $ \langle \nabla n_e \rangle$ as well as the electron temperature gradient vector $ \langle \nabla T_e \rangle$ are assumed to be constant within each cell. Rays will be transported through each cell between cell faces in one step. Electron number densities $ n_{e0}$ and electron temperatures $ T_{e0}$ on the entry cell face are calculated from the cell center values in accordance with equation 17.39 using first order Taylor expansions

$\displaystyle n_{e0}$ $\displaystyle =$ $\displaystyle \langle n_e \rangle + \langle \nabla n_e \rangle \cdot ({\bf r}_0 - \langle {\bf r} \rangle)$ (17.62)
$\displaystyle T_{e0}$ $\displaystyle =$ $\displaystyle \langle T_e \rangle + \langle \nabla T_e \rangle \cdot ({\bf r}_0 - \langle {\bf r} \rangle)
,$ (17.63)

where $ {\bf r}_0$ are the position coordinates of the ray at the cell's face and $ \langle {\bf r} \rangle$ are the coordinates of the cell's center. The ray's cell crossing time $ t_{cross}$ is determined from the quadratic time equation 17.41 by inserting planar equations $ A_xx+A_yy+A_zz+D={\bf A}\cdot {\bf r}(t)+D=0$ for all cell faces (in cartesian coordinates), leading to
$\displaystyle -\frac{c^2{\bf A}\cdot\langle\nabla n_e\rangle}{4n_c}t^2 + {\bf A}\cdot {\bf v}_0t
+ {\bf A}\cdot {\bf r}_0+D$ $\displaystyle =$ $\displaystyle 0,$ (17.64)

and selecting the shortest time (excluding the zero time corresponding to the point of entry). In FLASH, the cartesian cell faces are coplanar with the xy-, xz- and the yz-plane, simplifying the quadratic time equation. A cell face located at $ x_{cell}$ and coplanar with the yz-plane has the plane equation $ x=x_{cell}$ and thus $ A_x=1$, $ A_y,A_z=0$ and $ D=-x_{cell}$. For this cell face, the quadratic time equation becomes
$\displaystyle -\frac{c^2{\langle\nabla n_e\rangle}_x}{4n_c}t^2 + v_{0x}t + r_{0x}-x_{cell}$ $\displaystyle =$ 0 (17.65)

and similar for the other cartesian components. In order to achieve a stable ray tracing algorithm and uniquely assign rays to cells and blocks while crossing through cell walls, a cell wall thickness is defined, which is shared between two adjacent cells. The code is set up such that rays are never to be found inside this wall. When crossing between two cells occurs (same block or different blocks), the rays positions are always adjusted (nudged) such that they will sit on this wall with finite thickness. The cell wall thickness is defined in the code as being a very tiny fraction (adjustable, with default value of 1:1,000,000) of the smallest cell dimension during a simulation.

The ray's rate of energy deposition between the cell entry face and the cell exit face is calculated using equations 17.47 to 17.59 with $ \nabla n_e({\bf r}_0)$ and $ \nabla T_e({\bf r}_0)$ replaced by their cell average values $ \langle \nabla n_e \rangle$ and $ \langle \nabla T_e \rangle$ and $ n_e({\bf r}_0)$ and $ T_e({\bf r}_0)$ replaced by their corresponding cell entry values $ n_{e0}$ and $ T_{e0}$. The upper integration time limit is $ t_{cross}$, the ray's cell crossing time.

The AVG algorithm, while conceptually simple, leads to discontinous change in $ n_e$ at the cell interfaces. To account for this, the ray undergoes refraction according to Snell's law. Refraction at cell interfaces causes a change in the ray's velocity normal to the interface surface while preserving the ray velocity component transverse to the interface normal. To derive the change in the ray normal velocity component, imagine the cell interface to have a small infinitesimal thickness $ s$ and the ray moving from left to right through the interface. On the left and right the normal velocity components are $ v_\bot(\ell)$ and $ v_\bot(r)$, respectively, while the corresponding $ n_e$ are $ n_e(\ell)$ and $ n_e(r)$. Since we are dealing with an interface of infinitesimal thickness, we can use the first order equation 17.40 to get

$\displaystyle v_\bot(r)$ $\displaystyle =$ $\displaystyle v_\bot(\ell) - \frac{c^2}{2 n_c} \frac{\Delta n_e}{s} t,$ (17.66)

where $ \Delta n_e=n_e(r)-n_e(\ell)$, and the electron number density gradient is $ \nabla n_e=\Delta n_e/s$. But $ s/t$ is the average velocity of the ray passing through the interface and since we have constant acceleration we have $ s/t=[v_\bot(r)+v_\bot(\ell)]/2$, which, when inserted into the last equation, gives
$\displaystyle v_\bot(r) - v_\bot(\ell)$ $\displaystyle =$ $\displaystyle - \frac{c^2}{n_c} \frac{\Delta n_e}{[v_\bot(r)+v_\bot(\ell)]}$  
$\displaystyle v_\bot^2(r) - v_\bot^2(\ell)$ $\displaystyle =$ $\displaystyle - \frac{c^2\Delta n_e}{n_c}$  
$\displaystyle \Delta v_\bot^2$ $\displaystyle =$ $\displaystyle - \frac{c^2\Delta n_e}{n_c}.$ (17.67)

Clearly there is a limit as to how large $ \Delta n_e$ can be for refraction to occur. If $ \Delta n_e> v_\bot^2(\ell)n_c/c^2$, we have $ v_\bot^2(r)<0$, which is impossible. The only way out is then for the ray to have $ \Delta n_e=0$, i.e. the ray stays in the same cell and reflects with $ v_\bot(r) = - v_\bot(\ell)$.

The basic algorithmic steps for tracing a single ray through all cells in one block (see Figure 17.4) can be summarized as follows:

Figure 17.4: A single ray crossing a cell.
Image LaserRayPath
The initial situation has the ray positioned on one of the block's faces with velocity components such that the ray's direction of movement is into the block.


17.4.4.2 Cubic Interpolation with Piecewise Parabolic Ray Tracing (CIPPRT)

The use of cubic interpolation schemes is an attempt at providing continuous $ n_e$ and $ T_e$ representations as well as continuous first derivatives $ \nabla n_e$ and $ \nabla T_e$ throughout the entire domain. This allows for the calculation of a more smoother path for each ray inside each cell when compared to the AVG algorithm. In effect, the cell-by-cell choppiness of the AVG algorithm can be avoided by taking many small ray steps inside each cell. Also, the troublesome $ n_e$ discontinuities at the cell boundaries and the application of Snell's law will disappear. The essential features of the cubic interpolation schemes are layed out in section 31.2. In what follows we show the piecewise parabolic ray tracing scheme.

The CIPPRT algorithm tries to map out the ray path inside each cell as a sequence of parabolic (i.e. constant acceleration) paths. It resembles the AVG algorithm and uses the same equations Eq.(17.40) and Eq.(17.41) for determination of velocity and position of each parabolic section of the path. The main difference between the AVG and the CIPPRT is that the latter uses the cubic interpolated ray acceleration field at each cell point, whereas the former assumes the same constant acceleration at each point throughout the cell. The CIPPRT can hence be viewed as a succession of AVG steps inside each cell. The energy deposition of each parabolic path section is also calculated like for the AVG algorithm, namely solving the exponential time integral in Eq.(17.47) using the second order Gaussian quadrature in Eq.(17.59). The required $ n_e$ and $ T_e$ at the Gaussian quadrature points are evaluated using the cubic interpolation equation Eq.(31.11). The following scheme illustrates the steps involved for tracing a ray through a cell, assuming the ray is somewhere located in the cell at $ {\bf r}_0$ with velocity $ {\bf v}_0$:

  1. Calculate, using Eq.(31.13), the electron number density gradient $ \nabla n_e({\bf r}_0)$.
  2. Determine the time $ t$ to reach the next cell face using the quadratic time equation Eq.(17.41) and determine the ray's position $ {\bf r}(t)$ on the face.
  3. Do two $ t/2$ steps and calculate the $ {\bf r}(2\times t/2)$ position. The first $ t/2$ step is guaranteed to stay inside the cell. Hence $ {\bf r}(t/2)\in$ cell. However, despite $ {\bf r}(2\times t/2)$ having the potential to step outside the cell, we only need it for comparison with $ {\bf r}(t)$.
  4. Form the error vector $ {\bf e}={\bf r}(2\times t/2)-{\bf r}(t)$. If $ \vert{\bf e}\vert\leq$ target accuracy, accept the $ {\bf r}(t)$ position. If not, set $ t=t/2$ and repeat step 2).
  5. Once a satisfactory stepping time has been determined, update the velocity using Eq.(17.40).


17.4.4.3 Cubic Interpolation with Runge Kutta Integration (CIRK)

Instead of using the piecewise parabolic ray tracing approach, the CIRK algorithm advances the ray through cells using Runge Kutta (RK) integration 33.2. An advantage of using the RK integrator is that the rate of power loss ODE 17.42 can be directly incorporated into the RK solution vector, thereby gaining access over its error control. The ray tracing RK vector has 7 entries

$\displaystyle \frac{d}{dt}
\left(\begin{array}{c}{\bf r} {\bf v} P\end{array}\right)$ $\displaystyle =$ $\displaystyle \left(\begin{array}{c}{\bf v} {\bf a} -\nu_{ib}P\end{array}\r...
...{\bf v} -c^2\nabla n_e({\bf r})/2n_c -\nu_{ib}({\bf r})P\end{array}\right),$ (17.68)

and the independent variable $ t$ does not appear on the rhs. Since each ray must be traced on a cell-by-cell basis, we have to make sure that each RK step either stays within the cell or hits one of its walls. The CIRK algorithm therefore has to use the confined RK stepping routine.


17.4.5 Setting up the Laser Pulse

The laser's pulse contains information about the laser's energy profile and can thus be characterized by a function of laser power in terms of time $ P(t)$. The curve of $ P(t)$ gives the power of the laser pulse at any given simulation time. In FLASH, the form of $ P(t)$ is given by a series of points, which are connected via straight lines (see Figure 17.5).

Figure 17.5: The structure of the laser pulse.
Image LaserPulse
Each point denotes a power/time pair $ P_i(t_i)$. The average laser power $ {\overline P}$ of the pulse during a time step from $ t\rightarrow t+\Delta t$ is then
$\displaystyle {\overline P}$ $\displaystyle =$ $\displaystyle \frac{\int_t^{t+\Delta t}P(t) dt}{\Delta t},$ (17.69)

where the integration is performed piecewise between all the points $ P_i(t_i)$ contained within the relevant time frame. The average laser power will be distributed among all the rays that will be created during that particular time step. Note, that $ t_1$ is the time the laser first comes into existence. In Figure 17.5 there is therefore no line connecting the origin (zero power) with $ P_1$. If the user wants a gradual buildup of laser power from zero, an explicit point $ P_1=0$ must be given at time $ t_1$. The same consideration applies for the other end of the timescale.


17.4.6 Setting up the Laser Beam

The laser's beam contains all the information about orientation and shape of the laser. The orientation is characterized by specifying lens and target coordinates. The shape of the laser is given by the size and shape of the lens and target cross section areas and the cross-sectional ray power distribution law. Figure 17.6 is helpful in visualizing the position of the vectors for the formulas below.

Figure 17.6: The laser beam.
Image LaserBeam
The most important vectors for setting up the rays are the two local elliptical semiaxes unit vectors $ {\bf u}_1$ and $ {\bf u}_2$. These two unit vectors, together with the lens and target positions, are very convenient in calculating the rays lens and target coordinates. Note, that the unit vectors are the same for both the lens and the target, if defined from their corresponding local elliptical centers. In what follows, vectors in capital letters are defined in the global coordinate system and vectors in small letters are defined in the local target coordinate system.

17.4.6.1 The Local Elliptical Semiaxis Unit Vectors

The laser beam originates at the lens, whose center is located at $ {\bf L}$, and hits the target, whose center is located at $ {\bf T}$. In 3D geometries, the elliptical target area is defined as an ellipse perpendicular to the lens-target line and whose largest semiaxis $ {\bf s}_1$ is positioned such that it makes a torsional angle $ \phi_1$ with the local target z-axis (see Figure 17.6). Let us define the beam vector $ {\bf b}={\bf L}-{\bf T}$ pointing from the target to the lens and connecting the two respective centers. We then have the defining equations for $ {\bf s}_1$

$\displaystyle s_{1x}b_x + s_{1y}b_y + s_{1z}b_z$ $\displaystyle =$ 0 (17.70)
$\displaystyle s_{1x}^2 + s_{1y}^2 + s_{1z}^2$ $\displaystyle =$ $\displaystyle \ell_1$ (17.71)
$\displaystyle s_{1z}$ $\displaystyle =$ $\displaystyle \ell_1 \cos \phi_1 \cos (\theta - \pi/2),$ (17.72)

where $ \ell_1$ is the length of $ {\bf s}_1$ and $ \theta $ is the angle that the beam vector $ {\bf b}$ makes with the local z-axis. The first equation 17.70 says that $ {\bf b}$ and $ {\bf s}_1$ are orthogonal. The second one 17.71 defines the length of $ {\bf s}_1$ and the third one 17.72 defines the local z-axis projection of $ {\bf s}_1$. The last equation can be rewritten as
$\displaystyle s_{1z}$ $\displaystyle =$ $\displaystyle \ell_1 \frac{\sqrt{b_x^2+b_y^2}}{\vert{\bf b}\vert} \cos \phi_1.$ (17.73)

Forming an expression for $ s_{1y}^2$ from equation 17.70, an expression for $ s_{1z}^2$ from equation 17.72 and inserting the results into equation 17.71, we obtain a quadratic equation in $ s_{1x}^2$, from which we can obtain all three components. We get after some algebra and simplifications
$\displaystyle s_{1x}$ $\displaystyle =$ $\displaystyle \frac{\ell_1}{\sqrt{b_x^2+b_y^2}}
\left[- \frac{b_x b_z}{\vert{\bf b}\vert} \cos \phi_1 \pm b_y \sin \phi_1 \right]$ (17.74)
$\displaystyle s_{1y}$ $\displaystyle =$ $\displaystyle \frac{\ell_1}{\sqrt{b_x^2+b_y^2}}
\left[- \frac{b_y b_z}{\vert{\bf b}\vert} \cos \phi_1 \mp b_x \sin \phi_1 \right]$ (17.75)
$\displaystyle s_{1z}$ $\displaystyle =$ $\displaystyle \ell_1 \frac{\sqrt{b_x^2+b_y^2}}{\vert{\bf b}\vert} \cos \phi_1.$ (17.76)

The two possible solutions for $ s_{1x}$ and $ s_{1y}$ correspond to the two possible definitions of the rotation angle $ \phi_1$ in either clockwise or counterclockwise rotation from the z-axis when looking along the beam vector $ {\bf b}$ from the lens to the target. Let us henceforth define $ \phi_1$ to be the clockwise rotation. Then the lower signs in $ s_{1x}$ and $ s_{1y}$ apply and dividing each component by $ \ell_1$ we obtain for the unit vector components
$\displaystyle u_{1x}$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{b_x^2+b_y^2}}
\left[- \frac{b_x b_z}{\vert{\bf b}\vert} \cos \phi_1 - b_y \sin \phi_1 \right]$ (17.77)
$\displaystyle u_{1y}$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{b_x^2+b_y^2}}
\left[- \frac{b_y b_z}{\vert{\bf b}\vert} \cos \phi_1 + b_x \sin \phi_1 \right]$ (17.78)
$\displaystyle u_{1z}$ $\displaystyle =$ $\displaystyle \frac{\sqrt{b_x^2+b_y^2}}{\vert{\bf b}\vert} \cos \phi_1.$ (17.79)

The second elliptical semiaxis $ {\bf s}_2$ is perpendicular to the first and lays in the same elliptical target plane. If we define it to be at a right angle in clockwise direction from $ {\bf s}_1$, the torsional angle it makes with the local z-axis is $ \phi_2 = \phi_1 + \pi/2$. The formulas for its unit vector components are the same as for $ {\bf s}_1$ but with $ \phi_1$ replaced by $ \phi_2$. From trigonometric sine and cosine relations we can re-express them in terms of $ \phi_1$
$\displaystyle u_{2x}$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{b_x^2+b_y^2}}
\left[\frac{b_x b_z}{\vert{\bf b}\vert} \sin \phi_1 - b_y \cos \phi_1 \right]$ (17.80)
$\displaystyle u_{2y}$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{b_x^2+b_y^2}}
\left[\frac{b_y b_z}{\vert{\bf b}\vert} \sin \phi_1 + b_x \cos \phi_1 \right]$ (17.81)
$\displaystyle u_{2z}$ $\displaystyle =$ $\displaystyle - \frac{\sqrt{b_x^2+b_y^2}}{\vert{\bf b}\vert} \sin \phi_1.$ (17.82)

Note the importance of the $ \sqrt{b_x^2+b_y^2}$ term. If this term is equal to zero, then the unit vectors become undefined. This corresponds to the laser beam being parallel to the global z-axis (and thus coinciding with the local z-axis). Then both elliptical semiaxes are not defined uniquely through a z-axis torsional angle of zero. In this case the torsional angle must be defined through one of the other coordinate axis. The following coordinate index permutations in the above formulas apply:
$\displaystyle \mbox{$\phi_1$ defined through x-axis}$ $\displaystyle =$ $\displaystyle x\rightarrow y, y\rightarrow z, z\rightarrow x$ (17.83)
$\displaystyle \mbox{$\phi_1$ defined through y-axis}$ $\displaystyle =$ $\displaystyle x\rightarrow z, y\rightarrow x, z\rightarrow y.$ (17.84)

In 2D geometries, the beam's lens and target areas shrink to a line segment. All z-components are zero and the torsional angle is equal to $ \pi/2$. The components of the unit vector can be deduced from the equations 17.77 and 17.78 as

$\displaystyle u_x$ $\displaystyle =$ $\displaystyle - \frac{b_y}{\vert{\bf b}\vert}$ (17.85)
$\displaystyle u_y$ $\displaystyle =$ $\displaystyle \frac{b_x}{\vert{\bf b}\vert}.$ (17.86)


17.4.6.2 Extremum Values for the Elliptical Target Zone

In 3D simulations, the planar elliptical target zone can be placed in any possible way inside the domain by specifying the lens $ {\bf L}$ and target $ {\bf T}$ positions, both elliptical target zone semiaxes lengths $ \ell_1$ and $ \ell_2$, the first semiaxis torsion angle $ \phi_1$ and the coordinate axis to which $ \phi_1$ refers to. We wish to enforce in the code a complete containment of the entire target plane inside the domain boundaries. To check this condition we need the extremum coordinate values of the elliptical boundary curve of the target. The collection of all points $ {\bf e}$, based on the local target coordinate system, can be given in the following implicit form

$\displaystyle {\bf e}$ $\displaystyle =$ $\displaystyle \ell_1 {\bf u}_1 \cos(\omega) + \ell_2 {\bf u}_2 sin(\omega),
\;\;\;0\geq \omega\geq 2\pi.$ (17.87)

Differentiating with respect to $ \omega$ and equating to zero we get the minimax condition on all coordinate components as
$\displaystyle \mbox{\boldmath$\omega$}$$\displaystyle _{min/max}$ $\displaystyle =$ $\displaystyle \arctan \left(\frac{\ell_2 {\bf u}_2}{\ell_1 {\bf u}_1}\right),$ (17.88)

where $ \omega$$ _{min/max}$ denotes the vector of the minimax angles for each cartesian component. The equation 17.88 has two possible answers for each $ \omega$ component in the range $ 0\geq \omega\geq 2\pi$, corresponding to the minimum and the maximum value. The $ \omega_{min}$ and $ \omega_{max}$ angles differ by $ \pi$ radians for each cartesian component. The corresponding minima and maxima on the elliptical boundary curve are obtained by inserting the $ \omega_{min}$ and $ \omega_{max}$ angles into equation 17.87.

In order to simplify things, we note that what we need are the sine and cosine values of $ \omega_{min}$ and $ \omega_{max}$. From the definition of the trigonometric functions based on the length of the three sides of a right-angled triangle (a = opposite side, b = adjacent side, c = hypotenuse for an angle $ \omega$), we have, using $ \sin\omega = a/c$, $ \cos\omega = b/c$, $ \tan\omega = a/b$ and $ c^2=a^2+b^2$

$\displaystyle \sin (\arctan (a/b))$ $\displaystyle =$ $\displaystyle \sin (\omega) = a/c = \frac{a}{\sqrt{a^2+b^2}}$ (17.89)
$\displaystyle \cos (\arctan (a/b))$ $\displaystyle =$ $\displaystyle \cos (\omega) = b/c = \frac{b}{\sqrt{a^2+b^2}}.$ (17.90)

When applied to equation 17.87, we obtain
$\displaystyle {\bf e}_{min/max}$ $\displaystyle =$ $\displaystyle \pm \sqrt{(\ell_1 {\bf u}_1)^2 + (\ell_2 {\bf u}_2)^2}.$ (17.91)

The corresponding minimax equation for the 2D geometries (ellipse $ \rightarrow$ line segment) is
$\displaystyle {\bf e}_{min/max}$ $\displaystyle =$ $\displaystyle \pm \ell {\bf u},$ (17.92)

with $ \ell$ being half the length of the target line segment and $ {\bf u}$ the line segment unit vector.


17.4.7 Setting up the Rays

For each Energy Deposition unit call, the code sets up the initial collection of rays, defined to be located on the domain boundary with specific velocity components such that they will hit the target area at precise locations if they would cross an empty domain. Key concepts in setting up the rays initial position, velocity and power are the elliptical local square, radial, delta or statistical grids and the beam cross section power function. Rays will be launched from the lens grid intersection points to the corresponding target grid intersection points.

Figure 17.7: Setting up the rays between the beam's lens and target area using a square grid. Only half of the square elliptical grids are shown for clarity.
Image LaserRays
Using a square grid, a uniform beam cross-sectional ray density will be achieved, although this could be relaxed in the future to include the possibility of rectangular grids leading to different cross-sectional ray densities along the two elliptical semiaxes directions. A radial grid places the rays on concentrical ellipses.


17.4.7.1 The Elliptical Lens/Target Local Square Grid

In 3D geometries, both the lens and target areas are defined as two similar ellipses (different size, same shape). Given a specific elliptical shape by defining the lengths of both semiaxes $ \ell_1 \geq \ell_2$, we can set up a square grid inside the ellipse, such that the number of grid intersection points matches closely the number of rays $ N_{rays}$ specified by the user. The grid is defined by the separation $ \Delta$ between two grid points and the placement of the grid's origin at the center of the ellipse. Our goal is to find this $ \Delta$ value.

Denote by $ N_{rectangle}$ the number of grid intersection points for the circumscribing rectangle with sides $ 2\ell_1$ and $ 2\ell_2$. We then have

$\displaystyle N_{rectangle}$ $\displaystyle =$ Number of ellipse points$\displaystyle \frac{\mbox{Area rectangle}}{\mbox{Area ellipse}}$  
  $\displaystyle =$ $\displaystyle \frac{4N_{rays}}{\pi},$ (17.93)

this relation holding only approximately due to the finite resolution of the grid. Let us denote by $ r=\ell_1/\ell_2$ the ratio between the largest and smallest semiaxis. If $ n_1$ is the number of tics along the $ \ell_1$-semiaxis starting at the grid origin (0,0), then the number of tics along the $ \ell_2$-semiaxis is approximately equal to $ n_1/r$. Looking at the circumscribed rectangle area, the total number of grid intersection points laying on the grid's axes will be twice those on each semiaxis plus the grid's origin: $ 2n_1+2n_1/r +1$. The total number of grid intersection points not on any of the axes is $ 4(n_1)(n_1/r)$. Both numbers together should then equal to $ N_{rectangle}$. Hence, from 17.93, we are lead to a quadratic equation in $ n_1$
$\displaystyle \frac{4}{r}n_1^2 + \left( 2+\frac{2}{r} \right)n_1 - \left( \frac{4N_{rays}}{\pi}-1 \right)$ $\displaystyle =$ $\displaystyle 0,$ (17.94)

whose solution is
$\displaystyle n_1$ $\displaystyle =$ $\displaystyle \frac{1}{4}\left[ -(1+r)+\sqrt{(1-r)^2+\frac{16rN_{rays}}{\pi}} \right].$ (17.95)

Always $ n_1>0$, which can easily be seen from the lowest possible value that $ n_1$ can attain for $ N_{rays}=1$ and the lowest possible ratio $ r=1$ (in this case $ n_1=0.06418...$). Since $ n_1$ has to be an integer we take the ceiling value of the solution 17.95. If the user specified $ N_{rays}=1$, the search for $ n_1$ is bypassed and the ray is placed at the elliptical origin.

Having $ n_1$, the next task is to find the optimum (or close to optimum) grid spacing $ \Delta$. This is done by defining a minimum and maximum grid spacing value

$\displaystyle \Delta_{min}$ $\displaystyle =$ $\displaystyle \frac{\ell_1}{n_1+1}$ (17.96)
$\displaystyle \Delta_{max}$ $\displaystyle =$ $\displaystyle \frac{\ell_1}{n_1-1}.$ (17.97)

A series of $ \Delta_{min}\leq \Delta_k \leq \Delta_{max}$ grid spacings is then tested, and for each $ \Delta_k$ the number of grid intersection points $ N_k$ inside the ellipse area is determined. Of all $ N_k$ obtained, a certain number of them will be closest to $ N_{rays}$. The average over all these $ \Delta_k$ will then be taken as the final $ \Delta$ value. For this $ \Delta$ we compute the final number of grid intersection points, which will then replace the user's specified $ N_{rays}$.

Since the target $ \ell_1$ and $ \ell_2$ semiaxes are specified by the user, the $ \Delta_T$ for the target square grid is evaluated using the above algorithm. The corresponding lens $ \Delta_L$ value is set using the similarity in size between the lens and the target. Using the user's specified $ \ell_1$ for the lens, the preservation of length relations between similar objects leads to:

$\displaystyle \frac{\Delta_L}{\ell_{1,L}}$ $\displaystyle =$ $\displaystyle \frac{\Delta_T}{\ell_{1,T}}$ (17.98)

and hence
$\displaystyle \Delta_L$ $\displaystyle =$ $\displaystyle \frac{\ell_{1,L}}{\ell_{1,T}} \Delta_T.$ (17.99)

In 2D geometries the situation is much simpler. Due to the linear shape of the lens and target areas, it is very easy to calculate the $ \Delta_T$ value of the linear target grid such that exactly $ N_{rays}$ grid points are obtained with the outer two grid points coinciding with the target end points. The corresponding $ \Delta_L$ is evaluated using equation 17.99.


17.4.7.2 The Elliptical Lens/Target Local Radial Grid

In order to set up the radial lens and target grids, we use the implicit definition of the elliptical curve from 17.87. The grid will then be defined as the number of tics on the $ \ell_1,\ell_2$ pair and the number of tics on the angle $ \omega$ within the range $ 0\geq \omega\geq 2\pi$. The number of tics on both of these 'axes' will be the same and will be denoted by $ n$. The total number of radial grid points within the lens and target ellipses will be equal to $ 1+n(n+1)$, leading to the quadratic equation

$\displaystyle n^2+n+1$ $\displaystyle =$ $\displaystyle N_{rays},$ (17.100)

whose relevant solution is
$\displaystyle n$ $\displaystyle =$ $\displaystyle \frac{1}{2}\left[-1+\sqrt{4N_{rays}-3}\right].$ (17.101)

In order to reduce the error to $ N_{rays}$, rounding to the nearest integer is applied for $ n$. The individual grid points are calculated from
$\displaystyle {\bf e}_{ij}$ $\displaystyle =$ $\displaystyle \frac{\ell_1i}{n} {\bf u}_1 \cos\left(\frac{2\pi j}{n}\right)
+ \frac{\ell_2i}{n} {\bf u}_2 \sin\left(\frac{2\pi j}{n}\right),$ (17.102)

where the index ranges are
$\displaystyle j$ $\displaystyle =$ $\displaystyle 0,1,2,\ldots,n$ (17.103)
$\displaystyle i$ $\displaystyle =$ $\displaystyle \min(1,j),\ldots,n.$ (17.104)

In an effort to provide more control to the user, the default 'same number of tics on both radial and angular axes' has been extended, such that the user can enforce one or even both of these values. This is controlled by setting the requested number of radial and/or angular tics as runtime parameters for each beam. The resulting number of rays will then be recalculated and overwritten.


17.4.7.3 The Elliptical Lens/Target Local Delta Grid

One of the drawbacks of the local square grid is that the user has no direct control of the resulting tic separation. The main goal when setting up the local square grid is to have the number of rays match as closely as possible the number of rays requested by the user. The local delta grid is provided to give the user full control over the tic separations on both semiaxes, relaxing at the same time the number of rays constraint. The delta grid becomes useful if the user wants to enforce a more smooth energy deposition in the cells hit by the 3D laser beam. Rays can be forced to hit each cell well away from the cell boundary, thereby avoiding the initial uneven ray distributions due to some rays hitting the block walls with corresponding nudging to one block or the other. As a consequence the delta grids are most useful if the target area of the beam has or is known to have a uniform refinement level.

The setup of the delta grid is simple: the user has to provide the tic separation values as beam runtime parameters. The beam setup code then adheres strictly to these values and excludes any grid points laying outside the elliptical beam boundary. The number of rays requested initially by the user is completely ignored and replaced by the number of rays determined for the delta grid. The user must make sure that the supplied memory requirements for ray storage are ok. This can be done by estimating the number of rays resulting from the delta grid on the area of the circumscribed rectangle with sides $ 2\ell_1$ and $ 2\ell_2$ and rescaling by the ellipse/rectangle area factor. If we denote the user supplied tic separations corresponding to the $ \ell_1$ semiaxis and $ \ell_2$ semiaxis by $ \Delta_1$ and $ \Delta_2$, then the estmated number of rays would be:

$\displaystyle N_{rays}$ $\displaystyle \approx$ $\displaystyle 2\left(\frac{\ell_1}{\Delta_1}\right)\times
2\left(\frac{\ell_2}{\Delta_2}\right)\times
\frac{\pi}{4}$  
  $\displaystyle =$ $\displaystyle \frac{\pi \ell_1 \ell_2}{\Delta_1 \Delta_2}.$ (17.105)

The first tics of the delta grid along each grid axis start at $ \Delta_1/2$ and $ \Delta_2/2$. The grid center $ (0,0)$ is thus not part of the delta grid.


17.4.7.4 The Elliptical Lens/Target Local Statistical Grid

The local statistical grid is defined by a statistical collection of $ (x,y)$ pairs within the range $ [-1,1]$, where the $ x$ and $ y$ denote fractions of the $ \ell_1$ and $ \ell_2$ semiaxis. A random number generator for the range $ [0,1]$ is used and the numbers are shifted to the $ [-1,1]$ range by multiplying by 2 and adding $ -1$. Every $ (x,y)$ pair is checked, if it actually lays within the ellipse and retained if it does. The random $ (x,y)$ pair generation stops, once the requested number of rays is reached. In order to use different statistical grids for each time step, the statistical grid is regenerated afresh for each time step using a different random seed value.


17.4.7.5 Beam Cross Section Power Function

The beam cross section power function describes the power distribution of the rays inside the beam at launching time. Currently there are two types of power distribution functions implemented in FLASH: 1) uniform (flat) distribution (equal power) and 2) gaussian decay from the center of the beam. The first one is trivial: every ray gets assigned an equal amount of power. The gaussian decay function assigns to each ray a relative weight according to the position inside the elliptical cross section of the beam. Using the local $ {\bf s}_1$ and $ {\bf s}_2$ target coordinate system located at the center of the beam, the gaussian weighting function for 2D target areas (3D geometries) reads

$\displaystyle w$ $\displaystyle =$ $\displaystyle \exp^{{\displaystyle -\left[\left(\frac{x}{R_x}\right)^2+
\left(\frac{y}{R_y}\right)^2\right]}^{{\displaystyle \gamma}}},$ (17.106)

where $ x$ and $ y$ denote the local coordinates inside the ellipse along the $ {\bf s}_1$ and $ {\bf s}_2$ axes, respectively, $ R_x$ and $ R_y$ are user defined decay radii values and $ \gamma$ is the user defined gaussian super exponent. For 1D target areas (2D geometries) the weighting function is
$\displaystyle w$ $\displaystyle =$ $\displaystyle \exp^{{\displaystyle -\left[\left(\frac{x}{R_x}\right)^2\right]}^{\displaystyle \gamma}}.$ (17.107)

In order to determine the actual power assigned to each ray, we use the average laser power from equation 17.69
$\displaystyle P_{ray}$ $\displaystyle =$ $\displaystyle {\overline P} \frac{w_{ray}}{\sum w_{ray}}.$ (17.108)

The sum of the weights over all rays plays the role of a sort of partition function for the beam grid and can thus be precomputed and become part of the beam properties when setting up the beams.


17.4.7.6 The Rays Initial Position and Velocity

In 3D, after the local lens and target grids have been set up, each ray has associated with it a local lens elliptical coordinate pair $ (x_L,y_L)$ and a local target elliptical coordinate pair $ (x_T,y_T)$. The corresponding global coordinates can be obtained using the global lens and target center coordinates and the elliptical unit vectors (see Figure 17.7)

$\displaystyle {\bf R}_L$ $\displaystyle =$ $\displaystyle x_L{\bf u}_1 + y_L{\bf u}_2 + {\bf L}$ (17.109)
$\displaystyle {\bf R}_T$ $\displaystyle =$ $\displaystyle x_T{\bf u}_1 + y_T{\bf u}_2 + {\bf T}.$ (17.110)

Defining now the ray vector $ {\bf R}$ pointing from the lens to the target
$\displaystyle {\bf R}$ $\displaystyle =$ $\displaystyle {\bf R}_T - {\bf R}_L,$ (17.111)

we can state the parametric ray line equation
$\displaystyle {\bf P}$ $\displaystyle =$ $\displaystyle {\bf R}_L + w{\bf R}.$ (17.112)

The ray line equation is used to determine where and which domain boundary surface the ray will hit. Consider a domain boundary surface contained within a plane given by the equation
$\displaystyle A_xx+A_yy+A_zz+D$ $\displaystyle =$ 0  
$\displaystyle {\bf A}\cdot (x\;y\;z)+D$ $\displaystyle =$ $\displaystyle 0.$ (17.113)

The value of the real $ w$ parameter where the ray line meets this plane is obtained by inserting the ray line equation into the plane equation. It is
$\displaystyle w$ $\displaystyle =$ $\displaystyle - \frac{{\bf A}\cdot {\bf R}_L+D}{{\bf A}\cdot {\bf R}}.$ (17.114)

Inserting this $ w$ value into the ray line equation 17.112 we obtain the location $ {\bf P}_{cross}$ of the crossing point on the plane. From Figure 17.7, for the ray vector to cross the plane, the value of $ w$ must be in the range $ 0\leq w \leq 1$. A value of $ w=0$ or $ w=1$ indicates that the plane is crossing the lens or target area, respectively. If $ w$ is in proper range, it must next be checked if $ {\bf P}_{cross}$ is located within the domain boundary surface. If yes, that $ w$ value is accepted. Note, that several proper $ w$ values can be obtained. A ray crossing near the corner of a rectangular domain gives three proper $ w$ values corresponding to crossing the three rectangular planes defining the corner. Since the rays originate from the lens, the relevant $ w$ is the one with minimum value and the corresponding $ {\bf P}_{cross}$ will be taken as the rays initial position. The initial ray velocity components are determined from the unit vector of the ray vector
$\displaystyle {\bf v}$ $\displaystyle =$ $\displaystyle v_0\frac{{\bf R}}{\vert{\bf R}\vert},$ (17.115)

where $ v_0$ is the initial magnitude of the velocity (in most simulations this is the speed of light). For 2D geometries all the above equations remain the same, except for the ray global coordinates, for which the terms in $ {\bf u}_2$ are dropped and for the domain boundary planes, for which the terms involving the z-component in the defining equation 17.113 do not exist. The following steps summarize the determination of the initial ray positions and velocity components for each ray:


17.4.8 3D Laser Ray Tracing in 2D Cylindrical Symmetry

Performing a pure 2D cylindrical ray tracing has an obvious disadvantage: each ray must also be treated as a 2D cylindrical symmetrical object. Each ray can only hit the R-disk of the cylindrical domain at precisely 90 degrees and no variation in this incident angle is possible. However, if one treats the cylinder as a true 3D object, then it is possible to trace each ray through this 3D cylinder. The advantage of retaining the 2D cylindrical description of the domain is obvious: only 2D storage is needed to describe the properties of the domain.

Figure 17.8: Tracing a ray in 3D in a 2D cylindrical domain. View onto the R-disk. The z-axis has been left out for clarity.
Image Laser3D2raypath
3D in 2D ray tracing is much more complicated than either the simple pure 3D or pure 2D counterparts. The interplay between the polar and the cartesian coordinates leads to ray tracing equations which can only be solved approximately via the use of elliptical integrals. Rather than approximating the integrals involved, a second approach decomposes the 3D cylinder into several identical cylindrical wedges with each wedge having planar boundaries. The larger the number of wedges the more accurate the 3D in 2D ray tracing will be. Both approaches (approximating the integrals and wedging the 3D cylinder) are entirely equivalent.

17.4.8.1 The Exact 3D in 2D Ray Tracing Solution

We will first state the exact time-radial solution of an object of mass $ m$ moving in a central force environment with no external forces acting on it. The motion of such an object is characterized by constant energy and angular momentum and is hence confined to a 2D plane. There are two ways to describe the position $ {\bf r}$, velocity $ {\bf v}$ and acceleration $ {\bf a}$ vectors of a particle in a 2D plane: using cartesian $ {\bf\hat i},{\bf\hat j}$ or polar $ {\bf\hat R},$ $ {\hat \theta}$ unit vectors. They are interrelated by

$\displaystyle ({\bf\hat R},$$\displaystyle \mbox{\boldmath${\hat \theta}$}$$\displaystyle )$ $\displaystyle =$ $\displaystyle ({\bf\hat i},{\bf\hat j})
\left(\begin{array}{cc}\cos \theta & -\sin \theta  \sin \theta & \cos \theta \end{array}\right)$ (17.116)

Using these unit vectors and the differentiation chain rule we get
$\displaystyle {\bf r}$ $\displaystyle =$ $\displaystyle x{\bf\hat i} + y{\bf\hat j}
 =  R{\bf\hat R}$ (17.117)
$\displaystyle {\bf v}$ $\displaystyle =$ $\displaystyle \dot{x}{\bf\hat i} + \dot{y}{\bf\hat j}
 =  \dot{R}{\bf\hat R} + R\dot{\theta}$$\displaystyle \mbox{\boldmath${\hat \theta}$}$ (17.118)
$\displaystyle {\bf a}$ $\displaystyle =$ $\displaystyle \ddot{x}{\bf\hat i} + \ddot{y}{\bf\hat j}
 =  (\ddot{R}-R\dot{\theta}^2){\bf\hat R}
+(R\ddot{\theta}+2\dot{R}\dot{\theta})$$\displaystyle \mbox{\boldmath${\hat \theta}$}$ (17.119)

where each dot on the dotted variables stands for $ d/dt$. The angular momentum of the object in polar coordinates is
$\displaystyle {\bf L}$ $\displaystyle =$ $\displaystyle {\bf r}\times m{\bf v}$  
  $\displaystyle =$ $\displaystyle R{\bf\hat R}\times m(\dot{R}{\bf\hat R} + R\dot{\theta}$$\displaystyle \mbox{\boldmath${\hat \theta}$}$$\displaystyle )$  
  $\displaystyle =$ $\displaystyle mR^2\dot{\theta}({\bf\hat R}\times$   $\displaystyle \mbox{\boldmath${\hat \theta}$}$$\displaystyle )$ (17.120)

and its magnitude
$\displaystyle L$ $\displaystyle =$ $\displaystyle mR^2\dot{\theta}$ (17.121)

Since only a central force component is present, this force depends only on the radial part
$\displaystyle {\bf F}(R)$ $\displaystyle =$ $\displaystyle m{\bf a}_R = m(\ddot{R}-R\dot{\theta}^2){\bf\hat R}.$ (17.122)

We are now ready to state the equation of motion of the object in polar coordinates. Let's assume the object has initially at time $ t_0$ the coordinates ( $ R_0,\theta_0$), and let $ E$ be its constant energy. $ E$ is composed of two parts: kinetic and potential energy. Both parts will change as time passes by. The potential energy $ U$ can only change along a change in $ R$, because the force has only a radial component. We have
$\displaystyle E$ $\displaystyle =$ $\displaystyle \frac{1}{2}mv^2 + U(R)$  
  $\displaystyle =$ $\displaystyle \frac{1}{2}m(\dot{R}^2+R^2\dot{\theta}^2) + U(R)$  
  $\displaystyle =$ $\displaystyle \frac{1}{2}m\dot{R}^2 + \frac{1}{2}\frac{L^2}{mR^2} + U(R)$ (17.123)

where in the last step we have eliminated the angular dependence through the use of Eq. (17.121). The last equation for $ E$ is a first order differential equation allowing for separation of variables
$\displaystyle \int_{R_0}^{R_T}\left[\frac{2}{m}(E-U(R))-\frac{L^2}{m^2R^2}\right]^{-1/2}dR$ $\displaystyle =$ $\displaystyle \int_{t_0}^{t_T}dt$ (17.124)

The potential energy can be obtained via integration
$\displaystyle U(R)$ $\displaystyle =$ $\displaystyle - \int_{R_0}^{R}F(R')dR' + U(R_0)$  
  $\displaystyle =$ $\displaystyle - m\int_{R_0}^{R}a_R(R')dR' + U(R_0).$ (17.125)

The value of the initial potential energy is arbitrary and is conveniently set to zero. Assuming further that $ t_0=0$, we obtain
$\displaystyle t_T$ $\displaystyle =$ $\displaystyle \int_{R_0}^{R_T}\left[\frac{2E}{m}-\frac{L^2}{m^2R^2}+2\int_{R_0}^{R}a_R(R')dR'\right]^{-1/2}dR.$ (17.126)

Given an initial radial $ R_0$ and a final target radial $ R_T$ value, $ t_T$ is the time the object will take in travelling from $ R_0$ to $ R_T$ under the specified initial conditions at $ R_0$ (values of $ E$ and $ L$). If the object will not reach $ R_T$ then $ t_T$ is either negative or complex. The time equation (17.126) can be solved analytically only for a small number of cases (like for example if $ a_R=0$). All other cases require numerical approximation to the radial integral. For our case, when the ray moves through one radial 2D cylindrical zone using the AVG approximation 17.4.4.1, the central force acceleration is constant, i.e $ a_R=a$. For this case we have
$\displaystyle t_T$ $\displaystyle =$ $\displaystyle \int_{R_0}^{R_T}\left[\frac{2E}{m}-\frac{L^2}{m^2R^2}+2a(R-R_0)\right]^{-1/2}dR,$ (17.127)

for which it is possible to give an analytical solution in terms of elliptical integrals. However, the analytical solution is far to complicated to be of practical use.

17.4.8.2 The Approximate 3D in 2D Ray Tracing Solution

The main reason for the complicated time equation has its roots in the $ \theta $ polar variable describing the rotational symmetry. An approximate treatment can be formulated, in which each circle is treated as a linear polygon with $ n$ sides ($ n$ not necessarily integer), where the value of $ n$ determines the quality of the approximation. The exact 3D radial acceleration cylindrical problem in $ R,\theta,z$ coordinates is transformed into an approximate 3D cartesian problem in $ x,y,z$ coordinates, where the $ z$ coordinate remains unaffected. Each torus cell is hence approximated as a collection of truncated wedge (TW) cells. The trapezoidal sides of these wedges are parallel to the $ x,y$ plane and the rectangular sides are perpendicular to this plane. Since all TW cells have exactly the same size and shape when coming from the same torus cell, it suffices to concentrate on just one of them. These representative TW cells will be placed inside the $ x,y$ plane, such that the positive $ x$ axis divides the trapezoidal sides into two equal areas. The collection of all representative TW cells has the shape of a wedge with opening angle $ \Omega$.

Figure 17.9: Shape and location of the TW cells.
Image Laser3D2truncatedwedgecells
In order to trace the rays through the TW cells, we need the equation of all the cell boundary planes. The equations of the cell planes perpendicular to the $ z$ axis (containing the trapezoidal sides) are simply $ z=z_{cell}$. The equations of the cell planes perpendicular to the $ x$ axis are similarly $ x=x_{cell}$. The remaining cell plane equations corresponding to the non-coplanar cell faces are (see Figure 17.9)
$\displaystyle y$ $\displaystyle =$ $\displaystyle \pm \tan \left(\frac{\Omega}{2}\right)x$  
  $\displaystyle =$ $\displaystyle \pm mx.$ (17.128)

The quadratic time equations to be solved for the $ x=x_{cell}$ and $ z=z_{cell}$ plane equations are of the same form as equation 17.65. For the plane equations in 17.128 we obtain, after inserting the appropriate forms into the general cartesian quadratic time equation 17.64 and considering the zero gradient approximation in $ y$ direction
$\displaystyle -\frac{c^2}{4n_c}\left[\mp m{\langle\nabla n_e\rangle}_x\right]t^2
+\left[\mp mv_{0x}+v_{0y}\right]t
+\left[\mp mr_{0x}+r_{0y}\right]$ $\displaystyle =$ 0 (17.129)

We now describe the three main tasks for the 3D in 2D ray tracing: 1) proper setup of lenses and corresponding targets, 2) initial placement of the 3D rays on the 2D cylindrical domain and 3) tracing of the rays through the truncated wedges.


17.4.8.3 Extremum Global Radial 3D and 2D Distance Values for 3D Elliptical Lens and Target Zones

In curved domain setups (cylindrical and spherical), it becomes necessary to calculate extremum global radial distance values for a 3D elliptical curve from the domain origin in order to check proper placements of the lens and target zones. We start by stating the global equation of the elliptical target curve $ {\bf E}$ (the treatment of the lens ellipse is analogous) in 3D in implicit form (see Figure 17.6):

$\displaystyle {\bf E}$ $\displaystyle =$ $\displaystyle {\bf T} + {\bf s}_1 \cos(\omega) + {\bf s}_2 sin(\omega),
\;\;\;0\geq \omega\geq 2\pi.$ (17.130)

The 2D projections of this elliptical 3D curve in either the ($ x$,$ y$), ($ x$,$ z$) or ($ y$,$ z$) plane are themselves ellipses, however the 2D projections of $ {\bf s}_1$ and $ {\bf s}_2$ do no longer correspond to the 2D elliptical major and minor semi-axes. They generally are oblique to each other and their scalar product is non-zero. We will therefore develop general formulas taking the possible reduction of 3D to 2D into account. The square of the radial distance for any point on that curve from the global origin is
$\displaystyle \vert{\bf E}\vert^2$ $\displaystyle =$ $\displaystyle \vert{\bf T}\vert^2 + 2{\bf s}_1\cdot {\bf T}\cos(\omega) + 2{\bf s}_2\cdot {\bf T}\sin(\omega)$  
    $\displaystyle + {\bf s}_1\cdot {\bf s}_1\cos^2(\omega) + {\bf s}_2\cdot {\bf s}_2\sin^2(\omega)
+ 2{\bf s}_1\cdot {\bf s}_2\cos(\omega)\sin(\omega).$ (17.131)

Differentiating with respect to $ \omega$ and setting equal to zero leads to the minimax equation
$\displaystyle B\cos(\omega) - A\sin(\omega) - C \sin(\omega)\cos(\omega) + D [\cos^2(\omega)-\sin^2(\omega)]$ $\displaystyle =$ $\displaystyle 0,$ (17.132)

where
$\displaystyle A$ $\displaystyle =$ $\displaystyle {\bf s}_1\cdot {\bf T}$ (17.133)
$\displaystyle B$ $\displaystyle =$ $\displaystyle {\bf s}_2\cdot {\bf T}$ (17.134)
$\displaystyle C$ $\displaystyle =$ $\displaystyle {\bf s}_1\cdot {\bf s}_1 - {\bf s}_2\cdot {\bf s}_2$ (17.135)
$\displaystyle D$ $\displaystyle =$ $\displaystyle {\bf s}_1\cdot {\bf s}_2.$ (17.136)

The minimax equation 17.132 contains a mixed trigonometric term and can only be solved by eliminating either the sine or the cosine function. This leads in general to a quartic equation
$\displaystyle q_4x^4+q_3x^3+q_2x^2+q_1x+q_0$ $\displaystyle =$ $\displaystyle 0,$ (17.137)

whose coefficients are, depending on if $ x$ is either $ \cos(\omega)$ or $ \sin(\omega)$:
  $ x=\cos(\omega)$ $ x=\sin(\omega)$
$ q_4$ $ C^2+4D^2$ $ C^2+4D^2$
$ q_3$ $ 2AC+4BD$ $ -2BC+4AD$
$ q_2$ $ A^2+B^2-C^2-4D^2$ $ A^2+B^2-C^2-4D^2$
$ q_1$ $ -2AC-2BD$ $ 2BC-2AD$
$ q_0$ $ D^2-A^2$ $ D^2-B^2$
All four $ A,B,C,D$ can have a value of zero. The coefficient of $ x^4$ plays a special role. Note, that for this coefficient to be zero, both $ C$ and $ D$ must be zero, which corresponds to a circular curve in both 3D and 2D. For a circular target the quartic reduces to a quadratic. For the 3D cases we always have $ D=0$. The following table 17.4 shows the different cases that can arise for 3D, together with a description of the geometrical shapes of the target figures involved and the corresponding maximum and minimum radial distances:

Table: Different cases arising in 3D, depending on values of $ A,B,C,D$
$ D$ $ C$ $ A$ $ B$ Shape $ \vert{\bf E}\vert _{min}^2$ $ \vert{\bf E}\vert _{max}^2$
0 0 0 0 Circle (radius $ \vert{\bf s}_1\vert$) $ \vert{\bf T}\vert^2+\vert{\bf s}_1\vert^2$ $ \vert{\bf T}\vert^2+\vert{\bf s}_1\vert^2$
0 0 0 $ \neq 0$ Tilted circle $ \vert{\bf T}-{\bf s}_2\vert^2$ $ \vert{\bf T}+{\bf s}_2\vert^2$
0 0 $ \neq 0$ 0 Tilted circle $ \vert{\bf T}-{\bf s}_1\vert^2$ $ \vert{\bf T}+{\bf s}_1\vert^2$
0 0 $ \neq 0$ $ \neq 0$ Double-tilted circle Solve quadratic
0 $ \neq 0$ 0 0 Ellipse $ \vert{\bf T}+{\bf s}_2\vert^2$ $ \vert{\bf T}+{\bf s}_1\vert^2$
0 $ \neq 0$ 0 $ \neq 0$ Tilted ellipse $ \vert{\bf T}-{\bf s}_2\vert^2$ $ \vert{\bf T}+{\bf s}_1\vert^2$
0 $ \neq 0$ $ \neq 0$ 0 Tilted ellipse Solve quadratic product
0 $ \neq 0$ $ \neq 0$ $ \neq 0$ Double-tilted ellipse Solve quartic

In this table we have used our convention $ \vert{\bf s}_1\vert\geq \vert{\bf s}_2\vert$ and placement of the elliptical semi-axes in such a way that the angle they make with $ {\bf T}$ is $ \geq 90^{\circ}$. The cases that arise for the 2D are summarized in the table 17.5:

Table: Different cases arising in 2D, depending on values of $ A,B,C,D,\vert{\bf s}_1\vert,\vert{\bf s}_2\vert$
$ D$ $ C$ $ A$ $ B$ $ \vert{\bf s}_1\vert$ $ \vert{\bf s}_2\vert$ Shape $ \vert{\bf E}\vert _{min}^2$ $ \vert{\bf E}\vert _{max}^2$
0 0 0 0 0 0 Point $ \vert{\bf T}\vert^2$ $ \vert{\bf T}\vert^2$
0 $ \neq 0$ 0 $ 0,\neq 0$ 0 $ \neq 0$ Line (length $ \vert{\bf s}_2\vert$) $ \vert{\bf T}-{\bf s}_2\vert^2$ $ \vert{\bf T}+{\bf s}_2\vert^2$
0 $ \neq 0$ $ 0,\neq 0$ 0 $ \neq 0$ 0 Line (length $ \vert{\bf s}_1\vert$) $ \vert{\bf T}-{\bf s}_1\vert^2$ $ \vert{\bf T}+{\bf s}_1\vert^2$
0 0 $ 0,\neq 0$ $ 0,\neq 0$ $ \neq 0$ $ \neq 0$ Circle (radius $ \vert{\bf s}_1\vert$) $ (\vert{\bf T}\vert-\vert{\bf s}_1\vert)^2$ $ (\vert{\bf T}\vert+\vert{\bf s}_1\vert)^2$
0 $ \neq 0$ 0 $ \neq 0$ $ \neq 0$ $ \neq 0$ Ellipse ( $ {\bf s}_2\vert\vert{\bf T}$) Solve quadratic product
0 $ \neq 0$ $ \neq 0$ 0 $ \neq 0$ $ \neq 0$ Ellipse ( $ {\bf s}_1\vert\vert{\bf T}$) $ (\vert{\bf T}\vert-\vert{\bf s}_1\vert)^2$ $ (\vert{\bf T}\vert+\vert{\bf s}_1\vert)^2$
$ 0,\neq 0$ $ \neq 0$ $ \neq 0$ $ \neq 0$ $ \neq 0$ $ \neq 0$ Tilted ellipse Solve quartic

When solving the quartic or quadratic product equations, we must remember to eliminate the two non-valid extra solutions introduced due to squaring. All real solutions between $ -1$ and $ +1$ are inserted into 17.131 and the maximum and minimum of all four possible values are selected.


17.4.8.4 Initial Placement of the 3D Rays on the 2D Cylindrical Domain

There are two kinds of domain surfaces for a cylindrical domain. The first kind corresponds to the flat circular surface at both ends of the cylinder. These surfaces are planar and the determination of the ray intersection position on these surfaces proceeds along the same lines as shown in section 17.4.7.6. The second kind of surface is the cylinder mantle, whose geometrical equation is:

$\displaystyle x^2+y^2-D$ $\displaystyle =$ 0  
$\displaystyle (x\;y)\cdot (x\;y)-D$ $\displaystyle =$ $\displaystyle 0.$ (17.138)

Inserting the parametric ray line equation 17.112, we obtain a quadratic equation in $ w$:
$\displaystyle ({\bf R}\cdot {\bf R})w^2 + (2{\bf R}\cdot {\bf R}_L)w + ({\bf R}_L\cdot {\bf R}_L-D)$ $\displaystyle =$ $\displaystyle 0,$ (17.139)

where only the $ x$ and $ y$ components of the vectors are taken to form the scalar products. Having found the appropriate $ w$ for each ray and the corresponding crossing point $ {\bf P}_{cross}$ on the domain cylindrical surface, we need to translate the ray 3D domain crossing coordinates $ P_x,P_y,P_z$ into the initial ray 2D cylindrical wedge coordinates $ W_R,W_y,W_z$, where $ W_R$ and $ W_z$ correspond to the 2D cylindrical domain and $ W_y$ is the linearized angular coordinate from 17.128. Since the starting angular point is irrelevant (there is no preference of any point on the circle), we set this point equal to zero. Thus:
$\displaystyle W_R$ $\displaystyle =$ $\displaystyle \sqrt{P_x^2 + P_y^2}$ (17.140)
$\displaystyle W_y$ $\displaystyle =$ 0 (17.141)
$\displaystyle W_z$ $\displaystyle =$ $\displaystyle P_z$ (17.142)

To determine the initial ray wedge velocities, we define two unit vectors of the 3D ray vector: 1) the unit vector in the ($ x$,$ y$)-plane and 2) the unit vector along the $ z$-axis:
$\displaystyle {\bf u}_{xy}$ $\displaystyle =$ $\displaystyle {\bf R}_{xy}/\vert{\bf R}\vert$ (17.143)
$\displaystyle {\bf u}_z$ $\displaystyle =$ $\displaystyle {\bf R}_z/\vert{\bf R}\vert,$ (17.144)

where $ {\bf R}_{xy}$ is the ($ x$,$ y$)-plane projection of the 3D ray vector. Likewise, the crossing point vector is also split into two components:
$\displaystyle {\bf P}_{cross}$ $\displaystyle =$ $\displaystyle {\bf P}_{xy} + {\bf P}_z.$ (17.145)

Since the origin of the cylindrical domain is located at $ (x,y)=(0,0)$, the vector $ {\bf P}_{xy}$ is a radial vector in the ($ x$,$ y$)-plane. The ray vector component $ {\bf R}_{xy}$ on the other hand is usually not radially oriented, because the ray's origin (on the lens) does not necessarily lay on the $ (x,y)=(0,0)$ line. Both vectors $ {\bf P}_{xy}$ and $ {\bf R}_{xy}$ are useful in order to determine the ray's initial wedge velocities (see Figure 17.10).
Figure: Initial radial ($ v_R$) and angular ($ v_y$) ray velocity components on the wedge from vectors $ {\bf P}_{xy}$ and $ {\bf R}_{xy}$.
Image Laser3D2wedgevelocities
Denote the angle between $ {\bf R}_{xy}$ and $ {\bf P}_{xy}$ by $ \alpha$. Then
$\displaystyle \cos (\alpha)$ $\displaystyle =$ $\displaystyle \frac{{\bf R}_{xy}\cdot {\bf P}_{xy}}{\vert{\bf R}_{xy}\vert\vert{\bf P}_{xy}\vert}$ (17.146)

and the initial ray wedge velocities are
$\displaystyle v_R$ $\displaystyle =$ $\displaystyle v_0 * \cos (\alpha) \vert{\bf u}_{xy}\vert$ (17.147)
$\displaystyle v_y$ $\displaystyle =$ $\displaystyle v_0 * sign({\bf R}_{xy}\times {\bf P}_{xy}) * \sin (\alpha) \vert{\bf u}_{xy}\vert$ (17.148)
$\displaystyle v_z$ $\displaystyle =$ $\displaystyle v_0 * \vert{\bf u}_z\vert,$ (17.149)

where the sine and the cross product vector present in the expression for $ v_y$ are calculated as
$\displaystyle \sin (\alpha)$ $\displaystyle =$ $\displaystyle + \sqrt{1-\cos^2 (\alpha)}$ (17.150)

and
$\displaystyle {\bf R}_{xy}\times {\bf P}_{xy}$ $\displaystyle =$ \begin{displaymath}\left\vert
\begin{array}{ccc}
{\bf i} & {\bf j} & {\bf k}  ...
...P_x & P_y & 0
\end{array} \right\vert = (R_xP_y-R_yP_x){\bf k}.\end{displaymath} (17.151)

The sign of the cross product refers to the sign of the $ {\bf k}$ component. Note, that the wedge's $ y$-direction refers to the cylindrical $ \theta $-coordinate and the convention is that counterclockwise rotation along $ \theta $ is considered positive.


17.4.8.5 Tracing the Rays through the Truncated Wedges

Figure 17.11: Change in position and velocity components when ray moves into a new wedge. For clarity, the velocity components in the original wedge are shifted from the boundary crossing point.
Image Laser3D2raythroughwedges
When a ray hits the $ R$- or $ z$-planes of a truncated wedge, the $ R$- or $ z$-components of the ray's position and velocity change the same way as for a pure 2D cylindrical simulation. The $ y$-components stay the same. When the ray hits the $ y$-planes of the truncated wedge, it will cross into a new rotated wedge and the $ y$-position and the $ R$- and $ y$-components of the velocity must change to reflect that rotation. The change in $ y$-position is simple: just invert its sign. For the velocity component changes we need the cosine and sine of the wedge's opening angle $ \Omega$. The new components are thus
$\displaystyle W_y ($new$\displaystyle )$ $\displaystyle =$ $\displaystyle - W_y$ (17.152)
$\displaystyle v_R ($new$\displaystyle )$ $\displaystyle =$ $\displaystyle v_R * \cos (\Omega) \pm v_y * \sin (\Omega)$ (17.153)
$\displaystyle v_y ($new$\displaystyle )$ $\displaystyle =$ $\displaystyle v_y * \cos (\Omega) \mp v_R * \sin (\Omega),$ (17.154)

where the sign options in $ v_R$ and $ v_y$ correspond to the two possible $ y$-planes defined in 17.128 (see Figure 17.11). The change in direction of both velocity vector components is not a rotational effect on the ray due to the medium through which it travels but rather a geometrical effect due to changing wedges. When crossing a common $ y$-plane between two wedges, the ray is still on the same position in the ($ R$,$ z$) 2D cylindrical plane. Note, that $ \Omega$ must not necessarily be a divisor of $ 360^\circ$. Since $ \Omega$ is fixed through the entire simulation, $ \cos (\Omega)$ and $ \sin (\Omega)$ are conveniently computed only once during initialization.


17.4.9 Ray Tracing in 3D Cylindrical Domains

Ray tracing in 3D cylindrical domains starts from the acceleration vector definition at a point $ {\bf r}$ in 3D cylindrical coordinates $ {\bf r}=(R,\phi,z)$ in terms of their rotating unit vectors $ ({\bf R},\boldsymbol{\phi},{\bf z})$, the ray equation of motion in Eq.(17.36) and the definition of the gradient operator in 3D cylindrical coordinates:

$\displaystyle {\bf a}$ $\displaystyle =$ $\displaystyle (\ddot{R}-R\dot{\phi}^2){\bf R}
+(R\ddot{\phi}+2\dot{R}\dot{\phi})\boldsymbol{\phi}
+\ddot{z}{\bf z},$ (17.155)

the ray equation of motion Eq.(17.36):
$\displaystyle {\bf a}$ $\displaystyle =$ $\displaystyle \nabla \left(-\frac{c^2}{2}\frac{n_e({\bf r})}{n_c} \right)$ (17.156)

and the definition of the gradient operator in 3D cylindrical coordinates:
$\displaystyle \nabla$ $\displaystyle =$ $\displaystyle \left(\frac{\partial}{\partial R}\;,\;
\frac{1}{R}\frac{\partial}{\partial \phi}\;,\;
\frac{\partial}{\partial z}\right)$ (17.157)

The 3D cylindrical ray equation of motion equations become:
$\displaystyle {\ddot R} - R{\dot\phi}^2$ $\displaystyle =$ $\displaystyle n_R({\bf r})$ (17.158)
$\displaystyle R^2{\ddot\phi} + 2R{\dot R}{\dot\phi}$ $\displaystyle =$ $\displaystyle n_\phi({\bf r})$ (17.159)
$\displaystyle {\ddot z}$ $\displaystyle =$ $\displaystyle n_z({\bf r}),$ (17.160)

where $ ({\bf r})=(R,\phi,z)$ and:
$\displaystyle n_R({\bf r})$ $\displaystyle =$ $\displaystyle -\frac{c^2}{2n_c}\frac{\partial n_e({\bf r})}{\partial R}$ (17.161)
$\displaystyle n_\phi({\bf r})$ $\displaystyle =$ $\displaystyle -\frac{c^2}{2n_c}\frac{\partial n_e({\bf r})}{\partial \phi}$ (17.162)
$\displaystyle n_z({\bf r})$ $\displaystyle =$ $\displaystyle -\frac{c^2}{2n_c}\frac{\partial n_e({\bf r})}{\partial z}.$ (17.163)

Since $ \phi$ is given in dimensionless radian units, the units of the partial 3D cylindrical gradients of the electron number density are: $ n_R$ in cm/s$ ^2$, $ n_\phi$ in cm$ ^2$/s$ ^2$ and $ n_z$ in cm/s$ ^2$. The radial acceleration and velocity is $ \ddot R$ and $ \dot R$ in cm/s$ ^2$ and cm/s, the angular acceleration and velocity is $ \ddot \phi$ and $ \dot\phi$ in rad/s$ ^2$ and rad/s.

As the use of angular acceleration $ \ddot \phi$ and velocity $ \dot\phi$ is computationally somewhat inconvenient, we will transform these to the tangential acceleration $ {\dot v}_\phi$ and velocity $ v_\phi$ by using:

$\displaystyle v_\phi$ $\displaystyle =$ $\displaystyle {\dot\phi}R$ (17.164)
$\displaystyle {\dot v}_\phi$ $\displaystyle =$ $\displaystyle {\ddot\phi}R + {\dot\phi}{\dot R}$ (17.165)

Using the tangential variables, Eqs.(17.158-17.160) can be cast into the six 1st order ODEs:
$\displaystyle {\dot R}$ $\displaystyle =$ $\displaystyle v_R$ (17.166)
$\displaystyle {\dot\phi}$ $\displaystyle =$ $\displaystyle v_\phi/R$ (17.167)
$\displaystyle {\dot z}$ $\displaystyle =$ $\displaystyle v_z$ (17.168)
$\displaystyle {\dot v}_R$ $\displaystyle =$ $\displaystyle n_R({\bf r}) + v_\phi^2/R$ (17.169)
$\displaystyle {\dot v}_\phi$ $\displaystyle =$ $\displaystyle n_\phi({\bf r})/R - v_\phi v_R/R$ (17.170)
$\displaystyle {\dot v}_z$ $\displaystyle =$ $\displaystyle n_z({\bf r}).$ (17.171)

Even for constant electron number density gradients, there are no analytical solutions for $ R$ and $ \phi$ in time, and hence one must resort to numerical methods like Runge-Kutta (RK). Starting from initial values $ R,\phi,z,v_R,v_\phi,v_z$ inside a 3D cylindrical cell or on one of its walls, a RK time step is performed such that: 1) we stay inside the cell or hit one of its walls, 2) the RK step is within accuracy bounds and 3) is computationally stable. We would like to have a RK method which allows one to directly calculate the time it would take for one RK step to hit exactly the cell walls. This would avoid an iterative approach to the cell walls, which is computationally very time consuming. The second order RK2 methods are ideally suited for that purpose. A generic RK2 time step $ \Delta t$ on the above set of six 3D cylindrical ODE's gives new updated values (left updated variables, right old variables):
$\displaystyle R$ $\displaystyle =$ $\displaystyle R + \left(1-\frac{1}{2\alpha}\right)v_R{\Delta t}
+ \frac{1}{2\alpha}v_R^{*}{\Delta t}$ (17.172)
$\displaystyle \phi$ $\displaystyle =$ $\displaystyle \phi + \left(1-\frac{1}{2\alpha}\right)\left[\frac{v_\phi}{R}\right]{\Delta t}
+ \frac{1}{2\alpha}\left[\frac{v_\phi^{*}}{R^{*}}\right]{\Delta t}$ (17.173)
$\displaystyle z$ $\displaystyle =$ $\displaystyle z + \left(1-\frac{1}{2\alpha}\right)v_z{\Delta t}
+ \frac{1}{2\alpha}v_z^{*}{\Delta t}$ (17.174)

and new velocity values:
$\displaystyle v_R$ $\displaystyle =$ $\displaystyle v_R
+ \left(1-\frac{1}{2\alpha}\right)
\left[n_R({\bf r}) + \frac...
...2\alpha}
\left[n_R({\bf r}^{*}) + \frac{{v_\phi^{*}}^2}{R^{*}}\right]{\Delta t}$ (17.175)
$\displaystyle v_\phi$ $\displaystyle =$ $\displaystyle v_\phi
+ \left(1-\frac{1}{2\alpha}\right)
\left[\frac{n_\phi({\bf...
...{n_\phi({\bf r}^{*})}{R^{*}} - \frac{v_\phi^{*}v_R^{*}}{R^{*}}\right]{\Delta t}$ (17.176)
$\displaystyle v_z$ $\displaystyle =$ $\displaystyle v_z
+ \left(1-\frac{1}{2\alpha}\right)
n_z({\bf r}){\Delta t}
+ \frac{1}{2\alpha}
n_z({\bf r}^{*}){\Delta t},$ (17.177)

where $ ({\bf r}^{*})=(R^{*},\phi^{*},z^{*})$ and:
$\displaystyle R^{*}$ $\displaystyle =$ $\displaystyle R + \alpha v_R{\Delta t}$ (17.178)
$\displaystyle \phi^{*}$ $\displaystyle =$ $\displaystyle \phi + \alpha\left[v_\phi/R\right]{\Delta t}$ (17.179)
$\displaystyle z^{*}$ $\displaystyle =$ $\displaystyle z + \alpha v_z{\Delta t}$ (17.180)
$\displaystyle v_R^{*}$ $\displaystyle =$ $\displaystyle v_R + \alpha\left[n_R({\bf r}) + v_\phi^2/R\right]{\Delta t}$ (17.181)
$\displaystyle v_\phi^{*}$ $\displaystyle =$ $\displaystyle v_\phi + \alpha\left[n_\phi({\bf r})/R - v_\phi v_R/R\right]{\Delta t}$ (17.182)
$\displaystyle v_z^{*}$ $\displaystyle =$ $\displaystyle v_z + \alpha n_z({\bf r}){\Delta t}$ (17.183)

are the intermediate state values. $ \alpha$ is a numerical parameter and although it can in principle be set to any value for the generic RK2 scheme to be second order, by far the most common choice is to set $ \alpha$ in the range $[1/2,1]$. Due to the second order nature of the generic RK2 schemes, it allows us to find the $ \Delta t$ values to hit the cell walls by solving simple quadratic equations. Using Eqs.(17.172-17.174) and the intermediate values in Eqs.(17.178-17.183), we have $ \alpha$ independent radial and axial cell wall polynomials in $ \Delta t$ and an $ \alpha$ dependent angular cell wall polynomial:
$\displaystyle \Delta R$ $\displaystyle =$ $\displaystyle v_R{\Delta t} + \frac{1}{2}\left[n_R({\bf r}) + \frac{v_\phi^2}{R}\right]{\Delta t}^2$ (17.184)
$\displaystyle \Delta z$ $\displaystyle =$ $\displaystyle v_z{\Delta t} + \frac{1}{2}n_z({\bf r}){\Delta t}^2$ (17.185)
$\displaystyle \Delta \phi$ $\displaystyle =$ $\displaystyle \left[\frac{v_\phi}{R} - \alpha\frac{v_R\Delta\phi}{R}\right]{\De...
...right)\frac{v_\phi v_R}{R^2} + \frac{n_\phi({\bf r})}{2R^2}\right]{\Delta t}^2,$ (17.186)

where
$\displaystyle \Delta R$ $\displaystyle =$ $\displaystyle R_{wall} - R$ (17.187)
$\displaystyle \Delta \phi$ $\displaystyle =$ $\displaystyle \phi_{wall} - \phi$ (17.188)
$\displaystyle \Delta z$ $\displaystyle =$ $\displaystyle z_{wall} - z$ (17.189)

Note that these quadratic equations do not involve electron number density gradients at the intermediate points, like for example $ n_R({\bf r}^{*})$. Higher order RK methods will introduce these density gradients in expressions for $ R$, $ \phi$ and $ z$ thus leading to much more complex equations in $ \Delta t$. Also cell sizes, especially for highly refined domains, are usually small, hence a higher order RK method, besides having the problem of complicated radial, angular and axial $ \Delta t$ expansions, would involve lots of computational work for a modest gain in number of RK steps inside the cell. Typically one would need only up to three RK2 steps vs maybe one RK4 step inside a cell. The effort of a higher order RK method simply does not pay off.

Let us denote the minimum $ {\Delta t}$ (excluding zero) obtained from solving all six (two for each kind of cell wall) polynomials in Eqs.(17.184-17.186) as $ {\Delta t}^{RK2}$. Using $ {\Delta t}^{RK2}$ to obtain new positions via Eqs.(17.172-17.174) guarantees that we stay inside the cell, but using it to determine the intermediate state via Eqs.(17.178-17.180) might take us outside of the cell, i.e. $ (R^{*},\phi^{*},z^{*})\notin$cell. To enforce the intermediate state inside the cell, we determine also the minimum $ {\Delta t}$ (excluding zero) obtained from solving all six (two for each kind of cell wall) linear intermediate state equations in Eqs.(17.178-17.180) by replacing $R^{*},\phi^{*},z^{*}$ by the corresponding cell wall values. Denoting this minimum intermediate state linear time as $ {\Delta t}^{*}$, we form the allowed stepping time $ {\Delta t}_{step}$ inside the cell as:

$\displaystyle {\Delta t}_{step}$ $\displaystyle =$ $\displaystyle \min\left({\Delta t}^{RK2},{\Delta t}^{*}\right).$ (17.190)

$ {\Delta t}_{step}$ might take the ray to a cell wall ( ${\Delta t}^{RK2}\leq {\Delta t}^{*}$) or to a position still inside the cell ( ${\Delta t}^{RK2}>{\Delta t}^{*}$). The last case happens, when the intermediate state lays outside the cell and is an indication that the ray wants to turn into opposite direction for a particular coordinate (opposite velocity and acceleration direction), as shown in Figure 17.12.
Figure 17.12: Showing a ray turning inside a 3D cylindrical cell due to opposite radial velocity and acceleration direction.
Image Laser3DcylRayTurnInCell
Each bullet represents a different stepping time inside the cell. From Eqs.(17.178-17.180) we see, that generally $ {\Delta t}^{*}$ will be largest the smaller $ \alpha$ is, hence we choose $\alpha=1/2$ (midpoint RK2 method). The following 3D cylindrical ray tracing algorithm has been implemented: Currently the 3D cylindrical ray tracing has only been implemented for the cell average AVG algorithm (section 17.4.4.1), in which the radial, angular and axial electron number gradients in Eqs.(17.161-17.163) are constant for each cell. Only convex 3D cylindrical domains are treatable in which the angular domain coordinates extend over the full circle. Concave 3D cylindrical domains with partial angular domain coordinates are not planned for the near future. Error bounds in position are defined as a predefined fraction (tolerance) of cell dimensions along each coordinate. Error bounds in velocity are defined as a predefined fraction of the ray speed in the cell before applying the RK2 step. Rather than using the speed $\vert v\vert=\sqrt{v_R^2+v_\phi^2+v_z^2}$ itself, we use the sum of the absolute components $\vert v_R\vert+\vert v_\phi\vert+\vert v_z\vert$, as this is computationally much more economical to evaluate and is in error compared to the speed by a factor of $\sqrt{3}$ at most.

Rays crossing the 3D cylindrical axis at $R=0$ need a special treatment, as 3D cylindrical blocks touching the axis have no meaningful neighbor block information. To outline the strategy, we consider the 3D cylindrical axis to be hollow and to have a finite tiny radius $ R$ with no radial, angular and axial acceleration terms ( $n_R=n_\phi=n_z=0$). Figure 17.13 shows a ray crossing the hollow axis.

Figure 17.13: Showing a ray passing through a hollow 3D cylindrical axis.
Image Laser3DcylRayThroughAxis
If no acceleration terms are present, the 3D cylindrical ray equation of motion ODEs in Eqs.(17.166-17.171) have an exact analytical solution:
$\displaystyle R$ $\displaystyle =$ $\displaystyle \sqrt{(R + v_Rt)^2+v_\phi^2t^2}$ (17.191)
$\displaystyle \phi$ $\displaystyle =$ $\displaystyle \phi + \arctan{\left[v_\phi t/(R + v_Rt)\right]}$ (17.192)
$\displaystyle z$ $\displaystyle =$ $\displaystyle z + v_zt$ (17.193)
$\displaystyle v_R$ $\displaystyle =$ $\displaystyle \left[(v_R^2 + v_\phi^2)t + R\right] / \sqrt{(R + v_Rt)^2+v_\phi^2t^2}$ (17.194)
$\displaystyle v_\phi$ $\displaystyle =$ $\displaystyle v_\phi R / \sqrt{(R + v_Rt)^2+v_\phi^2t^2}$ (17.195)
$\displaystyle v_z$ $\displaystyle =$ $\displaystyle v_z,$ (17.196)

which represents movement along a straight line in 3D cylindrical coordinates. When a ray enters the hollow axis at $ R$, it must also exit the axis at $ R$, hence the time it takes to cross the hollow axis is the solution in $ t$ of Eq.(17.191):
$\displaystyle t$ $\displaystyle =$ $\displaystyle - \frac{2Rv_R}{(v_R^2 + v_\phi^2)},$ (17.197)

which, when inserted in Eqs.(17.191-17.196) gives:
$\displaystyle R$ $\displaystyle =$ $\displaystyle R$ (17.198)
$\displaystyle \phi$ $\displaystyle =$ $\displaystyle \phi + \arctan{\left[2v_Rv_\phi/(v_R^2 - v_\phi^2)\right]}$ (17.199)
$\displaystyle z$ $\displaystyle =$ $\displaystyle z + v_zt$ (17.200)
$\displaystyle v_R$ $\displaystyle =$ $\displaystyle - v_R$ (17.201)
$\displaystyle v_\phi$ $\displaystyle =$ $\displaystyle v_\phi$ (17.202)
$\displaystyle v_z$ $\displaystyle =$ $\displaystyle v_z.$ (17.203)

Taking the limit $R\rightarrow 0$ in which the hollow axis becomes a line, we have $ t=0$ and $ z$, $ v_\phi$, $ v_z$ retain their values while the radial velocity changes sign. The new angle after crossing the axis is only dependent on the initial angle and the radial and tangential angular velocity and given by Eq.(17.199). Figure 17.14 shows various cases of new ray angle values depending on initial radial and tangential angular velocities.
Figure 17.14: Various examples of new ray angle values depending on initial radial and tangential angular velocities.
Image Laser3DcylRayCrossingAxis
Direct application of Eq.(17.199) is, however, not recommended, due to the possibility of zero or near zero of the denominator $v_R^2 - v_\phi^2$, leading to computational overflow. We can transform the arctan expression into two equivalent forms by dividing both nominator and denominator by $v_R^2$ or $v_\phi^2$ and using the arctan addition formula, giving:
$\displaystyle \phi$ $\displaystyle =$ $\displaystyle \phi + 2 \arctan{\left[v_\phi/v_R\right]}$ (17.204)
$\displaystyle \phi$ $\displaystyle =$ $\displaystyle \phi - 2 \arctan{\left[v_R/v_\phi\right]}$ (17.205)

We apply the first equation whenever $v_R\geq v_\phi$ and the second otherwise, thus ensuring an arctan value $ \leq 1$ for all cases.


17.4.10 Synchronous and Asynchronous Ray Tracing

The default laser implementation uses the Lagrangian particle framework to move rays (modeled as particles) across block and process boundaries. We generally find that particle movement consumes a relatively small portion of total runtime even in large-scale simulations with millions of particles (Dubey et al., 2012). This is not always case in laser-driven FLASH simulations, where realistic simulations can spend the majority of runtime moving rays between blocks. The reason for the discrepancy in performance is because a ray generally travels a much larger distance than a tracer particle in a single time-step. In the tracer particle case we only need one application of the communication subroutines in a single time-step because a particle never moves further than a nearest-neighbor block (see CFL limit). In contrast a ray can move to the other side of the domain in a single time-step which may involve visiting hundreds of intermediate blocks.

There are various implementations of particle exchange in the GridParticles sub-unit. In order to understand the performance bottlenecks, we describe the default implementation used in FLASH applications configured with Paramesh. The particle movement communication consists of an MPI_Allreduce to determine the number of particles and messages that each MPI rank will receive, followed by point to point messages which exchange the actual particles. In addition to the communication in the GridParticles unit, there is also an MPI_Allreduce in the laser unit which determines if all the rays have either left the domain or have been absorbed. This is the termination criteria and controls whether another round of particle exchange is needed.

Performance profiles of simulations using the laser indicate that most time is spent in MPI_Allreduce calls in a particle movement subroutine. The trace in Figure 17.15 shows that this happens because of load imbalance and not slowness of the underlying collective communication call. The two main colors in this trace are red and purple; red is time spent in a computational kernel named ed_traceBlockRays2Dcyl3D and purple is time spent in MPI_Allreduce. We see that the color map is dominated by purple which indicates most MPI ranks are waiting for the MPI_Allreduce to complete. The cascading red pattern shows the ray paths as rays move between blocks assigned to different MPI ranks. It is clear that there are many round of bulk synchronous particle exchange and that some MPI ranks must wait a long time before receiving any rays.

Figure 17.15: Performance trace of MPI ranks 0-33 in a laser simulation using synchronous communication and run with 2048 MPI ranks on Intrepid BG/P.
Image LaserHpcToolKit2048Rsync
The biggest impediment to good performance is the dependency between MPI ranks which happens because ray paths are not known ahead of time. Performance quickly degrades at higher processor counts because increased domain decomposition means rays must cross more process boundaries. One approach to improve performance would be to use the mesh replication feature of FLASH to trivially parallelize over the rays, but this is not ideal because it introduces redundant computations of e.g. hydrodynamics in each mesh. The approach we take is to introduce pipelined parallelism to hide the dependency between MPI ranks; rays are sent to nearest-neighbor blocks on different MPI ranks as early as possible to minimize the time MPI ranks must wait for work. Computation and communication are overlapped, which is different to the current bulk synchronous approach where there are distinct computation and communication phases. The “asynchronous ray trace” is included in a FLASH application by setting up with the shortcut +asyncLaser. We describe the ray exchange and termination communication kernels in the asynchronous scheme below.

The ray exchange communication kernel requires all MPI ranks set up designated send and receive communication channels with only those MPI ranks which own nearest-neighbor blocks. This is a relatively small subset because of the block locality in Paramesh. We post one speculative non-blocking receive for each neighbor and then test all communication channels (at a rate determined by runtime parameter ed_commRaysBetweenMsgTest) for new rays. When an MPI rank receives a message we copy the rays and then post a new speculative non-blocking receive. This ensures that a message can be received from any nearest neighbor at any time. We send rays using non-blocking sends from source MPI ranks when send buffers (of size determined by runtime parameter ed_commChannelSize) are full. At idle phases, we also send rays before the send buffer is full to prevent deadlock and ensure good global progress. When exchange is complete all MPI ranks send a zero byte message to each neighboring MPI rank to match the remaining speculative receives.

The termination communication kernel must also be non-blocking to allow progress in the ray exchange communication kernel. This rules out using a blocking MPI_Allreduce as is used in the default synchronous scheme. We have multiple non-blocking implementations of the termination kernel which all have different library and MPI dependencies. The default implementation, which has no special dependencies, uses a master-slave approach to determine completion. It works by sending a count of local inactive rays from slaves to a master which maintains a count of global inactive rays. We send a notification message from the master to the slaves when the global number of inactive rays is equal to the initial count of active rays. Even though we have demonstrated that this implementation can work when using 8192 MPI ranks on Intrepid BG/P, there is a danger of exceeding internal MPI limits on the number of messages the master rank can receive. For this reason we have created another implementation which uses a non-blocking MPI_Iallreduce which is part of MPI-3. We recommend using this implementation because there is less danger of exceeding internal MPI limits. It can be used by setting up a FLASH application with the setup shortcut +mpi3. Given that the MPI-3 standard is still quite new, it is likely that some MPI implementations on today's production machines do not support non-blocking collectives. Therefore, we also support the non-blocking reduction in the Non Blocking Collectives (NBC) library http://htor.inf.ethz.ch/research/nbcoll/libnbc/. The necessary reduction name substitutions happen when a FLASH application is setup with +libnbc.

We repeat the experiment in Figure 17.15 with the asynchronous communication scheme and show results in Figure 17.16. The number of ranks on the y-axis and time scale on the x-axis are the same as before. The color map is much more chaotic than before because all MPI ranks are either computing or spinning in various subroutines looking for more work. As before, the red color indicates time spent in a computational kernel named ed_traceBlockRays2Dcyl3D. The red is now much more clustered on the left side of the figure indicating MPI ranks wait less time for rays than before. We see large segments of green for MPI rank 0 which indicates time spent testing for new messages. This is happening because we made use of the master-slave termination kernel in this test. A performance comparison of the synchronous and asynchronous communication schemes is shown in Figure 17.17.

Figure 17.16: Performance trace of MPI ranks 0-33 in a laser simulation using asynchronous communication and run with 2048 MPI ranks on Intrepid BG/P.
Image LaserHpcToolKit2048Rasync
Figure 17.17: Strong scaling of a laser simulation run on Intrepid BG/P with the synchronous and asynchronous communication scheme.
Image LaserPerformanceSynchAsynch
The parameters ed_commChannelSize and ed_commRaysBetweenMsgTest give some control over the memory overhead and performance of the asynchronous scheme. The ed_commChannelSize parameter should be kept relatively small because it determines the amount of space available for rays in send and receive buffers. Large values lead to a very high memory footprint because there is a designated portion of space in the buffers for all communication channels (i.e. all possible pairs of communicating processors) and there can be 10-100s of communication channels. The parameter also determines the number of rays which are buffered before being sent from source processors to destination processors, and so smaller values will improve overall load balance. All messages are non-blocking and so the source processor returns immediately to computation and is free to send rays to different destination processors. However, the source processor will block (actually spin on various MPI_Test statements with no danger of deadlock) if a ray should be sent to the original processor and the original message is not yet delivered. This means it is important for the destination processor to frequently check for new messages not only to start work as soon as possible but also to enable the source processor to return to useful work. The parameter ed_commRaysBetweenMsgTest should therefore be kept relatively small so that the destination processor performs less computational work between message checks.

We set the default values of ed_commRaysBetweenMsgTest=50 and ed_commChannelSize=100. Very small values for both parameters improve load balance but introduce new overheads such as the cost of continually testing for new messages and the cost of preparing the laser computation routines for the newly received rays. The preparation work includes sorting the rays in block order and then scanning the ray array to find the start and end index for each block. Sorting improves the memory-access pattern and is a sensible optimization when computational time is large relative to the sort time (as always happens in the original synchronous communication scheme), however, it hurts performance when computational work is very fine-grained. Hence, we do not advise using parameter values much smaller than the default values. In future it would be interesting to see if removing the serial optimization improves the global time to solution. This could happen because more processors are doing useful work despite there being worse memory-access on each processor.

The asynchronous communication scheme conflicts with various capabilities in FLASH. It only works with Paramesh at the current time, although support for UG could probably be added without a huge amount of effort. We initially did not support UG because we did not know the MPI rank IDs of the corner neighboring blocks, although this is no longer the case. It also conflicts with the Laser I/O.

The asynchronous communication scheme is able to run without the GridParticles dependency by setting up with useGridParticles=False. This is generally a good idea because the GridParticles sub-unit has initialization code which allocates large send and receive buffers for bulk synchronous communication. These buffers are not needed in the asynchronous scheme and so removing the dependency significantly reduces the memory footprint. The only down-side is that a slower sorting subroutine will be used in place of Grid_SortParticles, which is able to achieve a fast sort by using the space in the bulk synchronous communication buffers.

The new asynchronous ray tracing implementation works and has been tested with several laser problems using multiple compilers and MPI implementations. However, since this is a rather new implementation, the asynchronous ray tracing mode should still be considered as being in its experimental stage. Both synschronous and asynchronous ray tracing modes are currently made available, with synchronous mode being the default. The asynchronous mode can be activated by using the appropriate setup parameter.


17.4.11 Usage

To include the use of the Energy Deposition unit, the following should be included into the setup line command:  

+laser ed_maxPulses=<number> ed_maxPulseSections=<number> ed_maxBeams=<number>
The +laser is a shortcut that handles all the logistics for properly including the EnergyDeposition unit. By default, the ray tracing algorithm used is the AVG algorithm 17.4.4.1 in its synchronous communication mode 17.4.10. To activate the cubic interpolation ray tracing schemes, replace the +laser shortcut by the shortcut +laserCubicInterpolation. This automatically includes the relevant routines for cubic interpolation and bypasses the ones responsible for the AVG algorithm. The default ray tracing method for the cubic interpolation is the piecewise parabolic ray tracing method 17.4.4.2. To activate the cubic interpolation using the Runge Kutta integrator one has to set the shortcut +laserCubicInterpolationRK. To activate the asynchronous ray tracing communication mode, replace the +laser shortcut by the shortcut +asynchLaser. This enables the asynchronous communication routines and supresses the synchronous ones. Only the AVG ray tracing algorithm is currently implemented in asynchronous communication mode. The other three setup variables fix the dimensions needed for the pulses and beams: The Energy Deposition unit reads all the information it needs to construct the laser beams and pulses from runtime parameters specified in the flash.par file. Below is the list of runtime parameters that is needed to properly build the laser. For clarification and figuring out the input to a particular laser simulation, the reader is encouraged to have Figures 17.5, 17.6 and 17.7 at hand.

17.4.11.1 Laser Pulses Runtime Parameters

17.4.11.2 Laser Beams Runtime Parameters

17.4.11.3 Laser General Runtime Parameters

Simulating 3D Cylindrical Laser Beam Intensities using 1D/2D Cartesian Grids: Per implementation, in FLASH the concept of a cell volume for lower dimensional grids consists in setting a unit length for each missing dimension. For example, in 1D cartesian, if a 1D cell has length $ \Delta X$ cm, then the corresponding cell volume as used in FLASH would be: $ \Delta X$ cm x 1 cm x 1cm = $ \Delta X$ cm$ ^3$. This results in different beam intensities when using the same beam power for 1D/2D/3D cartesian laser simulations. To make these simulations comparable with equal beam intensities, one can either adjust the individual beam powers or one can use the same power and set the runtime parameter ed_adjustBeamsTargetIntensity equal to true. The latter will then internally determine the adjustment factor to be used for 1D/2D cartesian beams to emulate a 3D cylindrical laser beam with radius equal to the beam's ed_targetSemiAxisMajor. In case one wants to adjust the 1D/2D cartesian beam powers ($ P_{1D}$,$ P_{2D}$), the following conversions should be used:

$\displaystyle P_{1D}[W/cm^2]$ $\displaystyle =$ $\displaystyle P_{3D}[W]/\pi r^2$ (17.206)
$\displaystyle P_{2D}[W/cm]$ $\displaystyle =$ $\displaystyle 2 P_{3D}[W]/\pi r,$ (17.207)

where $ r$ is the beam's radius (ed_targetSemiAxisMajor).

Simulating Axisymmetric Cylindrical Laser Beam Intensities using 1D Cylindrical Grids: For these kind of simulations, the user needs to adjust the 1D cylindrical beam power by a factor that mimics an axisymmetrical cylindrical laser beam. The beams ed_targetSemiAxisMajor is now irrelevant, but instead the radial position of the target and the laser intensity on target ($ I$) should be considered. The 1D laser cylindrical beam power ( $ P_{1D}^{cyl}$) should be given as:

$\displaystyle P_{1D}^{cyl}[W/cm]$ $\displaystyle =$ $\displaystyle 2\pi r_t I[W/cm^2],$ (17.208)

where $ r_t$ is the radial position of the target ed_targetX.


17.4.11.4 LaserIO Runtime Parameters and Usage

Often times it is useful to be able to visualize the paths of the individual rays in a simulation. This can be very helpful in ensuring that the laser beams are set up in the intended manner. The LaserIO directory in the source tree located at:  

source/physics/sourceTerms/EnergyDeposition/EnergyDepositionMain/Laser/LaserIO
By default, this unit is requested by the Laser package (for example, when the +laser setup shortcut is specified).

The LaserIO package gives users the ability to write the trajectory of a certain number of rays to plot files when normal FLASH plot file writes occur. This feature is only compatible with parallel HDF5 IO implementations (for example, with +parallelIO or +hdf5typeio). Only three runtime parameter options need to be set to use LaserIO. These are:

The ed_laserIOMaxNumberOfRays parameter sets the approximate maximum number of rays to write out. The exact number of rays that will be written is equal to:

$\displaystyle \max \left[ \mathrm{int} \left( \frac{N_\mathrm{rays}}{\mathrm{ed\_laserIOMaxNumberOfRays}} \right), 1 \right],$    

where $ N_\mathrm{rays}$ is the sum of the number of rays to launch for each beam. The LaserIO ray information is written to the normal HDF5 plot files in the RayData dataset. The extract_rays.py script that is distributed with FLASH in the tools/scripts directory can be used to extract this data. This is a python script which requires the NumPy and PyTables python packages to operate. The script takes as arguments a list of plot files. For each plot file, the script will generate a corresponding VTK file which can be loaded into VisIt. The RayPower_Watts Pseudocolor in VisIt can be plotted to show the ray trajectories. The rays are colored based on their power (in watts). See the LaserSlab simulation Sec:LaserSlab for an example which uses LaserIO.


17.4.11.5 Laser Energy Density Output

Laser energy density (energy per volume - see 17.4.3) is very useful for visualizing lasers in the simulation space and can be output to the variable “lase” in checkpoint and plot files. This is an experimental feature in FLASH 4.4 and is implemented only for 3D Cartesian geometry. To enable “lase” output, edit your simulation's Config file (e.g., for the LaserSlab example, Simulation/SimulationMain/LaserSlab/Config) to include the line:  

VARIABLE lase TYPE: PER_VOLUME
Then setup and build FLASH to use 3D Cartesian ray tracing (otherwise, the added variable is ignored).

17.4.12 Unit Tests for 3D/2D Cartesian Domain Geometries

These unit tests set up a series of beams that launch rays onto a domain with a particular electron number density and electron temperature distribution in such a way that an analytical solution is possible for the ray paths and their power deposited inside the domain. The tests are 3D cartesian extensions to the 2D analytical test solution for the quadratic trough as presented in the paper by Kaiser (2000). Figure 17.18 shows the basic structure. 2D cartesian versions of the tests is also available.

Figure: One ray moving through an ellipsoidal tube with a quadratic electron number density profile and reaching a focal point $ {\bf F}$.
Image LaserUnitTest

17.4.12.1 Analytic Path Solution for the Ellipsoidal Quadratic Potential Tube

Let us define a quadratic ellipsoidal electron number density profile in a 3D space along the xz-plane and having no variation along the y-axis:

$\displaystyle n_e({\bf r})$ $\displaystyle =$ $\displaystyle n_w + A(x-x_w)^2 + B(z-z_w)^2,$ (17.209)

Here $ x_w$ and $ z_w$ are the center coordinates of the ellipsoidal tube cross section, $ n_w$ is the value of the electron number density at the center of the ellipse and $ A,B$ are the scaling factors for the individual components. The values of $ A$ and $ B$ are determined by the boundary conditions of $ n_e({\bf r})$. For example, if one wants $ n_e({\bf r})$ to be equal to a critical value $ n_c$ at specific coordinates $ x_c$ and (separately) $ z_c$, then we have:
$\displaystyle A$ $\displaystyle =$ $\displaystyle (n_c-n_w)/(x_c-x_w)^2$ (17.210)
$\displaystyle B$ $\displaystyle =$ $\displaystyle (n_c-n_w)/(z_c-z_w)^2.$ (17.211)

Using the ray equation of motion
$\displaystyle \frac{d^2{\bf r}}{dt^2}$ $\displaystyle =$ $\displaystyle \nabla \left(-\frac{c^2}{2}\frac{n_e({\bf r})}{n_c} \right),$ (17.212)

we can solve for the position vector $ {\bf r}$ under several circumstances (boundary conditions). The first thing to note is that $ n_e({\bf r})$ does not contain mixed variable terms, hence the differential ray equation is separable into the individual components. Let us take the x-component. We have
$\displaystyle \frac{d^2x}{dt^2}$ $\displaystyle =$ $\displaystyle - \frac{c^2}{2n_c}\frac{d}{dx}A(x-x_w)^2$  
  $\displaystyle =$ $\displaystyle - k(x-x_w),$ (17.213)

where $ k=c^2A/n_c$. This is the differential equation for a simple harmonic oscillator around the point $ x_w$. The boundary conditions are such that at time $ t=0$ the ray will be at an initial position $ x_0$ and will posses a certain velocity component $ v_x$
$\displaystyle x$ $\displaystyle =$ $\displaystyle x_0  ($$\displaystyle \mbox{at $t = 0)$}$ (17.214)
$\displaystyle \frac{dx}{dt}$ $\displaystyle =$ $\displaystyle v_x  ($$\displaystyle \mbox{at $t = 0)$}$$\displaystyle .$ (17.215)

To solve the simple harmonic oscillator equation, we make the ansatz
$\displaystyle x(t)$ $\displaystyle =$ $\displaystyle x_w + (x_0-x_w)\cos (\beta t) + b\sin (\beta t)$ (17.216)

and find the expressions for $ b$ and $ \beta$ satisfying the boundary conditions. We have
$\displaystyle \frac{dx}{dt}$ $\displaystyle =$ $\displaystyle - \beta (x_0-x_w)\sin (\beta t) + b\beta \cos (\beta t)$ (17.217)
$\displaystyle \frac{d^2x}{dt^2}$ $\displaystyle =$ $\displaystyle - \beta^2 (x_0-x_w)\cos (\beta t) - b\beta^2 \sin (\beta t)$  
  $\displaystyle =$ $\displaystyle - \beta^2 (x - x_w)$ (17.218)

from which we deduce that
$\displaystyle \beta$ $\displaystyle =$ $\displaystyle \sqrt{k}$ (17.219)
$\displaystyle b$ $\displaystyle =$ $\displaystyle \frac{v_x}{\beta}.$ (17.220)

The solution to the complete ray equation can thus be stated as
$\displaystyle x(t)$ $\displaystyle =$ $\displaystyle x_w + (x_0-x_w)\cos (\sqrt{k} t) + \frac{v_x}{\sqrt{k}} \sin (\sqrt{k} t)$ (17.221)
$\displaystyle y(t)$ $\displaystyle =$ $\displaystyle y_0 + v_yt$ (17.222)
$\displaystyle z(t)$ $\displaystyle =$ $\displaystyle z_w + (z_0-z_w)\cos (\sqrt{k'} t) + \frac{v_z}{\sqrt{k'}} \sin (\sqrt{k'} t),$ (17.223)

where $ {\bf r}_0=(x_0,y_0,z_0)$ is the ray's initial position, $ {\bf v}_0=(v_x,v_y,v_z)$ the initial velocity and
$\displaystyle k$ $\displaystyle =$ $\displaystyle \frac{c^2A}{n_c}$ (17.224)
$\displaystyle k'$ $\displaystyle =$ $\displaystyle \frac{c^2B}{n_c}.$ (17.225)

Let us now define a focal point $ {\bf F}$, where all rays launched will meet. The obvious choice is a focal point along the tube's center line $ (x_w,z_w)$. Without specifying the y-coordinate of this point yet, we ask the question: at what times $ t$ and $ t'$ will $ x(t)=x_w$ and $ z(t')=z_w$ for the first time after launch? From the ray equation solutions we get
$\displaystyle t$ $\displaystyle =$ $\displaystyle \frac{\arctan [-\sqrt{k}(x_0-x_w)/v_x]}{\sqrt{k}}$ (17.226)
$\displaystyle t'$ $\displaystyle =$ $\displaystyle \frac{\arctan [-\sqrt{k'}(z_0-z_w)/v_z]}{\sqrt{k'}},$ (17.227)

where the angular region of interest for the tangent is $ 0\leq \theta \leq \pi$. It is interesting that there is both a lower and an upper limit to these times. While the lower limit of $ t\rightarrow 0$ makes sense (if we shoot the ray with infinite velocity towards the tube's center) the upper limit of $ t\rightarrow \pi/\sqrt{k}$ is less intuitive: there is no escape from the tube no matter what initial outward velocity we give the ray due to increasing pulling back force towards the center as the ray moves away from it. If the velocity component is zero we have $ t=\pi/(2\sqrt{k})$.

In general, both times $ t$ and $ t'$ will be different. What this means is that the ray will not hit the focal point. For that to occur, both times must be equal, hence we must have

$\displaystyle \frac{\arctan [-\sqrt{k}(x_0-x_w)/v_x]}{\sqrt{k}}$ $\displaystyle =$ $\displaystyle \frac{\arctan [-\sqrt{k'}(z_0-z_w)/v_z]}{\sqrt{k'}}.$ (17.228)

Given $ A$ and $ B$ values (and thus $ k$ and $ k'$ values via equations 17.224 and 17.225) defining the tube's electron number density layout, this last relation imposes a condition among the variables $ x_0,z_0,v_x,v_z$. Only if this condition is fulfilled will a focal point at $ (x_w,y_{\bf F},z_w)$ be reached. The y-coordinate $ y_{\bf F}$ of the focal point is then obtained from the y-solution 17.222 of the ray equation by inserting the value of the time $ t$ obtained through either side of 17.228. For the unit test we must identify $ y_{\bf F}$ to be sitting on the domain boundary. Hence when launching a series of ray's for the unit test, all $ y_{\bf F}$ must be the same. A first scheme I thus emerges for setting up a very general unit test:
SCHEME I
A second simplified scheme II (and the one that is currently implemented in FLASH) has all rays launched vertically onto the xz-face of the domain. This means that $ v_x=v_z=0$. By looking at the time condition 17.228, we see that the only possibility for a focal point in this case is if $ k=k'$, and the time associated with it is equal to $ t=\pi/(2\sqrt{k})$. This means that we must have $ A=B$ (a circular tube). Scheme II is set up as follows:
SCHEME II
Note, that for scheme II there is no restriction on the initial ray positions $ (x_0,z_0)$ like in scheme I. Rays launched vertically from anywhere onto the xz-plane with the same $ v_y$ velocity will reach the same focal point $ {\bf F}$. Their paths might be of different lengths, but the time it takes for them to reach $ {\bf F}$ is always the same.

17.4.12.2 Analytic Power Deposition Solution for the Ellipsoidal Quadratic Potential Tube

According to equation 17.47, the power remaining after the ray exits the ellipsoidal tube is

$\displaystyle P_{exit}$ $\displaystyle =$ $\displaystyle P_0 \exp \left\{ - \int_0^{t_{cross}}\nu_{ib}[{\bf r}(t)] dt \right\},$ (17.229)

where $ P_0$ and $ P_{exit}$ denote the ray's power at the tube's entry and exit point and $ t_{cross}$ is the time it takes for the ray to reach the focal point $ {\bf F}$. From 17.46, the inverse-Bremsstrahlung rate is
$\displaystyle \nu_{ib}[{\bf r}(t)]$ $\displaystyle =$ $\displaystyle \frac{4}{3} \left (\frac{2 \pi}{m_e} \right)^{1/2} \frac{e^4}{n_c...
...{Z[{\bf r}(t)]n_e[{\bf r}(t)]^2\ln \Lambda[{\bf r}(t)]}{T_e[{\bf r}(t)]^{3/2}}.$ (17.230)

Note the dependence of $ Z$ on position, since we are dealing with the entire domain and not with individual cells. In order to obtain an analytical integrand that is easy to integrate, the first assumption that we make is that both $ Z$ and $ \Lambda$ are independent of position. For convenience we chose:
$\displaystyle Z[{\bf r}(t)]$ $\displaystyle =$ $\displaystyle 1$ (17.231)
$\displaystyle \ln \Lambda[{\bf r}(t)]$ $\displaystyle =$ $\displaystyle 1.$ (17.232)

To change the quadratic dependence on the electron number density to a linear dependency, we set the electron temperature as
$\displaystyle T_e[{\bf r}(t)]$ $\displaystyle =$ $\displaystyle T_w\left[\dfrac{n_e[{\bf r}(t)]}{n_w}\right]^{2/3}$ (17.233)

with a specific electron temperature $ T_w$ at the elliptical origin. Using these choices we obtain
$\displaystyle \nu_{ib}[{\bf r}(t)]$ $\displaystyle =$ $\displaystyle \frac{4}{3} \left (\frac{2 \pi}{m_e} \right)^{1/2}
\frac{n_we^4}{n_ck^{3/2}{T_w}^{3/2}}n_e[{\bf r}(t)]$  
  $\displaystyle =$ $\displaystyle \nu_{ib,w} \frac{n_e[{\bf r}(t)]}{n_w},$ (17.234)

where
$\displaystyle \nu_{ib,w}$ $\displaystyle =$ $\displaystyle \frac{4}{3} \left (\frac{2 \pi}{m_e} \right)^{1/2}
\frac{n_w^2e^4}{n_ck^{3/2}{T_w}^{3/2}}$ (17.235)

is the inverse-Bremsstrahlung rate at the origin. The integral in the exponential becomes
$\displaystyle \int_0^{t_{cross}}\nu_{ib}[{\bf r}(t)] dt$ $\displaystyle =$ $\displaystyle \frac{\nu_{ib,w}}{n_w}\int_0^{t_{cross}}n_e[{\bf r}(t)] dt$  
  $\displaystyle =$ $\displaystyle \frac{\nu_{ib,w}}{n_w}\int_0^{t_{cross}} n_w + A(x[t]-x_w)^2 + B(z[t]-z_w)^2  dt.$ (17.236)

Inserting the analytical path solutions 17.221 and 17.223, we can perform the integration. Let us concentrate on the x-component part only. We are lead to an integral of the form
$\displaystyle \int_0^{t_{cross}} (x[t]-x_w)^2  dt$ $\displaystyle =$ $\displaystyle \int_0^{t_{cross}}
\left[(x_0-x_w)\cos (\sqrt{k} t) + \frac{v_x}{\sqrt{k}} \sin (\sqrt{k} t) \right]^2  dt$  
  $\displaystyle =$ $\displaystyle \frac{v_x(x_0-x_w)}{2k} + \frac{v_x^2+k(x_0-x_w)^2}{2k}t_{cross},$ (17.237)

where $ t_{cross}$ is given by the expression in 17.226. For this integration, known sine and cosine integration rules have been used as well as the fact that $ \sin (\arctan a) = a/\sqrt{1+a^2}$ and $ \cos (\arctan a) = 1/\sqrt{1+a^2}$. The z-component integration is similar. Collecting everything together and using the expressions for $ k$,$ k'$,$ A$ and $ B$ from 17.224, 17.225, 17.210 and 17.211 we obtain
$\displaystyle \int_0^{t_{cross}}\nu_{ib}[{\bf r}(t)] dt$ $\displaystyle =$ $\displaystyle \nu_{ib,w} \left[t_{cross}+\frac{n_c}{2c^2n_w}
\left\{v_x(x_0-x_w)+v_z(z_0-z_w)+(v_x^2+v_z^2)t_{cross}\right\} \right.$  
    $\displaystyle \left.+\frac{n_c-n_w}{2n_w}
\left\{\frac{(x_0-x_w)^2}{(x_c-x_w)^2}+\frac{(z_0-z_w)^2}{(z_c-z_w)^2}\right\}t_{cross}\right].$ (17.238)

This is the general expression of the exponent integral for evaluating the power deposition of the rays for scheme I. For the circular tube of scheme II we have $ v_x=v_z=0$, $ t_{cross}=(\pi/2c)\sqrt{n_c/A}$, $ A=B$, and the exponent integral simplifies to
$\displaystyle \int_0^{t_{cross}}\!\!\nu_{ib}[{\bf r}(t)]  dt$ $\displaystyle =$ $\displaystyle \nu_{ib,w}t_{cross}
\left[1+\frac{A}{2n_w}\left\{(x_0-x_w)^2+(z_0-z_w)^2\right\}\right].$ (17.239)

If all rays start with the same power and imposing equal power deposition for all rays, then the term inside the curly brackets must be equal to a constant value
$\displaystyle (x_0-x_w)^2+(z_0-z_w)^2$ $\displaystyle =$ $\displaystyle R^2.$ (17.240)

This condition implies that the rays have to be launched on a circle with radius $ R$ around the tube origin $ (x_w,z_w)$. This ensures equal path lengths for all rays and hence equal power deposition.

17.4.12.3 Unit Tests Parameters and Results

The current Energy Deposition unit test coded into FLASH is based on the analytic solutions for scheme II. The circular quadratic potential tube is defined through the following parameters and constants:

Parameters Constants in FLASH =  
$ x_c,z_c$ = $ 10$ cm $ m_e$ = $ 9.10938215\times 10^{-28}$ g
$ x_w,z_w$ = $ 5$ cm $ e$ = $ 4.80320427\times 10^{-10}$ esu
$ R$ = $ 3$ cm $ k$ = $ 1.3806504\times 10^{-16}$ erg/K
$ \lambda$ = $ 1\times 10^{-4}$ cm $ c$ = $ 2.99792458\times 10^{10}$ cm/sec
$ v_y$ = $ c$  
$ T_w$ = $ 10$ keV  
$ P_0$ = $ 1$ erg/sec  
With these values we obtain the following additional variable values (in parenthesis are the equations used), completing the picture:
$ n_c$ = $ 1.1148542\ldots\times 10^{21}$ cm$ ^{-3}$ (17.35)
$ n_w$ = $ n_c/2$  
$ t_{cross}$ = $ 5\pi/c\sqrt{2}$ sec (17.226 with $ v_x=0$)
$ y_{\bf F}$ = $ 5\pi/\sqrt{2}$ cm (17.222)
$ \nu_{ib,w}$ = $ 8.1002987\ldots\times 10^{8}$ sec$ ^{-1}$ (17.235)
$ P_{exit}$ = $ 0.7017811\ldots$ erg/sec (17.239 and 17.229)
Evaluation of each cell's average electron number density is done by integrating equation 17.209 over the entire cell xz-area
$\displaystyle \langle n_e \rangle$ $\displaystyle =$ $\displaystyle \frac{\displaystyle \int_{z_1}^{z_2}\int_{x_1}^{x_2}n_e(x,y)  dx  dz}{(x_2-x_1)(z_2-z_1)}$  
  $\displaystyle =$ $\displaystyle n_w + A({\overline x}-x_w)^2 + A({\overline z}-z_w)^2 + \frac{A}{3}(\Delta x^2 + \Delta z^2),$ (17.241)

where $ x_1,z_1$ are the lower and $ x_2,z_2$ the upper corresponding cell edge coordinates and
$\displaystyle {\overline x}$ $\displaystyle =$ $\displaystyle \frac{x_1+x_2}{2}$ (17.242)
$\displaystyle \Delta x$ $\displaystyle =$ $\displaystyle \frac{x_1-x_2}{2}$ (17.243)

with corresponding equations for the z-part. The average cell's electron temperature is evaluated directly from the average electron number density using 17.233
$\displaystyle \langle T_e \rangle$ $\displaystyle =$ $\displaystyle T_w\left[\dfrac{\langle n_e \rangle}{n_w}\right]^{2/3}.$ (17.244)

The gradients are obtained from the cell average values using the symmetrized derivative formula (shown here for the $ i$-th cell x-component of the electron number density gradient)
$\displaystyle \frac{\partial}{\partial x} n_e (i)$ $\displaystyle =$ $\displaystyle \frac{\langle n_e \rangle_{i+1} - \langle n_e \rangle_{i-1}}{d_{i-1,i+1}},$ (17.245)

where $ d_{i-1,i+1}$ is twice the distance between neighboring cells on the x-axis.
TEST I
For test I, a series of 8 rays (from 8 beams with 1 ray each) with radial distances of $ 3cm$ from $ (x_w,z_w)=(5cm,5cm)$ are launched with the following initial $ (x_0,z_0)$ coordinates (all in cm):
Ray $ x_0$ $ z_0$
1 $ 8$ $ 5$
2 $ 5$ $ 8$
3 $ 2$ $ 5$
4 $ 5$ $ 2$
5 $ 5+3/\sqrt{2}$ $ 5+3/\sqrt{2}$
6 $ 5+3/\sqrt{2}$ $ 5-3/\sqrt{2}$
7 $ 5-3/\sqrt{2}$ $ 5+3/\sqrt{2}$
8 $ 5-3/\sqrt{2}$ $ 5-3/\sqrt{2}$
Results are shown only for the ray 1 and ray 5, the other ones are related to these two by symmetry. Also $ z_{exit}$ is not given, because for ray 1 we have $ z_{exit}=5cm$ and for ray 5 we have $ z_{exit}=x_{exit}$. The percentage errors were calculated as $ \vert x_{exit}-x_w\vert/x_w$ and likewise for the power deposition.
Refinement Level $ x_{exit}$ % error $ P_{exit}$ % error
Ray 1
1 $ 5.04894$ $ 0.98$ $ 0.69487$ $ 0.98$
2 $ 5.16343$ $ 3.27$ $ 0.69864$ $ 0.45$
3 $ 5.02737$ $ 0.55$ $ 0.70107$ $ 0.10$
4 $ 4.98971$ $ 0.21$ $ 0.70185$ $ 0.01$
5 $ 5.00004$ $ 0.00$ $ 0.70176$ $ 0.00$
6 $ 5.00234$ $ 0.05$ $ 0.70174$ $ 0.00$
Ray 5
1 $ 5.35171$ $ 7.03$ $ 0.69283$ $ 1.23$
2 $ 4.77272$ $ 4.55$ $ 0.70200$ $ 0.03$
3 $ 4.96221$ $ 0.76$ $ 0.70115$ $ 0.09$
4 $ 4.94433$ $ 1.11$ $ 0.70204$ $ 0.04$
5 $ 4.95092$ $ 0.98$ $ 0.70235$ $ 0.08$
6 $ 4.97987$ $ 0.40$ $ 0.70198$ $ 0.02$
Analytical
$ \infty$ $ 5.00000$   $ 0.70178$  
TEST II
The second test is also based on scheme II and aims at launching a large number of rays using only one beam centered along the tube's origin with a specific cross section power function, such that all rays reaching the focal point have an equal exit power. Since the power deposited depends on the ray path length, the cross section power function must be such that each ray gets the appropriate initial power to account for the differences in path lengths. For a specific ray we have for the exit power
$\displaystyle P_{exit,ray}$ $\displaystyle =$ $\displaystyle {\overline P} \frac{w_{ray}}{\sum w_{ray}}
\exp \left\{ - \nu_{ib,w}  t_{cross}\left[1+\frac{AR_{ray}}{2n_w}\right] \right\},$ (17.246)

where equations 17.239, 17.229 and 17.108 have been used and $ R_{ray}=(x_0-x_w)^2+(z_0-z_w)^2$ is the ray's radial distance from the tube origin. If we set the ray weighting function equal to the inverse gaussian function
$\displaystyle w_{ray}$ $\displaystyle =$ $\displaystyle e^{\displaystyle +\frac{A\nu_{ib,w}t_{cross}}{2n_w}R_{ray}^2},$ (17.247)

we obtain
$\displaystyle P_{exit,ray}$ $\displaystyle =$ $\displaystyle \frac{{\overline P}}{Z} e^{- \nu_{ib,w}  t_{cross}},$ (17.248)

where we have used $ Z$ for the power partition function $ \sum w_{ray}$. For a particular number of rays per beam, $ Z$ is constant and therefore the exit power of all rays are equal. To get independent of the number of rays, the unit test records the partition scaled exit power $ P_{exit,ray}Z$ for each ray as well as their focal $ (x_{exit},z_{exit})$ coordinates. A maximum square radial deviation
$\displaystyle \Delta R_{focal}^2$ $\displaystyle =$ $\displaystyle \max \left\{(x_{exit}-x_w)^2+(z_{exit}-z_w)^2\right\}$ (17.249)

from the focal point is then defined and recorded. The maximum absolute deviation in the partition scaled power is registered as
$\displaystyle \Delta P_{focal}$ $\displaystyle =$ $\displaystyle \max \vert P_{exit,ray}Z-{\overline P}e^{- \nu_{ib,w}  t_{cross}}\vert.$ (17.250)

Both quantities are then checked against tolerance values.

17.4.13 Unit Tests for 3D Cylindrical Domain Geometries

In these unit tests a ring of laser rays is shot on to a 3D cylindrical can and their domain exit properties are recorded and analyzed.

17.4.13.1 Laser Ring on Mantle of 3D Cylindrical Can with No Domain Acceleration

This test reveals how well a straight line movement by the rays is maintained in 3D cylindrical coordinates. As rays enter the 3D cylindrical domain, their radial and tangential angular velocity change continuously along their straight line path through the can. On exit the laser ring should have the same radius as on entry and all rays should have the same initial speed, with deviations depending on the domain refinement level. Figure 17.19 shows schematically the unit test.

Figure 17.19: Schematic drawing of the incident laser ring on the mantle of a 3D cylindrical can with no domain ray acceleration. Linear path of a ray through the 3D cylindrical domain blocks/cells is also shown.
Image Laser3DcylUnitTest1

17.4.13.2 Unit Test Parameters and Results

The 3D cylindrical can is specified by the following domain min/max coordinates: $R_{max}=1.0$ cm, $z_{min}=-1.0$ cm and $z_{max}=1.0$ cm. The laser ring consists of $ 10^6$ rays, has a radius of $ 0.5$ cm and is shot along the 3D cartesian $ y$ direction on the mantle center at $ z=0$ cm. Each ray carries a power of 1 erg/s. A uniform electron number density (half the critical density) is set throughout the can to ensure no radial and angular acceleration. Speed adjustment of the rays as they enter the domain is supressed, hence all rays will travel with the speed of light $ c$ through the can. A uniform inverse Bremsstrahlungsrate of

$\displaystyle \nu_{ib}$ $\displaystyle =$ $\displaystyle - c \ln (0.5) / 2$ (17.251)

is used, to accomplish a power reduction by $ 0.5$ for those rays that travel with the speed of light the full diameter of 2 cm of the can. In the top view of Figure 17.19, we denote by $ y$ half the distance traveled by the ray inside the can. Hence the ray travels $2y$ cm through the can in $2y/c$ seconds and for this ray an exit power reduction of
$\displaystyle P_{exit}/P_0$ $\displaystyle =$ $\displaystyle \exp (-\nu_{ib}2y/c) = 2^{-y}$ (17.252)

is expected. The unit test collects three relative quantities (exit value / expected value) for each ray: 1) the ring radius, 2) the ray speed and 3) the ray power. Min/max of these relative quantities are then formed over all rays and deviations from 1 are checked against a predefined benchmark for passing the test.

An additional test is being done to see if the laser ring retained its shape while entering and traveling through the can. On creation in the beam, all rays of the laser ring are placed uniformly on the ring with equal separation among each other. On exit from the can the rays should still be uniformly spaced in their angular position. Since checking for nearest neighbor rays for each ray is in our case a $N^2 = 10^6 x 10^6 = 10^{12}$ process, we resort to a simpler method. We define an angular tic $\phi_{tic}$ value as the minimum angular separation of each ray in the beam:

$\displaystyle \phi_{tic}$ $\displaystyle =$ $\displaystyle 2\pi/10^6.$ (17.253)

On exit from the can, the angular position variable $ \phi$ of each ray is divided by $\phi_{tic}$ and the nearest integer is formed, giving the label $ T$ of a tic in which the ray would be found:
$\displaystyle T$ $\displaystyle =$ nint$\displaystyle \;(\phi/\phi_{tic}).$ (17.254)

If the laser ring retained its shape, then there should be only 1 ray for each tic, i.e. the list of labels $\{T\}$ should consist of consecutive natural numbers, which can easily be checked by defining a zero tic array and adding 1 to each tic whenever a ray is found to belong to it. A final tic array containing only 1's indicates success in retaining the original laser ring shape. Since straight line movement in 3D cylindrical coordinates is a considerable stress test for the code, we allow a maximum tic occupancy of 2 (and hence a tic occupancy of 0 for other tics) for this test to succeed.

17.4.13.3 Laser Ring on Lid of 3D Cylindrical Can with Uniform Radial Acceleration

The purpose of this test is to check proper ray crossing at the 3D cylindrical axis. The laser ring contracts towards the axis in a singular point at $R=0$ and expands afterwards regaining its original shape. On exit the laser ring should have the same radius as on entry and all rays should have the same initial speed, which increases towards the singular point when the ring contracts and decreases afterwards when the ring expands. Inside the can the laser ring traces an hourglass shape with a single point at its center.

17.4.13.4 Unit Test Parameters and Results

The 3D cylindrical can is specified by the following domain min/max coordinates: $R_{max}=1.0$ cm, $z_{min}=-1.0$ cm and $z_{max}=1.0$ cm. The laser ring consists of $ 10^6$ rays, has a radius of $ 0.5$ cm and is shot along the 3D cartesian $ z$ direction on the lid center at $ x,y=0$ cm. Each ray carries a power of 1 erg/s. A linear electron number density gradient

$\displaystyle n_e(R)$ $\displaystyle =$ $\displaystyle n_cR/R_{max}$ (17.255)

is set throughout the can to ensure a uniform radial acceleration of (from Eq.(17.161) and using $R_{max}=1.0$)
$\displaystyle n_R({\bf r})$ $\displaystyle =$ $\displaystyle - c^2/2$ (17.256)

towards the 3D cylindrical axis. As the initial radial velocity $ v_R$ is zero, the time it takes for each ray to exit the can on the opposite lid is equal to twice the time it takes for it to reach the axis at $R=0$:
$\displaystyle t_{exit}$ $\displaystyle =$ $\displaystyle 2 t_{axis} = 2\sqrt{2}/c.$ (17.257)

Using a uniform inverse Bremsstrahlungsrate of
$\displaystyle \nu_{ib}$ $\displaystyle =$ $\displaystyle - c \ln (0.5) / 2,$ (17.258)

the power reduction rate for each ray at exit is
$\displaystyle P_{exit}/P_0$ $\displaystyle =$ $\displaystyle \exp (-\nu_{ib}2\sqrt{2}/c) = 2^{-\sqrt{2}}.$ (17.259)

In order for the rays not to exceed the speed of light inside the can, initial ray speed adjustment is set true, with each ray entering the can at $R=0.5$ with an axial velocity of:
$\displaystyle v_z$ $\displaystyle =$ $\displaystyle c * \sqrt{1 - n_R(0.5)/n_c} = c/\sqrt{2}$ (17.260)

At the axis, the rays will have gained a radial velocity of $v_R=-c/\sqrt{2}$ due to the radial acceleration. On exit the radial velocity is again $v_R=0$, while the axial velocity remains unchanged. The speed of each ray at the axis and at the exit point will therefore be:
$\displaystyle \vert v\vert _{axis}$ $\displaystyle =$ $\displaystyle c$ (17.261)
$\displaystyle \vert v\vert _{exit}$ $\displaystyle =$ $\displaystyle c/\sqrt{2}.$ (17.262)

The unit test collects three relative quantities at the exit point (exit value / expected value) for each ray: 1) the ring radius, 2) the ray speed and 3) the ray power. Min/max of these relative quantities are then formed over all rays and deviations from 1 are checked against a predefined benchmark for passing the test. Since pure radial movement of the rays is considerably easier in 3D cylindrical coordinates when compared to straight line movements, the laser ring shape test is only successful, if all tics are strictly occupied only once.