We have experimented with two different threading strategies for adding an extra layer of parallelism to the standard MPI decomposition described in Chapter 8.
The first strategy makes use of the fact that the solution updates on different Paramesh blocks are independent, so multiple threads can safely update the solution on different blocks at the same time. This coarse-grained threading approach is applicable only to Paramesh. It can be included in a FLASH application by adding threadBlockList=True to the FLASH setup line.
  call Grid_getListOfBlocks(LEAF,blockList,blockCount)
  !$omp do schedule(static)
  do b=1,blockCount
     blockID = blockList(b)
     call update_soln_on_a_block(blockID)
  end do
  !$omp end do
The second strategy parallelises the nested do loops in the kernel subroutines such that different threads update the solution on independent cells of the same block at the same time. This is a fine-grained threading approach which is applicable to both Paramesh and UG. It can be included in a FLASH application by adding threadWithinBlock=True to the FLASH setup line. Notice that we parallelise the outermost do loop for a given dimensionality to improve cache usage for each thread.
  #if NDIM == 3
  !$omp do schedule(static)
  #endif
  do k=k0-2,kmax+2
  #if NDIM == 2
     !$omp do schedule(static)
  #endif
     do j=j0-2,jmax+2
  #if NDIM == 1
        !$omp do schedule(static)
  #endif
        do i=i0-2,imax+2
           soln(i,j,k) = ....
The final threading option is only applicable to the energy deposition unit. In this unit, rays are assigned to threads dynamically, because the work per ray is not fixed and a static assignment would balance the load poorly. It can be included in a FLASH application by adding threadRayTrace=True to the FLASH setup line.
These setup options may be freely combined, with the exception of threadBlockList=True and threadWithinBlock=True, which are mutually exclusive.