[FLASH-BUGS] sedov_sph with Flash2.4 (II)

Peter Woitke woitke at strw.leidenuniv.nl
Thu Oct 21 09:14:22 CDT 2004


Dear developers,

-- see also the first submission "sedov_sph with Flash2.4" by Erik-Jan Rijkhorst --

We are still having trouble running 1D models (sedov, sedov_sph)
on the parallel computers at our disposal (ASTER/TERAS, see
http://www.sara.nl). In the meantime we have talked to the operators at
this computing centre and have some more information. The problem is
still unsolved, however.

Description of the problem
==========================
Error in mesh_prolong.F90 or subordinate routines: after the prolongation, 
one or several new child blocks have wrong solndata, see attached output 
from files created before (5003) and after (5005) the call of 
mesh_prolong.F90 in source/mesh/amr/update_grid_refinement.F90.

The following details refer to sedov_sph -1d with 8 PEs:
========================================================
- The same error occurs with FLASH2.3 and FLASH2.4

- The error is reproducible

- The same error occurs on ASTER (64-bit Linux) and TERAS (64-bit SGI Origin),
   but not on our local Linux cluster!

- The same error occurs with either the mpt module or the mpich-1.2.5 module
   loaded (different MPI implementations)

- The error occurs only for -1d models

- The error occurs only for certain numbers of PEs as described
   in our first submission

- The error occurs rarely: for this problem at timestep 1666, but not
   at any of the refinements done before.

Further observations/ideas
==========================
The error occurs only for -1d models with many processors, which might
suggest that the very fast MPI implementation is part of the problem:
many MPI operations follow each other in quick succession, each
transferring only a small amount of data.

The refinement step at 1666 is a complicated one: one block needs
to be refined, but since it is surrounded by a row of less refined 
neighbours on the right hand side, 10 new blocks are created.
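
To illustrate how refining a single block can create 10 new blocks in 1D,
here is a minimal sketch. It assumes the usual PARAMESH rule that adjacent
leaf blocks may differ by at most one refinement level, and that each
refined block gets two children in 1D. The starting levels and the function
name refine_with_balance are hypothetical and chosen only to reproduce the
count; they are not taken from the actual run.

```python
def refine_with_balance(levels, index):
    """Refine one leaf of a 1D mesh and enforce a 2:1 level balance.

    levels: refinement levels of the leaf blocks, ordered left to right.
    index:  position of the leaf that is marked for refinement.
    Returns the new leaf levels and the number of newly created blocks.
    """
    leaves = list(levels)
    # In 1D a refined block is replaced by two children one level deeper.
    leaves[index:index + 1] = [leaves[index] + 1] * 2
    refinements = 1

    changed = True
    while changed:
        changed = False
        for i in range(len(leaves) - 1):
            a, b = leaves[i], leaves[i + 1]
            if abs(a - b) > 1:
                # Refine the coarser of the two neighbours.
                j = i if a < b else i + 1
                leaves[j:j + 1] = [leaves[j] + 1] * 2
                refinements += 1
                changed = True
                break
    return leaves, 2 * refinements


# Hypothetical staircase of coarser neighbours to the right of the block.
leaves, new_blocks = refine_with_balance([7, 7, 6, 5, 4, 3], 1)
print(leaves)      # [7, 8, 8, 7, 7, 6, 6, 5, 5, 4, 4]
print(new_blocks)  # 10
```

In this sketch the single requested refinement triggers four further
refinements along the staircase, so five blocks are refined in total.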

We would be very grateful for any comments from the developers.


Kind regards,

Peter Woitke  &  Erik-Jan Rijkhorst





PS: details about setup and compilation
=======================================
----------
setup call
----------
./setup sedov -1d -auto          (for sedov)
./setup sedov_sph -1d -auto      (for sedov_sph)

----------------------------------
Makefile.h: (for TERAS SGI-origin)
----------------------------------
HDF5_PATH = /usr/local/opt/hdf5-1.4.4
FCOMP   = f90
CCOMP   = cc
CPPCOMP = CC
LINK    = f90
   (MIPSpro SGI f90/cc compilers)
FFLAGS_OPT  = -64 -c -r8 -d8 -i4 -cpp -mips4 -O3 -Ofast=ip35
   (we also tried without optimisation - no difference)
LFLAGS_OPT  = -64 -r8 -d8 -i4 -IPA -o
LIB_HDF5  = -L$(HDF5_PATH)/lib -lhdf5
LIB_OPT   = -lmpi

--------
run-call
--------
mpirun -np 8 flash2

--------------------------
slightly changed flash.par
--------------------------
lrefine_min     = 1
lrefine_max     = 8
basenm          = "sedov_sph_"
restart         = .false.
tplot           = 0.001
trstrt          = 0.01
nend            = 1700
tmax            = 0.05
plot_var_1      = "dens"
plot_var_2      = "pres"
plot_var_3      = "temp"
plot_var_4      = "velx"

--------------------------------------------------
This is how we created the additional output files
(in source/mesh/amr/update_grid_refinement.F90)
--------------------------------------------------
         integer,save:: counter = 0
         ...
         call mark_grid_refinement()
         if ( nstep .eq. 1466 .or. nstep .eq. 1666) then
           call plotfile(5000+counter, time)
           counter = counter + 1
         endif
         call mesh_refine_derefine()
         do block_no = 1,lnblocks
            call grid (block_no)
         end do
         new_child =>  dBaseTreePtrNewChild()
         if (conserved_var) then
            do block_no = 1, lnblocks
               if ( .not. new_child(block_no)) then
                  solnData => dBaseGetDataPtrSingleBlock(block_no, GC)
                  ...
                  call convert_var_prim_to_cons( solnData(:,:,:,:) )
                  call dBaseReleaseDataPtrSingleBlock(block_no, solnData)
               endif
            enddo
         endif
         if ( nstep .eq. 1466 .or. nstep .eq. 1666) then
           call plotfile(5000+counter, time)
           counter = counter + 1
         endif
         call mesh_prolong (MyPE, 1, nguard)
         if ( nstep .eq. 1466 .or. nstep .eq. 1666) then
           call plotfile(5000+counter, time)
           counter = counter + 1
         endif
         call mesh_guardcell (MyPE, 1, nguard, time, 1, 0)
         ...

Plotfiles 5000, 5001, 5002 are created during a correctly working
refinement step (nstep=1466), for comparison.
Plotfiles 5003, 5004, 5005 are created during the refinement step that
causes the error (nstep=1666).
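
A coarse way to spot a bad child block when comparing the data before and
after the prolongation could look like the sketch below. It assumes that
the parent and child values of one variable have already been extracted
from the plotfiles into plain lists; the function name suspicious_children
and the sample numbers are hypothetical. It is only a plausibility check
(child values should stay within the range of the parent values and must
not be NaN) and does not reproduce the interpolation done by
mesh_prolong. A non-monotonic interpolation may legitimately overshoot the
parent range slightly, so the tolerance may need adjusting.

```python
def suspicious_children(parent, children, tol=1e-12):
    """Return the indices of child blocks whose data look implausible.

    parent:   cell values of the parent block for one variable.
    children: list of child blocks, each a list of cell values.
    A child is flagged if any value is NaN or lies outside the
    [min, max] range of the parent values (up to a relative tolerance).
    """
    lo, hi = min(parent), max(parent)
    margin = tol * max(abs(lo), abs(hi), 1.0)
    bad = []
    for k, child in enumerate(children):
        # v != v is true only for NaN.
        if any(v != v or v < lo - margin or v > hi + margin for v in child):
            bad.append(k)
    return bad


# Made-up sample data: the second child contains a zero outside the range.
parent = [1.0, 1.2, 1.6, 2.0]
left_child = [1.0, 1.05, 1.15, 1.25]
right_child = [1.5, 1.7, 1.9, 0.0]
print(suspicious_children(parent, [left_child, right_child]))  # [1]
```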

----------------
The stdout file:
----------------
     1662 3.0518E-02 1.1371E-05 |  1.137E-05
     1663 3.0541E-02 1.1372E-05 |  1.137E-05
     1664 3.0564E-02 1.1374E-05 |  1.137E-05
     1665 3.0587E-02 1.1375E-05 |  1.138E-05
  block to be refined:  myPE=6, blockno=6  (print from amr_refine_derefine)
  *** Wrote output to sedov_sph_hdf5_plt_cnt_5003 ***
  PE  0  lnblocks  11                      (print from amr_refine_derefine)
  PE  1  lnblocks  7
  PE  2  lnblocks  7
  PE  3  lnblocks  9
  PE  4  lnblocks  6
  PE  5  lnblocks  7
  PE  6  lnblocks  7
  PE  7  lnblocks  5
  min_blocks          6 max_blocks         12 tot_blocks         69
  *** Wrote output to sedov_sph_hdf5_plt_cnt_5004 ***
  *** Wrote output to sedov_sph_hdf5_plt_cnt_5005 ***
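
As a quick cross-check of the numbers in this log, the per-PE lnblocks
values can be summed and compared with tot_blocks. The sketch below does
this for the lines quoted above; the helper names are hypothetical. The
per-PE values add up to 59, while tot_blocks is 69, a difference of 10,
which matches the 10 new blocks mentioned earlier. One possible reading,
not verified against the FLASH source, is that the per-PE counts were
printed before the new child blocks were added.

```python
import re

# Lines copied from the stdout excerpt above.
LOG = """\
  PE  0  lnblocks  11                      (print from amr_refine_derefine)
  PE  1  lnblocks  7
  PE  2  lnblocks  7
  PE  3  lnblocks  9
  PE  4  lnblocks  6
  PE  5  lnblocks  7
  PE  6  lnblocks  7
  PE  7  lnblocks  5
  min_blocks          6 max_blocks         12 tot_blocks         69
"""


def per_pe_counts(log):
    """Extract the lnblocks value printed for each PE."""
    return [int(n) for n in re.findall(r"PE\s+\d+\s+lnblocks\s+(\d+)", log)]


def reported_total(log):
    """Extract the tot_blocks value from the summary line."""
    match = re.search(r"tot_blocks\s+(\d+)", log)
    return int(match.group(1))


counts = per_pe_counts(LOG)
total = reported_total(LOG)
print(sum(counts), total, total - sum(counts))  # 59 69 10
```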

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sedov_sph_hdf5_plt_cnt_blk_velx5005.png
Type: image/png
Size: 6483 bytes
Desc: 
Url : http://flash.uchicago.edu/pipermail/flash-bugs/attachments/20041021/3c16bb51/attachment-0002.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sedov_sph_hdf5_plt_cnt_blk_velx5003.png
Type: image/png
Size: 6249 bytes
Desc: 
Url : http://flash.uchicago.edu/pipermail/flash-bugs/attachments/20041021/3c16bb51/attachment-0003.png 

