[FLASH-BUGS] Non-member submission from [Erik-Jan Rijkhorst <rijkhorst at strw.leidenuniv.nl>] (fwd)

Shawn Needham shawn at flash.uchicago.edu
Thu Sep 23 14:23:09 CDT 2004


Date: Thu, 23 Sep 2004 17:56:11 +0200 (MEST)
From: Erik-Jan Rijkhorst <rijkhorst at strw.leidenuniv.nl>
X-X-Sender: rijkhors at laak.strw.leidenuniv.nl
To: flash-bugs at flash.uchicago.edu
cc: Peter Woitke <woitke at strw.leidenuniv.nl>
Subject: sedov_sph with Flash2.4

Dear developers,

We are having trouble running the 1D spherical sedov_sph setup on the
parallel machines we have at our disposal. The machines are:

Aster with Intel efc/ecc version 8.0 compilers
An SGI Altix 3700 system, consisting of 416 CPUs (Intel Itanium 2, 1.3 GHz)

and

Teras with MIPSpro SGI f90/cc compilers
1024-CPU system consisting of two 512-CPU SGI Origin 3800 systems

(http://www.sara.nl for more info)

The problem we encounter is that at some timestep the variables change
abruptly, e.g. new velocity jumps appear at locations where nothing
should happen.

We compiled with simple test flags, e.g. without optimisation.

These problems occur if (and only if) the following three conditions are 
met:

  1) refinement is allowed, i.e. lrefine_max > lrefine_min. We used
     lrefine_min=1 and lrefine_max=8
  2) the module /mesh/amr/paramesh2.0/quadratic_spherical is involved
     (we have similar problems with other applications using this module)
  3) a certain number of processors is used, e.g.

  4 PE  =>  ok
  5 PE  =>  ok
  6 PE  =>  ok
  7 PE  =>  ok
  8 PE  =>  sudden variable jumps at timestep 1666
  9 PE  =>  ok
 10 PE  =>  ok
 12 PE  =>  ok
 14 PE  =>  ok
 16 PE  =>  sudden variable jumps at timestep 444

The errors are reproducible and similar on Aster and Teras (e.g. they
occur at the same timestep).

(However, on Aster with 8 PEs, we get a "Nonconvergence in subroutine 
rieman!"-error at timestep 1666.)

Because all three conditions mentioned above are needed to trigger this
behaviour, we currently believe that there is probably a bug somewhere in
the prolongation routines (or possibly the guard cell exchange routines?)
that go with the new quadratic/spherical module.

We have checked some of the 'Flash daily test results' that are on the web
but couldn't find a test with the new spherical module (like sod_sph or
sedov_sph) done with more than 2 processors. Have such tests, for example
with 8 processors, been run on systems similar to ours?

If needed we can send you more information about the architecture, 
compiler and/or compilation flags used etc. Just tell us what you need.

Thank you very much for your help,

Peter Woitke and Erik-Jan Rijkhorst



