[FLASH-USERS] flash (sedov) runs with 4 processors, but no more ..

Diego dlopezc at ncsu.edu
Tue Jan 24 15:49:49 EST 2012


Hi everybody, I am rather new to FLASH and have a problem when trying to
run a simple test problem on many processors.

I hope somebody has come across this kind of problem and been able to solve
it.

So:

Running the Sedov test problem, I am able to use at most 4 processors
(either 4 processors on one blade, 2 processors on each of 2 blades, or
1 processor on each of 4 blades). This runs beautifully for any choice of
refinement criteria, refinement levels, Courant criterion, Riemann solver,
and interpolation scheme.

But when I try to solve exactly the same test problem with more than 4
processors, for any blade-processor combination, I get an MPI abort.

For a failing job asking for 16 processors, the last lines of sedov.log are

[ 01-24-2012  14:42:52.608 ] message: rss   (MB):        29.66 (min)         29.96 (max)         29.78 (avg)
 [ 01-24-2012  14:42:56.165 ] [IO_writeCheckpoint] open: type=checkpoint name=sedov_hdf5_chk_0000
 [ 01-24-2012  14:42:56.285 ] [IO_writeCheckpoint] close: type=checkpoint name=sedov_hdf5_chk_0000
 WARNING you have called IO_writePlotfile but no plot_vars are defined.
 put the vars you want in the plotfile in your flash.par (plot_var_1 = "dens")
 [ 01-24-2012  14:42:56.309 ] [IO_writePlotfile] open: type=plotfile name=sedov_hdf5_plt_cnt_0000
 [ 01-24-2012  14:42:56.418 ] [IO_writePlotfile] close: type=plotfile name=sedov_hdf5_plt_cnt_0000
 [ 01-24-2012  14:42:56.508 ] message: vsize (MB):        97.23 (min)         98.40 (max)         97.40 (avg)
 [ 01-24-2012  14:42:56.508 ] message: rss   (MB):        29.74 (min)         32.81 (max)         30.03 (avg)
 [ 01-24-2012  14:42:56.508 ] [Driver_evolveFlash]: Entering evolution loop
 [ 01-24-2012  14:42:56.509 ] step: n=1 t=0.000000E+00 dt=1.000000E-10
 [ 01-24-2012  14:42:56.739 ] [mpi_amr_comm_setup]: buffer_dim_send=7087, buffer_dim_recv=2865

-------------------------------------------------------------
It seems odd that buffer_dim_send is so much larger than buffer_dim_recv.
In the other examples I can find, which do continue to run, the send buffer
is only slightly larger than the receive buffer.
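
To compare these buffer sizes across several log files, something like the
following minimal Python sketch could be used. It only extracts the two
numbers from [mpi_amr_comm_setup] lines in the format shown above; the
function name buffer_dims is hypothetical, and the send/recv ratio is just a
convenience for comparison, not a documented FLASH diagnostic.

```python
import re

# Matches the two values even when the mail client wrapped the line
# between buffer_dim_send and buffer_dim_recv.
PATTERN = re.compile(r"buffer_dim_send=\s*(\d+),\s*buffer_dim_recv=\s*(\d+)")


def buffer_dims(log_text):
    """Return a list of (send, recv) integer pairs found in the log text."""
    return [(int(send), int(recv)) for send, recv in PATTERN.findall(log_text)]


# Sample taken from the failing 16-processor sedov.log above.
sample = (
    "[ 01-24-2012  14:42:56.739 ] [mpi_amr_comm_setup]: buffer_dim_send=7087,\n"
    "buffer_dim_recv=2865\n"
)

for send, recv in buffer_dims(sample):
    print(f"send={send} recv={recv} ratio={send / recv:.2f}")
```

Running the same function over the log of a working 4-processor job would
show whether the ratio really differs between the two cases.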

---------------------------------------------------------------
I also experimented with the following lines in flash.par:

iGridSize = 16   #global number of gridpoints along x, excluding gcells
jGridSize = 16   #global number of gridpoints along y, excluding gcells
kGridSize = 1
iProcs = 4      #num procs in i direction
jProcs = 4      #num procs in j direction
kProcs = 1

Originally iGridSize, jGridSize, and kGridSize were commented out, and
iProcs and jProcs were set to 1. (The Sedov runs are fine with iProcs and
jProcs set to 2 on 4 processors, but not on 16.)
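
The arithmetic behind these settings can be checked with a small Python
sketch. It rests on two assumptions that are not confirmed anywhere in this
message: that iProcs*jProcs*kProcs is expected to equal the number of MPI
processes, and that each global grid dimension should divide evenly among
the processors along that direction. The function names check_decomposition
and cells_per_proc are hypothetical.

```python
def check_decomposition(grid, procs, nranks):
    """Return a list of problems with a grid/processor layout (empty if none).

    grid   -- (iGridSize, jGridSize, kGridSize)
    procs  -- (iProcs, jProcs, kProcs)
    nranks -- number of MPI processes the job is started with
    """
    problems = []
    total = 1
    for p in procs:
        total *= p
    if total != nranks:
        problems.append(
            f"iProcs*jProcs*kProcs = {total}, but the job uses {nranks} processes"
        )
    for axis, n, p in zip("ijk", grid, procs):
        if n % p != 0:
            problems.append(
                f"{axis}GridSize={n} is not divisible by {axis}Procs={p}"
            )
    return problems


def cells_per_proc(grid, procs):
    """Cells each processor would own along each direction (integer division)."""
    return tuple(n // p for n, p in zip(grid, procs))


# Values from the flash.par lines above, for the 16-processor job.
grid = (16, 16, 1)
procs = (4, 4, 1)
print(check_decomposition(grid, procs, 16))
print(cells_per_proc(grid, procs))
```

With the values above, each processor would own 4 x 4 cells, whereas the
compile line reports NXB=8 and NYB=8. Whether that difference matters
depends on which Grid implementation the build actually uses.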

-----------------------------------------------------------------------
In the compilation

FDEFINES has -DNXB=8 and -DNYB=8; I do not know how those are set.

From the sedov.log file, the compile lines were (PGI 10.5 and MPICH2 with
"hydra"):

 mpif90 -I/include -c -r8 -i4 -fastsse -Mnovect -pc 64 -DMAXBLOCKS=1000 -DNXB=8 -DNYB=8 -DNZB=1 -DN_DIM=2
 c compiler flags:
 mpicc -I/usr/local/apps/hdf/64hydra-pgi105/5-1.8.5-patch1-i8/include -DH5_USE_16_API -I/include -c -O2 -DM

Any help, comments, or useful information is very welcome.

Thanks.

Diego