[FLASH-USERS] ERROR in mpi_morton_bnd_prol

Christopher Daley cdaley at flash.uchicago.edu
Wed Jan 30 11:14:14 EST 2013


Hi Marco,

I think the error is still happening because maxblocks_alloc is still
too small.  Please experiment with increasing the value even more.
The default values in a 3D FLASH application are maxblocks=200 and
maxblocks_alloc=2000 (maxblocks_alloc=maxblocks*10).  You have
maxblocks_alloc=80.  It is perfectly fine to reduce the value of
maxblocks (and thus maxblocks_alloc), but there comes a point when the
buffers are too small for Paramesh to operate.
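
For reference, this is the kind of change we have used in the past when
maxblocks could not be made much larger (a sketch only, assuming the
pm4dev Paramesh sources; the exact spot in amr_initialize.F90 may differ
in your version):

      ! Sketch: enlarge the Paramesh bookkeeping buffers relative to
      ! maxblocks.  The default is maxblocks_alloc = maxblocks * 10;
      ! keep the tree arrays (maxblocks_tr) the same size.
      maxblocks_alloc = maxblocks * 20
      maxblocks_tr    = maxblocks * 20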


I've copied this email to the flash-users mailing list so that others
can see our complete email exchange which includes debugging
segmentation faults and running FLASH on BG/Q.

For your other questions:

(i) The BG/Q error you show seems to be related to your runtime
environment and not FLASH.  We use Cobalt on Mira BG/Q, so your job
submission script is unfamiliar to me; however, it looks like bg_size
in your script specifies the number of nodes you want.  If so, it
should be set to 32 (and not 64) to give 1024 total MPI ranks and 32
ranks per node.

(ii) See the first paragraph of this email.

(iii) Never use direct I/O.  You should be able to get the performance
you need from the FLASH parallel I/O implementations.  Please see "A
case study for scientific I/O: improving the FLASH astrophysics code"
(http://iopscience.iop.org/1749-4699/5/1/015001) for a discussion of
FLASH parallel I/O and usage of collective optimizations.

(iv) The advantage of PFFT over FFTW is that PFFT was written by Anshu
so we have in-house knowledge of how it works.  I am unaware of any
performance comparisons between PFFT and FFTW.

It is probably possible to integrate FFTW into FLASH.  We have mapping
code inside the PFFT unit which is general enough to take FLASH mesh
data and create a slab decomposition for FFTW (where each MPI rank has
a complete 2D slab) instead of a pencil decomposition for PFFT (where
each MPI rank has a complete 1D pencil).
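
If you go down that route, here is a minimal sketch (not FLASH code; the
grid size and program structure are purely illustrative, and it assumes
FFTW 3.3's MPI Fortran 2003 interface) of the slab-decomposed layout
that such a mapping would have to produce:

      program fftw_slab_sketch
        use, intrinsic :: iso_c_binding
        implicit none
        include 'mpif.h'
        include 'fftw3-mpi.f03'

        ! Illustrative global grid size (nx is the fastest Fortran index).
        integer(C_INTPTR_T), parameter :: nx = 64, ny = 64, nz = 64
        type(C_PTR) :: plan, cdata
        complex(C_DOUBLE_COMPLEX), pointer :: data(:,:,:)
        integer(C_INTPTR_T) :: alloc_local, local_nz, local_z_start
        integer :: ierr

        call MPI_Init(ierr)
        call fftw_mpi_init()

        ! FFTW's MPI interface takes dimensions in C (row-major) order and
        ! splits the slowest one, i.e. the last Fortran index: each rank
        ! owns local_nz complete xy-slabs.
        alloc_local = fftw_mpi_local_size_3d(nz, ny, nx, MPI_COMM_WORLD, &
                                             local_nz, local_z_start)
        cdata = fftw_alloc_complex(alloc_local)
        call c_f_pointer(cdata, data, [nx, ny, local_nz])

        plan = fftw_mpi_plan_dft_3d(nz, ny, nx, data, data, &
                                    MPI_COMM_WORLD, FFTW_FORWARD, &
                                    FFTW_ESTIMATE)

        ! ... fill data(:,:,1:local_nz) with the mesh data mapped onto
        ! this rank's slab, then transform in place ...
        data = (0.0_C_DOUBLE, 0.0_C_DOUBLE)
        call fftw_mpi_execute_dft(plan, data, data)

        call fftw_destroy_plan(plan)
        call fftw_free(cdata)
        call fftw_mpi_cleanup()
        call MPI_Finalize(ierr)
      end program fftw_slab_sketch

The point of the exercise would be to replace the "fill data" step with
the existing FLASH-mesh-to-slab mapping code from the PFFT unit.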

(v) I don't know.  Maybe someone else can help with this.

Chris


On 01/29/2013 09:07 AM, Marco Mazzuoli wrote:
>     Dear Christopher,
>
> you were right, of course. The error was due to memory corruption caused
> by a conflict between "qrealsize=8" and a parameter I defined as
> real(kind(1D0)) instead of real(kind(1.2E0)). Now Flash4 (version
> FLASH4-alpha_release) runs. IO/IOMain/hdf5/parallel/PM also works.
>
> I tried to move to the latest FLASH4 version available on the website,
> but I met some troubles both at compile time (in the interfaces of some
> Grid subroutines), which I have fixed, and at runtime. As for the
> runtime, I would like to ask you a couple of questions.
> i) What does the following standard output, printed just before the
> simulation aborts, mean?
>
> --------------------------------------------------------------------------------------------------------------------
>  [Driver_initParallel]: Called MPI_Init_thread - requested level   2, 
> given level   2
> flash4: binding.c:290: _xlsmp_get_cpu_topology_lop: Assertion `thr < 
> ncpus' failed.
> --------------------------------------------------------------------------------------------------------------------
>
> Please find attached (jobANDmake.tar) the input file ("job.cmd") and the
> makefile with the compilation flags I put into "sites" ("Makefile.h").
> Essentially I use bg_size=64 (i.e. 1024 cores), with 32 ranks per
> node, on the BG/Q machine. Block size = 16x16x16, maxblocks=40, max
> refinement level = 6 (everywhere).
>
> ii) Even though I changed both maxblocks_tr (in tree.F90) and
> maxblocks_alloc (in paramesh_dimensions.F90) to 20*maxblocks, the
> following error still occurs when I use larger block dimensions (32x32x32)
> and maxblocks=8 (max refinement level=5):
>
> --------------------------------------------------------------------------------------------------------------------
> ...
> ...
>  ERROR in process_fetch_list : guard block starting index -15 not 
> larger than lnblocks 5  processor no.  40  maxblocks_alloc 80
>  ERROR in process_fetch_list : guard block starting index -3 not 
> larger than lnblocks 5  processor no.  442  maxblocks_alloc 80
>  ERROR in process_fetch_list : guard block starting index -11 not 
> larger than lnblocks 5  processor no.  92  maxblocks_alloc 80
>  ERROR in process_fetch_list : guard block starting index -11 not 
> larger than lnblocks 5  processor no.  804  maxblocks_alloc 80
>  ERROR in process_fetch_list : guard block starting index -3 not 
> larger than lnblocks 5  processor no.  218  maxblocks_alloc 80
> Abort(0) on node 396 (rank 396 in comm -2080374784): application 
> called MPI_Abort(comm=0x84000000, 0) - process 396
> ...
> ...
> --------------------------------------------------------------------------------------------------------------------
> Why do you think it still occurs?
>
> iii) Do you know what speed-up is obtained by using
> IO/IOMain/direct/PM? Apart from the huge number of files, are there any
> other drawbacks?
>
> iv) From your experience, could you tell me whether there is any
> advantage, and if so what it is, in using PFFT instead of another
> parallel FFT library, e.g. FFTW3?
> v) The direct solver implemented in the multigrid Poisson solver is
> based on Ricker's (2008) algorithm and, for a rectangular domain
> with Dirichlet boundary conditions, it makes use of the Integrated
> Green's Function technique. The transform-space Green's function reads:
>
> G_{ijk}^l = -16\pi \left[ \Delta_{x_l}^{-2}\sin(i\pi/(2n_x)) + \Delta_{y_l}^{-2}\sin(j\pi/(2n_y)) + \Delta_{z_l}^{-2}\sin(k\pi/(2n_z)) \right]^{-1} \qquad (*)
>
> Could you suggest how (*) is obtained, so that I am able to
> compute a similar Green's function which solves a non-homogeneous
> Helmholtz problem with Dirichlet boundary conditions?
>
> It is clear that questions iv and v go beyond ordinary support,
> so I only ask whether you can give me some hints.
>
> Thank you so much for your suggestions, Christopher. They have been 
> valuable.
>
> Sincerely,
>
>     Marco
>
>
> Ing. Marco Mazzuoli
> Dipartimento di Ingegneria
> delle Costruzioni, dell'Ambiente e
> del Territorio (DICAT)
> via Montallegro 1
> 16145 GENOVA-ITALY
> tel.  +39 010 353 2497
> cell. +39 338 7142904
> e-mail marco.mazzuoli at unige.it
>         marco.mazzuoli84 at gmail.com
>
>
>
>
>
>
> ------------------------------------------------------------------------
> Date: Mon, 28 Jan 2013 11:13:04 -0600
> From: cdaley at flash.uchicago.edu
> To: marco.mazzuoli at unige.it
> Subject: Re: [FLASH-USERS] ERROR in mpi_morton_bnd_prol
>
> Hi Marco,
>
> I've had a quick glance at the minGridSpacing subroutine and it
> looks OK.  Nonsensical results like this are normally an
> indication of earlier memory corruption.  Another possibility is
> that the size of dxMin differs from that of dxMinTmp.  If dxMin is
> declared with an explicit "kind=" then it may not be the same size
> as that specified by MPI_DOUBLE_PRECISION.
>
> Since you already have "MaxRef" I think you can simplify your
> subroutine to just
>
> use Grid_data, ONLY : gr_delta
> dxMin = gr_delta(1:MDIM,MaxRef)
>
>
> You attached a serial I/O FLASH subroutine and not parallel I/O.
> Again I suspect earlier memory corruption is causing a problem.
>
> Try the following unit test on BG/Q.  It uses very similar units
> to you.  If it works then it is a strong indication that your
> custom code has a bug which is causing memory corruption.
>
> ./setup unitTest/PFFT_PoissonFD -auto -3d -maxblocks=200 +pm4dev 
> -parfile=test_paramesh_3d_64cube.par
>
> Try it with 1 node and 16 MPI ranks.  Then add 1 to both
> lrefine_min and lrefine_max and run it again with 8 nodes and 128
> MPI ranks.  Repeat as you wish and go as big as you like. You
> should find that this unit test works without problem and the
> Linf value should keep getting less as you add refinement levels.
>
>
> I would remove the qsmp=omp:noauto
> -L/opt/ibmcmp/xlsmp/bg/3.1/lib64/ -lxlsmp flags because you are
> not using OpenMP.  In general, be careful with the -qsmp flag as
> by default it adds the aggressive -qhot compilation option.
>
> It is only in the last few months that our target application for
> Early Science on Mira BG/Q can use the aggressive compilation
> flag -qhot without generating bad answers.  Maybe it is still
> causing problems in your application.
>
> Chris
>
>
>
>
> On 01/28/2013 08:04 AM, Marco Mazzuoli wrote:
>
>         Dear Christopher,
>
>     I tried the code both by using serial hdf5 writing and parallel
>     hdf5 writing.
>
>     i) As for the former case, I found that, when running the code
>     on the BG/Q machine, an error occurs in the following (homemade)
>     subroutine which saves the minimal grid spacing for each Cartesian
>     direction in the array dXmin [real,dimension(3)]:
>     ---------------------------------------------------------------------------------
>     SUBROUTINE minGridSpacing()
>
>           USE Grid_data,      ONLY: gr_meshComm
>           USE Dns_data,       ONLY: dXmin, MaxRef
>           use Grid_interface, ONLY: Grid_getListOfBlocks, Grid_getDeltas
>           use tree,           ONLY: lrefine
>
>           IMPLICIT NONE
>
>     #include "constants.h"
>     #include "Flash.h"
>     #include "Flash_mpi.h"
>
>           INTEGER,DIMENSION(MAXBLOCKS) :: blockList
>           INTEGER                      :: blockCount
>           INTEGER                      :: lb, ierr
>
>           REAL,DIMENSION(MDIM)         :: dXminTMP
>
>           CALL Grid_getListOfBlocks(ALL_BLKS,blockList,blockCount)
>
>     !     Initialization of dXminTMP
>           dXminTMP(:)=9999.
>
>           DO lb = 1, blockCount
>
>             IF (lrefine(lb) .EQ. MaxRef) THEN
>               CALL Grid_getDeltas(blockList(lb),dXminTMP)
>             IF (ANY(dXminTMP .GE. 0.)) EXIT
>             END IF
>
>           END DO
>     !    ****** PRINT 1 ******
>           PRINT*,dXminTMP(1),dXminTMP(2),dXminTMP(3),MaxRef
>     !    *********************
>     !      CALL MPI_Barrier (gr_meshComm, ierr)
>
>     !     find the smallest grid spacing for each direction among all
>     the blocks:
>           CALL MPI_ALLREDUCE(dXminTMP, dXmin, 3, MPI_DOUBLE_PRECISION, &
>                MPI_MIN, gr_meshComm, ierr)
>
>     !    ****** PRINT 2 ******
>           PRINT*,dXmin(1),dXmin(2),dXmin(3)
>     !    *********************
>
>           IF(ierr.NE.0)PRINT*,"minGridSpacing(): MPI error!"
>
>           END SUBROUTINE minGridSpacing
>     ---------------------------------------------------------------------------------
>
>     The standard output of the PRINTs is:
>
>      0.970785156250000003E-01 0.112096874999999999 0.112096874999999999 6
>      0.20917539062499999891198143586734659
>     0.11209687499999999860111898897230276
>     0.00000000000000000000000000000000000E+00
>
>     (each one repeated 1024 times)
>     I do not know why the variables dXminTMP and dXmin contain
>     different values (none of the printed dXminTMP values is equal to
>     dXmin).
>     Could you suggest why? Is it normal that the output format is
>     so different? Maybe one of the following flags I pass to the
>     Fortran compiler (mpixlf90_r) is wrong?
>
>     FFLAGS_OPT   = -g -O3 -qstrict -qsimd -qunroll=yes -qarch=qp
>     -qtune=qp -q64 -qrealsize=8 -qthreaded -qnosave -qfixed -c \
>                    -qlistopt -qreport -WF,-DIBM -qsmp=omp:noauto
>     -L/opt/ibmcmp/xlsmp/bg/3.1/lib64/ -lxlsmp
>
>     ii) As for the latter case (parallel hdf5 writing), I located the
>     error I met last week (see previous conversation below). It is
>     between line 660 and line 705 of "io_writeData.F90", which you
>     can find attached here. Do you know why the segmentation fault
>     may occur here?
>
>     Thank you again for your help, Christopher.
>
>     Sincerely,
>
>         Marco
>
>
>
>     Ing. Marco Mazzuoli
>     Dipartimento di Ingegneria
>     delle Costruzioni, dell'Ambiente e
>     del Territorio (DICAT)
>     via Montallegro 1
>     16145 GENOVA-ITALY
>     tel.  +39 010 353 2497
>     cell. +39 338 7142904
>     e-mail marco.mazzuoli at unige.it
>            marco.mazzuoli84 at gmail.com
>
>
>
>
>
>
>     ------------------------------------------------------------------------
>     Date: Thu, 24 Jan 2013 10:19:34 -0600
>     From: cdaley at flash.uchicago.edu
>     To: marco.mazzuoli at unige.it
>     Subject: Re: [FLASH-USERS] ERROR in mpi_morton_bnd_prol
>
>     Unfortunately, none of the units that I have multithreaded are in
>     your simulation.
>     If your problem fits in 512MB memory, I recommend you run FLASH
>     with 32 MPI ranks per node on BG/Q.  If I recall correctly, the
>     PPC core can execute 2 instructions per clock cycle, but the
>     instructions must be issued by different hardware threads.
>     Placing multiple MPI ranks per core achieves this aim and allows
>     us to hide memory latency.  Have a look at
>     http://flash.uchicago.edu/~cdaley/Siam/Daley_MS30-3.pdf
>     on page
>     12.  By eye, a 16 MPI ranks per node run with 1 thread took ~275
>     seconds and a 32 MPI ranks per node run with 1 thread took ~190
>     seconds.
>
>     You should also setup FLASH with +pm4dev.  This is the latest
>     Paramesh with Flash Center enhancements to make it scale better.
>     You should also use the latest FLASH release.
>
>     In terms of debugging, you really need to look at core file 948
>     to find the instruction which caused the segmentation fault.
>     Most likely, there is some form of memory corruption which you
>     need to identify.
>
>     It may be useful to setup FLASH without I/O (+noio) and see if
>     your simulation still fails.  You can compare the integrated
>     quantities file (default name is flash.dat) with a healthy
>     simulation run on your local cluster to see if it is working
>     correctly.
>
>     It may be useful to remove compiler optimization flags and use
>     -O0 to see if optimization is causing a problem.
>
>     Good luck,
>     Chris
>
>
>     On 01/24/2013 03:51 AM, Marco Mazzuoli wrote:
>
>             Dear Christopher,
>
>         thank you again. Of course, with multithreading up to 4
>         tasks per core (64 ranks per node) using OpenMP parallelization
>         you should obtain the best performance, because the
>         threading is still in hardware, not software. Are there routines
>         which also use OpenMP for multithreading at the moment?
>         Anyway, please find attached the setup_units file
>         from my object directory. I have heavily modified the Flash
>         code (Physics) in order to adapt it to my aim.
>         However, the IO, the Poisson solver, the Paramesh4 AMR, as well
>         as the basic structure of Flash, have been kept unchanged.
>         In particular, nothing has been changed in the initialization
>         (this is why I am asking for your help).
>
>         I suppose the error I meet comes from a writing error on the
>         BG/Q machine, because the same code works very well on a Linux
>         cluster machine.
>         If you have any further ideas about my troubles, please let me know.
>         Thank you very much, Christopher.
>
>         Sincerely,
>
>             Marco
>
>
>
>         Ing. Marco Mazzuoli
>         Dipartimento di Ingegneria
>         delle Costruzioni, dell'Ambiente e
>         del Territorio (DICAT)
>         via Montallegro 1
>         16145 GENOVA-ITALY
>         tel.  +39 010 353 2497
>         cell. +39 338 7142904
>         e-mail marco.mazzuoli at unige.it
>                marco.mazzuoli84 at gmail.com
>
>
>
>
>
>
>         ------------------------------------------------------------------------
>         Date: Wed, 23 Jan 2013 11:43:57 -0600
>         From: cdaley at flash.uchicago.edu
>         To: marco.mazzuoli at unige.it
>         Subject: Re: [FLASH-USERS] ERROR in mpi_morton_bnd_prol
>
>         It is failing with a segmentation fault (signal 11).
>
>         You should look at the stderr file and also core file 948.  There
>         is a tool named bgq_stack which reports the stack trace in a core
>         file.
>
>         bgq_stack ./flash4 core.948
>
>         If unavailable you can translate the stack addresses in this core
>         file one at a time using addr2line.
>
>         If the failing line is an innocent looking piece of FLASH code
>         then most likely there is some form of memory corruption earlier
>         on.  Maybe there is something specific to your simulation or
>         perhaps your HDF5 needs to be recompiled against the latest
>         driver version on BG/Q.
>
>         A good debugging method is to try to reproduce the same error in
>         a smaller problem so that you can then repeat the run on a local
>         workstation.  Once on a local workstation you can debug much more
>         interactively and use excellent tools like valgrind.
>
>         Could you send me the setup_units from your FLASH object
>         directory?  I'm curious what units you are using.  We have been
>         doing a lot of work on the BG/Q recently including multithreading
>         the FLASH units needed for Supernova simulations.  Our Supernova
>         simulations are currently running on Mira BG/Q - we are using 16
>         MPI ranks per node and 4 OpenMP threads per MPI rank.
>
>         Chris
>
>
>         On 01/23/2013 11:19 AM, Marco Mazzuoli wrote:
>
>             Thank you Christopher,
>
>             indeed I could use the UG at this stage, but I am testing
>             the code on the Blue Gene machine so that I can run larger
>             simulations, which need AMR, as soon as possible.
>
>             Actually, I have already solved the problem by reducing
>             the dimensions of the blocks (16x16x16) and by introducing
>             a finer refinement level. But of course the solution you
>             proposed sounds better. I guess it is better to use the
>             largest blocks with the smallest number of cores.
>
>             If I may, I would ask you another question. Just after the
>             initialization, when the code writes the first checkpoint
>             file (at line 54 of io_initFile.F90: "call
>             io_h5init_file(fileID, filename, io_comm,
>             io_outputSplitNum)"), the run crashes, giving the following
>             message in the standard output:
>
>             ---------------------------------------------------------------------------------------------------------
>              RuntimeParameters_read:  ignoring unknown parameter
>             "igrav"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "mgrid_max_iter_change"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "mgrid_solve_max_iter"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "mgrid_print_norm"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "msgbuffer"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "eint_switch"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "order"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "slopeLimiter"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "LimitedSlopeBeta"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "charLimiting"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "use_avisc"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "use_flattening"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "use_steepening"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "use_upwindTVD"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "RiemannSolver"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "entropy"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "shockDetect"...
>              MaterialProperties initialized
>              Cosmology initialized
>              Source terms initialized
>               iteration, no. not moved =  0 0
>              refined: total leaf blocks =  1
>              refined: total blocks =  1
>               starting MORTON ORDERING
>               tot_blocks after  1
>               max_blocks 2 1
>               min_blocks 2 0
>              Finished initialising block: 1
>              INFO: Grid_fillGuardCells is ignoring masking.
>               iteration, no. not moved =  0 0
>              refined: total leaf blocks =  8
>              refined: total blocks =  9
>               iteration, no. not moved =  0 7
>               iteration, no. not moved =  1 0
>              refined: total leaf blocks =  64
>              refined: total blocks =  73
>               iteration, no. not moved =  0 70
>               iteration, no. not moved =  1 7
>               iteration, no. not moved =  2 0
>              refined: total leaf blocks =  512
>              refined: total blocks =  585
>               iteration, no. not moved =  0 583
>               iteration, no. not moved =  1 21
>               iteration, no. not moved =  2 0
>              refined: total leaf blocks =  4096
>              refined: total blocks =  4681
>              Finished initialising block: 5
>              Finished initialising block: 6
>               iteration, no. not moved =  0 904
>               iteration, no. not moved =  1 0
>              refined: total leaf blocks =  32768
>              refined: total blocks =  37449
>              Finished initialising block: 6
>              Finished initialising block: 7
>              Finished initialising block: 8
>              Finished initialising block: 9
>              Finished initialising block: 10
>              Finished initialising block: 11
>              Finished initialising block: 12
>              Finished initialising block: 13
>              Finished initialising block: 15
>              Finished initialising block: 16
>              Finished initialising block: 17
>              Finished initialising block: 18
>              Finished initialising block: 19
>              Finished initialising block: 20
>              Finished initialising block: 21
>              Finished initialising block: 22
>              Finished initialising block: 24
>              Finished initialising block: 25
>              Finished initialising block: 26
>              Finished initialising block: 27
>              Finished initialising block: 28
>             ...
>             ...
>               Finished with Grid_initDomain, no restart
>              Ready to call Hydro_init
>              Hydro initialized
>              Gravity initialized
>              Initial dt verified
>             2013-01-23 17:43:12.006 (WARN ) [0x40000e98ba0]
>             :97333:ibm.runjob.client.Job: terminated by signal 11
>             2013-01-23 17:43:12.011 (WARN ) [0x40000e98ba0]
>             :97333:ibm.runjob.client.Job: abnormal termination by
>             signal 11 from rank 948
>             ---------------------------------------------------------------------------------------------------------
>
>             In particular, 8 cores call the subroutine "io_h5init_file"
>             before the run crashes (I checked this while debugging).
>             What do you think it could depend on?
>
>             Thank you again, Christopher.
>             Sincerely,
>
>                 Marco
>
>
>
>             Ing. Marco Mazzuoli
>             Dipartimento di Ingegneria
>             delle Costruzioni, dell'Ambiente e
>             del Territorio (DICAT)
>             via Montallegro 1
>             16145 GENOVA-ITALY
>             tel.  +39 010 353 2497
>             cell. +39 338 7142904
>             e-mail marco.mazzuoli at unige.it
>                    marco.mazzuoli84 at gmail.com
>
>
>
>
>
>
>             ------------------------------------------------------------------------
>             Date: Wed, 23 Jan 2013 10:16:20 -0600
>             From: cdaley at flash.uchicago.edu
>             To: marco.mazzuoli at unige.it
>             CC: flash-users at flash.uchicago.edu
>             Subject: Re: [FLASH-USERS] ERROR in mpi_morton_bnd_prol
>
>             Hi Marco,
>
>             You should increase maxblocks because a value of
>             maxblocks=4 is too low.  You are using large blocks (32^3),
>             so memory usage prevents you from setting maxblocks too
>             high, but I would suggest that for this problem you need a
>             value of at least 10 for maxblocks.
>
>             The specific error you show is happening because data
>             cannot be found in an array of size maxblocks_alloc, where
>             the default value of maxblocks_alloc is maxblocks * 10.
>             Internally Paramesh has many arrays of size maxblocks_alloc
>             which hold e.g. various information about the local view of
>             the oct-tree.  A technique we have used in the past, when
>             there is insufficient memory to make maxblocks much larger
>             and we need to avoid errors like you show, is to make
>             maxblocks_alloc larger in amr_initialize.F90, e.g.
>             maxblocks_alloc = maxblocks * 20.  You should also change
>             maxblocks_tr to be the same size as maxblocks_alloc.
>
>             Finally, if you don't need AMR then you should use the
>             FLASH uniform
>             grid (you have a fully refined domain at level 5).  Memory
>             usage will
>             be less and guard cell fills will be much faster.
>
>             Chris
>
>
>             On 01/18/2013 10:47 AM, Marco Mazzuoli wrote:
>                 Dear Flash users,
>
>             I am trying to run Flash on a bluegene-type supercomputer.
>             The details of the present run are:
>
>             #procs=1024 on #64 nodes (#16 procs per node).
>
>             The domain is rectangular.
>             Block size = 32x32x32 computational points
>             Max refinement level = 5
>             The whole domain is refined at level = 5 such that
>             N° blocks = 1+8+64+512+4096 = 4681
>             Max_blocks per core = 4
>
>             Do you know what the initialization error shown in the
>             standard output below could depend on?
>
>             -----------------------------------------------------------------------------------------------------------------------
>              RuntimeParameters_read:  ignoring unknown parameter
>             "igrav"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "mgrid_max_iter_change"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "mgrid_solve_max_iter"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "mgrid_print_norm"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "msgbuffer"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "eint_switch"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "order"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "slopeLimiter"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "LimitedSlopeBeta"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "charLimiting"...
>               RuntimeParameters_read:  ignoring unknown parameter
>             "use_avisc"...< br>  RuntimeParameters_read: ignoring
>             unknown parameter "use_flattening"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "use_steepening"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "use_upwindTVD"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "RiemannSolver"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "entropy"...
>              RuntimeParameters_read:  ignoring unknown parameter
>             "shockDetect"...
>              MaterialProperties initialized
>              Cosmology initialized
>              Source terms initialized
>               iteration, no. not moved =  0 0
>              refined: total leaf blocks =  1
>              refined: total blocks =  1
>               starting MORTON ORDERING
>               tot_blocks after  1
>               max_blocks 2 1
>               min_blocks 2 0
>              Finished initialising block: 1
>              INFO: Grid_fillGuardCells is ignoring masking.
>               iteration, no. not moved =  0 0
>              refined: total leaf blocks =  8
>              refined: total blocks =  9
>               iteration, no. not moved =  0 7
>               iteration, no. not moved =  1 0
>              refined: total leaf blocks =  64
>              refined: total blocks =  73
>               iteration, no. not moved =  0 70
>               iteration, no. not moved =  1 7
>               iteration, no. not moved =  2 0
>              refined: total leaf blocks =  512
>              refined: total blocks =  585
>              ERROR in mpi_morton_bnd_prol : guard block starting index -3  not larger than lnblocks 1  processor no.  8  maxblocks_alloc  40
>              ERROR in mpi_morton_bnd_prol : guard block starting index -3  not larger than lnblocks 1  processor no.  496  maxblocks_alloc  40
>              ERROR in mpi_morton_bnd_prol : guard block starting index -3  not larger than lnblocks 1  processor no.  569  maxblocks_alloc  40
>              ERROR in mpi_morton_bnd_prol : guard block starting index -3  not larger than lnblocks 1  processor no.  172  maxblocks_alloc  40
>              ERROR in mpi_morton_bnd_prol : guard block starting index -12  not larger than lnblocks 1  processor no.  368  maxblocks_alloc  40
>              ERROR in mpi_morton_bnd_prol : guard block starting index -12  not larger than lnblocks 1  processor no.  189  maxblocks_alloc  40
>             ...
>             ...
>             ...
>             Abort(1076419107) on node 442 (rank 442 in comm -2080374784):
>             application called MPI_Abort(comm=0x84000000, 1076419107) - process 442
>             -----------------------------------------------------------------------------------------------------------------------
>
>             Thank you in advance.
>
>             Sincerely,
>
>                 Marco Mazzuoli
>
>
>
>             Ing. Marco Mazzuoli
>             Dipartimento di Ingegneria
>             delle Costruzioni, dell'Ambiente e
>             del Territorio (DICAT)
>             via Montallegro 1
>             16145 GENOVA-ITALY
>             tel.  +39 010 353 2497
>             cell. +39 338 7142904
>             e-mail marco.mazzuoli at unige.it
>                    marco.mazzuoli84 at gmail.com
>
>
>
>


