<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">Hi Marco, <br>
      <br>
      I think the error is still happening because maxblocks_alloc is
      still<br>
      too small.  Please experiment with increasing the value even more.<br>
      The default values in a 3D FLASH application are maxblocks=200 and<br>
      maxblocks_alloc=2000 (maxblocks_alloc=maxblocks*10).  You have<br>
      maxblocks_alloc=80.  It is perfectly fine to reduce the value of<br>
      maxblocks (and thus maxblocks_alloc), but there comes a point when
      the<br>
      buffers are too small for Paramesh to operate.<br>
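       <br>
       (For illustration, a minimal sketch of how maxblocks is usually chosen<br>
       on the setup line; the simulation name and the other options below are<br>
       placeholders rather than your actual setup:)<br>
       <br>
       ./setup MySimulation -auto -3d -maxblocks=200 +pm4dev<br>
       # maxblocks_alloc then defaults to maxblocks*10 = 2000 unless Paramesh is edited<br>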
      <br>
      <br>
      I've copied this email to the flash-users mailing list so that
      others<br>
      can see our complete email exchange which includes debugging<br>
      segmentation faults and running FLASH on BG/Q.<br>
      <br>
      For your other questions<br>
      <br>
      (i) The BG/Q error you show seems to be related to your runtime<br>
       environment and not FLASH.  We use cobalt on Mira BG/Q, so your job<br>
       submission script is unfamiliar to me; however, it looks like
      bg_size<br>
      in your script specifies the number of nodes you want.  If so, it<br>
      should be set to 32 (and not 64) to give 1024 total MPI ranks and
      32<br>
      ranks per node.<br>
      <br>
      (ii) See the first paragraph of this email.<br>
      <br>
      (iii) Never use direct I/O.  You should be able to get the
      performance<br>
      you need from the FLASH parallel I/O implementations.  Please see
      "A<br>
      case study for scientific I/O: improving the FLASH astrophysics
      code"<br>
      (<a class="moz-txt-link-freetext" href="http://iopscience.iop.org/1749-4699/5/1/015001">http://iopscience.iop.org/1749-4699/5/1/015001</a>) for a discussion
      of<br>
      FLASH parallel I/O and usage of collective optimizations.<br>
      <br>
      (iv) The advantage of PFFT over FFTW is that PFFT was written by
      Anshu<br>
      so we have in-house knowledge of how it works.  I am unaware of
      any<br>
      performance comparisons between PFFT and FFTW.  <br>
      <br>
      It is probably possible to integrate FFTW in FLASH.  We have
      mapping<br>
      code inside the PFFT unit which is general enough to take FLASH
      mesh<br>
      data and create a slab decomposition for FFTW (where each MPI rank
      has<br>
      a complete 2D slab) instead of a pencil decomposition for PFFT
      (where<br>
      each MPI rank has a complete 1D pencil).<br>
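       <br>
       (Purely to illustrate the two decompositions, a minimal standalone<br>
       sketch; the grid size and process counts are made up, and this is not<br>
       FLASH, PFFT, or FFTW code:)<br>
       <br>
       program decomp_sketch<br>
         implicit none<br>
         integer, parameter :: nx=64, ny=64, nz=64   ! made-up global grid<br>
         integer, parameter :: np=16, p1=4, p2=4     ! made-up ranks; p1*p2 = np<br>
         ! Slab decomposition (FFTW-style): split along one axis only,<br>
         ! so each rank holds complete 2D planes.<br>
         print *, 'slab   per rank: ', nx, ny, nz/np<br>
         ! Pencil decomposition (PFFT-style): split along two axes,<br>
         ! so each rank holds complete 1D pencils.<br>
         print *, 'pencil per rank: ', nx, ny/p1, nz/p2<br>
       end program decomp_sketch<br>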
      <br>
      (v) I don't know.  Maybe someone else can help with this.<br>
      <br>
      Chris<br>
      <br>
      <br>
      On 01/29/2013 09:07 AM, Marco Mazzuoli wrote:<br>
    </div>
    <blockquote cite="mid:DUB108-W227B0E5E2F9E30F618DFCFAB1F0@phx.gbl"
      type="cite">
      <style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Tahoma
}
--></style>
      <div dir="ltr">
            Dear Christopher,<br>
        <br>
        you were right, of course: the error was due to memory corruption
        caused by a conflict between "qrealsize=8" and a parameter I had
        declared as real(kind(1D0)) instead of real(kind(1.2E0)).  Now
        Flash4 (version FLASH4-alpha_release) runs, and the
        IO/IOMain/hdf5/parallel/PM unit also works.<br>
        <br>
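        (For anyone reading along, a minimal sketch of that kind mismatch;<br>
        the promotion behaviour of -qrealsize=8 is assumed from this thread,<br>
        so please check the XL Fortran documentation:)<br>
        <br>
        program kind_sketch   ! sketch: compile with mpixlf90_r -qrealsize=8<br>
          implicit none<br>
          real(kind(1.2E0)) :: a  ! default-real kind: promoted to 8 bytes by the flag<br>
          real(kind(1.0D0)) :: b  ! explicit double-precision kind: may end up wider than 8 bytes<br>
          ! If b is then passed where an 8-byte buffer is expected (e.g. with<br>
          ! MPI_DOUBLE_PRECISION), the size mismatch corrupts memory.<br>
          print *, 'bytes per value:', storage_size(a)/8, storage_size(b)/8<br>
        end program kind_sketch<br>
        <br>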
        I tried to move to the latest FLASH4 version available on the
        website, but I ran into some trouble both at compile time (in the
        interfaces of some Grid subroutines), which I have fixed, and at
        runtime.  As for the runtime, I would like to ask you a couple of
        questions.<br>
        i) What does the following standard output, printed just before the
        simulation aborts, mean?<br>
        <br>
--------------------------------------------------------------------------------------------------------------------<br>
         [Driver_initParallel]: Called MPI_Init_thread - requested
        level   2, given level   2<br>
        flash4: binding.c:290: _xlsmp_get_cpu_topology_lop: Assertion
        `thr < ncpus' failed.<br>
--------------------------------------------------------------------------------------------------------------------<br>
        <br>
        Please find attached (jobANDmake.tar) the job script ("job.cmd")
        and the Makefile with the compilation flags I put into "sites"
        ("Makefile.h").<br>
        Essentially I use bg_size=64 (i.e. 1024 cores), with 32 ranks
        per node, on the BG/Q machine.  Block size = 16x16x16,
        maxblocks=40, max refinement level = 6 (everywhere).<br>
        <br>
        ii) Although I set both maxblocks_tr (in tree.F90) and
        maxblocks_alloc (in paramesh_dimensions.F90) to 20*maxblocks,
        the following error still occurs when I use larger block
        dimensions (32x32x32) and maxblocks=8 (max refinement level=5):<br>
        <br>
--------------------------------------------------------------------------------------------------------------------<br>
        ...<br>
        ...<br>
         ERROR in process_fetch_list : guard block starting index -15 
        not larger than lnblocks 5  processor no.  40  maxblocks_alloc 
        80<br>
         ERROR in process_fetch_list : guard block starting index -3 
        not larger than lnblocks 5  processor no.  442  maxblocks_alloc 
        80<br>
         ERROR in process_fetch_list : guard block starting index -11 
        not larger than lnblocks 5  processor no.  92  maxblocks_alloc 
        80<br>
         ERROR in process_fetch_list : guard block starting index -11 
        not larger than lnblocks 5  processor no.  804  maxblocks_alloc 
        80<br>
         ERROR in process_fetch_list : guard block starting index -3 
        not larger than lnblocks 5  processor no.  218  maxblocks_alloc 
        80<br>
        Abort(0) on node 396 (rank 396 in comm -2080374784): application
        called MPI_Abort(comm=0x84000000, 0) - process 396<br>
        ...<br>
        ...<br>
--------------------------------------------------------------------------------------------------------------------<br>
        Why do you think it still occurs?<br>
        <br>
        iii) Do you know what speed-up is obtained by using
        IO/IOMain/direct/PM?  Apart from the huge number of files, is
        there any other drawback?<br>
        <br>
        iv) From your experience, could you tell me whether there is any
        advantage, and if so which, in using PFFT instead of another
        parallel FFT library, e.g. FFTW3?<br>
        v) The direct solver implemented in the multigrid Poisson
        solver is based on Ricker's algorithm (2008) and, for a
        rectangular domain with Dirichlet boundary conditions, it makes
        use of the Integrated Green's Function technique.  The
        transform-space Green's function reads:<br>
        <br>
        G_{ijk}^l=-16\pi [ \Delta_{x_l}^{-2}\sin(i\pi/(2n_x))  + 
        \Delta_{y_l}^{-2}\sin(j\pi/(2n_y))  + 
        \Delta_{z_l}^{-2}\sin(k\pi/(2n_z)) ]^{-1}         \qquad (*)<br>
        <br>
        Could you suggest how to derive (*), so that I can compute a
        similar Green's function that solves a non-homogeneous Helmholtz
        problem with Dirichlet boundary conditions?<br>
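        <br>
        (A hedged sketch of the usual route to such kernels, assuming the<br>
        solver diagonalizes a standard second-order finite-difference<br>
        Laplacian with a sine transform; the exact prefactor and<br>
        normalization are those of Ricker 2008 and are not rederived here.)<br>
        The centred second difference applied to a sine mode gives<br>
        <br>
        (u_{m+1}-2u_m+u_{m-1})/\Delta_{x_l}^2 =
        -4\Delta_{x_l}^{-2}\sin^2(i\pi/(2n_x))\,u_m,
        \qquad u_m=\sin(i\pi m/n_x),<br>
        <br>
        so the sine transform turns the discrete Laplacian into
        multiplication by its symbol \hat{L}_{ijk} (the sum of the three<br>
        directional eigenvalues), and the transform-space Green's function
        is \hat{G}_{ijk} = c/\hat{L}_{ijk}, with c fixed by the source<br>
        normalization; this is the bracketed inverse in (*).  For a
        non-homogeneous Helmholtz problem (\nabla^2-\lambda)u = f, the<br>
        symbol becomes \hat{L}_{ijk}-\lambda, i.e. the (consistently
        scaled) constant \lambda is added inside the bracket before
        inverting.<br>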
        <br>
        I realize that questions iv and v go beyond ordinary support, so
        I only ask whether you can give me some hints.<br>
        <br>
        Thank you so much for your suggestions, Christopher. They have
        been valuable.<br>
        <br>
        Sincerely,<br>
        <br>
            Marco<br>
        <pre><font style="font-size:10pt" color="#002060" size="2">

Ing. Marco Mazzuoli</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">Dipartimento di Ingegneria</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">delle Costruzioni, dell'Ambiente e</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">del Territorio (DICAT)</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">via Montallegro 1</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">16145 GENOVA-ITALY</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">tel.  +39 010 353 2497</font><font style="font-size:10pt" color="#002060" size="2">
cell. +39 338 7142904
</font><font style="font-size:10pt" color="#002060" size="2">e-mail <a class="moz-txt-link-abbreviated" href="mailto:marco.mazzuoli@unige.it">marco.mazzuoli@unige.it</a></font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">       <a class="moz-txt-link-abbreviated" href="mailto:marco.mazzuoli84@gmail.com">marco.mazzuoli84@gmail.com</a>
</font><font style="font-size:10pt" size="2"><img moz-do-not-send="true" alt=""></font>



</pre>
        <br>
        <br>
        <div>
          <hr id="stopSpelling">Date: Mon, 28 Jan 2013 11:13:04 -0600<br>
          From: <a class="moz-txt-link-abbreviated" href="mailto:cdaley@flash.uchicago.edu">cdaley@flash.uchicago.edu</a><br>
          To: <a class="moz-txt-link-abbreviated" href="mailto:marco.mazzuoli@unige.it">marco.mazzuoli@unige.it</a><br>
          Subject: Re: [FLASH-USERS] ERROR in mpi_morton_bnd_prol<br>
          <br>
          <div class="ecxmoz-cite-prefix">Hi Marco,<br>
            <br>
            I've had a quick glance at the minGridSpacing subroutine and
            it<br>
            looks OK.  Nonsensical results like this are normally an<br>
             indication of earlier memory corruption.  Another possibility<br>
             is that the size of dxMin is different from that of dxMinTmp.  If<br>
             dxMin is declared with an explicit "kind=" then it may not be the<br>
             same size as that specified by MPI_DOUBLE_PRECISION.<br>
            <br>
            Since you already have "MaxRef" I think you can simplify
            your<br>
            subroutine to just<br>
            <br>
            use Grid_data, ONLY : gr_delta<br>
            dxMin = gr_delta(1:MDIM,MaxRef)<br>
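             <br>
             (Spelled out, a minimal sketch of the simplified subroutine under<br>
             that suggestion; it assumes gr_delta is indexed as (direction,<br>
             refinement level), as the line above implies:)<br>
             <br>
             SUBROUTINE minGridSpacing()<br>
               USE Grid_data, ONLY : gr_delta<br>
               USE Dns_data,  ONLY : dXmin, MaxRef<br>
               IMPLICIT NONE<br>
             #include "constants.h"<br>
               ! gr_delta already stores the cell spacing per direction and<br>
               ! refinement level, so no block loop or MPI reduction is needed.<br>
               dXmin = gr_delta(1:MDIM,MaxRef)<br>
             END SUBROUTINE minGridSpacing<br>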
            <br>
            <br>
            You attached a serial I/O FLASH subroutine and not parallel
            I/O.<br>
            Again I suspect earlier memory corruption is causing a
            problem.<br>
            <br>
            Try the following unit test on BG/Q.  It uses very similar
            units<br>
            to you.  If it works then it is a strong indication that
            your<br>
            custom code has a bug which is causing memory corruption.<br>
            <br>
            ./setup unitTest/PFFT_PoissonFD -auto -3d -maxblocks=200
            +pm4dev -parfile=test_paramesh_3d_64cube.par<br>
            <br>
            Try it with 1 node and 16 MPI ranks.  Then add 1 to both<br>
            lrefine_min and lrefine_max and run it again with 8 nodes
            and 128<br>
            MPI ranks.  Repeat as you wish and go as big as you like. 
            You<br>
            should find that this unit test works without problem and
            the<br>
            Linf value should keep getting less as you add refinement
            levels.<br>
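             <br>
             (For concreteness, the parameter-file change meant here might look<br>
             like this sketch; the baseline values in<br>
             test_paramesh_3d_64cube.par may differ:)<br>
             <br>
             # in the .par file, between runs (values illustrative only)<br>
             lrefine_min = 4   # previous run used 3<br>
             lrefine_max = 4   # previous run used 3<br>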
            <br>
            <br>
            I would remove the qsmp=omp:noauto<br>
            -L/opt/ibmcmp/xlsmp/bg/3.1/lib64/ -lxlsmp flags because you
            are<br>
             not using OpenMP.  In general, be careful with the -qsmp flag, as<br>
             by default it adds the aggressive -qhot compilation option.<br>
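             <br>
             (Concretely, a sketch of the trimmed flags, based on the FFLAGS_OPT<br>
             quoted later in this thread; keep whatever else your Makefile.h<br>
             needs:)<br>
             <br>
             FFLAGS_OPT = -g -O3 -qstrict -qsimd -qunroll=yes -qarch=qp -qtune=qp \<br>
                          -q64 -qrealsize=8 -qthreaded -qnosave -qfixed -c \<br>
                          -qlistopt -qreport -WF,-DIBM<br>
             # dropped: -qsmp=omp:noauto -L/opt/ibmcmp/xlsmp/bg/3.1/lib64/ -lxlsmp<br>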
            <br>
             It is only in the last few months that our target application for<br>
             Early Science on Mira BG/Q can use the aggressive compilation<br>
             flag -qhot without generating bad answers.  Maybe it is still<br>
             causing problems in your application.<br>
            <br>
            Chris<br>
            <br>
            <br>
            <br>
            <br>
            On 01/28/2013 08:04 AM, Marco Mazzuoli wrote:<br>
          </div>
          <blockquote
            cite="mid:DUB108-W366344B68A27ECC78D6E5EAB180@phx.gbl">
            <style><!--
.ExternalClass .ecxhmmessage P
{padding:0px;}
.ExternalClass body.ecxhmmessage
{font-size:10pt;font-family:Tahoma;}

--></style>
            <div dir="ltr">     Dear Christopher,<br>
              <br>
              I tried the code both by using serial hdf5 writing and
              parallel hdf5 writing.<br>
              <br>
              i) In the former case, I found that, when running the code
              on the BG/Q machine, an error occurs in the following
              (homemade) subroutine, which saves the minimum grid spacing
              for each Cartesian direction in the array dXmin
              [real, dimension(3)]:<br>
---------------------------------------------------------------------------------<br>
              SUBROUTINE minGridSpacing()<br>
              <br>
                    USE Grid_data,      ONLY: gr_meshComm<br>
                    USE Dns_data,       ONLY: dXmin, MaxRef<br>
                    use Grid_interface, ONLY: Grid_getListOfBlocks,
              Grid_getDeltas<br>
                    use tree,           ONLY: lrefine<br>
              <br>
                    IMPLICIT NONE<br>
              <br>
              #include "constants.h"<br>
              #include "Flash.h"<br>
              #include "Flash_mpi.h"<br>
              <br>
                    INTEGER,DIMENSION(MAXBLOCKS) :: blockList<br>
                    INTEGER                      :: blockCount<br>
                    INTEGER                      :: lb, ierr<br>
              <br>
                    REAL,DIMENSION(MDIM)         :: dXminTMP<br>
              <br>
                    CALL
              Grid_getListOfBlocks(ALL_BLKS,blockList,blockCount)<br>
              <br>
              !     Initialization of dXminTMP<br>
                    dXminTMP(:)=9999.<br>
              <br>
                    DO lb = 1, blockCount<br>
              <br>
                      IF (lrefine(lb) .EQ. MaxRef) THEN<br>
                        CALL Grid_getDeltas(blockList(lb),dXminTMP)<br>
                        IF (ANY(dXminTMP.GE.0.)) EXIT<br>
                      END IF<br>
              <br>
                    END DO<br>
              !    ****** PRINT 1 ******<br>
                    PRINT*,dXminTMP(1),dXminTMP(2),dXminTMP(3),MaxRef<br>
              !    *********************<br>
              !      CALL MPI_Barrier (gr_meshComm, ierr)<br>
              <br>
              !     find the smallest grid spacing for each direction
              among all the blocks:<br>
                    CALL MPI_ALLREDUCE(dXminTMP, dXmin, 3,
              MPI_DOUBLE_PRECISION,       &<br>
                         MPI_MIN, gr_meshComm, ierr)<br>
              <br>
              !    ****** PRINT 2 ******<br>
                    PRINT*,dXmin(1),dXmin(2),dXmin(3)<br>
              !    *********************<br>
              <br>
                    IF(ierr.NE.0)PRINT*,"minGridSpacing(): MPI error!"<br>
              <br>
                    END SUBROUTINE minGridSpacing<br>
---------------------------------------------------------------------------------<br>
              <br>
              The standard output of the PRINTs is:<br>
              <br>
               0.970785156250000003E-01 0.112096874999999999
              0.112096874999999999 6<br>
               0.20917539062499999891198143586734659
              0.11209687499999999860111898897230276
              0.00000000000000000000000000000000000E+00<br>
              <br>
              (each one repeated 1024 times)<br>
              I do not know why the variables dXminTMP and dXmin contain
              different values (none of the printed dXminTMP values is
              equal to dXmin).<br>
              Could you suggest why?  Is it normal that the output format
              is so different?  Maybe some of the following flags I pass to
              the Fortran compiler (mpixlf90_r) are wrong?<br>
              <br>
              FFLAGS_OPT   = -g -O3 -qstrict -qsimd -qunroll=yes
              -qarch=qp -qtune=qp -q64 -qrealsize=8 -qthreaded -qnosave
              -qfixed -c \<br>
                             -qlistopt -qreport -WF,-DIBM
              -qsmp=omp:noauto -L/opt/ibmcmp/xlsmp/bg/3.1/lib64/ -lxlsmp<br>
              <br>
              ii) As for the latter case (parallel HDF5 writing), I
              tracked down the error I met last week (see the previous
              conversation below).  It occurs between line 660 and line 705
              of "io_writeData.F90", which you can find attached.  Do you
              know why the segmentation fault may occur there?<br>
              <br>
              Thank you again for your help, Christopher.<br>
              <br>
              Sincerely,<br>
              <br>
                  Marco<br>
              <pre><font style="font-size:10pt" color="#002060" size="2">

Ing. Marco Mazzuoli</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">Dipartimento di Ingegneria</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">delle Costruzioni, dell'Ambiente e</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">del Territorio (DICAT)</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">via Montallegro 1</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">16145 GENOVA-ITALY</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">tel.  +39 010 353 2497</font><font style="font-size:10pt" color="#002060" size="2">
cell. +39 338 7142904
</font><font style="font-size:10pt" color="#002060" size="2">e-mail <a moz-do-not-send="true" class="ecxmoz-txt-link-abbreviated" href="mailto:marco.mazzuoli@unige.it">marco.mazzuoli@unige.it</a></font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">       <a moz-do-not-send="true" class="ecxmoz-txt-link-abbreviated" href="mailto:marco.mazzuoli84@gmail.com">marco.mazzuoli84@gmail.com</a>
</font><font style="font-size:10pt" size="2"><img moz-do-not-send="true" alt=""></font>



</pre>
              <br>
              <br>
              <div>
                <hr id="ecxstopSpelling">Date: Thu, 24 Jan 2013 10:19:34
                -0600<br>
                From: <a moz-do-not-send="true"
                  class="ecxmoz-txt-link-abbreviated"
                  href="mailto:cdaley@flash.uchicago.edu">cdaley@flash.uchicago.edu</a><br>
                To: <a moz-do-not-send="true"
                  class="ecxmoz-txt-link-abbreviated"
                  href="mailto:marco.mazzuoli@unige.it">marco.mazzuoli@unige.it</a><br>
                Subject: Re: [FLASH-USERS] ERROR in mpi_morton_bnd_prol<br>
                <br>
                <div class="ecxmoz-cite-prefix">Unfortunately, none of
                  the units that I have multithreaded are in<br>
                  your simulation.<br>
                  <b> If your problem fits in 512MB memory, I recommend
                    you run FLASH<br>
                    with 32 MPI ranks per node on BG/Q.  If I recall
                    correctly, the<br>
                    PPC core can execute 2 instructions per clock cycle,
                    but the<br>
                    instructions must be issued by different hardware
                    threads.<br>
                    Placing multiple MPI ranks per core achieves this
                    aim and allows<br>
                    us to hide memory latency.  Have a look at<br>
                    <a moz-do-not-send="true"
                      class="ecxmoz-txt-link-freetext"
                      href="http://flash.uchicago.edu/%7Ecdaley/Siam/Daley_MS30-3.pdf"
                      target="_blank">http://flash.uchicago.edu/~cdaley/Siam/Daley_MS30-3.pdf</a>
                    on page<br>
                    12.  By eye, a 16 MPI ranks per node run with 1
                    thread took ~275<br>
                    seconds and a 32 MPI ranks per node run with 1
                    thread took ~190<br>
                    seconds.<br>
                    <br>
                    You should also setup FLASH with +pm4dev.  This is
                    the latest<br>
                    Paramesh with Flash Center enhancements to make it
                    scale better.<br>
                    You should also use the latest FLASH release.<br>
                    <br>
                    In terms of debugging, you really need to look at
                    core file 948<br>
                    to find the instruction which caused the
                    segmentation fault.<br>
                    Most likely, there is some form of memory corruption
                    which you<br>
                    need to identify.<br>
                    <br>
                    It may be useful to setup FLASH without I/O (+noio)
                    and see if<br>
                    your simulation still fails.  You can compare the
                    integrated<br>
                    quantities file (default name is flash.dat) with a
                    healthy<br>
                    simulation run on your local cluster to see if it is
                    working<br>
                    correctly.<br>
                    <br>
                     It may be useful to remove compiler optimization flags and use<br>
                     -O0 to see if optimization is causing a problem.<br>
                    <br>
                    Good luck, <br>
                    Chris<br>
                    <br>
                    <br>
                    On 01/24/2013 03:51 AM, Marco Mazzuoli wrote:<br>
                  </b></div>
                <b>
                  <blockquote
                    cite="mid:DUB108-W7ABC08C19CDA4CB3AEF6FAB140@phx.gbl">
                    <style><!--
.ExternalClass .ecxhmmessage P
{padding:0px;}
.ExternalClass body.ecxhmmessage
{font-size:10pt;font-family:Tahoma;}

--></style>
                    <div dir="ltr">     Dear Christopher,<br>
                      <br>
                       thank you again.  Of course, with multithreading up
                       to 4 tasks per core (64 ranks per node) using OpenMP
                       parallelization you should obtain the best
                       performance, because the threading is still in
                       hardware, not software.  Are there any routines that
                       already use OpenMP for multithreading at the moment?<br>
                       Anyway, please find attached the setup_units file
                       from my object directory.  I have heavily modified
                       the FLASH code (the physics units) to adapt it to my
                       purposes.<br>
                       However, the IO, the Poisson solver, the Paramesh4
                       AMR, as well as the basic structure of FLASH, have
                       been kept unchanged.<br>
                       In particular, nothing has been changed in the
                       initialization (which is why I am asking for your
                       help).<br>
                      <br>
                       I suppose the error I am seeing comes from a writing
                       error on the BG/Q machine, because the same code
                       works very well on a Linux cluster.<br>
                       If you have any further ideas about my troubles,
                       please let me know.<br>
                      Thank you very much, Christopher.<br>
                      <br>
                      Sincerely,<br>
                      <br>
                          Marco<br>
                      <pre><font style="font-size:10pt" color="#002060" size="2">

Ing. Marco Mazzuoli</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">Dipartimento di Ingegneria</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">delle Costruzioni, dell'Ambiente e</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">del Territorio (DICAT)</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">via Montallegro 1</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">16145 GENOVA-ITALY</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">tel.  +39 010 353 2497</font><font style="font-size:10pt" color="#002060" size="2">
cell. +39 338 
7142904
</font><font style="font-size:10pt" color="#002060" size="2">e-mail <a moz-do-not-send="true" class="ecxmoz-txt-link-abbreviated" href="mailto:marco.mazzuoli@unige.it">marco.mazzuoli@unige.it</a></font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">       <a moz-do-not-send="true" class="ecxmoz-txt-link-abbreviated" href="mailto:marco.mazzuoli84@gmail.com">marco.mazzuoli84@gmail.com</a>
</font><font style="font-size:10pt" size="2"><img moz-do-not-send="true" alt=""></font>



</pre>
                      <br>
                      <br>
                      <div>
                        <hr id="ecxstopSpelling">Date: Wed, 23 Jan 2013
                        11:43:57 -0600<br>
                        From: <a moz-do-not-send="true"
                          class="ecxmoz-txt-link-abbreviated"
                          href="mailto:cdaley@flash.uchicago.edu">cdaley@flash.uchicago.edu</a><br>
                        To: <a moz-do-not-send="true"
                          class="ecxmoz-txt-link-abbreviated"
                          href="mailto:marco.mazzuoli@unige.it">marco.mazzuoli@unige.it</a><br>
                        Subject: Re: [FLASH-USERS] ERROR in
                        mpi_morton_bnd_prol<br>
                        <br>
                        <div class="ecxmoz-cite-prefix">It is failing
                          with a segmentation fault (signal 11).<br>
                          <br>
                          You should look at the stderr file and also
                          core file 948.  There<br>
                          is a tool named bgq_stack which reports the
                          stack trace in a core<br>
                          file.<br>
                          <br>
                          bgq_stack ./flash4 core.948<br>
                          <br>
                          If unavailable you can translate the stack
                          addresses in this core<br>
                          file one at a time using addr2line.<br>
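                           <br>
                           For example (the hexadecimal address is just a<br>
                           placeholder for one taken from the core file):<br>
                           <br>
                           addr2line -e ./flash4 0x01a2b3c4<br>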
                          <br>
                           If the failing line is an innocent looking
                           piece of FLASH code<br>
                           then most likely there is some form of memory
                           corruption earlier<br>
                           on.  Maybe there is something specific to your
                          simulation or<br>
                          perhaps your HDF5 needs to be recompiled
                          against the latest<br>
                          driver version on BG/Q.<br>
                          <br>
                          A good debugging method is to try to reproduce
                          the same error in<br>
                          a smaller problem so that you can then repeat
                          the run on a local<br>
                          workstation.  Once on a local workstation you
                          can debug much more<br>
                          interactively and use excellent tools like
                          valgrind.<br>
                          <br>
                          Could you send me the setup_units from your
                          FLASH object<br>
                          directory?  I'm curious what units you are
                          using.  We have been<br>
                          doing a lot of work on the BG/Q recently
                          including multithreading<br>
                          the FLASH units needed for Supernova
                          simulations.  Our Supernova<br>
                          simulations are currently running on Mira BG/Q
                          - we are using 16<br>
                          MPI ranks per node and 4 OpenMP threads per
                          MPI rank.<br>
                          <br>
                          Chris<br>
                          <br>
                          <br>
                          On 01/23/2013 11:19 AM, Marco Mazzuoli wrote:<br>
                        </div>
                        <blockquote
                          cite="mid:DUB108-W282F61DC478D8AF1D0AB52AB150@phx.gbl">
                          <style><!--
.ExternalClass .ecxhmmessage P
{padding:0px;}
.ExternalClass body.ecxhmmessage
{font-size:10pt;font-family:Tahoma;}

--></style>
                          <div dir="ltr"> Thank you Christopher,<br>
                            <br>
                             indeed I could use the UG at this stage, but
                             I am testing the code on the BlueGene machine
                             so that I can soon run larger simulations,
                             which will need AMR.<br>
                            <br>
                            Actually, I have already solved the problem
                            by reducing the dimensions of the blocks
                            (16x16x16) and by introducing a finer
                            refinement level. But of course the solution
                            you proposed sounds better. I guess it is
                            better to use the largest blocks with the
                            smallest number of cores.<br>
                            <br>
                             If I may, I would like to ask you another
                             question.  Just after the initialization, when
                             the code writes the first checkpoint file (at
                             line 54 of io_initFile.F90: "call
                             io_h5init_file(fileID, filename, io_comm,
                             io_outputSplitNum)"), the run crashes, giving
                             the following message in the standard output:<br>
                            <br>
---------------------------------------------------------------------------------------------------------<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "igrav"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "mgrid_max_iter_change"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "mgrid_solve_max_iter"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "mgrid_print_norm"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "msgbuffer"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "eint_switch"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "order"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "slopeLimiter"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "LimitedSlopeBeta"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "charLimiting"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "use_avisc"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "use_flattening"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "use_steepening"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "use_upwindTVD"...<br>
                             RuntimeParameters_read:  ignoring unknown
                             parameter "RiemannSolver"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "entropy"...<br>
                             RuntimeParameters_read:  ignoring unknown
                            parameter "shockDetect"...<br>
                             MaterialProperties initialized<br>
                             Cosmology initialized<br>
                             Source terms initialized<br>
                              iteration, no. not moved =  0 0<br>
                             refined: total leaf blocks =  1<br>
                             refined: total blocks =  1<br>
                              starting MORTON ORDERING<br>
                              tot_blocks after  1<br>
                              max_blocks 2 1<br>
                              min_blocks 2 0<br>
                             Finished initialising block: 1<br>
                             INFO: Grid_fillGuardCells is ignoring
                            masking.<br>
                              iteration, no. not moved =  0 0<br>
                             refined: total leaf blocks =  8<br>
                             refined: total blocks =  9<br>
                              iteration, no. not moved =  0 7<br>
                              iteration, no. not moved =  1 0<br>
                             refined: total leaf blocks =  64<br>
                             refined: total blocks =  73<br>
                              iteration, no. not moved =  0 70<br>
                              iteration, no. not moved =  1 7<br>
                              iteration, no. not moved =  2 0<br>
                             refined: total leaf blocks =  512<br>
                             refined: total blocks =  585<br>
                              iteration, no. not moved =  0 583<br>
                              iteration, no. not moved =  1 21<br>
                              iteration, no. not moved =  2 0<br>
                             refined: total leaf blocks =  4096<br>
                             refined: total blocks =  4681<br>
                             Finished initialising block: 5<br>
                             Finished initialising block: 6<br>
                              iteration, no. not moved =  0 904<br>
                              iteration, no. not moved =  1 0<br>
                             refined: total leaf blocks =  32768<br>
                             refined: total blocks =  37449<br>
                             Finished initialising block: 6<br>
                             Finished initialising block: 7<br>
                             Finished initialising block: 8<br>
                             Finished initialising block: 9<br>
                             Finished initialising block: 10<br>
                             Finished initialising block: 11<br>
                             Finished initialising block: 12<br>
                              Finished initialising block: 13<br>
                             Finished initialising block: 15<br>
                             Finished initialising block: 16<br>
                             Finished initialising block: 17<br>
                             Finished initialising block: 18<br>
                             Finished initialising block: 19<br>
                             Finished initialising block: 20<br>
                             Finished initialising block: 21<br>
                             Finished initialising block: 22<br>
                             Finished initialising block: 24<br>
                             Finished initialising block: 25<br>
                             Finished initialising block: 26<br>
                             Finished initialising block: 27<br>
                             Finished initialising block: 28<br>
                            ...<br>
                            ...<br>
                              Finished with Grid_initDomain, no restart<br>
                             Ready to call Hydro_init<br>
                             Hydro initialized<br>
                             Gravity initialized<br>
                             Initial dt verified<br>
                            2013-01-23 17:43:12.006 (WARN )
                            [0x40000e98ba0] :97333:ibm.runjob.client.Job:
                            terminated by signal 11<br>
                            2013-01-23 17:43:12.011 (WARN )
                            [0x40000e98ba0]
                            :97333:ibm.runjob.client.Job: abnormal
                            termination by signal 11 from rank 948<br>
---------------------------------------------------------------------------------------------------------<br>
                            <br>
                            In particular, 8 cores call the subroutine
                            "io_h5init_file" before the run crashes (I
                            checked this while debugging).<br>
                            What do you think this could depend on?<br>
                            <br>
                            Thank you again, Christopher.<br>
                            Sincerely,<br>
                            <br>
                                Marco<br>
                            <pre><font style="font-size:10pt" color="#002060" size="2">

Ing. Marco Mazzuoli</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">Dipartimento di Ingegneria</font><font style="font-size:10pt" color="#002060" size="2">
<font style="font-size:10pt" color="#002060" size="2">delle Costruzioni, dell'Ambiente e</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">del Territorio (DICAT)</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">via Montallegro 1</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">16145 GENOVA-ITALY</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="" color="#002060" size="2">tel.  +39 010 353 2497</font><font style="font-size:10pt" color="#002060" size="2">
cell. +39 338 7142904
</font><font style="font-size:10pt" color="#002060" size="2">e-mail <a moz-do-not-send="true" class="ecxmoz-txt-link-abbreviated" href="mailto:marco.mazzuoli@unige.it">marco.mazzuoli@unige.it</a></font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="" color="#002060" size="2">       <a moz-do-not-send="true" class="ecxmoz-txt-link-abbreviated" href="mailto:marco.mazzuoli84@gmail.com">marco.mazzuoli84@gmail.com</a>
</font><font style="font-size:10pt" size="2"><img moz-do-not-send="true" alt=""></font>



</font></pre>
                            <font style="font-size:10pt" color="#002060"
                              size="2"> <br>
                              <br>
                              <div>
                                <hr id="ecxstopSpelling">Date: Wed, 23
                                Jan 2013 10:16:20 -0600<br>
                                From: <a moz-do-not-send="true"
                                  class="ecxmoz-txt-link-abbreviated"
                                  href="mailto:cdaley@flash.uchicago.edu">cdaley@flash.uchicago.edu</a><br>
                                To: <a moz-do-not-send="true"
                                  class="ecxmoz-txt-link-abbreviated"
                                  href="mailto:marco.mazzuoli@unige.it">marco.mazzuoli@unige.it</a><br>
                                CC: <a moz-do-not-send="true"
                                  class="ecxmoz-txt-link-abbreviated"
                                  href="mailto:flash-users@flash.uchicago.edu">flash-users@flash.uchicago.ed

                                  u</a><br>
                                Subject: Re: [FLASH-USERS] ERROR in
                                mpi_morton_bnd_prol<br>
                                <br>
                                <div class="ecxmoz-cite-prefix">Hi
                                  Marco,<br>
                                  <br>
                                  You should increase maxblocks because
                                  a value of maxblocks=4 is too<br>
                                   low.  You are using large blocks (32^3) and
                                  so memory usage prevents<br>
                                  you setting maxblocks too high, but I
                                  would suggest for this problem<br>
                                  you need a value of at least 10 for
                                  maxblocks.<br>
                                  <br>
                                   The specific error you show is
                                   happening because data cannot be found<br>
                                   in an array of size maxblocks_alloc,
                                   where the default value of<br>
                                   maxblocks_alloc is maxblocks * 10.
                                   Internally Paramesh has many<br>
                                   arrays of size maxblocks_alloc which
                                   hold e.g. various information<br>
                                   about the local view of the oct-tree.
                                   A technique we have used in the<br>
                                   past when there is insufficient memory
                                   to make maxblocks much larger<br>
                                   and we need to avoid errors like you
                                   show is to make maxblocks_alloc<br>
                                   larger in amr_initialize.F90, e.g.
                                   maxblocks_alloc = maxblocks * 20.<br>
                                   You should also change maxblocks_tr to
                                   be the same size as<br>
                                   maxblocks_alloc.<br>
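                                   <br>
                                   (As a sketch, that edit would look roughly<br>
                                   like the following inside Paramesh's<br>
                                   amr_initialize.F90; the surrounding code<br>
                                   differs between Paramesh versions:)<br>
                                   <br>
                                   maxblocks_alloc = maxblocks * 20  ! default is maxblocks * 10<br>
                                   maxblocks_tr    = maxblocks * 20  ! keep the tree arrays the same size<br>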
                                  <br>
                                  Finally, if you don't need AMR then
                                  you should use the FLASH uniform<br>
                                  grid (you have a fully refined domain
                                  at level 5).  Memory usage will<br>
                                  be less and guard cell fills will be
                                  much faster.<br>
                                  <br>
                                  Chris<br>
                                  <br>
                                  <br>
                                  On 01/18/2013 10:47 AM, Marco Mazzuoli
                                  wrote:<br>
                                </div>
                                <style><!--
.ExternalClass .ecxhmmessage P
{padding:0px;}
.ExternalClass body.ecxhmmessage
{font-size:10pt;font-family:Tahoma;}

--></style>
                                <div dir="ltr">     Dear Flash users,<br>
                                  <br>
                                  I am trying to run Flash on a
                                  bluegene-type supercomputer.<br>
                                  The details of the present run are:<br>
                                  <br>
                                  #procs=1024 on #64 nodes (#16procs per
                                  node).<br>
                                  <br>
                                  The domain is rectangular.<br>
                                  Block size = 32x32x32 computational
                                  points<br>
                                  Max refinement level = 5<br>
                                  The whole domain is refined at level =
                                   5 such that N° blocks=1+8+64+512+4096=4681<br>
                                  Max_blocks per core = 4<br>
                                  <br>
                                   Do you know what the initialization
                                   error shown in the standard output
                                   below could depend on?<br>
                                  <br>
-----------------------------------------------------------------------------------------------------------------------<br>
                                   RuntimeParameters_read:  ignoring
                                   unknown parameter "igrav"...<br>
                                   RuntimeParameters_read:  ignoring
                                  unknown parameter
                                  "mgrid_max_iter_change"...<br>
                                   RuntimeParameters_read:  ignoring
                                  unknown parameter
                                  "mgrid_solve_max_iter"...<br>
                                   RuntimeParameters_read:  ignoring
                                   unknown parameter "mgrid_print_norm"...<br>
                                   RuntimeParameters_read:  ignoring
                                  unknown parameter "msgbuffer"...<br>
                                    RuntimeParameters_read: 
                                  ignoring unknown parameter
                                  "eint_switch"...<br>
                                   RuntimeParameters_read:  ignoring
                                  unknown parameter "order"...<br>
                                   RuntimeParameters_read:  ignoring
                                  unknown parameter "slopeLimiter"...<br>
                                   RuntimeParameters_read:  ignoring
                                  unknown parameter
                                  "LimitedSlopeBeta"...<br>
                                   RuntimeParameters_read:  ignoring
                                  unknown parameter "charLimiting"...<br>
                                    RuntimeParameters_read:  ignoring
                                   unknown parameter "use_avisc"...<br>
                                    RuntimeParameters_read:  ignoring
                                   unknown parameter "use_flattening"...<br>
                                   RuntimeParameters_read:  ignoring
                                  unknown parameter "use_steepening"...<br>
                                   RuntimeParameters_read:  ignoring
                                  unknown parameter "use_upwindTVD"...<br>
                                   RuntimeParameters_read:  ignoring
                                  unknown parameter "RiemannSolver"...<br>
                                   RuntimeParameters_read:  ignoring
                                  unknown parameter "entropy"...<br>
                                   RuntimeParameters_read:  ignoring
                                  unknown parameter "shockDetect"...<br>
                                   MaterialProperties initialized<br>
                                   Cosmology initialized<br>
                                    Source terms initialized<b><br>
                                      iteration, no. not moved =  0 0<br>
                                      refined: total leaf blocks =  1<br>
                                     refined: total blocks =  1<br>
                                      starting MORTON ORDERING<br>
                                      tot_blocks after  1<br>
                                      max_blocks 2 1<br>
                                      min_blocks 2 0<br>
                                     Finished initialising block: 1<br>
                                     INFO: Grid_fillGuardCells is
                                    ignoring masking.<br>
                                      iteration, no. not moved =  0 0<br>
                                      refined: total leaf blocks =  8<br>
                                      refined: total blocks =  9<br>
                                       iteration, no. not moved =  0 7<br>
                                      iteration, no. not moved =  1 0<br>
                                     refined: total leaf blocks =  64<br>
                                     refined: total blocks =  73<br>
                                      iteration, no. not moved =  0 70<br>
                                      iteration, no. not moved =  1 7<br>
                                      iteration, no. not moved =  2 0<br>
                                     refined: total leaf blocks =  512<br>
                                     refined: total blocks =  585<br>
                                      ERROR in mpi_morton_bnd_prol                      : guard
                                    block starting index -3  not larger
                                    than lnblocks 1  processor no.  8 
                                    maxblocks_alloc  40<br>
                                     ERROR in
                                     mpi_morton_bnd_prol                      : guard block
                                    starting index -3  not larger than
                                    lnblocks 1  processor no.  496 
                                    maxblocks_alloc  40<br>
                                     ERROR in
                                    mpi_morton_bnd_prol                     
                                    : guard block starting index -3  not
                                    larger than lnblocks 1  processor
                                    no.  569  maxblocks_alloc  40<br>
                                     ERROR in
                                     mpi_morton_bnd_prol                      : guard block starting index
                                    -3  not larger than lnblocks 1 
                                    processor no.  172  maxblocks_alloc 
                                    40<br>
                                     ERROR in mpi_morton_bnd_prol
                                                          : guard block
                                    starting index -12  not larger than
                                    lnblocks 1  processor no.  368 
                                    maxblocks_alloc  40<br>
                                     ERROR in
                                    mpi_morton_bnd_prol                     
                                    : guard block starting index -12 
                                    not larger than lnblocks 1 
                                     processor no.  189  maxblocks_alloc  40<br>
                                    ...<br>
                                    ...<br>
                                    ...<br>
                                    Abort(1076419107) on node 442 (rank
                                    442 in comm -2080374784):
                                     application called MPI_Abort(comm=0x84000000, 1076419107) -
                                    process 442<br>
-----------------------------------------------------------------------------------------------------------------------<br>
                                    <br>
                                    Thank you in advance.<br>
                                    <br>
                                    Sincerely,<br>
                                    <br>
                                        Marco Mazzuoli<br>
                                    <pre><font style="font-size:10pt" color="#002060" size="2">

Ing. Marco Mazzuoli</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">Dipartimento di Ingegneria</font><font color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">delle Costruzioni, dell'Ambiente e</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">del Territorio (DICAT)</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">via Montallegro 1<font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">16145 GENOVA-ITALY</font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">tel.  +39 010 353 2497</font><font style="font-size:10pt" color="#002060" size="2">
cell. +39 338 7142904
</font><font style="font-size:10pt" color="#002060" size="2">e-mail <a moz-do-not-send="true" class="ecxmoz-txt-link-abbreviated" href="mailto:marco.mazzuoli@unige.it">marco.mazzuoli@unige.it</a></font><font style="font-size:10pt" color="#002060" size="2">
</font><font style="font-size:10pt" color="#002060" size="2">       <a moz-do-not-send="true" class="ecxmoz-txt-link-abbreviated" href="mailto:marco.mazzuoli84@gmail.com">marco.mazzuoli84@gmail.com</a>
</font><font style="font-size:10pt" size="2"><img moz-do-not-send="true" alt=""></font>



</font></pre>
                                    <font style="font-size:10pt"
                                      color="#002060" size="2"> </font></b></div>
                                <b> <font style="font-size:10pt"
                                    color="#002060" size="2"> </font></b></div>
                              <b> </b></font></div>
                          <b> <font style="font-size:10pt"
                              color="#002060" size="2"> </font></b></blockquote>
                        <b> <font style="font-size:10pt"
                            color="#002060" size="2"> <font
                              style="font-size:10pt" color="#002060"
                              size="2"> <br>
                              </font></font></font></b></div>
                      <b> <font style="font-size:10pt" color="#002060"
                          size="2"> <font style="font-size:10pt"
                            color="#002060" size="2"> <br>
                          </font> </font></b></div>
                    <b> <font style="font-size:10pt" color="#002060"
                        size="2"> </font></b></blockquote>
                  <b> <font style="font-size:10pt" color="#002060"
                      size="2"> <br>
                    </font></b></b></div>
              <b><b> </b></b></div>
            <b><b> </b></b></blockquote>
          <b><b> <br>
            </b></b></div>
      </div>
    </blockquote>
    <br>
  </body>
</html>