[FLASH-USERS] maximum number of cores ever used to run FLASH?

Bill Barth bbarth at tacc.utexas.edu
Thu Sep 18 09:51:44 EDT 2014


Thanks for this great description, Sean. Given that FLASH has hybrid
OpenMP/MPI for most of the physics, have you tried running this in
symmetric mode on the Phis on Stampede? I think it'd be a great study. I
suspect you'd want to turn off I/O entirely on the MICs, but we'd be
interested in talking about some of the options there as well.

Thanks,
Bill.
--
Bill Barth, Ph.D., Director, HPC
bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435             |   Fax:   (512) 475-9445







On 9/18/14, 8:43 AM, "Sean Couch" <smc at flash.uchicago.edu> wrote:

>Sure thing.  
>
>
>This was for my custom core-collapse supernova application.*
>Functionally, it is nearly identical to the CCSN application
>(source/Simulation/SimulationMain/CCSN) packaged with the latest release
>of FLASH (v4.2.2), except that I'm using MHD rather than plain hydro.
>This setup uses the unsplit staggered mesh MHD solver, the detailed
>microphysical tabular EOS (source/physics/Eos/EosMain/Nuclear), the new
>multipole gravity solver (source/Grid/GridSolvers/Multipole_new; Couch,
>Graziani, & Flocke, 2013, ApJ, 778, 181), approximate neutrino transport
>via a leakage scheme (source/physics/RadTrans/NeutrinoLeakage), and AMR
>via PARAMESH.
>
>
>The scaling study was done on the BG/Q Mira at the Argonne Leadership
>Computing Facility.  To control the number of AMR blocks per core, I use
>a custom version of Grid_markRefineDerefine.F90 that forces refinement up
>to the maximum level within a runtime-specified radius.  This test
>employed hybrid parallelism, with AMR blocks distributed amongst the MPI
>ranks and OpenMP threading within each block (i.e., the i,j,k loops are
>threaded).  I used 24^3 zones per block, which reduces both the
>fractional memory overhead of guardcells and the communication per rank
>per step.  This application strong-scales like a champ (Fig. 1 below),
>remaining fairly efficient down to ~4 AMR blocks per MPI rank.
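>
>For a rough sense of why the larger blocks help, here is a
>back-of-the-envelope sketch (plain Python, not FLASH code; it assumes the
>usual 4 guard cells per side, so adjust for your own configuration):
>
>    # Fraction of a cubic block's storage spent on guardcells,
>    # assuming nguard=4 guard cells on each face (typical PARAMESH default).
>    def guardcell_overhead(nxb, nguard=4):
>        interior = nxb**3
>        total = (nxb + 2 * nguard)**3
>        return (total - interior) / total
>
>    for nxb in (8, 16, 24):
>        print(f"{nxb}^3 zones/block: {guardcell_overhead(nxb):.0%} guardcells")
>    # 8^3 -> 88%, 16^3 -> 70%, 24^3 -> 58%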
>
>
>As for hardware, Mira is a BG/Q with 16 cores per node, 1 GB of memory
>per core, and 4 hardware threads per core.  My application's memory
>footprint is about 1200 MB per rank (large EOS table, MHD has lots of
>extra face variables and scratch arrays, and my application defines a
>number of new grid variables).  Thus, I have to run 8 MPI ranks per node
>in order to fit in memory, and I therefore run with 8 OpenMP threads per
>MPI rank.  This is not ideal; not every part of FLASH is threaded (I'm
>looking at you, Grid, IO...).  But the heavy-lifting physics routines are
>threaded, and with 24^3 zones per block and within-block threading, the
>thread-to-thread speedup is acceptable even up to 8 threads.
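>
>The arithmetic behind that choice, as a quick sketch (the ~1200 MB figure
>is just the rough per-rank footprint quoted above):
>
>    # Node-level memory budget on Mira: 16 cores/node, 1 GB/core.
>    node_mem_mb         = 16 * 1024   # 16 GB per node
>    rank_mem_mb         = 1200        # approximate per-rank footprint of this app
>    cores_per_node      = 16
>    hw_threads_per_core = 4
>
>    print(node_mem_mb // rank_mem_mb)   # 13 -> 16 ranks/node will not fit
>    ranks_per_node = 8                  # largest power of two that does fit
>    cores_per_rank = cores_per_node // ranks_per_node   # 2 cores per rank
>    print(cores_per_rank * hw_threads_per_core)         # 8 OpenMP threads per rank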
>
>
>The big run (32,768 nodes, 524,288 cores) had 2,097,152 leaf blocks (~29
>billion zones), 2,396,744 total blocks, and used 262,144 MPI ranks (thus
>8 leaf blocks per MPI rank).
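>
>For reference, that bookkeeping works out as follows (plain Python, just
>arithmetic on the figures above):
>
>    nodes       = 32_768
>    cores       = nodes * 16            # 524,288 cores
>    ranks       = 262_144               # 8 MPI ranks per node
>    threads     = ranks * 8             # 2,097,152 threads of execution
>    leaf_blocks = 2_097_152
>    zones       = leaf_blocks * 24**3   # 28,991,029,248 -> ~29 billion zones
>    print(cores, threads, zones, leaf_blocks // ranks)  # ..., 8 leaf blocks/rank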
>
>
>Note that Mira has an extremely fast communication fabric!  YMMV on other
>systems.  I have run a much smaller weak scaling study on TACC Stampede,
>up to 4096 cores, and the scaling there is also essentially perfect, but
>I have yet to go to any significant core count on Stampede (see Fig. 2).
>
>
>Hope this is helpful and informative!
>
>
>Sean
>
>Fig. 1 - Strong scaling of core-collapse SN application on Mira
>
>Fig. 2 - Weak scaling of FLASH CCSN application on TACC Stampede
>
>
>* - This particular simulation was started substantially "post-bounce" in
>the parlance of the CCSN community.  Thus the shock was at a moderate
>radius and the neutrino leakage treatment was active.  The initial
>progenitor model packaged with FLASH's CCSN application is at the
>pre-bounce, collapse phase.  Therefore, if you want to run this scaling
>test yourself, you will have to generate post-bounce initial conditions
>by running the 1D CCSN application to an adequate post-bounce time, then
>converting those 1D results into the ASCII format used by the 1D initial
>conditions reader.
>
>
>
>--------------------------------------------------------
>Sean M. Couch, Ph.D.
>Flash Center for Computational Science
>Department of Astronomy & Astrophysics
>The University of Chicago
>5747 S Ellis Ave, Jo 315
>Chicago, IL  60637
>(773) 702-3899 - office
>(512) 997-8025 - cell
>www.flash.uchicago.edu/~smc
>
>On Sep 18, 2014, at 5:23 AM, Richard Bower <r.g.bower at durham.ac.uk> wrote:
>
>
>
>I'm very keen to see this too  (although I've not been running anything
>big with flash)... could you say something about the memory per
>core/node? This could be very useful for our next procurement... Richard
>
>
>On 18 Sep 2014, at 07:46, Stefanie Walch wrote:
>
>Hi Sean,
>
>Could you tell me which setup you used for the nice scaling plot you sent
>around?
>
>Cheers,
>Stefanie
>===================================
>Prof. Dr. Stefanie Walch
>Physikalisches Institut I
>Universität zu Köln
>Zülpicher Straße 77
>50937 Köln
>Germany
>email: walch at ph1.uni-koeln.de
>phone: +49 (0) 221 4703497
>
>On 17 Sep 2014, at 20:41, Sean Couch <smc at flash.uchicago.edu> wrote:
>
>For fun, let's play throwdown.  Can anybody beat 525k cores (2 million
>threads of execution)?  See attached (1 Mira node = 16 cores).
>
>Sean
>
><wkScaling.pdf>
>
>--------------------------------------------------------
>Sean M. Couch
>Flash Center for Computational Science
>Department of Astronomy & Astrophysics
>The University of Chicago
>5747 S Ellis Ave, Jo 315
>Chicago, IL  60637
>(773) 702-3899 - office
>www.flash.uchicago.edu/~smc
>
>
>
>
>
>
>On Sep 17, 2014, at 1:30 PM, Rodrigo Fernandez <rafernan at berkeley.edu>
>wrote:
>
>Dear FLASH Users/Developers,
>
>Does anybody know the maximum number of cores that FLASH has ever been
>run successfully with? Any reference for this? I need the information for
>a computing proposal.
>
>Thanks!
>
>Rodrigo
>
>--------------------------------------------------------------------------------------------
>Prof. Richard Bower                              Institute for Computational Cosmology
>                                                 University of Durham
>+44-191-3343526                                  r.g.bower at durham.ac.uk
>--------------------------------------------------------------------------------------------
>




More information about the flash-users mailing list