[FLASH-USERS] maximum number of cores ever used to run FLASH?

Bill Barth bbarth at tacc.utexas.edu
Thu Sep 18 10:24:28 EDT 2014


OK, cool. We're here to support porting and tuning with funded people if
y'all would be willing to accept help! :)

Best,
Bill.
--
Bill Barth, Ph.D., Director, HPC
bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435             |   Fax:   (512) 475-9445

On 9/18/14, 9:23 AM, "Sean Couch" <smc at flash.uchicago.edu> wrote:

>
>
>
>Hi Bill,
>
>
>I’ve thought about this and gotten as far as reading the section on the
>Phis in the Stampede user’s guide.  Haven’t actually tried it yet.  I’m
>co-PI of an XSEDE allocation on Stampede, and if I can get them working
>that could mean more FLOPS per service unit and thus more science!
>
>
>If I, or someone I know, gives the Phis a try with FLASH, I will let you
>know.
>
>
>Sean
>
>
>--------------------------------------------------------
>Sean M. Couch, Ph.D.
>Flash Center for Computational Science
>Department of Astronomy & Astrophysics
>The University of Chicago
>5747 S Ellis Ave, Jo 315
>Chicago, IL  60637
>(773) 702-3899 - office
>(512) 997-8025 - cell
>www.flash.uchicago.edu/~smc <http://www.flash.uchicago.edu/~smc>
>
>On Sep 18, 2014, at 8:51 AM, Bill Barth <bbarth at tacc.utexas.edu> wrote:
>
>
>Thanks for this great description, Sean. Given that FLASH has hybrid
>OpenMP/MPI for most of the physics, have you tried running this in
>symmetric mode on the Phis on Stampede? I think it'd be a great study. I
>suspect you'd want to turn off I/O entirely on the MICs, but we'd be
>interested in talking about some of the options there as well.
>
>Thanks,
>Bill.
>--
>Bill Barth, Ph.D., Director, HPC
>bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
>Office: ROC 1.435             |   Fax:   (512) 475-9445
>
>On 9/18/14, 8:43 AM, "Sean Couch" <smc at flash.uchicago.edu> wrote:
>
>
>Sure thing.  
>
>
>This was for my custom core-collapse supernova application.*
>Functionally, it is nearly identical to the CCSN application
>(source/Simulation/SimulationMain/CCSN) packaged with the latest release
>of FLASH (v4.2.2), except that I’m using MHD rather than
>plain hydro.  This setup uses the unsplit staggered mesh MHD solver, a
>detailed microphysical tabular EOS (source/physics/Eos/EosMain/Nuclear),
>the new multipole gravity solver (source/Grid/GridSolvers/Multipole_new,
>Couch, Graziani, & Flocke, 2013, ApJ, 778,
>181), approximate neutrino transport via a leakage scheme
>(source/physics/RadTrans/NeutrinoLeakage), and AMR via PARAMESH.
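>
>(For anyone curious how those pieces go together, the corresponding setup
>line would look very roughly like "./setup CCSN -auto -3d +usm -nxb=24
>-nyb=24 -nzb=24 threadWithinBlock=True", with my custom simulation
>directory in place of the stock CCSN one.  The shortcut and option names
>here are from memory, so check them against the FLASH 4.2.2 setup
>documentation rather than trusting this line.)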
>
>
>The scaling study was done on BG/Q Mira at Argonne Leadership Computing
>Facility.  To control the number of AMR blocks per core, I use a custom
>version of Grid_markRefineDerefine.F90 that forces refinement up to the
>maximum level within a runtime-specified
>radius.  This test employed hybrid parallelism, with AMR blocks distributed
>amongst the MPI ranks and OpenMP threading within each block (i.e., the
>i,j,k loops are threaded; see the generic sketch below).  I used 24^3 zones
>per block (this reduces the fractional memory overhead of guardcells and
>the communication per rank per step).  This application strong scales like
>a champ (Fig. 1 below), being fairly efficient down to ~4 AMR blocks per
>MPI rank.
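>
>(Since people sometimes ask what "threading within block" means in
>practice, here is a minimal, generic Fortran/OpenMP sketch of the idea; it
>is illustrative only, not actual FLASH source, and the routine name, array
>names, and the stencil update are placeholders:
>
>   subroutine update_block(nxb, nyb, nzb, ng, dt, u, unew)
>     ! Share the zone loops of one block among the OpenMP threads owned
>     ! by a single MPI rank; the rank already owns whole blocks.
>     implicit none
>     integer, intent(in)  :: nxb, nyb, nzb, ng
>     real,    intent(in)  :: dt
>     real,    intent(in)  :: u(1-ng:nxb+ng, 1-ng:nyb+ng, 1-ng:nzb+ng)
>     real,    intent(out) :: unew(1-ng:nxb+ng, 1-ng:nyb+ng, 1-ng:nzb+ng)
>     integer :: i, j, k
>
>     !$omp parallel do collapse(2) default(shared) private(i, j, k)
>     do k = 1, nzb
>        do j = 1, nyb
>           do i = 1, nxb
>              ! placeholder update; the real solvers do the physics here
>              unew(i,j,k) = u(i,j,k) + dt*(u(i+1,j,k) + u(i-1,j,k) &
>                            + u(i,j+1,k) + u(i,j-1,k) &
>                            + u(i,j,k+1) + u(i,j,k-1) - 6.0*u(i,j,k))
>           end do
>        end do
>     end do
>     !$omp end parallel do
>   end subroutine update_block
>
>With 24^3 interior zones there is plenty of work per block to keep 8
>threads busy, which is the point of using large blocks.)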
>
>
>On the hardware side, Mira is a BG/Q with 16 cores per node, 1 GB of memory
>per core, and 4 hardware threads per core.  My application clocks in at
>about 1200 MB of memory per rank (a large EOS table, lots of extra face
>variables and scratch arrays for MHD, and a number of new grid variables
>defined by my application).  Thus, I have to run 8 MPI ranks per node in
>order to fit in memory, and I therefore run with 8 OpenMP threads per MPI
>rank.  This is not ideal; not every part of FLASH is threaded (I’m looking
>at you, Grid, IO…).  The heavy-lifting physics routines are threaded, and
>with 24^3 zones per block and within-block threading, the thread-to-thread
>speedup is acceptable even up to 8 threads.
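>
>(For the arithmetic: a node has 16 cores x 1 GB = 16 GB, so one rank per
>core would need roughly 16 x 1.2 GB = 19 GB and does not fit; 8 ranks per
>node, about 9.6 GB, does.  With 16 cores x 4 hardware threads = 64 hardware
>threads per node, 8 ranks x 8 threads then uses all of them.)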
>
>
>The big run (32,768 nodes, 524,288 cores) had 2,097,152 leaf blocks (~29
>billion zones), 2,396,744 total blocks, and used 262,144 MPI ranks (thus
>8 leaf blocks per MPI rank).
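>
>(As a sanity check on those numbers: 32,768 nodes x 16 cores = 524,288
>cores; 2,097,152 leaf blocks / 262,144 ranks = 8 leaf blocks per rank; and
>2,097,152 leaf blocks x 24^3 zones per block is about 2.9 x 10^10 zones.)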
>
>
>Note that Mira has an extremely fast communication fabric!  YMMV on other
>systems.  I have run a much smaller weak scaling study on TACC Stampede
>up to 4096 cores and it is also essentially perfect, but I have yet to go
>to any significant core count on
>Stampede (see Fig. 2).
>
>
>Hope this is helpful and informative!
>
>
>Sean
>
>Fig. 1 - Strong scaling of core-collapse SN application on Mira
>
>Fig. 2 - Weak scaling of FLASH CCSN application on TACC Stampede
>
>
>* - This particular simulation was started substantially “post-bounce” in
>the parlance of the CCSN community.  Thus the shock was at a moderate
>radius and the neutrino leakage treatment was active.  The initial
>progenitor model packaged with FLASH’s CCSN
>application is at the pre-bounce, collapse phase.  Therefore, if you
>want to run this scaling test yourself, you will have to generate
>post-bounce initial conditions by running the 1D CCSN application to an
>adequate post-bounce time, then converting those
>1D results into the ASCII format used by the 1D initial conditions
>reader.
>
>
>
>--------------------------------------------------------
>Sean M. Couch, Ph.D.
>Flash Center for Computational Science
>Department of Astronomy & Astrophysics
>The University of Chicago
>5747 S Ellis Ave, Jo 315
>Chicago, IL  60637
>(773) 702-3899 - office
>(512) 997-8025 - cell
>www.flash.uchicago.edu/~smc <http://www.flash.uchicago.edu/~smc>
>
>On Sep 18, 2014, at 5:23 AM, Richard Bower <r.g.bower at durham.ac.uk> wrote:
>
>
>
>I'm very keen to see this too (although I've not been running anything
>big with FLASH)... could you say something about the memory per
>core/node? This could be very useful for our next procurement... Richard
>
>
>On 18 Sep 2014, at 07:46, Stefanie Walch wrote:
>
>Hi Sean,
>
>Could you tell me which setup you used for the nice scaling plot you sent
>around?
>
>Cheers,
>Stefanie
>===================================
>Prof. Dr. Stefanie Walch
>Physikalisches Institut I
>Universität zu Köln
>Zülpicher Straße 77
>50937 Köln
>Germany
>email: walch at ph1.uni-koeln.de
>phone: +49 (0) 221 4703497
>
>On 17 Sep 2014, at 20:41, Sean Couch <smc at flash.uchicago.edu> wrote:
>
>For fun, let’s play throwdown.  Can anybody beat 525k cores (2 million
>threads of execution)?  See attached (1 Mira node = 16 cores).
>
>Sean
>
><wkScaling.pdf>
>
>--------------------------------------------------------
>Sean M. Couch
>Flash Center for Computational Science
>Department of Astronomy & Astrophysics
>The University of Chicago
>5747 S Ellis Ave, Jo 315
>Chicago, IL  60637
>(773) 702-3899 - office
>www.flash.uchicago.edu/~smc <http://www.flash.uchicago.edu/~smc>
>
>On Sep 17, 2014, at 1:30 PM, Rodrigo Fernandez <rafernan at berkeley.edu>
>wrote:
>
>Dear FLASH Users/Developers,
>
>Does anybody know the maximum number of cores that FLASH has ever been
>run successfully with? Any reference for this? I need the information for
>a computing proposal.
>
>Thanks!
>
>Rodrigo
>
>---------------------------------------------------------------------------
>Prof. Richard Bower                 Institute for Computational Cosmology
>                                    University of Durham
>+44-191-3343526                     r.g.bower at durham.ac.uk
>---------------------------------------------------------------------------
>



More information about the flash-users mailing list