[FLASH-USERS] maximum number of cores ever used to run FLASH?

Sean Couch smc at flash.uchicago.edu
Thu Sep 18 10:23:01 EDT 2014


Hi Bill,

I’ve thought about this and gotten as far as reading the section on the Phis in the Stampede user’s guide.  I haven’t actually tried it yet.  I’m co-PI of an XSEDE allocation on Stampede, and if I can get them working, that could mean more FLOPS per service unit and thus more science!

If I, or someone I know, gives the Phis a try with FLASH, I will let you know.

Sean


--------------------------------------------------------
Sean M. Couch, Ph.D.
Flash Center for Computational Science
Department of Astronomy & Astrophysics
The University of Chicago
5747 S Ellis Ave, Jo 315
Chicago, IL  60637
(773) 702-3899 - office
(512) 997-8025 - cell
www.flash.uchicago.edu/~smc






On Sep 18, 2014, at 8:51 AM, Bill Barth <bbarth at tacc.utexas.edu> wrote:

> Thanks for this great description, Sean. Given that FLASH has hybrid
> OpenMP/MPI for most of the physics, have you tried running this in
> symmetric mode on the Phis on Stampede? I think it'd be a great study. I
> suspect you'd want to turn off I/O entirely on the MICs, but we'd be
> interested in talking about some of the options there as well.
> 
> Thanks,
> Bill.
> --
> Bill Barth, Ph.D., Director, HPC
> bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
> Office: ROC 1.435             |   Fax:   (512) 475-9445
> 
> 
> 
> 
> 
> 
> 
> On 9/18/14, 8:43 AM, "Sean Couch" <smc at flash.uchicago.edu> wrote:
> 
>> Sure thing.  
>> 
>> 
>> This was for my custom core-collapse supernova application.*
>> Functionally, it is nearly identical to the CCSN application
>> (source/Simulation/SimulationMain/CCSN) packaged with the latest release
>> of FLASH (v4.2.2), except that I'm using MHD rather than plain hydro.
>> This setup uses the unsplit staggered mesh MHD solver, a detailed
>> microphysical tabular EOS (source/physics/Eos/EosMain/Nuclear), the new
>> multipole gravity solver (source/Grid/GridSolvers/Multipole_new; Couch,
>> Graziani, & Flocke 2013, ApJ, 778, 181), approximate neutrino transport
>> via a leakage scheme (source/physics/RadTrans/NeutrinoLeakage), and AMR
>> via PARAMESH.
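>>
>> For anyone wanting to reproduce a configuration along these lines, the
>> setup command would look roughly like the sketch below.  This is
>> illustrative, not the literal command I used: the +usm shortcut and the
>> 24^3 block size follow from the description above, but the Nuclear EOS
>> and leakage units need additional setup arguments that I am not
>> reproducing from memory.
>>
>>   ./setup CCSN -auto -3d +usm -nxb=24 -nyb=24 -nzb=24 threadWithinBlock=True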
>> 
>> 
>> The scaling study was done on BG/Q Mira at Argonne Leadership Computing
>> Facility.  To control the number of AMR blocks per core, I use a custom
>> version of Grid_markRefineDerefine.F90 that forces refinement up to the
>> maximum level within a runtime-specified
>> radius.  This test employed hybrid parallelism with AMR blocks
>> distributed amongst the MPI ranks and OpenMP threading
>> within block (i.e., the i,j,k loops are threaded).  I used 24^3 zones per
>> block (this reduces the fractional memory overhead of guardcells and the
>> communication per rank per step).  This application strong scales like a
>> champ (Fig. 1 below), being fairly
>> efficient down to ~4 AMR blocks per MPI rank.
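>>
>> The forced-refinement hack is nothing fancy.  A stripped-down sketch of
>> the idea is below; this is not my actual Grid_markRefineDerefine.F90 (it
>> omits the stock refinement criteria entirely), and refine_radius is just
>> a placeholder runtime-parameter name.
>>
>>   subroutine Grid_markRefineDerefine()
>>     use tree, ONLY : refine, derefine, lrefine, lrefine_max, lnblocks, nodetype
>>     use Grid_interface, ONLY : Grid_getBlkCenterCoords
>>     use RuntimeParameters_interface, ONLY : RuntimeParameters_get
>>     implicit none
>>     integer :: blk
>>     real    :: blkCenter(3), r, refine_radius
>>
>>     ! placeholder name for the runtime-specified refinement radius
>>     call RuntimeParameters_get("refine_radius", refine_radius)
>>
>>     do blk = 1, lnblocks
>>        if (nodetype(blk) /= 1) cycle              ! leaf blocks only
>>        call Grid_getBlkCenterCoords(blk, blkCenter)
>>        r = sqrt(sum(blkCenter**2))
>>        ! force refinement up to lrefine_max inside the chosen radius
>>        if (r <= refine_radius .and. lrefine(blk) < lrefine_max) then
>>           refine(blk)   = .true.
>>           derefine(blk) = .false.
>>        end if
>>     end do
>>   end subroutine Grid_markRefineDerefine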
>> 
>> 
>> On hardware, Mira is a BG/Q with 16 cores per node, 1 GB of memory per
>> core, and 4 hardware threads per core.  My application clocks in at a
>> memory usage per rank of about 1200 MB (large EOS table, MHD has lots of
>> extra face variables and scratch arrays, and my application defines a
>> number of new grid variables).  Thus, I have to run 8 MPI ranks per node
>> in order to fit in memory (16 ranks at ~1.2 GB each would exceed the 16
>> GB per node, while 8 ranks use ~9.6 GB).  I therefore run with 8 OpenMP
>> threads per MPI rank, which fills all 64 hardware threads on a node.
>> This is not ideal; not every part of FLASH is threaded (I'm looking at
>> you, Grid, IO...).  The heavy-lifting physics routines are threaded, and
>> with 24^3 zones per block and within-block threading the thread-to-thread
>> speedup is acceptable even up to 8 threads.
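>>
>> For concreteness, "within-block threading" just means OpenMP worksharing
>> over each block's zone loops, along the lines of the toy routine below
>> (the routine name and arrays are made up; it is not FLASH source):
>>
>>   subroutine update_block(U, dU, dt, nx, ny, nz)
>>     implicit none
>>     integer, intent(in)    :: nx, ny, nz
>>     real,    intent(in)    :: dt, dU(nx,ny,nz)
>>     real,    intent(inout) :: U(nx,ny,nz)
>>     integer :: i, j, k
>>     ! With 24^3 zones per block there is enough work for 8 threads to
>>     ! share the loops of a single block reasonably efficiently.
>>     !$omp parallel do collapse(2) private(i,j,k)
>>     do k = 1, nz
>>        do j = 1, ny
>>           do i = 1, nx
>>              U(i,j,k) = U(i,j,k) + dt*dU(i,j,k)   ! stand-in for the real per-zone update
>>           end do
>>        end do
>>     end do
>>     !$omp end parallel do
>>   end subroutine update_block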
>> 
>> 
>> The big run (32,768 nodes, 524,288 cores) had 2,097,152 leaf blocks (~29
>> billion zones), 2,396,744 total blocks, and used 262,144 MPI ranks (thus
>> 8 leaf blocks per MPI rank).
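>>
>> (As a sanity check, those numbers are self-consistent: 32,768 nodes x 16
>> cores = 524,288 cores; 32,768 nodes x 8 ranks = 262,144 MPI ranks;
>> 2,097,152 leaf blocks / 262,144 ranks = 8 leaf blocks per rank; and
>> 2,097,152 blocks x 24^3 zones ~ 2.9 x 10^10 zones.)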
>> 
>> 
>> Note that Mira has an extremely fast communication fabric!  YMMV on other
>> systems.  I have run a much smaller weak scaling study on TACC Stampede
>> up to 4096 cores and it is also essentially perfect, but I have yet to go
>> to any significant core count on
>> Stampede (see Fig. 2).
>> 
>> 
>> Hope this is helpful and informative!
>> 
>> 
>> Sean
>> 
>> Fig. 1 (image not included in the plain-text archive) - Strong scaling of core-collapse SN application on Mira
>> 
>> Fig. 2 (image not included in the plain-text archive) - Weak scaling of FLASH CCSN application on TACC Stampede
>> 
>> 
>> * - This particular simulation was started substantially "post-bounce" in
>> the parlance of the CCSN community.  Thus the shock was at a moderate
>> radius and the neutrino leakage treatment was active.  The initial
>> progenitor model packaged with FLASH's CCSN application is at the
>> pre-bounce, collapse phase.  Therefore, if you want to run this scaling
>> test yourself, you will have to generate post-bounce initial conditions
>> by running the 1D CCSN application to an adequate post-bounce time, then
>> converting those 1D results into the ASCII format used by the 1D initial
>> conditions reader.
>> 
>> 
>> 
>> --------------------------------------------------------
>> Sean M. Couch, Ph.D.
>> Flash Center for Computational Science
>> Department of Astronomy & Astrophysics
>> The University of Chicago
>> 5747 S Ellis Ave, Jo 315
>> Chicago, IL  60637
>> (773) 702-3899 - office
>> (512) 997-8025 - cell
>> www.flash.uchicago.edu/~smc
>> 
>> 
>> On Sep 18, 2014, at 5:23 AM, Richard Bower <r.g.bower at durham.ac.uk> wrote:
>> 
>> 
>> 
>> I'm very keen to see this too (although I've not been running anything
>> big with FLASH)... could you say something about the memory per
>> core/node? This could be very useful for our next procurement... Richard
>> 
>> 
>> On 18 Sep 2014, at 07:46, Stefanie Walch wrote:
>> 
>> Hi Sean,
>> 
>> Could you tell me which setup you used for the nice scaling plot you sent
>> around?
>> 
>> Cheers,
>> Stefanie
>> ===================================
>> Prof. Dr. Stefanie Walch
>> Physikalisches Institut I
>> Universität zu Köln
>> Zülpicher Straße 77
>> 50937 Köln
>> Germany
>> email: walch at ph1.uni-koeln.de
>> phone: +49 (0) 221 4703497
>> 
>> On 17 Sep 2014, at 20:41, Sean Couch <smc at flash.uchicago.edu> wrote:
>> 
>> For fun, let's play throwdown.  Can anybody beat 525k cores (2 million
>> threads of execution)?  See attached (1 Mira node = 16 cores).
>> 
>> Sean
>> 
>> <wkScaling.pdf>
>> 
>> --------------------------------------------------------
>> Sean M. Couch
>> Flash Center for Computational Science
>> Department of Astronomy & Astrophysics
>> The University of Chicago
>> 5747 S Ellis Ave, Jo 315
>> Chicago, IL  60637
>> (773) 702-3899 - office
>> www.flash.uchicago.edu/~smc
>> 
>> 
>> 
>> 
>> 
>> 
>> On Sep 17, 2014, at 1:30 PM, Rodrigo Fernandez <rafernan at berkeley.edu>
>> wrote:
>> 
>> Dear FLASH Users/Developers,
>> 
>> Does anybody know the maximum number of cores that FLASH has ever been
>> run successfully with? Any reference for this? I need the information for
>> a computing proposal.
>> 
>> Thanks!
>> 
>> Rodrigo
>> 
>> 
>> ------------------------------------------------------------------------------------------------------
>> Prof. Richard Bower                              Institute for Computational Cosmology
>>                                                  University of Durham
>> +44-191-3343526                                  r.g.bower at durham.ac.uk
>> ------------------------------------------------------------------------------------------------------


