[FLASH-USERS] maximum number of cores ever used to run FLASH?

Sean Couch smc at flash.uchicago.edu
Thu Sep 18 09:43:24 EDT 2014


Sure thing.  

This was for my custom core-collapse supernova application.*  Functionally, it is nearly identical to the CCSN application (source/Simulation/SimulationMain/CCSN) packaged with the latest release of FLASH (v4.2.2), except that I’m using MHD rather than plain hydro.  This setup uses the unsplit staggered mesh MHD solver, a detailed microphysical tabular EOS (source/physics/Eos/EosMain/Nuclear), the new multipole gravity solver (source/Grid/GridSolvers/Multipole_new; Couch, Graziani, & Flocke 2013, ApJ, 778, 181), approximate neutrino transport via a leakage scheme (source/physics/RadTrans/NeutrinoLeakage), and AMR via PARAMESH.

The scaling study was done on the BG/Q Mira at the Argonne Leadership Computing Facility.  To control the number of AMR blocks per core, I use a custom version of Grid_markRefineDerefine.F90 that forces refinement up to the maximum level within a runtime-specified radius.  This test employed hybrid parallelism, with AMR blocks distributed among the MPI ranks and OpenMP threading within each block (i.e., the i,j,k loops are threaded).  I used 24^3 zones per block, which reduces both the fractional memory overhead of guardcells and the communication per rank per step.  This application strong scales like a champ (Fig. 1 below), remaining fairly efficient down to ~4 AMR blocks per MPI rank.
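
For a rough sense of why the bigger blocks help, here is a quick back-of-the-envelope check (Python, assuming FLASH's default of 4 guardcells per face; adjust ng if your setup differs):

# Fractional storage taken up by guardcells for an nxb^3 block
# with ng guardcells on each face (ng = 4 assumed, FLASH's default).
def guardcell_overhead(nxb, ng=4):
    interior = nxb**3
    total = (nxb + 2 * ng)**3
    return (total - interior) / total

for nxb in (8, 16, 24):
    print(f"{nxb}^3 block: {guardcell_overhead(nxb):.0%} guardcells")

# 8^3: ~88% guardcells,  16^3: ~70%,  24^3: ~58%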

On hardware: Mira is a BG/Q with 16 cores per node, 1 GB of memory per core, and 4 hardware threads per core.  My application clocks in at about 1200 MB of memory per rank (the EOS table is large, MHD carries lots of extra face variables and scratch arrays, and my application defines a number of new grid variables).  Thus, I have to run 8 MPI ranks per node in order to fit in memory, and I therefore run with 8 OpenMP threads per MPI rank.  This is not ideal; not every part of FLASH is threaded (I’m looking at you, Grid and IO…).  The heavy-lifting physics routines are threaded, though, and with 24^3 zones per block and within-block threading, the thread-to-thread speedup is acceptable even up to 8 threads.
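
To make the node-level layout concrete, here is a small sketch of that choice (Python; the restriction to the usual BG/Q ranks-per-node values is my assumption, and 1200 MB/rank is the figure quoted above):

# Largest ranks-per-node that fits in a BG/Q node's memory, and the
# resulting OpenMP threads per rank (16 cores, 1 GB/core, 4 HW threads/core).
NODE_MEM_MB = 16 * 1024
HW_THREADS_PER_NODE = 16 * 4
ALLOWED_RANKS_PER_NODE = (1, 2, 4, 8, 16, 32, 64)  # usual BG/Q modes (assumption)

def layout(mem_per_rank_mb):
    rpn = max(r for r in ALLOWED_RANKS_PER_NODE
              if r * mem_per_rank_mb <= NODE_MEM_MB)
    return rpn, HW_THREADS_PER_NODE // rpn

print(layout(1200))   # -> (8, 8): 8 MPI ranks/node, 8 OpenMP threads/rank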

The big run (32,768 nodes, 524,288 cores) had 2,097,152 leaf blocks (~29 billion zones), 2,396,744 total blocks, and used 262,144 MPI ranks (thus 8 leaf blocks per MPI rank).
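
For reference, the bookkeeping behind those numbers (nothing FLASH-specific, just arithmetic in Python):

nodes           = 32_768
cores_per_node  = 16
ranks_per_node  = 8
leaf_blocks     = 2_097_152
zones_per_block = 24**3                          # 13,824 interior zones per block

print(nodes * cores_per_node)                    # 524,288 cores
print(nodes * ranks_per_node)                    # 262,144 MPI ranks
print(leaf_blocks // (nodes * ranks_per_node))   # 8 leaf blocks per rank
print(leaf_blocks * zones_per_block)             # 28,991,029,248 zones (~29 billion)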

Note that Mira has an extremely fast communication fabric!  YMMV on other systems.  I have run a much smaller weak scaling study on TACC Stampede up to 4096 cores and the scaling there is also essentially perfect, but I have yet to go to any significant core count on Stampede (see Fig. 2).

Hope this is helpful and informative!

Sean


Fig. 1 - Strong scaling of core-collapse SN application on Mira


Fig. 2 - Weak scaling of FLASH CCSN application on TACC Stampede

* - This particular simulation was started substantially “post-bounce” in the parlance of the CCSN community.  Thus the shock was at a moderate radius and the neutrino leakage treatment was active.  The initial progenitor model packaged with FLASH’s CCSN application is at the pre-bounce, collapse phase.  Therefore, if you want to run this scaling test yourself, you will have to generate post-bounce initial conditions by running the 1D CCSN application to an adequate post-bounce time, then converting those 1D results into the ASCII format used by the 1D initial conditions reader (a rough sketch of such a conversion is below).
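
In case it is useful, a rough sketch of that conversion (Python/h5py, reading a 1D FLASH checkpoint and writing sorted radial columns).  The dataset names follow the usual FLASH HDF5 layout, but the variable list, column order, and filenames are placeholders of mine, not necessarily the format the CCSN 1D reader expects, so check the reader source before using this:

import h5py
import numpy as np

def flash1d_to_ascii(checkpoint, outfile, varnames=("dens", "temp", "velx")):
    with h5py.File(checkpoint, "r") as f:
        leaf  = f["node type"][:] == 1                  # leaf blocks only
        xctr  = f["coordinates"][:, 0][leaf]            # block centers (x)
        bsize = f["block size"][:, 0][leaf]             # block widths
        data  = {v: f[v][:][leaf, 0, 0, :] for v in varnames}  # (nblk, nxb)

    nxb = next(iter(data.values())).shape[1]
    dx  = bsize / nxb
    # Cell-center radii for every leaf block, then flatten and sort.
    r = (xctr[:, None] - 0.5 * bsize[:, None]
         + (np.arange(nxb) + 0.5) * dx[:, None]).ravel()
    order = np.argsort(r)

    cols = [r[order]] + [data[v].ravel()[order] for v in varnames]
    np.savetxt(outfile, np.column_stack(cols), header="r " + " ".join(varnames))

# flash1d_to_ascii("ccsn1d_hdf5_chk_0100", "postbounce_profile.dat")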


--------------------------------------------------------
Sean M. Couch, Ph.D.
Flash Center for Computational Science
Department of Astronomy & Astrophysics
The University of Chicago
5747 S Ellis Ave, Jo 315
Chicago, IL  60637
(773) 702-3899 - office
(512) 997-8025 - cell
www.flash.uchicago.edu/~smc






On Sep 18, 2014, at 5:23 AM, Richard Bower <r.g.bower at durham.ac.uk> wrote:

> 
> I'm very keen to see this too  (although I've not been running anything big with flash)... could you say something about the memory per core/node? This could be very useful for our next procurement... Richard
> 
> 
> On 18 Sep 2014, at 07:46, Stefanie Walch wrote:
> 
>> Hi Sean,
>> 
>> Could you tell me which setup you used for the nice scaling plot you sent around?
>> 
>> Cheers,
>> Stefanie
>> ===================================
>> Prof. Dr. Stefanie Walch
>> Physikalisches Institut I
>> Universität zu Köln
>> Zülpicher Straße 77
>> 50937 Köln
>> Germany
>> email: walch at ph1.uni-koeln.de
>> phone: +49 (0) 221 4703497
>> 
>> On 17 Sep 2014, at 20:41, Sean Couch <smc at flash.uchicago.edu> wrote:
>> 
>>> For fun, let’s play throwdown.  Can anybody beat 525k cores (2 million threads of execution)?  See attached (1 Mira node = 16 cores).
>>> 
>>> Sean
>>> 
>>> <wkScaling.pdf>
>>> 
>>> --------------------------------------------------------
>>> Sean M. Couch
>>> Flash Center for Computational Science
>>> Department of Astronomy & Astrophysics
>>> The University of Chicago
>>> 5747 S Ellis Ave, Jo 315
>>> Chicago, IL  60637
>>> (773) 702-3899 - office
>>> www.flash.uchicago.edu/~smc
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Sep 17, 2014, at 1:30 PM, Rodrigo Fernandez <rafernan at berkeley.edu> wrote:
>>> 
>>>> Dear FLASH Users/Developers,
>>>> 
>>>> Does anybody know the maximum number of cores that FLASH has ever been run successfully with? Any reference for this? I need the information for a computing proposal.
>>>> 
>>>> Thanks!
>>>> 
>>>> Rodrigo
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
> ------------------------------------------------------------------------------------------------------
> Prof. Richard Bower                              Institute for Computational Cosmology
>                                                                  University of Durham
> +44-191-3343526                                 r.g.bower at durham.ac.uk
> ------------------------------------------------------------------------------------------------------
> 
> 
> 

-------------- attachments --------------
strScaling.pdf: <http://flash.rochester.edu/pipermail/flash-users/attachments/20140918/57ef03d3/attachment.pdf>
wkScaling.pdf: <http://flash.rochester.edu/pipermail/flash-users/attachments/20140918/57ef03d3/attachment-0001.pdf>
