[FLASH-USERS] Understanding FLASH memory requirements
Ryan Farber
rjfarber at umich.edu
Fri Apr 26 11:42:45 EDT 2019
Dear FLASH users,
I'm trying to fit my simulation on as few nodes as possible to reduce queue
times.
I currently have a run using 360 processors total, distributed over 30 nodes
(each node has 12 cores). I have rather a lot of blocks, so I had to use
-maxblocks=500 during setup.
However, when I try running 180 processors total, distributed over 15 nodes
(maxblocks=1000), it crashes, with "Driver init all done" as the last output to
stdout and the following as the last few lines of the log file:
[ 04-24-2019 00:31:39.673 ] memory: /proc vsize (MB): 7041.90
(min) 7063.07 (max) 7054.76 (avg)
[ 04-24-2019 00:31:39.680 ] memory: /proc rss (MB): 1396.73
(min) 1477.68 (max) 1432.47 (avg)
[ 04-24-2019 00:31:39.685 ] [Driver_evolveFlash]: Entering evolution loop
[ 04-24-2019 00:31:39.691 ] step: n=1 t=0.000000E+00 dt=1.000000E+10
[ 04-24-2019 00:32:25.747 ] [hy_uhd_unsplit]:
gcNeed(MAGI_FACE_VAR,MAG_FACE_VAR) - FACES
[ 04-24-2019 00:32:26.302 ] [mpi_amr_comm_setup]:
buffer_dim_send=1115785, buffer_dim_recv=1117849
EOF
It would seem the 360-processor, 30-node, maxblocks=500 run required quite
a bit less memory:
[ 04-23-2019 21:16:09.169 ] memory: /proc vsize (MB): 4097.80
(min) 4123.82 (max) 4101.31 (avg)
[ 04-23-2019 21:16:09.178 ] memory: /proc rss (MB): 928.78
(min) 997.62 (max) 958.68 (avg)
[ 04-23-2019 21:16:09.183 ] [Driver_evolveFlash]: Entering evolution loop
[ 04-23-2019 21:16:09.187 ] step: n=1 t=0.000000E+00 dt=1.000000E+10
INFO: Grid_fillGuardCells is ignoring masking.
[ 04-23-2019 21:16:10.428 ] [mpi_amr_comm_setup]:
buffer_dim_send=3460361, buffer_dim_recv=3550777
...(happily proceeds)
I've seen quite a few of my simulations crash at this point, and the typical
"fix" has been to request an extra node (more memory) on Stampede2. However,
the machine I'm working on now doesn't have nearly as much memory per node as
Stampede2's Skylake nodes, so I'd like to better understand FLASH's memory
requirements.
My estimate for the memory required:
NPROP_VARS = 26
NSPECIES = 2
NMASS_SCALARS = 1
NUNK_VARS = 29
NFACE_VARS = 2
NPROP_FLUX = 10
NSPECIES_FLUX = 2
NMASS_SCALARS_FLUX = 1
NFLUXES = 13
NVARS_TOTAL = 29 + (2*3) + 13 = 48
NSCRATCH_GRID_VARS = 3
NSCRATCH_CENTER_VARS = 71
total_grid_vars = 48+3+71 = 122
NDIM = 3
nxb = 8
NGUARD = 4
MAXBLOCKS = 500
procs = 360
bytes_per_GB = 1e9
grid_space_gb = total_grid_vars*MAXBLOCKS*procs*(nxb+2*NGUARD)**NDIM / bytes_per_GB
grid_space_gb ~ 90
NPART_PROPS = 24
pt_maxPerProc = 1e6
particle_space_gb = NPART_PROPS * pt_maxPerProc * procs / bytes_per_GB
particle_space_gb ~ 9
In total, I estimate needing about 100 GB of memory. Since I scale the
processor count down by a factor of two and scale maxblocks up by a factor of
two, the two runs should require the same amount of memory.
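In case it's easier to follow, here is the same estimate written out as a
small Python snippet (same names and values as above; like the formula, the
grid term just counts variables per cell per block):

# Rough memory estimate for the 360-proc, maxblocks=500 run,
# mirroring the arithmetic above.
NUNK_VARS = 29
NFACE_VARS = 2
NFLUXES = 13
NVARS_TOTAL = NUNK_VARS + NFACE_VARS * 3 + NFLUXES            # 48
NSCRATCH_GRID_VARS = 3
NSCRATCH_CENTER_VARS = 71
total_grid_vars = NVARS_TOTAL + NSCRATCH_GRID_VARS + NSCRATCH_CENTER_VARS  # 122

NDIM = 3
nxb = 8
NGUARD = 4
cells_per_block = (nxb + 2 * NGUARD) ** NDIM                  # 16**3 = 4096

MAXBLOCKS = 500
procs = 360
NPART_PROPS = 24
pt_maxPerProc = 1e6
bytes_per_GB = 1e9

grid_space_gb = total_grid_vars * MAXBLOCKS * procs * cells_per_block / bytes_per_GB
particle_space_gb = NPART_PROPS * pt_maxPerProc * procs / bytes_per_GB

print(grid_space_gb)        # ~90
print(particle_space_gb)    # ~8.6

# Halving procs while doubling MAXBLOCKS leaves the grid term unchanged,
# since it only depends on the product MAXBLOCKS * procs.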
So, I'm confused about why the two runs report using such different amounts
of memory. I'm also confused about why my estimate is so far off the value
of rss*procs ~ 300 GB for the 360 procs, 30 nodes, maxblocks=500 case.
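For reference, scaling the logged per-process vsize averages up to a whole
node (12 ranks per node in both runs; GB here just means 1000 MB) looks like
this:

# Per-node footprint from the logged /proc averages.
ranks_per_node = 12

vsize_avg_mb_30nodes = 4101.31   # 360 procs, maxblocks=500  (runs fine)
vsize_avg_mb_15nodes = 7054.76   # 180 procs, maxblocks=1000 (crashes)

print(ranks_per_node * vsize_avg_mb_30nodes / 1000)   # ~49 GB vsize per node
print(ranks_per_node * vsize_avg_mb_15nodes / 1000)   # ~85 GB vsize per node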
Thanks for reading! If you have any ideas, suggestions, or pointers, I'd
greatly appreciate the help!
Best,
--------
Ryan Farber
Graduate Student
University of Michigan, Ann Arbor