[FLASH-USERS] Large calculation with memory errors
Attal, Nitesh
nattal at uncc.edu
Tue Jun 9 14:48:41 EDT 2015
Hi,
I have been using FLASH (FLASH4) for my simulations with diffusion. I want to run a simulation with a resolution of 512 (i.e. nxb=nyb=nzb=8, AMR min = AMR max = 7); the total size of the simulation will be (512)x(512)x(512x12). I receive memory errors on Stampede and Mira, although I account for the maximum number of blocks per processor and the number of processors that I run on. I started with a lower AMR level on a given number of processors and increased both progressively. For example:
AMR levels       Number of procs requested   Number of procs ran on
min = max = 5    64                          64
min = max = 6    512                         512
min = max = 7    4096                        4096
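
As a rough check on the block counts these runs imply, the following Python sketch estimates leaf and total blocks per process. It assumes a base grid of 1 x 1 x 12 top-level blocks (inferred from the "512x12" extent, not stated in the post), 8 children per refined block in 3D, and a fully refined grid since min = max. The function names are hypothetical and are not part of FLASH.

```python
# Rough block-count estimate for a uniformly refined 3D AMR grid.
# Assumption: base grid of 1 x 1 x 12 blocks (inferred from the "512x12"
# extent); each refinement level doubles the block count per dimension.

def leaf_blocks(level, nblock=(1, 1, 12)):
    """Number of leaf blocks when every block is refined to `level`."""
    per_dim = 2 ** (level - 1)
    nx, ny, nz = nblock
    return (nx * per_dim) * (ny * per_dim) * (nz * per_dim)

def total_blocks(level, nblock=(1, 1, 12)):
    """Leaf blocks plus all parent blocks on the coarser levels."""
    return sum(leaf_blocks(l, nblock) for l in range(1, level + 1))

if __name__ == "__main__":
    # (refinement level, number of processes) for the three runs
    for level, nprocs in [(5, 64), (6, 512), (7, 4096)]:
        leaves = leaf_blocks(level)
        total = total_blocks(level)
        print(f"level {level}: {leaves} leaf blocks, {total} total, "
              f"{leaves / nprocs:.0f} leaf and {total / nprocs:.1f} total "
              f"blocks per process on {nprocs} processes")
```

Under these assumptions all three runs carry the same number of leaf blocks per process, so the per-process block count alone would not distinguish the failing case from the two that run.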
The cases with AMR min=max=5 and 6 run, but AMR min=max=7 crashes with a memory error. The error message is as follows:
forrtl: severe (41): insufficient virtual memory
Image PC Routine Line Source
libintlc.so.5 00002B9B18CFFA1E Unknown Unknown Unknown
libintlc.so.5 00002B9B18CFE4B6 Unknown Unknown Unknown
libifcore.so.5 00002B9B185CC01E Unknown Unknown Unknown
libifcore.so.5 00002B9B1853BB1E Unknown Unknown Unknown
libifcore.so.5 00002B9B1858A9B9 Unknown Unknown Unknown
flash4 00000000006FFA85 Unknown Unknown Unknown
flash4 000000000078D82F Unknown Unknown Unknown
flash4 000000000046BD9C Unknown Unknown Unknown
flash4 00000000004810DD Unknown Unknown Unknown
flash4 000000000061B5C1 Unknown Unknown Unknown
flash4 00000000004800A6 Unknown Unknown Unknown
flash4 000000000044CF96 Unknown Unknown Unknown
flash4 00000000004573FA Unknown Unknown Unknown
flash4 000000000042D86C Unknown Unknown Unknown
libc.so.6 0000003CA6A1ED5D Unknown Unknown Unknown
flash4 000000000042D769 Unknown Unknown Unknown
I have also run AMR min=max=7 on another machine, Mira, on 8192 processors, where it runs into a different error:
"mpi_amr_comm_setup.F90", line 504: 1525-108 Error encountered while attempting to allocate a data object. The program will stop.
2015-06-09 00:45:41.006 (INFO ) [0x40001aebde0] MIR-04000-37331-512:1211855:ibm.runjob.client.Job: exited with status 1
2015-06-09 00:45:41.006 (WARN ) [0x40001aebde0] MIR-04000-37331-512:1211855:ibm.runjob.client.Job: normal termination with status 1 from rank 1538
2015-06-09 00:45:41.006 (INFO ) [0x40001aebde0] tatu.runjob.client: task exited with status 1
2015-06-09 00:45:41.006 (INFO ) [0x400016334e0] 15540:tatu.runjob.monitor: monitor terminating
2015-06-09 00:45:41.012 (INFO ) [0x40001aebde0] tatu.runjob.client: monitor completed
The last lines of the log file are:
[ 06-09-2015 00:42:30.891 ] [GRID amr_refine_derefine]: refinement complete
[ 06-09-2015 00:45:08.546 ] [GRID gr_expandDomain]: iteration=6, create level=7
[ 06-09-2015 00:45:39.352 ] [mpi_amr_comm_setup]: buffer_dim_send=4081925, buffer_dim_recv=3988369
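
To put the buffer dimensions in the last log line into perspective, this short Python sketch converts them to memory sizes. It assumes each buffer element is an 8-byte double-precision real, which the log does not state; the helper name is hypothetical.

```python
# Convert the logged mpi_amr_comm_setup buffer dimensions to MiB.
# Assumption: each buffer element is an 8-byte real (not stated in the log).
BYTES_PER_REAL = 8

def buffer_mib(n_elements, bytes_per_element=BYTES_PER_REAL):
    """Size in MiB of a buffer holding n_elements values."""
    return n_elements * bytes_per_element / 2**20

if __name__ == "__main__":
    for name, n in [("buffer_dim_send", 4081925),
                    ("buffer_dim_recv", 3988369)]:
        print(f"{name}: {n} elements, about {buffer_mib(n):.1f} MiB")
```

If that assumption holds, each buffer is on the order of 30 MiB per process, in addition to the block data itself.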