[FLASH-USERS] Large calculation with memory errors

Attal, Nitesh nattal at uncc.edu
Tue Jun 9 14:48:41 EDT 2015


Hi,

I have been using FLASH (FLASH4) for my simulations with diffusion. I want to run a simulation at a resolution of 512 (i.e. nxb=nyb=nzb=8, AMR min = AMR max = 7); the total size of the simulation will be (512)x(512)x(512x12). I receive memory errors on Stampede and Mira, although I account for the maximum number of blocks per processor and the number of processors that I run on. I started with a lower AMR level on a given number of processors and increased both progressively. For example:



AMR levels        procs requested    procs ran on

min = max = 5     64                 64
min = max = 6     512                512
min = max = 7     4096               4096
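As a rough check of the block counts implied by these runs, the arithmetic can be sketched in Python. This is only a sketch under stated assumptions: the 1 x 1 x 12 root-block layout is inferred from the (512)x(512)x(512x12) domain size and is not stated explicitly, the helper names are hypothetical, and the total-block figure assumes the oct-tree also stores every parent block.

```python
# Rough block-count check for a fully refined grid (AMR min == AMR max).
# ASSUMPTION: 1 x 1 x 12 root blocks, inferred from 512 x 512 x (512 x 12).
NXB = 8                    # cells per block per dimension (nxb = nyb = nzb = 8)
ROOT_BLOCKS = (1, 1, 12)   # hypothetical root-block layout


def cells_per_root_block(level, nxb=NXB):
    """Cells along one dimension of one root block at a given refinement level."""
    return nxb * 2 ** (level - 1)


def leaf_blocks(level, root_blocks=ROOT_BLOCKS):
    """Leaf blocks when every root block is refined uniformly to `level` (3D)."""
    n_root = root_blocks[0] * root_blocks[1] * root_blocks[2]
    return n_root * 8 ** (level - 1)


def total_blocks(level, root_blocks=ROOT_BLOCKS):
    """Leaf plus parent blocks, assuming the full oct-tree is stored."""
    n_root = root_blocks[0] * root_blocks[1] * root_blocks[2]
    return n_root * (8 ** level - 1) // 7


# (AMR level, number of processes) for the runs described in this message.
RUNS = [(5, 64), (6, 512), (7, 4096), (7, 8192)]

for level, procs in RUNS:
    leaves = leaf_blocks(level)
    total = total_blocks(level)
    print(
        f"level {level}, {procs} procs: "
        f"{cells_per_root_block(level)} cells per root block per dimension, "
        f"{leaves} leaf blocks ({leaves / procs:.0f} per proc), "
        f"{total} blocks incl. parents ({total / procs:.1f} per proc)"
    )
```

Under these assumptions the runs on 64, 512 and 4096 processes all carry the same number of leaf blocks per process, so the per-process block load by itself does not distinguish the failing level-7 case from the two that run.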





The cases with AMR min=max=5 and 6 run, but AMR min=max=7 crashes with a memory issue. The error message is as follows:


forrtl: severe (41): insufficient virtual memory
Image              PC                Routine            Line        Source
libintlc.so.5      00002B9B18CFFA1E  Unknown               Unknown  Unknown
libintlc.so.5      00002B9B18CFE4B6  Unknown               Unknown  Unknown
libifcore.so.5     00002B9B185CC01E  Unknown               Unknown  Unknown
libifcore.so.5     00002B9B1853BB1E  Unknown               Unknown  Unknown
libifcore.so.5     00002B9B1858A9B9  Unknown               Unknown  Unknown
flash4             00000000006FFA85  Unknown               Unknown  Unknown
flash4             000000000078D82F  Unknown               Unknown  Unknown
flash4             000000000046BD9C  Unknown               Unknown  Unknown
flash4             00000000004810DD  Unknown               Unknown  Unknown
flash4             000000000061B5C1  Unknown               Unknown  Unknown
flash4             00000000004800A6  Unknown               Unknown  Unknown
flash4             000000000044CF96  Unknown               Unknown  Unknown
flash4             00000000004573FA  Unknown               Unknown  Unknown
flash4             000000000042D86C  Unknown               Unknown  Unknown
libc.so.6          0000003CA6A1ED5D  Unknown               Unknown  Unknown
flash4             000000000042D769  Unknown               Unknown  Unknown





I have also run AMR min=max=7 on Mira on 8192 processors, which runs into a different error:


"mpi_amr_comm_setup.F90", line 504: 1525-108 Error encountered while attempting to allocate a data object.  The program will stop.
2015-06-09 00:45:41.006 (INFO ) [0x40001aebde0] MIR-04000-37331-512:1211855:ibm.runjob.client.Job: exited with status 1
2015-06-09 00:45:41.006 (WARN ) [0x40001aebde0] MIR-04000-37331-512:1211855:ibm.runjob.client.Job: normal termination with status 1 from rank 1538
2015-06-09 00:45:41.006 (INFO ) [0x40001aebde0] tatu.runjob.client: task exited with status 1
2015-06-09 00:45:41.006 (INFO ) [0x400016334e0] 15540:tatu.runjob.monitor: monitor terminating
2015-06-09 00:45:41.012 (INFO ) [0x40001aebde0] tatu.runjob.client: monitor completed



The last lines of the log file are:


[ 06-09-2015  00:42:30.891 ] [GRID amr_refine_derefine]: refinement complete
[ 06-09-2015  00:45:08.546 ] [GRID gr_expandDomain]: iteration=6, create level=7
[ 06-09-2015  00:45:39.352 ] [mpi_amr_comm_setup]: buffer_dim_send=4081925, buffer_dim_recv=3988369
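To put the buffer dimensions in the last log line into perspective, they can be converted to an approximate size in memory. This is a sketch only: it assumes each buffer element is an 8-byte (double-precision) real, which the log does not confirm, and the helper name is hypothetical.

```python
# Convert the logged communication buffer dimensions to an approximate size.
# ASSUMPTION: each buffer element is an 8-byte double-precision real.
BYTES_PER_ELEMENT = 8

# Values copied from the [mpi_amr_comm_setup] log line.
buffer_dim_send = 4081925
buffer_dim_recv = 3988369


def buffer_mib(n_elements, bytes_per_element=BYTES_PER_ELEMENT):
    """Approximate buffer size in MiB for a given element count."""
    return n_elements * bytes_per_element / 2 ** 20


print(f"send buffer: {buffer_mib(buffer_dim_send):.1f} MiB")
print(f"recv buffer: {buffer_mib(buffer_dim_recv):.1f} MiB")
```

These figures are per process and come on top of the memory already reserved for the block data, so they are best compared against the memory available to a single MPI rank on the machine.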

