<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Hi Seyit,<div><br></div><div>Are you using parallel IO? What is your setup line? You might try adding, separately, ‘+parallelIO’ and ‘+hdf5typeIO’ to your setup line and trying again.<br><div><br></div><div apple-content-edited="true">
<div><div>Sean</div><div><br class="Apple-interchange-newline">--------------------------------------------------------</div><div>Sean M. Couch</div><div>Hubble Fellow</div><div>Flash Center for Computational Science</div><div>Department of Astronomy &amp; Astrophysics</div><div>The University of Chicago</div><div>5747 S Ellis Ave, Jo 315</div><div>Chicago, IL 60637</div><div>(773) 702-3899</div><div><a href="http://www.flash.uchicago.edu/~smc">www.flash.uchicago.edu/~smc</a></div><div><br></div></div><br class="Apple-interchange-newline"><br class="Apple-interchange-newline">
</div>
<br><div><div>On Jan 15, 2014, at 4:58 AM, Seyit Hocuk &lt;<a href="mailto:seyit@astro.rug.nl">seyit@astro.rug.nl</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000">
Dear all,<br>
<br>
I have a restarting problem and hope that you can help me.<br>
<br>
    I am, for the first time, running Flash on a BlueGene supercomputer
    with 1024 cores and encountered a problem when restarting. My Flash
    version is 4-beta. The simulation ran fine and created 2
    checkpoint files. I wanted to restart from the second checkpoint
    file, which has a file size of 15 GB (15424362124 bytes), and at the
    moment the file is read, the simulation just stops. <br>
<br>
    The last lines of the Flash log file are the following:<br>
[ 01-11-2014 19:24:23.616 ] message: vsize (MB): 202.06
(min) 202.12 (max) 202.06 (avg)<br>
[ 01-11-2014 19:24:23.619 ] message: rss (MB): 1.67
(min) 1.67 (max) 1.67 (avg)<br>
[ 01-11-2014 19:24:23.628 ] [io_readData] file opened:
type=checkpoint name=xSHx_hdf5_chk_0002<br>
<br>
    I get a lot of core dump files, one per core (see attachment), and
    the following error output in the terminal:<br>
NumPartProps: 18<br>
NumPartProps: 18<br>
2014-01-11 20:02:48.793 (WARN ) [0x400011e91e0]
:749776:ibm.runjob.client.Job: terminated by signal 5<br>
2014-01-11 20:02:48.793 (WARN ) [0x400011e91e0]
:749776:ibm.runjob.client.Job: abnormal termination by signal 5 from
rank 208<br>
<br>
    The supercomputer is the Italian supercomputer CINECA and has
    16-core nodes with 16 GB of RAM per node. My BGsize is 64 nodes with 16
    ranks per node, meaning that I have 1 GB per core and 1024 cores in
    total with 1 TB of RAM. <br>
<br>
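    The figures above can be cross-checked with a short Python sketch. It
    only redoes the arithmetic from this message; the even split of the
    checkpoint across ranks is an assumption made for illustration, not
    something the log confirms.<br>
<pre>
```python
# Back-of-the-envelope memory budget for the job described above.
nodes = 64                       # BGsize
ranks_per_node = 16
ram_per_node_gb = 16
checkpoint_bytes = 15424362124   # size of the second checkpoint file

total_ranks = nodes * ranks_per_node
ram_per_rank_gb = ram_per_node_gb / ranks_per_node
total_ram_gb = nodes * ram_per_node_gb
checkpoint_gib = checkpoint_bytes / 2**30

# Assumption: the checkpoint data is spread evenly over all ranks.
share_per_rank_mib = checkpoint_bytes / total_ranks / 2**20

print("ranks:", total_ranks)
print("RAM per rank (GB):", ram_per_rank_gb)
print("total RAM (GB):", total_ram_gb)
print("checkpoint (GiB):", round(checkpoint_gib, 2))
print("even share per rank (MiB):", round(share_per_rank_mib, 1))
```
</pre>
    With these figures each rank has about 1 GB available, while the whole
    checkpoint is roughly 14.4 GiB, so the file would fit in memory only if
    the read is distributed over the ranks.<br>
    <br>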
    The code is compiled with the XLF compilers, i.e., mpixlf90 (not
    mpixlf90_r, whose purpose I do not know) and mpixlc(xx). There
    were several ".f" files that would not compile, so I worked around it by
    compiling them separately with mpixlf77. These files were
    "fftsg2d.f", "fftsg3d.f", and "umap.F". I don't know how critical
    that is, but the compilation is successful.<br>
<br>
    Compiling the code in debug mode, i.e., with "<b>-g -qfullpath -O0
      -qcheck</b>" instead of the normal "<b>-O3</b> -qintsize=4
    -qrealsize=8 -c -qxlf90=autodealloc -qsuffix=cpp=F -qtune=auto
    -qstrict -qarch=auto -qextname -qzerosize", produced the following
    warnings and errors:<br>
"io_writeData.F90", line 242.10: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is greater
than the terminal value.<br>
"io_writeData.F90", line 308.24: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is greater
than the terminal value.<br>
"io_writeData.F90", line 316.21: 1516-152 (S) Zero-sized arrays must
not be subscripted.<br>
"io_writeData.F90", line 469.22: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is greater
than the terminal value.<br>
"io_writeData.F90", line 480.29: 1516-152 (S) Zero-sized arrays must
not be subscripted.<br>
"io_writeData.F90", line 481.29: 1516-152 (S) Zero-sized arrays must
not be subscripted.<br>
"io_writeData.F90", line 482.29: 1516-152 (S) Zero-sized arrays must
not be subscripted.<br>
"io_writeData.F90", line 483.29: 1516-152 (S) Zero-sized arrays must
not be subscripted.<br>
"io_writeData.F90", line 491.34: 1516-152 (S) Zero-sized arrays must
not be subscripted.<br>
"io_writeData.F90", line 493.37: 1516-152 (S) Zero-sized arrays must
not be subscripted.<br>
"io_writeData.F90", line 494.37: 1516-152 (S) Zero-sized arrays must
not be subscripted.<br>
"io_writeData.F90", line 506.29: 1516-152 (S) Zero-sized arrays must
not be subscripted.<br>
"io_writeData.F90", line 578.24: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is greater
than the terminal value.<br>
"io_writeData.F90", line 744.10: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is greater
than the terminal value.<br>
** io_writedata === End of Compilation 1 ===<br>
1501-511 Compilation failed for file io_writeData.F90.<br>
make: *** [io_writeData.o] Error 1<br>
<br>
    The same errors also appear in "IO_init.F90":<br>
"IO_init.F90", line 157.13: 1511-013 (W) The value of the DO-loop
increment should be negative when initial value is greater than the
terminal value.<br>
"IO_init.F90", line 182.13: 1511-013 (W) The value of the DO-loop
increment should be negative when initial value is greater than the
terminal value.<br>
"IO_init.F90", line 249.11: 1511-013 (W) The value of the DO-loop
increment should be negative when initial value is greater than the
terminal value.<br>
"IO_init.F90", line 250.13: 1516-152 (S) Zero-sized arrays must not
be subscripted.<br>
"IO_init.F90", line 251.13: 1516-152 (S) Zero-sized arrays must not
be subscripted.<br>
"IO_init.F90", line 252.13: 1516-152 (S) Zero-sized arrays must not
be subscripted.<br>
"IO_init.F90", line 278.8: 1511-013 (W) The value of the DO-loop
increment should be negative when initial value is greater than the
terminal value.<br>
"IO_init.F90", line 279.37: 1516-152 (S) Zero-sized arrays must not
be subscripted.<br>
** io_init === End of Compilation 1 ===<br>
1501-511 Compilation failed for file IO_init.F90.<br>
<br>
    There were no errors without the -qcheck option, so I continued
    without it. The same crash happens with these compiler
    options, without any extra log information. These error/warning lines
    might be related to SCRATCH_GRID_VARS_* having end=0 and begin=1
    values. I do not make use of the scratch array, so I assumed this is
    not critical.<br>
<br>
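    To illustrate why bounds of begin=1 and end=0 would trigger both kinds
    of message, here is a minimal Python sketch of the standard Fortran
    rules for DO-loop trip counts and array extents. The helper names are
    hypothetical and this is not FLASH code; it only mimics the language
    semantics.<br>
<pre>
```python
def fortran_do_indices(begin, end, step=1):
    """Indices visited by a Fortran loop `do i = begin, end, step`."""
    stop = end + 1 if step > 0 else end - 1
    return list(range(begin, stop, step))


def fortran_extent(lower, upper):
    """Number of elements along a Fortran dimension declared as lower:upper."""
    return max(0, upper - lower + 1)


# The values reported for SCRATCH_GRID_VARS_*.
begin, end = 1, 0

print(fortran_do_indices(begin, end))  # empty list: the loop body never runs
print(fortran_extent(begin, end))      # 0: the array is zero-sized
```
</pre>
    A loop from 1 to 0 with a positive increment runs zero times, and an
    array declared with the same bounds has no elements, which is consistent
    with the compiler flagging any subscript on it.<br>
    <br>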
Any help is appreciated.<br>
Best, <br>
Seyit<br>
<br>
<br>
<br>
<br>
</div>
<span>&lt;core.602&gt;</span><span>&lt;seyit.vcf&gt;</span></blockquote></div><br></div></body></html>