<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi Sean,<br>
<br>
Thanks for your response. <br>
      No, I am not using parallel IO. So would simply adding this flag
      help? Are any extra libraries needed? If it helps, I will
      certainly try it.<br>
<br>
Expanded Command line:<br>
--with-library=mpi --with-unit=IO --unit=Grid
--gridinterpolation=monotonic SH-dust --auto --portable --3d
--maxblocks=200 --objdir=ss-dust-3<br>
<br>
      I use serial HDF5, by the way. In my experience I do not lose
      much time writing a checkpoint file, so I did not think parallel
      HDF5 was necessary. However, I am quite disappointed with the
      overall simulation speed. Could parallel IO also improve that? A
      lot of information is passed between the 1024 processors during
      the regular calculations, and I think most of the time is lost
      there. <br>
<br>
Kind regards,<br>
Seyit<br>
<br>
<br>
<br>
On 01/15/2014 02:53 PM, Sean Couch wrote:<br>
</div>
<blockquote
cite="mid:66BFA885-0689-456E-8C8B-2965164DFE04@flash.uchicago.edu"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
Hi Seyit,
<div><br>
</div>
<div>Are you using parallel IO? What is your setup line? You
might try adding, separately, ‘+parallelIO’ and ‘+hdf5typeIO’ to
your setup line and trying again.<br>
<div><br>
</div>
<div apple-content-edited="true">
<div style="color: rgb(0, 0, 0); font-family: Helvetica;
font-style: normal; font-variant: normal; font-weight:
normal; letter-spacing: normal; line-height: normal;
orphans: 2; text-align: -webkit-auto; text-indent: 0px;
text-transform: none; white-space: normal; widows: 2;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
word-wrap: break-word; -webkit-nbsp-mode: space;
-webkit-line-break: after-white-space;"><span
class="Apple-style-span" style="color: rgb(0, 0, 0);
font-family: Helvetica; font-style: normal; font-variant:
normal; font-weight: normal; letter-spacing: normal;
line-height: normal; orphans: 2; text-align: -webkit-auto;
text-indent: 0px; text-transform: none; white-space:
normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; border-collapse: separate;
border-spacing: 0px; -webkit-text-decorations-in-effect:
none;"><span class="Apple-style-span"
style="border-collapse: separate; color: rgb(0, 0, 0);
font-family: Helvetica; font-style: normal;
font-variant: normal; font-weight: normal;
letter-spacing: normal; line-height: normal; orphans: 2;
text-align: -webkit-auto; text-indent: 0px;
text-transform: none; white-space: normal; widows: 2;
word-spacing: 0px; border-spacing: 0px;
-webkit-text-decorations-in-effect: none;
-webkit-text-stroke-width: 0px;">
<div style="word-wrap: break-word; -webkit-nbsp-mode:
space; -webkit-line-break: after-white-space;">
<div>Sean</div>
<div><br class="Apple-interchange-newline">
--------------------------------------------------------</div>
<div>Sean M. Couch</div>
<div>Hubble Fellow</div>
<div>Flash Center for Computational Science</div>
                    <div>Department of Astronomy &amp; Astrophysics</div>
<div>The University of Chicago</div>
<div>5747 S Ellis Ave, Jo 315</div>
<div>Chicago, IL 60637</div>
<div>(773) 702-3899</div>
<div><a moz-do-not-send="true"
href="http://www.flash.uchicago.edu/%7Esmc">www.flash.uchicago.edu/~smc</a></div>
<div><br>
</div>
</div>
</span></span></div>
<br class="Apple-interchange-newline">
<br class="Apple-interchange-newline">
</div>
<br>
<div>
          <div>On Jan 15, 2014, at 4:58 AM, Seyit Hocuk &lt;<a
              moz-do-not-send="true" href="mailto:seyit@astro.rug.nl">seyit@astro.rug.nl</a>&gt;
            wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<meta http-equiv="content-type" content="text/html;
charset=windows-1252">
<div bgcolor="#FFFFFF" text="#000000"> Dear all,<br>
<br>
I have a restarting problem and hope that you can help me.<br>
<br>
              I am running Flash on a BlueGene supercomputer for the
              first time, using 1024 cores, and ran into a problem
              when restarting. My Flash version is 4-beta. The
              simulation ran fine and created 2 checkpoint files. I
              wanted to restart from the second checkpoint file, which
              is 15 GB (15424362124 bytes), but the moment the file is
              read, the simulation just stops. <br>
<br>
              The last lines of the Flash log file are the following:<br>
[ 01-11-2014 19:24:23.616 ] message: vsize (MB):
202.06 (min) 202.12 (max) 202.06 (avg)<br>
[ 01-11-2014 19:24:23.619 ] message: rss (MB):
1.67 (min) 1.67 (max) 1.67 (avg)<br>
[ 01-11-2014 19:24:23.628 ] [io_readData] file opened:
type=checkpoint name=xSHx_hdf5_chk_0002<br>
<br>
              I get a lot of core dump files, one per core (see
              attachment), and the following error output on the terminal:<br>
NumPartProps: 18<br>
NumPartProps: 18<br>
2014-01-11 20:02:48.793 (WARN ) [0x400011e91e0]
:749776:ibm.runjob.client.Job: terminated by signal 5<br>
2014-01-11 20:02:48.793 (WARN ) [0x400011e91e0]
:749776:ibm.runjob.client.Job: abnormal termination by
signal 5 from rank 208<br>
<br>
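              As a small aid for reading the output above, the signal
              number can be translated into a name. This is only a
              sketch: it uses the signal table of the machine Python
              runs on and assumes that runjob reports the same
              Linux-style numbering, which the message does not
              confirm.<br>

```python
import signal

# Signal number reported by ibm.runjob.client.Job in the output above.
TERMINATING_SIGNAL = 5

# Look the number up in the local platform's signal table
# (assumption: the compute nodes use the same numbering).
name = signal.Signals(TERMINATING_SIGNAL).name
print(TERMINATING_SIGNAL, name)
```

              The name alone does not say which routine raised the
              signal; the core files would be needed for that.<br>
              <br>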
              The supercomputer is at CINECA in Italy and has 16-core
              nodes with 16 GB of RAM per node. My BGsize is 64 nodes
              with 16 ranks per node, so I have 1 GB per core and 1024
              cores in total, with 1 TB of RAM. <br>
<br>
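              The figures quoted above can be checked with a few lines
              of arithmetic. This is a sketch only: it assumes "GB"
              means GiB, and it does not establish how the IO unit
              actually distributes the checkpoint data among ranks on
              restart.<br>

```python
# Figures quoted in the message above.
NODES = 64
RANKS_PER_NODE = 16
RAM_PER_NODE_GIB = 16            # assumption: "16 GB" means 16 GiB
CHECKPOINT_BYTES = 15424362124

GIB = 1024 ** 3
MIB = 1024 ** 2

total_ranks = NODES * RANKS_PER_NODE
ram_per_rank_gib = RAM_PER_NODE_GIB / RANKS_PER_NODE
checkpoint_gib = CHECKPOINT_BYTES / GIB

# Size of each rank's share if the file were split evenly across ranks.
even_share_mib = CHECKPOINT_BYTES / total_ranks / MIB

# How many times larger the whole file is than one rank's memory.
file_to_rank_ratio = checkpoint_gib / ram_per_rank_gib

print(f"ranks: {total_ranks}")
print(f"RAM per rank: {ram_per_rank_gib:.2f} GiB")
print(f"checkpoint: {checkpoint_gib:.2f} GiB")
print(f"even share per rank: {even_share_mib:.2f} MiB")
print(f"file size / RAM per rank: {file_to_rank_ratio:.1f}")
```

              The whole file is many times larger than the memory of
              one rank, while an even share per rank is small, so how
              the data is read and distributed matters on this
              machine.<br>
              <br>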
The code is compiled with XLF compilers, i.e., mpixlf90
(not the mpixlf90_r, of which I do not know the use) and
mpixlc(xx). There were several ".f" files that would not
compile, so I solved it by compiling them separately with
mpixlf77. These files were "fftsg2d.f", "fftsg3d.f", and
"umap.F". I don't know how critical that is, but the
compilation is successful.<br>
<br>
Compiling the code in debug mode, i.e., with "<b>-g
-qfullpath -O0 -qcheck</b>" instead of the normal "<b>-O3</b>
-qintsize=4 -qrealsize=8 -c -qxlf90=autodealloc
-qsuffix=cpp=F -qtune=auto -qstrict -qarch=auto -qextname
-qzerosize" showed some warnings:<br>
"io_writeData.F90", line 242.10: 1511-013 (W) The value of
the DO-loop increment should be negative when initial
value is greater than the terminal value.<br>
"io_writeData.F90", line 308.24: 1511-013 (W) The value of
the DO-loop increment should be negative when initial
value is greater than the terminal value.<br>
"io_writeData.F90", line 316.21: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 469.22: 1511-013 (W) The value of
the DO-loop increment should be negative when initial
value is greater than the terminal value.<br>
"io_writeData.F90", line 480.29: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 481.29: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 482.29: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 483.29: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 491.34: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 493.37: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 494.37: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 506.29: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 578.24: 1511-013 (W) The value of
the DO-loop increment should be negative when initial
value is greater than the terminal value.<br>
"io_writeData.F90", line 744.10: 1511-013 (W) The value of
the DO-loop increment should be negative when initial
value is greater than the terminal value.<br>
** io_writedata === End of Compilation 1 ===<br>
1501-511 Compilation failed for file io_writeData.F90.<br>
make: *** [io_writeData.o] Error 1<br>
<br>
              The same errors also appear in "IO_init.F90":<br>
"IO_init.F90", line 157.13: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is
greater than the terminal value.<br>
"IO_init.F90", line 182.13: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is
greater than the terminal value.<br>
"IO_init.F90", line 249.11: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is
greater than the terminal value.<br>
"IO_init.F90", line 250.13: 1516-152 (S) Zero-sized arrays
must not be subscripted.<br>
"IO_init.F90", line 251.13: 1516-152 (S) Zero-sized arrays
must not be subscripted.<br>
"IO_init.F90", line 252.13: 1516-152 (S) Zero-sized arrays
must not be subscripted.<br>
"IO_init.F90", line 278.8: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is
greater than the terminal value.<br>
"IO_init.F90", line 279.37: 1516-152 (S) Zero-sized arrays
must not be subscripted.<br>
** io_init === End of Compilation 1 ===<br>
1501-511 Compilation failed for file IO_init.F90.<br>
<br>
              There were no errors without the -qcheck option, so I
              continued without it. The same crash happens with these
              compiler options, without any extra log information.
              These error/warning lines might be related to
              SCRATCH_GRID_VARS_* having end=0 and begin=1 values. I
              do not make use of the scratch array, so I assumed this
              is not critical.<br>
<br>
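              The begin=1, end=0 case mentioned above can be checked
              against the standard Fortran rule for DO-loop iteration
              counts, MAX(INT((end - begin + step) / step), 0). The
              sketch below applies that rule in Python; SCRATCH_BEGIN
              and SCRATCH_END are hypothetical stand-ins for the
              SCRATCH_GRID_VARS_* values, not names taken from the
              Flash source.<br>

```python
def trip_count(begin, end, step=1):
    """Iteration count of a Fortran loop 'do i = begin, end, step'."""
    if step == 0:
        raise ValueError("Fortran does not allow a zero DO increment")
    # int() truncates toward zero, like Fortran's INT on this quotient.
    return max(int((end - begin + step) / step), 0)


# Hypothetical stand-ins for the bounds described in the message.
SCRATCH_BEGIN = 1
SCRATCH_END = 0

# Default increment of 1, as in a plain 'do i = begin, end'.
empty_loop = trip_count(SCRATCH_BEGIN, SCRATCH_END)

# An ordinary loop, for comparison.
normal_loop = trip_count(1, 5)

# The negative increment that warning 1511-013 hints at.
reversed_loop = trip_count(SCRATCH_BEGIN, SCRATCH_END, -1)

print(empty_loop, normal_loop, reversed_loop)
```

              With begin=1, end=0 and the default increment the count
              is zero, so such a loop body never executes. This says
              nothing about whether the crash on restart is related to
              these lines.<br>
              <br>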
Any help is appreciated.<br>
Best, <br>
Seyit<br>
<br>
<br>
<br>
<br>
</div>
            <span>&lt;core.602&gt;</span><span>&lt;seyit.vcf&gt;</span></blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>