<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Seyit,<div><br></div><div>As long as your HDF5 library has been compiled with parallel support, you don’t need any extra libraries. In my experience on BG systems (both P and Q), parallel IO is absolutely necessary. Keep in mind also that BG systems have low processor clock rates, so your simulations might seem “slow” relative to other clusters, but the trade-off is incredibly fast communication on BG. In other words, use more cores if your simulations are too slow! I find that FLASH scales extremely well on BG, in both the strong and the weak sense.</div><div><br></div><div>Sean</div><div><br></div><div><br><div apple-content-edited="true">
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><span class="Apple-style-span" style="color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; border-collapse: separate; border-spacing: 0px; -webkit-text-decorations-in-effect: none;"><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-stroke-width: 0px;"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div><br class="Apple-interchange-newline">--------------------------------------------------------</div><div>Sean M. Couch</div><div>Hubble Fellow</div><div>Flash Center for Computational Science</div><div>Department of Astronomy & Astrophysics</div><div>The University of Chicago</div><div>5747 S Ellis Ave, Jo 315</div><div>Chicago, IL 60637</div><div>(773) 702-3899</div><div><a href="http://www.flash.uchicago.edu/~smc">www.flash.uchicago.edu/~smc</a></div><div><br></div></div></span></span></div><br class="Apple-interchange-newline"><br class="Apple-interchange-newline">
</div>
<br><div><div>On Jan 15, 2014, at 8:12 AM, Seyit Hocuk &lt;<a href="mailto:seyit@astro.rug.nl">seyit@astro.rug.nl</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi Sean,<br>
<br>
Thanks for your response. <br>
No, I am not using parallel IO. Would simply adding this flag
help? Are any extra libraries needed? If it helps, I will
certainly try it.<br>
<br>
Expanded Command line:<br>
--with-library=mpi --with-unit=IO --unit=Grid
--gridinterpolation=monotonic SH-dust --auto --portable --3d
--maxblocks=200 --objdir=ss-dust-3<br>
<br>
I use serial HDF5, by the way. In my experience I do not
lose much time writing a checkpoint file, so I did not think
parallel HDF5 was necessary. However, I am quite disappointed in
the overall simulation speed. Could parallel IO also boost the
simulation speed? A lot of information is passed between the (1024)
processors for the regular calculations, and I think most of the
time is lost there. <br>
<br>
Kind regards,<br>
Seyit<br>
<br>
<br>
<br>
On 01/15/2014 02:53 PM, Sean Couch wrote:<br>
</div>
<blockquote cite="mid:66BFA885-0689-456E-8C8B-2965164DFE04@flash.uchicago.edu" type="cite">
Hi Seyit,
<div><br>
</div>
<div>Are you using parallel IO? What is your setup line? You
might try adding, separately, ‘+parallelIO’ and ‘+hdf5typeIO’ to
your setup line and trying again.<br>
<div><br>
</div>
<div apple-content-edited="true">
<div style="font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><span class="Apple-style-span" style="font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; border-collapse: separate; border-spacing: 0px; -webkit-text-decorations-in-effect: none;">
<div style="word-wrap: break-word; -webkit-nbsp-mode:
space; -webkit-line-break: after-white-space;">
<div>Sean</div>
</div>
</span></div>
<br class="Apple-interchange-newline">
<br class="Apple-interchange-newline">
</div>
<br>
<div>
<div>On Jan 15, 2014, at 4:58 AM, Seyit Hocuk &lt;<a moz-do-not-send="true" href="mailto:seyit@astro.rug.nl">seyit@astro.rug.nl</a>&gt;
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000"> Dear all,<br>
<br>
I have a problem restarting a simulation and hope that you can help me.<br>
<br>
I am, for the first time, running FLASH on a BlueGene
supercomputer with 1024 cores, and I encountered a problem
when restarting. My FLASH version is 4-beta. The
simulation ran fine and created 2 checkpoint files. I
wanted to restart from the second checkpoint file, which
has a size of 15 GB (15424362124 bytes), and at the
moment the file is read, the simulation just stops. <br>
<br>
The last lines of the FLASH log file are the following:<br>
[ 01-11-2014 19:24:23.616 ] message: vsize (MB):
202.06 (min) 202.12 (max) 202.06 (avg)<br>
[ 01-11-2014 19:24:23.619 ] message: rss (MB):
1.67 (min) 1.67 (max) 1.67 (avg)<br>
[ 01-11-2014 19:24:23.628 ] [io_readData] file opened:
type=checkpoint name=xSHx_hdf5_chk_0002<br>
<br>
I get a lot of core dump files, one per core (see
attachment), and the following error output on the terminal:<br>
NumPartProps: 18<br>
NumPartProps: 18<br>
2014-01-11 20:02:48.793 (WARN ) [0x400011e91e0]
:749776:ibm.runjob.client.Job: terminated by signal 5<br>
2014-01-11 20:02:48.793 (WARN ) [0x400011e91e0]
:749776:ibm.runjob.client.Job: abnormal termination by
signal 5 from rank 208<br>
<br>
The supercomputer is the Italian CINECA system and
has 16-core nodes with 16 GB of RAM per node. My BGsize is
64 nodes with 16 ranks per node, meaning that I have 1 GB
per core and 1024 cores in total with 1 TB of RAM. <br>
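<br>
As a quick sanity check of these numbers, here is a minimal Python sketch (not part of FLASH; the variable names are only illustrative). It also estimates the checkpoint size per rank, assuming the file were spread evenly over all ranks, which may not match how the data are actually distributed:<br>
<pre>
```python
# Figures quoted in this message; the names are illustrative only.
NODES = 64
RANKS_PER_NODE = 16
RAM_PER_NODE_GB = 16
CHECKPOINT_BYTES = 15424362124

total_ranks = NODES * RANKS_PER_NODE
ram_per_rank_gb = RAM_PER_NODE_GB / RANKS_PER_NODE
total_ram_gb = NODES * RAM_PER_NODE_GB

# Checkpoint size in binary units (1 GiB = 2**30 bytes, 1 MiB = 2**20 bytes).
checkpoint_gib = CHECKPOINT_BYTES / 2**30
# Assumption: an even split of the file over all ranks.
checkpoint_per_rank_mib = CHECKPOINT_BYTES / total_ranks / 2**20

print(f"ranks: {total_ranks}")
print(f"RAM per rank: {ram_per_rank_gb:.2f} GB, total RAM: {total_ram_gb} GB")
print(f"checkpoint: {checkpoint_gib:.2f} GiB, "
      f"about {checkpoint_per_rank_mib:.2f} MiB per rank if evenly split")
```
</pre>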
<br>
The code is compiled with the XLF compilers, i.e., mpixlf90
(not mpixlf90_r, whose purpose I do not know) and
mpixlc(xx). Several ".f" files would not
compile, which I worked around by compiling them separately with
mpixlf77. These files were "fftsg2d.f", "fftsg3d.f", and
"umap.F". I don't know how critical that is, but the
compilation is successful.<br>
<br>
Compiling the code in debug mode, i.e., with "<b>-g
-qfullpath -O0 -qcheck</b>" instead of the normal "<b>-O3</b>
-qintsize=4 -qrealsize=8 -c -qxlf90=autodealloc
-qsuffix=cpp=F -qtune=auto -qstrict -qarch=auto -qextname
-qzerosize", produced warnings and severe errors:<br>
"io_writeData.F90", line 242.10: 1511-013 (W) The value of
the DO-loop increment should be negative when initial
value is greater than the terminal value.<br>
"io_writeData.F90", line 308.24: 1511-013 (W) The value of
the DO-loop increment should be negative when initial
value is greater than the terminal value.<br>
"io_writeData.F90", line 316.21: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 469.22: 1511-013 (W) The value of
the DO-loop increment should be negative when initial
value is greater than the terminal value.<br>
"io_writeData.F90", line 480.29: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 481.29: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 482.29: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 483.29: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 491.34: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 493.37: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 494.37: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 506.29: 1516-152 (S) Zero-sized
arrays must not be subscripted.<br>
"io_writeData.F90", line 578.24: 1511-013 (W) The value of
the DO-loop increment should be negative when initial
value is greater than the terminal value.<br>
"io_writeData.F90", line 744.10: 1511-013 (W) The value of
the DO-loop increment should be negative when initial
value is greater than the terminal value.<br>
** io_writedata === End of Compilation 1 ===<br>
1501-511 Compilation failed for file io_writeData.F90.<br>
make: *** [io_writeData.o] Error 1<br>
<br>
The same errors also appear in "IO_init.F90":<br>
"IO_init.F90", line 157.13: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is
greater than the terminal value.<br>
"IO_init.F90", line 182.13: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is
greater than the terminal value.<br>
"IO_init.F90", line 249.11: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is
greater than the terminal value.<br>
"IO_init.F90", line 250.13: 1516-152 (S) Zero-sized arrays
must not be subscripted.<br>
"IO_init.F90", line 251.13: 1516-152 (S) Zero-sized arrays
must not be subscripted.<br>
"IO_init.F90", line 252.13: 1516-152 (S) Zero-sized arrays
must not be subscripted.<br>
"IO_init.F90", line 278.8: 1511-013 (W) The value of the
DO-loop increment should be negative when initial value is
greater than the terminal value.<br>
"IO_init.F90", line 279.37: 1516-152 (S) Zero-sized arrays
must not be subscripted.<br>
** io_init === End of Compilation 1 ===<br>
1501-511 Compilation failed for file IO_init.F90.<br>
<br>
There were no errors without the -qcheck option, so I
continued without it. The same crash happens with
these compiler options, without any extra log information.
The error/warning lines might be related to
SCRATCH_GRID_VARS_* having end=0 and begin=1 values. I do
not make use of the scratch array, so I assumed this is
not critical.<br>
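<br>
To illustrate why I assume this: in Fortran, the number of iterations of a loop "do i = begin, end, step" is max(int((end - begin + step) / step), 0), so a loop running from begin=1 to end=0 has zero iterations and its body is never executed at run time. The following is only a Python sketch of that rule (the function name is hypothetical, and the real FLASH code is Fortran):<br>
<pre>
```python
def fortran_do_indices(begin, end, step=1):
    """Return the indices a Fortran loop 'do i = begin, end, step' would visit."""
    if step == 0:
        raise ValueError("step must be nonzero")
    # Fortran trip count; int() truncates toward zero, like Fortran's INT.
    trip_count = max(int((end - begin + step) / step), 0)
    return [begin + k * step for k in range(trip_count)]


# Values reported for SCRATCH_GRID_VARS_*: begin=1, end=0.
scratch_begin, scratch_end = 1, 0
print(fortran_do_indices(scratch_begin, scratch_end))  # zero-trip loop

# For comparison, an ordinary ascending loop and a descending one.
print(fortran_do_indices(1, 3))
print(fortran_do_indices(3, 1, -1))
```
</pre>
This only covers the run-time behaviour of the loop; it does not explain why -qcheck rejects the subscripted zero-sized arrays at compile time.<br>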
<br>
Any help is appreciated.<br>
Best, <br>
Seyit<br>
<br>
<br>
<br>
<br>
</div>
<span>&lt;core.602&gt;</span><span>&lt;seyit.vcf&gt;</span></blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
<span>&lt;seyit.vcf&gt;</span></blockquote></div><br></div></body></html>