[FLASH-USERS] Issue with Cray Compiler
Alan Calder
alan.calder at stonybrook.edu
Thu Nov 12 21:14:04 EST 2020
Hi Klaus,
Thanks for the reply! And hope all is well with you!
On Thu, Nov 12, 2020 at 6:13 PM Klaus Weide <klaus at flash.uchicago.edu>
wrote:
> On Wed, 11 Nov 2020, Alan Calder wrote:
>
> > First I should note that things work well with the Gnu compilers. We do not
> > see this problem when compiling with the gnu compilers.
>
> That's good.
>
> Does this also mean that you don't get any "WARNING after gc filling"
> messages with GNU compilers, in the situation where you did get them with
> the Cray compiler?
>
Yep. I ran with all the debugging options of the GNU compiler and saw
nothing unusual. We also compiled and ran with the Nvidia compiler without
any issues.
>
> > The shortest possible summary is that the code seems to enter an unphysical
> > state and crash,
>
> I don't have an idea what's going on here. Just want to point you to a
> couple of runtime parameters that you may find useful in this kind of
> situation:
>
> D dr_dtMinContinue Minimum computed timestep to continue the simulation
> PARAMETER dr_dtMinContinue REAL 0.0 [0.0 ...]
>
> D dr_dtMinBelowAction Action to take when computed new timestep is below dr_dtMinContinue.
> D & Use 0 for none (abort immediately), 1 for "write checkpoint then abort"
> PARAMETER dr_dtMinBelowAction INTEGER 1 [0,1]
>
>
>
I'll try these.
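As a sketch of how these two parameters might be set in a flash.par file: the threshold value below is purely illustrative (an assumption, not a value suggested in this thread), and the parameter names and meanings are the ones quoted above.

```
# flash.par fragment; the threshold is a hypothetical example value
# Stop the run once the computed timestep falls below this (default 0.0)
dr_dtMinContinue = 1.0e-12
# 1 = write a checkpoint, then abort; 0 = abort immediately
dr_dtMinBelowAction = 1
```

With dr_dtMinBelowAction = 1, the checkpoint written just before the abort can be inspected to see which cells drove the timestep down.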
> > and with the split hydro solver I see a warning
> >
> > WARNING after gc filling: min. unk(EINT_VAR)=9.9999999735241242E-11
> > PE=4 block=6
> > type=1
> >
>
> These indicate that you have very small values of internal energy in
> some cells. Not sure now whether that's kind of a normal thing for Sedov
> close to the origin... Maybe lower "smallE" to eliminate the warning?
> Since this happens at the very beginning, maybe you can plot the initial
> condition to figure out what's going on.
>
> Not sure why this would not apply equally to "split" and "unsplit" Hydro.
>
>
> > We compile with -c -g -G 2 -s real64 -s integer32 and the code generates a
> > few warnings, none of which seem relevant.
>
> I agree that the compilation warnings don't appear to be of immediate
> concern.
>
> Klaus
>
It may well be an issue with the Cray compiler. We were experimenting with
valgrind and a few other tools and saw messages suggesting that something
might be missing for the ARM architecture.
[acalder at fj-debug1 object]$ mpirun -n 1 valgrind ./flash4
==20316== Memcheck, a memory error detector
==20316== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==20316== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==20316== Command: ./flash4
==20316==
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD5380000
disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
==20316== valgrind: Unrecognised instruction at address 0x58d09b8.
==20316== at 0x58D09B8: __cray_cpu_detect_arm (in /lustre/software/CPE/cray/pe/cce-sve/10.0.1/cce/aarch64/lib/libu.so.1.0)
==20316== by 0x58D1A9F: memcpy (in /lustre/software/CPE/cray/pe/cce-sve/10.0.1/cce/aarch64/lib/libu.so.1.0)
==20316== Your program just tried to execute an instruction that Valgrind
==20316== did not recognise. There are two possible reasons for this.
==20316== 1. Your program has a bug and erroneously jumped to a non-code
==20316== location. If you are running Memcheck and you just saw a
==20316== warning about a bad jump, it's probably your program's fault.
==20316== 2. The instruction is legitimate but Valgrind doesn't handle it,
==20316== i.e. it's Valgrind's fault. If you think this is the case or
==20316== you are not sure, please let us know and we'll try to fix it.
==20316== Either way, Valgrind will now raise a SIGILL signal which will
==20316== probably kill your program.
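For reference, the instruction word in the output above can be decoded by hand. The following is a minimal sketch, assuming the standard A64 encoding of the MRS/MSR (register) instructions. The function name and the one-entry register table are illustrative and not part of any tool. Under that encoding, 0xD5380000 reads the MIDR_EL1 identification register into X0, which would be consistent with the `__cray_cpu_detect_arm` frame and with the second explanation valgrind offers (a legitimate instruction that valgrind does not handle).

```python
# Decode an AArch64 system-register move (MRS/MSR register form).
# Assumed field layout (A64 encoding):
#   bits 31-22 = 0b1101010100, bit 21 = L (1 = MRS, 0 = MSR), bit 20 = 1,
#   bit 19 = o0 (op0 = 2 + o0), bits 18-16 = op1, bits 15-12 = CRn,
#   bits 11-8 = CRm, bits 7-5 = op2, bits 4-0 = Rt.

# Illustrative table: only the register relevant here is listed.
KNOWN_SYSREGS = {
    (3, 0, 0, 0, 0): "MIDR_EL1",  # Main ID Register (CPU identification)
}

def describe_sysreg_move(word):
    """Return a readable form of an MRS/MSR instruction word, or None."""
    if (word >> 22) != 0b1101010100 or not (word >> 20) & 1:
        return None  # not an MRS/MSR (register) encoding
    is_read = (word >> 21) & 1
    op0 = 2 + ((word >> 19) & 1)
    op1 = (word >> 16) & 0x7
    crn = (word >> 12) & 0xF
    crm = (word >> 8) & 0xF
    op2 = (word >> 5) & 0x7
    rt = word & 0x1F
    # Fall back to the generic S<op0>_<op1>_C<n>_C<m>_<op2> name.
    name = KNOWN_SYSREGS.get(
        (op0, op1, crn, crm, op2),
        f"S{op0}_{op1}_C{crn}_C{crm}_{op2}",
    )
    if is_read:
        return f"MRS X{rt}, {name}"
    return f"MSR {name}, X{rt}"

print(describe_sysreg_move(0xD5380000))  # MRS X0, MIDR_EL1
```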
I'll keep digging and report anything I find.
Thanks!
Alan
--
Alan C. Calder
Department of Physics and Astronomy
State University of New York at Stony Brook
Stony Brook, NY 11794-3800
office: ESS 438
phone: (631) 632-1176
fax: (631) 632-1745
web: http://www.astro.sunysb.edu/acalder