[FLASH-USERS] Issue with Cray Compiler

Alan Calder alan.calder at stonybrook.edu
Thu Nov 12 21:14:04 EST 2020


Hi Klaus,

Thanks for the reply! I hope all is well with you!

On Thu, Nov 12, 2020 at 6:13 PM Klaus Weide <klaus at flash.uchicago.edu>
wrote:

> On Wed, 11 Nov 2020, Alan Calder wrote:
>
> > First I should note that things work well with the Gnu compilers. We
> > do not see this problem when compiling with the gnu compilers.
>
> That's good.
>
> Does this also mean that you don't get any "WARNING after gc filling"
> messages with GNU compilers, in the situation where you did get them with
> the Cray compiler?
>

Yep. I ran with all the debugging options of the GNU compiler and saw
nothing unusual. We also compiled and ran with the Nvidia compiler without
any issues.


>
> > The shortest possible summary is that the code seems to enter an
> > unphysical state and crash,
>
> I don't have an idea what's going on here. Just want to point you to a
> couple of runtime parameters that you may find useful in this kind of
> situation:
>
>    D dr_dtMinContinue  Minimum computed timestep to continue the simulation
>    PARAMETER dr_dtMinContinue      REAL    0.0     [0.0 ...]
>
>    D dr_dtMinBelowAction  Action to take when computed new timestep is below dr_dtMinContinue.
>    D & Use 0 for none (abort immediately), 1 for "write checkpoint then abort"
>    PARAMETER dr_dtMinBelowAction   INTEGER 1       [0,1]
>
>
>
I'll try these.



> > and with the split hydro solver I see a warning
> >
> > WARNING after gc filling: min. unk(EINT_VAR)=9.9999999735241242E-11
> >   PE=4     block=6
> >                     type=1
> >
>
> These indicate that you have very small values of internal energy in
> some cells. Not sure now whether that's kind of a normal thing for Sedov
> close to the origin...  Maybe lower "smallE" to eliminate the warning?
> Since this happens at the very beginning, maybe you can plot the initial
> condition to figure out what's going on.
>
> Not sure why this would not apply equally to "split" and "unsplit" Hydro.
>
>
> > We compile with -c -g -G 2 -s real64 -s integer32 and the code
> > generates a few warnings, none of which seem relevant.
>
> I agree that the compilation warnings don't appear to be of immediate
> concern.
>
> Klaus
>

It may well be an issue with the Cray compiler. We were experimenting with
Valgrind and a few other tools and saw messages suggesting that something
may be missing for the ARM architecture.

[acalder at fj-debug1 object]$ mpirun -n 1 valgrind ./flash4
==20316== Memcheck, a memory error detector
==20316== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==20316== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==20316== Command: ./flash4
==20316==
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD5380000
disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
==20316== valgrind: Unrecognised instruction at address 0x58d09b8.
==20316==    at 0x58D09B8: __cray_cpu_detect_arm (in /lustre/software/CPE/cray/pe/cce-sve/10.0.1/cce/aarch64/lib/libu.so.1.0)
==20316==    by 0x58D1A9F: memcpy (in /lustre/software/CPE/cray/pe/cce-sve/10.0.1/cce/aarch64/lib/libu.so.1.0)
==20316== Your program just tried to execute an instruction that Valgrind
==20316== did not recognise.  There are two possible reasons for this.
==20316== 1. Your program has a bug and erroneously jumped to a non-code
==20316==    location.  If you are running Memcheck and you just saw a
==20316==    warning about a bad jump, it's probably your program's fault.
==20316== 2. The instruction is legitimate but Valgrind doesn't handle it,
==20316==    i.e. it's Valgrind's fault.  If you think this is the case or
==20316==    you are not sure, please let us know and we'll try to fix it.
==20316== Either way, Valgrind will now raise a SIGILL signal which will
==20316== probably kill your program.
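
For what it's worth, the unhandled word in the output above can be decoded
by hand. The sketch below is only an illustration and is not part of the
Cray or Valgrind tooling: the function names are hypothetical, and the name
table holds just the one register needed here. It splits the word into
fields following the A64 layout for MRS/MSR system-register moves.

```python
# Decode an AArch64 system-register move (MRS/MSR, register form).
# Helper names are hypothetical; only the field layout comes from the
# A64 instruction encoding.

# (op0, op1, CRn, CRm, op2) -> architectural register name.
# Deliberately tiny: just the entry needed for this example.
SYSREG_NAMES = {
    (3, 0, 0, 0, 0): "MIDR_EL1",  # Main ID Register (CPU identification)
}


def decode_sysreg_move(word):
    """Split a 32-bit instruction word into MRS/MSR fields."""
    # Bits 31..22 are fixed at 0b1101010100 and bit 20 must be set
    # for the register form of MRS/MSR.
    if (word >> 22) != 0b1101010100 or not (word >> 20) & 1:
        raise ValueError("not an MRS/MSR (register) encoding")
    return {
        "mnemonic": "MRS" if (word >> 21) & 1 else "MSR",  # L bit
        "op0": 2 + ((word >> 19) & 1),
        "op1": (word >> 16) & 0b111,
        "CRn": (word >> 12) & 0b1111,
        "CRm": (word >> 8) & 0b1111,
        "op2": (word >> 5) & 0b111,
        "Rt": word & 0b11111,  # note: 31 would mean XZR, not X31
    }


def describe(word):
    """Render the decoded word as assembly-like text."""
    f = decode_sysreg_move(word)
    key = (f["op0"], f["op1"], f["CRn"], f["CRm"], f["op2"])
    # Fall back to the generic S<op0>_<op1>_C<n>_C<m>_<op2> spelling.
    reg = SYSREG_NAMES.get(key, "S%d_%d_C%d_C%d_%d" % key)
    if f["mnemonic"] == "MRS":
        return "MRS X%d, %s" % (f["Rt"], reg)
    return "MSR %s, X%d" % (reg, f["Rt"])


print(describe(0xD5380000))  # MRS X0, MIDR_EL1
```

Decoded this way, 0xD5380000 is a read of the CPU identification register.
That would fit a routine named __cray_cpu_detect_arm and the second of
Valgrind's two explanations, though this reading is an assumption and has
not been confirmed.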


I'll keep digging and report anything I find.

Thanks!

Alan
-- 
Alan C. Calder
Department of Physics and Astronomy
State University of New York at Stony Brook
Stony Brook, NY 11794-3800

office: ESS 438
phone:  (631) 632-1176
fax:  (631) 632-1745
web: http://www.astro.sunysb.edu/acalder