[FLASH-USERS] Code crash when moving to an AMD based machine
Haakon Andresen
haakon.andresen at astro.su.se
Tue Apr 30 04:36:31 EDT 2024
Hi,
I can, with one caveat: I am using a modified version of FLASH4. I suspect the error is coming from a part of the code that
has not been modified, but I am not 100% certain at this point.
I am using the Intel compilers on the machine where it works, and Cray (also tested GNU) on the machine where the code crashes.
After talking to a colleague, I have learned that the code compiled with Intel compilers runs on the AMD EPYC 7763 CPUs.
However, the Intel compilers are not available on the machine I am using.
Boundary conditions are "reflect" at the inner boundary and "user" at the outer boundary
(the user condition essentially specifies an outflow/inflow with some modifications for density/gpot/radiation...
/Simulation/SimulationMain/CoreCollapse/Grid_bcApplyToRegionSpecialized.F90<https://github.com/snaphu-msu/BANG/blob/master/source/Simulation/SimulationMain/CoreCollapse/Grid_bcApplyToRegionSpecialized.F90> if it exists in the standard FLASH version).
I have traced the issue back to negative density values in some guard cells near a refinement boundary, but far away from the outer grid boundary and
not close to the inner grid boundary.
As for error messages, it is
"Error message is [EOS] rho < rhomin"
when using Cray and
"
Newton-Raphson failed in subroutine eos_helmholtz
(e and rho as input):
too many iterations 50
temp = NaN
dens = -2.2471185595023190E+307
pres = NaN
"
when using GNU. I am not sure these are informative for you since they could be specific to our version of
FLASH. The root cause is the negative densities that somehow appear in the guard-cell fill.
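A minimal sketch of the kind of check that can localize such values after a guard-cell fill, written in Python for brevity rather than in FLASH's Fortran. The flat array layout, the guard-cell count `nguard`, and the function name `find_bad_density` are hypothetical and not taken from FLASH or PARAMESH:

```python
import math

def find_bad_density(dens, nguard):
    """Return (index, value, region) for every non-finite or non-positive density.

    dens   : 1D sequence of cell densities for one block, guard cells included
    nguard : number of guard cells on each side (hypothetical layout)
    """
    n = len(dens)
    bad = []
    for i, rho in enumerate(dens):
        # Catches NaN, +/-Inf, zero, and negative values alike.
        if not math.isfinite(rho) or rho <= 0.0:
            region = "guard" if (i < nguard or i >= n - nguard) else "interior"
            bad.append((i, rho, region))
    return bad

# Illustrative block: 4 guard cells per side, one corrupted guard cell
# carrying the density value reported in the GNU error message.
block = [1.0e9] * 16
block[14] = -2.2471185595023190e307
print(find_bad_density(block, nguard=4))
```

Running a check like this immediately after the fill, rather than waiting for the EOS call to fail, would show whether the bad values are confined to guard cells and which side of the block they sit on.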
Thanks for your help.
Best,
Haakon
________________________________
From: Reyes, Adam <adam.reyes at rochester.edu>
Sent: 30 April 2024 10:04:20
To: Haakon Andresen
Cc: flash-users at flash.rochester.edu
Subject: Re: [FLASH-USERS] Code crash when moving to an AMD based machine
Hi Haakon,
Could you share a bit more context about what you’re observing, maybe the exact error from FLASH and the boundary conditions that you’re using? Are you using the same compiler between the two machines?
*********************************************
Adam Reyes
Code Group Leader, Flash Center for Computational Science
Research Scientist, Dept. of Physics and Astronomy
University of Rochester
River Campus: Bausch and Lomb Hall, 369
500 Wilson Blvd. PO Box 270171, Rochester, NY 14627
Email adam.reyes at rochester.edu
Web https://flash.rochester.edu
(he / him / his)
*********************************************
On Apr 29, 2024, at 4:02 PM, Haakon Andresen <haakon.andresen at astro.su.se> wrote:
Dear Flash users,
I am currently testing FLASH on a machine with AMD CPUs, specifically AMD EPYC 7763. Previously, I have only used the code on Intel-based architectures. I have encountered a bug, which I believe is related to guard-cell fills, but I have not found the root cause yet.
I am doing core-collapse simulations with PARAMESH. The code initializes and writes the first checkpoint file, but then crashes in the hydro solver. The crash occurs during a call to the subroutine that is responsible for filling guard cells. My debugging has led me all the way into the PARAMESH routines. The error, guard cells being filled with bad values, happens at a refinement boundary. The test is done in 1D.
The puzzling part is that the code, with the exact same setup, runs just fine on a different machine (an Intel machine). I am wondering if anyone has seen similar behavior in the past. I am not sure that the root cause is in PARAMESH, but if anyone has experienced anything similar I would love to hear about it; maybe it will help me identify the issue.
Best,
Haakon Andresen
_______________________________________________
flash-users mailing list
flash-users at flash.rochester.edu
For list info, including unsubscribe:
https://flash.rochester.edu/mailman/listinfo/flash-users