[FLASH-USERS] Segmentation Fault After Lowering # of Processors

Joshua Martin joshua.martin.1 at stonybrook.edu
Mon Nov 6 10:25:33 EST 2023


FLASH Users,

I'm currently running a strong-scaling study and am hitting a problem when
lowering the number of processors. My custom supernova problem in FLASH runs
fine on 16-96 cores (single node), but with 8 or fewer cores I get a signal 11
(segmentation fault) while loading the same checkpoint file I use for the
other runs:

--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 577551 on node dg036 exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------

A few quick notes:

1) I am allocating a total of 37,000 blocks and adjusting maxblocks
accordingly for the number of processors in each run.

2) FLASH will run on 1-8 cores if I use an earlier checkpoint file of ~15
GB. But it crashes when I try to use the ~30 GB checkpoint file from the
same problem.

3) The segfault happens when running the problem on 1-8 cores with the
GCC compiler on several architectures: Intel Skylake, AMD Milan, and
Intel Sapphire Rapids. However, the checkpoint file loads and the problem
runs fine with the AOCC compiler on AMD Milan, or on more than 8 cores with
GCC.
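
For reference, the maxblocks adjustment in note 1 can be sketched as below.
This is only an illustration, assuming the 37,000-block total is split evenly
across processes and rounded up; the helper name maxblocks_for and the
constant TOTAL_BLOCKS are hypothetical, and the values actually used in the
runs may differ.

```python
import math

# Total block budget quoted in note 1 (37,000 blocks across all processes).
TOTAL_BLOCKS = 37_000


def maxblocks_for(nprocs: int, total_blocks: int = TOTAL_BLOCKS) -> int:
    """Return the smallest per-process block limit that still covers total_blocks.

    Hypothetical helper: assumes an even split with rounding up.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return math.ceil(total_blocks / nprocs)


if __name__ == "__main__":
    # Core counts mentioned in the post: failing (1, 8) and working (16, 96).
    for nprocs in (1, 8, 16, 96):
        print(f"{nprocs:3d} procs -> maxblocks = {maxblocks_for(nprocs)}")
```

The per-process limit grows as the core count drops, so the low-core runs
are the ones with the largest per-process block arrays.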

Any ideas what is going on with this?

Thank you!!

Josh Martin.