[FLASH-USERS] Segmentation Fault After Lowering # of Processors

Reyes, Adam adam.reyes at rochester.edu
Tue Nov 7 11:23:29 EST 2023


Hi Josh,

Are you able to reproduce this issue on any of the provided test problems? 

You could try compiling with debug flags to see whether you can get a usable backtrace to narrow down where the problem is.
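
As a rough aid for comparing the failing and working runs, here is a minimal Python sketch of the per-rank numbers implied by the notes in the original post below. It rests on two assumptions that are not confirmed anywhere in this thread: that maxblocks is set to the 37,000 total blocks divided by the number of ranks and rounded up, and that the checkpoint data is split evenly across ranks. The function names are hypothetical and are not part of FLASH.

```python
import math

TOTAL_BLOCKS = 37_000  # total blocks allocated, as stated in the original post


def maxblocks_per_rank(nprocs, total_blocks=TOTAL_BLOCKS):
    """Smallest per-rank block limit that still holds total_blocks overall.

    Assumption: the total is divided evenly across ranks and rounded up.
    """
    return math.ceil(total_blocks / nprocs)


def checkpoint_share_gb(checkpoint_gb, nprocs):
    """Average checkpoint data per rank in GB, assuming an even split."""
    return checkpoint_gb / nprocs


if __name__ == "__main__":
    # Core counts chosen to span the failing (<= 8) and working (>= 16) runs.
    print(f"{'ranks':>5} {'maxblocks':>10} {'15 GB file':>11} {'30 GB file':>11}")
    for nprocs in (1, 2, 4, 8, 16, 32, 96):
        print(
            f"{nprocs:>5} {maxblocks_per_rank(nprocs):>10} "
            f"{checkpoint_share_gb(15, nprocs):>11.2f} "
            f"{checkpoint_share_gb(30, nprocs):>11.2f}"
        )
```

Setting these per-rank figures against the memory available to each process may help decide where to look first once a backtrace is available.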
*********************************************
Adam Reyes


Code Group Leader, Flash Center for Computational Science  
Research Scientist, Dept. of Physics and Astronomy
University of Rochester
River Campus: Bausch and Lomb Hall, 369  
500 Wilson Blvd. PO Box 270171, Rochester, NY 14627
Email adam.reyes at rochester.edu
Web https://flash.rochester.edu
 (he / him / his)


*********************************************



> On Nov 6, 2023, at 4:25 PM, Joshua Martin <joshua.martin.1 at stonybrook.edu> wrote:
> 
> FLASH Users,
> 
> I'm currently running a strong-scaling study and am hitting a problem when lowering the number of processors. My custom supernova problem in FLASH runs fine on 16-96 cores (single node), but with 8 or fewer cores I get a signal 11 (segmentation fault) while loading the same checkpoint file I use for the other runs:
> 
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 577551 on node dg036 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 
> A few quick notes:
> 
> 1) I am allocating a total of 37,000 blocks, and accordingly adjusting maxblocks based on the number of processors I'm using.
> 
> 2) FLASH runs on 1-8 cores if I use an earlier checkpoint file of ~15 GB, but it crashes when I use the ~30 GB checkpoint file from the same problem.
> 
> 3) The segfault occurs on 1-8 cores with the GCC compiler on several architectures: Intel Skylake, AMD Milan, and Intel Sapphire Rapids. However, FLASH loads the checkpoint file and runs fine with the AOCC compiler on AMD Milan, or when running on more than 8 cores.
> 
> Any ideas what is going on with this?
> 
> Thank you!!
> 
> Josh Martin.
