[FLASH-USERS] FLASH potential bug with output filenames on Lustre file system - io_h5file_interface.c

Reyes, Adam adam.reyes at rochester.edu
Tue Sep 5 15:09:32 EDT 2023


Hi Tim,

Thanks for the very detailed bug report! 

I would guess that the issue comes from the runtime parameter “output_directory”: it leads to an absolute path for the I/O output files that is too long for the allocated strings. If that’s the case, it should be pretty fixable.
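
A minimal C sketch of that kind of failure and one way to guard against it. This is not the actual code in io_h5file_interface.c; the helper name fortran_to_c_string and its arguments are hypothetical. It relies only on the fact that a Fortran character variable is fixed-length and blank-padded with no terminating null, so the C side has to trim and terminate the name itself. If the name fills the whole buffer, no byte is left for the '\0', and C reads whatever happens to follow in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of FLASH): copy a Fortran character
 * variable into a C buffer as a null-terminated string.
 *
 * fstr      : Fortran string data, blank-padded, NOT null-terminated
 * flen      : declared Fortran length, i.e. len(fstr)
 * cbuf      : destination C buffer
 * cbuf_size : size of cbuf in bytes, including room for '\0'
 *
 * Returns 0 on success, or -1 if the trimmed name plus its terminator
 * does not fit, instead of silently producing an unterminated string.
 */
int fortran_to_c_string(const char *fstr, size_t flen,
                        char *cbuf, size_t cbuf_size)
{
    size_t n = flen;

    assert(fstr != NULL && cbuf != NULL);

    /* Drop the trailing blanks that Fortran uses as padding. */
    while (n > 0 && fstr[n - 1] == ' ') {
        n--;
    }

    /* Need n characters plus one byte for the terminator. */
    if (n + 1 > cbuf_size) {
        if (cbuf_size > 0) {
            cbuf[0] = '\0';   /* leave a valid empty string behind */
        }
        return -1;
    }

    memcpy(cbuf, fstr, n);
    cbuf[n] = '\0';
    return 0;
}
```

With an 8-byte buffer, a name of "abc" followed by five blanks converts to "abc", while the 8-character name "abcdefgh" is rejected with -1 because no byte is left for the terminator. Returning an error in that case lets the caller abort with a clear message rather than pass a corrupted filename on to HDF5.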

As a workaround, you should be able to copy the flash4 executable, along with any par and data files (such as EoS/opacity tables), to the desired output directory and launch FLASH directly from there. This way you can leave out the “output_directory” runtime parameter and have FLASH write its output directly to the current working directory.

*********************************************
Adam Reyes


Code Group Leader, Flash Center for Computational Science  
Research Scientist, Dept. of Physics and Astronomy
University of Rochester
River Campus: Bausch and Lomb Hall, 369  
500 Wilson Blvd. PO Box 270171, Rochester, NY 14627
Email adam.reyes at rochester.edu
Web https://flash.rochester.edu
 (he / him / his)


*********************************************



> On Sep 5, 2023, at 7:20 PM, Timothy Mark Johnson <tmarkj at mit.edu> wrote:
> 
> Hi FLASH users and developers,
>  
> I seem to have found a bug in the code that saves output files in HDF5. I originally noticed this problem with my own simulation, but I was able to replicate the issue with the LaserSlab example simulation. I’m running the simulations on the MIT Engaging cluster, which has a high-performance Lustre file system. I get the errors specifically when I set the FLASH output_directory to a location on the Lustre file system while the simulation directory is on an NFS file system. The issue also arises only when I use more than one MPI process.
>  
> I’ve attached a copy of the error messages without my added debugging statements, as well as a copy with the debugging statements I added. What I’ve gleaned from the debugging statements is that if my filename in Fortran is too long, the filename in C gets corrupted and includes extra characters at the end, which cause the problem. This should be visible in the debugging output. I’ve also attached the modified io_h5file_interface.c and io_initFile.F90 files so you can see where I’ve added debugging statements. The function in question in io_h5file_interface.c is io_h5init_file. To get the code to crash, I’ve changed the basenm in the flash.par to be “gasjetexp_”, the same as when I discovered this issue.
>  
> I’ve also attached output from when I changed the basenm to be “test_”. The code works fine in this case, but you can see weird characters at the end of strings. I had to set ignoreForcedPlot to true to get it to work. With the additional characters of “forced” added to the filename, it still crashes. It might just be that my filename plus path is too long.
>  
> Note that the crash output doesn’t give a good “stack trace”. I managed to get a good one with my own simulation, which I’ve attached as “original_stacktrace.txt”. This is how I figured out which files to look at, as discussed above.
>  
> For additional details, I’m using FLASH4.6.2, gcc 6.3.0, openmpi 2.1.1, and HDF5 1.10.5. The gcc is provided by the cluster in a module file, but I’ve built openmpi and HDF5 from source. I can provide more details about how I configured and built these libraries if needed. Some information (not very comprehensive) about the Engaging cluster can be found here: https://engaging-web.mit.edu/eofe-wiki/.
>  
> For even more context, here is my FLASH setup command: ./setup -auto LaserSlab -2d +cylindrical -nxb=16 -nyb=16 +hdf5typeio species=cham,targ +mtmmmt +laser +uhd3t +mgd mgd_meshgroups=6 -parfile=example.par -without-unit=physics/sourceTerms/EnergyDeposition/EnergyDepositionMain/Laser/LaserIO +hdf5typeIO +parallelIO -objdir=test_laserslab
>  
> I’ve had to disable LaserIO due to some other problems I’ve had with parallel HDF5 output. I’ve also attached my flash.par file.
>  
> I hope what I’ve written here is enough to describe the problem. Hopefully other users using Lustre filesystems can be aware of this potential issue.
>  
> Best,
>  
> Tim Johnson
> PhD Candidate
> High Energy Density Physics Group
> MIT Plasma Science and Fusion Center
>  
> <flash.par><io_h5file_interface.c><io_initFile.F90><original_stacktrace.txt><output_no_debug.txt><output_with_debug.txt><output_working.txt>
> _______________________________________________
> flash-users mailing list
> flash-users at flash.rochester.edu
> 
> For list info, including unsubscribe:
> https://flash.rochester.edu/mailman/listinfo/flash-users
