[FLASH-USERS] FLASH potential bug with output filenames on Lustre file system - io_h5file_interface.c
Timothy Mark Johnson
tmarkj at mit.edu
Tue Sep 5 13:20:20 EDT 2023
Hi FLASH users and developers,
I seem to have found a bug in the code that writes HDF5 output files. I first noticed the problem in my own simulation, but I was able to reproduce it with the LaserSlab example simulation. I'm running the simulations on the MIT Engaging cluster, which has a high-performance Lustre file system. The errors appear specifically when the FLASH output_directory is on the Lustre file system while the simulation directory is on an NFS file system. The issue also only arises when I use more than one MPI process.
I've attached a copy of the error messages without my added debugging statements, and another copy with them. What I've gleaned from the debugging statements is that if the filename in Fortran is too long, the filename on the C side gets corrupted and includes extra characters at the end, which cause the problem. This should be visible in the debugging output. I've also attached the modified io_h5file_interface.c and io_initFile.F90 files so you can see where I added the debugging statements. The function in question in io_h5file_interface.c is io_h5init_file. To get the code to crash, I set basenm in flash.par to "gasjetexp_", the same value I was using when I discovered this issue.
I've also attached output from a run where I changed basenm to "test_". The code works in this case, but you can still see weird characters at the end of the strings. I had to set ignoreForcedPlot to true to get it to work; with the additional characters of "forced" added to the filename, it still crashes. It might just be that my filename plus path is too long.
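Trailing characters like the ones described above are a typical symptom of a fixed-length, blank-padded Fortran character variable being read as a C string without a NUL terminator. Whether that is what happens inside io_h5init_file is not confirmed here. The following is only a minimal sketch of a defensive conversion, and the helper name fortran_name_to_c and its parameters are hypothetical, not part of FLASH or HDF5:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not FLASH code): copy a fixed-length, blank-padded
 * Fortran character buffer into a NUL-terminated C string.
 * Returns 0 on success, -1 if the trimmed name does not fit in out. */
int fortran_name_to_c(const char *fstr, size_t flen, char *out, size_t outsz)
{
    assert(fstr != NULL && out != NULL);
    if (outsz == 0) {
        return -1;
    }

    /* Fortran pads with blanks and stores no terminator,
     * so trim the padding from the right. */
    while (flen > 0 && fstr[flen - 1] == ' ') {
        flen--;
    }

    /* Refuse to truncate: a silently shortened path is worse than an error. */
    if (flen >= outsz) {
        out[0] = '\0';
        return -1;
    }

    memcpy(out, fstr, flen);
    out[flen] = '\0'; /* explicit terminator, so no stray bytes follow the name */
    return 0;
}
```

With a helper like this, a path that exceeds the C-side buffer would be reported as an error instead of producing a filename with extra characters at the end.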
Note that the crash in this reproduction doesn't produce a good stack trace. I managed to get a good one from my own simulation, which I've attached as "original_stacktrace.txt". That trace is how I figured out to look at the files discussed earlier.
For additional details, I'm using FLASH4.6.2, gcc 6.3.0, openmpi 2.1.1, and HDF5 1.10.5. The gcc is provided by the cluster as a module, but I built openmpi and HDF5 from source. I can provide more details about how I configured and built these libraries if needed. Some (not very comprehensive) information about the Engaging cluster can be found here: https://engaging-web.mit.edu/eofe-wiki/.
For even more context, here is my FLASH setup command:
./setup -auto LaserSlab -2d +cylindrical -nxb=16 -nyb=16 +hdf5typeio species=cham,targ +mtmmmt +laser +uhd3t +mgd mgd_meshgroups=6 -parfile=example.par -without-unit=physics/sourceTerms/EnergyDeposition/EnergyDepositionMain/Laser/LaserIO +hdf5typeIO +parallelIO -objdir=test_laserslab
I've had to disable LaserIO due to some other problems I've had with parallel HDF5 output. I've also attached my flash.par file.
I hope what I've written here is enough to describe the problem, and that other users on Lustre file systems can be aware of this potential issue.
Best,
Tim Johnson
PhD Candidate
High Energy Density Physics Group
MIT Plasma Science and Fusion Center
-------------- next part --------------
A non-text attachment was scrubbed...
Name: flash.par
Type: application/octet-stream
Size: 7758 bytes
Desc: flash.par
URL: <http://flash.rochester.edu/pipermail/flash-users/attachments/20230905/e4b1a183/attachment.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: io_h5file_interface.c
URL: <http://flash.rochester.edu/pipermail/flash-users/attachments/20230905/e4b1a183/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: io_initFile.F90
Type: application/octet-stream
Size: 1714 bytes
Desc: io_initFile.F90
URL: <http://flash.rochester.edu/pipermail/flash-users/attachments/20230905/e4b1a183/attachment-0001.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: original_stacktrace.txt
URL: <http://flash.rochester.edu/pipermail/flash-users/attachments/20230905/e4b1a183/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: output_no_debug.txt
URL: <http://flash.rochester.edu/pipermail/flash-users/attachments/20230905/e4b1a183/attachment-0001.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: output_with_debug.txt
URL: <http://flash.rochester.edu/pipermail/flash-users/attachments/20230905/e4b1a183/attachment-0002.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: output_working.txt
URL: <http://flash.rochester.edu/pipermail/flash-users/attachments/20230905/e4b1a183/attachment-0003.txt>