[FLASH-USERS] FLASH initialisation hanging on Lustre filesystem
Bertini, Denis Dr.
D.Bertini at gsi.de
Wed Jun 19 04:07:28 EDT 2024
Hi Adam
Thanks for the quick answer!
I also believe it will be beneficial to switch to the one-process read plus broadcast approach in the future,
especially when using a distributed file system like Lustre.
Denis
________________________________
From: Reyes, Adam <adam.reyes at rochester.edu>
Sent: Wednesday, June 19, 2024 10:04:32 AM
To: Bertini, Denis Dr.
Cc: flash-users at flash.rochester.edu
Subject: Re: [FLASH-USERS] FLASH initialisation hanging on Lustre filesystem
Hi Denis,
That’s great to hear that you were able to figure it out!
The files get read two times in calls originating from “source/physics/Eos/EosMain/Tabulated/eos_initTabulated.F90”, with calls to “eos_tabBrowseTables” and “eos_tabReadTables”. The first just determines the size of the arrays to allocate for the tables, and the second actually fills them. If you follow the calls you will end up in the browse/read routines for each supported table type, “eos_tab[Read|Browse]Ionmix4Tables.F90” for the “.cn4” tables.
In principle these don’t need to be read by every single MPI process and could instead be read by one process and broadcast to the rest.
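
A minimal sketch of that read-once-and-broadcast pattern, written in Python with mpi4py rather than FLASH's Fortran. The function names are illustrative and not part of FLASH, and the file is treated as an opaque byte string instead of being parsed the way the Ionmix4 routines would parse it:

```python
"""Sketch: one rank reads a table file, all other ranks receive it by broadcast."""
from pathlib import Path


def read_once_and_broadcast(path, comm, root=0):
    """Return the file contents on every rank; only `root` touches the filesystem.

    `comm` is assumed to behave like an mpi4py communicator, i.e. to provide
    Get_rank() and bcast(obj, root=...).
    """
    data = None
    if comm.Get_rank() == root:
        data = Path(path).read_bytes()  # the only filesystem read in the whole job
    # Object broadcast: non-root ranks pass None and get root's bytes back.
    return comm.bcast(data, root=root)


def broadcast_with_mpi(path):
    """Hypothetical driver; requires mpi4py and an MPI launcher."""
    from mpi4py import MPI  # imported lazily so the helper above has no hard dependency

    comm = MPI.COMM_WORLD
    table = read_once_and_broadcast(path, comm)
    print(f"rank {comm.Get_rank()}: received {len(table)} bytes")
```

Calling `broadcast_with_mpi` from a small launcher script under the MPI launcher gives every rank the same bytes while the filesystem sees a single reader. In FLASH itself the equivalent change would presumably sit inside the browse/read routines named above, with the parsed arrays broadcast instead of raw bytes.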
*********************************************
Adam Reyes
Code Group Leader, Flash Center for Computational Science
Research Scientist, Dept. of Physics and Astronomy
University of Rochester
River Campus: Bausch and Lomb Hall, 369
500 Wilson Blvd. PO Box 270171, Rochester, NY 14627
Email adam.reyes at rochester.edu
Web https://flash.rochester.edu
(he / him / his)
*********************************************
On Jun 19, 2024, at 9:51 AM, Bertini, Denis Dr. <D.Bertini at gsi.de> wrote:
Dear FLASH developers,
I finally found out why FLASH simulations systematically hang on our Lustre filesystem when using AMD EPYC 7k compute nodes (128 physical cores).
The problem lies in concurrent reading on Lustre.
When FLASH starts initialization, all processes need to read the same input data files (the .cn4 files etc.) and allocate
parameters, data arrays, etc. in memory.
This reading is of course asynchronous (as it should be with MPI), but in principle it should not need complex synchronization or a distributed lock mechanism,
which should only be relevant in the writing case.
In fact, when too many processes from one client try to read the same .cn4 input file concurrently, all the processes hang and the Lustre directory gets corrupted and is no longer accessible.
Moving all the needed input files to /tmp (strict POSIX) on each node (they are copied once per node) and adapting the flash.par file to read from /tmp and NOT from /lustre
solved the issue.
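
A rough sketch of how such staging could be scripted, using only the Python standard library. It assumes the script is run once per node before the simulation starts (for example as one task per node from the batch script) and that the table files appear in flash.par as quoted file names; the helper names and the default `*.cn4` pattern are illustrative, not part of FLASH:

```python
"""Sketch: copy input tables to node-local storage and repoint flash.par at them."""
import re
import shutil
from pathlib import Path


def stage_inputs(src_dir, local_dir, patterns=("*.cn4",)):
    """Copy matching input files into `local_dir`; return {file name: local path}."""
    local = Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)
    staged = {}
    for pattern in patterns:
        for src in sorted(Path(src_dir).glob(pattern)):
            dst = local / src.name
            if not dst.exists():  # skip files already staged on this node
                shutil.copy2(src, dst)
            staged[src.name] = dst
    return staged


def rewrite_par(par_text, staged):
    """Point every quoted value that names a staged file at its local copy."""
    def repl(match):
        quote, value = match.group(1), match.group(2)
        name = Path(value).name
        if name in staged:
            return f"{quote}{staged[name]}{quote}"
        return match.group(0)  # leave all other quoted values untouched

    return re.sub(r"""(["'])([^"']*)\1""", repl, par_text)
```

The rewritten text would then be saved as the flash.par that the job actually uses, so every rank on a node opens the local copy instead of the shared one.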
With this approach, FLASH simulation jobs now run stably and can use all cores per node, even on the AMD EPYC 7k architecture.
Writing shows no problem, since in this case there is no scaling issue (collective MPI-IO is used).
FLASH I/O writing capability has been tested with an adapted/modified version of the official FLASH I/O benchmark
so that it can run with the latest gfortran, MPI and HDF5 libraries.
For those interested, the modified version can be freely downloaded here:
https://git.gsi.de/d.bertini/pp-flash/-/tree/main/flash_io?ref_type=heads
The FLASH I/O writing benchmark is stable and shows good results on /lustre.
As this seems to be related to a Lustre bug (client or server side?), it would be nice to create a small MPI program that just reads these .cn4 files to
reproduce the problem.
Could you tell me which routines in FLASH read the .cn4 files during initialisation?
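
One possible shape for such a reproducer, sketched in Python with mpi4py. This is an assumption-laden stand-in: FLASH reads the tables through Fortran I/O, so a plain binary read may not exercise exactly the same access pattern on the Lustre client, and all names here are illustrative:

```python
"""Sketch: every MPI rank reads the same file at (nearly) the same moment."""
import time


def concurrent_read(path, comm, chunk_size=1 << 20):
    """Read the whole file on the calling rank; return (bytes_read, seconds).

    `comm` is assumed to behave like an mpi4py communicator with Barrier().
    """
    comm.Barrier()  # line the ranks up so that their reads overlap
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def reproduce_with_mpi(path, repeats=2):
    """Hypothetical driver; requires mpi4py and an MPI launcher."""
    from mpi4py import MPI  # imported lazily so the helper above has no hard dependency

    comm = MPI.COMM_WORLD
    for attempt in range(repeats):  # two passes by default: one read to size, one to fill
        nbytes, seconds = concurrent_read(path, comm)
        slowest = comm.reduce(seconds, op=MPI.MAX, root=0)
        if comm.Get_rank() == 0:
            print(f"pass {attempt}: {nbytes} bytes per rank, slowest rank {slowest:.3f} s")
```

Running this with all ranks on a single node, first against a file on /lustre and then against the same file in /tmp, would show whether the hang depends on the filesystem alone.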
Thanks in advance
---------
Denis Bertini
Abteilung: CIT
Ort: SB3 2.265a
Tel: +49 6159 71 2240
Fax: +49 6159 71 2986
E-Mail: d.bertini at gsi.de
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the GSI Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialdirigent Dr. Volkmar Dietz