[FLASH-USERS] Parallel HDF5 errors when no particles in sim

Aaron Tran aaron.tran at columbia.edu
Tue Oct 29 17:32:25 EDT 2019


Hi flash-users,

I am seeing some parallel HDF5 error messages with:

    ./setup SinkRotatingCloudCore -auto -3d +parallelIO

using FLASH v4.6.2, OpenMPI v3.1.1, several versions of HDF5 (1.8.21,
1.10.2, 1.10.5), and the default
SimulationMain/SinkRotatingCloudCore/flash.par.

The simulation starts with no sink particles, and each particle file dump
triggers many error messages to STDOUT (example below).  After the first
sink particle forms, the error messages go away.  The error messages look
ominous, but the output files appear OK/usable both before and after the
first sink particle forms.

The error messages do not appear with serial HDF5 (that is, omitting
"+parallelIO" from the ./setup call while linking against the same compiled
libraries).

Does anyone know the origin of these errors, or how to fix them, other
than, say, starting with serial HDF5 and then restarting with parallel HDF5
after the first particles form?  The errors seem harmless, but they do
clutter the log files.
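
As a stopgap for the clutter only (it does nothing about the cause), the
traces could be filtered out of a saved log after the run.  The following
is a minimal sketch, not part of FLASH or HDF5: the function name
strip_hdf5_diag is made up for illustration, and it assumes each trace
starts with an "HDF5-DIAG:" header and that every frame and its
"major:"/"minor:" lines are indented and sit on a single line in the real
log.  Output interleaved between ranks may not fit that assumption.

```python
import re

# Assumed block shape: a header line, then indented "#NNN: ..." frames,
# each followed by indented "major:" / "minor:" lines.
_HEADER = re.compile(r"^HDF5-DIAG: Error detected in HDF5")
_BODY = re.compile(r"^\s+(#\d{3}:|major:|minor:)")


def strip_hdf5_diag(lines):
    """Return (kept_lines, n_blocks) with HDF5-DIAG trace blocks removed."""
    kept, n_blocks, in_block = [], 0, False
    for line in lines:
        if _HEADER.match(line):
            # Start of a new trace: drop the header and count the block.
            in_block = True
            n_blocks += 1
            continue
        if in_block and _BODY.match(line):
            # Frame or major/minor line belonging to the current trace.
            continue
        # Anything else ends the trace and is kept.
        in_block = False
        kept.append(line)
    return kept, n_blocks
```

Applied to the lines of a log file, this returns the remaining lines and
the number of traces dropped, so the count can still be recorded.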

Thanks,
Aaron

Example error message (one such trace per rank per particle file):

HDF5-DIAG: Error detected in HDF5 (1.10.5) MPI-process 2:
  #000: H5Dio.c line 336 in H5Dwrite(): can't write data
    major: Dataset
    minor: Write failed
  #001: H5Dio.c line 818 in H5D__write(): can't write data
    major: Dataset
    minor: Write failed
  #002: H5Dmpio.c line 735 in H5D__contig_collective_write(): couldn't finish shared collective MPI-IO
    major: Low-level I/O
    minor: Write failed
  #003: H5Dmpio.c line 2081 in H5D__inter_collective_io(): couldn't finish collective MPI-IO
    major: Low-level I/O
    minor: Can't get value
  #004: H5Dmpio.c line 2125 in H5D__final_collective_io(): optimized write failed
    major: Dataset
    minor: Write failed
  #005: H5Dmpio.c line 490 in H5D__mpio_select_write(): can't finish collective parallel write
    major: Low-level I/O
    minor: Write failed
  #006: H5Fio.c line 165 in H5F_block_write(): write through page buffer failed
    major: Low-level I/O
    minor: Write failed
  #007: H5PB.c line 1028 in H5PB_write(): write through metadata accumulator failed
    major: Page Buffering
    minor: Write failed
  #008: H5Faccum.c line 826 in H5F__accum_write(): file write failed
    major: Low-level I/O
    minor: Write failed
  #009: H5FDint.c line 254 in H5FD_write(): addr overflow, addr = 18446744073709551615, size=0, eoa=6380768
    major: Invalid arguments to routine
    minor: Address overflowed
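
A quick check of the numbers in the last frame (#009): the reported
address is the largest unsigned 64-bit value and the write size is zero.
That looks like a zero-length write issued with an unset address.  It is
only an assumption here that this value is HDF5's "undefined address"
sentinel (HADDR_UNDEF) and that the ranks holding no particles are the
ones issuing it; the trace alone does not confirm either.

```python
import re

# Frame #009 from the trace, written on one line.
frame = ("#009: H5FDint.c line 254 in H5FD_write(): addr overflow, "
         "addr = 18446744073709551615, size=0, eoa=6380768")

m = re.search(r"addr = (\d+), size=(\d+), eoa=(\d+)", frame)
addr, size, eoa = (int(g) for g in m.groups())

is_all_ones = addr == 2**64 - 1   # every bit set in a 64-bit unsigned int
is_empty_write = size == 0        # nothing was actually to be written
past_eoa = addr > eoa             # hence the "addr overflow" complaint

print(is_all_ones, is_empty_write, past_eoa)  # prints: True True True
```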