Write a String with parallel HDF5

Does anyone know the ‘correct’ way to write a scalar variable (especially a String) using parallel HDF5? I think I have a way of doing it, but it’s so ugly I’m hoping there’s a better way!

The problem: suppose I have a String on one MPI rank that I want to write to an HDF5 file that has been opened for parallel I/O. This is especially tricky because only that rank knows the actual length of the string.

I’ve looked for HDF5 documentation on the recommended way to do something like this, but haven’t managed to find anything relevant; the documentation I’ve found for parallel HDF5 doesn’t go very far (refs: “HDF5: A Brief Introduction to Parallel HDF5” and “Parallel HDF5 Questions”).

My solution, reduced to an MWE, is

parallel-hdf5-test-script.jl:

using HDF5, MPI

function main()
    MPI.Init()

    my_rank = MPI.Comm_rank(MPI.COMM_WORLD)

    output_file = h5open("test.h5", "cw", MPI.COMM_WORLD)

    # Generate a stupid String as an example
    s = "x" ^ rand(1:20)

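    # Broadcast the string size from the process we want to write from.
    # Use `sizeof` (bytes) rather than `length` (characters): the fixed-length
    # HDF5 string type is sized in bytes, and the two differ for non-ASCII text.
    string_size = Ref(sizeof(s))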
    MPI.Bcast!(string_size, MPI.COMM_WORLD; root=0)

    if my_rank != 0
        s = " " ^ string_size[]
    end

    # This needs to be called on all processes, with a String of the right length
    # The 'datatype' `var_hdf5_type` contains the length of `s`, which allows
    # the following `write_dataset()` call to work.
    io_var, var_hdf5_type = create_dataset(output_file, "foo", s)

    if my_rank == 0
        # Only need/want to write from a single rank
        write_dataset(io_var, var_hdf5_type, s)
    end

    close(output_file)
end

main()

To run, assuming you have set up HDF5.jl with MPI support:

$ mpirun -np 2 julia parallel-hdf5-test-script.jl

The solution is not so awful, but it wasn’t at all obvious to me to start with (it must have been about the 5th or 6th thing I tried). So even if there isn’t a better way, I’m making this post so I can at least find this solution again!

I’ve always found string handling in HDF5 confusing, so an extra, more specific question: Is the Bcast!() of the string length absolutely necessary? My current guess/understanding is that it is, because the length is part of the ‘string type’ that HDF5 uses to write it, and so has to be included when the variable is created (and variable creation is a collective process that has to be done on all ranks at once).

Yep, it’s a mess.

I don’t think Parallel HDF5 supports variable-length datatypes (at least it didn’t last I checked), so you have to encode them as fixed-length strings. But dataset creation has to be done collectively, which means all ranks need to know the length of the string to encode in the datatype.
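To make "the length of the string" concrete: a fixed-length HDF5 string type is sized in bytes, so the number that has to agree across ranks is the byte size, not the character count. For ASCII-only strings the two coincide, but they differ once multi-byte UTF-8 characters appear. A minimal sketch in plain Julia (no MPI or HDF5 needed); `placeholder_string` is a hypothetical helper I made up for illustration, not part of HDF5.jl:

```julia
# Hypothetical helper (not part of HDF5.jl): build a dummy string with the
# same byte size as the real one, for ranks that do not hold the real string.
placeholder_string(nbytes::Integer) = " " ^ nbytes

s = "na\u00efve"             # the "ï" takes two bytes in UTF-8

nchars = length(s)           # number of characters
nbytes = sizeof(s)           # number of bytes

# A space is one byte, so this string has exactly `nbytes` bytes.
placeholder = placeholder_string(nbytes)

println("characters: ", nchars, ", bytes: ", nbytes)
println("placeholder matches byte size: ", sizeof(placeholder) == nbytes)
```

So if the string might ever contain non-ASCII text, it is probably safer to communicate `sizeof(s)` rather than `length(s)` to the other ranks. I'm assuming here that HDF5.jl derives the fixed-length type from the byte size of the string it is given.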

Possible workarounds: