Best way to implement distributed read/write with Zarr.jl?

ConnectedSystems · April 29, 2022, 7:14am

I'm trying out Zarr.jl to compare it with NetCDF.jl.

I ran into an issue with distributed writes, where workers were silently hanging.

I thought I could pass the data store handle to the workers via pmap(), much like a SharedArray or DistributedArray, but it seems each worker has to reopen the store itself with zopen().

Is this the best way to approach writing to a common data store?

using Zarr, Distributed
using Base.Iterators


# Spin up workers if needed
if nprocs() == 1
    addprocs(4, exeflags="--project=.")
end

# Run on every process, even if the workers already existed,
# so that Zarr and z_fn are always defined everywhere
@everywhere begin
    # Activate environment and precompile
    using Pkg; Pkg.activate(@__DIR__)
    Pkg.instantiate(); Pkg.precompile()

    using Zarr

    z_fn = "./zdev.zarr"
end

# Set up Zarr store
d_dims = (100, 100, 16, 4)
z1 = zcreate(Float32, d_dims..., path=z_fn, fill_value=0.0, chunks=(d_dims[1], d_dims[2], d_dims[3], 1))

@everywhere begin
    function simulate!(z_fn, reps, scen)

        # Each worker has to open the data store, else worker silently hangs
        zr = zopen(z_fn, "w")

        # Run "model" and store results
        for i in 1:reps
            zr[:, :, i, scen] .= rand(100, 100)
        end
    end
end


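# product(d_dims[3], 1:d_dims[4]) yields (16, scen) for scen in 1:4,
# so each task writes all 16 reps of one scenario, i.e. one chunk
_ = pmap((x) -> simulate!(z_fn, x[1], x[2]), product(d_dims[3], 1:d_dims[4]));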
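
One way to check that writes made through a separately opened handle actually land in the shared store is to reopen it read-only and compare each scenario against the fill value. This is only a minimal single-process sketch: the scratch path and the small dimensions are made up for illustration, and the second `zopen` handle stands in for what a worker would do.

```julia
using Zarr

# Hypothetical scratch location, not the zdev.zarr store from the post
path = joinpath(mktempdir(), "check.zarr")
dims = (10, 10, 4, 2)

# One chunk per scenario, mirroring the chunk layout used above
z = zcreate(Float32, dims...; path=path, fill_value=0.0,
            chunks=(dims[1], dims[2], dims[3], 1))

# Stand-in for a worker: open a separate handle by path and write scenario 1
zw = zopen(path, "w")
zw[:, :, :, 1] = ones(Float32, dims[1], dims[2], dims[3])

# Reopen read-only and flag scenarios that contain anything but the fill value
zr = zopen(path)
written = [any(!=(0), zr[:, :, :, s]) for s in 1:dims[4]]
```

Here `written` flags only the first scenario, since the second chunk was never written and reads back as the fill value. The same check could be run on the master after the `pmap` call, assuming the model output is not itself all zeros.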
