JLD2 seems slow at write operations compared to serialize and HDF5


#1

EDIT: Thanks to Elrod for politely pointing out a very obvious mistake in my code. I’m leaving the question open because, even with the fix, JLD2 is still about 4 times slower than HDF5.

Hi all,

I asked a question on Stack Overflow about the fastest way to save matrices of floats. I was directed to JLD2, and its docs page suggests that its performance might be comparable to serialize. I put together a quick test for writing matrices and found JLD2 to be about 4 times slower than both serialize and HDF5. Are these timings expected, or is there a way to speed this up? I’m on Julia v0.6 on Ubuntu 16.04, and I ran Pkg.update() just before posting. My (very simple) test code follows:

```julia
using JLD2, FileIO, HDF5

# Each function writes N random 1000x1000 Float64 matrices to disk,
# deleting each file immediately after it is written.
function f_create_jld(N::Int)
    dp = "/home/colin/Temp/"
    for n = 1:N
        fp = "$(dp)$(n).jld2"
        x = randn(1000,1000)
        @save fp x
        rm(fp)
    end
end
function f_create_dlm(N::Int)
    dp = "/home/colin/Temp/"
    for n = 1:N
        fp = "$(dp)$(n).csv"
        writedlm(fp, randn(1000,1000), ',')
        rm(fp)
    end
end
function f_create_h5(N::Int)
    dp = "/home/colin/Temp/"
    for n = 1:N
        fp = "$(dp)$(n).h5"
        h5write(fp, "G/D", randn(1000, 1000))
        rm(fp)
    end
end
function f_create_slz(N::Int)
    dp = "/home/colin/Temp/"
    for n = 1:N
        fp = "$(dp)$(n)"
        fid1 = open(fp, "w")
        serialize(fid1, randn(1000, 1000))
        close(fid1)
        rm(fp)
    end
end

# Warm-up: call each function once so JIT compilation is not included in the timings.
N = 1
f_create_jld(N)
f_create_dlm(N)
f_create_h5(N)
f_create_slz(N)
```

Then setting N = 10, I get:

```
julia> @time f_create_jld(N)
  0.452258 seconds (924 allocations: 76.376 MiB, 1.94% gc time)

julia> @time f_create_dlm(N)
  2.784344 seconds (10.02 M allocations: 418.429 MiB, 0.57% gc time)

julia> @time f_create_h5(N)
  0.106710 seconds (214 allocations: 76.303 MiB, 1.11% gc time)

julia> @time f_create_slz(N)
  0.105692 seconds (224 allocations: 76.313 MiB, 4.09% gc time)
```
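One caveat on these numbers: every timed loop also calls randn(1000, 1000), so each measurement includes random number generation as well as the write itself. Below is a minimal sketch of a harness that builds the matrix once and times only the writes. It assumes Julia 1.x, where serialize/deserialize moved from Base into the Serialization standard library, and it writes to tempdir() instead of the hard-coded path. The names time_writes, slz_write, raw_write and raw_read are hypothetical helpers for illustration; the headerless raw_write/raw_read pair is an extra baseline, not something proposed in this thread.

```julia
# Sketch assuming Julia 1.x, where serialize/deserialize live in the
# Serialization standard library (in v0.6 they were part of Base).
using Serialization

# Time N writes of an existing matrix `x`. The matrix is created once by the
# caller, so random number generation is not part of the measurement.
function time_writes(writer, x::Matrix{Float64}, N::Int; dir::String = tempdir())
    fp = joinpath(dir, "bench_matrix.bin")
    writer(fp, x)          # warm-up call so JIT compilation is not timed
    rm(fp)
    t = @elapsed for n = 1:N
        writer(fp, x)
        rm(fp)
    end
    return t               # elapsed wall time in seconds
end

# Writer using Julia's built-in serializer, as in f_create_slz above.
slz_write(fp, x) = open(io -> serialize(io, x), fp, "w")

# Hypothetical baseline: dump the raw Float64 bytes with no header.
# The reader must already know the element type and dimensions.
raw_write(fp, x) = open(io -> write(io, x), fp, "w")
raw_read(fp, dims) = open(io -> read!(io, Matrix{Float64}(undef, dims)), fp)

x = randn(1000, 1000)
println("serialize: ", time_writes(slz_write, x, 10), " s")
println("raw write: ", time_writes(raw_write, x, 10), " s")
```

The JLD2 and HDF5 versions from the question could be wrapped as two-argument writer functions and passed in the same way. Because raw_write stores no type or shape information, it is only a lower bound on write time, not a replacement for a self-describing format.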

#2
```julia
function f_create_jld(N::Int)
    dp = "/home/colin/Temp/"
    for n = 1:N
        for n = 1:N   # second loop over n: the body runs N^2 times
            fp = "$(dp)$(n).jld2"
            x = randn(1000,1000)
            @save fp x
            rm(fp)
        end
    end
end
```

I’d try removing one of those for loops: with N = 10, the nested loops write 100 files instead of 10. That only accounts for a factor of about 10, but it would put JLD2 much closer.
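To make that factor explicit, here is a small sketch in which the file I/O is replaced by a counter (count_nested and count_single are hypothetical names for illustration). The inner `for n = 1:N` just shadows the outer `n`, so the loop body runs N^2 times and each of the N filenames is written N times over.

```julia
# Count how many times the loop body (one @save + rm in the original) runs.
function count_nested(N::Int)
    writes = 0
    for n = 1:N
        for n = 1:N      # the inner n shadows the outer one
            writes += 1  # stands in for one file write
        end
    end
    return writes
end

function count_single(N::Int)
    writes = 0
    for n = 1:N
        writes += 1
    end
    return writes
end

println(count_nested(10))  # prints 100
println(count_single(10))  # prints 10
```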


#3

Oh jesus, that’s embarrassing.

Thanks for pointing that out. I must have looked 10 times and not spotted this.


#4

P.S. I have edited the question and title to reflect the correction.