JLD2 seems slow at write operations compared to serialize and HDF5

EDIT: Thanks to Elrod for politely pointing out a very obvious mistake in my code. I’m leaving the question open because, even with the fix, JLD2 is still significantly slower than HDF5.

Hi all,

I asked a question on StackOverflow here about the fastest way to save matrices of floats. I was directed to JLD2, and the JLD2 docs page suggests that its performance might be comparable to serialize. I put together a quick test for writing matrices and found JLD2 to be about 4 times slower than serialize and HDF5. Are these timings expected, or is there a way to speed this up? I’m on Julia v0.6 on Ubuntu 16.04, and I ran Pkg.update() just before posting. My (very simple) test code follows:

```julia
using JLD2, FileIO, HDF5   # Julia v0.6: writedlm and serialize are in Base

# Each function writes N random 1000×1000 Float64 matrices, deleting each file afterwards.
function f_create_jld(N::Int)
    dp = "/home/colin/Temp/"
    for n = 1:N
        fp = "$(dp)$(n).jld2"
        x = randn(1000,1000)
        @save fp x
        rm(fp)
    end
end
function f_create_dlm(N::Int)
    dp = "/home/colin/Temp/"
    for n = 1:N
        fp = "$(dp)$(n).csv"
        writedlm(fp, randn(1000,1000), ',')
        rm(fp)
    end
end
function f_create_h5(N::Int)
    dp = "/home/colin/Temp/"
    for n = 1:N
        fp = "$(dp)$(n).h5"
        h5write(fp, "G/D", randn(1000, 1000))
        rm(fp)
    end
end
function f_create_slz(N::Int)
    dp = "/home/colin/Temp/"
    for n = 1:N
        fp = "$(dp)$(n)"
        fid1 = open(fp, "w")
        serialize(fid1, randn(1000, 1000))
        close(fid1)
        rm(fp)
    end
end

# Warm-up run with N = 1 so the timings below exclude compilation
N = 1
f_create_jld(N)
f_create_dlm(N)
f_create_h5(N)
f_create_slz(N)
```

Then setting N = 10, I get:

```julia
julia> @time f_create_jld(N)
  0.452258 seconds (924 allocations: 76.376 MiB, 1.94% gc time)

julia> @time f_create_dlm(N)
  2.784344 seconds (10.02 M allocations: 418.429 MiB, 0.57% gc time)

julia> @time f_create_h5(N)
  0.106710 seconds (214 allocations: 76.303 MiB, 1.11% gc time)

julia> @time f_create_slz(N)
  0.105692 seconds (224 allocations: 76.313 MiB, 4.09% gc time)
```
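The timings above also include randn(1000, 1000) and rm in every iteration, so they measure more than the write itself. A minimal sketch of a write-only timing harness, assuming Julia 1.x (where serialize lives in the Serialization standard library rather than Base); time_write, slz_write and raw_write are hypothetical names, not part of any package:

```julia
using Serialization  # on Julia 1.x, serialize lives in this standard library

# Hypothetical helper: time only the write step, excluding matrix generation
# and file deletion, and return the fastest of `reps` runs (in seconds).
function time_write(writefn, x; reps::Int = 10)
    dir = mktempdir()                  # scratch directory for the test files
    path = joinpath(dir, "bench.dat")
    writefn(path, x); rm(path)         # warm-up call so compilation is not timed
    best = Inf
    for _ in 1:reps
        t = @elapsed writefn(path, x)  # only the write is inside the timed region
        best = min(best, t)
        rm(path)                       # deletion happens outside the timed region
    end
    rm(dir; recursive = true)
    return best
end

# Writer equivalent to f_create_slz: Julia's own serialization format
slz_write(path, x) = open(io -> serialize(io, x), path, "w")

# Raw-bytes baseline: writes only the Float64 payload (no size or type header)
raw_write(path, x) = open(io -> write(io, x), path, "w")

x = randn(1000, 1000)                  # generated once, outside all timings
t_slz = time_write(slz_write, x)
t_raw = time_write(raw_write, x)
println("serialize: $(t_slz) s, raw write: $(t_raw) s")
```

Other formats can be plugged in with the same (path, x) signature, for example (p, x) -> h5write(p, "G/D", x) from the test code above, or (p, x) -> jldsave(p; x) with recent JLD2 versions. Taking the minimum over several runs damps noise from GC and background I/O. The raw baseline stores neither the dimensions nor the element type, so reading it back requires knowing both, e.g. read!(io, Matrix{Float64}(undef, 1000, 1000)).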
Elrod replied, quoting the original version of f_create_jld:

```julia
function f_create_jld(N::Int)
    dp = "/home/colin/Temp/"
    for n = 1:N
        for n = 1:N
            fp = "$(dp)$(n).jld2"
            x = randn(1000,1000)
            @save fp x
            rm(fp)
        end
    end
end
```

I’d try removing one of those for loops. That would only account for about 10x, but it would put JLD2 closer.
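To make the factor concrete: with the duplicated loop, the body runs N times for each of the N outer iterations, i.e. N^2 times in total. A small sketch (count_writes is a hypothetical name; the counter stands in for the actual file I/O):

```julia
# Count how many write/delete cycles the original (nested-loop) f_create_jld performs.
function count_writes(N::Int)
    writes = 0
    for n = 1:N
        for n = 1:N        # the duplicated loop runs the whole body N times per outer pass
            writes += 1    # stands in for one `@save fp x` plus `rm(fp)`
        end
    end
    return writes
end

count_writes(10)   # 100, versus 10 for the single-loop version
```

With N = 10 this gives 100 writes instead of 10, which is where the roughly 10x comes from.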

Oh jesus, that’s embarrassing.

Thanks for pointing that out. I must have looked 10 times and not spotted this.

PS: I have edited the question and title to reflect the correction.