JLD takes too long reading names from a file

LewisHein · February 17, 2017, 2:40pm

JLD seems to take an awfully long time just reading the names present in a JLD file – not reading values, just names.

julia> using JLD

julia> outfile = jldopen("/tmp/blah.jl", "w")
Julia data file version 0.1.1: /tmp/blah.jl

julia> for i in 1:10000
           write(outfile, "$(rand())", rand())
       end

julia> names(outfile) # warm-up call to compile (though it doesn't seem to make any difference)

julia> @time names(outfile)
44.540928 seconds (40.15 k allocations: 1.688 MB)

When the file contains ~75,000 entries, this names() call takes several hours.

In contrast,

  time h5ls /tmp/blah.jl

takes only

 real	0m0.284s
 user	0m0.236s
 sys	0m0.030s

What is going on? How do I fix this? (I’m willing to mess around with the internals of JLD if necessary)

vchuravy · April 11, 2017, 9:12am

A quick investigation with @profile and ProfileView.view(C=true) shows that most of the time is spent within HDF5, especially in the H5Gget_objname_by_idx function (see https://github.com/JuliaIO/HDF5.jl/blob/0366bb050d8ded8dff2d8f148818151610bbb75b/src/HDF5.jl#L987 where the call originates).

Since profiling indicates that the time is not spent within Julia, either h5ls is using a different API to access the information or HDF5.jl is using the API the wrong way.
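
For anyone who wants to repeat this kind of check, here is a minimal sketch of the profiling step using only the Profile standard library. It assumes Julia 1.x, where Profile is loaded with `using Profile`. The helper name `profile_with_c_frames` is hypothetical and not part of JLD or HDF5.jl. It prints a flat text report that includes C frames, which is a text stand-in for the ProfileView.view(C=true) flame graph mentioned above.

```julia
using Profile

# Hypothetical helper (not part of JLD or HDF5.jl): profile a zero-argument
# function and print a flat report that includes frames from C libraries.
function profile_with_c_frames(f; io::IO = stdout)
    f()                  # warm-up call, so compilation time is not sampled
    Profile.clear()      # discard samples left over from earlier runs
    result = @profile f()
    # C = true keeps C-level frames in the report, so that time spent
    # inside a C library such as libhdf5 shows up instead of being hidden.
    Profile.print(io; C = true, format = :flat, sortedby = :count)
    return result
end
```

With the file from the first post still open, something like `profile_with_c_frames(() -> names(outfile))` should show which C functions collect the most samples. Very short calls may collect no samples at all, in which case the report will be empty.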

kristoffer.carlsson · April 11, 2017, 9:28am

FWIW:

using JLD2

Base.names(jld::JLD2.JLDFile) = keys(jld.datasets)

outfile = jldopen("/tmp/blah.jl", "w");
for i in 1:10000
    write(outfile, "$(rand())", rand())
end
close(outfile)

julia> @time infile = jldopen("/tmp/blah.jl", "r");
  0.000445 seconds (141 allocations: 11.354 KiB)

julia> @time names(infile);
  0.008764 seconds (3.38 k allocations: 169.777 KiB)

https://github.com/simonster/JLD2.jl
