I have a long-running simulation that logs to an HDF5 file on each sample. I preallocate space and write, say, a vector to index 1 on the first sample, then to index 2 on the next sample, and so on. I’m finding that logging doubles my simulation’s runtime. That’s not unexpected, but I wonder whether it has to be that way. According to the Profiler, my simulation spends a large share of its time in `HDF5.jl:1748; setindex!`. I’m hoping we can find a way to make this faster, ideally by changing how I log, or possibly by improving something in HDF5.jl itself.
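To make the access pattern concrete, here is a minimal sketch of the loop described above, with an ordinary in-memory Julia `Array` standing in for the HDF5 dataset. `simulate!` and its placeholder state update are hypothetical; only the shape of the writes (one slot per sample, filled in order) mirrors the real code.

```julia
# Hypothetical stand-in for the logging pattern: preallocate room for every
# sample, then on step k write that step's vector into slot k.
function simulate!(history::Matrix{Float64})
    state = zeros(size(history, 1))
    for k in 1:size(history, 2)
        state .+= 1.0            # placeholder for the real simulation step
        history[:, k] = state    # same shape of write as dataset[:, k] = data
    end
    return history
end

history = Matrix{Float64}(undef, 3, 100)   # 3 values per sample, 100 samples
simulate!(history)
```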
Here’s how I’m calling it:

```julia
colons = (Colon() for i in 1:num_dims) # patterned after EllipsisNotation.jl
dataset[colons..., index] = data # write to, e.g., dataset[:, :, 100]
```
Note that `data` can have any number of dimensions (I might use it for scalars or vectors or matrices, etc.), but it will always have the same dimensions from sample to sample.
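As a point of comparison, here is a minimal sketch of the same "write one sample into the last dimension" operation on an ordinary in-memory Julia `Array`; the helper name `write_sample!` is hypothetical. It builds the colon tuple with `ntuple(_ -> Colon(), Val(N - 1))`, so the tuple's length is known at compile time, whereas splatting a generator gives the compiler no length to work with. Whether that difference matters for HDF5.jl's `setindex!` is not established here.

```julia
# Write one sample into the last dimension of a preallocated array, whatever
# the sample's own dimensionality (scalar, vector, matrix, ...).
function write_sample!(A::AbstractArray{T,N}, data, index::Integer) where {T,N}
    # (:, :, ..., :) with N - 1 colons; N is a type parameter, so the
    # tuple length is a compile-time constant.
    colons = ntuple(_ -> Colon(), Val(N - 1))
    A[colons..., index] = data   # e.g. A[:, :, index] = data for a 3-D array
    return A
end

scalars  = zeros(4);        write_sample!(scalars, 2.5, 3)
vectors  = zeros(2, 5);     write_sample!(vectors, [1.0, 2.0], 4)
matrices = zeros(2, 2, 3);  write_sample!(matrices, [1.0 2.0; 3.0 4.0], 2)
```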
Here are the results of the Profiler for one instance:
```
171 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1748; setindex!(::HDF5.HDF5Dataset, ::Array{Float64,1}, ::Colon, ::Int64)
26 .\tuple.jl:108; ntuple(::HDF5.##16#17{HDF5.HDF5Dataset,Tuple{Colon,Int64}}, ::Int64)
25 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1748; #16
25 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1045; size(::HDF5.HDF5Dataset, ::Int64)
14 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1047; ndims(::HDF5.HDF5Dataset)
3 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1041; size(::HDF5.HDF5Dataset)
2 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:2220; h5s_get_simple_extent_dims(::Int32)
11 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1043; size(::HDF5.HDF5Dataset)
1 .\essentials.jl:0; cnvt_all(::Type{T} where T, ::UInt64, ::UInt64, ::Vararg{UInt64,N} where N)
3 .\essentials.jl:35; cnvt_all(::Type{T} where T, ::UInt64, ::UInt64, ::Vararg{UInt64,N} where N)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1040; size(::HDF5.HDF5Dataset)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1162; dataspace(::HDF5.HDF5Dataset)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:2152; h5d_get_space
2 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1041; size(::HDF5.HDF5Dataset)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:2218; h5s_get_simple_extent_dims(::Int32)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:2220; h5s_get_simple_extent_dims(::Int32)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1042; size(::HDF5.HDF5Dataset)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:791; close(::HDF5.HDF5Dataspace)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:2053; h5s_close
7 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1043; size(::HDF5.HDF5Dataset)
1 .\essentials.jl:0; cnvt_all(::Type{T} where T, ::UInt64, ::UInt64, ::Vararg{UInt64,N} where N)
3 .\essentials.jl:35; cnvt_all(::Type{T} where T, ::UInt64, ::UInt64, ::Vararg{UInt64,N} where N)
1 .\essentials.jl:35; cnvt_all(::Type{T} where T, ::UInt64)
13 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1700; setindex!(::HDF5.HDF5Dataset, ::Array{Float64,1}, ::UnitRange{Int64}, ::Int64)
6 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1816; hdf5_to_julia(::HDF5.HDF5Dataset)
3 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1853; hdf5_to_julia_eltype(::HDF5.HDF5Datatype)
3 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:2152; h5t_get_native_type(::Int32, ::Int64)
3 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1861; hdf5_to_julia_eltype(::HDF5.HDF5Datatype)
1 .\dict.jl:473; getindex
1 .\dict.jl:322; ht_keyindex(::Dict{Any,DataType}, ::Tuple{Int32,Void,UInt64})
1 .\dict.jl:210; hashindex
1 .\tuple.jl:296; hash(::Tuple{Int32,Void,UInt64}, ::UInt64)
1 .\hashing.jl:10; hash
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1820; hdf5_to_julia(::HDF5.HDF5Dataset)
3 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1824; hdf5_to_julia(::HDF5.HDF5Dataset)
3 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1162; dataspace(::HDF5.HDF5Dataset)
3 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:406; Type
3 .\base.jl:129; finalizer(::Any, ::Any)
2 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1828; hdf5_to_julia(::HDF5.HDF5Dataset)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1835; hdf5_to_julia(::HDF5.HDF5Dataset)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:791; close(::HDF5.HDF5Dataspace)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:2053; h5s_close
130 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1701; setindex!(::HDF5.HDF5Dataset, ::Array{Float64,1}, ::UnitRange{Int64}, ::Int64)
2 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1704; _setindex!(::HDF5.HDF5Dataset, ::Type{T} where T, ::Array{Float64,1}, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
2 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1707; _setindex!(::HDF5.HDF5Dataset, ::Type{T} where T, ::Array{Float64,1}, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1711; _setindex!(::HDF5.HDF5Dataset, ::Type{T} where T, ::Array{Float64,1}, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1714; _setindex!(::HDF5.HDF5Dataset, ::Type{T} where T, ::Array{Float64,1}, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
1 .\operators.jl:107; !=(::Type{T} where T, ::Type{T} where T)
17 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1717; _setindex!(::HDF5.HDF5Dataset, ::Type{T} where T, ::Array{Float64,1}, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
3 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1752; hyperslab(::HDF5.HDF5Dataset, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:0; dataspace(::HDF5.HDF5Dataset)
2 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1162; dataspace(::HDF5.HDF5Dataset)
2 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:2152; h5d_get_space
8 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1754; hyperslab(::HDF5.HDF5Dataset, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
8 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:2220; h5s_get_simple_extent_dims(::Int32)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1755; hyperslab(::HDF5.HDF5Dataset, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1765; hyperslab(::HDF5.HDF5Dataset, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1770; hyperslab(::HDF5.HDF5Dataset, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1784; hyperslab(::HDF5.HDF5Dataset, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:2053; h5s_select_hyperslab(::Int32, ::Int64, ::Array{UInt64,1}, ::Array{UInt64,1}, ::Array{UInt64,1}, ::Ptr{Void})
1 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1718; _setindex!(::HDF5.HDF5Dataset, ::Type{T} where T, ::Array{Float64,1}, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
3 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1721; _setindex!(::HDF5.HDF5Dataset, ::Type{T} where T, ::Array{Float64,1}, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
3 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:2053; h5d_write(::Int32, ::Int32, ::Int32, ::Int32, ::Int64, ::Array{Float64,1})
2 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:1724; _setindex!(::HDF5.HDF5Dataset, ::Type{T} where T, ::Array{Float64,1}, ::UnitRange{Int64}, ::Vararg{Union{Int64, Range{Int64}},N} where N)
2 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:791; close(::HDF5.HDF5Dataspace)
2 C:\Users\Tucker\.julia\HDF5\src\HDF5.jl:2053; h5s_close
```
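Most of the time appears to be spent in this function of HDF5.jl itself rather than in either of the two functions it calls: line 1701 accounts for 130 of the 171 samples, yet only about 29 of those fall inside `_setindex!`.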
```julia
function setindex!(dset::HDF5Dataset, X::Array, indices::Union{AbstractRange{Int},Int}...)  # HDF5.jl:1699
    T = hdf5_to_julia(dset)             # HDF5.jl:1700
    _setindex!(dset, T, X, indices...)  # HDF5.jl:1701
end                                     # HDF5.jl:1702
```
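One possible reading of the profile, offered as a hypothesis rather than a diagnosis: line 1700 recomputes the Julia element type on every write (the profile shows `hdf5_to_julia` doing a `Dict` lookup), and line 1701 then passes that type as a run-time value to `_setindex!`, so the call may be resolved by dynamic dispatch each time. The toy sketch below does not use HDF5.jl; `TYPE_TABLE`, `lookup_eltype`, `write_converted!`, `log_all_slow!`, and `log_all_fast!` are hypothetical names. It contrasts a per-call lookup with the "function barrier" pattern from the Julia performance tips, where the type is looked up once and a kernel specialized on it does the repeated work.

```julia
# Hypothetical stand-in for a run-time type lookup such as hdf5_to_julia(dset).
const TYPE_TABLE = Dict{Int,DataType}(1 => Float64, 2 => Int64)
lookup_eltype(code::Int) = TYPE_TABLE[code]   # inferred only as DataType

# Convert one sample to element type T and store it at position k.
write_converted!(out, ::Type{T}, s, k) where {T} = (out[k] = T(s); out)

# Per-call pattern: look up the type on every write; the compiler cannot know
# the concrete T, so each write_converted! call is dispatched at run time.
function log_all_slow!(out, samples, code)
    for (k, s) in enumerate(samples)
        T = lookup_eltype(code)
        write_converted!(out, T, s, k)
    end
    return out
end

# Function-barrier pattern: one lookup and one run-time dispatch, after which
# _log_all! is compiled for the specific T and its loop dispatches statically.
function log_all_fast!(out, samples, code)
    T = lookup_eltype(code)
    return _log_all!(out, T, samples)
end

function _log_all!(out, ::Type{T}, samples) where {T}
    for (k, s) in enumerate(samples)
        write_converted!(out, T, s, k)
    end
    return out
end

a = log_all_slow!(zeros(3), [1.0, 2.0, 3.0], 1)
b = log_all_fast!(zeros(Int, 3), [1.0, 2.0, 3.0], 2)
```

Applying this to the real code would mean a change inside HDF5.jl (for example, looking up the type once per dataset rather than once per write). Running `@code_warntype` on a call like the one in the post could help confirm or rule out this explanation.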
I need to triple the amount of logging I’m doing in this simulation, and that would make for a very slow simulation. Does anyone see the culprit here? Am I doing something wrong, or is there a general improvement we could make in HDF5.jl?
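Since part of the question is whether to change how the logging is done, one option worth trying is to batch writes: keep the most recent samples in memory and write them to the file as a single block. The sketch below assumes that much of the per-call cost in the profile is fixed overhead, so fewer, larger writes would amortize it; it has not been measured against HDF5.jl. `BufferedLog`, `push_sample!`, and `flush!` are hypothetical names, and the in-memory `target` matrix stands in for the dataset. In real use the `flush_block` callback would perform one write such as `dataset[:, lo:hi] = block`, assuming the dataset accepts a range in the last dimension.

```julia
# Hypothetical buffered logger: collect up to `capacity` samples in memory,
# then hand the whole block to `flush_block` in one call.
mutable struct BufferedLog{F}
    buf::Matrix{Float64}   # one column per buffered sample
    count::Int             # number of columns currently filled
    next_index::Int        # file index of the first buffered sample
    flush_block::F         # callback(block, lo, hi) that writes columns lo:hi
end

BufferedLog(nvals::Int, capacity::Int, flush_block) =
    BufferedLog(Matrix{Float64}(undef, nvals, capacity), 0, 1, flush_block)

function push_sample!(L::BufferedLog, data::AbstractVector{Float64})
    L.count += 1
    L.buf[:, L.count] = data
    L.count == size(L.buf, 2) && flush!(L)   # buffer full: write it out
    return L
end

function flush!(L::BufferedLog)
    L.count == 0 && return L
    lo = L.next_index
    hi = lo + L.count - 1
    L.flush_block(view(L.buf, :, 1:L.count), lo, hi)   # one write per block
    L.next_index = hi + 1
    L.count = 0
    return L
end

# Usage with an in-memory stand-in for the dataset.
target = zeros(2, 10)
logger = BufferedLog(2, 4, (block, lo, hi) -> (target[:, lo:hi] = block))
for k in 1:10
    push_sample!(logger, [Float64(k), 2.0 * k])
end
flush!(logger)   # write the final partial block (samples 9 and 10)
```

The final `flush!` matters: without it, any samples still sitting in the buffer when the simulation ends would never reach the file.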