How to better initialize a dictionary containing multidimensional arrays?

Hi everyone, I want to create a Dict() object whose keys are NTuple{6, Int} and whose values are rank-6 arrays. The following function, which creates such a Dict with 64 items, is much slower than I expected:

function test_time()
    a = Dict()
    for k in Iterators.product([0:1, 0:1, 0:1, 0:1, 0:1, 0:1]...)
        a[k] = Array{Float64}(undef, 16, 16, 16, 16, 16, 16)
    end
    return a
end

julia> @btime test_time();
1.371 s (497 allocations: 8.00 GiB)

In the function above, I add 64 key-value pairs sequentially. But a single array initialization takes very little time:

julia> @btime Array{Float64}(undef, 16, 16, 16, 16, 16, 16);
1.681 μs (2 allocations: 128.00 MiB)

And 1.371 s / 64 ≈ 21 ms, which is much larger than 1.681 μs.
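The numbers above can be checked with a little arithmetic. This is only a sketch that reuses the sizes and timings quoted in this post; the variable names are made up for illustration.

```julia
# Size of one rank-6 Float64 array with 16 elements per dimension
bytes_per_array = 16^6 * sizeof(Float64)
mib_per_array = bytes_per_array / 2^20      # 128.0 (MiB), matches the single-array benchmark
total_gib = 64 * bytes_per_array / 2^30     # 8.0 (GiB), matches the test_time() benchmark

# Time per array inside test_time() versus a standalone allocation
per_array_s = 1.371 / 64                    # seconds per array in the loop
ratio = per_array_s / 1.681e-6              # slowdown relative to the standalone timing

println("per array in loop: ", per_array_s, " s, slowdown factor: ", ratio)
```

So the loop is roughly four orders of magnitude slower per array than the standalone allocation, while the total memory (8 GiB) is exactly what 64 such arrays require.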

Can anybody help me with this problem, or suggest better practices?

Thanks in advance!

Instead of using Dict{Any, Any} you can create the Dict with the proper types:

function test_time_new()
    a = Dict{NTuple{6, Int64}, Array{Float64}}()
    for k in Iterators.product([0:1, 0:1, 0:1, 0:1, 0:1, 0:1]...)
        a[k] = Array{Float64}(undef, 16, 16, 16, 16, 16, 16)
    end
    return a
end
julia> @btime test_time();
  372.200 μs (564 allocations: 8.00 GiB)

julia> @btime test_time_new();
  256.400 μs (447 allocations: 8.00 GiB)

But I suspect that you could gain much more performance by rethinking your data representation.
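One possible alternative representation, purely as a sketch and not something anyone in this thread has benchmarked: allocate a single rank-7 buffer and let each key map to a view of one slice along the last dimension. The helper name `build_shared` and its parameter `n` are hypothetical.

```julia
# Hypothetical helper: one backing buffer, one view per key.
# n is the edge length of each rank-6 block (16 in the original question).
function build_shared(n)
    keys6 = Iterators.product(ntuple(_ -> 0:1, 6)...)   # the 64 keys
    # A single allocation holding all blocks; the 7th dimension selects the block.
    buf = Array{Float64}(undef, n, n, n, n, n, n, length(keys6))
    # selectdim returns a view, so no data is copied and each slice is contiguous.
    d = Dict(k => selectdim(buf, 7, i) for (i, k) in enumerate(keys6))
    return d, buf
end

d, buf = build_shared(2)               # small n so the example runs quickly
fill!(d[(0, 0, 0, 0, 0, 0)], 1.0)      # writing through the view updates buf
println(length(d), " entries, block size ", size(d[(1, 1, 1, 1, 1, 1)]))
```

Note that the values are then `SubArray`s rather than `Array`s, and every view keeps the whole buffer alive, so whether this fits depends on what is done with the dictionary afterwards.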

Further type restriction might help for down-stream tasks:

a = Dict{NTuple{6, Int64}, Array{Float64, 6}}()
                                       #^^^
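To see why the extra `6` matters: `Array{Float64}` leaves the number of dimensions open, so it is not a concrete type, whereas `Array{Float64, 6}` is. A small check, using only the three Dict types that appear in this thread (the variable names are illustrative):

```julia
untyped      = Dict()                                        # what test_time() uses
abstract_val = Dict{NTuple{6, Int64}, Array{Float64}}()      # what test_time_new() uses
concrete_val = Dict{NTuple{6, Int64}, Array{Float64, 6}}()   # fully specified value type

println(typeof(untyped))                          # Dict{Any, Any}
println(isconcretetype(valtype(abstract_val)))    # false: dimensionality is unspecified
println(isconcretetype(valtype(concrete_val)))    # true
```

With a concrete value type, code that reads `a[k]` can be specialized by the compiler for rank-6 `Float64` arrays, which is where the downstream benefit would come from.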
function test_time_new2()
    a = Dict{NTuple{6, Int64}, Array{Float64, 6}}()
    for k in Iterators.product([0:1, 0:1, 0:1, 0:1, 0:1, 0:1]...)
        a[k] = Array{Float64}(undef, 16, 16, 16, 16, 16, 16)
    end
    return a
end
julia> @btime test_time();
  268.700 μs (564 allocations: 8.00 GiB)

julia> @btime test_time_new();
  259.400 μs (447 allocations: 8.00 GiB)

julia> @btime test_time_new2();
  257.900 μs (447 allocations: 8.00 GiB)

My first timings were wrong; the measurements are too sensitive to window switching and mouse movement…

Thanks for your comments. Your timing results are much more reasonable than mine. Is this a Julia version problem? My version is 1.10.2.

I observe similar timings on my laptop. Is there a chance that you don’t have enough available memory and your OS has to resort to the swap space? That might slow down the allocations.
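A quick way to compare the memory the dictionary needs with what the machine reports, as a rough sketch: `Sys.free_memory()` and `Sys.total_memory()` return byte counts, and the free figure is only an approximate indicator of whether swapping will occur.

```julia
# Memory required by 64 rank-6 Float64 arrays of size 16^6
bytes_needed = 64 * 16^6 * sizeof(Float64)

free_bytes  = Sys.free_memory()     # bytes currently reported as free by the OS
total_bytes = Sys.total_memory()    # total physical RAM in bytes

println("needed: ", bytes_needed / 2^30, " GiB")
println("free:   ", round(free_bytes / 2^30, digits = 2), " GiB")
println("total:  ", round(total_bytes / 2^30, digits = 2), " GiB")

if bytes_needed > free_bytes
    println("The dictionary may not fit in free RAM; swapping is possible.")
end
```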

I am not quite sure, but it seems to be a version problem. I switched to Julia 1.7.2 on the same machine and the timing results look as good as @oheil's. If so, how can different versions make such a huge difference? 🙃

This is possible in general, but I lack the expertise to connect your code to specific changes between versions.

My Julia version is 1.11.5

Looking at the difference between the OP's timings and mine (I have plenty of RAM), this seems plausible.

Can you explain what you want to do downstream with this dictionary?

I see the same slow behavior on a machine with plenty of RAM (48 GB) and Julia 1.11.5.

julia> @b test_time()
315.842 ms (204 allocs: 8.000 GiB, 99.55% gc time, without a warmup)

julia> @b Array{Float64}(undef, 2^6, 16, 16, 16, 16, 16, 16)  # allocate all in one step
168.777 μs (3 allocs: 8.000 GiB, 81.04% gc time)

julia> @b begin GC.enable(false); test_time(); GC.enable(true) end
232.363 μs (204 allocs: 8.000 GiB)

Most of the time is spent on garbage collection. I wonder why.
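For anyone who wants to reproduce the GC share without an extra package, here is a minimal sketch using `@timed` from Base, which reports both total time and GC time. The helper `build_dict` and its parameter `n` are hypothetical; a smaller edge length is used so the example stays cheap, and the measured fraction will differ from the numbers above.

```julia
# Hypothetical helper: same structure as test_time_new2(), with a configurable edge length n
function build_dict(n)
    a = Dict{NTuple{6, Int}, Array{Float64, 6}}()
    for k in Iterators.product(ntuple(_ -> 0:1, 6)...)
        a[k] = Array{Float64}(undef, n, n, n, n, n, n)
    end
    return a
end

build_dict(2)                    # warm-up call so compilation is not measured
stats = @timed build_dict(8)     # n = 8 gives 2 MiB per array, 128 MiB in total

# Fraction of the wall-clock time that was spent in garbage collection
gc_fraction = stats.time > 0 ? stats.gctime / stats.time : 0.0
println("total: ", stats.time, " s, GC: ", stats.gctime, " s, fraction: ", gc_fraction)
```

Running this with `n = 16` on different Julia versions would show whether the GC share changes between releases.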
