Unexpected allocations when accessing IdDict

StefanMathis · August 7, 2021, 7:45pm

Hello,

I found that accessing an IdDict can lead to memory allocations, while accessing a Dict does not. I first encountered this behaviour in combination with the Memoization.jl package, therefore the original thread can be found here: https://github.com/marius311/Memoization.jl/issues/19

However, as marius311 showed with an MWE, the allocations do not result from Memoization.jl, but rather from the get! method of the IdDict. Please find the MWE below:

const cache_dict = Dict{Tuple{Vector{Int},Int},Int}()
const cache_iddict = IdDict{Tuple{Vector{Int},Int},Int}()

# this is basically what @memoize expands to:
function func_dict(ctr, a)
    _getter() = a + 1
    get!(_getter, cache_dict, (ctr, a))
end

function func_iddict(ctr, a)
    _getter() = a + 1
    get!(_getter, cache_iddict, (ctr, a))
end

function main()
    ctr = Int[0]
    for i = 1:10
        println("Run $i")
        println("Using Dict: $(@allocated func_dict(ctr, 1))")
        println("Using IdDict: $(@allocated func_iddict(ctr, 1))")
    end
end

main()

When you run this MWE, you’ll find that after the initial compilation run, the IdDict variant allocates 64 bytes for i = 1 and 32 bytes for each subsequent loop run. The Dict variant on the other hand allocates no memory.

Is there an explanation for that?

jling · August 7, 2021, 7:48pm

I have also seen non-negligible allocation when use LRU cache I just assumed it’s the cost of function cache

tim.holy · August 7, 2021, 9:06pm

It might be just a gcframe from the string interpolation in the error message:
https://github.com/JuliaLang/julia/blob/0e8bb9596e29703e65d11214284e2fcac1b19bda/base/iddict.jl#L87

Maybe try changing that so it dispatches to a different function, e.g., https://github.com/JuliaLang/julia/blob/0cbb6850fb50b9a58778a4c0cb3a6b6004b2d8b3/base/intfuncs.jl#L54-L57

(You don’t need the @noinline anymore, Julia now handles that for you.)

StefanMathis · August 7, 2021, 9:21pm

This explanation makes sense for me, since I saw much higher allocations in my real-life application where the key is the ID of a struct instance.

Regarding the IdDict vs Dict issue, the comparison of the setindex! function for dict shows that no error message is provided for typed dicts, therefore it makes sense that no allocations happen here:
https://github.com/JuliaLang/julia/blob/0cbb6850fb50b9a58778a4c0cb3a6b6004b2d8b3/base/dict.jl#L380

When I find some time, I’ll try the solution with the error function. Out of curiosity: Why is @noinline not needed anymore?

Oscar_Smith · August 7, 2021, 9:22pm

The compiler was taught not to inline errors

StefanMathis · August 8, 2021, 9:27am

I created a pull request with a dedicated error function, see https://github.com/JuliaLang/julia/pull/41826

StefanMathis · August 16, 2021, 7:13am

Update: Apparently, the error string interpolation is not causing the interpolations. Any other ideas why these allocations happen?

kristoffer.carlsson · August 16, 2021, 7:54am

My guess is that it is the creation of the (ctr, a) tuple (on the heap) that doesn’t get elided because you need to actually form the object to compute its objectid.

Edit: This might not be true, cf the simpler example:

const cache_dict = Dict{Tuple{Vector{Int},Int},Int}()
const cache_iddict = IdDict{Tuple{Vector{Int},Int},Int}()

function func_dict(d)
    _getter() = 1
    get!(_getter, cache_dict, d)
end

function func_iddict(d)
    _getter() = 1
    get!(_getter, cache_iddict, d)
end

function main()
    ctr = Int[0]
    d = (ctr, 1)
    for i = 1:10
        println("Run $i")
        println("Using Dict: $(@allocated func_dict(d))")
        println("Using IdDict: $(@allocated func_iddict(d))")
    end
end

main()

StefanMathis · August 16, 2021, 11:37am

Please ignore the last post, it was stupid. Nevertheless, I still think that the implementation of get! is causing those allocations. Could the creation of val by the ccall on line 176 cause the allocations, assuming that the dict returns a tuple?

https://github.com/JuliaLang/julia/blob/b40ae6b79b51c9c83a947e4317e36f957c59ae0b/base/iddict.jl#L175

Edit: @kristoffer.carlsson is probably right with his assumption that the objectid calculation is what is actually causing the allocations. Consider an example where an integer is used as key:

const cache_dict = Dict{Int,Int}()
const cache_iddict = IdDict{Int,Int}()

function func_dict(d)
    _getter() = 1
    get!(_getter, cache_dict, d)
end

function func_iddict(d)
    _getter() = 1
    get!(_getter, cache_iddict, d)
end

function main()
    d = 1
    for i = 1:10
        println("Run $i")
        println("Using Dict: $(@allocated func_dict(d))")
        println("Using IdDict: $(@allocated func_iddict(d))")
    end
end

main()

as well as the calculation of the objectid of a tuple:

96 == @allocated objectid( (Int[0],1.0))

StefanMathis · August 30, 2021, 4:28pm

After some further testing, I found that @kristoffer.carlsson is definitely right about the issue. Consider the following MWE:

a = Int[0, 2, 3, 4]
b = Float64[1.0, 2.0, 45.0]
@allocated objectid((a,b))
@allocated objectid((a,b))

The first run results in 160 allocations, all subsequent ones in 32 allocations. I guess this is unavoidable due to the tuple formation. Therefore, I marked the answer of @kristoffer.carlsson as solution. Thanks to all for the discussion!

Topic		Replies	Views
Can we reuse Dictionary allocation? Performance question , performance	2	482	August 10, 2022
Lookup in Dict{Int,Float64} allocates Performance dictionary , memory-allocation	2	496	January 31, 2022
How can I pre-allocate a dict? General Usage	5	689	April 4, 2023
`IdDict{UInt64, Float64}` is much slower than `Dict{UInt64, Float64}` for retrieving values Performance dictionary	2	94	November 27, 2024
Performance of IdDict vs Dict Performance	13	1397	May 19, 2023

Unexpected allocations when accessing IdDict

Related topics