Help understanding Dict allocation

I am trying to track down a huge allocation in my code (found with --track-allocation=user):

      608     ValueDict = Dict{Function, Array{Float64,1}}()
        0     for atom in Atoms
        -         values = zeros(Float64, iEnd)
        0         AtomValues(values, atom, iStart, iEnd)
8347801470         merge!(ValueDict, Dict((atom)=>values))
       -     end

So I made a simple test:

        - function f1(i=50)   # testing
        -     1
        - end
        - 
        - function f2(i=50)   # testing
        -     1
        - end
        - 
        - function testDict()
   800080 	ar = [1.:100000.;]
      736 	fDict = Dict{Function, Array{Float64,1}}()
        - 	
        0 	dicf = Dict(f1=>ar)
  2153071 	merge!(fDict, dicf)
        0 	dicf = Dict(f2=>ar)
  2152399 	merge!(fDict, dicf)
        - 	
  7435020 	println(typeof(dicf))
        - 	
    18512 	aF = [f1, f2]
  2411338 	println(typeof(f1), " ", typeof(aF[1]))
        - 	
   463126  	dicf1 = Dict(aF[1]=>ar)
      336 	merge!(fDict, dicf1)
   306246  	dicf1 = Dict(aF[2]=>ar)
      336 	merge!(fDict, dicf1)
        - 
     2144 	println(typeof(dicf1))
        - 	
        0 	for f in aF
      992 	 	dicf2 = Dict(f=>ar)
        0 		merge!(fDict, dicf2)
     3888 		println(typeof(dicf2))
        - 	end
        - end

This makes me even more confused. Can anyone explain the different allocations in the simple code above?

Presumably you are not interested in the memory cost of compilation:

        - function f1(i=50)   # testing
        -     1
        - end
        - 
        - function f2(i=50)   # testing
        -     1
        - end
        - 
        - function testDict()
   802160 	ar = [1.:100000.;]
      608 	fDict = Dict{Function, Array{Float64,1}}()
        - 
        0 	dicf = Dict(f1=>ar)
      480 	merge!(fDict, dicf)
        0 	dicf = Dict(f2=>ar)
      480 	merge!(fDict, dicf)
        - 
     1744 	println(typeof(dicf))
        - 
      112 	aF = [f1, f2]
     1936 	println(typeof(f1), " ", typeof(aF[1]))
        - 
      496  	dicf1 = Dict(aF[1]=>ar)
        0 	merge!(fDict, dicf1)
      496  	dicf1 = Dict(aF[2]=>ar)
        0 	merge!(fDict, dicf1)
        - 
     1744 	println(typeof(dicf1))
        - 
        0 	for f in aF
      992 	 	dicf2 = Dict(f=>ar)
        0 		merge!(fDict, dicf2)
     3888 		println(typeof(dicf2))
        - 	end
        - end
        - 
        - using Profile
        - testDict()
        - Profile.clear_malloc_data()
        - 
        - testDict()
        - 

although a few places are still a bit odd.


Use ValueDict[atom] = values instead of merge!(ValueDict, Dict((atom)=>values)). The latter allocates an entire new Dict for every entry you want to put in.

A merge-combine that uses only a single hash eval is unfortunately missing in Base, but your combine function is “overwrite” anyway, so you get away with setindex!.
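A minimal sketch of what the setindex! version of the original loop looks like; `fill_values!` is a hypothetical stand-in for the original `AtomValues`, and the function keys are dummies:

```julia
# Hypothetical stand-in for AtomValues(values, atom, iStart, iEnd)
fill_values!(values, x) = (values .= x; values)

f1() = 1
f2() = 2

function build_dict(funcs, n)
    d = Dict{Function, Vector{Float64}}()
    for (i, f) in enumerate(funcs)
        values = zeros(n)
        fill_values!(values, Float64(i))
        d[f] = values   # setindex!: no temporary one-entry Dict per iteration
    end
    return d
end

d = build_dict([f1, f2], 3)
```

The temporary `Dict((atom)=>values)` in the original has to allocate and hash into a fresh table on every iteration, which is where the huge number came from; `setindex!` writes straight into the existing table.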

Also, are you sure that you can’t make your keys concretely typed? A Dict{Function, Foo_T} is barely better than a Dict{Any, Foo_T}.
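A quick way to see why `Function` is an abstract key type: every Julia function has its own singleton type, so two different functions never share a concrete type. A minimal sketch:

```julia
f1() = 1
f2() = 2

# Each function gets its own singleton type, so they never compare equal
same_type = typeof(f1) == typeof(f2)   # false

# `Function` is the tightest common supertype, so this Dict stores its
# keys behind an abstract type, much like Dict{Any, ...} would
d = Dict{Function, Vector{Float64}}(f1 => [1.0], f2 => [2.0])
k = keytype(d)   # Function (abstract)
```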

Thanks to all for the tips.

@foobar_lv2, I want to use the functions like f1, f2 etc. How can I be more concrete than using “Function”?

Are your functions really “function functions” or are they closures (a single parametrized function)?

If they are closures, then you can define

struct Foo_func{T1,T2,T3} <: Function
    param1::T1
    param2::T2
    param3::T3
end

function (foo::Foo_func)(x)
    # do something with x and foo.param1, foo.param2, foo.param3
end

valdict = Dict{Foo_func{Float64, Bool, Int32}, Array{Float64, 1}}()
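For completeness, a self-contained sketch of that callable-struct pattern; the body of the call and the parameter choices are purely illustrative:

```julia
struct Foo_func{T1,T2,T3} <: Function
    param1::T1
    param2::T2
    param3::T3
end

# Illustrative call body: scale by param1, shift by param3, negate if param2
function (foo::Foo_func)(x)
    y = foo.param1 * x + foo.param3
    return foo.param2 ? -y : y
end

f = Foo_func(2.0, false, Int32(3))   # has type Foo_func{Float64, Bool, Int32}
valdict = Dict{Foo_func{Float64, Bool, Int32}, Vector{Float64}}()
valdict[f] = [f(x) for x in 1.0:2.0]   # keys now have a concrete type
```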

PS: Regarding functions as arguments: Think very well on whether you want to dispatch on

function bar(foo::Function, x)
    # ....
end

or

function bar(foo::T, x) where {T<:Function}
    # ....
end

Both are semantically the same, but the second version tells Julia to always specialize on the specific function, while the first allows Julia to use heuristics to decide whether to specialize or not. That can save compile time, but can cost runtime. @code_native, @code_llvm, @code_warntype etc. will lie to you if you use the first definition: they will show you the code for the second definition while Julia may in reality compile the first version (they show the fully specialized method instead of the method that actually exists). This makes for very unfun profiling sessions.
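A minimal sketch of the two styles side by side (names hypothetical); note that when the argument is actually called in the body, Julia's heuristics usually specialize either way, so the difference tends to show up when the function is merely passed along:

```julia
# Style 1: Julia may use heuristics to decide whether to specialize on foo
bar1(foo::Function, x) = foo(x) + 1

# Style 2: the `where` clause forces specialization on the concrete function type
bar2(foo::T, x) where {T<:Function} = foo(x) + 1

double(x) = 2x
r1 = bar1(double, 3)   # 7
r2 = bar2(double, 3)   # 7, same result; only the compilation strategy differs
```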


Both are semantically the same, but the second version tells Julia to always specialize on the specific function, while the first allows Julia to use heuristics to decide whether to specialize or not. That can save compile time, but can cost runtime. @code_native, @code_llvm, @code_warntype etc. will lie to you if you use the first definition: they will show you the code for the second definition while Julia may in reality compile the first version (they show the fully specialized method instead of the method that actually exists). This makes for very unfun profiling sessions.

Do you have an example of that? Or a reference?

https://github.com/JuliaLang/julia/blob/master/doc/src/devdocs/functions.md

I ran into this issue several times on 0.6, and will test more on 1.0 later. The simplest inlined cases apparently get correctly inferred in 1.0.


Interesting read, thanks!