Dict takes a very long time to load from a JLD file... Why 22 seconds?

My Dict:

```julia
julia> using JLD

julia> DTT_HH
Dict{Int64,Any} with 381701 entries:
61670 => 4150042
119601 => 5600255
136672 => 8300228
231648 => 5600221
⋮ => ⋮

julia> save("TT_Dicts.jld", "DTT_HH", DTT_HH)

julia> @time x = load("TT_Dicts.jld", "DTT_HH")
22.373567 seconds (10.89 M allocations: 476.183 MB, 9.09% gc time)
Dict{Int64,Any} with 381701 entries:
247825 => 6300114
43031 => 7500068
117566 => 5700126
231648 => 5600221
⋮ => ⋮
```

A similar Dict loads in about 0.28 seconds:

```julia
julia> D = Dict(zip([1:1:500000;], rand(500000)))
Dict{Int64,Float64} with 500000 entries:
247825 => 0.40960171167545956
450728 => 0.7363745698597213
43031 => 0.8869847906671853
253410 => 0.03773283268230743
⋮ => ⋮

julia> save("Dict.jld", "D", D)

julia> @time D1 = load("Dict.jld", "D")
0.281323 seconds (563 allocations: 33.155 MB, 77.93% gc time)
Dict{Int64,Float64} with 500000 entries:
247825 => 0.40960171167545956
450728 => 0.7363745698597213
43031 => 0.8869847906671853
349542 => 0.507127647847206
⋮ => ⋮
```

```julia
julia> DTT_HH
Dict{Int64,Any} with 381701 entries:
```

What should I do about the `Any` value type?
Any ideas?
Paul

Maybe. What happens if you make that same dictionary strictly typed?

On 2017-01-21 21:18, Christopher Rackauckas wrote:

> What happens if you make that same dictionary strictly typed?

Sorry, what do you mean by “strictly typed”?
Paul

The `Dict{K,V}` notation means its keys have type `K` and its values have type `V`. In your slow example, `Dict{Int64,Any}`, the keys are `Int64`, but there is no type information available about the values (hence they are recorded as type `Any`). It may perform faster if, instead of `Any`, the values all have the same concrete type *and this is known to the compiler*. That is what happened in your fast example, `Dict{Int64,Float64}`: all the values were guaranteed to have the same type, `Float64`. This situation is called “strictly typed”, unlike the case where values have type `Any`. Having strict type information can allow Julia to make significant optimisations.
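To make this concrete, here is a minimal sketch in current Julia (1.x) on a small, hypothetical stand-in for `DTT_HH` (the three entries are copied from the REPL output above; the names `loose` and `tight` are made up for illustration). It inspects the value type and then rebuilds the same entries with a concrete value type. It only shows the type change; whether that alone brings the JLD load time down to the `Float64` case is exactly what the suggestion above is meant to test.

```julia
# Hypothetical stand-in for DTT_HH: Int64 keys, values stored with type Any
loose = Dict{Int64,Any}(61670 => 4150042, 119601 => 5600255, 136672 => 8300228)

println(keytype(loose), " => ", valtype(loose))   # Int64 => Any
println(isconcretetype(valtype(loose)))           # false: Any is an abstract type

# Rebuild the same entries with a concrete value type. Each value is converted
# to Int64 on insertion; this throws if some value cannot be converted.
tight = Dict{Int64,Int64}(loose)

println(keytype(tight), " => ", valtype(tight))   # Int64 => Int64
println(isconcretetype(valtype(tight)))           # true
println(tight == loose)                           # true: same key/value pairs
```

On the real data this would mean saving `Dict{Int64,Int64}(DTT_HH)` instead of `DTT_HH` and timing the load again, assuming every value really is an integer.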


On 2017-01-22 12:24, felix wrote:

> The `Dict{K,V}` notation means its keys have type `K` and its values have type `V`. [...]

OK, but what should I do if my values are strings?

```julia
julia> TT_baza = readcsv("HH_baza/TT_baza.txt")
381701x3 Array{Any,2}:
1 100139 "T1_0"
1 100143 "T1_0"
1 100147 "T1_0"

julia> @time DTT = Dict(zip(1:size(TT_baza,1), TT_baza[:,3]))
0.765525 seconds (2.82 M allocations: 93.261 MB, 9.13% gc time)
Dict{Int64,Any} with 381701 entries:
247825 => "T1_0"
43031 => "T2_0"
349542 => "T1_0"
```

Paul

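That array is still typed with `Any`: `readcsv` returned an `Array{Any,2}` because the file mixes numbers and strings, so the column `TT_baza[:,3]` is an `Array{Any,1}` as well, and `Dict(zip(...))` therefore produces a `Dict{Int64,Any}`.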
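For string values the same idea applies: convert the column to a concrete element type before building the Dict. A minimal sketch, using a hypothetical three-row `Any` matrix in place of the real `TT_baza` (the rows imitate the `readcsv` output above; the third label is invented). Note that `readcsv` no longer exists in Julia 1.0 and later; `readdlm(path, ',')` from the DelimitedFiles package reads the same kind of file.

```julia
# Hypothetical stand-in for TT_baza: a mixed matrix, so its element type is Any
TT_baza = Any[1 100139 "T1_0";
              1 100143 "T1_0";
              1 100147 "T2_0"]

col = TT_baza[:, 3]
println(eltype(col))                # Any

# Convert each label to String; the result is a Vector{String}
labels = String.(col)
println(eltype(labels))             # String

# zip of a range of Ints and a Vector{String} gives Int keys and String values
DTT = Dict(zip(1:size(TT_baza, 1), labels))
println(valtype(DTT))               # String
```

`String.(...)` throws a `MethodError` if the column contains a non-string such as a number; in that case `string.(...)` converts every element to its text form instead.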

I ran into a similar problem with JLD, where loading my data took over 20 minutes. I'm leaving this here in case anybody runs into something similar: with JLD2, the same data loaded in only 50 seconds.

```julia
using JLD2
@time load("outfiles/aggregated_results.jld2")
 49.773390 seconds (9.72 M allocations: 2.651 GiB, 7.02% gc time)
Dict{String,Any} with 7 entries:
```