I want to index a dictionary with a vector or array of keys. If it's possible natively, I'd like something like the code below:
julia> foo = Dict("Ac" => 4, "B" => 1, "d" => 2)
Dict{String, Int64} with 3 entries:
"B" => 1
"Ac" => 4
"d" => 2
julia> bar = ["Ac", "Ac", "d"]
3-element Vector{String}:
"Ac"
"Ac"
"d"
julia> foo[bar]
3-element Vector{Int64}:
4
4
2
Of course, if bar were an array of integers I could use an array of arrays instead, but I need string indices. Is there a good solution?
julia> getindex.(Ref(foo), bar)
3-element Vector{Int64}:
4
4
2
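Side note on why the Ref wrapper is needed (a minimal sketch, reusing the foo and bar from the question): broadcasting normally iterates every argument, and broadcasting over dictionaries is deliberately reserved in Julia, so the Dict has to be wrapped so it acts as a scalar.
foo = Dict("Ac" => 4, "B" => 1, "d" => 2)
bar = ["Ac", "Ac", "d"]
getindex.(Ref(foo), bar)   # [4, 4, 2]; Ref(foo) is treated as a scalar by broadcast
# getindex.(foo, bar)      # errors: broadcasting over dictionaries is reserved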
@rocco_sprmnt21, returning an integer as the default seems to further speed up your solution. Is this a type-stability matter?
I was looking for shorthand for a "not available" value.
I still don't quite understand (euphemism) the type-stability issues, so I don't know how to take them into account in general.
In this case I could use the default Inf as shorthand for "index not found".
Type stability means the function returns the same type every time (or rather, the type of the output is entirely determined by the types of the inputs). Returning Inf won't help; you'll need an Int. -1 or 0 or typemin(Int) are typical choices. If those are meaningful values, maybe there is no good integer choice. If so, you can abandon type stability; I would probably go with missing.
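A minimal sketch of the difference (the sentinel -1 and the key "zzz" are just illustrative choices, not from the thread):
foo = Dict("Ac" => 4, "B" => 1, "d" => 2)
get.(Ref(foo), ["Ac", "zzz"], -1)        # Vector{Int64}: [4, -1], type-stable
get.(Ref(foo), ["Ac", "zzz"], missing)   # Vector{Union{Missing, Int64}}: [4, missing]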
Ah… here's the problem: Inf is only defined for floating-point numbers, not for integers.
I had been told this, but I had forgotten it.
Maybe because I was never quite convinced why there is no Inf for integers.
A small adaptation:
[foo[k] for k in bar if k in keys(foo)]
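One caveat with this filtering version (a small sketch with a hypothetical unknown key "zzz"): absent keys are simply dropped, so the output can end up shorter than bar and silently misaligned with it.
foo = Dict("Ac" => 4, "B" => 1, "d" => 2)
[foo[k] for k in ["Ac", "zzz", "d"] if k in keys(foo)]   # [4, 2]; "zzz" vanished without warning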
@rocco_sprmnt21, silent failures may not be a good idea. Something like this might be better (improved with feedback from @gustaphe):
[haskey(foo, k) ? foo[k] : missing for k in bar]
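With the same hypothetical "zzz" key as above, this version keeps the output aligned with the input:
[haskey(foo, k) ? foo[k] : missing for k in ["Ac", "zzz", "d"]]   # [4, missing, 2]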
Then get.(Ref(foo), bar, missing) is nicer to me.
Or get.((foo,), bar, missing)
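Both wrappers play the same role of shielding foo from broadcasting; a quick sanity check (assuming the foo and bar from the question):
get.(Ref(foo), bar, missing) == get.((foo,), bar, missing)   # true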
They are nicer, but they benchmark here twice as slow as the comprehension. What about on your end?
Also, haskey(y, x) is better than x in keys(y).
ptoche:
That's nice! The list comprehension is very natural in Python, though, which can bias one's appreciation.
Of dubious interest, but possibly worthy of note:
using BenchmarkTools
const dict = Dict(i => j for (j,i) in enumerate('A':'Z'))
const keys = collect('B':'Y')  # note: this shadows Base.keys in Main
@benchmark get.(Ref(dict), keys, missing)
BenchmarkTools.Trial: 10000 samples with 590 evaluations.
Range (min … max):  200.120 ns … 1.306 μs   ┊ GC (min … max): 0.00% … 82.69%
Time  (median):     210.534 ns              ┊ GC (median):    0.00%
Time  (mean ± σ):   219.662 ns ± 55.169 ns  ┊ GC (mean ± σ):  1.39% ± 4.74%
[histogram omitted]
200 ns        Histogram: log(frequency) by time        313 ns <
Memory estimate: 280 bytes, allocs estimate: 3.
@benchmark [haskey(dict, k) ? dict[k] : missing for k in keys]
BenchmarkTools.Trial: 10000 samples with 257 evaluations.
Range (min … max):  299.370 ns … 3.284 μs   ┊ GC (min … max): 0.00% … 89.86%
Time  (median):     307.296 ns              ┊ GC (median):    0.00%
Time  (mean ± σ):   323.600 ns ± 101.617 ns ┊ GC (mean ± σ):  1.00% ± 3.26%
[histogram omitted]
299 ns        Histogram: log(frequency) by time        555 ns <
Memory estimate: 256 bytes, allocs estimate: 1.
But broadcasting is very natural in Julia, and it's faster in this case (indeed, probably in most cases where it applies).
ptoche:
Indeed, broadcasting is great. You're right that the median time is smaller, but there are 3 allocations versus 1 with the comprehension. Is that not a tie of sorts?
We should really try this with a key vector containing some non-existent keys.
julia> using BenchmarkTools
julia> const dict = Dict(i => j for (j, i) in enumerate('A':'N'));
julia> const keys = 'H':'Z';
julia> @benchmark [haskey(dict, k) ? dict[k] : missing for k in keys]
BenchmarkTools.Trial: 10000 samples with 671 evaluations.
Range (min … max):  185.680 ns … 890.450 ns ┊ GC (min … max): 0.00% … 74.12%
Time  (median):     188.527 ns              ┊ GC (median):    0.00%
Time  (mean ± σ):   195.079 ns ± 38.636 ns  ┊ GC (mean ± σ):  1.28% ± 5.32%
[histogram omitted]
186 ns        Histogram: log(frequency) by time        346 ns <
Memory estimate: 448 bytes, allocs estimate: 2.
julia> @benchmark get.(Ref(dict), keys, missing)
BenchmarkTools.Trial: 10000 samples with 651 evaluations.
Range (min … max):  188.880 ns … 1.158 μs   ┊ GC (min … max): 0.00% … 82.22%
Time  (median):     192.018 ns              ┊ GC (median):    0.00%
Time  (mean ± σ):   198.881 ns ± 57.219 ns  ┊ GC (mean ± σ):  2.24% ± 6.31%
[histogram omitted]
189 ns        Histogram: log(frequency) by time        350 ns <
Memory estimate: 472 bytes, allocs estimate: 4.
julia> @benchmark [get(dict, k, missing) for k in keys]
BenchmarkTools.Trial: 10000 samples with 788 evaluations.
Range (min … max):  160.060 ns … 811.277 ns ┊ GC (min … max): 0.00% … 69.18%
Time  (median):     161.849 ns              ┊ GC (median):    0.00%
Time  (mean ± σ):   166.790 ns ± 35.890 ns  ┊ GC (mean ± σ):  1.66% ± 5.97%
[histogram omitted]
160 ns        Histogram: log(frequency) by time        316 ns <
Memory estimate: 448 bytes, allocs estimate: 2.
julia> @benchmark Union{Int64, Missing}[get(dict, k, missing) for k in keys]
BenchmarkTools.Trial: 10000 samples with 903 evaluations.
Range (min … max):  123.165 ns … 622.011 ns ┊ GC (min … max): 0.00% … 77.72%
Time  (median):     124.857 ns              ┊ GC (median):    0.00%
Time  (mean ± σ):   128.293 ns ± 26.097 ns  ┊ GC (mean ± σ):  1.25% ± 4.98%
[histogram omitted]
123 ns        Histogram: log(frequency) by time        210 ns <
Memory estimate: 240 bytes, allocs estimate: 1.
The compromise method is actually the fastest one here. I'm a bit miffed that the more elegant call is worse from that perspective.
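A plausible reading of why the annotated comprehension wins (my speculation, not established in the thread): spelling out the element type lets the comprehension allocate a Vector{Union{Int64, Missing}} directly, skipping the generic collect machinery that infers or widens the element type as values arrive. The result should be identical either way (assuming the const dict and keys defined above):
v1 = [get(dict, k, missing) for k in keys]                        # eltype decided by collect
v2 = Union{Int64, Missing}[get(dict, k, missing) for k in keys]   # eltype fixed up front
eltype(v1) == eltype(v2) == Union{Int64, Missing}                 # expected: true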