How to get values of a dictionary using an array of keys?

I want to access a dictionary with a vector or array of keys. If it's possible natively, I'd like something like the code below:

julia> foo = Dict("Ac" => 4, "B" => 1, "d" => 2)
Dict{String, Int64} with 3 entries:
  "B"  => 1
  "Ac" => 4
  "d"  => 2

julia> bar = ["Ac", "Ac", "d"]
3-element Vector{String}:
 "Ac"
 "Ac"
 "d"

julia> foo[bar]
3-element Vector{Int64}:
 4
 4
 2

Of course, if bar were an array of integers I could index an array with an array, but I need string indices. Is there a good solution?

julia> getindex.(Ref(foo), bar)
3-element Vector{Int64}:
 4
 4
 2
get.([foo], bar, "na")

@rocco_sprmnt21, returning an integer as the default seems to speed up your solution further. Is that a type-stability matter?

Or

[foo[i] for i in bar]

I was looking for an abbreviation for a "not available" value.
I still don't quite understand (euphemism) type-stability issues, so I don't know how to take them into account in general.
In this case I could use the default "Inf" as shorthand for "index not found" :smiley:

Type stability means the function returns the same type every time (or rather, the type of the output is entirely determined by the types of the inputs). Returning Inf won't help; you'll need an Int. -1, 0, or typemin(Int) are typical choices. If those are meaningful values, maybe there is no good integer choice. If so, you can abandon type stability; I would probably go with missing.
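A minimal sketch of the two options, reusing the dictionary from the question ("x" is a made-up key absent from foo):

```julia
foo = Dict("Ac" => 4, "B" => 1, "d" => 2)
bar = ["Ac", "x", "d"]  # "x" is not a key of foo

# Sentinel integer default: the result stays a plain Vector{Int64}
get.(Ref(foo), bar, -1)       # [4, -1, 2]

# missing default: the element type widens to Union{Missing, Int64}
get.(Ref(foo), bar, missing)  # [4, missing, 2]
```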


Ah … here's the problem: Inf is only defined for floating-point numbers, not for integers.
I had been told this, but I had forgotten it.
Maybe because I was never quite convinced why there is no Inf for integers.

A small adaptation:

[foo[k] for k in bar if k in keys(foo)]

@rocco_sprmnt21, silent failures may not be a good idea. Something like this might be better (improved with feedback from @gustaphe):

[haskey(foo, k) ? foo[k] : missing for k in bar] 

Then get.(Ref(foo), bar, missing) is nicer to me.

Or get.((foo,), bar, missing)
They are nicer, but here they benchmark about twice as slow as the comprehension.
What about on your end?

Also haskey(y, x) is better than x in keys(y).
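For what it's worth, the two membership tests agree on results; haskey just states the intent directly (a quick check with a throwaway dictionary):

```julia
d = Dict("Ac" => 4, "B" => 1, "d" => 2)

# Equivalent checks for a present and an absent key
haskey(d, "Ac") == ("Ac" in keys(d))  # both true
haskey(d, "z") == ("z" in keys(d))    # both false
```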


That’s nice! :grinning: The list comprehension is very natural in Python though, which can bias one’s appreciation.

Of dubious interest, but possibly worthy of note:

using BenchmarkTools
const dict = Dict(i => j for (j,i) in enumerate('A':'Z'))
const keys = collect('B':'Y')

@benchmark get.(Ref(dict), keys, missing)
BenchmarkTools.Trial: 10000 samples with 590 evaluations.
 Range (min … max):  200.120 ns …  1.306 ΞΌs  β”Š GC (min … max): 0.00% … 82.69%
 Time  (median):     210.534 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   219.662 ns Β± 55.169 ns  β”Š GC (mean Β± Οƒ):  1.39% Β±  4.74%

   β–ƒβ–‡β–‡β–ˆβ–‡β–†β–†β–†β–…β–„β–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–ƒβ–‚β–                                         β–‚
  β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–‡β–‡β–‡β–‡β–…β–‡β–‡β–‡β–‡β–†β–‡β–†β–…β–‡β–…β–…β–…β–†β–…β–…β–„β–…β–…β–…β–‡β–†β–†β–†β–…β–„β–…β–„ β–ˆ
  200 ns        Histogram: log(frequency) by time       313 ns <

 Memory estimate: 280 bytes, allocs estimate: 3.

@benchmark [haskey(dict, k) ? dict[k] : missing for k in keys] 
BenchmarkTools.Trial: 10000 samples with 257 evaluations.
 Range (min … max):  299.370 ns …   3.284 ΞΌs  β”Š GC (min … max): 0.00% … 89.86%
 Time  (median):     307.296 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   323.600 ns Β± 101.617 ns  β”Š GC (mean Β± Οƒ):  1.00% Β±  3.26%

  β–ˆβ–…β–†β–„β–„β–ƒβ– ▁▄▂▁▁                                                 ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–†β–†β–‡β–†β–‡β–ˆβ–‡β–†β–…β–…β–†β–…β–†β–†β–†β–†β–†β–†β–…β–…β–ƒβ–†β–†β–…β–…β–…β–…β–…β–„β–ƒβ–ƒβ–‚β–„β–„β–„β–‚β–„β–„β–„β–„β–ƒβ–„β–„ β–ˆ
  299 ns        Histogram: log(frequency) by time        555 ns <

 Memory estimate: 256 bytes, allocs estimate: 1.

But broadcasting is very natural in Julia, and faster in this case (indeed probably in most cases where it applies) :smiley:


Indeed, broadcasting is great. You're right that the median time is smaller, but there are 3 allocations vs. 1 allocation with the comprehension. Is that not a tie of sorts?

We should really be testing with a key vector that contains some nonexistent keys.

julia> using BenchmarkTools
julia> const dict = Dict(i => j for (j, i) in enumerate('A':'N'));
julia> const keys = 'H':'Z';
julia> @benchmark [haskey(dict, k) ? dict[k] : missing for k in keys]
BenchmarkTools.Trial: 10000 samples with 671 evaluations.
 Range (min … max):  185.680 ns … 890.450 ns  β”Š GC (min … max): 0.00% … 74.12%
 Time  (median):     188.527 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   195.079 ns Β±  38.636 ns  β”Š GC (mean Β± Οƒ):  1.28% Β±  5.32%

  β–ˆβ–ˆβ–„β–‚β–                                                         β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–†β–…β–„β–†β–„β–‡β–‡β–†β–„β–β–„β–β–„β–…β–ƒβ–„β–‡β–†β–„β–…β–…β–†β–„β–…β–β–„β–„β–ƒβ–ƒβ–„β–ƒβ–„β–„β–„β–„β–†β–ƒβ–„β–…β–†β–β–…β–…β–„β–…β–„β–…β–„β–…β–…β–‡β–ˆβ–†β–‡ β–ˆ
  186 ns        Histogram: log(frequency) by time        346 ns <

 Memory estimate: 448 bytes, allocs estimate: 2.

julia> @benchmark get.(Ref(dict), keys, missing)
BenchmarkTools.Trial: 10000 samples with 651 evaluations.
 Range (min … max):  188.880 ns …  1.158 ΞΌs  β”Š GC (min … max): 0.00% … 82.22%
 Time  (median):     192.018 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   198.881 ns Β± 57.219 ns  β”Š GC (mean Β± Οƒ):  2.24% Β±  6.31%

  β–‡β–ˆβ–…β–                                                         β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ƒβ–„β–ƒβ–β–…β–…β–„β–…β–„β–β–ƒβ–ƒβ–β–β–β–β–ƒβ–β–ƒβ–β–β–β–β–„β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–β–ƒβ–„β–…β–„β–„β–…β–„β–„β–„β–…β–„β–…β–…β–„β–†β–‡β–‡β–‡ β–ˆ
  189 ns        Histogram: log(frequency) by time       350 ns <

 Memory estimate: 472 bytes, allocs estimate: 4.

julia> @benchmark [get(dict, k, missing) for k in keys]
BenchmarkTools.Trial: 10000 samples with 788 evaluations.
 Range (min … max):  160.060 ns … 811.277 ns  β”Š GC (min … max): 0.00% … 69.18%
 Time  (median):     161.849 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   166.790 ns Β±  35.890 ns  β”Š GC (mean Β± Οƒ):  1.66% Β±  5.97%

  β–ˆβ–…β–ƒ                                                           ▁
  β–ˆβ–ˆβ–ˆβ–‡β–…β–†β–‡β–…β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–β–β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–β–β–β–β–β–β–ƒβ–β–β–β–β–β–ƒβ–ƒβ–β–β–„β–…β–„β–„β–„β–„β–ƒβ–…β–„β–…β–†β–‡β–…β–‡ β–ˆ
  160 ns        Histogram: log(frequency) by time        316 ns <

 Memory estimate: 448 bytes, allocs estimate: 2.

julia> @benchmark Union{Int64, Missing}[get(dict, k, missing) for k in keys]
BenchmarkTools.Trial: 10000 samples with 903 evaluations.
 Range (min … max):  123.165 ns … 622.011 ns  β”Š GC (min … max): 0.00% … 77.72%
 Time  (median):     124.857 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   128.293 ns Β±  26.097 ns  β”Š GC (mean Β± Οƒ):  1.25% Β±  4.98%

  β–‡β–ˆβ–„β–ƒ                                                          ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–…β–ƒβ–„β–β–ƒβ–ƒβ–β–β–„β–„β–β–β–ƒβ–β–β–β–β–ƒβ–ƒβ–β–β–β–β–ƒβ–β–ƒβ–β–β–ƒβ–β–β–„β–…β–…β–…β–†β–„β–β–…β–†β–„β–„β–†β–…β–„β–„β–…β–„β–ƒβ–…β–†β–…β–…β–‡β–ˆ β–ˆ
  123 ns        Histogram: log(frequency) by time        210 ns <

 Memory estimate: 240 bytes, allocs estimate: 1.

The compromise method (the comprehension with an explicit Union{Int64, Missing} element type) is actually the fastest one here. I'm a bit miffed that the more elegant call is worse from that perspective.
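One plausible reading of the difference, sketched with the same setup as the benchmark above (the variable keys is renamed ks here to avoid shadowing Base.keys): the untyped comprehension lets collect work out the element type while filling the array, whereas the annotated form fixes the eltype up front. Either way the values are identical:

```julia
dict = Dict(i => j for (j, i) in enumerate('A':'N'))
ks = 'H':'Z'  # 'O':'Z' are absent, so some lookups return the default

a = [get(dict, k, missing) for k in ks]                      # eltype decided during collection
b = Union{Int, Missing}[get(dict, k, missing) for k in ks]   # eltype fixed up front

isequal(a, b)  # true
```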
