How to get values of a dictionary using an array of keys?

I want to access a dictionary with a vector or array of keys. If it's possible natively, I'd like something like the code below:

julia> foo = Dict("Ac" => 4, "B" => 1, "d" => 2)
Dict{String, Int64} with 3 entries:
  "B"  => 1
  "Ac" => 4
  "d"  => 2

julia> bar = ["Ac", "Ac", "d"]
3-element Vector{String}:
 "Ac"
 "Ac"
 "d"

julia> foo[bar]
3-element Vector{Int64}:
 4
 4
 2

Of course, if bar were an array of integers I could index an array with an array, but I need string indices. Is there a good solution?

julia> getindex.(Ref(foo), bar)
3-element Vector{Int64}:
 4
 4
 2
get.([foo], bar, "na")

@rocco_sprmnt21, returning an integer as the default seems to speed up your solution further. Is that a type-stability matter?

Or

[foo[i] for i in bar]

I was looking for an abbreviation for a "not available" value.
I still don't quite understand (euphemism) type-stability issues, so I don't know how to take them into account in general.
In this case I could use the default "Inf" as shorthand for "index not found" :smiley:

Type stability means the function returns the same type every time (or rather, the type of the output is entirely determined by the types of the inputs). Returning Inf won't help; you'll need an Int. -1, 0, or typemin(Int) are typical choices. If those are meaningful values, maybe there is no good integer choice. If so, you can abandon type stability; I would probably go with missing.
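A minimal sketch of the two options, reusing the dictionary from the question ("x" is a made-up key absent from foo):

```julia
foo = Dict("Ac" => 4, "B" => 1, "d" => 2)
bar = ["Ac", "x", "d"]  # "x" is not a key of foo

# Sentinel integer default: the result stays a plain Vector{Int64}
get.(Ref(foo), bar, -1)       # [4, -1, 2]

# missing default: the element type widens to Union{Missing, Int64}
get.(Ref(foo), bar, missing)  # [4, missing, 2]
```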


Ah … here's the problem: Inf is only defined for floating-point numbers, not for integers.
I had been told this, but I had forgotten it.
Maybe because I was never quite convinced why there is no Inf for integers.

A small adaptation:

[foo[k] for k in bar if k in keys(foo)]

@rocco_sprmnt21, silent failures may not be a good idea. Something like this might be better (improved with feedback from @gustaphe):

[haskey(foo, k) ? foo[k] : missing for k in bar] 

Then get.(Ref(foo), bar, missing) is nicer to me.

Or get.((foo,), bar, missing)
They are nicer, but here they benchmark about twice as slow as the comprehension.
What about on your end?

Also haskey(y, x) is better than x in keys(y).
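For what it's worth, the two membership tests agree on results; haskey just states the intent directly (a quick check with a throwaway dictionary):

```julia
d = Dict("Ac" => 4, "B" => 1, "d" => 2)

# Equivalent checks for a present and an absent key
haskey(d, "Ac") == ("Ac" in keys(d))  # both true
haskey(d, "z") == ("z" in keys(d))    # both false
```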


That’s nice! :grinning: The list comprehension is very natural in Python though, which can bias one’s appreciation.

Of dubious interest, but possibly worthy of note:

using BenchmarkTools
const dict = Dict(i => j for (j,i) in enumerate('A':'Z'))
const keys = collect('B':'Y')

@benchmark get.(Ref(dict), keys, missing)
BenchmarkTools.Trial: 10000 samples with 590 evaluations.
 Range (min … max):  200.120 ns …  1.306 ΞΌs  β”Š GC (min … max): 0.00% … 82.69%
 Time  (median):     210.534 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   219.662 ns Β± 55.169 ns  β”Š GC (mean Β± Οƒ):  1.39% Β±  4.74%

   β–ƒβ–‡β–‡β–ˆβ–‡β–†β–†β–†β–…β–„β–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–ƒβ–‚β–                                         β–‚
  β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–‡β–‡β–‡β–‡β–…β–‡β–‡β–‡β–‡β–†β–‡β–†β–…β–‡β–…β–…β–…β–†β–…β–…β–„β–…β–…β–…β–‡β–†β–†β–†β–…β–„β–…β–„ β–ˆ
  200 ns        Histogram: log(frequency) by time       313 ns <

 Memory estimate: 280 bytes, allocs estimate: 3.

@benchmark [haskey(dict, k) ? dict[k] : missing for k in keys] 
BenchmarkTools.Trial: 10000 samples with 257 evaluations.
 Range (min … max):  299.370 ns …   3.284 ΞΌs  β”Š GC (min … max): 0.00% … 89.86%
 Time  (median):     307.296 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   323.600 ns Β± 101.617 ns  β”Š GC (mean Β± Οƒ):  1.00% Β±  3.26%

  β–ˆβ–…β–†β–„β–„β–ƒβ– ▁▄▂▁▁                                                 ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–†β–†β–‡β–†β–‡β–ˆβ–‡β–†β–…β–…β–†β–…β–†β–†β–†β–†β–†β–†β–…β–…β–ƒβ–†β–†β–…β–…β–…β–…β–…β–„β–ƒβ–ƒβ–‚β–„β–„β–„β–‚β–„β–„β–„β–„β–ƒβ–„β–„ β–ˆ
  299 ns        Histogram: log(frequency) by time        555 ns <

 Memory estimate: 256 bytes, allocs estimate: 1.

But broadcasting is very natural in Julia, and faster in this case (indeed probably in most cases where it applies) :smiley:


Indeed, broadcasting is great. You're right that the median time is smaller, but there are 3 allocations vs. 1 allocation with the comprehension. Is that not a tie of sorts?

We should really be testing with a key vector that contains some nonexistent keys.

julia> using BenchmarkTools
julia> const dict = Dict(i => j for (j, i) in enumerate('A':'N'));
julia> const keys = 'H':'Z';
julia> @benchmark [haskey(dict, k) ? dict[k] : missing for k in keys]
BenchmarkTools.Trial: 10000 samples with 671 evaluations.
 Range (min … max):  185.680 ns … 890.450 ns  β”Š GC (min … max): 0.00% … 74.12%
 Time  (median):     188.527 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   195.079 ns Β±  38.636 ns  β”Š GC (mean Β± Οƒ):  1.28% Β±  5.32%

  β–ˆβ–ˆβ–„β–‚β–                                                         β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–†β–…β–„β–†β–„β–‡β–‡β–†β–„β–β–„β–β–„β–…β–ƒβ–„β–‡β–†β–„β–…β–…β–†β–„β–…β–β–„β–„β–ƒβ–ƒβ–„β–ƒβ–„β–„β–„β–„β–†β–ƒβ–„β–…β–†β–β–…β–…β–„β–…β–„β–…β–„β–…β–…β–‡β–ˆβ–†β–‡ β–ˆ
  186 ns        Histogram: log(frequency) by time        346 ns <

 Memory estimate: 448 bytes, allocs estimate: 2.

julia> @benchmark get.(Ref(dict), keys, missing)
BenchmarkTools.Trial: 10000 samples with 651 evaluations.
 Range (min … max):  188.880 ns …  1.158 ΞΌs  β”Š GC (min … max): 0.00% … 82.22%
 Time  (median):     192.018 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   198.881 ns Β± 57.219 ns  β”Š GC (mean Β± Οƒ):  2.24% Β±  6.31%

  β–‡β–ˆβ–…β–                                                         β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ƒβ–„β–ƒβ–β–…β–…β–„β–…β–„β–β–ƒβ–ƒβ–β–β–β–β–ƒβ–β–ƒβ–β–β–β–β–„β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–β–ƒβ–„β–…β–„β–„β–…β–„β–„β–„β–…β–„β–…β–…β–„β–†β–‡β–‡β–‡ β–ˆ
  189 ns        Histogram: log(frequency) by time       350 ns <

 Memory estimate: 472 bytes, allocs estimate: 4.

julia> @benchmark [get(dict, k, missing) for k in keys]
BenchmarkTools.Trial: 10000 samples with 788 evaluations.
 Range (min … max):  160.060 ns … 811.277 ns  β”Š GC (min … max): 0.00% … 69.18%
 Time  (median):     161.849 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   166.790 ns Β±  35.890 ns  β”Š GC (mean Β± Οƒ):  1.66% Β±  5.97%

  β–ˆβ–…β–ƒ                                                           ▁
  β–ˆβ–ˆβ–ˆβ–‡β–…β–†β–‡β–…β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–β–β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–β–β–β–β–β–β–ƒβ–β–β–β–β–β–ƒβ–ƒβ–β–β–„β–…β–„β–„β–„β–„β–ƒβ–…β–„β–…β–†β–‡β–…β–‡ β–ˆ
  160 ns        Histogram: log(frequency) by time        316 ns <

 Memory estimate: 448 bytes, allocs estimate: 2.

julia> @benchmark Union{Int64, Missing}[get(dict, k, missing) for k in keys]
BenchmarkTools.Trial: 10000 samples with 903 evaluations.
 Range (min … max):  123.165 ns … 622.011 ns  β”Š GC (min … max): 0.00% … 77.72%
 Time  (median):     124.857 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   128.293 ns Β±  26.097 ns  β”Š GC (mean Β± Οƒ):  1.25% Β±  4.98%

  β–‡β–ˆβ–„β–ƒ                                                          ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–…β–ƒβ–„β–β–ƒβ–ƒβ–β–β–„β–„β–β–β–ƒβ–β–β–β–β–ƒβ–ƒβ–β–β–β–β–ƒβ–β–ƒβ–β–β–ƒβ–β–β–„β–…β–…β–…β–†β–„β–β–…β–†β–„β–„β–†β–…β–„β–„β–…β–„β–ƒβ–…β–†β–…β–…β–‡β–ˆ β–ˆ
  123 ns        Histogram: log(frequency) by time        210 ns <

 Memory estimate: 240 bytes, allocs estimate: 1.

The compromise method (the comprehension with an explicit Union{Int64, Missing} element type) is actually the fastest one here. I'm a bit miffed that the more elegant call is worse from that perspective.
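One plausible reading of the difference, sketched with the same setup as the benchmark above (the variable keys is renamed ks here to avoid shadowing Base.keys): the untyped comprehension lets collect work out the element type while filling the array, whereas the annotated form fixes the eltype up front. Either way the values are identical:

```julia
dict = Dict(i => j for (j, i) in enumerate('A':'N'))
ks = 'H':'Z'  # 'O':'Z' are absent, so some lookups return the default

a = [get(dict, k, missing) for k in ks]                      # eltype decided during collection
b = Union{Int, Missing}[get(dict, k, missing) for k in ks]   # eltype fixed up front

isequal(a, b)  # true
```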
