# Collect values in a dict

I have a list of pairs of dog-owners and dog-names.
The dog-names are non-unique, and I want to collect a dict keyed on the dog-names where the values are the owners having a dog of that name.

Here’s an example using numbers to make it simple:

Given this:

``````
julia> l = [x => x%3 for x in 1:10]
10-element Vector{Pair{Int64, Int64}}:
1 => 1
2 => 2
3 => 0
4 => 1
5 => 2
6 => 0
7 => 1
8 => 2
9 => 0
10 => 1
``````

I want this:

``````julia> d
Dict{Int64, Vector{Int64}} with 3 entries:
0 => [3, 6, 9]
2 => [2, 5, 8]
1 => [1, 4, 7, 10]
``````

Here’s a way to do it, but I find it complicated and hard to read:

``````l = [x => x%3 for x in 1:10]
d = Dict{Int, Vector{Int}}()

for e in l
push!(d, e[2] => push!(get(d,e[2],Int[]), e[1]))
end
``````

What is the “idiomatic” way of doing this?

Thanks a lot.

Not sure if it is more “idiomatic”, but I find that `get!` is a very useful function for things like this:

``````for (x,y) in l
v = get!(d, y) do
Int[]
end
push!(v, x)
end
``````
1 Like

Thanks @Per ,

The combination of get! and push! is indeed very nice.

I actually find it easier to read without the temporary `v`:

``````l = [x => x%3 for x in 1:10];
d = Dict{Int, Vector{Int}}();
for (x,y) in l
push!(get!(d,y,Int[]), x)
end
d

Dict{Int64, Vector{Int64}} with 3 entries:
0 => [3, 6, 9]
2 => [2, 5, 8]
1 => [1, 4, 7, 10]
``````

Or even:

``````l = [x => x%3 for x in 1:10];
d = Dict{Int, Vector{Int}}()
[push!(get!(d,y,Int[]), x) for (x,y) in l];
d

Dict{Int64, Vector{Int64}} with 3 entries:
0 => [3, 6, 9]
2 => [2, 5, 8]
1 => [1, 4, 7, 10]
``````

I’ll wait a bit marking yours as Solution to see if others have alternatives.

In your real example is l really a list or a dict? I guess it does not matter too much.

I will provide a ‘nastier’ (and probably slower) solution for the existing suggestion to shine

``````ldict = Dict(x => x%3 for x in 1:10)

d = Dict{Int, Vector{Int}}()
vals = collect(values(ldict))
ks = collect(keys(ldict))
for v in unique(vals)
d[v] = ks[findall(isequal(v),vals)]
end
d
``````

This version allocates a new empty `Int[]` array on every iteration regardless of whether it is actually needed.

If you don’t want to define a variable `v`, you can still do:

``````push(get!(d, y) do; Int[]; end, x)
``````

or equivalently `push(get!(() -> Int[], d, y), x)` (skipping the `do` syntax).

In addition to allocating an `Int[]` array on each iteration, this version also allocates an array of the results of the comprehension (a list of `push!` results), even though you don’t need it. Don’t use comprehensions to write loops if you don’t want to construct an array of results.

I think it’s a bit nicer with a `DefaultDict` rather than explicit `get!`:

``````using DataStructures
d = DefaultDict(()->Int[])
for (k,v) in l
push!(d[v], k)
end
``````
1 Like

Or (my favorite) `push(get!(Vector{Int} , d, y), x)`

5 Likes

Thank you all!

For my current application this is not performance critical, but I guess this being Julia, the idiomatic way is the performant one .

So I benchmarked all of the proposed solutions (on 1000 elements in 20 classes).

The solutions fall into a number of categories:

• The 3 solutions I proposed are all taking more than 50 µs and 1100 allocations
• The solutions by @Per , @stevengj , @kristoffer.carlsson using a function to provide defaults, are the most efficient at 25 µs and 128 allocations
• The solution by @yha using DefaultDict takes about the same time as mine, but half the allocations.
• The solution by @bernhard using `findall` takes 92 µs and 188 allocations (and is similar to how I would probably have done in R),

I’ll try to remember the one by @kristoffer.carlsson , which looks “right” to me.

It of course also works in a comprehension, though that does have a bit of a bad smell:

``````l = [x => x%3 for x in 1:10];
d = Dict{Int, Vector{Int}}()
[ push!(get!(Vector{Int} , d, y), x) for (x,y) in l ];
d
``````

Thanks again, below is the code and benchmark numbers.

Code
``````using  DataStructures, BenchmarkTools

function f1(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = Dict{Int, Vector{Int}}();
for e in l
push!(d, e[2] => push!(get(d,e[2],Int[]), e[1]))
end
d
end

function f2(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = Dict{Int, Vector{Int}}();
for (x,y) in l
v = get!(d, y) do
Int[]
end
push!(v, x)
end
d
end

function f3(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = Dict{Int, Vector{Int}}();
for (x,y) in l
push!(get!(d,y,Int[]), x)
end
d
end

function f4(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = Dict{Int, Vector{Int}}();
[push!(get!(d,y,Int[]), x) for (x,y) in l];
d
end

function f5(;cases=10, classes = 3)
ldict = Dict(x => x % classes for x in 1:cases)

d = Dict{Int, Vector{Int}}()
vals = collect(values(ldict))
ks = collect(keys(ldict))
for v in unique(vals)
d[v] = ks[findall(isequal(v),vals)]
end
d
end

function f6(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = Dict{Int, Vector{Int}}();
for (x,y) in l
push!(get!(() -> Int[], d, y), x)
end
d
end

function f7(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = DefaultDict(()->Int[])
for (k,v) in l
push!(d[v], k)
end
d
end

function f8(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = Dict{Int, Vector{Int}}();
for (x,y) in l
push!(get!(Vector{Int} , d, y), x)
end
d
end

@info "Original solution"
@btime f1(cases = 1000; classes = 20)
@info "Per"
@btime f2(cases = 1000; classes = 20)
@info "tp2750 v2"
@btime f3(cases = 1000; classes = 20)
@info "tp2750 v3"
@btime f4(cases = 1000; classes = 20)
@info "berhard"
@btime f5(cases = 1000; classes = 20)
@info "stevengj"
@btime f6(cases = 1000; classes = 20)
@info "yha"
@btime f7(cases = 1000; classes = 20)
@info "kristoffer.carlsson"
@btime f8(cases = 1000; classes = 20)

[ Info: Original solution
55.791 μs (1108 allocations: 117.09 KiB)
[ Info: Per
24.631 μs (128 allocations: 40.53 KiB)
[ Info: tp2750 v2
50.627 μs (1108 allocations: 117.09 KiB)
[ Info: tp2750 v3
52.914 μs (1109 allocations: 125.03 KiB)
[ Info: berhard
92.287 μs (188 allocations: 88.67 KiB)
[ Info: stevengj
24.479 μs (128 allocations: 40.53 KiB)
[ Info: yha
53.809 μs (617 allocations: 48.17 KiB)
[ Info: kristoffer.carlsson
24.647 μs (128 allocations: 40.53 KiB)
``````

The difference here is due to the `DefaultDict` being untyped, which is compared to a typed dict in the other benchmarks. If you use `d = DefaultDict{Int,Vector{Int}}(()->Int[])` it performs the same as the `get!`-based solutions on my machine.
Surprisingly, I see that `DefaultDict{Int,Vector{Int}}(Vector{Int})` performs worse, which doesn’t happen with `get!(Vector{Int}...)`.

1 Like

Thank you for the clarification @yha

DefaultDict definitely gives the shortest and cleanest solution.