Thank you all!
For my current application this is not performance critical, but I guess this being Julia, the idiomatic way is the performant one .
So I benchmarked all of the proposed solutions (on 1000 elements in 20 classes).
The solutions fall into a number of categories:
- The 3 solutions I proposed are all taking more than 50 µs and 1100 allocations
- The solutions by @Per , @stevengj , @kristoffer.carlsson using a function to provide defaults, are the most efficient at 25 µs and 128 allocations
- The solution by @yha using DefaultDict takes about the same time as mine, but half the allocations.
- The solution by @bernhard using
findall
takes 92 µs and 188 allocations (and is similar to how I would probably have done in R),
I’ll try to remember the one by @kristoffer.carlsson , which looks “right” to me.
It of course also works in a comprehension, though that does have a bit of a bad smell:
l = [x => x%3 for x in 1:10];
d = Dict{Int, Vector{Int}}()
[ push!(get!(Vector{Int} , d, y), x) for (x,y) in l ];
d
Thanks again, below is the code and benchmark numbers.
Code
using DataStructures, BenchmarkTools
function f1(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = Dict{Int, Vector{Int}}();
for e in l
push!(d, e[2] => push!(get(d,e[2],Int[]), e[1]))
end
d
end
function f2(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = Dict{Int, Vector{Int}}();
for (x,y) in l
v = get!(d, y) do
Int[]
end
push!(v, x)
end
d
end
function f3(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = Dict{Int, Vector{Int}}();
for (x,y) in l
push!(get!(d,y,Int[]), x)
end
d
end
function f4(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = Dict{Int, Vector{Int}}();
[push!(get!(d,y,Int[]), x) for (x,y) in l];
d
end
function f5(;cases=10, classes = 3)
ldict = Dict(x => x % classes for x in 1:cases)
d = Dict{Int, Vector{Int}}()
vals = collect(values(ldict))
ks = collect(keys(ldict))
for v in unique(vals)
d[v] = ks[findall(isequal(v),vals)]
end
d
end
function f6(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = Dict{Int, Vector{Int}}();
for (x,y) in l
push!(get!(() -> Int[], d, y), x)
end
d
end
function f7(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = DefaultDict(()->Int[])
for (k,v) in l
push!(d[v], k)
end
d
end
function f8(;cases=10, classes = 3)
l = [x => x % classes for x in 1:cases];
d = Dict{Int, Vector{Int}}();
for (x,y) in l
push!(get!(Vector{Int} , d, y), x)
end
d
end
@info "Original solution"
@btime f1(cases = 1000; classes = 20)
@info "Per"
@btime f2(cases = 1000; classes = 20)
@info "tp2750 v2"
@btime f3(cases = 1000; classes = 20)
@info "tp2750 v3"
@btime f4(cases = 1000; classes = 20)
@info "berhard"
@btime f5(cases = 1000; classes = 20)
@info "stevengj"
@btime f6(cases = 1000; classes = 20)
@info "yha"
@btime f7(cases = 1000; classes = 20)
@info "kristoffer.carlsson"
@btime f8(cases = 1000; classes = 20)
[ Info: Original solution
55.791 μs (1108 allocations: 117.09 KiB)
[ Info: Per
24.631 μs (128 allocations: 40.53 KiB)
[ Info: tp2750 v2
50.627 μs (1108 allocations: 117.09 KiB)
[ Info: tp2750 v3
52.914 μs (1109 allocations: 125.03 KiB)
[ Info: berhard
92.287 μs (188 allocations: 88.67 KiB)
[ Info: stevengj
24.479 μs (128 allocations: 40.53 KiB)
[ Info: yha
53.809 μs (617 allocations: 48.17 KiB)
[ Info: kristoffer.carlsson
24.647 μs (128 allocations: 40.53 KiB)