Vector to lists of indices grouped by key

First note that you have shared the code of an incorrect method for group, but this is a minor issue.

The implementation is similar but not identical and it is faster, here is an example:

julia> using SplitApplyCombine

julia> using BenchmarkTools

julia> arr = rand(1:1000, 10^6);

julia> function sol(arr)
           d = Dict{Int, Vector{Int}}()
           for (k, v) in pairs(arr)
               c = get!(d, v, Int[])
               push!(c, k)
           end
           return d
       end
sol (generic function with 1 method)

julia> @benchmark sol($arr)
BenchmarkTools.Trial: 84 samples with 1 evaluation.
 Range (min … max):  51.139 ms … 79.473 ms  ┊ GC (min … max): 5.99% … 5.12%
 Time  (median):     59.537 ms              ┊ GC (median):    6.03%
 Time  (mean ± σ):   59.924 ms ±  4.510 ms  ┊ GC (mean ± σ):  5.82% ± 1.20%

                         ▁ █
  ▃▁▁▁▃▃▃▃▁▃▆▅▃▆▃▃▃▃▃█▆▅▆█▆██▆▃▅▃▃▅▆▁▃▅▅▅▁▃▃▁▃▁▃▃▁▁▁▁▁▁▁▃▅▁▁▃ ▁
  51.1 ms         Histogram: frequency by time        70.8 ms <

 Memory estimate: 82.41 MiB, allocs estimate: 1005018.

julia> @benchmark group($arr, eachindex($arr))
BenchmarkTools.Trial: 158 samples with 1 evaluation.
 Range (min … max):  23.603 ms … 42.635 ms  ┊ GC (min … max): 0.00% … 6.79%
 Time  (median):     31.774 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   31.785 ms ±  3.384 ms  ┊ GC (mean ± σ):  4.44% ± 4.96%

               ▂   ▂ ▂▁ ▁█▂▄ ▅▂ ▄▂▄▁▁▂ ▁
  ▃▁▁▁▅▁▃▃▅▃▅▅▃███▃█▃██▆████▅█████████▆█▅▆▃▃▁▁▁▃▃▅▃▁▁▁▁▁▁▁▁▃▅ ▃
  23.6 ms         Histogram: frequency by time        41.5 ms <

 Memory estimate: 21.45 MiB, allocs estimate: 6024.

To fix the problem with sol one needs to make creation of initial Int[] vector lazy and only done when needed:

julia> function sol2(arr)
           d = Dict{Int, Vector{Int}}()
           for (k, v) in pairs(arr)
               c = get!(() -> Int[], d, v)
               push!(c, k)
           end
           return d
       end
sol2 (generic function with 1 method)

julia> @benchmark sol2($arr)
BenchmarkTools.Trial: 188 samples with 1 evaluation.
 Range (min … max):  22.640 ms … 37.099 ms  ┊ GC (min … max): 0.00% … 3.39%
 Time  (median):     26.503 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   26.695 ms ±  2.338 ms  ┊ GC (mean ± σ):  1.84% ± 1.88%

          ▃▂ ▂ ▇  ▂ ▃▃▃█▂  █
  ▃▃▁▁▄▅█▇██▆█▅█▆▅█▅█████▅██▆▇█▆▅▁▁▃▁▃▄▅▅▁▁▁▁▄▃▃▁▁▄▃▁▁▁▄▁▁▃▃▃ ▃
  22.6 ms         Histogram: frequency by time        33.8 ms <

 Memory estimate: 21.44 MiB, allocs estimate: 6018.
4 Likes