MATLAB-like `unique` where you can recreate the original with indices

iamed2 · March 16, 2022, 5:37pm

@mcabbott mentioned GroupSlices.jl, which has all the functions necessary to generate the vectors, but is several times slower than `unique.

I managed to get an implementation that only works for vectors that replicates what Octave does, and it’s pretty close to the performance of unique while providing the two extra vectors. Unfortunately, it’s not the “stable” unique that Julia currently has (that preserves order of first appearance), it’s a “sorted” unique.

julia> @benchmark unique(A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   80.909 μs …  12.138 ms  ┊ GC (min … max):  0.00% … 98.59%
 Time  (median):      86.341 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   104.336 μs ± 301.032 μs  ┊ GC (mean ± σ):  10.56% ±  3.68%

  ▅▆▆█▇▄▂▁▁     ▁▂▃▃▃▂▂▂▂▁▁▁▁                                   ▂
  █████████████████████████████▇▇▇▆▇█▇▇▇▇▇▇▇██▇▇▆▅▆▆▅▅▅▅▆▅▆▅▅▅▅ █
  80.9 μs       Histogram: log(frequency) by time        166 μs <

 Memory estimate: 65.95 KiB, allocs estimate: 27.

julia> @benchmark matlab_unique(A)  # 3-output with GroupSlices
BenchmarkTools.Trial: 2633 samples with 1 evaluation.
 Range (min … max):  1.364 ms … 21.726 ms  ┊ GC (min … max):  0.00% … 92.33%
 Time  (median):     1.578 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   1.890 ms ±  1.643 ms  ┊ GC (mean ± σ):  12.27% ± 13.07%

   █
  ▇█▅▃▂▂▂▂▂▂▂▁▁▂▂▂▁▂▁▁▁▁▁▂▂▁▂▂▂▁▂▂▂▁▁▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▁▂▂▂▂▂▂▂ ▂
  1.36 ms        Histogram: frequency by time        10.8 ms <

 Memory estimate: 1.15 MiB, allocs estimate: 3840.

julia> @benchmark matlab_unique2(A)  # 3-output using sortperm, comparison, and index math, like Octave or old MATLAB
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   85.626 μs … 29.064 ms  ┊ GC (min … max):  0.00% … 99.43%
 Time  (median):     120.276 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   259.169 μs ±  1.356 ms  ┊ GC (mean ± σ):  50.93% ± 10.34%

  █▂                                                           ▁
  ██▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▄▁▃▁▃▄▄ █
  85.6 μs       Histogram: log(frequency) by time      7.23 ms <

 Memory estimate: 498.28 KiB, allocs estimate: 21.

julia> @benchmark matlab_unique3(A)  # PooledArray, only returns 1st and 3rd outputs
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   89.099 μs …  30.173 ms  ┊ GC (min … max):  0.00% … 99.31%
 Time  (median):     101.688 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   152.401 μs ± 792.605 μs  ┊ GC (mean ± σ):  27.92% ±  6.43%

  ▄▆▆▇▇█▇▄▃▂▂▂▂▂▃▃▄▃▃▂      ▁▁▁▁                                ▂
  ████████████████████████████████▇▅▇▆▆▇▇█▇█▇▆▆▆▃▅▃▆▆▆▇▆▇▆▄▅▄▃▅ █
  89.1 μs       Histogram: log(frequency) by time        229 μs <

 Memory estimate: 204.91 KiB, allocs estimate: 36.

The PooledArray approach is promising as it matches the output of unique; it would be nice to have a good way to get ia from that. Using indexin would nearly double the runtime, placing it above matlab_unique2.

Topic		Replies	Views
Unique (indices) method similar to MATLAB General Usage question	7	3630	December 8, 2023
Equal method to matlab unique in rows General Usage question	2	345	November 14, 2022
Is there a function similar to numpy unique with inverse? General Usage question	5	1904	May 12, 2022
Get indices of unique rows in 2D array General Usage indexing , matrix	12	449	January 25, 2024
Unique rows indexes in array General Usage	14	2958	September 15, 2020

MATLAB-like `unique` where you can recreate the original with indices

Related topics