Question about Ref and about unique

gitboy16 · October 22, 2021, 10:01am

Please see the below code:

x = rand(10_000_000);
@time unique(x);
# 1.6 seconds
@time unique(Ref(x))[1];
# 0.006373 seconds
unique(x) == unique(Ref(x))[1]
# true

Why would anyone use unique on it owns instead of unique + Ref ?
Can’t Julia optimise this automatically?

Thank you

jipolanco · October 22, 2021, 10:11am

They are not the same:

x = [2, 3, 2, 3]
unique(x)                # [2, 3]
unique(Ref(x))[1]        # [2, 3, 2, 3]
unique(Ref(x))[1] === x  # true

Note that unique(Ref(x))[1] just returns x, since Ref(x) is equivalent here to a vector with a single element x.

carstenbauer · October 22, 2021, 10:12am

julia> x = rand(1:10, 10);

julia> x
10-element Vector{Int64}:
  3
  6
  1
  8
  7
  7
 10
  6
  1
  3

julia> unique(x)
6-element Vector{Int64}:
  3
  6
  1
  8
  7
 10

julia> unique(Ref(x))
1-element Vector{Vector{Int64}}:
 [3, 6, 1, 8, 7, 7, 10, 6, 1, 3]

Note that unique(Ref(x)) doesn’t do what you think it does. Ref(x) effectively makes x “look like a scalar” for unique which is why it just returns x itself - the only unique element - in a vector. And, of course, that’s much cheaper / faster than actually finding the unique values.

gitboy16 · October 22, 2021, 10:21am

Thank you.
Unique in Julia seems to be very slow compare to unique in R for the example I provided … not sure to understand why…

x = runif(1e7)
system.time(unique(x))
# 0.85 sec

jipolanco · October 22, 2021, 10:50am

If you don’t care about the order of the returned elements, you can try something like the following, which seems to reduce the timings by a factor ~2:

x = rand(10_000_000);

# Assumes `xs` is already sorted.
function unique_sorted(xs)
    xprev = first(xs)
    ys = [xprev]
    for x ∈ xs
        if x != xprev
            push!(ys, x)
        end
        xprev = x
    end
    ys
end

function unique_sort(xs)
    ys = sort(xs)
    unique_sorted(ys)
end


julia> using BenchmarkTools

julia> @benchmark unique($x)
BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min … max):  1.813 s …    2.103 s  ┊ GC (min … max): 4.31% … 5.35%
 Time  (median):     2.045 s               ┊ GC (median):    3.82%
 Time  (mean ± σ):   1.987 s ± 153.230 ms  ┊ GC (mean ± σ):  3.32% ± 2.64%

  █                                             █          █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁█ ▁
  1.81 s         Histogram: frequency by time          2.1 s <

 Memory estimate: 323.67 MiB, allocs estimate: 80.

julia> @benchmark unique_sort($x)
BenchmarkTools.Trial: 6 samples with 1 evaluation.
 Range (min … max):  878.788 ms … 986.507 ms  ┊ GC (min … max): 0.43% … 8.84%
 Time  (median):     893.003 ms               ┊ GC (median):    0.04%
 Time  (mean ± σ):   905.269 ms ±  40.648 ms  ┊ GC (mean ± σ):  1.69% ± 3.57%

  ██    █   ██                                                █  
  ██▁▁▁▁█▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  879 ms           Histogram: frequency by time          987 ms <

 Memory estimate: 223.13 MiB, allocs estimate: 20.

gitboy16 · October 22, 2021, 10:58am

Thank you @jipolanco , this is helpful and improve performance but it is still slower than R. I think I can write something in C which is faster than the R version. I will also try to write it in Julia to see how it compares. Thanks again.

lmiq · October 22, 2021, 11:00am

Please share it! It would be nice to see what algorithm R is using.

rafael.guerra · October 22, 2021, 11:52am

Instead of unique(x), another way of writing unique sorted:

function uniquesorted(x)
   y = sort(x)
   y[diff([y; Inf]) .!= 0]
end

x = rand(10_000_000);
using BenchmarkTools
@btime unique($x)       # 1.254 s (80 allocations: 360 MiB)
@btime uniquesorted($x) #  932 ms (12 allocations: 306 MiB)

rafael.guerra · October 22, 2021, 4:40pm

As a side note: with 64-bit floats and x = rand(10_000_000), the probabability that unique(x) != x should be extremelly low, no?

Topic		Replies	Views
Fastest way to `unique!` General Usage	17	3183	October 11, 2020
MATLAB-like `unique` where you can recreate the original with indices General Usage question , matlab , data , arrays	8	1244	December 8, 2023
Add unique!(f, itr) #28415 Internals & Design	6	643	December 5, 2018
Unique (indices) method similar to MATLAB General Usage question	7	3515	December 8, 2023
Unique doesn't seem to obey == definitions General Usage	7	579	October 10, 2020

Question about Ref and about unique

Related topics