``````x = rand(10_000_000);
@time unique(x);
# 1.6 seconds
@time unique(Ref(x));
# 0.006373 seconds
unique(x) == unique(Ref(x))
# true
``````

Why would anyone use `unique` on it owns instead of unique + Ref ?
Can’t Julia optimise this automatically?

Thank you

1 Like

They are not the same:

``````x = [2, 3, 2, 3]
unique(x)                # [2, 3]
unique(Ref(x))        # [2, 3, 2, 3]
unique(Ref(x)) === x  # true
``````

Note that `unique(Ref(x))` just returns `x`, since `Ref(x)` is equivalent here to a vector with a single element `x`.

2 Likes
``````julia> x = rand(1:10, 10);

julia> x
10-element Vector{Int64}:
3
6
1
8
7
7
10
6
1
3

julia> unique(x)
6-element Vector{Int64}:
3
6
1
8
7
10

julia> unique(Ref(x))
1-element Vector{Vector{Int64}}:
[3, 6, 1, 8, 7, 7, 10, 6, 1, 3]
``````

Note that `unique(Ref(x))` doesn’t do what you think it does. `Ref(x)` effectively makes `x` “look like a scalar” for `unique` which is why it just returns `x` itself - the only unique element - in a vector. And, of course, that’s much cheaper / faster than actually finding the unique values.

2 Likes

Thank you.
Unique in Julia seems to be very slow compare to unique in R for the example I provided … not sure to understand why…

``````x = runif(1e7)
system.time(unique(x))
# 0.85 sec
``````
1 Like

If you don’t care about the order of the returned elements, you can try something like the following, which seems to reduce the timings by a factor ~2:

``````x = rand(10_000_000);

# Assumes `xs` is already sorted.
function unique_sorted(xs)
xprev = first(xs)
ys = [xprev]
for x ∈ xs
if x != xprev
push!(ys, x)
end
xprev = x
end
ys
end

function unique_sort(xs)
ys = sort(xs)
unique_sorted(ys)
end

julia> using BenchmarkTools

julia> @benchmark unique(\$x)
BenchmarkTools.Trial: 3 samples with 1 evaluation.
Range (min … max):  1.813 s …    2.103 s  ┊ GC (min … max): 4.31% … 5.35%
Time  (median):     2.045 s               ┊ GC (median):    3.82%
Time  (mean ± σ):   1.987 s ± 153.230 ms  ┊ GC (mean ± σ):  3.32% ± 2.64%

█                                             █          █
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁█ ▁
1.81 s         Histogram: frequency by time          2.1 s <

Memory estimate: 323.67 MiB, allocs estimate: 80.

julia> @benchmark unique_sort(\$x)
BenchmarkTools.Trial: 6 samples with 1 evaluation.
Range (min … max):  878.788 ms … 986.507 ms  ┊ GC (min … max): 0.43% … 8.84%
Time  (median):     893.003 ms               ┊ GC (median):    0.04%
Time  (mean ± σ):   905.269 ms ±  40.648 ms  ┊ GC (mean ± σ):  1.69% ± 3.57%

██    █   ██                                                █
██▁▁▁▁█▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
879 ms           Histogram: frequency by time          987 ms <

Memory estimate: 223.13 MiB, allocs estimate: 20.
``````
1 Like

Thank you @jipolanco , this is helpful and improve performance but it is still slower than R. I think I can write something in C which is faster than the R version. I will also try to write it in Julia to see how it compares. Thanks again.

Please share it! It would be nice to see what algorithm R is using.

1 Like

Instead of `unique(x)`, another way of writing unique sorted:

``````function uniquesorted(x)
y = sort(x)
y[diff([y; Inf]) .!= 0]
end

x = rand(10_000_000);
using BenchmarkTools
@btime unique(\$x)       # 1.254 s (80 allocations: 360 MiB)
@btime uniquesorted(\$x) #  932 ms (12 allocations: 306 MiB)
``````

As a side note: with 64-bit floats and `x = rand(10_000_000)`, the probabability that `unique(x) != x` should be extremelly low, no?

1 Like