# Opposite of unique

Hi!
Is there a function that does the opposite of `unique`? Say,

```julia
nonunique([1,2,3,4,3,5,3,2])  # should return [2, 3]
```

Thanks a lot!

```julia
julia> using DataStructures

julia> v = [1,2,3,4,3,5,3,2];

julia> nonunique(v) = [k for (k, n) in counter(v) if n > 1]  # keep values counted more than once
nonunique (generic function with 1 method)

julia> nonunique(v)
2-element Vector{Int64}:
 2
 3
```

Equivalently to the above, you can use `countmap` from `StatsBase` instead of `counter` from `DataStructures`:

```julia
using StatsBase
v = [1,2,3,4,3,5,3,2]
[k for (k, n) in countmap(v) if n > 1]  # countmap returns a Dict of value => count
```

For fun, here is code for this that uses no packages and passes over the data only once.
It's kind of ugly because I'm golfing it a bit:

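```julia
function nonunique(v)
    # value => number of times it has been seen so far
    seen = Dict{eltype(v), Ref{Int}}()
    # bump the count for x; keep x exactly when this is its second occurrence
    [x for x in v if 2 == (get!(() -> Ref(0), seen, x)[] += 1)]
end
```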

which gives (the result is still unsorted; you could sort it afterwards):

```julia
julia> nonunique([1,2,3,4,3,5,3,2])
2-element Array{Int64,1}:
 3
 2
```
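The golfed one-liner above can be unpacked into a plain loop. This is a sketch of the same single-pass idea using a `Dict{T,Int}` of counts instead of `Ref`s; the name `nonunique_onepass` is hypothetical. Each value is pushed the moment its count reaches two, so the output lists repeated values in the order of their second occurrence.

```julia
# Hypothetical ungolfed version of the single-pass approach.
function nonunique_onepass(v::AbstractVector{T}) where {T}
    counts = Dict{T,Int}()   # value => occurrences seen so far
    out = T[]
    for x in v
        c = get(counts, x, 0) + 1
        counts[x] = c
        # Emit x exactly once: on its second occurrence.
        c == 2 && push!(out, x)
    end
    return out
end

nonunique_onepass([1,2,3,4,3,5,3,2])  # → [3, 2]
```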

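Another fun way: `sort`, then use `diff` to find elements equal to their neighbour, then `unique` to drop the extra copies:

```julia
function nonunique(v)
    sv = sort(v)
    # mask[i] is true when sv[i] == sv[i+1]; the trailing `false` covers the last element
    mask = [diff(sv) .== 0; false]
    # a value repeated k times is selected k-1 times, so deduplicate
    return unique(@view sv[mask])
end
```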
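If the final `unique` feels wasteful, a sorted vector can instead be scanned once, emitting a value only when it starts a run of length two or more. This is a sketch of that variant under the hypothetical name `nonunique_sorted`; the output comes out sorted.

```julia
# Hypothetical variant: sort, then walk over runs of equal values once.
function nonunique_sorted(v::AbstractVector)
    sv = sort(v)
    out = similar(sv, 0)
    i = firstindex(sv)
    while i <= lastindex(sv)
        j = i
        # Advance j to the last element of the run of values equal to sv[i].
        while j < lastindex(sv) && sv[j + 1] == sv[i]
            j += 1
        end
        j > i && push!(out, sv[i])   # run longer than one => repeated value
        i = j + 1
    end
    return out
end

nonunique_sorted([1,2,3,4,3,5,3,2])  # → [2, 3]
```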

I'd probably use one of the packages and two passes, though,
or a `Dict` and two passes.
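A minimal package-free sketch of the "`Dict` and two passes" idea: the first pass counts occurrences, the second keeps the values whose count exceeds one. The name `nonunique_twopass` is hypothetical, and the result follows `Dict` iteration order, so it is not guaranteed to be sorted.

```julia
# Hypothetical helper: count with a Dict, then keep values seen more than once.
function nonunique_twopass(v::AbstractVector{T}) where {T}
    counts = Dict{T,Int}()
    # First pass: count how often each value occurs.
    for x in v
        counts[x] = get(counts, x, 0) + 1
    end
    # Second pass (over the distinct values): keep those with a count above one.
    return [k for (k, n) in counts if n > 1]
end

sort(nonunique_twopass([1,2,3,4,3,5,3,2]))  # → [2, 3]
```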


That’s not quite the opposite to `unique`. Note that `unique([1,2,3,4,3,5,3,2]) == [1,2,3,4,5]`, not `[1,4,5]`.
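To make the distinction concrete: `unique` keeps one copy of every distinct value, while the values that occur exactly once (or more than once) are different subsets. A quick sketch using only `Base` (`count` with a predicate); it is quadratic in the worst case and meant for illustration, not speed.

```julia
v = [1,2,3,4,3,5,3,2]

distinct_vals  = unique(v)                                       # one copy of every value
once           = [x for x in unique(v) if count(==(x), v) == 1]  # values occurring exactly once
more_than_once = [x for x in unique(v) if count(==(x), v) > 1]   # values occurring repeatedly

distinct_vals   # → [1, 2, 3, 4, 5]
once            # → [1, 4, 5]
more_than_once  # → [2, 3]
```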


Maybe a better name would be `only_repeated_values` or something like that.


@stillyslalom @CameronBieganek @oxinabox thanks for coding something for me! I was just asking whether there was a built-in function; I didn't expect you to write the solution. I ended up using `symdiff(v, unique(v))`, which works for my specific case (no value appears more than twice, and I need `unique(v)` anyway, so it comes for free).
Thanks again!


R calls this function `duplicated`. Or rather, R's `duplicated()` would return a logical vector flagging the second 2 and the second and third 3 in Ribiero's example. Then again, R was never known for having overly descriptive function names.
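For comparison, an R-style `duplicated` mask is easy to sketch in Julia. The name `duplicated` here is hypothetical (it is not a `Base` function); this sketch returns a `BitVector` that is `true` for every element equal to one that appeared earlier, and assumes ordinary 1-based vectors.

```julia
# Hypothetical R-style `duplicated`: true where the value already appeared earlier.
function duplicated(v::AbstractVector{T}) where {T}
    seen = Set{T}()
    mask = falses(length(v))
    for (i, x) in enumerate(v)
        if x in seen
            mask[i] = true    # repeat of an earlier element
        else
            push!(seen, x)    # first occurrence: remember it
        end
    end
    return mask
end

v = [1,2,3,4,3,5,3,2]
findall(duplicated(v))     # → [5, 7, 8]  (second 3, third 3, second 2)
unique(v[duplicated(v)])   # → [3, 2]
```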

Using the beautiful Multisets.jl package:

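```julia
using Multisets
v = [1,2,3,4,3,5,3,2]
M = Multiset(v)          # each value with its multiplicity
U = Multiset(Set(M))     # one copy of each distinct value
collect(keys(M - U))     # what is left after removing one copy of each
# 2-element Vector{Int64}:
#  2
#  3
```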

And another way:

```julia
using Multisets
v = [1,2,3,4,3,5,3,2]
M = Multiset(v)
collect(keys(M))[values(M) .> 1]
# 2-element Vector{Int64}:
#  2
#  3
```

On this topic, see also the fast solutions by Przemyslaw Szufel and Bogumił Kamiński on Stack Overflow.

[Figure: benchmark timings of the methods above. NB! The y axis is logarithmic.]

```julia
julia> results
9-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "dict" => Trial(357.094 μs)
  "symdiff" => Trial(473.081 μs)
  "szufelinplace" => Trial(39.201 μs)
  "countmap" => Trial(229.767 μs)
  "counter" => Trial(196.279 μs)
  "multiset1" => Trial(792.868 μs)
  "sort" => Trial(65.060 μs)
  "szufel" => Trial(42.194 μs)
  "multiset2" => Trial(429.816 μs)
```

@gustaphe, very nice summary but what a weird logarithmic scale axis that one is (with ticks at `10^4.8`, etc.). Integer powers would be easier to read.


Yeah, for some reason that’s the default behaviour in GR. You can set the ticks manually, but I didn’t feel like it.


How’s this for effort? [Image: drawing, not preserved]

I’d say you’ve studied this enough that you can propose that your best function gets added to `Base` =]


@gustaphe, it is apparent that you’ve found the right plunger shapes to unclog the non-unique problem. Thanks for the inspirational drawing.


```julia
a = [1,2,3,4,3,5,3,2]
[i for i in unique(a) if sum(i .== a) > 1]
```


This is not really how things work. New methods are added to `Base` only if they are sufficiently basic, and I have to say I don't think I have ever needed something as specific as this.


It’s funny, that’s probably what I would have written. A fairly intuitive solution.

It’s by far the slowest of the suggested ones. It really stands out.

When working with datasets, I would try DataFrames utilities for this kind of vector problem. It seems to be another good approach here (with current versions):

```julia
using DataFrames, BenchmarkTools

# Count rows per value of x, then keep the values that occur in more than one row.
nonunique(x) =
    filter(:nrow => >(1), combine(groupby(DataFrame(x = x), :x), nrow)).x

julia> @btime nonunique([1, 2, 3, 4, 3, 5, 3, 2])
  12.550 μs (202 allocations: 17.28 KiB)
2-element Array{Int64,1}:
 2
 3
```

P.S. The result is unsorted.
