Opposite of unique

Hi!
Is there a function that does the opposite of unique? Say

nonunique([1,2,3,4,3,5,3,2]) = [2,3]

Thanks a lot!

5 Likes
julia> using DataStructures

julia> nonunique(v) = [k for (k, v) in counter(v) if v > 1]
nonunique (generic function with 1 method)

julia> nonunique([1,2,3,4,3,5,3,2])
2-element Vector{Int64}:
 2
 3
4 Likes

Equivalent to the above, you can use countmap from StatsBase instead of counter from DataStructures:

using StatsBase
[k for (k, v) in countmap(v) if v > 1]
2 Likes

For fun, here is code for this that uses no packages and passes over the data only once.
It is kinda ugly because I am golfing it a bit:

function nonunique(v)
    seen = Dict{eltype(v), Ref{Int}}()
    [x for x in v if 2 == (get!(()->Ref(0), seen, x)[]+=1)]
end

which gives the following (the result is unsorted; sort it afterwards if needed):

julia> nonunique([1,2,3,4,3,5,3,2])
2-element Array{Int64,1}:
 3
 2

Another fun way: sort, then use diff to find elements that sit next to an equal element, then unique to drop the extra repeats:

function nonunique(v)
    sv = sort(v)
    return unique(@view sv[[diff(sv).==0; false]])
end

I’d probably use one of the packages and two passes, though.
Or a Dict and two passes.
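
A minimal sketch of what the Dict-and-two-passes variant could look like, using only Base. The name nonunique_twopass is made up for illustration, and this is just one way to write it, not necessarily what anyone in the thread benchmarked:

```julia
# Hypothetical helper: repeated values, in order of first appearance.
function nonunique_twopass(v)
    counts = Dict{eltype(v), Int}()
    # First pass: count how often each value occurs.
    for x in v
        counts[x] = get(counts, x, 0) + 1
    end
    # Second pass: emit each repeated value once, at its first occurrence.
    out = eltype(v)[]
    for x in v
        if counts[x] > 1
            push!(out, x)
            counts[x] = 0  # mark as already emitted
        end
    end
    return out
end

nonunique_twopass([1,2,3,4,3,5,3,2])  # returns [2, 3]
```

Unlike the Dict-iteration versions, this keeps the order of first appearance, at the cost of a second loop over v.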

2 Likes

That’s not quite the opposite to unique. Note that unique([1,2,3,4,3,5,3,2]) == [1,2,3,4,5], not [1,4,5].
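
To make the distinction concrete, here is a small sketch using only Base. The variable names singles and repeated are just for illustration, and the quadratic counting is chosen for clarity rather than speed:

```julia
v = [1,2,3,4,3,5,3,2]

# unique keeps one copy of every value, repeated or not.
u = unique(v)                                      # [1, 2, 3, 4, 5]

# Values that occur exactly once in v.
singles = [x for x in u if count(==(x), v) == 1]   # [1, 4, 5]

# Values that occur more than once in v (what the original question asks for).
repeated = [x for x in u if count(==(x), v) > 1]   # [2, 3]
```

So u is the union of singles and repeated, and the function asked for returns only the second part.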

3 Likes

Maybe a better name would be only_repeated_values or something like that.

3 Likes

@stillyslalom @CameronBieganek @oxinabox thanks for coding something for me! I was just asking if there was a built in function, didn’t expect you to write the solution :slight_smile:
I ended up using symdiff(v,unique(v)), which works for my specific case (no more than 2 of the same number, and I also need unique, so that is available for free).
Thanks again!

2 Likes

R calls this function duplicated. Or rather, R’s duplicated() would flag the positions of the second 2 and the second and third 3 in Ribiero’s example. Then again, R was never known for having overly-descriptive function names. :wink:
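
For comparison, a rough Julia sketch of that behaviour, assuming an ordinary 1-based Vector. The name duplicated_mask is hypothetical and not part of Base or any package I know of:

```julia
# Hypothetical helper: true at every position whose value already appeared earlier.
function duplicated_mask(v)
    seen = Set{eltype(v)}()
    mask = falses(length(v))
    for (i, x) in enumerate(v)
        if x in seen
            mask[i] = true   # a repeat of an earlier value
        else
            push!(seen, x)   # first time this value shows up
        end
    end
    return mask
end

v = [1,2,3,4,3,5,3,2]
findall(duplicated_mask(v))      # returns [5, 7, 8]
unique(v[duplicated_mask(v)])    # returns [3, 2]
```

The last line recovers the repeated values from the mask, in the order in which their first repeat occurs.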

1 Like

Using the beautiful Multisets.jl package:

using Multisets
v = [1,2,3,4,3,5,3,2]
M = Multiset(v)
U = Multiset(Set(M))
collect(keys(M-U))

2-element Vector{Int64}:
 2
 3

And another way:

using Multisets
v = [1,2,3,4,3,5,3,2]
M = Multiset(v)
collect(keys(M))[values(M).>1]

2-element Vector{Int64}:
 2
 3
7 Likes

On this topic, see also the fast solutions by Przemyslaw Szufel and Bogumił Kamiński on Stack Overflow.

[Plot: benchmark results. NB! The y axis is logarithmic.]

Results:
9-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "dict" => Trial(357.094 μs)
  "symdiff" => Trial(473.081 μs)
  "szufelinplace" => Trial(39.201 μs)
  "countmap" => Trial(229.767 μs)
  "counter" => Trial(196.279 μs)
  "multiset1" => Trial(792.868 μs)
  "sort" => Trial(65.060 μs)
  "szufel" => Trial(42.194 μs)
  "multiset2" => Trial(429.816 μs)
3 Likes

@gustaphe, very nice summary but what a weird logarithmic scale axis that one is (with ticks at 10^4.8, etc.). Integer powers would be easier to read.

1 Like

Yeah, for some reason that’s the default behaviour in GR. You can set the ticks manually, but I didn’t feel like it.

1 Like

How’s this for effort? :stuck_out_tongue:
[Image: results]

5 Likes

I’d say you’ve studied this enough that you can propose that your best function gets added to Base =]

2 Likes

@gustaphe, it is apparent that you’ve found the right plunger shapes to unclog the non-unique problem. Thanks for the inspirational drawing.

1 Like

a = [1,2,3,4,3,5,3,2]
[i for i in unique(a) if sum(i .== a) > 1 ]

1 Like

This is not really how things work. New methods are added to Base only if they are sufficiently basic, and I have to say that I don’t think I have ever needed something as specific as this.

2 Likes

It’s funny, that’s probably what I would have written. A fairly intuitive solution.

It’s by far the slowest of the suggested ones. It really stands out.

[Image: results]

2 Likes

When working with datasets, I would try the DataFrames utilities for this kind of vector problem. They seem to be another good approach in this case (for current versions).

using DataFrames, BenchmarkTools
nonunique(x) = 
  filter(:nrow => >(1), combine(groupby(DataFrame(x = x), :x), nrow)).x

julia> @btime nonunique([1, 2, 3, 4, 3, 5, 3, 2])
  12.550 μs (202 allocations: 17.28 KiB)
2-element Array{Int64,1}:
 2
 3

P.S. The result is unsorted.

1 Like