Opposite of unique

Ribeiro · March 23, 2021, 4:58pm

Hi!
Is there a function that does the opposite of unique? Say

nonunique([1,2,3,4,3,5,3,2]) = [2,3]

Thanks a lot!

stillyslalom · March 23, 2021, 5:08pm

julia> using DataStructures

julia> nonunique(v) = [k for (k, n) in counter(v) if n > 1]
nonunique (generic function with 1 method)

julia> nonunique([1,2,3,4,3,5,3,2])
2-element Vector{Int64}:
 2
 3

CameronBieganek · March 23, 2021, 5:11pm

Equivalently to the above, you can use countmap from StatsBase instead of counter from DataStructures:

using StatsBase
[k for (k, v) in countmap(v) if v > 1]

oxinabox · March 23, 2021, 7:02pm

For fun, here is code for this that uses no packages and makes only a single pass over the data.
This is kinda ugly because I am golfing it a bit:

function nonunique(v)
    seen = Dict{eltype(v), Ref{Int}}()
    [x for x in v if 2 == (get!(()->Ref(0), seen, x)[]+=1)]
end

which gives (still unsorted; you could sort afterwards):

julia> nonunique([1,2,3,4,3,5,3,2])
2-element Array{Int64,1}:
 3
 2

Another fun way: use sort, then diff to find elements that come right after an equal element, then unique to drop the extra multiples:

function nonunique(v)
    sv = sort(v)
    return unique(@view sv[[diff(sv).==0; false]])
end

I’d probably use one of the packages and two passes though.
Or a Dict and two passes.
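A minimal sketch of what the "Dict and two passes" variant could look like, using only Base. The name nonunique_twopass is made up for illustration and does not come from the thread: the first pass counts occurrences, the second walks the counts and keeps the values seen more than once.

```julia
# Hypothetical helper name, chosen for this sketch only.
function nonunique_twopass(v)
    counts = Dict{eltype(v), Int}()
    # First pass: count how often each value occurs.
    for x in v
        counts[x] = get(counts, x, 0) + 1
    end
    # Second pass (over the counts): keep values seen more than once.
    return [k for (k, n) in counts if n > 1]
end

# Dict iteration order is unspecified, so sort for a stable result.
sort(nonunique_twopass([1,2,3,4,3,5,3,2]))  # [2, 3]
```

Like the package-based versions, this returns the values in no particular order, so sort afterwards if the order matters.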

gustaphe · March 23, 2021, 7:08pm

That’s not quite the opposite to unique. Note that unique([1,2,3,4,3,5,3,2]) == [1,2,3,4,5], not [1,4,5].
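To make the distinction concrete, here is a small Base-only sketch that computes three different things for the example vector: the distinct values, the values that repeat, and the values that occur exactly once. The variable names are illustrative only.

```julia
v = [1,2,3,4,3,5,3,2]

# Count occurrences of each value.
counts = Dict{Int,Int}()
for x in v
    counts[x] = get(counts, x, 0) + 1
end

distinct   = unique(v)                                 # [1, 2, 3, 4, 5]
repeated   = [x for x in distinct if counts[x] > 1]    # [2, 3]
singletons = [x for x in distinct if counts[x] == 1]   # [1, 4, 5]
```

The "nonunique" asked for in this thread is `repeated`; the set-complement reading of "opposite of unique" would arguably be `singletons`.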

Henrique_Becker · March 23, 2021, 7:48pm

Maybe a better name would be only_repeated_values or something like that.

Ribeiro · March 23, 2021, 8:22pm

@stillyslalom @CameronBieganek @oxinabox thanks for coding something for me! I was just asking if there was a built-in function; I didn’t expect you to write the solution.
I ended up using symdiff(v,unique(v)), which works for my specific case (no more than 2 of the same number, and I also need unique, so that is available for free).
Thanks again!

adolgert · March 23, 2021, 10:52pm

R calls this function duplicated. Rather, R’s duplicated() would mark the positions of the second 2 and the second and third 3 in Ribeiro’s example (it returns a logical vector). Then again, R was never known for having overly descriptive function names.
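For comparison, a rough Julia sketch of an R-style duplicated, assuming the semantics described above (true wherever a value has already appeared earlier). The function name mirrors R; it is not something Base provides.

```julia
# Sketch of an R-style `duplicated`: mask[i] is true when v[i]
# has already been seen at an earlier position.
function duplicated(v)
    seen = Set{eltype(v)}()
    mask = falses(length(v))
    for (i, x) in enumerate(v)
        if x in seen
            mask[i] = true
        else
            push!(seen, x)
        end
    end
    return mask
end

v = [1,2,3,4,3,5,3,2]
findall(duplicated(v))      # positions of the repeats: [5, 7, 8]
unique(v[duplicated(v)])    # repeated values, in order of first repeat: [3, 2]
```

Wrapping the mask in unique recovers the "nonunique" values from the original question in a single pass over the data.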

rafael.guerra · March 24, 2021, 12:04am

Using the beautiful Multisets.jl package:

using Multisets
v = [1,2,3,4,3,5,3,2]
M = Multiset(v)
U = Multiset(Set(M))
collect(keys(M-U))

2-element Vector{Int64}:
 2
 3

And another way:

using Multisets
v = [1,2,3,4,3,5,3,2]
M = Multiset(v)
collect(keys(M))[values(M).>1]

2-element Vector{Int64}:
 2
 3

rafael.guerra · March 24, 2021, 10:51am

On this topic, see also the fast solutions by Przemyslaw Szufel and Bogumił Kamiński on Stack Overflow.

gustaphe · March 24, 2021, 12:57pm

NB! log y axis
[Plot: results]

results
9-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "dict" => Trial(357.094 μs)
  "symdiff" => Trial(473.081 μs)
  "szufelinplace" => Trial(39.201 μs)
  "countmap" => Trial(229.767 μs)
  "counter" => Trial(196.279 μs)
  "multiset1" => Trial(792.868 μs)
  "sort" => Trial(65.060 μs)
  "szufel" => Trial(42.194 μs)
  "multiset2" => Trial(429.816 μs)

rafael.guerra · March 24, 2021, 2:20pm

@gustaphe, very nice summary but what a weird logarithmic scale axis that one is (with ticks at 10^4.8, etc.). Integer powers would be easier to read.

gustaphe · March 24, 2021, 2:39pm

Yeah, for some reason that’s the default behaviour in GR. You can set the ticks manually, but I didn’t feel like it.

gustaphe · March 24, 2021, 7:46pm

How’s this for effort?
[Plot: results]

Ribeiro · March 24, 2021, 7:57pm

I’d say you’ve studied this enough that you can propose that your best function gets added to Base =]

rafael.guerra · March 24, 2021, 8:23pm

@gustaphe, it is apparent that you’ve found the right plunger shapes to unclog the non-unique problem. Thanks for the inspirational drawing.

Andrei_Bobrov · March 24, 2021, 8:32pm

a = [1,2,3,4,3,5,3,2]
[i for i in unique(a) if sum(i .== a) > 1 ]

Henrique_Becker · March 24, 2021, 9:00pm

This is not really how things work. New methods are added to Base only if they are sufficiently basic, and I have to say that I don’t think I have ever needed something as specific as this.

gustaphe · March 25, 2021, 6:06am

It’s funny, that’s probably what I would have written. A fairly intuitive solution.

It’s by far the slowest of the suggested ones, presumably because it rescans the whole array once per unique value. It really stands out.

[Plot: results]

qsong · March 25, 2021, 12:37pm

When working with datasets, I would try the DataFrames utilities for this kind of vector problem. It seems to be another good approach in this case (for current versions).

using DataFrames, BenchmarkTools
nonunique(x) = 
  filter(:nrow => >(1), combine(groupby(DataFrame(x = x), :x), nrow)).x

julia> @btime nonunique([1, 2, 3, 4, 3, 5, 3, 2])
  12.550 μs (202 allocations: 17.28 KiB)
2-element Array{Int64,1}:
 2
 3

P.S. The result is unsorted.
