Unique (indices) method similar to MATLAB

Is there a method in Julia similar to MATLAB’s unique method? In Julia there is unique, but it does not have the same functionality. Specifically, I would like to get the list of indices of unique elements.

julia> unique([9,2,9,5])
3-element Array{Int64,1}:
 9
 2
 5

julia> indices = [1,2,4]  # desired result: indices of the unique elements
3-element Array{Int64,1}:
 1
 2
 4

In MATLAB, unique returns these indices as its second output argument; Julia's unique does not.

Is there an existing equivalent? (Reference: MATLAB documentation, "Unique values in array - MATLAB unique".)
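For reference, MATLAB's `[C, ia, ic] = unique(A)` returns the unique values sorted by default, with `ia` pointing at first occurrences and `ic` mapping back to `A`. So the `[1, 2, 4]` asked for above matches MATLAB's `unique(A, 'stable')` ordering rather than the default. The defining relationships, written out in Julia for the example vector (values hard-coded, not computed):

```julia
A  = [9, 2, 9, 5]
C  = [2, 5, 9]      # sorted unique values (MATLAB default ordering)
ia = [2, 4, 1]      # first occurrences, so that C == A[ia]
ic = [3, 1, 3, 2]   # inverse map, so that A == C[ic]

@assert issorted(C)
@assert C == A[ia]
@assert A == C[ic]
```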

x -> findfirst.(.==(unique(x)), Ref(x)) perhaps?
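The one-liner broadcasts `==` over `unique(x)` to get one predicate per unique value, then finds each predicate's first match in `x`. The same idea as a comprehension, which may be easier to read (it rescans `x` once per unique value, so it is O(n·m)):

```julia
x = [9, 2, 9, 5]

# First position of each unique value, in first-occurrence order
idx = [findfirst(==(v), x) for v in unique(x)]   # [1, 2, 4]
```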


You can pass a function to unique in order to get the indices of the unique elements:

idx = unique(i -> x[i], eachindex(x))  # indices of the first occurrences, e.g. [1, 2, 4]
x[idx]                                 # the unique values themselves
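The same trick should extend to the rows of a matrix, since arrays hash and compare by value. A small sketch with made-up example data `A`:

```julia
A = [3 1; 3 3; 1 3; 3 3]

# Index of the first occurrence of each distinct row (first-occurrence order)
ia = unique(i -> A[i, :], axes(A, 1))   # [1, 2, 3]

A[ia, :]   # the distinct rows
```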

This comment (in an older Google Groups thread) still seems relevant.

One reason I've probably never missed it is that we have Dict:

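a = [9, 2, 9, 5]
itr = pairs(a)
ua = Dict{valtype(itr),keytype(itr)}()  # maps value => index
for (idx, val) in itr
    ua[val] = idx  # later occurrences overwrite earlier ones, so the last index wins
end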

julia> ua
Dict{Int64,Int64} with 3 entries:
  9 => 3
  2 => 2
  5 => 4

You can use Iterators.reverse if you prefer the first instance rather than the last.
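Alternatively, `get!` keeps the first index it sees for each value, without reversing the iteration. A sketch on the same example vector:

```julia
a = [9, 2, 9, 5]

# get!(d, key, default) inserts key => default only if key is absent,
# so each value keeps the index of its FIRST occurrence
first_idx = Dict{eltype(a), Int}()
for (i, v) in pairs(a)
    get!(first_idx, v, i)
end

first_idx   # 9 => 1, 2 => 2, 5 => 4 (Dict iteration order is unspecified)
```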

I’m well aware that looks longer than unique, but I nevertheless stand by the sense that I never miss it. I think it’s because in circumstances where I’d use it, the Dict may be useful in its own right. You also don’t have to read any documentation if you do it this way, whereas with Matlab’s unique you’re always poring over the docs to remember which index output is which.


The thread @tim.holy linked contains a link to a GitHub issue:

https://github.com/JuliaLang/julia/issues/1845


For completeness, this thread should perhaps mention GroupSlices.jl (not written by me!), although it doesn't return MATLAB's third output:

A = [9, 2, 9, 5];
C = unique(A)
ia = unique(i -> A[i], 1:length(A)) # [1, 2, 4]
A[ia] == C
using GroupSlices
ig = groupslices(A) # [1, 2, 1, 4]
A[ig] == A # ig indexes into A itself, so it is not MATLAB's ic (which indexes into C)
ia == firstinds(ig)
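If a MATLAB-style ic is needed, an index vector like `ig` (the first-occurrence index of each element) can be turned into one by numbering the groups. A pure-Base sketch, assuming a 1-based vector: it computes `ig` with a `Dict` instead of calling GroupSlices, and it numbers groups in first-occurrence order (MATLAB's `'stable'` option) rather than sorted order:

```julia
A = [9, 2, 9, 5]

# ig[i] = index of the first occurrence of A[i]
first_idx = Dict{eltype(A), Int}()
ig = [get!(first_idx, v, i) for (i, v) in pairs(A)]   # [1, 2, 1, 4]

ia = unique(ig)                                # [1, 2, 4], first-occurrence order
pos = Dict(j => k for (k, j) in enumerate(ia)) # first-occurrence index => group number
ic = [pos[j] for j in ig]                      # [1, 2, 1, 3], indexes into A[ia]

@assert A[ia][ic] == A
```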

Here is a pure-Julia implementation of MATLAB's unique which, unlike the GroupSlices answer above, also returns ic:

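# Requires Julia ≥ 1.9 (for `stack`). Returns C (sorted unique slices),
# ia (C == A[ia] along dims) and ic (A == C[ic] along dims), like MATLAB.
uniq(A; dims=1) = begin
  @assert ndims(A) ∈ (1, 2)
  # For a matrix, treat rows (dims=1) or columns (dims=2) as the units
  slA = ndims(A) > 1 ? eachslice(A; dims) : A

  # First-occurrence index of each distinct slice, then sort by slice value
  ia = unique(i -> slA[i], axes(A, dims))
  sort!(ia; by=i -> slA[i])

  # Reassemble the unique slices along the same dimension
  C = stack(slA[ia]; dims)
  slC = ndims(A) > 1 ? eachslice(C; dims) : C

  # Position in C of each slice of A (linear search per slice)
  ic = map(r -> findfirst(==(slA[r]), slC), axes(A, dims))

  C, ia, ic
end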

And some basic tests:

using Test

main() = begin
  let A = [
    3 1
    3 3
    1 3
    3 2
    2 3
    1 1
    1 2
    2 3
    3 3
    3 3
  ]
    C, ia, ic = uniq(A; dims=1)
    @test ia == [6, 7, 3, 5, 1, 4, 2]
    @test ic == [5, 7, 3, 6, 4, 1, 2, 4, 7, 7]
    @test C == A[ia, :]
    @test A == C[ic, :]

    C, ia, ic = uniq(A'; dims=2)
    @test ia == [6, 7, 3, 5, 1, 4, 2]
    @test ic == [5, 7, 3, 6, 4, 1, 2, 4, 7, 7]
    @test C == A'[:, ia]
    @test A' == C[:, ic]
  end

  let A = [9, 2, 9, 5]
    C, ia, ic = uniq(A)
    @test ia == [2, 4, 1]
    @test ic == [3, 1, 3, 2]
    @test C == A[ia]
    @test A == C[ic]
  end
end

main()
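For plain vectors, the `findfirst` search in `uniq` makes ic O(n·m). A sort-based sketch that produces the same C, ia and ic in O(n log n); `unique_with_indices` is a hypothetical name, and it assumes a 1-based `Vector`:

```julia
# Hypothetical helper: sorted unique values plus MATLAB-style ia and ic
function unique_with_indices(A::Vector)
    # Sort positions by (value, position): equal values become adjacent,
    # and the first position in each run is the first occurrence
    p = sort(collect(eachindex(A)); by = i -> (A[i], i))
    ia = Int[]                          # first-occurrence index of each unique value
    ic = Vector{Int}(undef, length(A))  # position in C of each element of A
    for (k, i) in enumerate(p)
        # Start a new group whenever the value differs from the previous one
        if k == 1 || !isequal(A[i], A[p[k-1]])
            push!(ia, i)
        end
        ic[i] = length(ia)
    end
    return A[ia], ia, ic
end

C, ia, ic = unique_with_indices([9, 2, 9, 5])
```

Sorting by the `(value, position)` tuple avoids relying on sort stability to pick the first occurrence within each run of equal values.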

Useful piece of functionality indeed! The main challenge is where to put it to make it discoverable 🙂
I added uniqueview() to DataManipulation.jl quite some time ago. Wherever possible, its result implements existing Julia functions for retrieving the various indices:

julia> using DataManipulation

julia> A = [9, 2, 9, 5]

julia> uv = uniqueview(A)  # like unique(), but a view
3-element DataManipulation.UniqueView{Int64, Vector{Int64}, Vector{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}}:
 9
 2
 5

# parentindices() is a Base Julia function, specifically for views - makes sense here:
julia> parentindices(uv)
([1, 2, 4],)


# these are used to go from A to uv:
julia> A[parentindices(uv)[1]]
3-element Vector{Int64}:
 9
 2
 5

# no Base function to go in the other direction, so we define a new one:
julia> using DataManipulation: inverseindices

julia> inverseindices(uv)
4-element Vector{Int64}:
 1
 2
 1
 3

# this is to go back from uv to A
julia> uv[inverseindices(uv)]
4-element Vector{Int64}:
 9
 2
 9
 5

Btw, many use cases don't require handling indices manually. For example, to apply a function to all elements while performing the actual computation only once per distinct value:

julia> @modify(a -> a + rand(), A |> uniqueview |> Elements())
4-element view(::Vector{Float64}, [1, 2, 1, 3]) with eltype Float64:
 9.523911282035403  # elements 1 and 3 are identical: rand() was computed only once for them
 2.8311892880800316
 9.523911282035403
 5.5151051390572094

or assign distinct numbers to unique values:

julia> @modify(Au -> 1:length(Au), A |> uniqueview)
4-element view(::UnitRange{Int64}, [1, 2, 1, 3]) with eltype Int64:
 1
 2
 1
 3

Here, @modify comes from Accessors.jl, which DataManipulation.jl re-exports.
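Without DataManipulation.jl, the same compute-once-per-distinct-value pattern can be sketched with Base only. This is not how uniqueview works internally, just an equivalent built from `unique` and a `Dict`. The resulting `ic` also doubles as the distinct numbering from the second example:

```julia
A = [9, 2, 9, 5]

# Distinct values in first-occurrence order and, for each element of A,
# its position among them (the analogue of inverseindices(uv) above)
C = unique(A)
pos = Dict(v => k for (k, v) in enumerate(C))
ic = [pos[v] for v in A]          # [1, 2, 1, 3]

# Evaluate the function once per distinct value, then expand back to A's shape
f(v) = v + rand()
result = map(f, C)[ic]

@assert result[1] == result[3]    # equal inputs share a single evaluation
```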