Is there a function similar to numpy unique with inverse?

In Python with NumPy, I can write:

unique_vals, reverse_ids = np.unique(vals, return_inverse=True)

Here, reverse_ids is the array of indices such that

unique_vals[reverse_ids] == vals

Is there something similar available in Julia? As far as I know, there is a unique function but it doesn’t have any such parameters.

I’m not sure how that works in numpy in the first place - is unique_vals not a proper numpy array but rather some special construct? unique in Julia doesn’t give you a slice or view into the existing array; it creates a new array with the unique values of the old array. Indexing into that new array with indices that relate to vals doesn’t really make a whole lot of sense to me…

Ah, I think I got what it’s doing from the numpy documentation. Seems like this should work for you:

julia> a = [1,2,6,4,2,3,2];

julia> u = unique(a)
5-element Vector{Int64}:
 1
 2
 6
 4
 3

julia> indices = [ findfirst(==(ux), u) for ux in a ]
7-element Vector{Int64}:
 1
 2
 3
 4
 2
 5
 2

julia> u[indices] == a
true

But no, that’s not built in as far as I know. Seems kind of niche, when you usually have vals already anyway 🤷
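
For what it’s worth, Base does have indexin, which performs the same lookup as the comprehension above in a single call. A minimal sketch, reusing the example data from above; the name inverse is only illustrative, and the Int.(...) conversion assumes every element of a is found in u (which holds here because u is built from a):

```julia
a = [1, 2, 6, 4, 2, 3, 2]
u = unique(a)

# indexin(a, u) returns, for each element of a, the index of its first match in u,
# or `nothing` when there is no match (cannot happen here, since u comes from a).
# Int.(...) narrows the element type from Union{Nothing, Int} to Int so the
# result can be used directly as an index vector.
inverse = Int.(indexin(a, u))

u[inverse] == a   # true
```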


I’m not aware of such functionality built into Base or any mainstream packages, but you can implement it yourself quite easily. The following is not necessarily the most efficient version, but it should be significantly faster than @Sukera’s solution, since it does one Dict lookup per element instead of a linear findfirst scan (@Sukera’s solution has the advantage of not being overengineered for simple cases):

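   function unique_ids(itr)
     v = Vector{eltype(itr)}()       # unique values, in order of first appearance
     d = Dict{eltype(itr), Int}()    # value => its index in v
     revid = Vector{Int}()           # for each input element, its index in v
     for val in itr
       if haskey(d, val)
         # seen before: reuse the stored index
         push!(revid, d[val])
       else
         # first occurrence: append to v and remember its position
         push!(v, val)
         d[val] = length(v)
         push!(revid, length(v))
       end
     end
     return (v, revid)
   end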
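
The same idea can be written a bit more compactly with get!, which looks a key up and inserts it in one step. This is only a sketch: the name unique_inverse is made up for illustration, and it assumes the elements are hashable and that eltype(itr) is meaningful for the input.

```julia
function unique_inverse(itr)
    T = eltype(itr)
    uniq = T[]                    # unique values, in order of first appearance
    index_of = Dict{T, Int}()     # value => its position in uniq
    inverse = Int[]               # for each input element, its position in uniq
    for val in itr
        # get! returns the stored index, or runs the do-block for a new value,
        # stores the block's result under val, and returns it.
        idx = get!(index_of, val) do
            push!(uniq, val)
            length(uniq)
        end
        push!(inverse, idx)
    end
    return uniq, inverse
end

uniq, inverse = unique_inverse([1, 2, 6, 4, 2, 3, 2])
# uniq    == [1, 2, 6, 4, 3]
# inverse == [1, 2, 3, 4, 2, 5, 2]
uniq[inverse] == [1, 2, 6, 4, 2, 3, 2]   # true
```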

I wasn’t sure how far to optimize either - it could be that numpy does some funky things like restoring dimensions as well 🤷
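
One difference that does matter when comparing against numpy: np.unique returns the unique values sorted, while Julia’s unique keeps them in order of first appearance, so the index values will not match numpy’s even though u[indices] == vals holds in both cases. A sketch of a sorted variant, assuming the elements can be ordered with isless; unique_sorted_inverse is a hypothetical name, not an existing function:

```julia
function unique_sorted_inverse(vals)
    u = sort!(unique(vals))   # sorted unique values, like np.unique
    # u is sorted and contains every value of vals, so a binary search
    # finds the exact position of each element.
    inverse = [searchsortedfirst(u, v) for v in vals]
    return u, inverse
end

a = [1, 2, 6, 4, 2, 3, 2]
u, inverse = unique_sorted_inverse(a)
# u       == [1, 2, 3, 4, 6]
# inverse == [1, 2, 5, 4, 2, 3, 2]
u[inverse] == a   # true
```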

See also:

And in particular this solution:

uniqueinds(x) = unique(i -> x[i], eachindex(x))

However, this isn’t quite the same thing: it’s equivalent to return_index=True in numpy.unique, not return_inverse=True.
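
To make the difference concrete, here is a small sketch that computes both on the example data from earlier in the thread. The names first_inds and inverse are only illustrative; the inverse is computed with Base’s indexin, assuming every element is found (true here, since the lookup target is unique(x)).

```julia
x = [1, 2, 6, 4, 2, 3, 2]

uniqueinds(x) = unique(i -> x[i], eachindex(x))

# Like return_index=True: where each unique value first occurs in x.
# One entry per unique value.
first_inds = uniqueinds(x)                 # [1, 2, 3, 4, 6]
x[first_inds] == unique(x)                 # true

# Like return_inverse=True: for each element of x, its position in unique(x).
# One entry per element of x.
inverse = Int.(indexin(x, unique(x)))      # [1, 2, 3, 4, 2, 5, 2]
unique(x)[inverse] == x                    # true
```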
