GPU-friendly one-hot encoding

Given a vector of indices, I want to get a matrix in which, for each column, only the element at the corresponding index is one and the others are zeros. On the CPU it's pretty easy, e.g.:

function onehot(s::AbstractVector, n_dims::Int=maximum(s))
    x = zeros(Int, n_dims, length(s))   # dense matrix of zeros
    for (j, i) in enumerate(s)
        x[i, j] = 1                      # one hot entry per column
    end
    return x
end

julia> onehot([1, 2, 3, 4])
4×4 Array{Int64,2}:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

But on the GPU this is not a very efficient technique. Are there any better alternatives?

The simple answer is: don't worry about the indexing yourself, just let Tullio handle it:

julia> using CUDA, Tullio, KernelAbstractions

julia> function onehot(s::AbstractVector, n_dims=maximum(s))
           x = similar(s, n_dims, length(s))
           @tullio x[i, j] = (i == s[j]) (i ∈ 1:n_dims)
       end
onehot (generic function with 2 methods)

julia> onehot([1,2,3,4])
4×4 Array{Int64,2}:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

julia> onehot(cu([1,2,3,4]))
4×4 CuArray{Int64,2}:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

When you do using KernelAbstractions before defining onehot, Tullio is able to do efficient GPU codegen. If you do using LoopVectorization as well, you'll get faster CPU codegen on hardware-native types like Int.

Of course, making an actual dense array for a one-hot vector is often a bad idea but I’ll let others who know what they’re talking about chime in there.
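
As a rough illustration of why (just a sketch, reusing the onehot from the first post; W and s are made-up names here): multiplying a weight matrix by the dense encoding redoes what a plain column lookup already gives you, so the matmul work is largely wasted:

W  = rand(3, 4)        # some weight / embedding matrix
s  = [1, 2, 3, 4]
oh = onehot(s)         # dense 4×4 encoding from the function above

W * oh == W[:, s]      # true, but the left side does a full matrix multiply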

I haven’t tried it, but surely something like this would work:

julia> hot1(s, k=unique(s)) = s' .== k;

julia> hot1([1,2,3,5,2,1])
4×6 BitMatrix:
 1  0  0  0  0  1
 0  1  0  0  1  0
 0  0  1  0  0  0
 0  0  0  1  0  0

Flux also has a lazy one-hot matrix, with some GPU operations defined.
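
An untested sketch of that, assuming Flux is installed:

using Flux

# onehotbatch stores just the indices in a lazy OneHotMatrix; multiplying a
# weight matrix by it is effectively indexing rather than a dense matmul,
# and it can be moved to the GPU along with the rest of a model.
oh = Flux.onehotbatch([1, 2, 3, 5, 2, 1], 1:5)   # 5×6 OneHotMatrix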

Here’s what I see on my machine:

julia> hot1(s, k=unique(s)) = s' .== k;

julia> hot1([1,2,3,5,2,1])
4×6 BitArray{2}:
 1  0  0  0  0  1
 0  1  0  0  1  0
 0  0  1  0  0  0
 0  0  0  1  0  0

julia> hot1(cu([1,2,3,5,2,1]))
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays ~/.julia/packages/GPUArrays/ZxsKE/src/host/indexing.jl:43
ERROR: GPU compilation of kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceArray{Bool,2,1}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(==),Tuple{Base.Broadcast.Extruded{Adjoint{Int64,CuDeviceArray{Int64,1,1}},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Array{Int64,1},Tuple{Bool},Tuple{Int64}}}}, Int64) failed
KernelError: passing and using non-bitstype argument

though I agree, I’d think this should work…

One issue is that

julia> unique(cu([1,2,3,4,1,2]))
4-element Array{Int64,1}:
 1
 2
 3
 4

That’s a pity. I wonder if hot1(cu([1,2,3,5,2,1]), 1:5) works? Or something like ifelse.(s' .== k, 1, 0) to make integers?
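
An untested sketch of both suggestions; a range like 1:5 is isbits, so the broadcast should stay on the device (unlike the CPU Array that unique returned):

using CUDA

s = cu([1, 2, 3, 5, 2, 1])

hot_bool = s' .== (1:5)                  # CuArray{Bool}, 5×6
hot_int  = ifelse.(s' .== (1:5), 1, 0)   # same comparison, but Int entries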

Also un-GPU-tested, the shortest @tullio version would be:

julia> hot2(s) = @tullio x[s[j], j] := 1;

julia> hot2([1,2,3,5,2,1])
5×6 Matrix{Int64}:
 1  0  0  0  0  1
 0  1  0  0  1  0
 0  0  1  0  0  0
 0  0  0  0  0  0
 0  0  0  1  0  0

Wow, what a wonderful package! Having experimented with Einstein notation myself a while ago, I realize how much work has been put into the package, and I appreciate it.

A follow-up question: can this be generalized to multidimensional arrays? E.g., given an N-dimensional array of indices, create an (N+1)-dimensional one-hot encoded array?

Yes, it should work fine in N dimensions. Something like this, perhaps:

julia> js = rand(1:10, 2,3)
2×3 Matrix{Int64}:
 5  6  4
 3  3  7

julia> @tullio out[i, js[i,k], k] := 1  # axes(out,2) from extrema(js)
2×5×3 OffsetArray(::Array{Int64, 3}, 1:2, 3:7, 1:3) with eltype Int64 with indices 1:2×3:7×1:3:
[:, :, 1] =
 0  0  1  0  0
 1  0  0  0  0

[:, :, 2] =
 0  0  0  1  0
 1  0  0  0  0

[:, :, 3] =
 0  1  0  0  0
 0  0  0  0  1

julia> @tullio out[i,j,k] := j==js[i,k]  (j in 1:10)
2×10×3 Array{Bool, 3}:
[:, :, 1] =
 0  0  0  0  1  0  0  0  0  0
 0  0  1  0  0  0  0  0  0  0

[:, :, 2] =
 0  0  0  0  0  1  0  0  0  0
 0  0  1  0  0  0  0  0  0  0

[:, :, 3] =
 0  0  0  1  0  0  0  0  0  0
 0  0  0  0  0  0  1  0  0  0

It’s probably my favourite Julia package (together with LoopVectorization.jl).

I think not; currently there’s no great solution for N-dimensional codegen without writing each case explicitly. I wonder if one way would be for Tullio to have primitives for CartesianIndices?

There’s an open issue here: https://github.com/mcabbott/Tullio.jl/issues/11

I could be wrong, but I think @dfdx is talking about having one function definition that works for containers of any dimensionality, which I think Tullio won’t handle, right?

Though this case is simple enough that perhaps one can just do it in a linear fashion and then reshape after the fact?

Yes, any N is fine, as long as it never changes!

There was a branch trying to make things like this work via CartesianIndices:

@tullio x[i, j, ..] = (i == s[j, ..])  (i in 1:10)

but it got pretty complicated. But indeed this one could be done by just reshape(onehot(vec(s)), :, size(s)...).
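
i.e. something like this (an untested sketch, reusing the Tullio onehot defined above):

js = rand(1:10, 2, 3)                 # N-dimensional indices, here N = 2
flat = onehot(vec(js), 10)            # 10×6 dense matrix
out  = reshape(flat, :, size(js)...)  # 10×2×3, one-hot along the first dimension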

Perfect! Thanks both of you!

(I accepted the one-liner hot2() as an answer as it is slightly faster in my tests, but I appreciate the first implementation equally)

It’s very interesting that that works. How does Tullio know to make the other indices zero? Does it initialize an array of zeros? If so, wouldn’t that be inefficient?

Can confirm it works on the GPU, btw:

julia> hot2(s) = @tullio x[s[j], j] := 1;

julia> hot2(cu([1,2,3,5,2,1]))
5×6 CuArray{Int64,2}:
 1  0  0  0  0  1
 0  1  0  0  1  0
 0  0  1  0  0  0
 0  0  0  0  0  0
 0  0  0  1  0  0

It should only zero the output array when the loops aren’t sure to fill it.

julia> @tullio mat[i,i] := (5:7)[i] # needed here
3×3 Matrix{Int64}:
 5  0  0
 0  6  0
 0  0  7

julia> s = [1,3,2];

julia> @tullio out[s[i]] := i^2 # not needed, but s = [1,4,2] would need it
3-element OffsetArray(::Vector{Int64}, 1:3) with eltype Int64 with indices 1:3:
 1
 9
 4

julia> @tullio row[j] := mat[i,j] # doesn't zero output
3-element Vector{Int64}:
 5
 6
 7