# GPU-friendly one-hot encoding

Given a vector of indices, I want a matrix in which, for each column, only the element at the corresponding index is one and the others are zero. On the CPU it’s pretty easy, e.g.:

```julia
function onehot(s::AbstractVector, n_dims::Int=maximum(s))
    x = zeros(Int, n_dims, length(s))
    for (j, i) in enumerate(s)
        x[i, j] = 1
    end
    return x
end

julia> onehot([1, 2, 3, 4])
4×4 Array{Int64,2}:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1
```

But this technique is not very efficient on the GPU. Are there any better alternatives?

2 Likes

The simple answer is don’t worry about indexing yourself, let Tullio handle it:

```julia
julia> using CUDA, Tullio, KernelAbstractions

julia> function onehot(s::AbstractVector, n_dims=maximum(s))
           x = similar(s, n_dims, length(s))
           @tullio x[i, j] = (i == s[j]) (i ∈ 1:n_dims)
       end
onehot (generic function with 2 methods)

julia> onehot([1,2,3,4])
4×4 Array{Int64,2}:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

julia> onehot(cu([1,2,3,4]))
4×4 CuArray{Int64,2}:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1
```

When you do `using KernelAbstractions` before defining `onehot`, Tullio is able to generate efficient GPU code. If you do `using LoopVectorization` as well, you’ll get faster CPU codegen for hardware-native types like `Int`.

Of course, materializing an actual dense array for a one-hot encoding is often a bad idea, but I’ll let others who know what they’re talking about chime in there.
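To illustrate why the dense array is often avoidable: multiplying a weight matrix by a one-hot matrix just selects columns, so plain indexing does the same job with no matmul at all. A small sketch (the `onehot` here is the CPU version from the question; `W` and `s` are made-up illustrative data):

```julia
# Dense one-hot multiplication is equivalent to column indexing.
function onehot(s::AbstractVector, n_dims::Int=maximum(s))
    x = zeros(Int, n_dims, length(s))
    for (j, i) in enumerate(s)
        x[i, j] = 1
    end
    return x
end

W = collect(reshape(1.0:12.0, 3, 4))  # pretend 3×4 embedding/weight matrix
s = [2, 4, 1]

# The product picks out columns s of W, so indexing gives the same result:
W * onehot(s, 4) == W[:, s]  # true
```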

5 Likes

I haven’t tried it, but surely something like this would work:

```julia
julia> hot1(s, k=unique(s)) = s' .== k;

julia> hot1([1,2,3,5,2,1])
4×6 BitMatrix:
 1  0  0  0  0  1
 0  1  0  0  1  0
 0  0  1  0  0  0
 0  0  0  1  0  0
```

Flux also has a lazy one-hot matrix, with some GPU operations defined.
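A minimal sketch of what such a lazy one-hot matrix looks like, in the spirit of Flux’s type but written in plain Base Julia (illustrative only; Flux’s actual type differs). No dense storage is allocated: entries are computed on demand from the index vector.

```julia
# A lazy Bool matrix whose (i, j) entry is true iff inds[j] == i.
struct LazyOneHot{V<:AbstractVector{<:Integer}} <: AbstractMatrix{Bool}
    inds::V
    nclasses::Int
end

Base.size(m::LazyOneHot) = (m.nclasses, length(m.inds))
Base.getindex(m::LazyOneHot, i::Int, j::Int) = m.inds[j] == i

m = LazyOneHot([1, 2, 3, 5, 2, 1], 5)  # behaves like a 5×6 Bool matrix
```

Only `collect(m)` (or similar) would materialize the dense matrix.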

1 Like

Here’s what I see on my machine:

```julia
julia> hot1(s, k=unique(s)) = s' .== k;

julia> hot1([1,2,3,5,2,1])
4×6 BitArray{2}:
 1  0  0  0  0  1
 0  1  0  0  1  0
 0  0  1  0  0  0
 0  0  0  1  0  0

julia> hot1(cu([1,2,3,5,2,1]))
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays ~/.julia/packages/GPUArrays/ZxsKE/src/host/indexing.jl:43
KernelError: passing and using non-bitstype argument
```

though I agree, I’d think this should work…

One issue is that

```julia
julia> unique(cu([1,2,3,4,1,2]))
4-element Array{Int64,1}:
 1
 2
 3
 4
```

That’s a pity. I wonder if `hot1(cu([1,2,3,5,2,1]), 1:5)` works? Or something like `ifelse.(s' .== k, 1, 0)` to make integers?
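A CPU sketch of those two suggestions: pass the class range explicitly instead of computing `unique` (which, as shown above, may not play well with `CuArray`s), and use `ifelse` to produce an integer matrix rather than a `BitMatrix`. Like the post above, this is untested on the GPU:

```julia
# Original broadcast version, with the class range passed in explicitly.
hot1(s, k=unique(s)) = s' .== k

# Integer-valued variant via ifelse; defaults to classes 1:maximum(s).
hot1_int(s, k=1:maximum(s)) = ifelse.(s' .== k, 1, 0)

hot1([1, 2, 3, 5, 2, 1], 1:5)  # 5×6 BitMatrix
hot1_int([1, 2, 3, 5, 2, 1])   # 5×6 Matrix{Int}
```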

Also un-GPU-tested, the shortest `@tullio` version would be:

```julia
julia> hot2(s) = @tullio x[s[j], j] := 1;

julia> hot2([1,2,3,5,2,1])
5×6 Matrix{Int64}:
 1  0  0  0  0  1
 0  1  0  0  1  0
 0  0  1  0  0  0
 0  0  0  0  0  0
 0  0  0  1  0  0
```
4 Likes

Wow, what a wonderful package! Having experimented with Einstein notation myself a while ago, I realize how much work has been put into the package, and I appreciate it.

A follow-up question: can this be generalized to multidimensional arrays? E.g. given an N-dimensional array of indices, create an (N+1)-dimensional one-hot encoded array?

Yes, it should work fine in N dimensions. Something like this, perhaps:

```julia
julia> js = rand(1:10, 2,3)
2×3 Matrix{Int64}:
 5  6  4
 3  3  7

julia> @tullio out[i, js[i,k], k] := 1  # axes(out,2) from extrema(js)
2×5×3 OffsetArray(::Array{Int64, 3}, 1:2, 3:7, 1:3) with eltype Int64 with indices 1:2×3:7×1:3:
[:, :, 1] =
 0  0  1  0  0
 1  0  0  0  0

[:, :, 2] =
 0  0  0  1  0
 1  0  0  0  0

[:, :, 3] =
 0  1  0  0  0
 0  0  0  0  1

julia> @tullio out[i,j,k] := j==js[i,k]  (j in 1:10)
2×10×3 Array{Bool, 3}:
[:, :, 1] =
 0  0  0  0  1  0  0  0  0  0
 0  0  1  0  0  0  0  0  0  0

[:, :, 2] =
 0  0  0  0  0  1  0  0  0  0
 0  0  1  0  0  0  0  0  0  0

[:, :, 3] =
 0  0  0  1  0  0  0  0  0  0
 0  0  0  0  0  0  1  0  0  0
```

It’s probably my favourite Julia package (together with LoopVectorization.jl).

I think not: currently there’s no great solution for N-dimensional codegen without writing out each case explicitly. I wonder if one approach would be for Tullio to have primitives for CartesianIndices?

There’s an open issue here: https://github.com/mcabbott/Tullio.jl/issues/11

1 Like

I could be wrong, but I think @dfdx is talking about having one function definition that works for containers of any dimensionality, which I think Tullio won’t handle, right?

Though, this case is simple enough that perhaps one can just do it in a linear fashion and then reshape after the fact?

Yes, any N is fine, as long as it never changes!

There was a branch trying to make things like this work via CartesianIndices:

```julia
@tullio x[i, j, ..] = (i == s[j, ..])  (i in 1:10)
```

but it got pretty complicated. But indeed this one could be done by just `reshape(onehot(vec(s)), :, size(s)...)`.
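A quick sketch of that reshape trick: one-hot encode the flattened indices, then reshape so the class axis comes first. The `onehot` here is the CPU version from the question, and `js` is made-up 2×3 index data:

```julia
# One-hot encode an N-dimensional index array via vec + reshape.
function onehot(s::AbstractVector, n_dims::Int=maximum(s))
    x = zeros(Int, n_dims, length(s))
    for (j, i) in enumerate(s)
        x[i, j] = 1
    end
    return x
end

onehot_nd(s::AbstractArray, n_dims::Int=maximum(s)) =
    reshape(onehot(vec(s), n_dims), :, size(s)...)

js = [5 6 4; 3 3 7]
out = onehot_nd(js, 10)  # 10×2×3 array with out[js[i,k], i, k] == 1
```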

1 Like

Perfect! Thanks both of you!

(I accepted the one-liner `hot2()` as an answer as it is slightly faster in my tests, but I appreciate the first implementation equally)

It’s very interesting that that works. How does Tullio know to make the other entries zero? Does it initialize an array of zeros? If so, wouldn’t that be inefficient?

Can confirm it works on the GPU, by the way:

```julia
julia> hot2(s) = @tullio x[s[j], j] := 1;

julia> hot2(cu([1,2,3,5,2,1]))
5×6 CuArray{Int64,2}:
 1  0  0  0  0  1
 0  1  0  0  1  0
 0  0  1  0  0  0
 0  0  0  0  0  0
 0  0  0  1  0  0
```

It should only zero the output array when the loops aren’t sure to fill it.

```julia
julia> @tullio mat[i,i] := (5:7)[i]  # zeroing needed here
3×3 Matrix{Int64}:
 5  0  0
 0  6  0
 0  0  7

julia> s = [1,3,2];

julia> @tullio out[s[i]] := i^2  # not needed, but s = [1,4,2] would need it
3-element OffsetArray(::Vector{Int64}, 1:3) with eltype Int64 with indices 1:3:
 1
 9
 4

julia> @tullio row[j] := mat[i,j]  # doesn't zero the output
3-element Vector{Int64}:
 5
 6
 7
```