GPU-friendly one-hot encoding

The simple answer is: don't worry about the indexing yourself, just let Tullio handle it:

julia> using CUDA, Tullio, KernelAbstractions

julia> function onehot(s::AbstractVector, n_dims=maximum(s))
           x = similar(s, n_dims, length(s))
           @tullio x[i, j] = (i == s[j]) (i ∈ 1:n_dims)  # the trailing (i ∈ 1:n_dims) declares the range of i
       end
onehot (generic function with 2 methods)

julia> onehot([1,2,3,4])
4×4 Array{Int64,2}:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

julia> onehot(cu([1,2,3,4]))
4×4 CuArray{Int64,2}:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

When you do using KernelAbstractions before defining onehot, Tullio can generate efficient GPU code. If you also do using LoopVectorization, you'll get faster CPU codegen on hardware-native types like Int.
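If you're curious, you can check the CPU difference yourself. Here's a rough sketch (the package names are real, but the function name and sizes are just for illustration, and actual timings depend on your machine and Tullio version):

using Tullio, LoopVectorization, BenchmarkTools

# Same kernel as above; because LoopVectorization is loaded before the
# @tullio macro expands, Tullio can emit its faster CPU loops for Int eltypes.
function onehot_lv(s::AbstractVector, n_dims=maximum(s))
    x = similar(s, n_dims, length(s))
    @tullio x[i, j] = (i == s[j]) (i ∈ 1:n_dims)
end

s = rand(1:128, 10_000)
@btime onehot_lv($s)   # compare against a version defined without LoopVectorization loaded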

Of course, making an actual dense array for a one-hot vector is often a bad idea, but I'll let others who know what they're talking about chime in there.
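For completeness, here's a minimal sketch of what the lazy alternative can look like, assuming you're happy to pull in Flux: its onehotbatch returns a OneHotMatrix that stores only the label indices rather than all the zeros, and multiplying by it dispatches to a method that effectively just picks columns.

using Flux

labels = [1, 2, 3, 4]
oh = Flux.onehotbatch(labels, 1:4)   # lazy OneHotMatrix, no dense zeros stored
W = rand(Float32, 8, 4)
W * oh                               # specialized *, selects the corresponding columns of W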
