The simple answer is: don’t worry about the indexing yourself, let Tullio handle it:
```julia
julia> using CUDA, Tullio, KernelAbstractions

julia> function onehot(s::AbstractVector, n_dims=maximum(s))
           x = similar(s, n_dims, length(s))
           @tullio x[i, j] = (i == s[j]) (i ∈ 1:n_dims)
       end
onehot (generic function with 2 methods)

julia> onehot([1, 2, 3, 4])
4×4 Array{Int64,2}:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

julia> onehot(cu([1, 2, 3, 4]))
4×4 CuArray{Int64,2}:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1
```
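For comparison, here is what the Tullio expression is doing, written with plain broadcasting. This is just a CPU sketch (it returns a `BitMatrix` rather than an integer matrix, and makes no GPU codegen claims); the name `onehot_bc` is mine:

```julia
# Broadcasting equivalent of the Tullio version: compare each possible
# label 1:n_dims (a column vector) against each entry of s (a row).
onehot_bc(s::AbstractVector, n_dims=maximum(s)) = (1:n_dims) .== permutedims(s)

onehot_bc([1, 2, 3, 4])  # 4×4 BitMatrix with ones on the diagonal
```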
When you do `using KernelAbstractions` before defining `onehot`, Tullio is able to do efficient GPU codegen. If you do `using LoopVectorization` as well, you’ll get faster CPU codegen on hardware-native types like `Int`.
Of course, materializing an actual dense array for a one-hot vector is often a bad idea, but I’ll let others who know what they’re talking about chime in there.
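On that last point, one common alternative (a sketch, assuming you have Flux available) is Flux’s lazy one-hot type, which stores only the hot indices instead of the full dense matrix:

```julia
using Flux  # provides onehotbatch and the lazy OneHotMatrix type

# Stores just the indices [1, 2, 3, 4], but acts like a 4×4 AbstractMatrix.
y = Flux.onehotbatch([1, 2, 3, 4], 1:4)

# Multiplying a matrix by a one-hot matrix reduces to column selection,
# so no dense one-hot array is ever built.
W = rand(2, 4)
W * y == W[:, [1, 2, 3, 4]]  # true
```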