The simple answer is to not worry about indexing yourself and let Tullio handle it:
julia> using CUDA, Tullio, KernelAbstractions
julia> function onehot(s::AbstractVector, n_dims=maximum(s))
           x = similar(s, n_dims, length(s))
           @tullio x[i, j] = (i == s[j]) (i ∈ 1:n_dims)
       end
onehot (generic function with 2 methods)
julia> onehot([1,2,3,4])
4×4 Array{Int64,2}:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
julia> onehot(cu([1,2,3,4]))
4×4 CuArray{Int64,2}:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
If you load KernelAbstractions (with using KernelAbstractions) before defining onehot, Tullio is able to do efficient GPU codegen. If you load LoopVectorization as well, you’ll get faster CPU codegen on hardware-native types like Int.
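For reference, here is roughly what the @tullio line computes, written as plain loops. This is only a CPU sketch with no packages, not what Tullio actually generates, and onehot_loops is a made-up name for illustration:

```julia
# Plain-loop reference for the @tullio expression (CPU only, no packages).
# `onehot_loops` is a hypothetical name, not part of Tullio.
function onehot_loops(s::AbstractVector{<:Integer}, n_dims=maximum(s))
    x = similar(s, n_dims, length(s))
    for j in axes(x, 2), i in axes(x, 1)
        x[i, j] = (i == s[j])   # the Bool is converted to eltype(s): 1 or 0
    end
    return x
end

onehot_loops([1, 2, 3, 4]) == [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1]  # true
```

Scalar indexing like this is slow or disallowed on a CuArray, which is the reason to let Tullio generate the kernel instead.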
Of course, making an actual dense array for a one-hot vector is often a bad idea, but I’ll let others who know what they’re talking about chime in there.
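One small illustration of why, assuming the one-hot matrix is only ever used on the right-hand side of a matrix multiply (W and s below are made-up example data): multiplying by a one-hot matrix just selects columns, so plain indexing gives the same result without allocating the dense array.

```julia
# Illustrative only: W and s are made-up example data.
W = [1.0 2.0 3.0;
     4.0 5.0 6.0]            # 2×3 weight matrix
s = [3, 1, 3]                # class indices, one per sample

# Dense one-hot matrix built by broadcasting (3 classes × 3 samples).
dense = Float64.((1:3) .== s')

# Multiplying by a one-hot matrix picks out columns of W,
# so indexing with s gives the same result without the dense array.
W * dense == W[:, s]         # true
```

Whether that shortcut applies depends on what you do with the one-hot array afterwards, so treat it as a sketch rather than a general recommendation.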