Element-wise operations in Tullio.jl on GPU

It's not the most friendly error message, but what you need to do is load some extra packages:

julia> using KernelAbstractions, CUDAKernels

julia> CUDA.allowscalar(false);

julia> mul(A, B, C) = @tullio C[k] = A[k] * B[k];  # run macro after loading packages

julia> d_a, d_b, d_c = cu.((a, b, c));

julia> @btime mul($d_a, $d_b, $d_c);
  min 34.842 μs, mean 48.070 μs (86 allocations, 3.33 KiB)

julia> @btime CUDA.@sync mul($d_a, $d_b, $d_c);
  min 43.031 μs, mean 108.234 μs (86 allocations, 3.33 KiB)

julia> @btime CUDA.@sync $d_c .= $d_a .* $d_b;
  min 17.036 μs, mean 176.596 μs (7 allocations, 480 bytes)

julia> @btime mul($a, $b, $c);  # CPU
  min 122.099 μs, mean 123.461 μs (2 allocations, 32 bytes)

julia> c ≈ collect(d_c)
true
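
For reference, the `@tullio` expression above is just an element-wise product written into `C`. A minimal plain-Julia sketch of the same computation on the CPU could look like the following. The name `mul_loop!` and the array sizes are made up for illustration, and this is not the code Tullio actually generates; it only shows what result to expect.

```julia
# Hypothetical CPU reference for `@tullio C[k] = A[k] * B[k]`.
# `mul_loop!` is an illustrative name, not part of Tullio.jl.
function mul_loop!(C, A, B)
    # eachindex throws if the three arrays do not share the same indices
    for k in eachindex(C, A, B)
        C[k] = A[k] * B[k]
    end
    return C
end

# Assumed test data; the sizes used in the original timings are not stated.
a = rand(Float32, 1000)
b = rand(Float32, 1000)
c = similar(a)

mul_loop!(c, a, b)

# The loop should agree with the broadcast form used in the benchmark above.
println(c ≈ a .* b)
```

This kind of reference is handy for checking a GPU result after copying it back with `collect`, as done in the last line of the session.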