Not the friendliest error message, but the fix is to load some extra packages before running the macro:
julia> using KernelAbstractions, CUDAKernels
julia> CUDA.allowscalar(false);
julia> mul(A, B, C) = @tullio C[k] = A[k] * B[k]; # run macro after loading packages
julia> d_a, d_b, d_c = cu.((a, b, c)); # copy the arrays to the GPU
julia> @btime mul($d_a, $d_b, $d_c);
min 34.842 μs, mean 48.070 μs (86 allocations, 3.33 KiB)
julia> @btime CUDA.@sync mul($d_a, $d_b, $d_c); # with explicit synchronization
min 43.031 μs, mean 108.234 μs (86 allocations, 3.33 KiB)
julia> @btime CUDA.@sync $d_c .= $d_a .* $d_b; # plain GPU broadcast, for comparison
min 17.036 μs, mean 176.596 μs (7 allocations, 480 bytes)
julia> @btime mul($a, $b, $c); # CPU
min 122.099 μs, mean 123.461 μs (2 allocations, 32 bytes)
julia> c ≈ collect(d_c)
true
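
If you want to sanity-check the kernel itself without a GPU, here is a minimal CPU-only sketch. It assumes only that Tullio.jl is installed. The array size and the lowercase names `a`, `b`, `c` are made up for illustration and are not the arrays from the session above.

```julia
using Tullio  # assumption: Tullio.jl is installed; no GPU packages are needed here

# Same definition as in the session above: `=` writes into the existing array C
mul(A, B, C) = @tullio C[k] = A[k] * B[k]

# Hypothetical small inputs, chosen only for illustration
a = rand(Float32, 1000)
b = rand(Float32, 1000)
c = similar(a)

mul(a, b, c)

# The kernel is an elementwise product, so it should agree with broadcasting
@assert c ≈ a .* b
```

Since `mul` writes into its third argument, `c` has to be allocated up front, which is why the GPU version above passes a preallocated `d_c` as well.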