CUDA.jl: vectorization not working with functions of complex arguments

Hi,

Before stating my problem, I would like to say that I’m a fully enthusiastic and convinced Julia user, and to thank the developers for their awesome work.

The CUDA module is especially of great help to write fast code executed on the GPU without (most of the time) having to write a single CUDA kernel.
However, when working with functions of complex arguments, I noticed that most functions do not seem to be implemented, apart from basic arithmetic and exp.

Here is a simple piece of code to test the missing functions:

using CUDA
CUDA.allowscalar(false)

T = Complex{Float32} 
# T = Float32
A = CUDA.rand(T, 10);
R1 = similar(A);
E = exp.(A) # works
R = sqrt.(A) # works only with T = Float32
map!(sqrt, R1, A) # works only with T = Float32

I’m familiar with MapReduce-style frameworks (like Thrust in C++), so I tried to define the following:

function map_kernel!(f, A::CuDeviceVector{T}, B::CuDeviceVector{T}) where T
    n = length(A)
    # global index of this thread, and the grid-stride step
    start = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    for i in start:stride:n
        A[i] = f(B[i])
    end
    return nothing
end

function my_map!(f, A::AbstractVector{T}, B::AbstractVector{T}) where T
    n = length(A)
    @cuda threads=32 blocks=n map_kernel!(f, A, B)
    A
end

and now this works:

R2 = similar(A);
my_map!(sqrt, R2, A)

This code is not generic, but I don’t understand why the generic map! function provided by Julia cannot do the same in my simple case.
It would be even better if broadcasting worked, as it does for Real numbers.

Could somebody tell me whether I’m missing something, and explain why the broadcasted exp works on complex arrays when other functions fail?

Thank you,
Nicolas

Sorry for a mistake:
in my_map!, blocks should be blocks=cld(n, 32) (just enough 32-thread blocks to cover all n elements), not blocks=n.

That’s because we have GPU-specific versions of e.g. sqrt in CUDA.jl, but those definitions are typically pretty narrow. When you call map!, calls to such functions are automatically substituted (so sqrt becomes CUDA.sqrt, which doesn’t have a method for complex numbers). We do this because some intrinsics require the GPU-specific version, the Base definition not being GPU compatible (e.g. because it calls into a CPU-specific math library). Nowadays the situation is better: as you noticed, e.g. Base.exp just works, because people like @Oscar_Smith have ported many math functions to native Julia.
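On the current release, one way to sidestep that substitution is to express the complex square root using only real operations, so the kernel only needs the real GPU sqrt that CUDA.jl does support. A sketch (untested; csqrt is a hypothetical name, and the formula computes the principal branch):

```julia
function csqrt(z::Complex{T}) where T<:Real
    r = abs(z)                        # modulus; uses real sqrt internally
    s = sqrt((r + real(z)) / 2)       # real part of the principal root
    d = sqrt((r - real(z)) / 2)       # magnitude of the imaginary part
    return Complex(s, copysign(d, imag(z)))
end

R3 = similar(A)
map!(csqrt, R3, A)  # csqrt has no Base-to-CUDA mapping, so it compiles as written
```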

Anyway, this won’t be relevant in the next version of CUDA.jl, where we use a new mechanism that overrides incompatible Base methods with their GPU-specific counterparts without fully replacing the function.

Ok, sorry, I did not realize that these functions don’t even exist in native CUDA.
Also, my example with map! is confusing, because for some reason Julia’s map! does work with sqrt if I do:

map!(sqrt, R1, A);  # does not work with Complex
f = x -> sqrt(x)
map!(f, R1, A); # works

Yes, that’s expected. There’s a mapping from Base.sqrt to CUDA.sqrt, and by wrapping it in an anonymous function you introduce a new function f that is not subject to that mapping.
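For the same reason, the anonymous-function trick should presumably also make broadcasting work on the current release, since broadcast kernels go through the same substitution mechanism (a sketch, untested):

```julia
# The wrapper is a new function with no Base-to-CUDA mapping,
# so the complex Base.sqrt gets compiled for the GPU as-is.
R .= (x -> sqrt(x)).(A)
```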