Best way to call an OpenCL kernel with arguments of type CLArray

ranocha · June 2, 2018, 1:55pm

I would like to perform some computations on GPUs using CLArrays, since these support standard array syntax. However, I would also like to call some existing OpenCL kernels on these CLArrays. Some approaches that seem natural (at least to me) do not work.

using OpenCL, CLArrays
device, ctx, queue = cl.create_compute_context()
mult_kernel = """
kernel void mult(global float const* a, global float* b)
{
  int gid = get_global_id(0);
  b[gid] = 2*a[gid];
}
"""
p = cl.Program(ctx, source=mult_kernel) |> cl.build!
mult_cl = cl.Kernel(p, "mult")

# using buffers: calling kernels works, but buffers do not support the array interface
a = rand(Float32, 50_000)
a_buff = cl.Buffer(Float32, ctx, (:r, :copy), hostbuf=a)
b_buff = cl.Buffer(Float32, ctx, :rw, length(a))
queue(mult_cl, size(a), nothing, a_buff, b_buff)
b = cl.read(queue, b_buff)
@show norm(b - 2a)

# calling queue with arguments of type CLArray throws an error
d_a = CLArray(a)
d_b = CLArray(similar(a))
queue(mult_cl, size(a), nothing, d_a, d_b)

# using gpu_call with a julia function works, but I would like to call an existing OpenCL kernel
function mult_julia(state, a, b)
  idx = GPUArrays.@linearidx a state
  @inbounds b[idx] = 2*a[idx]
end
gpu_call(mult_julia, d_a, (d_a, d_b))
mapreduce(x->x^2, +, d_b-2*d_a)

# calling gpu_call with an OpenCL kernel throws an error
gpu_call(mult_cl, d_a, (d_a, d_b))

What is the best way to call existing OpenCL kernels with CLArrays as arguments?

ranocha · June 2, 2018, 5:16pm

After browsing the source of CLArrays.jl and OpenCL.jl, I might have found a solution.

# this works
ctx = CLArrays.context(d_a)
queue = CLArrays.global_queue(d_a)
p = cl.Program(ctx, source=mult_kernel) |> cl.build!
mult_cl = cl.Kernel(p, "mult")
queue(mult_cl, size(d_a), nothing, pointer(d_a), pointer(d_b))
mapreduce(x->x^2, +, d_b-2*d_a)

Here, it is essential that the command queue (queue) and the context (ctx) are the ones associated with the CLArrays. Otherwise, I get CLError(code=-38, CL_INVALID_MEM_OBJECT).

Nevertheless, I would like to know whether this approach works in general and whether there is a better option.
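
The workaround above could be wrapped in a small helper so that the context and queue are always taken from the arrays themselves. This is only a sketch: call_cl_kernel is a hypothetical name, not part of CLArrays.jl or OpenCL.jl, and it uses only the calls shown in this post. It assumes all arguments are CLArrays living in the same context and that one work-item per element of the first array is wanted.

```julia
using OpenCL, CLArrays

mult_kernel = """
kernel void mult(global float const* a, global float* b)
{
  int gid = get_global_id(0);
  b[gid] = 2*a[gid];
}
"""

# Hypothetical helper (an assumption for illustration, not a library function):
# build the kernel `name` from `source` in the context of `first_arg` and
# launch it with one work-item per element of `first_arg`.
function call_cl_kernel(source, name, first_arg::CLArray, rest::CLArray...)
    # Take context and queue from the array to avoid CL_INVALID_MEM_OBJECT.
    ctx = CLArrays.context(first_arg)
    queue = CLArrays.global_queue(first_arg)
    program = cl.Program(ctx, source=source) |> cl.build!
    kernel = cl.Kernel(program, name)
    # Pass the underlying device memory of each CLArray, as in the post above.
    buffers = map(pointer, (first_arg, rest...))
    queue(kernel, size(first_arg), nothing, buffers...)
end

a = rand(Float32, 50_000)
d_a = CLArray(a)
d_b = CLArray(similar(a))
call_cl_kernel(mult_kernel, "mult", d_a, d_b)
```

Note that this sketch rebuilds the program on every call; for repeated calls one would build the kernel once and reuse it.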

sdanisch · June 4, 2018, 9:41am

I made a PR to integrate this a bit more nicely:
https://github.com/JuliaGPU/CLArrays.jl/pull/30

sdanisch · June 4, 2018, 9:48am

Your solution is also fine!

ranocha · June 4, 2018, 9:56am

Thank you very much, Simon!

In your PR, you wrote “Note, that the caching of the functor is not very nice, so for repeated calls, one might want to do this part manually:”
So, for repeated calls, I should call clfunc = CLFunction(f, _args, ctx) only once and then use clfunc(_args, global_size, threads), correct?

sdanisch · June 4, 2018, 9:58am

Yes! Or benchmark the difference. It would be interesting to know how bad the dictionary lookup really is.
Another side effect is that I’m not hashing the actual kernel string, just the function name + function argument types.
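
To illustrate the side effect just mentioned, here is a minimal pure-Julia sketch of a cache keyed only by name and argument types. It is not the actual CLArrays.jl code: FakeKernel, cached_by_name and cached_by_source are hypothetical names, and no OpenCL is involved. It only shows that two different kernel sources with the same name and argument types would share one cache entry, while adding a hash of the source to the key keeps them apart.

```julia
# Hypothetical stand-in for a compiled kernel; it only remembers its source.
struct FakeKernel
    source::String
end

# Cache keyed by (name, argument types) only, as described above.
const name_cache = Dict{Any,FakeKernel}()
function cached_by_name(source, name, argtypes)
    get!(() -> FakeKernel(source), name_cache, (name, argtypes))
end

# Alternative cache whose key also includes a hash of the kernel source.
const source_cache = Dict{Any,FakeKernel}()
function cached_by_source(source, name, argtypes)
    get!(() -> FakeKernel(source), source_cache, (hash(source), name, argtypes))
end

src_v1 = "kernel void mult(global float const* a, global float* b) { b[get_global_id(0)] = 2*a[get_global_id(0)]; }"
src_v2 = "kernel void mult(global float const* a, global float* b) { b[get_global_id(0)] = 3*a[get_global_id(0)]; }"
argtypes = (Float32, Float32)

k1 = cached_by_name(src_v1, "mult", argtypes)
k2 = cached_by_name(src_v2, "mult", argtypes)    # same key: returns the v1 entry

s1 = cached_by_source(src_v1, "mult", argtypes)
s2 = cached_by_source(src_v2, "mult", argtypes)  # different key: fresh entry
```

In this sketch k2 still carries the first source, so a kernel edited under the same name would silently reuse the old entry. Hashing the source avoids that at the cost of hashing the string on every lookup.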

ranocha · June 4, 2018, 9:59am

Okay, thank you again. I will test it when I’m back at a machine running OpenCL…
