Hi everyone,
I’m trying to move some operations of an existing package to the GPU using KernelAbstractions. Since my package defines custom structs, I had to make them GPU-compatible with Adapt, by specializing `adapt_structure`.
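For context, the specializations in the package follow the usual field-wise pattern; a minimal illustration (`MyStruct` is made up here, not one of the real structs):

```julia
using Adapt

struct MyStruct{A,B}
    values::A   # typically an Array on the host, a CuArray on the device
    params::B
end

# Adapt each field; Adapt.@adapt_structure MyStruct would generate the same method.
Adapt.adapt_structure(to, s::MyStruct) = MyStruct(adapt(to, s.values), adapt(to, s.params))
```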
Everything works fine in the sense that the kernels run and give the expected results. However, I noticed that `adapt_structure` is called on the kernel arguments (my custom structs) every time the kernel is launched, which I did not expect. This is undesirable since some of my `adapt_structure` methods allocate. Note also that the behavior is not the same on the CPU as on a CUDA device, and it changes depending on whether `@Const` is used (see the example below).
I am trying to understand what’s going on. Why is KernelAbstractions calling `adapt_structure`? What should the workflow be: always pass “host” structures to the kernel and let KernelAbstractions adapt them? That seems costly, so I suspect it’s not the right way. Should I instead write non-allocating `adapt_structure` methods so that these automatic calls are “transparent” performance-wise (something like the sketch below)? Is there a way to prevent KernelAbstractions from calling Adapt at all?
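To make the “non-allocating `adapt_structure`” idea concrete, here is the kind of rewrap-only specialization I have in mind (just a sketch with a made-up `Bar` struct, not what the MWE below does): the idea is that once the fields are already device arrays, re-adapting the struct should only rebuild the thin wrapper around them.

```julia
using Adapt

struct Bar{A}
    data::A
end

# Field-wise rewrap only: no extra buffers are built here, so when `data` is
# already a device array, adapting Bar again should be essentially free.
Adapt.@adapt_structure Bar
```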
Below is an MWE illustrating the different calls to `adapt_structure` in combination with KernelAbstractions. It is run on the CPU and on CUDA, with and without `@Const`. Thanks in advance for your clarifications.
```julia
module MWE

using KernelAbstractions
using Adapt
using CUDA

# Custom struct whose adapt_count field tracks how many times adapt_structure has been applied
struct Foo{T}
    data::T
    adapt_count::Int
end

# Adapt the data field and increment the counter on every call
function Adapt.adapt_structure(to, foo::Foo)
    return Foo(adapt(to, foo.data), foo.adapt_count + 1)
end

@kernel function test_no_const_kernel!(x, foo)
    I = @index(Global)
    @print(foo.adapt_count)
    @print("\n")
    x[I] += 1 # dummy action
end

@kernel function test_const_kernel!(x, @Const(foo))
    I = @index(Global)
    @print(foo.adapt_count)
    @print("\n")
    x[I] += 1 # dummy action
end

function mwe(backend)
    @show backend

    # Dummy array of size 1
    x = KernelAbstractions.zeros(backend, Float32, 1)

    # Build kernels
    kernel_no_const = test_no_const_kernel!(backend)
    kernel_const = test_const_kernel!(backend)

    # Adapt and execute without @Const
    println("WITHOUT @Const")
    foo_host = Foo([1, 2, 3], 0)
    @show foo_host.adapt_count # -> cpu = cuda = 0

    println("Explicit call to adapt before kernel")
    foo_device = adapt(backend, foo_host)
    @show foo_device.adapt_count # -> cpu = cuda = 1

    println("Running kernel with 'foo_device'")
    print("foo_device.adapt_count = ")
    kernel_no_const(x, foo_device; ndrange = size(x)) # -> cpu = 1, cuda = 2
    KernelAbstractions.synchronize(backend)

    # Adapt and execute with @Const
    println("WITH @Const")
    foo_host = Foo([1, 2, 3], 0)
    @show foo_host.adapt_count # -> cpu = cuda = 0

    println("Explicit call to adapt before kernel")
    foo_device = adapt(backend, foo_host)
    @show foo_device.adapt_count # -> cpu = cuda = 1

    println("Running kernel with 'foo_device'")
    print("foo_device.adapt_count = ")
    kernel_const(x, foo_device; ndrange = size(x)) # -> cpu = 2, cuda = 3
    KernelAbstractions.synchronize(backend)
end

mwe(CPU())
println()
mwe(get_backend(CUDA.zeros(Float32, 1)))

end
```
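Side note: on CUDA I suspect the extra call comes from the launch-time argument conversion (`cudaconvert`), which also goes through Adapt; a quick check outside of any kernel launch, reusing the `Foo` from the MWE above, would be something like:

```julia
using Adapt, CUDA

foo_host   = MWE.Foo([1, 2, 3], 0)
foo_device = adapt(CUDA.CUDABackend(), foo_host)  # same explicit adapt as in mwe()
@show foo_device.adapt_count                      # 1: the explicit host-side adapt
@show CUDA.cudaconvert(foo_device).adapt_count    # one more if the launch-time conversion re-adapts the struct
```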