Relation between KernelAbstractions and Adapt

Hi everyone,

I’m trying to move some operations of an existing package to the GPU using KernelAbstractions. Since my package defines custom structs, I had to make them GPU-compatible with Adapt by specializing adapt_structure.

Everything works fine in the sense that the kernels run (and give the expected result). However, I noticed that adapt_structure is called on the kernel arguments (my custom structs) every time the kernel is called, which I did not expect. This is undesirable since some of my adapt_structure methods allocate. Also note that the behavior differs between the CPU and a CUDA device, and depending on whether @Const is used or not (see the example below).

I am trying to understand what is going on. Why does KernelAbstractions call adapt_structure? What should the workflow be: always pass “host” structures to the kernel and let KernelAbstractions adapt them? That seems costly, so I suspect it is not the right way. Should my adapt_structure methods be non-allocating so that these automatic calls are “transparent” performance-wise? Is there a way to prevent KernelAbstractions from calling Adapt?

Below is an MWE illustrating the different calls to adapt_structure in combination with KA. It is run on the CPU and on CUDA, with and without @Const. Thanks in advance for your clarifications.

module MWE
using KernelAbstractions
using Adapt
using CUDA

# Wrapper struct; adapt_count records how many times it has been adapted
struct Foo{T}
    data::T
    adapt_count::Int
end

# Adapt rule: adapt the wrapped data and bump the counter so every adapt call is visible
function Adapt.adapt_structure(to, foo::Foo)
    return Foo(adapt(to, foo.data), foo.adapt_count + 1)
end

@kernel function test_no_const_kernel!(x, foo)
    I = @index(Global)
    @print(foo.adapt_count)
    @print("\n")
    x[I] += 1 # dummy action
end

@kernel function test_const_kernel!(x, @Const(foo))
    I = @index(Global)
    @print(foo.adapt_count)
    @print("\n")
    x[I] += 1 # dummy action
end

function mwe(backend)
    @show backend

    # Dummy array of size 1
    x = KernelAbstractions.zeros(backend, Float32, 1)

    # Build kernels
    kernel_no_const = test_no_const_kernel!(backend)
    kernel_const = test_const_kernel!(backend)

    # Adapt and execute without @Const
    println("WITHOUT @Const")
    foo_host = Foo([1, 2, 3], 0)
    @show foo_host.adapt_count # -> cpu = cuda = 0
    println("Explicit call to adapt before kernel")
    foo_device = adapt(backend, foo_host)
    @show foo_device.adapt_count # -> cpu = cuda = 1
    println("Running kernel with 'foo_device'")
    print("foo_device.adapt_count = ")
    kernel_no_const(x, foo_device; ndrange = size(x)) # -> cpu = 1, cuda = 2
    KernelAbstractions.synchronize(backend)

    # Adapt and execute with @Const
    println("WITH @Const")
    foo_host = Foo([1, 2, 3], 0)
    @show foo_host.adapt_count # -> cpu = cuda = 0
    println("Explicit call to adapt before kernel")
    foo_device = adapt(backend, foo_host)
    @show foo_device.adapt_count # -> cpu = cuda = 1
    println("Running kernel with 'foo_device'")
    print("foo_device.adapt_count = ")
    kernel_const(x, foo_device; ndrange = size(x)) # -> cpu = 2, cuda = 3
    KernelAbstractions.synchronize(backend)
end

mwe(CPU())
println()
mwe(get_backend(CUDA.zeros(Float32, 1)))

end

Adapt is used to convert host objects to their device counterparts, e.g. to convert a CuArray to a CuDeviceArray (which is no longer a mutable type, and leaves out irrelevant fields). In that sense, it is not particularly related to KernelAbstractions.jl; it is used by every Julia GPU back-end (although KA.jl does use Adapt for other purposes as well).
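
To make that concrete, here is a minimal sketch (assuming a recent CUDA.jl) of what happens to kernel arguments at launch time: CUDA.jl runs cudaconvert on every argument, and cudaconvert is built on Adapt, so custom adapt_structure methods are invoked there as well.

using CUDA, Adapt

a = CUDA.zeros(Float32, 4)
@show typeof(a) # CuArray{Float32, 1, ...} -- mutable host-side handle

# cudaconvert is applied by CUDA.jl to every kernel argument at launch;
# it is built on Adapt, so adapt_structure methods for custom structs run here too
d = cudaconvert(a)
@show typeof(d) # CuDeviceVector{Float32, 1} -- immutable device-side counterpart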

Why does that matter to you? Small CPU allocations are generally cheap. In general, you should keep Adapt methods cheap (e.g. they should not perform any API calls); most users assume that calling an adaptor is cheap and simply yields a differently-represented version of the object.
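
Concretely, that means an adaptor should only re-wrap its (already adapted) fields. A minimal sketch, using a hypothetical Bar wrapper rather than the Foo from the MWE (whose counter exists purely to instrument the adapt calls):

using Adapt

struct Bar{T}
    data::T
end

# Cheap adapt rule: recursively adapt the field and re-wrap it, with no API
# calls and no work beyond constructing the immutable wrapper
Adapt.adapt_structure(to, bar::Bar) = Bar(adapt(to, bar.data))

# Adapt.@adapt_structure Bar would generate an equivalent field-by-field method

With adaptors kept that cheap, the automatic conversion at every kernel launch should not be a performance concern.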