Animated plot and CUDA computations with Makie

Hello everyone,
I’m learning GPU programming with Julia and I want to generate a procedurally animated picture inside a window with CUDA. For instance, I’m porting the GPU ripple example from Chapter 5 of the book “CUDA by Example”.
I managed to make it work with Makie, but rendering doesn’t perform very well and, even worse, I have to transfer the resulting array from the GPU to the CPU and then update the image.
Is there a better way to do this with Makie or any other Julia library? I tried looking into GLFW, GLMakie, etc., but I had difficulty understanding how those libraries work.

module GPU_RIPPLE

using CUDA
using Makie
using AbstractPlotting
using Colors

const DIM = 1024
const PI = 3.1415926535897932

function kernel(image, ticks)
    # map from threadIdx/blockIdx to pixel position
    x = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    y = threadIdx().y + (blockIdx().y - 1) * blockDim().y
    offset = x + (y-1) * blockDim().x * gridDim().x

    # calculate value at the position
    fx = x - DIM/2
    fy = y - DIM/2
    d = CUDA.sqrt(fx * fx + fy * fy)
    grey = (128.0 + 127.0 * CUDA.cos(d/10.0 - ticks/7.0) / (d/10.0 + 1.0))/255
    image[offset] = grey
    return nothing
end

function main()

    imoutput = zeros(RGB{Float32}, DIM, DIM)
    
    img_node = Node(imoutput)

    num_threads = 16
    num_blocks = ceil(Int, DIM/num_threads)
    blocks = (num_blocks, num_blocks)
    threads = (num_threads, num_threads)

    scene = Scene(background_color=:black)
    scene = image!(img_node, show_axis=false)
    display(scene)

    ticks = 1
    @async while isopen(scene)
        d_imoutput = CuArray(imoutput)
        CUDA.@sync begin
            @cuda blocks=blocks threads=threads kernel(d_imoutput, ticks)
        end
        img_node[] = Array(d_imoutput)
        ticks +=1
        sleep(1/300)
    end
    
end

end

There’s no reason to allocate d_imoutput on every iteration. CuArray(imoutput) also performs a host-to-device copy, where you should just construct an empty (uninitialized) CuArray once, outside the loop. But ultimately there should be some integration between the plotting library and CUDA.jl so that the array can be used directly, and that is currently not implemented.
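
To make that suggestion concrete, here is a minimal sketch of the loop with the buffers hoisted out. It is not a drop-in replacement: `ripple_kernel!` and `render_frames` are hypothetical names, the image is stored as plain `Float32` grey values rather than `RGB{Float32}`, and it assumes a recent CUDA.jl in which `sqrt` and `cos` from Base work inside kernels. The device-to-host copy is still there, since the plotting library cannot read the `CuArray` directly.

```julia
using CUDA

const DIM = 1024

# Same ripple formula as in the question, written with Float32 literals
# and 2D indexing. The bounds check guards against a partial last block.
function ripple_kernel!(image, ticks)
    x = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    y = threadIdx().y + (blockIdx().y - 1) * blockDim().y
    if x <= size(image, 1) && y <= size(image, 2)
        fx = x - DIM / 2f0
        fy = y - DIM / 2f0
        d = sqrt(fx * fx + fy * fy)
        grey = (128f0 + 127f0 * cos(d / 10f0 - ticks / 7f0) / (d / 10f0 + 1f0)) / 255f0
        @inbounds image[x, y] = grey
    end
    return nothing
end

# Hypothetical helper: renders `nframes` frames and passes each one to `on_frame`.
function render_frames(on_frame, nframes)
    threads = (16, 16)
    blocks = (cld(DIM, threads[1]), cld(DIM, threads[2]))

    d_image = CuArray{Float32}(undef, DIM, DIM)  # allocated once, nothing uploaded
    h_image = Array{Float32}(undef, DIM, DIM)    # host buffer reused for every frame

    for ticks in 1:nframes
        @cuda blocks=blocks threads=threads ripple_kernel!(d_image, Float32(ticks))
        copyto!(h_image, d_image)  # device-to-host copy; waits for the kernel to finish
        on_frame(h_image)
    end
    return h_image
end
```

In the question's setup, `on_frame` could be something like `frame -> (img_node[] = frame)`, assuming the node was created from a `Float32` matrix instead of an `RGB{Float32}` one. Because the same host buffer is reused, a consumer that keeps frames around would need to copy them.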

Thank you very much for your answer, @maleadt.