Query.jl and CUDA

hovi · February 7, 2023, 1:07pm

I am playing with Query.jl, and it works fine in CPU code. I even benchmarked some use cases, and it ran faster than the equivalent “classic” for-loop code.
What I don’t know is whether it has any support for CUDA, or whether there are any guidelines for using it there, since I see its biggest use in kernel code. Does anyone have experience using Query.jl in CUDA kernel code?

To be specific, I would like to filter the loop that strides across the threads, for example something like this:

for i in index:stride:l |> @filter(_ != some_id)
...

Instead of:

for i in index:stride:l
    if i == some_id
        continue
    end
...
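
For comparison, here is a CPU-only sketch of the same skip pattern written with `Iterators.filter` from Base instead of Query.jl. The values of `index`, `stride`, `l` and `some_id` are made up for illustration, and the sketch does not check whether either form compiles inside a CUDA.jl kernel.

```julia
# CPU-only sketch; index, stride, l and some_id are made-up sample values.
index, stride, l, some_id = 1, 4, 21, 9

# Variant 1: lazy filter over the strided range (Base.Iterators, not Query.jl).
filtered = Int[]
for i in Iterators.filter(!=(some_id), index:stride:l)
    push!(filtered, i)
end

# Variant 2: explicit `continue`, as in the loop above.
skipped = Int[]
for i in index:stride:l
    i == some_id && continue
    push!(skipped, i)
end

# Both loops visit the same indices.
@assert filtered == skipped
```

Both variants iterate lazily over the range, so neither builds an intermediate array of indices before the loop starts.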
