Query.jl and CUDA

I have been playing with Query.jl and it works fine in CPU code. I even measured some use cases, and it ran faster than the equivalent “classic” for-loop code.
What I don’t know is whether it has any support for CUDA, or whether there are any guidelines for that, since I see its biggest potential use in kernel code. Does anyone have experience using Query.jl in CUDA kernel code?

To be specific, I would like to filter the loop that strides across the threads, for example something like this:

for i in index:stride:l |> @filter(_ != some_id)
    ...
end

Instead of:

for i in index:stride:l
    if i == some_id
        continue
    end
    ...
end
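
For comparison, the same skip can also be written on the CPU with `Iterators.filter` from Base, without Query.jl. This is only a minimal sketch to pin down the behaviour the two loops above are meant to share: the helper names `visited_with_filter` and `visited_with_continue` are made up for illustration, the loop body is replaced by a `push!` so the result can be inspected, and nothing here says whether either form compiles inside a CUDA kernel.

```julia
# CPU-only sketch; neither Query.jl nor CUDA.jl is involved.
# Both helper names are hypothetical and exist only for this comparison.

# Lazily filtered range: the element equal to `some_id` is never yielded.
function visited_with_filter(index, stride, l, some_id)
    visited = Int[]
    for i in Iterators.filter(i -> i != some_id, index:stride:l)
        push!(visited, i)   # stands in for the elided loop body
    end
    return visited
end

# Explicit form: skip the unwanted element with `continue`.
function visited_with_continue(index, stride, l, some_id)
    visited = Int[]
    for i in index:stride:l
        if i == some_id
            continue
        end
        push!(visited, i)   # stands in for the elided loop body
    end
    return visited
end
```

Calling both with the same arguments, for example `visited_with_filter(1, 3, 10, 4)` and `visited_with_continue(1, 3, 10, 4)`, should give the same vector of visited indices.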