Hey, cool! First of all, that was my only experience with custom GPU kernels (that worked), so I'm really no expert. Whether your example runs faster than mine could be down to better hardware, or to you making better use of it.
On the linear indexing thing: good question, I have no idea tbh. It used to be the case that linear indexing was faster on custom arrays. I don't think that's true anymore for Base arrays in modern Julia (they all implement a fast indexing method, i.e. the linear index), but I might be wrong. Anyway, I don't actually know whether Cartesian indexing is available on the GPU inside a custom kernel, so I wrote out the linear index to be sure it would work there. Note that this is different from Cartesian indexing on a CuArray in your regular Julia code (where it will work). Would be great to see your example to learn a bit more about this!
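
For what it's worth, here is a minimal CPU-only sketch of what "writing the linear index by hand" means for a 2D, column-major array. It only uses Base Julia and does not touch the GPU, so it says nothing about what is or isn't supported inside a kernel. The names `linear_index` and `check_indices` are made up for this illustration.

```julia
# Hypothetical helper: column-major linear index of element (i, j)
# in a 2D array with `nrows` rows (1-based, as in Julia).
linear_index(i, j, nrows) = i + (j - 1) * nrows

# Hypothetical check: compare the hand-written index against Julia's
# own mapping for every element of a matrix.
function check_indices(A::AbstractMatrix)
    nrows = size(A, 1)
    for j in axes(A, 2), i in axes(A, 1)
        idx = linear_index(i, j, nrows)
        idx == LinearIndices(A)[i, j] || return false
        A[idx] == A[i, j] || return false
    end
    return true
end

A = reshape(collect(1:12), 3, 4)   # 3×4 matrix, stored column by column
ok = check_indices(A)

# Going the other way: recover (i, j) from a linear index.
i, j = Tuple(CartesianIndices(A)[8])
```

In a kernel, the same arithmetic would be applied to the thread's row and column indices before indexing into the array with a single integer.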