mapslices is very slow on the GPU

I'm new to CUDA.

mapslices seems to run very slowly.
reduce is very fast, but it requires a binary operator (e.g. +) rather than a function that acts on a whole slice (e.g. sum).
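
To illustrate the difference, here is a minimal sketch on a plain CPU Array (the matrix `A` is made up for the example). Both calls give the column sums, but the first takes an operator and the second takes a function of a whole slice:

```julia
A = rand(Float32, 4, 3)           # hypothetical example data

r1 = reduce(+, A; dims=1)         # binary operator, reduced along dimension 1
r2 = mapslices(sum, A; dims=1)    # function applied to each column slice
```

Both results are 1×3 matrices holding the same sums up to floating-point rounding.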

I’d like to apply the linear_interpolation function to slices of a matrix (or tensor).
Can a GPU do this faster than a CPU?

Would I have to rewrite the linear_interpolation function as a custom kernel?
I’d rather not do this.

mapslices is known to be slow, at least on the CPU.


The *slices functions (mapslices, eachslice) are not supported, because they take a function that transforms a slice, and such a function cannot be compiled into a single kernel. Instead, the function has to be called N times (once per slice), which results in many small kernels being launched.
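
One way around this, short of writing a custom kernel, is to express the per-slice computation with whole-array operations (broadcasting and indexing) so that all slices are processed together. The following is only a rough sketch under some assumptions: `interp_columns` is a hypothetical hand-written helper, not the `linear_interpolation` function from the question, and it assumes every column of `Y` is sampled on the same uniform grid `x0, x0 + dx, ...`. It is shown on CPU arrays; it has not been tested on CuArrays, although it only uses broadcasting and integer-array indexing.

```julia
# Linearly interpolate every column of Y at the query points xq,
# using only whole-array operations (no per-slice function calls).
function interp_columns(Y::AbstractMatrix, x0, dx, xq::AbstractVector)
    n = size(Y, 1)
    t = clamp.((xq .- x0) ./ dx .+ 1, 1, n)   # fractional 1-based row index
    i = clamp.(floor.(Int, t), 1, n - 1)      # lower neighbour of each query
    w = t .- i                                # weight of the upper neighbour
    return (1 .- w) .* Y[i, :] .+ w .* Y[i .+ 1, :]
end

# Two columns sampled at x = 0, 1, 2
Y = [0.0 10.0; 1.0 20.0; 2.0 30.0]
out = interp_columns(Y, 0.0, 1.0, [0.5, 1.5, 2.0])
```

Each row of `out` corresponds to one query point and each column to one column of `Y`. Query points outside the grid are clamped to the end values rather than extrapolated.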


Many thanks :)