I'm new to CUDA. `mapslices` seems to run very slowly, while `reduce` is very fast but requires an accumulator (e.g. `+`) instead of a function over a whole slice (e.g. `sum`).

I'd like to apply a `linear_interpolation` function to slices of a matrix (or tensor). Can a GPU do this faster than a CPU? Would I have to rewrite `linear_interpolation` as a custom kernel? I'd rather not do this.

`mapslices` is known to be slow, at least on the CPU:
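When the per-slice function has an array-level equivalent, one GPU-friendly workaround is to skip `mapslices` entirely and use reductions with the `dims` keyword, which GPU array types can fuse into a single kernel. A minimal CPU-runnable sketch (the commented `CuArray` variant assumes CUDA.jl is installed; `norm` here is just an illustrative stand-in for a per-slice function):

```julia
using LinearAlgebra

X = randn(3, 1000)

# Slice-wise: one Julia function call per column (slow, many allocations)
slow = mapslices(norm, X; dims=1)

# Array-wise: a single fused reduction over the same column slices
fast = sqrt.(sum(abs2, X; dims=1))

# On the GPU the array-wise expression works unchanged:
#   using CUDA
#   Xd = CuArray(X)
#   sqrt.(sum(abs2, Xd; dims=1))
```

The two results agree; the difference is that the second form is expressed as whole-array operations rather than a loop over slices.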

GitHub issue, opened 03:52PM - 20 Apr 18 UTC (labels: performance, arrays):
Consider the following piece of code:
```
n = 10^6;
X = randn(1, n);
V(x) = 0.5 * dot(x, x);
f(V, X) = [ V(@view X[:, n]) for n = 1:size(X, 2) ]
```
Timing on this reveals:
```
julia> @time mapslices(V, X, 1);
1.177690 seconds (12.00 M allocations: 267.024 MiB, 6.91% gc time)
julia> @time f(V,X);
0.047745 seconds (1.00 M allocations: 53.406 MiB, 5.58% gc time)
```
which is a substantial difference.
Note that in this simple example my data could be a column vector; however, this is a surrogate for problems where I have time-series data and `X` is d × n with 1 < d << n.
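On Julia 1.x the same comparison can be written with the `dims` keyword and with `eachcol` (available since Julia 1.1). A sketch of the three equivalent formulations, using a smaller `n` so it runs quickly:

```julia
using LinearAlgebra

n = 10^4
X = randn(2, n)
V(x) = 0.5 * dot(x, x)

# mapslices: materializes a slice per column, comparatively slow
a = vec(mapslices(V, X; dims=1))

# comprehension over views, as in f(V, X) above
b = [V(@view X[:, j]) for j in 1:size(X, 2)]

# map over eachcol: same result, idiomatic on current Julia
c = map(V, eachcol(X))
```

All three produce the same vector; only the allocation behavior differs.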


maleadt
September 8, 2022, 6:18am
#3
The `*slices` functions (`mapslices`, `eachslice`) are not supported, because they take a function that transforms a slice and as such cannot be compiled into a single kernel. Instead, the function has to be called N times (once for every slice), which results in many small kernels being launched.
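Concretely, the way around this is to rewrite the slice-wise operation as whole-array broadcasts and reductions, which a GPU array package can compile into one kernel instead of N launches. A CPU-runnable sketch (the GPU path is assumed to be the same code on `CUDA.randn(d, n)` inputs; the interpolation formula is an illustrative stand-in, not the questioner's actual `linear_interpolation`):

```julia
d, n = 4, 10_000
X = randn(d, n)

# Per-column quadratic form V(x) = 0.5 * x'x, without mapslices:
# broadcast abs2 elementwise, then reduce each column in one pass
v = 0.5 .* sum(abs2, X; dims=1)

# A simple per-column linear blend between two frames A and B,
# again expressed as a single whole-array broadcast
A, B = randn(d, n), randn(d, n)
t = 0.3
interp = (1 - t) .* A .+ t .* B
```

Because `v` and `interp` are built from broadcasts and `dims` reductions only, the same expressions run on `CuArray` inputs without writing a custom kernel.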
