I'm new to CUDA. `mapslices` seems to run very slowly.
`reduce` is very fast, but it requires an accumulator (e.g. `+`) rather than a function (e.g. `sum`).
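To make that distinction concrete: `reduce` folds a collection pairwise with a binary operator, which is the shape that maps well onto fast parallel reductions, whereas `sum` is a whole-collection function. A one-line sketch:

```
v = [1.0, 2.0, 3.0, 4.0]
# reduce folds the collection with the binary operator +;
# sum applies a function to the whole collection — here they agree.
reduce(+, v) == sum(v)  # true
```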
I'd like to apply the `linear_interpolation` function to slices of a matrix (or tensor). Can a GPU do this faster than a CPU? Would I have to rewrite the `linear_interpolation` function as a custom kernel? I'd rather not do this.
`mapslices` is known to be slow, at least on the CPU:
(GitHub issue, opened 03:52PM - 20 Apr 18 UTC; labels: performance, arrays)
Consider the following piece of code:
```
using LinearAlgebra

n = 10^6;
X = randn(1, n);
V(x) = 0.5 * dot(x, x);
f(V, X) = [ V(@view X[:, n]) for n = 1:size(X, 2) ]
```
Timing on this reveals:
```
julia> @time mapslices(V, X, 1);
1.177690 seconds (12.00 M allocations: 267.024 MiB, 6.91% gc time)
julia> @time f(V,X);
0.047745 seconds (1.00 M allocations: 53.406 MiB, 5.58% gc time)
```
which is substantially different.
Note that in this simple example my data could be a column vector; however, this is a surrogate for problems where I have time-series data and `X` is d × n with 1 < d ≪ n.
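For reference, the issue's code uses the old positional-`dims` form, which on current Julia has become a keyword argument. An updated, runnable version of the comparison (smaller `n` so it finishes quickly; the point is that the two formulations agree while the comprehension avoids `mapslices`'s per-slice overhead):

```
using LinearAlgebra

n = 10^4
X = randn(1, n)
V(x) = 0.5 * dot(x, x)
f(V, X) = [V(@view X[:, j]) for j in 1:size(X, 2)]

# mapslices now takes dims as a keyword argument
a = vec(mapslices(V, X; dims=1))
b = f(V, X)
a ≈ b  # identical results, very different allocation behavior
```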
maleadt (September 8, 2022, 6:18am):
The `*slices` functions (`mapslices`, `eachslice`) are not supported, because they take a function that transforms a slice and as such cannot be compiled into a single kernel. Instead, we need to call the function N times (once for every slice), which results in many small kernels being launched.
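When the per-slice function can be expressed with whole-array broadcasts and dimension-wise reductions, the per-slice launches can be avoided entirely: the batched form compiles to a fixed, small number of kernels regardless of the number of slices. A sketch using plain `Array`s (the same code runs on a `CuArray`, where the broadcast and reduction become fused GPU kernels; `colnorm_*` are illustrative names, not CUDA.jl API):

```
# Per-slice formulation: one function call — and, on a GPU,
# one kernel launch — per column.
colnorm_slices(X) = [sqrt(sum(abs2, @view X[:, j])) for j in 1:size(X, 2)]

# Batched formulation: one reduction along dims=1 plus one broadcast,
# independent of the number of columns.
colnorm_batched(X) = vec(sqrt.(sum(abs2, X; dims=1)))

X = rand(4, 10)
colnorm_slices(X) ≈ colnorm_batched(X)  # true
```

Whether `linear_interpolation` admits such a rewrite depends on whether its per-slice logic can be phrased as elementwise operations and reductions along one dimension.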