How can I repeat a CuArray in a fast way? I have a 3-element vector that is supposed to be subtracted from every row of an n x 3 matrix. The CPU code just transposes and repeats it, but on the GPU this takes up to 3 seconds and uses 1.7 GB for a 1000000x3 matrix, while on the CPU it takes only 12 ms and allocates 22.9 MB.
using CUDA, BenchmarkTools
offset = Array([0.0, 0.0, 0.0])
cuda_offset = CuArray([0.0, 0.0, 0.0])
sx = 1000000
@btime repeat(transpose(cuda_offset), outer = [sx,1]) # takes 3.7 s
@btime repeat(transpose(offset), outer = [sx,1]) # takes 12 ms
Indexing every row in a loop is even slower (20 s on the GPU and 94.127 ms on the CPU).
Is there any other fast way to subtract a vector from every row of a matrix without writing a new kernel? (It needs to be differentiable with Zygote.)
Use broadcasting? E.g. A .- [1 2 3] subtracts [1 2 3] from each row of an m x 3 matrix.
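To illustrate the suggestion above, here is a minimal CPU-only sketch comparing the broadcast with the explicit repeat from the question. The array sizes and values are made up for the example. On the GPU the same expression should presumably be written with both operands as CuArrays, e.g. cuda_A .- transpose(cuda_offset), where cuda_A is a hypothetical name for the matrix on the device; that part is an assumption and is not run here.

```julia
# Small stand-ins for the n x 3 matrix and the 3-element offset (values are arbitrary).
A = rand(5, 3)
offset = [1.0, 2.0, 3.0]

# transpose(offset) is a lazy 1 x 3 wrapper; broadcasting expands its
# singleton first dimension across all rows of A without allocating a copy.
B = A .- transpose(offset)

# The approach from the question: materialize the repeated matrix, then subtract.
B_repeat = A - repeat(transpose(offset), outer = [size(A, 1), 1])

# Both compute A[i, j] - offset[j], so the results match.
@assert B == B_repeat
```

The broadcast version only allocates the result, which is why it avoids the cost of building the repeated matrix first.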
Well, I'll be damned. Thanks, that solves my problem for now. I think I need to read up a little more on that. How would it work if I wanted to apply it along the middle dimension of my m x 3 x n array A? What is the default behavior of the . operator when the arrays don't have the same dimensions? I assumed it was only for arrays of the same size.
Nvm, found the docs: "... so Julia provides broadcast, which expands singleton dimensions in array arguments to match the corresponding dimension in the other array without using extra memory, and applies the given function elementwise."
A .- [1;; 2;; 3] # 3-element array along dim 2 (size 1x3): ';;' between elements
A .- [1;;; 2] # 2-element array along dim 3 (size 1x1x2): ';;;' between elements
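For the m x 3 x n case asked about above, an alternative sketch is to reshape an existing vector so that its elements lie along dimension 2. This avoids the `;;` literal syntax, which needs Julia 1.7 or later. The sizes and the names A, v and B are made up for the example.

```julia
# Stand-in for an m x 3 x n array (here 4 x 3 x 2) and a 3-element vector.
A = rand(4, 3, 2)
v = [1.0, 2.0, 3.0]

# Reshape v to 1 x 3 x 1: dims 1 and 3 are singleton, so broadcasting
# expands them to match A, and v[j] is subtracted along the middle dimension.
B = A .- reshape(v, 1, 3, 1)

# Check against an explicit elementwise definition.
ok = all(B[i, j, k] == A[i, j, k] - v[j] for i in 1:4, j in 1:3, k in 1:2)
@assert ok
```

reshape shares memory with v rather than copying it, so this stays as cheap as the literal form.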