TensorCast & CUDA

Here’s the expansion; the function orient is usually just a reshape, to place the dimensions along the :s:

julia> @pretty @reduce D[m,a] := sum(p) C[p,a] + L[p,m]
begin
    local dotterel = orient(PermuteDims(C), (*, :, :))
    local ant = orient(PermuteDims(L), (:, *, :))
    D = dropdims(sum(@__dot__(dotterel + ant), dims = 3), dims = 3)
end
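Concretely, for the first local above, orient amounts to something like this reshape (a rough sketch with made-up sizes, not the real internals):

C = rand(4, 3)                        # C[p, a] with p = 4, a = 3
x = reshape(permutedims(C), 1, 3, 4)  # roughly orient(PermuteDims(C), (*, :, :))
size(x)                               # (1, 3, 4): a trivial dimension where the * is, ready to broadcast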

Because reshaping a transposed Array usually gives something very slow, orient sometimes makes a copy instead, and perhaps that copy is incorrectly a plain CPU Array. I thought this was fixed in https://github.com/mcabbott/TensorCast.jl/pull/10, but perhaps not. Would you mind opening an issue?
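For illustration (sizes made up): the reshape of a lazy transpose is only a wrapper, which is why a copy gets made at all.

using LinearAlgebra                              # for the lazy transpose wrapper

At = transpose(rand(4, 3))                       # lazy Transpose, no copy made
R  = reshape(At, 1, 3, 4)                        # a ReshapedArray wrapper, slow to broadcast over
Rc = reshape(permutedims(rand(4, 3)), 1, 3, 4)   # copy first: a plain, fast Array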

Your work-around puts the index to be summed first instead of last, in which case no transposing is required. The macro @reduce is unfortunately not smart enough to notice that possibility.
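If I understand the work-around, it amounts to something like this (a sketch with made-up sizes): keeping p first means plain reshapes suffice, and since only reshape and broadcasting are involved I would expect it to stay on the GPU with CuArrays.

C = rand(4, 3); L = rand(4, 5)                   # C[p, a], L[p, m]
c3 = reshape(C, 4, 1, 3)                         # (p, 1, a) -- no permutedims needed
l3 = reshape(L, 4, 5, 1)                         # (p, m, 1)
D = dropdims(sum(c3 .+ l3, dims = 1), dims = 1)  # D[m, a], size (5, 3)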

It will always make an intermediate allocation like T, broadcasting before reducing. There is an option lazy which uses LazyArrays.jl to avoid this allocation, but I’m not confident that it has kept up with changes in that package, or that it will work with a CuArray.
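The idea of the lazy option is roughly this (again just a sketch; I haven’t checked it against current LazyArrays, nor on a CuArray):

using LazyArrays

c3 = reshape(rand(4, 3), 4, 1, 3)          # (p, 1, a)
l3 = reshape(rand(4, 5), 4, 5, 1)          # (p, m, 1)
bc = BroadcastArray(+, c3, l3)             # lazy: the full (p, m, a) array is never materialised
D  = dropdims(sum(bc, dims = 1), dims = 1) # only the reduced result is allocated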
