Here’s the expansion; the function `orient` is usually just `reshape`, to put the dimensions along the `:`s.
```julia
julia> @pretty @reduce D[m,a] := sum(p) C[p,a] + L[p,m]
begin
    local dotterel = orient(PermuteDims(C), (*, :, :))
    local ant = orient(PermuteDims(L), (:, *, :))
    D = dropdims(sum(@__dot__(dotterel + ant), dims = 3), dims = 3)
end
```
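To see what that expansion computes in plain Julia, here is a hand-expanded sketch — the sizes are illustrative assumptions, and `permutedims`/`reshape` stand in for `PermuteDims`/`orient`:

```julia
C = rand(3, 4)   # C[p, a]
L = rand(3, 5)   # L[p, m]

# orient(PermuteDims(C), (*, :, :)) ≈ size (1, a, p)
ct = reshape(permutedims(C), 1, 4, 3)
# orient(PermuteDims(L), (:, *, :)) ≈ size (m, 1, p)
lt = reshape(permutedims(L), 5, 1, 3)

# broadcast to (m, a, p), sum over p, then drop the trailing dimension
D = dropdims(sum(ct .+ lt, dims = 3), dims = 3)   # size (m, a)
```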
Because reshaping a transposed `Array` usually gives something very slow, `orient` sometimes makes a copy, and perhaps it is incorrectly making a CPU array here. I thought this was fixed in https://github.com/mcabbott/TensorCast.jl/pull/10, but perhaps not. Would you mind opening an issue?
Your work-around puts the index to be summed first instead of last, in which case no transposing is required. The macro `@reduce` is unfortunately not smart enough to notice that possibility on its own.
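To illustrate why that ordering helps (a plain-Julia sketch, not the macro’s actual output): with `p` first, `reshape` of the untransposed arrays produces cheap views, with no `permutedims` or copy at all:

```julia
C = rand(3, 4)               # C[p, a], summed index p first
L = rand(3, 5)               # L[p, m]

ct = reshape(C, 3, 1, 4)     # size (p, 1, a) -- a view, no copy
lt = reshape(L, 3, 5, 1)     # size (p, m, 1)

# broadcast to (p, m, a), sum over p, drop the leading dimension
D = dropdims(sum(ct .+ lt, dims = 1), dims = 1)   # D[m, a]
```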
It will always make an intermediate allocation like `T`, broadcasting before reducing. There is an option `lazy` which uses LazyArrays.jl to avoid this allocation, but I’m not confident it has kept up with changes in that package, nor whether it will work with a `CuArray`.
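For comparison, a hand-written loop avoids that intermediate `(m, a, p)` array entirely — a plain-Julia sketch of the allocation-free computation, not anything the macro actually emits:

```julia
function reduce_noalloc(C, L)
    P, A = size(C)            # C[p, a]
    M = size(L, 2)            # L[p, m]
    D = zeros(promote_type(eltype(C), eltype(L)), M, A)
    # accumulate directly into D, no broadcast temporary
    for a in 1:A, m in 1:M, p in 1:P
        D[m, a] += C[p, a] + L[p, m]
    end
    return D
end

C = rand(3, 4); L = rand(3, 5)
D = reduce_noalloc(C, L)      # size (5, 4)
```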