# Flux Zygote Gradient: Understanding "Mutating arrays is not supported"

Namaste,
I am trying to implement CTC loss in Flux/Zygote and am running into `ERROR: Mutating arrays is not supported`, so I cooked up the toy example below to gain a better understanding.

Given an input `x`, I perform some sort of "triangulation" operation on it, where another matrix `a` is built from `x` in a complicated way. The loss is the very last element of `a`. When I try to differentiate the loss, the first two implementations throw an error.

`triangulate1` errors because (I am guessing) of the `setindex!` operations?
`triangulate3` works because it uses only matrix operations, with no in-place editing of `a` (the vector)?
Why `triangulate2` fails is beyond me… Can you please explain?
What is the best practice when it comes to complicated losses like this that depend on a complex function of the input?
Also, is it explained somewhere what "mutating arrays" means and why mutation does not work with Zygote?

FYI: I wrote a Python implementation of CTC in Theano, and AD just worked; I did not have to calculate the gradients manually. I am hoping I can pull that off in Julia too. Thank you.
Cash

``````julia
using Zygote
using LinearAlgebra

function triangulate1(x::Matrix{T}) where T
    a = 0 * x
    a[1, 1] = x[1, 1]
    for i in 2:size(x, 2)
        for j in 1:i
            a[j, i] = sum(a[1:j, i-1]) * x[j, i]
        end
    end
    a
end

function triangulate2(x::Matrix{T}) where T
    m, n = size(x)
    a = x[:, 1] .* [1; zeros(T, m-1)]
    for i in 2:n
        a = accumulate(+, a) .* [x[1:i, i]; zeros(T, m-i)]
    end
    a
end

function triangulate3(x::Matrix{T}) where T
    m, n = size(x)
    a = x[:, 1] .* [1; zeros(T, m-1)]
    A = LowerTriangular(fill(1, m, m))
    for i in 2:n
        a = A * a .* [x[1:i, i]; zeros(T, m-i)]
    end
    a
end

x1 = (1:4).*(1:4)'
# 4×4 Array{Int64,2}:
#  1  2   3   4
#  2  4   6   8
#  3  6   9  12
#  4  8  12  16

triangulate1(x1)
# 4×4 Array{Int64,2}:
#  1  2   6    24
#  0  4  36   336
#  0  0  54  1152
#  0  0   0  1536

triangulate2(x1)
# 4-element Array{Int64,1}:
#    24
#   336
#  1152
#  1536

triangulate3(x1)
# 4-element Array{Int64,1}:
#    24
#   336
#  1152
#  1536

Zygote.gradient(x -> triangulate1(x)[end, end], x1)
# ERROR: Mutating arrays is not supported

Zygote.gradient(x -> triangulate2(x)[end], x1)
# ERROR: Mutating arrays is not supported
# (This takes much longer to error out)

Zygote.gradient(x -> triangulate3(x)[end], x1)[1]
# 4×4 Array{Int64,2}:
#  1536  288  32   0
#     0  240  96   0
#     0    0  96   0
#     0    0   0  96

``````

Ideally I want `triangulate2` to work, as it seems to be way faster than the other two!
`triangulate3`, while differentiable, is way too slow:

``````julia
# Compile for floats
> xf3 = randn(3, 3); triangulate1(xf3); triangulate2(xf3); triangulate3(xf3);

> xf1k = randn(1000, 1000);
> @time triangulate1(xf1k);
0.338104 seconds (500.50 k allocations: 1.319 GiB, 16.27% gc time)
> @time triangulate2(xf1k);
0.004638 seconds (5.02 k allocations: 31.168 MiB)
> @time triangulate3(xf1k);
8.077139 seconds (7.02 k allocations: 7.481 GiB, 0.80% gc time)
``````

You can do this, but it’s still pretty slow!

``````julia
function triangulate4(x::AbstractMatrix)
    a = x[1, 1]
    for t in 2:size(x, 2)
        a = cumsum(vcat(a, 0)) .* x[1:t, t]
    end
    a
end

#===== times =====#

julia> @btime triangulate1(x) setup=(x=randn(1000,1000));
312.223 ms (500501 allocations: 1.32 GiB)

julia> @btime triangulate2(x) setup=(x=randn(1000,1000));
4.782 ms (5016 allocations: 31.17 MiB)

julia> @btime triangulate3(x) setup=(x=randn(1000,1000));
3.202 s (7016 allocations: 7.48 GiB)

julia> @btime triangulate4(x) setup=(x=randn(1000,1000));
3.086 ms (21898 allocations: 16.48 MiB)

#===== gradient times =====#

15.796 s (48150 allocations: 44.73 GiB)

4.376 s (61003 allocations: 14.93 GiB)
``````

Thanks. So `triangulate4` executes as fast as `triangulate2`, and it is differentiable.
But the gradient is still slow, so there is a lot of room for improvement here: given the structure of this function, a manual differentiation could be about as fast as the original function itself.

I was also looking for general pointers regarding such functions, as this is a toy problem that only mimics my more complex problem.
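To make the manual-differentiation point concrete, here is a sketch of a hand-written forward-plus-reverse pass for `triangulate4` (a hypothetical helper of my own, `triangulate4_withgrad`, assuming the loss is the sum of the final vector; the key fact is that the adjoint of `cumsum` is a reversed `cumsum`):

```julia
# Hypothetical manual forward + reverse pass for triangulate4.
# Forward sweep saves each cumsum output; backward sweep applies the
# product rule and the cumsum adjoint (a reversed cumsum).
function triangulate4_withgrad(x::AbstractMatrix{T}) where T
    n = size(x, 2)
    a = [x[1, 1]]
    cs = Vector{Vector{T}}(undef, n)          # saved cumsum outputs
    for t in 2:n
        cs[t] = cumsum(vcat(a, zero(T)))
        a = cs[t] .* x[1:t, t]
    end
    loss = sum(a)                             # assumed loss: sum of final vector

    dx = zeros(T, size(x))
    da = ones(T, n)                           # d loss / d a_n
    for t in n:-1:2
        dx[1:t, t] = da .* cs[t]              # product rule, x-branch
        dc = da .* x[1:t, t]                  # product rule, cumsum-branch
        dv = reverse(cumsum(reverse(dc)))     # adjoint of cumsum
        da = dv[1:t-1]                        # drop the appended zero's slot
    end
    dx[1, 1] = da[1]
    loss, dx
end
```

On the 4×4 example above this gives a loss of 3048 and, e.g., `dx[1, 1] == 3048`, consistent with the loss being linear in `x[1, 1]`; it allocates on the same order as the forward pass, with no AD machinery involved.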

You may want to have a look at this PR: https://github.com/FluxML/Flux.jl/pull/1287


I am trying to keep it simple and let Zygote do the differentiating, instead of writing my own gradient code. I am also trying to understand how RNN-like loopy computations are handled by Flux/Zygote.

My Python implementation in Theano is really simple, with no manual gradients. Trying to get that in Julia too…


Do you happen to have that Julia equivalent of the linked theano CTC losses and some benchmark numbers for both? I can’t really see the equivalence between the `triangulate*` functions above (mutation + loop heavy, lots of single-element indexing) and `plain_ctc`/`log_ctc` (linalg heavy, `scan` instead of loops, mostly slice indexing). My hunch is a more direct translation of the latter would also be more Zygote-friendly.

I found that the best way to use Zygote is to let it figure out the adjoints for 90% of the operations, and manually define the rest. In particular, Zygote fits best with a functional, non-mutating programming style, and adjoints are sometimes much simpler (or just different) than the underlying operation.
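A minimal illustration of that style point, with toy functions of my own (not from your code):

```julia
using Zygote

# Mutating style: Zygote cannot differentiate through setindex!,
# so taking this gradient throws "Mutating arrays is not supported".
function sumsq_mut(x)
    y = zero(x)
    for i in eachindex(x)
        y[i] = x[i]^2          # in-place write
    end
    sum(y)
end

# Functional style: same computation, no mutation; Zygote handles it.
sumsq_pure(x) = sum(x .^ 2)

# gradient(sumsq_mut, [1.0, 2.0, 3.0])   # ERROR: Mutating arrays is not supported
gradient(sumsq_pure, [1.0, 2.0, 3.0])    # ([2.0, 4.0, 6.0],)
```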

Thank you. Good to know about the 90% guideline.
Do you have a simple example where writing the adjoint ourselves makes more sense than relying on Zygote's capabilities?

E.g. your example: the adjoint is rather simple and has a very regular structure.
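For instance, the reverse rule for a `cumsum`-style building block is just a reversed `cumsum`; a hypothetical wrapper `mycumsum` shows the `Zygote.@adjoint` mechanism in one line:

```julia
using Zygote

mycumsum(x::AbstractVector) = cumsum(x)

# Hand-written adjoint: pull the cotangent dy back through cumsum
# by accumulating from the other end.
Zygote.@adjoint mycumsum(x::AbstractVector) =
    mycumsum(x), dy -> (reverse(cumsum(reverse(dy))),)

gradient(x -> sum(mycumsum(x) .* [1.0, 2.0, 3.0]), [1.0, 1.0, 1.0])
# ([6.0, 5.0, 3.0],)
```

This is exactly the rule a manual gradient for the triangulate-style recursion would reuse at every step, which is why it can be so much cheaper than what Zygote derives by tracing the loop.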