On AD of matrix, from Tracker to Zygote

Since the Flux has moved to Zygote, I tried to modify some previous codes.
The motivation is to calculate second-order derivatives of a matrix, but I met the following errors.

With the setup

x0 = 0.0; x1 = 2.0
y0 = 0.0; y1 = 1.0
nx = 23; ny = 11

xVec = Vector(range(x0, stop=x1, length=nx))
xMesh = repeat(reshape(xVec, 1, :), ny, 1)
yVec = Vector(range(y0, stop=y1, length=ny))
yMesh = repeat(yVec, 1, nx)

xMesh1D = reshape(xMesh, (1, :))
yMesh1D = reshape(yMesh, (1, :))
mesh = cat(xMesh1D, yMesh1D; dims=1)
X = deepcopy(mesh)

Y = zeros(1, nx*ny);

the previous codes are

using Flux
import Tracker

m = Chain(Dense(2, 20, tanh), Dense(20, 20, tanh), Dense(20, 1))

mat_x(x) = [1.0 0.0] * x
mat_y(x) = [0.0 1.0] * x

u(x) = sin.(π .* mat_x(x) ./ 2.) .* mat_y(x) .+ 
       mat_x(x) .* (2. .- mat_x(x)) .* mat_y(x) .* (1. .- mat_y(x)) .* m(x)

ux(x) = mat_x(Tracker.forward(u, x)[2](1)[1])
uy(x) = mat_y(Tracker.forward(u, x)[2](1)[1])
uxx(x) =  mat_x(Tracker.forward(ux, x)[2](1)[1])
uyy(x) = mat_y(Tracker.forward(uy, x)[2](1)[1])

uxx(X)

and it gives

Tracked 1×253 Array{Float64,2}:
 0.0  -0.266183  -0.451705  -0.562292  …  0.284977  0.160405  -3.02169e-16

However, when I turned to Zygote with,

vx(x) = mat_x(Flux.Zygote.pullback(u, x)[2](1)[1])
vy(x) = mat_y(Flux.Zygote.pullback(u, x)[2](1)[1])
vxx(x) =  mat_x(Flux.Zygote.pullback(vx, x)[2](1)[1])
vyy(x) = mat_y(Flux.Zygote.pullback(vy, x)[2](1)[1])

vxx(X)

it gives

MethodError: no method matching reshape(::Int64, ::Tuple{Int64,Int64})

Also I tried another function definition,

vx(x) = mat_x(Flux.Zygote.pullback(u, x)[2](ones(size(x)))[1])
vy(x) = mat_y(Flux.Zygote.pullback(u, x)[2](ones(size(x)))[1])
vxx(x) =  mat_x(Flux.Zygote.pullback(vx, x)[2](ones(size(x)))[1])
vyy(x) = mat_y(Flux.Zygote.pullback(vy, x)[2](ones(size(x)))[1])

vxx(X)

but now it gives

DimensionMismatch("A has dimensions (2,1) but B has dimensions (2,253)")

Stacktrace:
 [1] gemm_wrapper!(::Array{Float64,2}, ::Char, ::Char, ::Array{Float64,2}, ::Array{Float64,2}, ::LinearAlgebra.MulAddMul{true,true,Bool,Bool}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/matmul.jl:545
 [2] mul! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/matmul.jl:305 [inlined]
 [3] mul! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/matmul.jl:372 [inlined]
 [4] mul! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/matmul.jl:203 [inlined]
 [5] * at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/matmul.jl:153 [inlined]
 [6] #1204 at /home/tianbai/.julia/packages/Zygote/oMScO/src/lib/array.jl:246 [inlined]
 [7] #3072#back at /home/tianbai/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49 [inlined]
 [8] mat_x at ./In[3]:6 [inlined]
 [9] vx at ./In[6]:1 [inlined]
 [10] (::typeof(∂(vx)))(::Array{Float64,2}) at /home/tianbai/.julia/packages/Zygote/oMScO/src/compiler/interface2.jl:0
 [11] (::Zygote.var"#28#29"{typeof(∂(vx))})(::Array{Float64,2}) at /home/tianbai/.julia/packages/Zygote/oMScO/src/compiler/interface.jl:38
 [12] vxx(::Array{Float64,2}) at ./In[6]:3
 [13] top-level scope at In[9]:1

Any ideas on how to fix the bug? Thanks :slight_smile:

This looks like it’s mat_x acting on the gradient of u with x, and is called with x=X. But the output isn’t a scalar, so 1 is the wrong shape, which is why gradient complains.

By chance it seems that Tracker’s backward function doesn’t error, while Zygote’s does. But I’d be hesitant to trust whatever it is returning!

julia> u(X)
1×253 Array{Float64,2}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  9.79717e-17  1.10218e-16  1.22465e-16

julia> Tracker.gradient(u, X)
ERROR: Function output is not scalar

julia> Zygote.gradient(u, X)
ERROR: Output should be scalar; gradients are not defined for output [0.0 0.0 0.0 …

julia> Tracker.forward(u, X)[2](1)
([0.0 0.1582320954152618 … -1.5754075920569963 -1.5707963267948966; 0.0 0.0 … 1.2246467991473532e-16 1.2246467991473532e-16] (tracked),)

julia> Zygote.pullback(u, X)[2](1)
ERROR: MethodError: no method matching reshape(::Int64, ::Tuple{Int64,Int64})

Then vxx(x) looks like a second derivative, which is a harder problem.

It reports the same errors for the following definition
u(x)=Tracker.forward(u, x)[2](ones(size(x)))[1]
so maybe there’s a mistake in the original version.

Any ideas on the formal way to calculate 2nd-order derivatives?

Indeed, these three give the same answer, I could have checked first! My point was just that I’m not sure this is guaranteed.

Tracker.forward(u, X)[2](1)[1] # strange input
Tracker.forward(u, X)[2](ones(size(X)))[1]
Zygote.gradient(sum∘u, X)[1]

Am afraid I don’t have useful ideas on 2nd derivatives, my attempt to do the obvious thing leads me to all kinds of errors:

julia> wx(x) = Zygote.gradient(sum∘u, x)[1][1,:]
wx (generic function with 1 method)

julia> wx(X)
253-element Array{Float64,1}:
  0.0                
  0.1582320954152618 
  0.31822739114337645
  0.4791527213152272 
…

julia> Zygote.gradient(sum∘wx, X)
Internal error: encountered unexpected error in runtime:
BoundsError(a=Array{Any, (4,)}[
  Core.Compiler.VarState(typ=Zygote.Pullback{Tuple{typeof(Base.get), Base.Dict{GlobalRef, Any}, GlobalRef, Nothing}, Any}, undef=false),

Zygote does seem to handle some second derivatives, e.g. Zygote.gradient(sin', 0), but I have heard it said that for larger things you ought to be mixinf forward and reverse?