[blog post] Implement your own AD with Julia in ONE day

Yes, if you just write

Zygote.@grad Base.:(*)(lhs::Matrix, rhs::Matrix) =
    gemm('N', 'N', lhs, rhs), grad -> (grad * transpose(rhs), transpose(lhs) * grad)

You will get a "gemm not defined" error, which means this definition is being called.
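
For reference, a minimal sketch of what would make that definition actually run (rather than just prove it is hit) is to bring gemm into scope from LinearAlgebra.BLAS first, keeping the same adjoint expressions (this assumes the Zygote.@grad form above is otherwise accepted as written):

using Zygote
using LinearAlgebra.BLAS: gemm  # gemm('N', 'N', A, B) computes A * B via BLAS

Zygote.@grad Base.:(*)(lhs::Matrix, rhs::Matrix) =
    gemm('N', 'N', lhs, rhs), grad -> (grad * transpose(rhs), transpose(lhs) * grad)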

So the problem is probably not in the gradient expression itself; it might be a Zygote bug somewhere else.

OK, so much for that theory! Zygote is clearly above my pay grade…

I am really talking about a corporate practitioner who will just make use of whatever is already there and doesn't have the ability to come up with anything new. E.g. I know "senior data scientists" who don't know how to fit a logistic regression, so I think you think too highly of the practitioners I am thinking about. But that's an extreme case; the practitioner I have in mind is someone like me, who can make use of an LSTM or a ResBlock but wouldn't at any point try to invent something new. I just know how to throw a few layers together and roughly know what each layer is meant to do, that kind of practitioner.

Well, maybe AutoML? I don't think throwing a few layers together can make a paper, but people do need that in products. And that is actually why we are still supporting Python: frameworks like PyTorch are merging Caffe2 for production, TensorFlow is supporting training on mobile, and there are even new frameworks written in C/C++ for ARM, etc.

I personally think that if someone is using Python without Numba, C/C++, Cython, etc., then Python is fine, and you probably won't have much motivation to switch languages unless you want to use newer tech.

And it is always fine to write in your favorite language and framework; there is ONNX for models and many other formats for parameters.

And since it is way easier to implement new things in Julia, I believe that at some point there will be plenty of existing models here, and then engineers without an ML background will be able to use them in products. That's when everyone moves to Julia.


I am not sure that catering to the demands of someone like this is a priority for many Julia packages at the moment. I donā€™t see a problem with this; a lot of commercial software exists precisely to fill in this niche.

One further comment on this benchmark: if the tr gradient returns I rather than a full matrix, then Zygote is much quicker.

using Zygote, Flux, LinearAlgebra, BenchmarkTools

f(x) = tr(x * x')
r = rand(30, 30);
@btime Tracker.gradient(f, $r)[1][1:3] # 19.662 μs (128 allocations: 70.44 KiB)

Zygote.@grad LinearAlgebra.tr(x) = tr(x), Δ -> (Δ * I,)
@btime Zygote.gradient(f, $r)[1][1:3] # 4.239 μs (7 allocations: 28.89 KiB)

Zygote.@grad LinearAlgebra.tr(x) = tr(x), Δ -> (Δ * Matrix(I, size(x)),)
@btime Zygote.gradient(f, $r)[1][1:3] # 37.053 μs (12 allocations: 37.27 KiB)
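
In case it is not obvious why the first version wins: Δ * I is a lazy UniformScaling (a single scalar, no allocation), while Δ * Matrix(I, size(x)) materializes a dense n×n matrix on every backward pass. A quick illustration (sizes only for the sake of the example):

using LinearAlgebra

Δ = 2.0
Δ * I                  # UniformScaling{Float64}(2.0): lazy, allocates nothing
Δ * Matrix(I, 30, 30)  # dense 30×30 matrix, allocated on every call to the pullback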

Impressive and elegant blog post. Thanks for writing it! :slight_smile:
A small note: you have a minor typo in your equation 3, in the Compute-graph section. I believe you meant for the 2 to be subscripted, i.e. y1 = z2 x should be y_1 = z_2 x. Similarly, y2 = b \cdot x should have the subscripted form y_2 = b \cdot x.


Well, I (the UniformScaling) does not contain the shape info, which can cause errors. And to be fair, AutoGrad.jl uses grad * Matrix{T}(I, size(output)) as well.
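
To illustrate the shape issue with a small sketch (not specific to YAAD or Zygote): a UniformScaling gradient still works with plain +, but anything downstream that needs a concrete size, such as broadcasting it into a buffer, will fail:

using LinearAlgebra

g = 2.0 * I           # a tr-style gradient as a UniformScaling: no size information at all
g + rand(30, 30)      # ok: + pads I out to the shape of the other operand
# rand(30, 30) .+ g   # errors, since broadcasting needs a concrete shape for g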

I think this is not the main problem, though. I'm still trying to understand where Zygote's slowdown comes from, which was not what I expected.

Hey again,

Just playing around a bit with your nice AD package :slight_smile:. I'm trying a simple affine transformation for a small MLP. The following code leads to an error. I cannot fully understand what's going wrong, but my guess would be that there is no registered method for dealing with a matrix-vector multiplication. Am I right about that, or is there something else going on here?

using YAAD, LinearAlgebra  # LinearAlgebra for tr

function yaadderivtest(nhidden=100_000, ninput=1_000)
    W = Variable(rand(nhidden, ninput) .- 0.5)
    b = Variable(rand(nhidden) .- 0.5)
    x = rand(ninput)
    f(W, b) = sum(W*x .+ b)
    y = tr(f(W, b))
    backward(y)
end

yaadderivtest()

I think I implemented matrix multiplication, but sum is not implemented. Maybe you could show me the error message in an issue?

Best

Roger

Sure, the error is this:

ERROR: MethodError: no method matching Array{T,2} where T(::LinearAlgebra.UniformScaling{Bool}, ::Tuple{})
Closest candidates are:
  Array{T,2} where T(::LinearAlgebra.UniformScaling, ::Integer, ::Integer) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/LinearAlgebra/src/uniformscaling.jl:325
  Array{T,2} where T(::LinearAlgebra.UniformScaling, ::Tuple{Int64,Int64}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/LinearAlgebra/src/uniformscaling.jl:326
Stacktrace:
 [1] gradient(::typeof(tr), ::Float64, ::Float64, ::Float64) at /home/michael/.julia/packages/YAAD/xLJz2/src/operators/linalg.jl:4
 [2] #gradient#3(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::YAAD.Trait.Method{typeof(tr)}, ::Float64, ::Float64, ::Float64) at /home/michael/.julia/packages/YAAD/xLJz2/src/comput_graph.jl:297
 [3] gradient(::YAAD.Trait.Method{typeof(tr)}, ::Float64, ::Float64, ::Float64) at /home/michael/.julia/packages/YAAD/xLJz2/src/comput_graph.jl:297
 [4] gradient(::CachedNode{Node{YAAD.Trait.Method{typeof(tr)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(sum)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(Base.Broadcast.materialize)},Tuple{CachedNode{Node{YAAD.Trait.Broadcasted{typeof(+)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(*)},Tuple{Variable{Array{Float64,2}},Array{Float64,1}},NamedTuple{(),Tuple{}}},Array{Float64,1}},Variable{Array{Float64,1}}},NamedTuple{(),Tuple{}}},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(+),Tuple{Array{Float64,1},Array{Float64,1}}}}},NamedTuple{(),Tuple{}}},Array{Float64,1}}},NamedTuple{(),Tuple{}}},Float64}},NamedTuple{(),Tuple{}}},Float64}, ::Float64) at /home/michael/.julia/packages/YAAD/xLJz2/src/comput_graph.jl:290
 [5] backward(::CachedNode{Node{YAAD.Trait.Method{typeof(tr)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(sum)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(Base.Broadcast.materialize)},Tuple{CachedNode{Node{YAAD.Trait.Broadcasted{typeof(+)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(*)},Tuple{Variable{Array{Float64,2}},Array{Float64,1}},NamedTuple{(),Tuple{}}},Array{Float64,1}},Variable{Array{Float64,1}}},NamedTuple{(),Tuple{}}},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(+),Tuple{Array{Float64,1},Array{Float64,1}}}}},NamedTuple{(),Tuple{}}},Array{Float64,1}}},NamedTuple{(),Tuple{}}},Float64}},NamedTuple{(),Tuple{}}},Float64}, ::Function, ::Float64) at /home/michael/.julia/packages/YAAD/xLJz2/src/comput_graph.jl:247
 [6] backward(::CachedNode{Node{YAAD.Trait.Method{typeof(tr)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(sum)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(Base.Broadcast.materialize)},Tuple{CachedNode{Node{YAAD.Trait.Broadcasted{typeof(+)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(*)},Tuple{Variable{Array{Float64,2}},Array{Float64,1}},NamedTuple{(),Tuple{}}},Array{Float64,1}},Variable{Array{Float64,1}}},NamedTuple{(),Tuple{}}},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(+),Tuple{Array{Float64,1},Array{Float64,1}}}}},NamedTuple{(),Tuple{}}},Array{Float64,1}}},NamedTuple{(),Tuple{}}},Float64}},NamedTuple{(),Tuple{}}},Float64}, ::YAAD.Trait.Method{typeof(tr)}, ::Float64) at /home/michael/.julia/packages/YAAD/xLJz2/src/comput_graph.jl:240
 [7] backward(::CachedNode{Node{YAAD.Trait.Method{typeof(tr)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(sum)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(Base.Broadcast.materialize)},Tuple{CachedNode{Node{YAAD.Trait.Broadcasted{typeof(+)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(*)},Tuple{Variable{Array{Float64,2}},Array{Float64,1}},NamedTuple{(),Tuple{}}},Array{Float64,1}},Variable{Array{Float64,1}}},NamedTuple{(),Tuple{}}},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(+),Tuple{Array{Float64,1},Array{Float64,1}}}}},NamedTuple{(),Tuple{}}},Array{Float64,1}}},NamedTuple{(),Tuple{}}},Float64}},NamedTuple{(),Tuple{}}},Float64}, ::Float64) at /home/michael/.julia/packages/YAAD/xLJz2/src/comput_graph.jl:239
 [8] backward(::CachedNode{Node{YAAD.Trait.Method{typeof(tr)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(sum)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(Base.Broadcast.materialize)},Tuple{CachedNode{Node{YAAD.Trait.Broadcasted{typeof(+)},Tuple{CachedNode{Node{YAAD.Trait.Method{typeof(*)},Tuple{Variable{Array{Float64,2}},Array{Float64,1}},NamedTuple{(),Tuple{}}},Array{Float64,1}},Variable{Array{Float64,1}}},NamedTuple{(),Tuple{}}},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(+),Tuple{Array{Float64,1},Array{Float64,1}}}}},NamedTuple{(),Tuple{}}},Array{Float64,1}}},NamedTuple{(),Tuple{}}},Float64}},NamedTuple{(),Tuple{}}},Float64}) at /home/michael/.julia/packages/YAAD/xLJz2/src/comput_graph.jl:219
 [9] yaadderivtest(::Int64, ::Int64) at ./REPL[2]:7
 [10] yaadderivtest() at ./REPL[2]:2
 [11] top-level scope at none:0

Yes, this is because sum is not implemented (see the backtrace). I'm still exploring whether there is an elegant way to implement backward propagation for all the iterator-style reductions, which is awkward here because the tracked node type is not an AbstractArray subtype (while Tracker has a dedicated tracked array type).

But if you want to use sum, simply register this function:

import Base: sum                 # needed so the definition below extends Base.sum
import YAAD: register, gradient  # assuming these are reachable under these names; qualify as YAAD.register etc. if not exported

sum(x::AbstractNode) = register(sum, x)

gradient(::typeof(sum), grad::Number, output, x::AbstractArray) = (fill!(similar(x), grad),)
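
With those two methods defined, a minimal usage sketch (dropping the tr, since the point here is just to differentiate through sum) would look like:

using YAAD

W = Variable(rand(10, 5) .- 0.5)
b = Variable(rand(10) .- 0.5)
x = rand(5)

y = sum(W * x .+ b)  # sum now builds a node in the graph instead of reducing eagerly
backward(y)          # gradients are accumulated back into W and b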

Actually, I'm learning Cassette.jl and Zygote.jl at the moment; I believe that will be the future of AD.

Bests

Roger

Thanks for the reply. Indeed, I'm trying Zygote.jl as well, and have my hopes up for Capstan.jl. :slight_smile:

Yeah, I might try how simple it can be to implement your own source-to-source AD with Cassette next time. lol.
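
For anyone curious, the tracing half of a Cassette-based approach fits in a few lines; this sketch only logs every call executed under a context (gradient rules would hook into the same mechanism):

using Cassette

Cassette.@context TraceCtx

# print every function call that happens while running the overdubbed code
Cassette.prehook(::TraceCtx, f, args...) = println(f, args)

Cassette.overdub(TraceCtx(), x -> x * x + 1, 2.0)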
