I am really talking about a corporate practitioner who will just make use of whatever is already there and doesn't have the ability to come up with anything new. E.g. I know "senior data scientists" who don't know how to fit a logistic regression, so I think you think too highly of the practitioners I have in mind. But that's an extreme case; the practitioner I am thinking about is someone like me, who can make use of an LSTM or a ResBlock but wouldn't at any point try to invent something new: the "I just know how to throw a few layers together and roughly know what each layer is meant to do" kind of practitioner.
Well, maybe AutoML? I don't think throwing a few layers together can make a paper, but people do need that in products. That is actually why we are still supporting Python, why PyTorch is merging with Caffe2 for production, why TensorFlow is supporting training on mobile, and why there are even new frameworks written in C/C++ for ARM, etc.
I personally think that if someone is using Python without Numba, C/C++, Cython, etc., then Python is fine, and you probably won't have much motivation to switch languages unless you want to use fresher tech.
And it is always fine to write in your favored language and framework; there is ONNX for models and many other formats for parameters.
And since it is way easier to implement new things in Julia, I believe that at some point there will be plenty of existing models here, and then engineers without an ML background will be able to use them in products. That's when everyone moves to Julia.
I am not sure that catering to the demands of someone like this is a priority for many Julia packages at the moment. I don't see a problem with this; a lot of commercial software exists precisely to fill in this niche.
Impressive and elegant blog post. Thanks for writing it!
A small note: you have a minor typo in equation 3, in the Compute-graph section. I believe you meant for the 2 to be subscripted, i.e. y1 = z2 x should be y_1 = z_2 x. Similarly, y2 = b \cdot x has the same issue and should read y_2 = b \cdot x.
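In display form (LaTeX), the corrected pair, as I read them, would be:

\[ y_1 = z_2\,x, \qquad y_2 = b \cdot x \]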
Just playing around a bit with your nice AD package. I'm trying a simple affine transformation for a small MLP. The following code leads to an error. I cannot fully understand what's going wrong, but my guess would be that it doesn't have a registered method for dealing with a matrix-vector multiplication. Am I right about that, or is there something else going on here?
using YAAD
using LinearAlgebra  # for tr

function yaadderivtest(nhidden=100_000, ninput=1_000)
    # parameters are tracked Variables, the input x is a plain array
    W = Variable(rand(nhidden, ninput) .- 0.5)
    b = Variable(rand(nhidden) .- 0.5)
    x = rand(ninput)
    f(W, b) = sum(W * x .+ b)   # affine layer followed by a sum
    y = tr(f(W, b))
    backward(y)
end

yaadderivtest()
Yes, this is because sum is not implemented (see the backtrace). I'm still exploring whether there's an elegant way to implement backward propagation for all the iterator-style functions. This is awkward because the node type here is not a subtype of AbstractArray (whereas Tracker has its own tracked array type).
But if you want to use sum, simply register this function:
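Something like the following should work. It is a rough, untested sketch written from memory: the names AbstractNode, register and gradient follow the convention used in the blog post, so treat them as assumptions and adjust if they differ in the released package.

using YAAD
import YAAD: register, gradient, AbstractNode

# forward rule: calling sum on a tracked node records it in the compute graph
Base.sum(x::AbstractNode) = register(Base.sum, x)

# backward rule: d(sum(x))/dx is all ones, so spread the incoming
# (scalar) grad over the shape of the input
gradient(::typeof(sum), grad, output, x) = (fill(grad, size(x)),)

With that in place, your yaadderivtest above should be able to backpropagate through the sum and accumulate gradients for W and b.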