[blog post] Implement your own AD with Julia in ONE day

Note that Flux has an explicit @grad rule for matrix multiplication, which should be fast, while it looks like Zygote does not (yet?). Compare https://github.com/FluxML/Zygote.jl/blob/master/src/lib/array.jl with https://github.com/FluxML/Flux.jl/blob/master/src/tracker/array.jl (line 327).
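For reference, here is roughly what an explicit gradient rule for matrix multiplication amounts to. This is only a sketch of the standard math (for C = A*B with incoming gradient Δ, the gradients are Δ*B' and A'*Δ), not the actual code in Flux or Zygote; `matmul_pullback` is a name I made up for illustration.

```julia
# Hypothetical helper, NOT Flux's or Zygote's actual implementation.
# For C = A * B and an incoming gradient Δ (same size as C), the standard
# reverse-mode rule gives the gradients with respect to A and B.
matmul_pullback(A, B, Δ) = (Δ * B', A' * Δ)

A = [1.0 2.0; 3.0 4.0]
B = [1.0 0.0; 0.0 1.0]
Δ = ones(2, 2)              # gradient of sum(A * B) with respect to A * B
dA, dB = matmul_pullback(A, B, Δ)
```

The point is that both gradients are themselves plain matrix products, so a rule like this can hand the work to BLAS instead of differentiating through loops.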

So perhaps it has to fall back on some generic for-loop multiplication. Having a go with the version from the thread "Naive matrix multiplication is super slow in Julia 1.0?" gives me a slowdown of almost this magnitude.
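To make the comparison concrete, here is a minimal sketch of the kind of loop-based multiplication I mean. It is my own toy version, not the code from that thread and not what Zygote actually runs; `naive_matmul` is a hypothetical name.

```julia
# Naive triple-loop matrix multiplication (illustrative only), to compare
# against the BLAS-backed `A * B`.
function naive_matmul(A::AbstractMatrix, B::AbstractMatrix)
    m, k = size(A)
    k2, n = size(B)
    k == k2 || throw(DimensionMismatch("inner dimensions must agree"))
    C = zeros(promote_type(eltype(A), eltype(B)), m, n)
    # The last iterator (i) is the innermost loop, so A and C are
    # traversed down columns, matching Julia's column-major layout.
    for j in 1:n, p in 1:k, i in 1:m
        @inbounds C[i, j] += A[i, p] * B[p, j]
    end
    return C
end

A = rand(200, 200)
B = rand(200, 200)
@assert naive_matmul(A, B) ≈ A * B   # same result, up to floating-point error
```

Timing both with `@time naive_matmul(A, B)` and `@time A * B` (after a warm-up call of each) should show the gap on your own machine.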