Let me show you an example where Yota and Zygote behave differently:
using Zygote
using BenchmarkTools
foo(A) = sum([x + 1 for x in A])
A = rand(10_000);
@btime foo'(A);
# ==> 106.426 μs (45 allocations: 939.03 KiB)
Zygote does a good job differentiating through the array comprehension, but it hides a performance issue. The same function can be written much more efficiently:
foo2(A) = sum(A .+ 1)
@btime foo2'(A)
# ==> 7.989 μs (3 allocations: 78.23 KiB)
Yota intentionally doesn’t support things like array comprehensions:
using Yota
using BenchmarkTools
foo(A) = sum([x + 1 for x in A])
A = rand(10_000);
@btime grad(foo, A)
# ==> ERROR: MethodError: no method matching var"#1#2"()
# ==> ...
# ==> [3] foo at ./REPL[6]:1 [inlined]
So you have to look at foo() and realize this is not what Yota expects. You go and rewrite it to foo2(), which works fine:
foo2(A) = sum(A .+ 1)
@btime grad(foo2, A);
# ==> 14.151 μs (22 allocations: 157.02 KiB)
(note that here Yota is slower than Zygote due to a constant overhead, which is negligible in real ML models)
Surely, it would be better for both libraries to show warnings or even rewrite such cases automatically, but we are not there yet.
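Just to sketch what such a warning could look like (this is not a feature of either library, and warn_on_comprehensions below is a made-up helper): array comprehensions lower to Base.Generator calls, so even a crude scan over a function's lowered code is enough to flag them:
# Hypothetical helper, not part of Zygote or Yota: scan lowered code for Base.Generator.
function warn_on_comprehensions(f, argtypes)
    for ci in Base.code_lowered(f, argtypes)   # one CodeInfo per matching method
        # comprehensions and generators lower to Base.Generator calls,
        # so a plain string check over the lowered statements is enough here
        if any(occursin("Generator", string(stmt)) for stmt in ci.code)
            @warn "$f seems to use a comprehension/generator; consider a broadcast-based rewrite"
        end
    end
end
warn_on_comprehensions(foo, (Vector{Float64},))   # warns
warn_on_comprehensions(foo2, (Vector{Float64},))  # stays silent
A real check would of course need to be smarter than a string match, but even this level of feedback would save the round trip through the MethodError above.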
That said, Zygote shouldn’t have much of an issue with performance, barring cases where it’s harder for Julia to actually optimise the generated derivative code.
Note that putting restrictions on supported code opens the door to optimizations beyond what the compiler can do. Avalon/Yota expect ML models to be pure computational graphs without side effects. Such graphs can be transformed in many different ways, e.g. by eliminating common subgraphs or replacing known primitives with their in-place versions. As far as I know, doing the same thing for pullback-based AD is much harder.
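To give a flavour of what such a graph transformation looks like, here is a minimal sketch of common-subexpression elimination over a toy tape of primitive calls; the tape representation and the eliminate_common_subexprs helper are made up for illustration and are not Yota's actual internals:
# Toy tape (made up for illustration): each entry is (output var, primitive, argument vars).
tape = [
    (:v1, :(*),  (:x, :W)),
    (:v2, :tanh, (:v1,)),
    (:v3, :(*),  (:x, :W)),     # same computation as :v1
    (:v4, :(+),  (:v2, :v3)),
]
function eliminate_common_subexprs(tape)
    seen   = Dict{Any,Symbol}()      # (primitive, args) => variable that already computes it
    rename = Dict{Symbol,Symbol}()   # dropped variable => its replacement
    out_tape = empty(tape)
    for (out, f, args) in tape
        args = map(a -> get(rename, a, a), args)   # point arguments at canonical variables
        key = (f, args)
        if haskey(seen, key)
            rename[out] = seen[key]                # duplicate subgraph: drop it
        else
            seen[key] = out
            push!(out_tape, (out, f, args))
        end
    end
    return out_tape
end
eliminate_common_subexprs(tape)
# ==> a 3-entry tape: the duplicate :v3 is gone and :v4 now reads from (:v2, :v1)
The same kind of pass over a recorded graph can also substitute known primitives with their in-place versions, which is much harder to do when the program is only visible to the AD system one pullback at a time.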