What's the state of Automatic Differentiation in Julia January 2023?

Note: Enzyme has seen some pretty major improvements since January.

Enzyme v0.11 fixed GC support and dynamic dispatch handling, added a rule system, and added linear algebra support. The linear algebra is done via fallbacks that differentiate the underlying kernels; this would be better handled with a high-level rule that keeps calls going to BLAS, but it at least works.

As an example, here's some fairly dynamic code that works fine:

using Enzyme

# Untyped global: every use of A goes through dynamic dispatch
A = Any[2.0 3.0
        2.0 4.0]
function f(x::Array{Float64}, y::Array{Float64})
    y[1] = (A * x)[1]
    return nothing
end

x  = [2.0, 2.0]
bx = [0.0, 0.0]
y  = [0.0]
by = [1.0];

Enzyme.autodiff(Reverse, f, Duplicated(x, bx), Duplicated(y, by)); # Works fine!
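
For the forward-mode direction, the same function can be seeded directly (a sketch: the tangent of x goes in its shadow, and y's shadow receives the directional derivative):

dx = [1.0, 0.0]  # tangent seed for x
dy = [0.0]       # receives the directional derivative of y[1]

Enzyme.autodiff(Forward, f, Duplicated(x, dx), Duplicated(y, dy))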

That covers both forward and reverse. And now on main (unreleased), here's an example showing it working with a globally defined Lux neural network (values checked against Zygote):

using Enzyme

x  = [2.0, 2.0]
bx = [0.0, 0.0]
y  = [0.0, 0.0]

using ComponentArrays, Lux, Random

rng = Random.default_rng()
Random.seed!(rng,100)
dudt2 = Lux.Chain(x -> x.^3,
                  Lux.Dense(2, 50, tanh),
                  Lux.Dense(50, 2))
p, st = Lux.setup(rng, dudt2)

function f(x::Array{Float64}, y::Array{Float64})
    y .= dudt2(x, p, st)[1]
    return nothing
end

Enzyme.autodiff(Reverse, f, Duplicated(x, bx), Duplicated(y, ones(2)))

# The same computation, non-mutating, for Zygote to differentiate
function f2(x::Array{Float64})
    dudt2(x, p, st)[1]
end

using Zygote
bx2 = Zygote.pullback(f2, x)[2](ones(2))[1]  # Zygote gradient
bx                                           # Enzyme gradient, accumulated in-place

@show bx - bx2

#=
2-element Vector{Float64}:
 -9.992007221626409e-16
 -1.7763568394002505e-15
=#

It's of course not perfect yet, and since the rule system has just landed it needs people to start writing rules (especially for things like the NNlib kernels needed for full Flux support), but IMO it has passed many of its major usability milestones and now needs the community's help to get the required rules in place.
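
For anyone curious what writing such a rule involves, here's a minimal sketch based on the v0.11 custom-rules interface (square and its rule here are illustrative, not from this thread):

using Enzyme
import .EnzymeRules: forward

square(x) = x^2

# A custom forward-mode rule for square: propagate the tangent as 2x * dx
# instead of letting Enzyme differentiate the primal itself.
function forward(func::Const{typeof(square)}, ::Type{<:Duplicated}, x::Duplicated)
    return Duplicated(func.val(x.val), 2 * x.val * x.dval)
end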

And the forward mode is very robust from what I can tell. I haven't run into any issues with it, other than that it's not clear how to do the equivalent of PreallocationTools.jl.

6 Likes

Lots of great progress, but to be clear, Tamas' example still corrupts memory. I think what he said about always testing the results against another system is right, though I'd prefer to use finite differences for such validation.

Like I mentioned on that GitHub issue, if you turn on runtime activity with Enzyme.API.runtimeActivity!(true), those primal data corruptions should not occur.
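
The flag is global, so it just needs to be set once before differentiating, e.g. (a minimal sketch):

using Enzyme

# Enable runtime activity analysis before any autodiff calls:
Enzyme.API.runtimeActivity!(true)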

I intend to fix these, but in the interim I've been chasing other features (like GC and type-instability handling, and also defending my thesis on Monday :stuck_out_tongue: ).

@Tamas_Papp, perhaps just turn that flag on by default?

7 Likes

I'll also caution that there are still some type-unstable and GC-specific calls Enzyme doesn't yet handle, but most of the common ones should now be covered.

4 Likes

Yeah, it's not perfect and one should still double-check their results, but this is a massive step forward, and it seems “most” code I throw at it works these days (on the unreleased main). I'm looking for more bugs :sweat_smile: but it's doing a lot better than before.

1 Like

Apropos of this, here is something about “quest issues” that @jar1 (not sure the IDs are for the same user) just dropped on Slack for marshalling contributors. At least in some cases it was very successful.

I should add that the use case was very similar: these are exactly the steps you need to take to improve this error message, even if you have never contributed to OSS. Of course, for AD rules the barrier to entry is a bit higher.

3 Likes

I really would like to second the remark on ForwardDiff, which seems to be the least “cool” of the bunch. Kudos to the developers! We use it quite heavily and only once ran into this issue, which we could resolve by replacing our workaround for log(1+x) with log1p.
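
For context, log1p is the numerically stable form of log(1 + x) for small x, and it has its own derivative rule, so it differentiates cleanly. A small illustrative check (not the original issue's code):

using ForwardDiff

# d/dx log1p(x) = 1/(1 + x)
ForwardDiff.derivative(log1p, 0.5) ≈ 1 / (1 + 0.5)  # true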

2 Likes

On my M1 with Julia 1.9-rc3 and Enzyme#main, I am getting a difference between the Enzyme and Zygote gradients of

2-element Vector{Float64}:
 0.15407736749039058
 0.22841827115822566

by running the same code above.

Edit: also, running ] test Enzyme causes a segfault. Is the M1 not supported by Enzyme?

Like I said in the other thread (Use Enzyme in flux - #7 by wsmoses), the code Chris provided is not a good way to call it, because of the type-unstable closure capture of dudt2, p, and st. These should be passed explicitly as Const parameters, for performance and other reasons.
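
A sketch of that pattern, reusing the names from the example above (the rewritten f! is illustrative):

using Enzyme

# Pass the model, parameters, and state explicitly; Const marks them
# as non-differentiated, and nothing is captured from global scope.
function f!(x::Vector{Float64}, y::Vector{Float64}, model, ps, st)
    y .= model(x, ps, st)[1]
    return nothing
end

fill!(bx, 0)  # Enzyme accumulates into the shadow, so reset it first

Enzyme.autodiff(Reverse, f!,
                Duplicated(x, bx), Duplicated(y, ones(2)),
                Const(dudt2), Const(p), Const(st))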

M1 indeed is not officially supported, partially because we don't have a dev machine or CI for it. If you can isolate and post which test fails on GitHub, we can try to look into it.

I can concur that it’s giving something odd on my M2.

Thanks William. I tried Windows and got the exact same difference in gradients. So it’s not an M1 thing. Enzyme tests pass on Windows. I suppose that example is not supposed to work at all?

On my Intel CPU, the example just throws errors.

julia> Enzyme.autodiff(Reverse, f, Duplicated(x, bx), Duplicated(y, ones(2)))
warning: didn't implement memmove, using memcpy as fallback which can result in errors
ERROR: Enzyme execution failed.
Enzyme: not yet implemented in reverse mode, jl_getfield
Stacktrace:
 [1] getindex
   @ ./tuple.jl:29
 [2] getindex
   @ ./tuple.jl:0

Stacktrace:
 [1] throwerr(cstr::Cstring)
   @ Enzyme.Compiler ~/.julia/packages/Enzyme/YBQJk/src/compiler.jl:2536
 [2] macro expansion
   @ ~/.julia/packages/Enzyme/YBQJk/src/compiler.jl:8646 [inlined]
 [3] enzyme_call
   @ ~/.julia/packages/Enzyme/YBQJk/src/compiler.jl:8338 [inlined]
 [4] CombinedAdjointThunk
   @ ~/.julia/packages/Enzyme/YBQJk/src/compiler.jl:8301 [inlined]
 [5] autodiff
   @ ~/.julia/packages/Enzyme/YBQJk/src/Enzyme.jl:205 [inlined]
 [6] autodiff
   @ ~/.julia/packages/Enzyme/YBQJk/src/Enzyme.jl:228 [inlined]
 [7] autodiff(::EnzymeCore.ReverseMode{false}, ::typeof(f), ::Duplicated{Vector{Float64}}, ::Duplicated{Vector{Float64}})
   @ Enzyme ~/.julia/packages/Enzyme/YBQJk/src/Enzyme.jl:214
 [8] top-level scope
   @ REPL[12]:1

This is common across Julia 1.8.5, 1.9-rc3, and master.
This was with Enzyme 0.11.1. I also tried 0.11.0 on 1.9-rc3.

Well @ChrisRackauckas did say he was using the unreleased main branch of Enzyme, not any of the registered versions.

:man_facepalming:
Now I get

julia> @show bx - bx2
bx - bx2 = [0.1540773674903897, 0.22841827115822566]
2-element Vector{Float64}:
 0.1540773674903897
 0.22841827115822566

julia> versioninfo()
Julia Version 1.9.0-rc3
Commit 1853b90328 (2023-04-26 15:51 UTC)
Platform Info:
  OS: Linux (x86_64-generic-linux)
  CPU: 28 × Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)

@ChrisRackauckas, we all seem to be getting the exact same non-zero difference on macOS, Windows, and Linux. Which OS did you use to run the above example :sweat_smile:?

For CI, I'm happy to help you set up Cirrus on M1, if that helps.

2 Likes

Peculiar. It was on Windows, and now I cannot recreate it. I do not know the manifest from that time, though I do know that it was in a clean REPL. It is quite mysterious to me :sweat_smile:. Maybe a difference in Julia version?

Probably not. I tested on Julia 1.9-rc3, and Chris Elrod tested three Julia versions. It's most likely the dependencies, but I'm guessing :person_shrugging:

Anyway, I think Enzyme will be well served by a simpler AbstractDifferentiation API for non-mutating functions. Probably worth giving it another go soon-ish.
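
For reference, the kind of API shape meant here, shown with an existing backend since there was no Enzyme backend at the time (illustrative only):

using AbstractDifferentiation, ForwardDiff
const AD = AbstractDifferentiation

# One call, no shadow buffers to set up, for a non-mutating function;
# the hope is that Enzyme eventually gets a backend that slots in here.
grad = AD.gradient(AD.ForwardDiffBackend(), x -> sum(abs2, x), [1.0, 2.0])[1]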

1 Like

Again, I think Enzyme is really promising, blazing fast when it works, and probably the best way forward. I really appreciate all the great work the developers have put into it. It is just that, at the moment, it is not stable enough for those who are not prepared to invest extra effort in checking and debugging (again, for nontrivial examples).

For my particular issue, yes, the workaround works. But I ended up rewriting the function in a C-style approach: instead of higher-order functions, just looping over the data and incrementing a float accumulator. This works fine.
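
Something like the following, as a hypothetical illustration of the rewrite (not the actual function):

# Instead of a higher-order formulation such as
#   loss(x) = sum(abs2, x)
# write the explicit loop, which Enzyme handles well:
function loss(x::Vector{Float64})
    s = 0.0
    for xi in x
        s += xi * xi
    end
    return s
end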

My other meta-gripe with Enzyme is that its error messages are tricky to understand. Whenever something goes wrong, hundreds of lines of IR are dumped on the user. Again, I understand that mapping that back to Julia code is tricky, but in this form it is difficult to isolate errors and produce MWEs.

I think the global capture issue is causing wrong gradients, though, so my initial tests didn't pass.

Edit: no segfaults though, which is an improvement over the last time I played with Enzyme for AbstractDifferentiation.

3 Likes