Diffractor release

After many moons of waiting, we are finally ready to cut the first public release of Diffractor.jl (v0.2), our next-generation automatic differentiation package. Due to its close coupling with the Julia compiler, it has been difficult or impossible to commit to supporting a released version of Julia. However, with the advent of Julia v1.10, we are now ready to release Diffractor with forward mode only enabled.

Last time you heard about Diffractor, it was a reverse-mode tool: a successor to Zygote, based on better codegen, with better support for nested AD. Through adventure and misadventure, we are today releasing a tool with a focus on forward mode. We will be back for reverse mode soon, heading for a glorious mixed-mode future.

What is the current status of Diffractor?

Diffractor is currently supported on Julia v1.10+. While the best performance is generally achieved by running on Julia nightly due to constant compiler improvements, the current release of Diffractor is guaranteed to work on Julia v1.10.
Currently, forward mode is the only fully functional mode, and it is now shipping in some closed-source products. It is in a position to compete with ForwardDiff.jl and TaylorDiff.jl. It is not as battle-tested as ForwardDiff.jl, but it has several advantages. Primarily, since it is not an operator-overloading AD, it frees you from the need to relax type constraints and worry about the types of containers. Furthermore, like TaylorDiff.jl, it supports Taylor-series-based computation of higher-order derivatives. And it directly and efficiently uses ChainRules.jl's frules, with no need for a wrapper macro to import them.
Mutation of arrays works, but mutable structs need a bit more work; expect updates on this soon.
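To make the type-constraint point concrete, here is a small hypothetical sketch. The function and values are illustrative, and the commented-out calls assume the AbstractDifferentiation.jl interface discussed later in this post:

```julia
# An operator-overloading AD such as ForwardDiff must call f with its own
# Dual number type, so a Float64-only signature is a problem:
f(x::Float64) = x^2 + sin(x)

# ForwardDiff.derivative(f, 1.0)   # would throw a MethodError (no method for Dual)

# A source-transform AD like Diffractor rewrites the lowered code of f itself,
# so the Float64 constraint is no obstacle:
# import AbstractDifferentiation as AD
# using Diffractor: DiffractorForwardBackend
# AD.derivative(DiffractorForwardBackend(), f, 1.0)   # (2 + cos(1),)
```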

Why is this happening?
Diffractor is a core piece enabling functionality in other projects at JuliaHub, which gives us an excuse to build the compiler and tooling features required to make Diffractor work well.
We strongly believe that Julia has the promise to provide a truly robust and performant automatic differentiation environment.
Diffractor is a step closer to this reality, and a confluence of compiler and ecosystem tooling advances are clearing roadblocks between where we stand, and our vision of what Diffractor can become.

What is next for Diffractor?

  • Mutation support (forward mode only): it should work right now for arrays, but not yet for mutable structs.
  • Really fast Jacobians: we have a couple of really cool ideas to make computing Jacobians faster; they're just not ready for public release yet. (If you have a cool problem that needs whole Jacobians, tell us more.)
  • Reviving reverse mode: which will fulfill its initial promise from a few years ago.
  • Mixed mode: for even faster jacobians.

How do I use Diffractor?
As Diffractor requires Julia 1.10-alpha0, you need to be on that — or better yet, 1.11-DEV. Yes, it's raw and alpha, but so is Diffractor. If you like to live on the edge and are ready to play with sharp tools, then 1.11-DEV and Diffractor will get you the best performance. If not, hold tight a bit.
With that said, we are now doing a release and are committed to SemVer, so we won't (intentionally) break things for you.

The interface is provided by conforming to AbstractDifferentiation.jl. We don’t have our own public API. We might in the future if we can’t expose things we want to through AbstractDifferentiation, but ideally we wouldn’t.
AbstractDifferentiation exposes all the usual functions: `derivative`, `jacobian`, etc. See the getting-started guide in our docs.
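A minimal sketch of what that looks like (assuming the `DiffractorForwardBackend` name shown later in this thread; the function is illustrative):

```julia
import AbstractDifferentiation as AD
using Diffractor: DiffractorForwardBackend

g(x) = x^2 + 3x

# AbstractDifferentiation returns a tuple with one entry per input argument:
AD.derivative(DiffractorForwardBackend(), g, 2.0)   # should give (7.0,)
```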

How can I get involved?
We would love for you to try it out and open bug reports, etc. We don't promise a particular timeline for fixes, but we will be paying attention.
You can also add support for your packages by writing ChainRules.jl frules where there are smart ways to compute forward derivatives.
We have plenty of open issues, and unless an issue is assigned to someone, it is open to be worked on.
Issues might lack the details needed to work on them, but you can ask for more; we're happy to engage with the community.
Furthermore, a lot of the linked packages also have plenty of open issues to tackle, e.g. AbstractDifferentiation.jl, ChainRules.jl, ChainRulesCore.jl, so helping with those is also very appreciated (and they tend to have lower barriers to entry).

Come say hi to Keno or Frames at JuliaCon 2023; we'd love to chat with you about Diffractor and all things AD.

81 Likes

Where can I read more about this? I guess I always knew you could mix and match forward/reverse and operator overloading/source code transformation, but where can I learn more about it?

You want to find a text on AD.
A list of resources can be found in the ChainRules docs: FAQ · ChainRules.

In short, forward vs. reverse and operator overloading vs. source code transformation are independent axes.
Operator overloading vs. source code transformation is an implementation detail (though a user-visible one).
Forward and reverse are algorithmic differences.
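For instance, the operator-overloading implementation of the forward-mode algorithm can be sketched in a few lines (a toy dual number, not any package's actual implementation):

```julia
# A minimal dual number: carries a primal value and its tangent, and
# propagates both through overloaded arithmetic.
struct Dual
    val::Float64   # primal value
    der::Float64   # derivative (tangent)
end
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)

g(x) = x * x + x
d = g(Dual(3.0, 1.0))   # seed dx/dx = 1
# d.der == 7.0, i.e. d/dx (x^2 + x) at x = 3
```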

2 Likes

I understand it’s experimental, but the very first line from introduction errors:

julia> using Diffractor: DiffractorForwardBackend
ERROR: UndefVarError: `DiffractorForwardBackend` not defined
1 Like

Can you comment on why this is an advantage in practice? I found with source-to-source Zygote that you end up needing to worry about the types anyway, so making generic containers able to support duals wasn't really any worse.

After the difficulties of using things like Zygote over the years, the operator overloading approach of ForwardDiff sure seems less error prone and robust. Or do you think that challenges with correctness of gradients/debugging compilation/etc. were specific to Zygote and/or reverse-mode and forward-mode Diffractor is less likely to have those issues?

(edit: BTW, an answer of "this will be better because I am committed to maintaining it over the next YYY years, and we have the advantage of redesigning things from scratch" is a good enough answer for me. @oxinabox everything you touch is gold :wink:)

@oxinabox thanks for the announcement, eager to install and play with it once I get my nightly setup!

That's probably because it's the dev documentation (I checked: `DiffractorForwardBackend` appeared after the commit associated with the registration).

As soon as the Diffractor.jl people tag the v0.2.0 release on the repo, stable documentation will appear that is consistent with what we get when we do

pkg> add Diffractor

In the meantime, for those who want to experiment and follow the docs, I suggest

pkg> add Diffractor#main

EDIT: Wait, I can actually tag it myself! I am one of the Diffractor.jl people :exploding_head:

4 Likes

Even for people who know a bit about AD, the reading list on the Diffractor docs looks absolutely terrifying. Is there a middle ground that would allow us to understand how the package works, at least on a surface level?

9 Likes

Nice! I had kind of thought of forward mode as a solved problem, as the modifications to user code needed to use ForwardDiff are relatively minor. Can someone maybe comment more on the relative merits of both approaches, in terms of runtime, compile time, etc.? Is Diffractor meant to replace ForwardDiff, or do both approaches have their strengths?

1 Like

Thanks @gdalle!

Seems like magic — works with structs/functions constrained to Float64 only, unlike ForwardDiff. Also curious to hear what the potential advantages of the latter are.

As both a demonstration and a play on words, here is Diffractor working together with optics (:

julia> struct S
       a::Float64
       b::Float64
       end

julia> f(s::S) = log(s.a, s.b)

julia> s = S(2.5, 10)

# want the derivative of f at s wrt s.a
# define the optic to the target variable:
julia> o = @optic _.a  # from Accessors.jl

julia> derivative(DiffractorForwardBackend(), x -> f(set(s, o, x)), o(s))
(-1.0970062262191218,)
4 Likes

Yes, it's not the worst thing. And it often cleans up the code a bit.
But it is nice to not have to worry about it.
"Primarily" is perhaps too strong. It's less of a huge advantage in and of itself, and more the thing that will be most obvious to users.

After the difficulties of using things like Zygote over the years, the operator overloading approach of ForwardDiff sure seems less error prone and robust. Or do you think that challenges with correctness of gradients/debugging compilation/etc. were specific to Zygote and/or reverse-mode and forward-mode Diffractor is less likely to have those issues?

I would argue it is a forward- vs. reverse-mode thing, largely orthogonal to source code transformation vs. operator overloading.
Reverse mode is a lot more complicated and has a lot more things that can go wrong; Zygote combining it with its very novel source code transform approach adds more.
Diffractor is also less novel, since it was implemented after Zygote, so lessons have been learned.

3 Likes

A big thing is that it exposes a lot more opportunities for optimization.
With operator overloading you are basically constrained to do the same thing as the primal computation.
Whereas we can, for example, detect that a particular code path does not occur between the input and the output value, and then not do AD on it (at least for a particular input).
This stuff particularly starts to matter when computing whole Jacobians, especially sparse ones.
Though matrix-coloring-based sparse Jacobian tooling like SparseDiffTools.jl has similar advantages.
So we should be seeing faster run times.
All in all, what it's going to do to compile time is complex: the optimizations themselves take time, but they might also remove code that would otherwise have to be compiled (e.g. JIT-compiling specializations of the ForwardDiff.Dual type).
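For intuition on the sparse-Jacobian point, here is a toy sketch of the seeding/coloring idea, using a finite-difference stand-in for a forward AD pass (entirely illustrative, not Diffractor's implementation):

```julia
# For a function whose Jacobian is structurally diagonal, all columns can be
# probed in a single directional pass instead of one pass per input:
f(x) = x .^ 2                        # Jacobian is Diagonal(2x)
x = [1.0, 2.0, 3.0]
h = 1e-8
jdiag = (f(x .+ h) .- f(x)) ./ h     # one seeded pass recovers all nonzeros
# jdiag ≈ [2.0, 4.0, 6.0]
```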

Is diffractor meant to replace forwarddiff, or do both approaches have their strengths?

Let’s see what happens in time.
Give it a few years, run some benchmarks, and we will get a good idea of the trade-offs.
This is very fresh off the press.

6 Likes

Very helpful, thanks. A few other questions:

That reading list is mostly for reverse mode, the stuff that hasn't been touched lately.
But I don't have a good list for forward mode at all.

I think what @jlperla is touching on is the problem where some source to source ADs will try to diff through/generate code for basically every function, including those it has no business sticking its nose in. If the AD is then not robust enough to handle many language constructs or functions (e.g. Zygote), this results in loud or silent errors. In contrast, most codepaths which do not interact directly with the flow of gradients (i.e. are never called with tracked types) work just fine with an overloading-based AD.

How does Diffractor address this issue? Does it just support enough of the language/stdlib code that it’s unlikely to arise in practice? Is there some analysis which smartly does not apply the AD transform where it’s not required (and if so, do you have pointers to where that code lives)? etc.

3 Likes

How does Diffractor address this issue? Does it just support enough of the language/stdlib code that it’s unlikely to arise in practice?

For 1: yes, it's much easier to support everything in forward mode.

Is there some analysis which smartly does not apply the AD transform where it’s not required (and if so, do you have pointers to where that code lives)?

For 2:
There is. That's at the heart of our advantages: minimising what we AD.

Bits of it live in the Diffractor function called `visit`. Approximately, it allows tracing backwards from the outputs whose derivatives are being taken.

Some more of it is hopefully getting stabilized soon (in particular, once related compiler work is done) so that we can add it to Diffractor.

It's also much easier in forward mode.
There is a function overload for `del_star_internal` that makes sure we don't do AD if all inputs are ZeroBundles, which means there is no path to them from an input being differentiated.

2 Likes

Thanks as always @ToucheSir. You have nailed my question.

1 Like

Having a hard time following these discussions due to poor understanding of AD (it’s on my TODO), but does this mean that I can now autodiff functions restricted to Float64 instead of Reals?

2 Likes

Yes. Exactly that.

12 Likes

It seems, however, that the docs that got built with v0.2 are mostly incomplete?