That’s possible, but now that 0.2.0 has been tagged and released, those docs won’t change. All I did was trigger the build so that people would see something that corresponds to the stable version, however incomplete it is.
I don’t quite understand this point. Are you saying the primal and dual evaluations could somehow hit different code paths? Because isn’t that literally what operator-overloading AD does? Namely, you don’t push Duals through things that don’t lie on the path from input to output.
Yes, I kinda misspoke.
It has that in common with operator-overloading AD, and not in common with Zygote.
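Concretely, here’s a rough sketch of the “Duals only flow along the input → output path” behaviour, using ForwardDiff as the familiar operator-overloading example (the function `f` below is made up purely for illustration):

```julia
using ForwardDiff

# Only values on the x → output path carry Duals; the Int bookkeeping
# never sees a Dual, exactly as in operator-overloading AD.
function f(x)
    n = 10              # plain Int, not on the differentiation path
    acc = x
    for _ in 1:n
        acc = sin(acc)  # Duals flow through here
    end
    return acc
end

ForwardDiff.derivative(f, 1.0)  # differentiates through the sin chain, not the counter
```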
Great news! Could you explain a little how Diffractor does things differently than Enzyme? Superficially, it sounded to me like there would be similarities, but I might very well be mistaken.
I believe right now most of the work on Enzyme has been on reverse mode, whereas Diffractor has lately focused on forward mode. Though I think Enzyme has forward mode as well.
Roughly speaking the core ideas are pretty similar. It’s source code transformation.
But Enzyme works at the LLVM level and Diffractor at the Julia IR level.
That makes certain optimizations and kinds of support easier or harder for each.
A particular thing they both have in common is the Optimize-AD-Optimize idea.
Yes. Actually that’s a big advantage over ForwardDiff.
Because ForwardDiff is operator overloading at the scalar level, it can’t hit BLAS: its memory layout is Array-of-Structs (where the struct is the dual), whereas our memory layout is Struct-of-Arrays.
It might already be working right now; if not, we just need to write the frules.
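To make the memory-layout point concrete, here’s a rough sketch (these are not Diffractor’s actual internal types, just an illustration of Array-of-Structs vs Struct-of-Arrays):

```julia
# Array-of-Structs: every element carries its own partials (ForwardDiff-style Duals).
# BLAS can't see through this layout, so A * x falls back to generic Julia code.
struct DualAoS
    value::Float64
    partial::Float64
end

# Struct-of-Arrays: primals and partials live in separate plain Float64 arrays,
# so each can be handed to BLAS directly.
struct DualSoA
    values::Matrix{Float64}
    partials::Matrix{Float64}
end

# Pushing a dual matrix through a matmul in SoA form is just two BLAS calls:
mul_soa(A::Matrix{Float64}, x::DualSoA) = DualSoA(A * x.values, A * x.partials)
```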
- Do you anticipate it is relatively easy to hook into things like SparseDiffTools.jl (JuliaDiff/SparseDiffTools.jl: fast Jacobian computation through sparsity exploitation and matrix coloring)?
For that example in particular the point is moot.
The stuff that I mentioned as coming for very fast jacobians will obsolete that.
At least @Keno and @ChrisRackauckas are sure it will, I am not quite 100% convinced, but am 90%.
We will be very aggressively taking advantage of sparsity structure (and I have that working on a local branch, but it’s just a bit mixed up in other things and blocked on finalizing other compiler pieces to make it reliable).
More generally, for taking advantage of things that work with ForwardDiff, it should be easy enough.
Ideally everything would use AbstractDifferentiation.jl, and that would expose enough APIs for all of this.
Though we are a ways away from that right now, I think.
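The idea being that downstream code only ever talks to the common API, something like this (a sketch assuming AbstractDifferentiation.jl’s backend/`gradient` interface, with ForwardDiff standing in as the backend):

```julia
using AbstractDifferentiation, ForwardDiff
const AD = AbstractDifferentiation

backend = AD.ForwardDiffBackend()   # later this could be swapped for a Diffractor backend
f(x) = sum(abs2, x)

# AD.gradient returns one gradient per argument, independent of which AD runs underneath.
(grad,) = AD.gradient(backend, f, rand(3))
```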
- Is this as reliant on ChainRules definitions as reverse mode was, or do lower-level primitives tend to work better? If so, do you feel there is a critical mass of forward rules (given that many people may have only written reverse-mode rules without a forward mode to test with)?
Lower-level rules work better in forward mode, at least once we have mutation support sorted.
Because there is no longer a need to write rules just to work around that kind of thing.
Rules are much more just for “I know something smart that will let me do this better”.
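For reference, that “I know something smart” kind of rule looks roughly like this (a sketch; `mysoftplus` is a made-up example function, not something from Diffractor or ChainRules):

```julia
using ChainRulesCore

mysoftplus(x) = log1p(exp(x))

# Forward rule: given the input tangent Δx, return the primal and its tangent,
# using a cheaper/more stable expression than AD-ing through log1p and exp.
function ChainRulesCore.frule((_, Δx), ::typeof(mysoftplus), x)
    y = mysoftplus(x)
    return y, Δx / (1 + exp(-x))
end
```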
Thanks for the great work and the communication @oxinabox!
Is Diffractor already doing well at higher-order derivatives? Do you know of any (hopefully simple-ish) examples where Diffractor is better able to handle higher-order derivatives than say Zygote or ForwardDiff? Or if such examples don’t exist yet, are there any good benchmarks to keep my eyes on?
Only been looking at forward mode lately.
I don’t have any examples at hand.
But I can say that in the project we use it for internally, we have very nested derivatives.
The problem has:
- one set of derivatives in the function definition
- then we take some Taylor derivatives of order 1-7 as part of preparing it for the solvers
- then we take another set of derivatives to compute the Jacobian.
That example, though, is all flat scalar-ish code. Fairly simple (no arrays, some tuples and structs, some limited control flow).
The magic here that makes it work is to optimize the code in between each call to AD.
That way you never have to AD through the Diffractor code (which is complicated code), just through simple code which is basically inlined frules.
I haven’t run any benchmark against anything else for nesting.
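For a concrete picture of what that nesting means, here’s a stand-in sketch with ForwardDiff (same structure of “AD of AD”, not Diffractor’s API):

```julia
using ForwardDiff

f(x) = sin(x) * exp(x)

# Each layer of differentiation wraps the previous one.
# Optimizing the code between layers means the outer pass only ever
# sees simple inlined derivative code, not the AD machinery itself.
d1(x) = ForwardDiff.derivative(f, x)
d2(x) = ForwardDiff.derivative(d1, x)
d3(x) = ForwardDiff.derivative(d2, x)

d3(0.5)   # third derivative of f at 0.5
```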
Coloring-based differentiation of Jacobians can be very suboptimal depending on the sparsity pattern. As part of my course I always had students derive examples of this to prove that forward and reverse mode are both non-optimal algorithms in the wrong sparsity context (and can be asymptotically as bad as doing dense operations). Right now the best out there in Julia for this is FastDifferentiation.jl (brianguenter/FastDifferentiation.jl: fast derivative evaluation). I think the only place where Diffractor will turn out to be useful is in sparse forward-mode Jacobians, using a similar-ish algorithm to avoid the issues of coloring on direct calculations of Jacobians.
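(For anyone unfamiliar, the coloring approach being discussed looks roughly like this; a sketch following SparseDiffTools.jl’s README-style API, so treat the exact call names as approximate, and `f!` is a made-up tridiagonal example.)

```julia
using SparseDiffTools, SparseArrays

# Tridiagonal residual: output i only depends on x[i-1], x[i], x[i+1].
function f!(dx, x)
    n = length(x)
    dx[1] = -2x[1] + x[2]
    for i in 2:n-1
        dx[i] = x[i-1] - 2x[i] + x[i+1]
    end
    dx[n] = x[n-1] - 2x[n]
    return nothing
end

n = 10
sparsity = spdiagm(-1 => ones(n-1), 0 => ones(n), 1 => ones(n-1))
colors = matrix_colors(sparsity)    # 3 colors, so 3 forward passes instead of n
J = copy(sparsity)
x = rand(n)
cache = ForwardColorJacCache(f!, x; colorvec = colors, sparsity = sparsity)
forwarddiff_color_jacobian!(J, f!, x, cache)
```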
Remember, Zygote is a reverse-mode AD. This Diffractor release does not have a reverse mode. And the real question is performance against TaylorDiff.jl.
There are trade-offs. Non-operator-overloading techniques are more ergonomic in the sense that they work with ::Float64 things. However, they are less ergonomic in the sense that it’s much more difficult to make use of mutating caches (i.e. PreallocationTools.jl kinds of tools), which greatly inhibits performance right now in actual use cases. It’s exciting, but at this point SciML cannot, for example, make use of Diffractor to improve the ergonomics of the automatic forward-mode Jacobians for users without taking a pretty big performance hit and losing the ergonomics of non-allocating code with Jacobians.
Enzyme also has this issue of having no PreallocationTools.jl-like operations, and we’ll need to find some way to hook into the AD’s lower level to work around it.
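For context, this is the kind of pattern meant by PreallocationTools.jl-style mutating caches (a sketch assuming its usual `DiffCache`/`get_tmp` API, with ForwardDiff doing the differentiation):

```julia
using PreallocationTools, ForwardDiff

# One pre-allocated buffer that can be viewed as Float64 storage or as Dual storage,
# depending on what element type the AD pass pushes through.
cache = DiffCache(zeros(5))

function residual!(du, u, cache)
    tmp = get_tmp(cache, u)   # reuses storage; eltype follows `u` (Float64 or Dual)
    @. tmp = u^2
    @. du = tmp - u
    return nothing
end

u0 = rand(5)
J = ForwardDiff.jacobian((du, u) -> residual!(du, u, cache), similar(u0), u0)
```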
Enzyme indeed also has a forward mode, which is actually more stable than Enzyme’s reverse mode.