Hi all,
I am currently struggling with three (competing?) goals in code design. I want code that (a) is fast, (b) is differentiable and (c) is differentiable by current (and future) AD libraries (at least ForwardDiff.jl and ReverseDiff.jl). Last but not least, easy maintainability (which is closely connected to readable code) would be nice to have.
I am running into some issues when thinking about how to write good code with these goals in mind. I am just sharing my thoughts here and am happy about corrections, comments or updates on them.
(1) Using Buffers
One of the core design patterns for making code fast (independent of Julia) is using buffers that are allocated once and reused, e.g. during computations inside a loop. Often this doesn’t add much complexity to the code. In Julia, I want concretely typed buffers (for performance), so I need to preallocate with a given element type (like Float64); in an AD run, however, the buffer needs to hold the corresponding AD primitives (e.g. ForwardDiff.Dual or ReverseDiff.TrackedReal). What is the “nice” way of doing this?
(1a) I know about PreallocationTools.jl, but as I understand it, it is for ForwardDiff only. However, ReverseDiff is now an optional dependency… is ReverseDiff support perhaps planned?
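To make the buffer problem concrete, here is a minimal sketch without any package dependency. It keeps one buffer per element type and allocates it the first time that type shows up. The names `TypedBuffers`, `get_buffer` and `sum_of_squares!` are hypothetical and only serve as illustration; in an AD run, `T` would be the AD primitive type instead of `Float64`.

```julia
# Hypothetical helper: one buffer per element type, created on first use.
struct TypedBuffers
    n::Int
    store::Dict{DataType, Any}
end
TypedBuffers(n::Integer) = TypedBuffers(n, Dict{DataType, Any}())

# Return a Vector{T} of length n; it is only allocated the first time T is requested.
function get_buffer(tb::TypedBuffers, ::Type{T}) where {T}
    buf = get!(() -> Vector{T}(undef, tb.n), tb.store, T)
    return buf::Vector{T}  # the type assertion keeps the caller type-stable
end

# Example computation that writes into the buffer matching eltype(x).
function sum_of_squares!(tb::TypedBuffers, x::AbstractVector{T}) where {T<:Real}
    buf = get_buffer(tb, T)
    @. buf = x^2
    return sum(buf)
end

tb = TypedBuffers(3)
r64 = sum_of_squares!(tb, [1.0, 2.0, 3.0])   # uses a Float64 buffer
r32 = sum_of_squares!(tb, Float32[1, 2, 3])  # allocates a second buffer for Float32
```

This is not thread-safe, and whether it is safe to keep buffers of tracked types (such as ReverseDiff.TrackedReal) alive between AD runs is something I have not checked.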
EDIT: ReverseDiff.jl works with PreallocationTools.jl!
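For reference, this is how I understand the PreallocationTools.jl pattern with ForwardDiff.jl: a `DiffCache` wraps a preallocated array, and `get_tmp` returns either the plain buffer or a buffer of dual numbers, depending on the element type of the input. The function `f` and the array sizes are made up for illustration, and older releases may name the constructor differently, so please treat this as a sketch.

```julia
using ForwardDiff, PreallocationTools

# Preallocated cache for a length-3 Float64 buffer.
const cache = DiffCache(zeros(3))

function f(x)
    tmp = get_tmp(cache, x)  # Float64 buffer for Float64 input, Dual buffer under ForwardDiff
    @. tmp = x^2
    return sum(tmp)
end

val = f([1.0, 2.0, 3.0])
grad = ForwardDiff.gradient(f, [1.0, 2.0, 3.0])
```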
(1b) Zygote.jl seems complicated with buffers, because it only allows non-mutating array operations. There is the new Zygote.Buffer, but this feels like additional code that is needed specifically for Zygote.
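As a small illustration of what I mean by "additional code", here is a sketch of the `Zygote.Buffer` pattern as I understand it from the Zygote documentation. The function name `sum_squares_buffered` is hypothetical.

```julia
using Zygote

function sum_squares_buffered(x)
    buf = Zygote.Buffer(x)   # mutable scratch storage with the same shape as x
    for i in eachindex(x)
        buf[i] = x[i]^2      # writing into a Buffer is allowed under Zygote
    end
    return sum(copy(buf))    # copy(buf) converts it back into a regular array
end

val = sum_squares_buffered([1.0, 2.0, 3.0])
grad = Zygote.gradient(sum_squares_buffered, [1.0, 2.0, 3.0])[1]
```

The function still runs without AD, but the `Buffer` and `copy` calls are there only for Zygote's sake.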
(2) Common AD-interface: ChainRules.jl
I really like the idea of ChainRules.jl: providing one forward and one reverse rule (frule and rrule) is mathematically enough to build an interface to further AD “backends”. Adding custom rules is not only needed for non-differentiable foreign calls, but also for performance (e.g. AD shortcuts over iterative procedures). However, I see multiple (big!) libraries that don’t use ChainRules.jl and instead implement dedicated dispatches for AD primitives. Is this because ChainRules adds overhead compared to a pure AD dispatch (like for ForwardDiff.Dual)?
Finally, it might be helpful to discuss this using an example:
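To illustrate the "AD shortcut over an iterative procedure" idea, here is a minimal sketch using ChainRulesCore.jl. The function `newton_sqrt` is a made-up example; its rules skip the loop entirely and use the analytic derivative of the square root, 1/(2y). Which backends pick these rules up automatically is a separate question, and some may need extra glue code.

```julia
using ChainRulesCore

# Hypothetical iterative routine: square root via a fixed number of Newton steps.
function newton_sqrt(a::Real)
    x = a
    for _ in 1:20
        x = (x + a / x) / 2
    end
    return x
end

# Reverse rule: do not differentiate through the loop, use d/da sqrt(a) = 1/(2*sqrt(a)).
function ChainRulesCore.rrule(::typeof(newton_sqrt), a::Real)
    y = newton_sqrt(a)
    newton_sqrt_pullback(ȳ) = (NoTangent(), ȳ / (2y))
    return y, newton_sqrt_pullback
end

# Forward rule: the same derivative, pushed forward along the input tangent Δa.
function ChainRulesCore.frule((_, Δa), ::typeof(newton_sqrt), a::Real)
    y = newton_sqrt(a)
    return y, Δa / (2y)
end

y, pullback = ChainRulesCore.rrule(newton_sqrt, 4.0)
ā = pullback(1.0)[2]
y_f, ẏ = ChainRulesCore.frule((NoTangent(), 1.0), newton_sqrt, 4.0)
```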
# a struct that "lives" for some time during the application
# and stores values and buffers that are reused (it's mutable)
mutable struct LongLifeStruct
    # the type of `a` will change during the application
    # `AbstractArray{<:Real}` is bad, but works with AD, because AD-primitives are
    # subtypes of Real. I assume PreallocationTools is the way to go here?
    a::AbstractArray{<:Real}
end

function doSomething(str::LongLifeStruct)
    str.a[:] = ... # whatever
end

# a struct that is allocated for a special calculation and
# freed afterwards (immutable)
struct ShortLifeStruct{T}
    # the type of `a` will not change during its "lifespan"
    # therefore we can allocate it "typed"
    # is this correct?
    a::AbstractArray{T}
end
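For comparison, here is a sketch of a variant in which the field type is fully concrete, following the general Julia performance tip to avoid abstractly typed fields. The name `ConcreteShortLife` is hypothetical. The array type itself becomes a type parameter, so a `Vector{Float64}` and a vector of AD primitives give two different concrete struct types.

```julia
# Hypothetical variant: the container type A is a parameter, so the field `a` is concrete.
struct ConcreteShortLife{T, A<:AbstractArray{T}}
    a::A
end

# For contrast: parameterizing only on T leaves the field abstractly typed.
struct AbstractFieldShortLife{T}
    a::AbstractArray{T}
end

s = ConcreteShortLife([1.0, 2.0, 3.0])       # ConcreteShortLife{Float64, Vector{Float64}}
t = AbstractFieldShortLife([1.0, 2.0, 3.0])  # AbstractFieldShortLife{Float64}

concrete_field = isconcretetype(fieldtype(typeof(s), :a))
abstract_field = isconcretetype(fieldtype(typeof(t), :a))
```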
What would be (special cases neglected) the “correct” (=fast) way of implementing AD-support (ForwardDiff, ReverseDiff, Zygote, …) for the little code example above?
Thank you all!