Will Cassette.jl (the new backend for ReverseDiff.jl) obviate the need for an ML framework?

I saw the JuliaCon 2017 talk by Jarrett Revels about ReverseDiff.jl and how he’s working on Cassette.jl to overhaul the backend for ReverseDiff.jl, which should address most (or all) of its current limitations.

If Cassette.jl can actually differentiate arbitrary Julia code, without having to rely on something like Autograd (which can only differentiate a library-defined set of operations), you wouldn’t really need to use something like Knet or TensorFlow, right?

I could just write a neural network’s forward pass using GPUArrays and then get the gradient with respect to my weights using ReverseDiff.jl? Am I missing something here?

1 Like

Cassette will make this much better, but you can already use ReverseDiff.jl to take gradients of a neural net written in pure Julia. It’s great!
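
For instance, here is a minimal sketch of taking gradients of a tiny hand-written network with ReverseDiff.jl, using plain CPU arrays for simplicity; the layer sizes, parameter names, and data are made up for illustration:

```julia
using ReverseDiff

# A tiny two-layer network written as ordinary Julia code.
predict(W1, b1, W2, b2, x) = W2 * tanh.(W1 * x .+ b1) .+ b2
loss(W1, b1, W2, b2, x, y) = sum((predict(W1, b1, W2, b2, x) .- y) .^ 2)

# Hypothetical parameters and data.
W1, b1 = randn(16, 4), randn(16)
W2, b2 = randn(1, 16), randn(1)
x, y   = randn(4, 32), randn(1, 32)

# ReverseDiff.gradient accepts a tuple of arrays and returns a matching
# tuple of gradients with respect to each one.
∇W1, ∇b1, ∇W2, ∇b2 = ReverseDiff.gradient(
    (W1, b1, W2, b2) -> loss(W1, b1, W2, b2, x, y),
    (W1, b1, W2, b2),
)
```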

There are still many nice features in TensorFlow etc., such as visualization, logging, saving and restoring state, and nice GPU integration, but it’s already really nice to do everything in very straightforward Julia code.

1 Like

I think this is effectively what you get with the new version of FluxML. It’s more a lightweight set of generic Julia constructions that happen to implement an ML framework.

I guess the issue with doing it totally generically will be efficiency? Flux already does something clever with caching gradients.

Certainly I’ve used ForwardDiff with generic scientific Julia code, to optimise nasty-looking things with numerical quadrature and the like inside them. It worked pretty well - but when you only have a few degrees of freedom to optimise, you don’t really notice how much time it takes!
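
A hypothetical sketch of what I mean: an objective with a hand-rolled trapezoidal quadrature inside it, differentiated straight through with ForwardDiff (the integrand, target value, and parameter names here are invented for the example):

```julia
using ForwardDiff

# Simple trapezoidal rule over n points; ordinary generic Julia code,
# so it works just as well when the integrand produces dual numbers.
function trapz(f, a, b; n = 200)
    xs = range(a, b; length = n)
    h  = step(xs)
    return h * (sum(f, xs) - (f(a) + f(b)) / 2)
end

# Hypothetical objective: match the integral of a parameterised integrand
# to some target value (0.5 here, chosen arbitrarily).
objective(θ) = (trapz(x -> exp(-θ[1] * x^2) * cos(θ[2] * x), 0.0, 5.0) - 0.5)^2

θ = [1.0, 0.3]
g = ForwardDiff.gradient(objective, θ)   # gradient w.r.t. the two parameters
```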

3 Likes

Thanks for watching the talk!

The short answer is “no.” There’s a host of features that ML frameworks can provide on top of AD, like optimized model components or loss functions, distributed scheduling frameworks, and cool ML-specific APIs (like Flux).

That being said, there are some nifty new designs being planned on top of Cassette for both forward-mode and reverse-mode AD. @MikeInnes, @denizyuret, myself, and a host of other folks have all been collaborating on designing an AD interface that works naturally for both ML and traditional AD use cases. It’s my hope this Cassette-based, native Julia AD interface will be solid enough to replace framework-specific AD sometime next year (hopefully before next JuliaCon).

Note that Cassette alone actually isn’t a tool for AD - it’s a framework for writing context-specific compiler extensions and, on top of that, doing what I’m nebulously calling “contextual dispatch” and “contextual metadata propagation.” You can read a little bit about it here. That README is a bit out of date by this point, but I plan on updating it next week. (Edit: it’s now more up-to-date.)
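
As a rough (and hedged) illustration of what “contextual dispatch” means in practice, here is a minimal sketch using the API described in Cassette’s later README; the context name and the traced function are made up for the example, and the exact API may differ from what is described in the talk:

```julia
using Cassette

# Define a new "context" type; code overdubbed in this context can be given
# custom behavior without modifying the original functions.
Cassette.@context PrintCtx

# A prehook that fires before every call made while overdubbing in this
# context - here it just logs the call. Note this is contextual dispatch,
# not AD; an AD tool would be built on top of hooks like this.
Cassette.prehook(::PrintCtx, f, args...) = println(f, args)

# An arbitrary, hypothetical target function.
rosenbrock(x) = sum(100 .* (x[2:end] .- x[1:end-1] .^ 2) .^ 2 .+ (1 .- x[1:end-1]) .^ 2)

# Run rosenbrock inside the context; every nested call gets logged.
Cassette.overdub(PrintCtx(), rosenbrock, rand(3))
```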

8 Likes