Any Julia deep learning frameworks use automatic differentiation?

Do any of the current Julia-based deep-learning frameworks use automatic differentiation? I’m pretty sure Mocha.jl does not. MXNet.jl seems to, but it is really just a wrapper around the MXNet C++ code (so the AD is taking place in C++). This seems like an application where Julia would shine: instead of having to manually define an AST for a function (as is done in TensorFlow and MXNet), the AST derivation could be done by Julia macros.
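
(For concreteness, here is a minimal sketch of what “AD on a plain function” already looks like with ForwardDiff.jl; it uses dual numbers rather than macros, and the function below is just a made-up example, but the point stands: no hand-built AST is needed.)

using ForwardDiff

f(x) = sin(x) + x^2              # an ordinary Julia function, no graph definition
ForwardDiff.derivative(f, 1.0)   # == cos(1.0) + 2.0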

3 Likes

Someone pointed me to Knet when I was asking a similar question.
I looked into it and it looks really impressive.
https://github.com/denizyuret/Knet.jl

5 Likes

I think there are a lot of people playing around with this, but not many are incorporating native-language AD (rather than just backprop-on-my-own-graph-type AD). Knet.jl looks pretty cool.

For anybody who wants to start hacking on such a framework, but needs a native Julia AD package, ReverseDiff + ForwardDiff implement best-of-breed AD in Julia (I’m biased though, since I’m the primary author of those packages ;)). A lot of cool features are in the works, including ReverseDiff support for GPU-backed arrays (word on the street is that ML folks are fond of the GPU).
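
To give a flavour, here is a minimal sketch of reverse-mode AD on an ordinary Julia function with ReverseDiff (the “layer”, loss, and array values below are made-up examples):

using ReverseDiff

# A plain Julia "layer" plus a squared-error loss.
x, y = rand(4), rand(3)
loss(W, b) = sum((tanh.(W * x .+ b) .- y) .^ 2)

W, b = rand(3, 4), rand(3)
dW, db = ReverseDiff.gradient(loss, (W, b))   # gradients w.r.t. both parameter arrays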

9 Likes

Well, Knet is already remarkable in the sense that there is no special language for representing layers and such:
you just define your predict function and your loss function as ordinary mathematical expressions.

If you use a weight matrix W and a bias vector b, passed through a sigmoid, you get a normal perceptron layer,
but there is no restriction to that, and you can have any topology expressible in Julia code.
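
For example, here is a minimal sketch in the spirit of the Knet README (grad comes from AutoGrad.jl, which Knet re-exports; exact names and details may differ between Knet versions):

using Knet

# A sigmoid "perceptron layer" written as a plain Julia function of weights and input.
predict(w, x) = 1 ./ (1 .+ exp.(-(w[1] * x .+ w[2])))

# Squared-error loss, again just ordinary Julia code.
loss(w, x, y) = sum(abs2, predict(w, x) .- y)

# Knet/AutoGrad turn the loss into a gradient function automatically.
lossgradient = grad(loss)

w = Any[0.1 * randn(10, 4), zeros(10, 1)]
x, y = randn(4, 8), rand(10, 8)
dw = lossgradient(w, x, y)   # gradients w.r.t. w[1] and w[2]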

Moreover, it uses the GPU for performing simple element-wise operations, and for element-wise-with-stencil operations using cuDNN.

Theoretically, a much larger speedup could be gained by differentiating the loss function symbolically and compiling a specialized version of it for the GPU (the CUDAnative.jl initiative), resulting in a single kernel pass.

3 Likes

This is one of the main long-term goals of XDiff.jl. Julia has everything needed to generate highly efficient specialized code without making the programmer think about all the internal details.

2 Likes

You should accommodate a simple way of processing convolutions, and a simple way of handling deep networks, much deeper than:

function ann(w1, w2, w3, x1)

I have been acquainted with ANNs for over 15 years: novel but useless, until deep learning and convolutional neural networks changed everything.

Yes, convolution is on my list; I just added an issue for it to keep it in mind. Deep networks are still just functions of their inputs, so they should already work. Anyway, my main point is that Julia is much better suited for symbolic programming and code generation than, for example, Python or C++.
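
As a toy illustration of that point (this is plain Base Julia, not XDiff.jl’s actual API): a derivative rule can be applied directly to a Julia expression, and the result compiled into an ordinary function.

# Symbolic d/dx for a tiny expression language: numbers, symbols, binary + and *.
deriv(e::Number, x) = 0
deriv(e::Symbol, x) = e == x ? 1 : 0
function deriv(e::Expr, x)
    op, a, b = e.args
    op == :+ && return :($(deriv(a, x)) + $(deriv(b, x)))
    op == :* && return :($(deriv(a, x)) * $b + $a * $(deriv(b, x)))   # product rule
    error("unsupported expression $e")
end

ex  = :(x * x + 2 * x)       # an ordinary Julia AST
dex = deriv(ex, :x)          # the symbolic derivative, still an AST
df  = eval(:(x -> $dex))     # generate and compile a normal Julia function from it
df(5.0)                      # 12.0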

I’m curious about the model/code you have in mind for which the kind of AD that something like TensorFlow.jl has is insufficient.

1 Like

:slight_smile: Good question, and probably once I am finished answering it I am going to realize that TensorFlow is sufficient.

The setup I wanted to research is as follows:

  1. An RNN which has a hidden state and input neurons, predicting a signal, let’s say 10 steps into the future.

I want to train it in batches as follows:

  1. Record the initial hidden state, then run for 20 time steps. I now have 10 measurements with an answer as to whether my prediction was right.
  2. Backpropagate the results to nudge the hidden state (yes, the hidden state), keeping the weights fixed.
  3. Now nudge the weights using backpropagation.
  4. Repeat.

Once training is complete, prediction still involves “fixing” the hidden state.
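
Ignoring the 10-steps-ahead detail, here is a rough sketch of that alternating update with a toy vanilla RNN, using ForwardDiff for the gradients (all names, sizes, and step sizes here are made up for illustration):

using ForwardDiff

# One step of a vanilla RNN, and a linear readout.
rnnstep(Wh, Wx, h, x) = tanh.(Wh * h .+ Wx * x)
predict(Wo, h) = Wo * h

# Run T steps and accumulate squared prediction error against the target signal.
function seqloss(Wh, Wx, Wo, h, xs, ys)
    s = 0.0
    for t in 1:length(xs)
        h = rnnstep(Wh, Wx, h, xs[t])
        s += sum(abs2, predict(Wo, h) .- ys[t])
    end
    return s
end

n, T = 4, 20
Wh, Wx, Wo = 0.1 * randn(n, n), 0.1 * randn(n, 1), 0.1 * randn(1, n)
xs = [randn(1) for _ in 1:T]   # input signal
ys = [randn(1) for _ in 1:T]   # target signal
h0 = zeros(n)

# Step 1: nudge the hidden state, keeping the weights fixed.
gh = ForwardDiff.gradient(h -> seqloss(Wh, Wx, Wo, h, xs, ys), h0)
h0 -= 0.01 * gh

# Step 2: nudge the weights (here just Wh), keeping the hidden state fixed.
gW = ForwardDiff.gradient(W -> seqloss(W, Wx, Wo, h0, xs, ys), Wh)
Wh -= 0.01 * gW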

TensorFlow uses abstractions like layers and optimizers and such, and probably once I am done experimenting I am going to realise that these abstractions are the best way… but until then I want to understand clearly what gets multiplied where.

Have you checked out the JuliaML stuff? It’s at a kind of “developer preview, use caution” phase, but it’s modular enough to support this kind of tweakability for research into the methods themselves.

1 Like

TensorFlow doesn’t actually use abstractions like layers. This is matrix multiplication in TensorFlow:

using TensorFlow
X = placeholder(Float32)
Y = placeholder(Float32)
Z = X*Y   # * on Tensors follows Julia semantics, so this is matrix multiplication
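
And evaluating the graph is just a matter of feeding in concrete arrays (a small usage sketch following the pattern in the TensorFlow.jl README; the shapes here are made up):

sess = Session()
run(sess, Z, Dict(X => rand(Float32, 2, 3), Y => rand(Float32, 3, 4)))   # a 2×4 result
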
2 Likes