# Any Julia deep learning frameworks use automatic differentiation?

Do any of the current Julia-based deep-learning frameworks use automatic differentiation? I’m pretty sure Mocha.jl does not. MXNet.jl seems to, but it is really just a wrapper around the mxnet C++ code (so the AD is taking place in C++). This seems like an application where Julia would shine: instead of manually defining an AST for a function (as is done in TensorFlow and MXNet), we could use macros to derive the AST and perform AD on a native function directly.

**TsurHerman**#2

Someone pointed me to Knet when I was asking a similar question.

I looked into it and it looks really impressive.

**jrevels**#3

I think there are a lot of people playing around with this, but not many playing around with incorporating native-language AD (rather than just backprop-on-my-own-graph-type AD). Knet.jl looks pretty cool.

For anybody who wants to start hacking on such a framework, but needs a native Julia AD package, ReverseDiff + ForwardDiff implement best-of-breed AD in Julia (I’m biased though, since I’m the primary author of those packages ;)). A lot of cool features are in the works, including ReverseDiff support for GPU-backed arrays (word on the street is that ML folks are fond of the GPU).
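As a taste of what native-language AD looks like with those packages, here is a minimal ForwardDiff sketch (the function `f` is an arbitrary illustration, not from any framework):

```julia
using ForwardDiff

# an ordinary Julia function -- no graph construction, no special DSL
f(w) = sum(abs2, w) + 3.0 * w[1]

# gradient of f at [1.0, 2.0]: [2*1 + 3, 2*2] == [5.0, 4.0]
g = ForwardDiff.gradient(f, [1.0, 2.0])
```

The function being differentiated is just regular Julia code; no separate graph type ever appears.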

**TsurHerman**#4

Well, Knet is already remarkable in the sense that there is no special language to represent layers and such.

You just define your predict function and the loss function as mathematical expressions.

If you use a weight matrix W and a bias vector b, passed through a sigmoid, you get a normal perceptron layer,

but there is no restriction to that, and you can have any topology expressible in Julia code.
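To make that concrete, here is a plain-Julia sketch of such a predict/loss pair (ordinary functions and arrays; the names are illustrative, not Knet’s API):

```julia
# a one-layer perceptron expressed as ordinary Julia functions
sigmoid(z) = 1 ./ (1 .+ exp.(-z))

# W is the weight matrix, b the bias vector, x the input
predict(W, b, x) = sigmoid(W * x .+ b)

# squared-error loss against a target y
loss(W, b, x, y) = sum(abs2, predict(W, b, x) .- y)
```

An AD package can then differentiate `loss` with respect to `W` and `b` directly, with no separate graph-building step.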

Moreover, it uses the GPU for performing simple element-wise operations, and element-wise-with-stencil operations via cuDNN.

Theoretically, a much higher speedup could be gained by differentiating the loss function symbolically and compiling a specialized version of it for the GPU (the CUDAnative initiative), resulting in a single kernel pass.

**dfdx**#5

This is one of the main long-term goals of XDiff.jl. Julia has everything needed to generate highly efficient specialized code without making the programmer think about all the internal details.

**TsurHerman**#6

You should accommodate a simple way of processing convolutions.

And a simple way of processing deep networks,

much deeper than:

`function ann(w1, w2, w3, x1)`

I have been acquainted with ANNs for over 15 years: novel but useless, until deep learning and convolutional neural networks changed everything.

**dfdx**#7

Yes, convolution is on my list. I just added an issue for it to keep it in mind. Deep networks are still just functions of their inputs, so they should already work. Anyway, my main point is that Julia is much better suited for symbolic programming and code generation than, for example, Python or C++.
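Since the thread keeps coming back to symbolic differentiation, here is a deliberately tiny sketch of the idea over Julia `Expr`s: a handful of rewrite rules plus `eval` to compile the derivative. This only illustrates Julia’s code-generation facilities; it is not XDiff.jl’s actual implementation:

```julia
# toy symbolic differentiation over Julia expressions
dx(e::Number, x::Symbol) = 0
dx(e::Symbol, x::Symbol) = e == x ? 1 : 0
function dx(e::Expr, x::Symbol)
    op, args = e.args[1], e.args[2:end]
    if op == :+
        Expr(:call, :+, (dx(a, x) for a in args)...)
    elseif op == :* && length(args) == 2
        u, v = args                        # product rule
        :($(dx(u, x)) * $v + $u * $(dx(v, x)))
    elseif op == :^
        u, n = args                        # power rule (constant exponent)
        :($n * $u ^ ($n - 1) * $(dx(u, x)))
    else
        error("unsupported operation: $op")
    end
end

# d/dx (x^2 + 3x) == 2x + 3; generate the expression and compile it
dexpr = dx(:(x^2 + 3x), :x)
df = eval(:(x -> $dexpr))
```

The generated expression is ordinary Julia code, so the compiler can specialize it just like hand-written code.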

**malmaud**#8

I’m curious about the model/code you have in mind for which the kind of AD that something like TensorFlow.jl has is insufficient.

**TsurHerman**#9

Good question, and probably once I am finished answering it I am going to realize that TensorFlow is sufficient.

The setup I wanted to research is as follows:

- an RNN which has a hidden state and input neurons, predicting a signal … let’s say 10 steps into the future.

I want to train it in batches as follows:

Record the initial hidden state … run for 20 time steps. I now have 10 measurements with an answer as to whether my prediction was right. Backpropagate the results to nudge the hidden state (yes, the hidden state), keeping the weights fixed.

Now nudge the weights using backpropagation. Repeat.

Once training is complete, prediction still involves “fixing” the hidden state.
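The alternating update can be sketched with a deliberately minimal scalar stand-in for the RNN and hand-written gradients (everything here is hypothetical, just to make the two phases concrete):

```julia
# toy model: prediction = w * h, squared error against target y;
# w stands in for the weights, h for the hidden state
loss(w, h, y) = (w * h - y)^2

# hand-derived gradients of the loss
dloss_dh(w, h, y) = 2 * (w * h - y) * w
dloss_dw(w, h, y) = 2 * (w * h - y) * h

function train(w, h, y; lr = 0.1, steps = 100)
    for _ in 1:steps
        h -= lr * dloss_dh(w, h, y)   # phase 1: weights fixed, nudge the hidden state
        w -= lr * dloss_dw(w, h, y)   # phase 2: nudge the weights
    end
    w, h
end

w, h = train(2.0, 1.0, 3.0)           # after training, w * h ≈ 3.0
```

In a real RNN the gradients would come from an AD package rather than by hand, but the control flow (two separate gradient steps per batch, one of them with respect to the hidden state) is exactly what is awkward to fit into a fixed optimizer abstraction.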

TensorFlow uses abstractions like layers and optimizers and such, and probably once I am done experimenting I am going to realize that these abstractions are the best way … but until then I want to understand clearly what gets multiplied where.

**ChrisRackauckas**#10

> TensorFlow uses abstractions like layers and optimizers and such, and probably once I am done experimenting I am going to realize that these abstractions are the best way … but until then I want to understand clearly what gets multiplied where.

Have you checked out the JuliaML stuff? It’s at kind of a “developer preview, use caution” phase, but it’s modular to support this kind of tweakability for research on the methods themselves.

**malmaud**#11

TensorFlow doesn’t actually require abstractions like layers. This is matrix multiplication in TensorFlow:

```julia
X = placeholder(Float32)
Y = placeholder(Float32)
Z = X*Y
```