Julia end-to-end LSTM for one CPU

hpoit · May 26, 2017, 5:56pm

I wonder what this core would look like in Julia

Together with a Julia frontend

How do I build a Julia abstraction for all of this, integrated with JuliaDB?

The GPU part would not be a priority at the moment, as I first want to run an LSTM on a macOS CPU. How do I take the first 100 steps?

dfdx · May 26, 2017, 7:31pm

Are you asking about writing your own backend in Julia or wrapping TensorFlow’s backend? Why would you do any of these when TensorFlow.jl and Knet.jl exist?

hpoit · May 26, 2017, 8:00pm

Christopher Rackauckas @ChrisRackauckas 16:50 on https://gitter.im/JuliaML/chat
Knet doesn’t use computational graphs. It uses dispatch on the types in generic Julia code and overloads the methods using their specific array type in order to turn your NN code into GPU code. Take a look at the tutorial, and note that it’s essentially just Julia code with two lines from Knet.jl: one call to AutoGrad.jl and many calls to create Knet arrays. By making it a Knet array instead of an Array, it then overloads what * etc. all mean to make your Julia NN code run on the GPU and all of that, but that means that the tutorial is essentially just a “how to write an NN in Julia”.
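
As a rough illustration of the dispatch idea described in that quote, here is a minimal sketch. `DeviceArray`, `KERNEL_CALLS` and `layer` are hypothetical names made up for this example; this is not Knet's actual array type or API, and the "kernel" here just runs on the CPU.

```julia
# Hypothetical wrapper standing in for a device array type (not Knet's own type).
struct DeviceArray{T}
    data::Matrix{T}
end

# Counter that shows the overloaded method was the one that ran.
const KERNEL_CALLS = Ref(0)

# Overload * for the wrapper; a real package would launch a GPU kernel here.
function Base.:*(a::DeviceArray, b::DeviceArray)
    KERNEL_CALLS[] += 1
    return DeviceArray(a.data * b.data)
end

# Generic "layer" code, written once with no mention of the array type.
layer(w, x) = w * x

w = [1.0 2.0; 3.0 4.0]
x = reshape([1.0, 1.0], 2, 1)

cpu_out = layer(w, x)                              # uses Base's Matrix *
dev_out = layer(DeviceArray(w), DeviceArray(x))    # uses the overload above
```

The same `layer` definition serves both calls; only the argument types decide which `*` runs.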

Mike Innes: Building a graph has genuine benefits – e.g. parallelism, deployment, fusing operations and memory management. PyTorch and Knet will both struggle with those. Of course, it’s also true that TensorFlow’s API is severely limited by Python

This might be a starting point for a great discourse
https://www.tensorflow.org/extend/architecture

ScottPJones · May 26, 2017, 8:21pm

Isn’t the core of TensorFlow all C++ code (with a C API that makes it easier to interface with)?
Python is just one of the two languages they concentrated on for the client libraries (along with C++).

malmaud · May 26, 2017, 8:27pm

TensorFlow.jl is exactly the attempt to make a Julian API for TensorFlow.

hpoit · May 26, 2017, 9:18pm

Does TensorFlow.jl wrap Python?

dfdx · May 26, 2017, 9:23pm

Essentially, TensorFlow provides 3 main advantages:

1. Automated differentiation.
2. Code generation for CPU and GPU.
3. Distributed computations.

I don’t know much about TF’s model of distributed computations, so can’t really comment on this.

I specifically wrote “automated differentiation” because in TF it’s not exactly the same as automatic differentiation in, e.g., Knet.jl. Citing @denizyuret:

Automatic differentiation is the idea of using symbolic derivatives only at the level of elementary operations, and computing the gradient of a compound function by applying the chain rule to intermediate numerical results. For example, pure symbolic differentiation of \sin^2(x) could give us 2\sin(x)\cos(x) directly. Automatic differentiation would use the intermediate numerical values x_1=\sin(x), x_2=x_1^2 and the elementary derivatives dx_2/dx_1=2x_1, dx_1/dx=\cos(x) to compute the same answer without ever building a full gradient expression.
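
The \sin^2(x) example from the quote can be written out by hand as a small sketch. `sin2_with_grad` is a made-up name for illustration, not a function from AutoGrad.jl or Knet.jl; it only shows the chain rule being applied to intermediate numerical values.

```julia
# Chain rule on intermediate numerical values for f(x) = sin(x)^2.
function sin2_with_grad(x::Real)
    # Forward pass: keep the intermediate values.
    x1 = sin(x)          # x_1 = sin(x)
    x2 = x1^2            # x_2 = x_1^2

    # Backward pass: multiply elementary derivatives evaluated at those values.
    dx2_dx1 = 2 * x1     # dx_2/dx_1 = 2 x_1
    dx1_dx  = cos(x)     # dx_1/dx  = cos(x)
    return x2, dx2_dx1 * dx1_dx
end

value, grad = sin2_with_grad(0.3)

# The symbolic answer 2 sin(x) cos(x) gives the same number.
@assert isapprox(grad, 2 * sin(0.3) * cos(0.3))
```

No expression for the full gradient is ever built; only numbers flow through the backward pass.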

AD is pretty good, actually, especially when backed by GPU arrays. Yet, as you mentioned, it doesn’t create a computational graph, which rules out many optimizations.

An alternative approach is to use symbolic differentiation. SD is less straightforward to implement and has its own limitations (e.g. no loops in the loss function), but it can produce exactly what AD is missing: a computational graph (for which we already have Julia’s AST). To my knowledge, there are currently 2 packages providing symbolic differentiation on array types: ReverseDiffSource.jl by Frédéric Testard and my own XDiff.jl. Neither is in the best shape (ReverseDiffSource doesn’t support Julia 0.6 yet, and XDiff.jl is currently undergoing a major refactoring), but if you are looking for symbolic computational graphs like in TensorFlow, helping one of these projects may be a good start.
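
To make the “Julia’s AST as the graph” idea concrete, here is a toy sketch of symbolic differentiation on scalar expressions. The `deriv` function is hypothetical and written only for this post; it is not the API of XDiff.jl or ReverseDiffSource.jl, and it handles just `+`, binary `*`, `sin` and `cos`.

```julia
# Toy symbolic differentiation on Julia expressions (scalars only).
deriv(ex::Number, x::Symbol) = 0
deriv(ex::Symbol, x::Symbol) = ex == x ? 1 : 0

function deriv(ex::Expr, x::Symbol)
    ex.head == :call || error("unsupported expression: $ex")
    op, args = ex.args[1], ex.args[2:end]
    if op == :+
        # Sum rule: differentiate every term.
        return Expr(:call, :+, (deriv(a, x) for a in args)...)
    elseif op == :* && length(args) == 2
        a, b = args
        return :($(deriv(a, x)) * $b + $a * $(deriv(b, x)))   # product rule
    elseif op == :sin
        return :(cos($(args[1])) * $(deriv(args[1], x)))       # chain rule
    elseif op == :cos
        return :(-sin($(args[1])) * $(deriv(args[1], x)))      # chain rule
    end
    error("no rule for $op")
end

dex = deriv(:(sin(x) * sin(x)), :x)   # an Expr describing the derivative
df = eval(:(x -> $dex))               # turn that expression into callable code
```

Because `dex` is an ordinary `Expr`, it can be inspected, simplified or handed to a code generator before it is ever evaluated, which is the property the tape-based approach lacks.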

Code generation follows from symbolic graphs and shouldn’t be too hard (especially given the awesome CUDAnative.jl), yet making it produce really highly optimized code may take many man-hours, and this is exactly where TF has the advantage over not-so-well-known projects.

I can dive deeper into the details of (1) and (2) if you really want to go down this road, but you should be aware that it is still a long one.

hpoit · May 26, 2017, 9:25pm

I just want the most correct way for Julia, without rushing.

hpoit · May 26, 2017, 9:32pm

I understand it doesn’t, but is there something to be said about the backend having been designed with a Python frontend in mind? In the end, I would rather not have C++ or Python in the design.

jekbradbury · May 26, 2017, 9:58pm

TensorFlow.jl wraps the TensorFlow core (mostly C++), not the Python frontend. If you want a pure Julia deep learning framework, check out Knet.jl.

hpoit · May 26, 2017, 10:01pm

If you scroll up you’ll see from Chris’ comment that Knet doesn’t use computational graphs.

dpsanders · May 26, 2017, 10:11pm

Rather than ask that question here, why don’t you look at the source code instead?

dpsanders · May 26, 2017, 10:13pm

https://github.com/JuliaDiff/ReverseDiff.jl builds up a computational graph for automatic differentiation (in reverse mode).

hpoit · May 26, 2017, 10:13pm

I looked at the docs, it was sufficient.

hpoit · May 26, 2017, 10:16pm

So that’s one other decision to make: which type of differentiation to use for the computational graph.

kristoffer.carlsson · May 26, 2017, 10:50pm

It is good and fun to talk about different design strategies sometimes, but it is important to note that you get experience and insight when you actually implement things. You have had your package https://github.com/hpoit/MLN.jl/ going for 10 months now and it has links to tutorials and documentation and release notes. These, as well as all the Julia files, are still completely empty after hundreds of commits. At some point you have to get your hands dirty and actually try to write some code instead of just discussing it. Remember that when you ask questions, other people spend their time answering them in order to help you. I think it would be fair if next time you added a bit of actual runnable Julia code that shows what you have tried so far. That would make it easier to see where you are and how to progress from what you have implemented so far.

hpoit · May 26, 2017, 10:52pm

I like to ponder before doing anything. For example, it seems like Julia was very well pondered before it was initiated. I’m on the paper stage, which I believe comes before the doing stage.

kristoffer.carlsson · May 26, 2017, 11:41pm

Julia is not done and I would not say it was very well pondered… Like everything in Julia is changing all the time. The file extension was changed once, the names for the basic types just got changed, the type system gets revamped, function types get added etc etc. Julia is the result of an incredible amount of work where bad ideas have been scrapped and good ideas have been kept, and the only way to know if many of them were good or not was by trying them.

Yes, it is useful to ponder things sometimes, but at some point there has to be some action too.

hpoit · May 26, 2017, 11:55pm

Initiated, not finished or completed, is what I meant. I like action, in the right amount.

dfdx · May 27, 2017, 1:23am

I guess you are talking about the tape, which indeed is a kind of computational graph. However, it’s different from what you typically get with symbolic differentiation. The key difference is whether you can further transform the graph, e.g. fuse operations, find common subexpressions, generate code, etc. Consider the following example:

u::Vector{Float32}
v::Vector{Float32}

x = u + v
y = 2x
z = sum(y)

in symbolic differentiation you get something like:

dz_dz = 1.0
dz_dy = dz_dz * ones(size(u))
dz_dx = dz_dy * 2
dz_dv = dz_dx * 1
dz_du = dz_dx * 1

which is easily simplified to:

dz_dz = 1.0
dz_dy = ones(size(u))
dz_dx = 2 * dz_dy 
dz_dv = dz_dx
dz_du = dz_dx

if you only need derivatives w.r.t. inputs u and v, you can throw away unused variables and get:

dz_dv = fill(2.0, size(u))
dz_du = dz_dv
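
As a sanity check on the derivation above, the simplified result can be compared against finite differences. This is only a sketch: `f`, `fd_grad`, the sample inputs and the step size `h` are all assumptions chosen for illustration.

```julia
# Same computation as in the example above, written as a function.
f(u, v) = sum(2 .* (u .+ v))

u = Float32[1, 2, 3]
v = Float32[4, 5, 6]

# Gradient from the simplified symbolic result: a vector of twos.
dz_du = fill(2.0f0, size(u))

# Central finite differences, one coordinate at a time (h is an assumed step size).
function fd_grad(g, u; h = 1f-2)
    grad = similar(u)
    for i in eachindex(u)
        e = zeros(eltype(u), size(u))
        e[i] = h
        grad[i] = (g(u .+ e) - g(u .- e)) / (2h)
    end
    return grad
end

numeric = fd_grad(w -> f(w, v), u)
```

Up to Float32 rounding, `numeric` should agree with `dz_du`.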

Generating code for GPU or, for example, distributed calculation on a cluster is also trivial.

ReverseDiff.jl, on the other hand, provides an exact implementation for each recorded instruction and its derivative, binding them to the tape and cache. Optimizing the tape looks pretty hard to me (I also didn’t find any such optimizations in the code) and moving the code to GPU will probably require a special kind of GPU tape.
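
To show what a tape looks like in spirit, here is a deliberately tiny sketch for the same `x = u + v`, `y = 2x`, `z = sum(y)` example. `TapeEntry`, `tape`, `vals` and `grads` are hypothetical names; this is not how ReverseDiff.jl is actually implemented, only an assumption-laden illustration of recording operations together with opaque derivative closures.

```julia
# Hypothetical minimal tape entry: where gradients go, plus an opaque pullback closure.
struct TapeEntry
    inputs::Vector{Symbol}
    output::Symbol
    pullback::Function   # maps the output gradient to a tuple of input gradients
end

tape = TapeEntry[]
vals = Dict{Symbol,Any}(:u => Float32[1, 2, 3], :v => Float32[4, 5, 6])

# Execute x = u + v, y = 2x, z = sum(y), recording each step as it runs.
vals[:x] = vals[:u] + vals[:v]
push!(tape, TapeEntry([:u, :v], :x, g -> (g, g)))
vals[:y] = 2 * vals[:x]
push!(tape, TapeEntry([:x], :y, g -> (2 * g,)))
vals[:z] = sum(vals[:y])
push!(tape, TapeEntry([:y], :z, g -> (g * ones(Float32, size(vals[:y])),)))

# Backward pass: replay the tape in reverse, accumulating gradients.
grads = Dict{Symbol,Any}(:z => 1.0f0)
for entry in reverse(tape)
    for (name, g) in zip(entry.inputs, entry.pullback(grads[entry.output]))
        grads[name] = haskey(grads, name) ? grads[name] + g : g
    end
end
```

The gradients come out right, but each pullback is a closure the tape cannot look inside, so simplifications like collapsing everything to a single `fill` are not available without extra machinery.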
