Flux ready for a beginner deep learning project?




I’m about to embark on a client-facing deep learning project where I’d need to get up to speed on some basic DL topics and then start a research project within a couple of weeks or so. (They are aware of my current state of DL knowledge, but are hopeful that I can learn quickly and apply my subject-matter expertise.)

I’m looking at a software stack and the “safe” answer is python + pytorch, but I’d like to use Julia (for various reasons, some are indeed practical).

Is Flux ready for a beginner to solve real client-facing problems with? I do not want to jeopardize the project.

My plan is to work from fast.ai’s PyTorch tutorials and then move on to the specific research problem, but I don’t want to invest time only to find that fiddling with indices and model parameters during translation, or having to re-implement features, slows me down too much. I also don’t want to be left questioning whether some implementation issue or bug has derailed my experiment.

How likely are either of these scenarios? Could I ameliorate them easily?



Also have a look at Knet.jl.


I would say not yet, just because the GPU integration isn’t fully complete. However, it’s by far the easiest for a beginner to get a model up and running. GPU support via CuArrays requires building Julia from source until Julia v0.7, and CLArrays is a bit further off. But Flux is so fun to use, and Mike is working hard to fix GPU support and get native Julia convolutions.


Knet is quite complete and has very good documentation and plenty of examples. It’s a bit low-level: it basically provides GPU arrays with some operations (conv4, …) plus auto-differentiation, and then you write your model yourself in plain Julia, so it’s nice for learning the basics.
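To make the “write your model yourself in plain Julia” point concrete, here is a minimal Knet-style sketch along the lines of its README examples — the toy data and layer sizes are invented, and this assumes Knet’s `grad` (from AutoGrad.jl) behaves as documented:

```julia
# A hedged sketch of the Knet workflow: define the model and loss as
# ordinary Julia functions, then let `grad` produce the gradient function.
using Knet

predict(w, x) = w[1] * x .+ w[2]                 # model is plain Julia
loss(w, x, y) = mean(abs2, predict(w, x) .- y)   # mean squared error
lossgradient = grad(loss)                        # autodiff w.r.t. first arg

w = Any[0.1 * randn(1, 10), zeros(1, 1)]         # weights and bias
x, y = randn(10, 100), randn(1, 100)             # fake regression data
for i in 1:20                                    # hand-written SGD loop
    dw = lossgradient(w, x, y)
    for j in 1:length(w)
        w[j] -= 0.1 * dw[j]
    end
end
```

The “extra user overhead” is essentially this: you write the training loop and loss yourself instead of calling a prebuilt `fit` method.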


Thanks, it definitely looks fun and elegant.

So it’s just waiting on 0.7? Any idea how far off that is?


I’ve looked at Knet, but was put off by its low-level nature… how difficult is it to use, and realistically how much extra user overhead would it add?


Well, CuArrays can already be used, but it needs a source build of Julia to work on v0.6. It’s unfortunate, because the whole reason I want to use Flux is to avoid installation issues and other low-level concerns :smile:, but yes, to me it’ll be the dream sooner rather than later.

As for v0.7, I will not even try to estimate a date.


Also somewhere on the horizon:

One thing we’ve been working on with the Julia community is a native port to ROCm, which will also upgrade to our newer BLAS (rocBLAS) and FFT (rocFFT) libraries — a huge improvement over clBLAS and clFFT.


The most exciting thing will be to have a sort of CLNative (ROCnative?) for AMD cards to go alongside CUDAnative. But FWIW, I benchmarked multiplying 5000x5000 matrices via CLBLAS (through CLArrays) at 45 ms, and HipBLAS (through ccall) at about 25 ms, so it’s definitely a big “free” improvement once it lands – although another response was less optimistic about the FFTs.
The 25 ms was on a high-end AMD card (Vega 64); any idea how a good NVIDIA card compares? I’ve heard CUBLAS is much faster than CLBLAS; it would be neat to know if this levels the playing field a bit.


This is a great docker image and I experimented with it once on AWS.

ANN: Docker image for CUDA packages

Given that you don’t have experience with DL and have limited time for learning, I’d stick with Python + PyTorch (or even Keras) for now: even though I like the progress for deep learning in Julia, I have to admit that:

  • Knet is a bit low-level for a beginner: to use it efficiently you have to understand autodiff, know the most popular cost functions, and have at least a rough understanding of optimization algorithms and of general principles like splitting your data into training, validation and test sets
  • Flux is much simpler for a beginner, but it still has gaps (e.g. GPU integration); experienced users can often fill these gaps quickly, but in your case it may take too long
  • Python has many more utilities, accessors, datasets and other useful things; as an example, with PyTorch’s ImageFolder I was able to create an image dataset in minutes, while with Julia I spent a week and eventually had to switch to other tasks.
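For context on the ImageFolder point: the directory-walking part is not the hard bit, and a sketch of it in plain Julia fits in a dozen lines. The hard, time-consuming parts are decoding, resizing, batching and augmentation, which this hypothetical helper deliberately leaves out (you would plug in Images.jl’s `load` for decoding):

```julia
# A hedged, minimal ImageFolder-style dataset sketch: each subdirectory
# of `root` is a class, every file inside it is one sample. Returns the
# sorted class names and a vector of (filepath, class index) pairs.
function imagefolder(root::AbstractString)
    samples = Tuple{String,Int}[]
    classes = sort(filter(d -> isdir(joinpath(root, d)), readdir(root)))
    for (idx, class) in enumerate(classes)
        for f in readdir(joinpath(root, class))
            push!(samples, (joinpath(root, class, f), idx))
        end
    end
    return classes, samples
end
```

The gap the post describes is everything downstream of this: turning those paths into normalized, batched tensors, which PyTorch’s utilities hand you for free.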

Note that this is advice for your specific case of limited time and no previous experience; under other conditions, or at a later stage of your project, it should be fine to use pure Julia tools.


Thanks so much for this answer, definitely gets to the core of my question.


I don’t really have anything to say other than that I love Flux.jl and wanted to share my enthusiasm. Writing your ML stuff in ordinary Julia code? No digging through reams of documentation for a specialized package? If that isn’t infinitely more appealing than TensorFlow I don’t know what is. Seems perfect for beginners and experienced users alike (though, to echo everyone else’s sentiments, probably more so in 0.7).
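The “ordinary Julia code” point in practice — a small classifier in the style of the Flux docs (the layer sizes are arbitrary here):

```julia
# A minimal Flux model: each layer is just a callable Julia object,
# and Chain is plain function composition.
using Flux

m = Chain(
    Dense(28^2, 32, relu),   # 784 -> 32 with a relu nonlinearity
    Dense(32, 10),           # 32 -> 10 class scores
    softmax)                 # normalize to probabilities

x = rand(28^2)               # a fake flattened 28x28 image
y = m(x)                     # an ordinary Julia vector of 10 probabilities
```

No session objects, no graph compilation step — calling the model is just calling a function.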

I can’t wait to see this thing in a mature state.


What would be really interesting is to know why Flux or Knet do not work out in this case. I think it should be fairly easy to put together a Docker container with Flux and GPU support for Julia 0.6.

Do you know what you want to do with Flux? Does it work on the CPU at least? I think we generally know where the issues are, but it would be good to collect them somewhere and knock them out one by one.


Thanks for responding.

Re: Docker, I can theoretically do that, and would if this were a hobby project. But frankly I can’t justify spending more than a trivial amount of time messing around with packaging and GPU support myself. How long would it take?

Edit: The docker image posted here earlier might work.

The problem is only defined at a high level to start with (learning certain specific features from specially processed image and video data, so I’d need Conv3d to be finished at least), as I still have to cram tutorials. I don’t know where my experimentation will lead, and I don’t know how difficult it will be to get things working or to code all the attendant infrastructure, mostly due to inexperience.

So I’d be taking a risk by investing time in the stack and then finding myself stuck, dead-ended, or wondering if I messed up some implementation detail, then having to learn PyTorch anyway… with delays all the while.

Mostly due to these


It’s true the Julia ML story is still maturing, but I also think it depends heavily on what kind of models you’re working with. As a beginner you’ll most likely be sticking to convolutions and RNNs, both of which work great on the GPU (at least on Flux master, very soon to be tagged). We’ve trained plenty of realistic models and we’re well past the point where you’d be running into really basic numerical issues.
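For a sense of what that conv path looks like, here is a hedged sketch in the style of Flux master at the time — the layer sizes are invented, and the GPU step assumes a working CuArrays setup, which per this thread still requires a source build on v0.6:

```julia
# A small convolutional classifier sketch for 28x28 grayscale input
# (Flux convolutions take 4-d WHCN arrays: width, height, channels, batch).
using Flux

m = Chain(
    Conv((3, 3), 1 => 16, relu),       # 3x3 conv, 1 -> 16 channels
    x -> reshape(x, :, size(x, 4)),    # flatten all but the batch dim
    Dense(16 * 26 * 26, 10),           # 26x26 spatial size after a 3x3 conv
    softmax)

# With GPU support available, moving the model over is one call:
# m = gpu(m)
```

The point of the post stands either way: the model definition is identical on CPU and GPU; only the array type changes.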

As always, the Python ecosystem is great and if you need something from there, that would make sense. On the flip side, anything that’s missing is likely to be much, much easier to add in Julia (someone needed multithreaded training and they just wrote out the 20 lines to implement it). If you have an idea of what you need – like the 3D convolutions – I’m happy to help push it forward.


It’s also good to keep in mind that PyCall is extremely robust. So if there is some minor part of your project that you absolutely can’t get by without something in Python, you always have a fallback.
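The PyCall fallback looks like this in practice — `numpy` is just a stand-in here, and the `obj[:name]` member syntax is the Julia 0.6-era PyCall convention:

```julia
# A hedged sketch of calling into Python from Julia via PyCall.
using PyCall

np = pyimport("numpy")       # import any installed Python module
a = np[:arange](5)           # call numpy.arange; PyCall converts the
                             # result to a native Julia array
b = a .+ 1                   # from here on it's ordinary Julia data
```

So if one data loader or utility only exists on the Python side, it doesn’t force the whole project off Julia.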

Also you may find Images.jl relevant. Julia has all sorts of great existing packages, you just have to know where to look (which is a little unfortunate).



Thanks for the offer of support.

Do you know if this issue is solved? https://github.com/FluxML/Flux.jl/issues/182

Generally true, but not when it comes to interfaces between data loaders, infrastructure and custom layers in Python/PyTorch etc. -> Flux, I assume… right?


It depends on what you are doing. I suppose that in theory you could probably do interfaces between mixed Python and Julia layers in TensorFlow or MXNet, but I wouldn’t try it. You certainly would not be able to combine layers between PyTorch and Flux, because as far as I know PyTorch does backpropagation on a fixed computational graph (you can “weld” networks together at the ends, but the performance would be awful). In many cases (perhaps most cases?) it would be perfectly fine to use a Python data loader. It’s possible that there would be significant data re-formatting overhead, but since most of the data in this case would be Int32, that probably wouldn’t be an issue in practice.

I looked back at your original post and saw that it said “client facing”, which sounds scary, though I guess it could mean a lot of different things. If you were doing optimization (i.e. non-machine-learning optimization) or solving differential equations, to my knowledge what’s available in Julia is far superior to anything you’ll find in Python. Machine learning definitely isn’t there yet. If you are a “beginner” looking to do something “client facing”, I’m imagining you probably want something that is mostly a “pre-canned” solution, for which you are definitely better off in Python. Depending on what you are doing, I might also recommend using a pre-trained network; these are gradually becoming more widely available.

The really cool thing about Flux and auto-differentiation is that it really is very flexible and “plug-and-play”. It’s pretty much just ordinary Julia objects which you can combine with other Julia objects any way you like. This is very different from the standard approach, which involves a huge, complicated code base of dedicated C++ objects for building computational graphs. This is what has me so excited: the simplicity of Flux compared to e.g. TensorFlow, for something that offers at least as much functionality, is almost mind-blowing.
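“Plug-and-play” in the sense meant here: a Flux `Chain` composes built-in layers with arbitrary Julia functions you wrote yourself (the sizes below are made up for illustration):

```julia
# Mixing Flux layers with a plain anonymous Julia function in one model.
using Flux

m = Chain(
    Dense(10, 5, relu),
    x -> x .* 2,             # any ordinary Julia function slots right in
    Dense(5, 2))

y = m(rand(10))              # runs like a normal function call
```

There is no graph-builder API to learn for the custom step — the function in the middle is differentiated along with everything else.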


The issue has been fixed, at least for the master branch.


Thanks for your comprehensive reply.

Interesting. I plan on looking into this.

Sure, but PyTorch also has a dynamic graph.

Client facing for me, in that it’s not just a toy personal project.

All of that makes sense to me… it was just hard to assess before I was familiar with the technical considerations, though I’m in a better position to do so now.

Yup. I love this, and it will be key to Julia’s success — i.e. differentiable programming and probabilistic programming beyond neural nets… but also for general-purpose coding.


The really cool thing about Flux and auto-differentiation is that it really is very flexible and “plug-and-play”.

But PyTorch and Keras also do auto-differentiation for you (I think even TensorFlow takes care of it). So how is this different from what’s available in Flux?