If anyone knows PyTorch well enough, it might be useful to have that as an additional comparison?
Wow the statistics cheatsheet is great! Once DataFrames hits 1.0 we should submit a PR for this.
That^
In the short term, there's Mjolnir.jl (https://github.com/MikeInnes/Mjolnir.jl): "A little less conversation, a little more abstraction."
In the medium term, the "Add `AbstractInterpreter` to parameterize compilation pipeline" PR (JuliaLang/julia#33955 by staticfloat) and future PRs building on it will allow faster and more composable Zygote, Cassette, and other code-transform packages built on typed IR.
A side question: why does Flux.jl have a dependency on Juno.jl? Will any features of Flux be missing if I use VSCode with the Julia extension instead of Atom with Juno?
Juno.jl is an extremely lightweight dependency along the same lines as RecipesBase.jl, allowing packages to integrate with Juno without depending on the heavier Atom.jl. I can't speak from experience about what features that enables in Juno, though.
Glancing at the source, it appears to simply be used for defining a nice, foldable representation for the Juno REPL, which they might be able to do by depending on TreeViews instead?
I am tempted to add PyTorch code for comparison, but I fail to see the point of comparing frameworks using such a small network. Am I missing something?
@bafonso PyTorch has a very different approach from Keras. In fact, one of the main reasons I moved from Keras to PyTorch was the breakdown of the training loop.
Where in Keras you would have something like `model.fit(data)`, in PyTorch you break the “fit” down into:
```python
for epoch in range(epochs):
    gen = DataGenerator(...)
    for x, y in gen:
        model.zero_grad()
        output = model(x)
        loss = criterion(output, y)
        loss.backward()
        optimizer.step()
```
There are many reasons why this is more powerful than the Keras approach. One example: you can easily call the optimizer step only after, say, every 5 batches, essentially increasing the effective batch size 5 times without taking more GPU RAM, which is important for large inputs (video, 3D volumes, …), like this:
```python
for epoch in range(epochs):
    gen = DataGenerator(...)
    for i, (x, y) in enumerate(gen):
        output = model(x)
        loss = criterion(output, y)
        loss.backward()              # gradients accumulate across batches
        if (i + 1) % 5 == 0:         # step (and reset) once every 5 batches
            optimizer.step()
            model.zero_grad()
```
@Alon, I understand the differences between Keras and PyTorch; I've used both and now I am trying to test some small projects in Julia. I was just mentioning that I think it would be beneficial to use bigger networks to compare performance, as opposed to a small network. I guess using a small network lets you see more of the syntax differences, but it won't let you conclude anything about real-world performance.
Can you elaborate more about how to use Revise? I’m very curious about it!
Load your package via `using Revise, MyPackage`. Then it will track your files and reload them automatically when there are changes. (You can also load Revise from your Julia startup.jl file so that it is always used.)
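For example, a minimal startup.jl entry along these lines (just a sketch; the try/catch only guards against environments where Revise isn't installed):

```julia
# ~/.julia/config/startup.jl
try
    using Revise
catch e
    @warn "Could not load Revise" exception = (e, catch_backtrace())
end
```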
If you do `] add Revise` to your project in the VS Code extension once, then every time you use VS Code afterwards I believe it will automatically load it in the background.
The only trick, I think, is that if you use separate manifest/project files… which I strongly encourage… you would need to add the package to all of them, as in the sketch below.
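Something like this, repeated for each environment (the path here is hypothetical):

```julia
using Pkg
Pkg.activate("path/to/MyProject")  # each project has its own Project.toml/Manifest.toml
Pkg.add("Revise")
```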
I think VSCode loads Revise.jl automatically on startup these days. It’s a setting for the extension.
I was thinking: if only there were an official pytorch-julia following the same standards and norms as PyTorch.
I am waiting for that; if it happens, I would be happy to have both my favorite PyTorch and Julia.
Flux all the way. Much better than PyTorch. That said, I'd say Flux is closer to a research project than a production-grade tool. Keep your eyes peeled for bugs and be at peace with things changing over time, and/or lock your version. That's not a gripe - but it is personal experience.
Flux isn't much better than PyTorch, don't be fanboying too much. It's a great tool, but it has a VERY long way to go before it catches up to PyTorch/TensorFlow in terms of stability, features, and, in a lot of cases, performance (GPU).
I agree with the stability statement and the performance statement. Sometimes I get mixed up about what Flux is given how much it’s changed over time. Let me clarify my statement.
“Flux is great for doing research on neural network topologies and advanced concepts, but that comes at a cost. Its syntax makes it bar none my preferred tool for this kind of work. But you will notice it's rough around the edges for day-to-day use. For many use cases and concerns there are better tools. For most of my projects I've found that it is better to just use autodiff and write my own interfaces for training, data handling, optimizers, etc., using Flux as a template.”
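For concreteness, a hand-rolled training loop on top of Flux's autodiff might look roughly like this (a sketch only: the toy model, fake data, and optimizer settings are placeholders, and the implicit-parameters API shown is the style Flux used around v0.10/0.11):

```julia
using Flux

# Placeholder model and data, just to keep the sketch self-contained
model = Chain(Dense(10, 32, relu), Dense(32, 1))
data  = [(randn(Float32, 10, 16), randn(Float32, 1, 16)) for _ in 1:100]

loss(x, y) = Flux.mse(model(x), y)
ps  = Flux.params(model)
opt = ADAM(1e-3)

for epoch in 1:5
    for (x, y) in data
        gs = gradient(() -> loss(x, y), ps)  # reverse-mode AD via Zygote
        Flux.Optimise.update!(opt, ps, gs)   # apply one optimizer step
    end
end
```

Once you own this loop, custom logging, callbacks, or tricks like gradient accumulation are just ordinary Julia code inside it.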
I can't speak about personal preference (as I'm mostly using JAX nowadays and love it). But from an unbiased view, I would almost always use something other than Flux, just because of the features present in other high-quality deep learning frameworks.
In Flux you won't get easy mixed precision (or AMP libraries like PyTorch has).
You won't get a lot of built-in layers and optimizers.
You won't get a lot of ready-to-plug-in preprocessing and augmentation functions.
You won't get good scalability for training on big clusters (not helped by Flux's GPU performance issues), and in most deep learning areas that's a must-have for state-of-the-art models and research.
You won't get easy (and good) deployment options and autoscaling for production.
And you won't get the tons of great utilities and libraries built around it.
We could make a very long list of things that don't exist in Flux right now and do exist in the other big deep learning frameworks, but I think everyone reading can see my point.
In Flux's defence, it's a great tool for some tasks, and it's very hard to catch up with frameworks built by mega-corporations whose NN framework budgets are (probably) bigger than the whole Julia language budget.
And I'm pretty sure that if you are focusing on some small models or extremely custom areas, Flux can be a better choice than PyTorch or TensorFlow (probably?).
But it's so limited that I wouldn't recommend it for newcomers or people who work in popular deep learning areas, just because you'll have to switch to other stuff sooner or later if you start working at some company or if you want to do some state-of-the-art research, etc.
You would just lose a ton of time (just like me) implementing everything yourself, and from my last 5 years of working partially with this I can say:
it is just not worth it; you just waste your time reinventing the wheel again.
And I would say that this is the biggest issue keeping me from fully switching to Julia.
Deep learning in Julia isn't even close to the Python ecosystem, and writing everything yourself is an enormous task.
I mean yes, but all these arguments have been presented and discussed ad nauseam on Discourse and pretty much every other community forum. So have counterpoints, where people with different use cases (mostly SciML people doing state-of-the-art research/industry work, but really anything that diverges from “mainstream” ML) talk about how Python libraries suck.
I agree it’s ill advised to point newcomers to ML/DL to Flux without some kind of support and have pushed back on multiple occasions about just that. Frankly, it doesn’t help the ecosystem grow either, as someone lacking both domain and language experience is less likely to be comfortable contributing to any of the areas you mention.
Conversely however, there is a small but significant (and growing) crowd of experienced ML users who are frustrated with how things are done in Python land and looking for an alternative. Courting them over won’t lead to world domination for Julia, but it would help carve out a big enough ecosystem (because they will fill in gaps to scratch their own itch) for you to not feel like you have to reinvent the wheel every so often. Let me plug https://github.com/FluxML/ML-Coordination-Tracker here. Many of us working on it (myself included) don’t use Julia currently, but would like to sooner than later and are trying to get something actionable out of all this instead of just venting on Discourse .
Edit: as much as I like Jax too, its supporting ecosystem is barely better!
I partially agree with your statement.
As for JAX: JAX has a far better ecosystem because you can easily use a lot of the stuff from the TensorFlow ecosystem.
Also, I'm just pointing out things I didn't see in this topic, and responding to ckneale, who is mostly hyping Julia and Flux. As a person who has been implementing state-of-the-art deep learning models for the last 5 years of my professional life and has tried a lot of stuff (Julia, S4TF, JAX, TensorFlow, MXNet, PyTorch, Keras, Theano, etc.), I'm just pointing out what isn't there and what can be a breaking point for most of the people wanting to use it in their jobs.
The Julia community is living in a bubble, and that's why such a cool and beautiful language hasn't taken off over the last ~3 years outside academia (of course), ODEs, and some other small niches. It has the potential to fully replace Python. A few years ago there was a sentiment that Julia would be the new language for DL, back when the DL community wasn't as big and a small group of people could still create alternatives to the then-fresh DL Python frameworks, but it just never took off.
“Conversely however, there is a small but significant (and growing) crowd of experienced ML users who are frustrated with how things are done in Python land and looking for an alternative.”
=> I saw the same sentiment 3 or 4 years ago, and here we are: the Julia ML ecosystem is still the same (ODEs are better, very basic ML stuff is better, DL is worse by a huge margin). And even worse for Julia, the big deep learning players are trying to move outside Python, but to other languages, not Julia (Facebook is trying Kotlin [Paving] and Google is trying Swift [S4TF]).
Warning: this is going to be a long one…
That's about using libraries that build on top of TF APIs (which are incompatible with JAX), in which case you might as well compare against PyCall.
The reason you didn’t see it is because we’ve had many, many threads about this in the past year alone with all of the arguments made thus far and more! I’m sure “breaking point” has appeared at least a dozen times. Doesn’t make your points less valid, of course, but @anon92994695 and others on this forum are no stranger to them.
My personal impression is that Julia's ML ecosystem was hyped a bit too hard around 2018 to early 2019 and burned many people (including yourself, I assume) who weren't expecting to face sharp edges or pay the early-adopter tax. Continuing with the hype-cycle analogy, however, I think we're starting to emerge from the trough of disillusionment. In other words, you'll see less flashy evangelism about Julia, but more and more users behind the scenes.
I used to think this too, but I think a fairer explanation would be that we ML folk are in a different bubble. Let me explain:
I have been playing around with Julia ML off and on for over a year now. I say “off and on” because I usually end up digging myself out after accruing many, usually small, frustrations. This includes (but is far from limited to):
- Import times
- Debugging and AD errors
- IDE tooling
- Gaps in the library ecosystem (augmentation, preprocessing, layers, etc.)
- No multi-GPU or easy auto-scaling. No, MPI doesn’t count.
So pretty much your list from earlier. More than once, I was unsure why I was even bothering to spend more time on this.
But then, I’d go on Slack or Discourse and see something amazing done by non-ML Julia users. Like all of the crazy econ models that run lightning fast and fit in a page of code. Or the geo/climate models that can run on a cluster with minimal work. Or @tamasgal’s amazing series of posts about why Julia is such a breath of fresh air in particle physics (worth a read for any JAX advocates).
And then I’d remember wanting to put a hole through my monitor the day before when conda went into a dependency-resolver death spiral for the umpteenth time (hyperbole, but I’m sure you can relate). Or the week before that, wasted wrestling with numpy and pandas to run parallel preprocessing on 10GB (not TB, GB) of data without triggering Python’s GIL and/or blowing up my machine’s RAM. Or the headache I’ll have to deal with tomorrow because some DL library pulled off a set of incredibly dirty and fragile hacks that would be trivial to implement with multiple dispatch.
In the ML community, Julia is a language with a small minority of users. But in the Julia community, we are the minority. I’d be happy just to be using Julia for my ML work sometime in the next decade, and I do think it’ll happen. To use the example of another LLVM-based language:
- Despite much naysaying, Rust carved out a niche alongside C/C++ and is steadily gaining steam.
- Unlike Rust, Julia already has a healthy ecosystem across multiple domains and organizations.
- As the ML community grows and research focuses shift, more and more users are going to become frustrated with Python and start shopping for alternatives. In other words, growth means that language mindshare is not a zero-sum affair.
- More domain experts are going to be learning ML than ML people gaining domain knowledge. Julia has been pretty good at attracting the former crowd, so I don’t anticipate being alienated any time soon.
One thing Julia DL is lacking is more developers with solid ML experience who are willing to put in the work to flesh out the ecosystem. I’m not sure if/how there’s a good way to make this happen, but it would be a huge help.