Deep learning in Julia

Fair enough, I clearly remembered this wrong, so I corrected my post. Anyway, I’m not trying to say that Lux devs need to keep acknowledging Flux everywhere. All I want is for discussions to avoid antagonism. I fully acknowledge that part of my post is in many ways counter to that request. But this is the first time I’ve brought this up so directly, and I hope that we can move forward to just talking about this stuff without constantly making it a comparison (at least when comparison is unwarranted).


I’ve mentioned this on Slack in the past, but flagging it here for visibility.

Enzyme-dev here

Standing invite to help anyone who wants to integrate Enzyme into Flux/Lux. I don’t know the internals of these frameworks well enough to easily understand what’s happening inside and push that side forward, but if someone with a bit of that knowledge can throw it over the fence to me on the Enzyme side – happy to help.

If someone is interested, perhaps we can set up a weekly sync?

Same applies to any other integration. DMs here / Slack / email / etc. are open.


Me too, and while its design is intellectually interesting, I think its main practical selling point is that Billy, Valentin, and co. are actively pushing it forward. These days no other AD system in Julia is getting the same love. To answer your question about slowdowns, I think the lack of time spent on getting a really robust AD is what led to the loss of momentum for ML in Julia. My only reservation about Enzyme is that because it is LLVM-based, the bar to understand enough to contribute is higher. That bar is already high enough for Zygote, so in my experience this is a concern for long-term maintenance if the current team were ever to move on. A nice complement on this front is Tapir.jl, whose barrier to entry is probably slightly higher than Zygote’s but lower than Enzyme’s. And its dev documentation already looks better than Zygote’s.

I mostly use Jax + Flax these days for work, and one of the things I really miss is custom models with shape inference for the parameter sizes. For example, in Flax I can have an nn.Dense(features=10) which specifies the output dimension without knowing the input a priori. In Flux, we tried addressing this with Flux.outputsize and @autosize, which solve the problem, but not as elegantly. That being said, when shape inference fails in Jax, it is a PITA, and the fallback option isn’t very ergonomic. At least in Julia the fallback is still easy enough.
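For anyone who hasn’t seen it, a minimal sketch of what the Flux side looks like (per the Flux docs; the `_` placeholders are filled in from the given input size):

```julia
using Flux

# `_` marks each dimension to be inferred from the (28, 28, 1, 1) input size.
model = @autosize (28, 28, 1, 1) Chain(
    Conv((3, 3), _ => 16, relu),
    Flux.flatten,
    Dense(_ => 10),
)
```

So the inference works; it’s the macro-based ergonomics that feel less elegant than Flax’s lazy initialization.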

Another thing that I don’t think we do well is building and using recurrent networks. Our current solution is annoying for AD and also suboptimal for performance. I don’t think any framework in any language addresses this well. There are ways to make the user experience clean, but that moves a lot of the burden to the recurrent layer authors. I don’t think solving this will lead to ML momentum in Julia, but it would be a nice thing to have for those of us who still use RNNs.
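To illustrate the annoyance (a sketch assuming the stateful, Recur-based API in Flux at the time of writing): iterating the layer over a sequence mutates hidden state, which is exactly what reverse-mode AD dislikes.

```julia
using Flux

rnn = RNN(3 => 5)                       # stateful recurrent layer
xs = [rand(Float32, 3) for _ in 1:10]   # a length-10 sequence

Flux.reset!(rnn)                        # must remember to reset hidden state
ys = [rnn(x) for x in xs]               # each call mutates the hidden state
```

The explicit `reset!` and the step-by-step mutation are what make this both AD-hostile and hard to fuse for performance.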


EnzymeRules are pure Julia, so adding a rule for LAPACK/NNlib/etc. only requires an understanding of Julia! See Custom rules · Enzyme.jl
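To make that concrete, here is a hedged sketch of a forward-mode rule for a hypothetical function `myscale` (the exact `EnzymeRules.forward` signature has varied across EnzymeCore versions, so check the Custom rules docs for the one matching your installed version):

```julia
using EnzymeCore
using EnzymeCore.EnzymeRules

myscale(x) = 2x  # hypothetical function we want a custom rule for

# Forward-mode rule: return the primal paired with the propagated tangent.
# Since d(2x)/dx = 2, the tangent is just 2 * x.dval.
function EnzymeRules.forward(func::Const{typeof(myscale)},
                             ::Type{<:Duplicated}, x::Duplicated)
    return Duplicated(func.val(x.val), 2 * x.dval)
end
```

No LLVM anywhere – it’s ordinary Julia dispatch on the function being differentiated.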

In fact, the NNlib rules I added earlier are an example of that :slight_smile: NNlib.jl/ext/NNlibEnzymeCoreExt/NNlibEnzymeCoreExt.jl at master · FluxML/NNlib.jl · GitHub

There’s even an open issue I made on NNlib.jl for adding more pure-Julia EnzymeRules that come up from Flux in practice! Missing EnzymeRule for `∇conv_data!` · Issue #565 · FluxML/NNlib.jl · GitHub

Of course, we definitely need more docs, and all contributions are welcome!

So I think the missing thing is really someone going through Flux, identifying which rules are missing, then adding (pure-Julia) EnzymeRules for them.


Agreed, the main thing is someone identifying what fails right now and then adding rules. I think DifferentiationInterface.jl might help a lot here. @gdalle has graciously started replacing our custom gradient-testing function with the one from DI. If we do that across Flux and add Enzyme to the AD backends tested, then we’ll quickly get coverage of what’s left.

Referencing my previous comments, I was thinking more about diagnosing and fixing bugs within Enzyme, not contributing new rules. But my point was really minor because it is a hypothetical about the current Enzyme team moving on to other work. Hopefully we don’t expect that to happen any time soon :slight_smile:.


Yeah, this has been stalled by Adrian and me hammering out the right design for DifferentiationInterface, but the upcoming v0.2 (hopefully next week) will be in really good shape. Once it’s out, I’m planning to focus on downstream integration, from the easy targets (Optimization.jl and LogDensityProblemsAD.jl) to the harder ones (NNlib.jl and then Lux.jl and Flux.jl).
DI is the ultimate AD bugfinder (see the list of issues we have uncovered), so missing rules will pop up quickly once we start testing with Enzyme.jl. And we can also quickly figure out what’s fast and what’s slow: an example AD speed comparison using DITest is already up on Julia AD Benchmarks · The SciML Benchmarks

I wrote my first EnzymeRules this week and it wasn’t entirely painless. I might ask you for help writing a docs PR about it @wsmoses, because many devs will probably want to do this soon.


Oh yeah for sure and we definitely should make it much easier to write EnzymeRules (suggestions/code welcome!)

I just wanted to be clear that writing EnzymeRules requires no knowledge of LLVM / compilers / etc, just Julia.


I can only imagine how you must feel. In any case, I must admit that I don’t understand such a decision from a strategic point of view (another machine learning framework). Based on the recent stats provided by @StefanKarpinski, the growth rate of the Julia ecosystem in terms of usability seems to be leveling off slightly. In addition, in comparison to other mature ecosystems, the resources of the Julia community, with all due respect, seem to be limited. Thus I am very surprised by the creation of another machine learning framework instead of further developing the leading one. That said, I’m wondering if you might consider leading the efforts towards a more practical approach to machine learning, and possibly creating and leading a Julia Machine Learning Organization? Personally, I would find such an approach very useful, and I see potential in coordinating efforts not only for Flux.jl but also for packages such as AlphaZero.jl, POMDPs.jl, JuliaReinforcementLearning.jl, Agents.jl, and Transformers.jl, just to name a few. Taking this opportunity, I would like to thank you. I took my very first steps in Julia thanks to Flux.jl, and it was a really cool experience (I won’t forget those days with AlphaZero.jl). Additionally, since I have the opportunity, I would like to ask: why do you “use Jax + Flax mostly for work these days”?


Thanks for such kind words, and I’m glad you had an enjoyable first experience with Flux. Though I can’t really take all the credit here: @CarloLucibello @ToucheSir and @mcabbott all help lead the FluxML efforts, and of course a lion’s share of the credit goes to Mike Innes for creating the packages. Plus tons of contributors over the years.

I should clarify that there is a good technical reason for developing Lux.jl. And I think it makes sense from a strategic point of view: SciML is where Julia shines right now, and so a framework that prioritizes the needs of SciML is a good idea.

As you correctly identified, maintainer time is already limited. I think the team I mentioned above is already at capacity with just FluxML. I also want to highlight the great work being done in ML by other teams: JuliaML (led by @juliohm), which provides datasets and data utilities for Flux, and JuliaAI (which @cpfiffer helps lead), pushing generative AI in Julia. All the packages you mentioned also have great maintainers behind them.

Like @CameronBieganek highlighted, there’s a trade-off between coordination and being able to explore and develop fast. I personally don’t feel there’s a need to augment the existing work being done by all these different teams. What we could use as a community is more tutorials, blog series, and similar content that puts all this work together in a digestible manner for new users. Fortunately, this can be done by any motivated individual!

None of the Julia ML frameworks have good parallelization primitives like vmap or pmap. These just make it easy to take prototypes and scale them up as needed.


I think it might be the goal of @avikpal with GitHub - LuxDL/BatchedRoutines.jl: Routines for batching regular code and make them fast!


No, it is much narrower than that; it still requires you to manually swap operations. But one could imagine automating the transformation from regular ops to batched ops.
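As a concrete example of that manual swap (a sketch using NNlib, not anything BatchedRoutines-specific): instead of looping a matmul over the batch, you call the batched primitive directly.

```julia
using NNlib

A = randn(Float32, 4, 3, 16)   # 16 independent 4×3 matrices
B = randn(Float32, 3, 2, 16)   # 16 independent 3×2 matrices

# Manual swap: instead of `[A[:, :, i] * B[:, :, i] for i in 1:16]`,
# one batched call that can hit a single batched GEMM:
C = batched_mul(A, B)          # 4×2×16
```

Automating exactly this rewrite, without the user touching their code, is the hard part.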

One of the main challenges with an automated transformation is that we can’t do `if <custom type>`, so you need to use Cassette and such to rewrite those for you, which quickly complicates things.

Currently, a few of the things it solves are:

  1. If you have a ForwardDiff.gradient or ForwardDiff.jacobian call in the loss function, Zygote constructs the full Hessian (through no fault of Zygote). Also, there is no API to compute the HVP for the parameter gradient correctly. See Improve Nested AD · Issue #913 · SciML/DiffEqFlux.jl · GitHub for why an API call with a single input is insufficient here.
  2. If you have a batch of Nonlinear Systems, how do you do that really fast?
  3. Same for linear systems: it overloads the correct architecture-specific calls to do that.
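For context on point 1, the usual way to get an HVP without materializing the Hessian is forward-over-reverse; a sketch (which can itself run into the nested-AD problems described in the linked issue):

```julia
using ForwardDiff, Zygote

# Hessian-vector product ∇²f(x)·v: differentiate the gradient of f
# along the direction v, never building the full Hessian.
hvp(f, x, v) = ForwardDiff.derivative(t -> Zygote.gradient(f, x .+ t .* v)[1], 0.0)
```

This works for a single array input; it is precisely the multi-argument parameter-gradient case where a single-input API like this falls short.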

The pain point for me is that I cannot deploy a model developed with either Flux or Lux to be easily called from C/C++, due to the lack of support for ONNX or another common format.


Thank you very much for your kind words as well @darsnack. Please pass on my thanks to your colleagues, some of whom I have been in touch with directly, although not all of those you mentioned. I did some thinking and I still stand by my initial position that, from a strategic point of view, focusing efforts on the leading machine learning framework would be better for such a small ecosystem. I would also stand by my position that, in the medium to long term, such a leading framework would be better off on the independent open-source side of Julia than on the quasi-commercial side, especially considering that the quasi-commercial side does not seem to be particularly well funded. Hence my suggestion regarding the creation of a Julia Machine Learning Organization. One way or another, I would like to say once again: thank you very much for all your and your colleagues’ work on Flux.


Not sure if this has been discussed in this thread, but is there any funded effort toward NNlib.jl and the GPU stack for Flux.jl and Lux.jl? I have the impression that SciML is funded; if so, how much of that funding would go toward improving the stack, partially or as a whole? It would be good to hear some long-term plans and priorities. At this point only full-time paid devs could make a difference on large milestone goals. I feel like the community is good at maintaining, but for pushing large functionality there needs to be a financial body behind it.

As for the problems with Flux (I haven’t used Lux, but it probably has the same issues), I find the high GPU memory usage to be the deal-breaker for me, which is why I had to resort to PyTorch. I don’t mind that the functionality is not on par with torch, as Julia is more flexible about kernels and GPU programming overall. But for me, doing research on ML models, this GPU problem is hard to overcome. I do not mind that it is a little slower, or that there are not enough conveniences. If I understand something from a paper, there is nothing stopping me from implementing the model. But not being able to start a few parallel training experiments due to the memory hogging is really a deal-breaker. I’m just much faster at experimentation if I can use the GPU more efficiently, albeit with less freedom.

On a positive note, I love that Julia is more composable and hackable than the Python stack will ever be. Torch and jax are walled gardens: if you do torch, everything has to be torch; if you do jax, it is all jax. So much redundancy, with stuff implemented slightly differently in each and little-to-no code reusability. The Julia deep learning stack is honest in what it does and lets us do what we mean without making everything a ‘tensor’.


Thanks to @maleadt’s work, the CUDA memory management problem will be largely improved by Consider running GC when allocating and synchronizing by maleadt · Pull Request #2304 · JuliaGPU/CUDA.jl · GitHub


Oh that’s nice to see! I missed that, big improvement.

Anyone who is a postdoc or higher at an academic institution can get funding. You can start applying for grants the month after you graduate with your PhD. There are plenty of folks in this thread alone who joined the Discourse in 2019, which means by 2024 they are ready to start applying for funding (note that as a postdoc you may need to apply for PI status at your institution, but in many cases it can be granted)! SciML is not special in this regard; I just made a very strong effort to do this immediately.

If anyone in the Julia ML community wants to start taking a step towards getting students and staff in funded positions working on these tools, I highly recommend applying for the NASA ROSES:

Co-Is can be other folks in the Julia ML community. If the staff need more structure around them, we’d be happy to host them as visitors at the MIT Julia Lab if they are funded. I’d be happy to help folks craft a narrative if interested, though I’ll be doing a separate application on this toward SciML projects.