I’ve started using Julia this semester and would like to port some ML models from Jax to Lux.jl. I have three concerns:
I noticed there aren’t constructs like vmap. My existing Jax code has quite a few vmap calls, and they’re crucial for taking advantage of the GPU.
How fast is Lux.jl compared to a Jax implementation for moderately beefy models?
I have a hard time designing structs that need to be vectorized. For example, I could have
```julia
struct Foo
    x::Float32
end

xs = Foo[...]
```

or

```julia
struct Foo
    x::Vector{Float32}
end
```
I prefer the first one since it is easier to reason about, but I believe the second is necessary for good GPU performance. Do I need to make this trade-off, or am I missing something?
GPU Samplers. The Distributions.jl package seems to implement distributions on the CPU, but I don’t see anything about GPU sampling.
> I noticed there aren’t constructs like vmap. My existing Jax code has quite a few vmap calls, and they’re crucial for taking advantage of the GPU.
There’s no vmap in Julia currently, unfortunately. Try putting your loop at the outermost level with a normal map, rather than pushing it down to the innermost loop the way vmap does. As long as your innermost operations already saturate the GPU, it won’t really matter (if they don’t, though, you do have to think a bit more than the easy solution vmap offers).
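As a rough illustration of batching at the outermost level (this is my own sketch, not from the reply; it assumes CUDA.jl, and the toy per-sample function and sizes are made up):

```julia
using CUDA

# Per-sample computation: a matrix-vector product followed by a nonlinearity.
f(W, x) = tanh.(W * x)

W = CUDA.rand(Float32, 128, 64)
X = CUDA.rand(Float32, 64, 1024)   # 1024 samples stored as columns

# In Jax you might vmap `f` over the samples. Batched at the outermost level,
# the whole thing becomes one matrix-matrix product plus one broadcast,
# which already keeps the GPU busy.
Y = tanh.(W * X)                   # 128 × 1024
```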
> I have a hard time designing structs that need to be vectorized. For example, I could have
A unique thing Julia can do that Jax can’t is put arrays of custom structs on the GPU, as long as the structs are concrete (all types known at compile time). So e.g. something like the sketch below is totally valid and performant.
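A minimal sketch of that idea, assuming CUDA.jl is installed and a GPU is available (the mapped function is just an illustration):

```julia
using CUDA

struct Foo
    x::Float32
end

# Foo is concrete and isbits, so an array of Foo can live directly on the GPU.
xs = CuArray([Foo(rand(Float32)) for _ in 1:1024])

# Plain Julia functions over the struct compile down to a GPU kernel.
ys = map(foo -> foo.x^2 + 1f0, xs)   # CuArray{Float32}
```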
For Q1, I think the closest thing we’ll have in Julia is Reactant.jl, but it’s still a very, very early prototype. @avikpal and @wsmoses would be able to tell us more.
Generally, beefy models on CUDA are just cuDNN calls internally, so performance should be comparable. It is a bit hard to say for sure without looking at the model, but if there is a noticeable slowdown, open an issue or make a post here or on GitHub and we can take a look.
For the CPU it would be slower if you have conv routines (smaller models are actually faster than PyTorch, but that is not what you are interested in), and for ROCm it again depends, because we don’t have all the bindings. Metal and oneAPI are experimental, and performance there is quite bad at the moment.
But overall, the eventual goal is to be able to compile via Reactant, and if you take a look at the Reactant and Lux repos, you’ll see we are actively working on an easy way to take a Lux model and make it faster.