I would never recommend using Lux over Flux; I find Flux much simpler and more elegant. Having parameters and states be part of the model, as Flux (and PyTorch!) does, instead of juggling them around feels much more natural.
Consider for instance the forward pass of a hypothetical TransformerBlock in Flux:
# Flux style
function (block::TransformerBlock)(x)
    x = x + block.attn(block.ln_1(x))
    x = x + block.mlp(block.ln_2(x))
    return x
end
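For context, here is a minimal sketch of how such a block might be defined on the Flux side; the struct and its field names are hypothetical, chosen only to match the forward pass above.
# Hypothetical Flux-side definition: the submodules carry their own parameters and state
using Flux

struct TransformerBlock
    ln_1
    attn
    ln_2
    mlp
end

Flux.@functor TransformerBlock  # lets Flux collect the nested parameters for training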
I haven’t played much with Lux, but I think the same forward pass in Lux would look like this instead:
# Lux style
function (block::TransformerBlock)(x, ps, st)
    z, newst = block.ln_1(x, ps.ln_1, st.ln_1)
    @reset st.ln_1 = newst
    z, newst = block.attn(z, ps.attn, st.attn)
    @reset st.attn = newst
    x = x + z
    z, newst = block.ln_2(x, ps.ln_2, st.ln_2)
    @reset st.ln_2 = newst
    z, newst = block.mlp(z, ps.mlp, st.mlp)
    @reset st.mlp = newst
    x = x + z
    return x, st
end
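For completeness, a rough sketch of how parameters and state get created and threaded through calls in Lux (@reset above comes from Accessors.jl). I use a built-in Dense layer just to illustrate the API, so take the exact calls as an approximation:
# Lux keeps parameters and state outside the model
using Lux, Random

rng = Random.default_rng()
layer = Dense(4 => 4)              # the layer object itself holds no parameters
ps, st = Lux.setup(rng, layer)     # parameters and state as separate NamedTuples
x = randn(Float32, 4, 8)
y, st = layer(x, ps, st)           # the caller is responsible for keeping the new state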
Originally I didn’t even bother to update the state, since I wasn’t sure how much extra work that would be. Edit: updated the example above with the state update syntax.
Which one do you prefer?
Maybe the explicit parameter style is more convenient in some SciML use cases, I don’t know, but we are talking about very niche applications within the deep learning world.
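To be concrete about what I imagine those use cases look like: with explicit parameters you can flatten the whole parameter tree into a plain vector (e.g. with ComponentArrays.jl) and hand it to a generic optimizer or an ODE solver. A hedged sketch, from memory:
# Sketch: flatten explicit Lux parameters into a vector for SciML-style workflows
using Lux, Random, ComponentArrays

rng = Random.default_rng()
layer = Dense(2 => 2)
ps, st = Lux.setup(rng, layer)
flat_ps = ComponentArray(ps)      # behaves like a flat vector but keeps the field names
loss(p) = sum(abs2, first(layer(ones(Float32, 2), p, st)))
loss(flat_ps)                     # can now be optimized / differentiated over a plain vector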
Regarding the promotion rules mentioned: yes, in Flux we convert the inputs to the weights’ float type if needed (and emit a warning when doing so). If you find yourself mixing different float types, most likely you are doing something wrong, and this is why we introduced the conversion.
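Concretely, the behaviour in question looks roughly like this (sketch from memory; the exact warning text may differ between Flux versions):
# Flux promotes mismatched input eltypes to the weight eltype, with a warning
using Flux

m = Dense(3 => 2)           # weights are Float32 by default
x = rand(Float64, 3)        # mismatched input eltype
y = m(x)                    # x is converted to Float32 and a warning is emitted
eltype(y)                   # Float32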
Is this worth building a whole new deep learning framework over? Maybe it would have been enough to ask a bit more strongly for the promotion to be removed? Since the problem is so keenly felt, I’m going to advocate for removing the promotion rules in Flux.jl.
"That being said, Flux and Lux use the same layer definitions in NNlib.jl, CUDA.jl, etc., so having this different interface doesn’t actually hurt the community as much as one might imagine. The amount of reused work is extremely high, it’s just top-level interfacing and documentation that is different."
I think this is minimizing the problem. First of all, the Julia DL ecosystem is light years behind the Python one and falling ever further behind; if only a fraction of the copying / adapting / redesigning / reimplementing that has gone into Lux had gone into Flux, we would be in a slightly better place.
Also, the amount of code that is not shared is not small. And while Lux uses a bunch of packages like NNlib.jl, Optimisers.jl, Zygote.jl, MLUtils.jl that are maintained by the Flux team and the community at large, I don’t see any contribution going in the reverse direction.
For instance, I see in the README of the LuxLib.jl repo:
“Think of this package as a temporary location for functionalities that will move into NNlib.jl.”
LuxLib was created one year ago and I have yet to see any contribution going into NNlib.jl.
So my impression is that there is a tendency among the Lux maintainers to develop the SciML universe without giving back to the rest of the ecosystem.
I understand that part of it is because contributing to mature packages involves friction: you have to go through reviews, discussions, and approval, which is a slower and much more constrained process than what goes on in Lux, where a single guy does all of the development, creates PRs, and merges them on the fly. Maybe we can do something about relaxing the constraints of the contribution process?
Bottom line: in 2024 I recommend using Flux.jl over Lux.jl, and sadly I recommend using PyTorch or JAX over both of them. If we want to change this, I think we should avoid duplicating work and, more generally, encourage Julians to contribute to the DL ecosystem at large, because Julia has all the potential to be a great language for deep learning, but a much larger developer base is needed to make it happen.