Trying to implement VAE using Lux and Reactant

yolhan_mannes · January 2, 2025, 10:06am

I just had a lot of fun implementing a VAE (Variational AutoEncoder) with Lux and Reactant. It barely works; here is a short summary of the errors.

Reactant errors with:

'stablehlo.transpose' op using value defined outside the region  
ERROR: "failed to run pass manager on module"

CPU with Enzyme errors with:

Duplicated(Decoder,RefValue) error

Note: the gradient can be computed when using Reactant, but not when using the CPU; however, training still fails.

It works with AutoZygote() on CPU; I did not try on GPU.

Repo: GitHub - yolhan83/MLX_exemples_julia_reactant: trying to implement some of the "mlx-examples" repo code

I know I should make an MWE from that, but that may be hard; we will see. If anyone has ideas, they are welcome.

simeonschaub · January 2, 2025, 10:43am

I encountered similar errors before. I believe it typically means you are mixing different floating-point precisions, which Reactant cannot handle at the moment. Looking at your code, this line specifically could be the issue, as it uses 0.5, which is of type Float64, while the rest of your code appears to use Float32. Otherwise, it would help to see the whole stacktrace.
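As a minimal sketch of the promotion described above (the array `x` is a hypothetical stand-in, not taken from the linked code): multiplying a Float32 array by a Float64 literal silently promotes the result, while a Float32 literal keeps the element type.

```julia
# Hypothetical data; `x` stands in for any Float32 array in the model code.
x = Float32[1.0, 2.0, 3.0]

y64 = 0.5 .* x      # 0.5 is a Float64 literal, so the result is promoted
y32 = 0.5f0 .* x    # 0.5f0 is a Float32 literal, so the element type is kept

println(eltype(y64))  # Float64
println(eltype(y32))  # Float32
```

Checking `eltype` on intermediate values like this is one way to hunt for where a Float64 sneaks in.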

yolhan_mannes · January 2, 2025, 11:46am

Thanks, yes, that could be bad. It is still not working, with the same error. Here is the full error:

err.jl (85.0 KB)

Btw, the loss and the loss gradient compile fine; the failure is really in the optimisation process.

simeonschaub · January 2, 2025, 12:00pm

If you look at the stacktrace, you can still see ConcreteRNumber{Float64} in there, meaning you are still promoting to Float64 somewhere. Since you say this happens only in the optimization process, my best guess is that you need to make the learning rate in the Adam optimizer a Float32 as well, so replace 1e-3 with 1f-3.
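For reference, a small sketch of how the two literals differ in plain Julia (the value `g` is a hypothetical stand-in for a Float32 gradient entry). It only shows the literal types; how an optimizer stores the learning rate internally is a separate question that this sketch does not address.

```julia
lr64 = 1e-3   # Float64 literal
lr32 = 1f-3   # Float32 literal

g = 2.0f0     # hypothetical Float32 gradient entry

println(typeof(lr64 * g))  # Float64
println(typeof(lr32 * g))  # Float32
```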

wsmoses · January 2, 2025, 2:31pm

The transpose issue is definitely a bug and shouldn’t happen (regardless of mixed precision or not, which should be fine?).

Please open an issue with a reproducer (ideally with any amount of reduction for where it’s coming from).

Also cc @avikpal

avikpal · January 2, 2025, 2:49pm

That won’t help: Optimisers.jl currently stores everything in Float64 (see Type Constraints in the Rule Structs · Issue #205 · FluxML/Optimisers.jl · GitHub).

avikpal · January 2, 2025, 2:55pm

Is this on the latest releases of Lux and Reactant?

I also happen to have a partial implementation of CVAE from the MLX repo: Lux.jl/examples/ConditionalVAE/main.jl at ap/cvae2 · LuxDL/Lux.jl · GitHub, but it probably needs to be updated.

yolhan_mannes · January 2, 2025, 4:33pm

Yes, it is the latest on both, and your code looks very similar to mine. Is it working fine?

yolhan_mannes · January 2, 2025, 4:35pm

Thank you, I was going crazy trying to find where those are. I will try to make the MWE.

avikpal · January 2, 2025, 4:49pm

Yes, it’s training fine for the most part. There are some of the usual issues of VAEs with NaNs, which I am trying to sort out.

avikpal · January 3, 2025, 5:59pm

The final functional version has been merged: Lux.jl/examples/ConditionalVAE/main.jl at main · LuxDL/Lux.jl · GitHub

yolhan_mannes · January 3, 2025, 6:50pm

Thank you. I’ve made mine work too by putting the layers together in a Chain instead of having them as separate fields in the struct. I did not try to make the MWE yet, though; we will see.

avikpal · January 8, 2025, 9:02pm

The transpose issue has been resolved in the latest releases of Reactant (v0.2.17) and Lux (v1.5).
