Trying to implement a VAE using Lux and Reactant

Just had a lot of fun implementing a VAE (Variational Autoencoder) with Lux and Reactant. It barely works; here is a little summary of the errors.

Reactant errors with:

```
'stablehlo.transpose' op using value defined outside the region
ERROR: "failed to run pass manager on module"
```

CPU with Enzyme errors with:

```
Duplicated(Decoder,RefValue) error
```

Note: the gradient can be calculated when using Reactant, but not when using the CPU; however, training still fails either way.

It works with AutoZygote() on CPU; I did not try on GPU.

Repo: GitHub - yolhan83/MLX_exemples_julia_reactant (trying to implement some of the "mlx-examples" repo code)

I know I should make an MWE from this, but that may be hard; we'll see. If anyone has ideas, they are welcome.


I have encountered similar errors before. I believe this typically means you are mixing different floating-point precisions, which Reactant cannot handle at the moment. Looking at your code, this line specifically could be the issue: it uses the literal 0.5, which is of type Float64, while the rest of your code appears to use Float32. Otherwise, it would help to see the whole stack trace.
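As a hedged illustration (the actual loss function in the repo may differ), here is how a VAE KL-divergence term stays entirely in Float32 when the literals are written with the `f0` suffix:

```julia
# Hypothetical KL-divergence term for a VAE loss.
# Writing 0.5f0 and 1f0 (Float32 literals) instead of 0.5 and 1 (which
# promote to Float64) keeps the whole computation in Float32, avoiding
# the mixed-precision promotion that Reactant currently rejects.
function kl_divergence(μ::AbstractArray{Float32}, logσ²::AbstractArray{Float32})
    return -0.5f0 * sum(1f0 .+ logσ² .- μ .^ 2 .- exp.(logσ²))
end
```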


Thanks, yes, that could be bad. It is still not working with the same error; here is the full error:

err.jl (85.0 KB)

By the way, the loss and the loss gradient compile fine; the failure is really in the optimisation process.

If you look at the stack trace, you can still see ConcreteRNumber{Float64} in there, meaning you are still promoting to Float64 somewhere. Since you say this only happens in the optimization process, my best guess is that you need to make the learning rate in the Adam optimizer a Float32 as well, so replace 1e-3 with 1f-3.
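For reference, the two literal forms look like this (a minimal sketch; the actual optimizer setup in the repo may differ):

```julia
using Optimisers

# 1e-3 is a Float64 literal; 1f-3 is its Float32 equivalent.
opt_f64 = Adam(1e-3)   # learning rate stored starting from a Float64
opt_f32 = Adam(1f-3)   # learning rate supplied as a Float32
```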

The transpose issue is definitely a bug and shouldn’t happen (regardless of mixed precision or not, which should be fine?)

Please open an issue with a reproducer (ideally reduced as much as possible toward where it's coming from).

Also cc @avikpal

That won't help; Optimisers.jl currently stores everything in Float64 (see Type Constraints in the Rule Structs · Issue #205 · FluxML/Optimisers.jl · GitHub).
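A quick way to check this yourself (a sketch; the exact field names and behavior depend on your Optimisers.jl version) is to inspect the rule struct after construction:

```julia
using Optimisers

# Even when the learning rate is passed as a Float32 literal, the rule
# struct may hard-code its hyperparameter fields as Float64, so the
# value gets promoted on construction (the subject of issue #205).
opt = Adam(1f-3)
@show typeof(opt.eta)   # may report Float64 despite the Float32 input
```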

Is this on the latest releases of Lux and Reactant?

I also happen to have a partial implementation of a CVAE from the MLX repo, Lux.jl/examples/ConditionalVAE/main.jl at ap/cvae2 · LuxDL/Lux.jl · GitHub, but it probably needs to be updated.

Yes, it is the latest of both, and your code looks very similar to mine. Is it working fine?

Thank you, I was going crazy trying to find where those are. I will try to make the MWE.

Yes, it's training fine for the most part. There are some of the usual VAE issues with NaNs, which I am trying to sort out.
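For context, a common mitigation for those NaNs (not necessarily what is used in the example; the variable names here are hypothetical) is to clamp the log-variance before exponentiating:

```julia
# Hypothetical stabilization step: clamping the log-variance to a safe
# range before exp prevents overflow to Inf and the resulting NaNs
# during VAE training. Float32 literals keep the precision consistent.
logσ² = clamp.(logσ², -10f0, 10f0)
σ = exp.(0.5f0 .* logσ²)
```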

The final functional version has been merged: Lux.jl/examples/ConditionalVAE/main.jl at main · LuxDL/Lux.jl · GitHub


Thank you, I've made mine work too, by putting the layers together in a Chain instead of having them as separate fields in the struct. I did not try to make the MWE yet, though; we'll see.
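A minimal sketch of that restructuring (the layer sizes and names here are assumptions, not the actual model from the repo):

```julia
using Lux

# Before (hypothetical): layers held as separate fields of a custom struct,
# which required manually wiring parameters/states through each field.
#
# struct Decoder{L1,L2}
#     dense1::L1
#     dense2::L2
# end

# After: the same layers composed with Chain, a container that
# Lux (and hence Reactant) handles natively.
decoder = Chain(
    Dense(2 => 128, relu),   # latent_dim => hidden_dim (assumed sizes)
    Dense(128 => 784),       # hidden_dim => output_dim (assumed sizes)
)
```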

The transpose issue has been resolved in the latest releases of Reactant (v0.2.17) and Lux (v1.5).