I was following the Keras presentation at Google I/O 2021 by Martin Gorner and François Chollet (the creator of Keras) about convolutional variational autoencoders (VAEs). As a way to learn about VAEs, I re-implemented the notebook in Flux.jl.
I ran into a small issue with asymmetric padding (CuDNN: Support for asymmetric padding? · Issue #128 · JuliaGPU/CUDA.jl · GitHub).
My work-around was to do some manual cropping.
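To illustrate, here is a minimal sketch of that kind of workaround (the channel sizes are illustrative, not my exact layers): Keras' `padding="same"` on a strided `Conv2DTranspose` needs an asymmetric pad, so instead the transposed convolution runs without padding and the surplus row and column are cropped afterwards.

```julia
using Flux

# cuDNN only accepts symmetric padding, so instead of the asymmetric
# pad that `padding="same"` would need, run the transposed convolution
# without padding (the output comes out one row and one column too
# large) and crop the surplus afterwards.
crop(x) = x[1:end-1, 1:end-1, :, :]   # drop the extra row and column

decoder_block = Chain(
    ConvTranspose((3, 3), 64 => 32, relu; stride = 2),  # 7×7 -> 15×15
    crop,                                               # 15×15 -> 14×14
)
```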
In the end, I got results equivalent to the Keras implementation from François Chollet: the model has exactly the same number of parameters, and the results are very similar, though not identical, since the network is initialised randomly.
But to my surprise, Flux.jl (2.6 seconds per epoch, excluding compilation) was about twice as fast as Keras with the TensorFlow back-end (5.05 seconds per epoch) on the same hardware (a GeForce RTX 3080). The Keras notebook shows two ways to implement a VAE, but both have the same performance. Of course, MNIST is a very small dataset, and it would be interesting to see how the two compare on larger datasets.
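For anyone who wants to reproduce the timings, something like this works (`train_epoch!` is a placeholder for one pass over the training data, not a function from my notebook):

```julia
using Flux, CUDA

# `train_epoch!` is a placeholder for one full pass over the data.
# The first pass is left untimed because it triggers Julia's
# compilation; CUDA.@sync waits for all GPU kernels to finish
# before the timer stops.
train_epoch!(model, loader, opt)          # warm-up / compilation
for epoch in 1:30
    t = @elapsed CUDA.@sync train_epoch!(model, loader, opt)
    @info "epoch $epoch: $(round(t; digits = 2)) s"
end
```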
Since I have posted before about issues with reaching TensorFlow-like performance in Flux.jl, I thought this would be an interesting follow-up.
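For context, the heart of the model is the reparameterisation trick and the KL term in the loss. Here is a minimal sketch of how that can look in Flux (the function names are mine, the encoder is assumed to return a mean and a log-variance, and the decoder is assumed to output logits; this is not the exact code linked below):

```julia
using Flux, Random

# Reparameterisation trick: sample z = μ + σ .* ε with ε ~ N(0, I),
# so gradients can flow through μ and logσ². randn! fills an array
# like μ, so this works on both CPU and GPU arrays.
reparameterise(μ, logσ²) = μ .+ exp.(logσ² ./ 2) .* randn!(similar(μ))

# Loss = reconstruction term + KL divergence to the standard normal,
# both averaged over the batch (the 4th dimension in WHCN layout).
function vae_loss(encoder, decoder, x)
    μ, logσ² = encoder(x)                 # encoder returns two heads
    x̂ = decoder(reparameterise(μ, logσ²)) # x̂ are logits (no sigmoid)
    n = size(x, 4)
    rec = Flux.Losses.logitbinarycrossentropy(x̂, x; agg = sum) / n
    kl  = -sum(1 .+ logσ² .- μ .^ 2 .- exp.(logσ²)) / (2n)
    rec + kl
end
```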
Here is my full Flux.jl code:
Here is the Keras code:
Let me know if there is a problem with my Flux.jl code; the Keras implementation, at least, should be good!
Kudos to the Flux and CUDA developers!