Upsampling in Flux.jl

I am trying to implement a convolutional auto-encoder with Flux.jl. I have a simple version working in Python (TensorFlow Keras API), but the Flux.jl-based model does not seem to converge and gives poor results even on the training dataset.

Here is my code using MNIST:

using Flux, Flux.Data.MNIST
using Flux: @epochs, mse, throttle
using Base.Iterators: partition
using CuArrays
using Flux.Tracker: TrackedArray, track, @grad

# return a list of batches; every batch has the size (28,28,1,batch_size)
# The last batch can be smaller
function getdata(params...; batch_size = 64)
    imgs = MNIST.images(params...)
    @show length(imgs)

    # Partition into batches
    data = [reshape(cat(float.(batch)...; dims = 3), (28, 28, 1, :)) for batch in partition(imgs, batch_size)];
    data = [gpu(Float32.(d)) for d in data];
    return data
end

# https://github.com/FluxML/NNlib.jl/pull/95
function upsample(x)
    ratio = (2,2,1,1)
    y = similar(x, (size(x) .* ratio)...)
    for i in Iterators.product(Base.OneTo.(ratio)...)
        loc = map((i,r,s)->range(i, stop = s, step = r), i, ratio, size(y))
        @inbounds y[loc...] = x
    end
    y
end
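
# Quick sanity check of upsample (a sketch): each input pixel should be repeated
# over a 2×2 block, i.e. nearest-neighbour upsampling that doubles both spatial dims
xcheck = reshape(Float32[1 2; 3 4], 2, 2, 1, 1)
upsample(xcheck)[:, :, 1, 1]
# 4×4 matrix: the top-left 2×2 block holds 1.0, the top-right 2.0, and so on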

model = Chain(# encoder: 28×28×1 -> 7×7×8
              Conv((3, 3), 1=>16, pad=(1,1), relu),
              MaxPool((2,2)),
              Conv((3, 3), 16=>8, pad=(1,1), relu),
              MaxPool((2,2)),

              # bottleneck
              Conv((3, 3), 8=>8, pad=(1,1), relu),

              # decoder: 7×7×8 -> 28×28×1
              upsample,
              Conv((3, 3), 8=>16, pad=(1,1), relu),
              upsample,
              Conv((3, 3), 16=>1, pad=(1,1), relu)) |> gpu;
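
# Optional shape check of the encoder half (a sketch; Chain can be indexed with a
# range, so model[1:4] runs only the layers up to the second MaxPool)
x0 = gpu(rand(Float32, 28, 28, 1, 1))
size(model[1:4](x0))   # expected (7, 7, 8, 1): the two 2×2 MaxPools halve 28 -> 14 -> 7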


loss(x) = mse(model(x), x)

# get training data
data = getdata()
@show size(model(data[1]))
@show loss(data[1])

evalcb = throttle(() -> @show(loss(data[1])), 5)
opt = ADAM()

@epochs 50 Flux.train!(loss, params(model), zip(data), opt, cb = evalcb)
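
# Note on zip(data): Flux.train! splats each element of the data iterator into the
# loss, i.e. it calls loss(d...); zip wraps every batch d into a 1-tuple (d,), which
# matches the single-argument loss(x) above.
first(zip(data)) isa Tuple   # true: each element is a 1-tuple holding one batch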

# get testing data
data_test = getdata(:test)

testMSE = 0
for d in data_test
    global testMSE
    testMSE += size(d,4) * Tracker.data(loss(d))
end

testMSE /= sum(size.(data_test,4))
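
# Equivalent one-liner (a sketch): a batch-size-weighted mean is needed because the
# last batch can be smaller than batch_size
testMSE2 = sum(size(d,4) * Tracker.data(loss(d)) for d in data_test) / sum(size.(data_test, 4))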

After 50 epochs, I get an MSE of 0.04334188. With TensorFlow/Keras I get an MSE of 0.004307 after only 5 epochs.

For reference, here is also the Python code:

import tensorflow as tf
import tensorflow.keras.layers as layers

import numpy as np

mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

print("Number of training images ",x_train.shape[0])



model = tf.keras.models.Sequential([
    layers.Reshape((28, 28,1),input_shape=(28,28)),
    layers.Conv2D(filters=16,kernel_size=3,padding="same",activation='relu'),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(filters=8,kernel_size=3,padding="same",activation='relu'),
    layers.MaxPooling2D(pool_size=2),

    layers.Conv2D(filters=8,kernel_size=3,padding="same",activation='relu'),

    layers.UpSampling2D(size=(2,2)),
    layers.Conv2D(filters=16,kernel_size=3,padding="same",activation='relu'),
    layers.UpSampling2D(size=(2,2)),
    layers.Conv2D(filters=1,kernel_size=3,padding="same",activation='relu'),
    layers.Reshape((28, 28))
])

model.compile(optimizer='adam',
              loss='MSE')

model.fit(x_train, x_train, epochs=5, batch_size=64)
#model.evaluate(x_test, x_test)
print("MSE",np.mean((model.predict(x_test) - x_test)**2))

I got the upsampling function from a pending PR on NNlib.jl. The code looks correct to me, but could it be that Flux.Tracker is not able to compute its gradient properly?
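
One way to probe this is to differentiate a tiny example directly (a sketch on the CPU; if Tracker cannot handle the indexed assignment y[loc...] = x, this should either error or give a wrong gradient):

xtest = rand(Float32, 4, 4, 1, 1)
g = Flux.Tracker.gradient(x -> sum(upsample(x)), xtest)[1]
size(g)   # should be (4, 4, 1, 1); a correct gradient has every entry equal to 4,
          # since each input pixel is copied to four output pixels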


I just checked that Flux.jl and TensorFlow use the same default parameter values (such as the learning rate) for the ADAM optimizer:


https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer
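
For reference, this is what the matching defaults look like spelled out (a sketch; values taken from the respective docs):

# Flux:       ADAM(η = 0.001, β = (0.9, 0.999))
# TensorFlow: tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999)
opt = ADAM(0.001, (0.9, 0.999))   # same as the default ADAM() used above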

I also tried without the ReLU on the final layer (see the sketch below), but the results are quite similar:

Flux MSE after 50 epochs: 0.063297756f0
TensorFlow MSE after 5 epochs: 0.005704676563238298

So there is still a factor of roughly 10 between them (even comparing 50 Flux epochs against just 5 TensorFlow epochs).
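
This is the only change for the no-ReLU variant (a sketch of the last layer of the Chain; Conv defaults to the identity activation when none is given):

Conv((3, 3), 16=>1, pad=(1,1))   # instead of Conv((3, 3), 16=>1, pad=(1,1), relu)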
Here is the first test image as reconstructed by Flux:

[image: Flux reconstruction]

And by TensorFlow:

[image: TensorFlow reconstruction]
