For the past few days, I have been working on an ML project.
I looked at Flux.jl and SimpleChains.jl for doing deep learning in pure Julia, and at TensorFlow, but I could not make them agree!
The data: let's say the model tries to infer the mean (and possibly the std) of a sample drawn from a distribution.
using Flux
using SimpleChains
using PyCall, Conda
using Distributions
using Plots
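n = 20        # size of each sample (input dimension)
N = 5000      # number of training samples
T = Float32
x_train = rand(T, n, N)            # one sample per column, uniform on [0, 1)
y_train = mean(x_train, dims=1)    # target: mean of each sample
# y_train = vcat(mean(x_train, dims=1), std(x_train, dims=1))  # alternative target: mean and std
dim_output = size(y_train, 1)
Nepoch = 5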
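As a reference point for the losses reported further down, it may help to know what MSE a trivial predictor gets on this data. The sketch below is not part of the original setup: it regenerates data with the same layout (the seed is arbitrary) and scores a constant prediction of 0.5, the population mean of a uniform variable on [0, 1). In theory this baseline is Var(U(0,1)) / n = (1/12) / 20 ≈ 0.0042.

```julia
using Random, Statistics

Random.seed!(1)                      # arbitrary seed, only for reproducibility
n, N = 20, 5000
x = rand(Float32, n, N)              # same layout as x_train: one sample per column
y = mean(x, dims=1)                  # targets: per-sample means

# MSE of a constant predictor that always outputs the population mean 0.5
baseline = mean(abs2, y .- 0.5f0)

# Theoretical value: Var(U(0,1)) / n = (1/12) / n
theory = 1 / (12n)

println("empirical baseline MSE: ", baseline)
println("theoretical baseline MSE: ", theory)
```

A trained model should end up well below this value, so it gives a rough scale for judging whether a reported loss is plausible.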
With TensorFlow (using PyCall)
#* tensorflow *#
py"""
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.models import Sequential
"""
py"""# data to tensorflow format
y_train = tf.constant($(permutedims(y_train)), dtype=tf.float32)
x_train = tf.constant($(permutedims(x_train)), dtype=tf.float32)
"""
py"""# Model definition
inputs = Input(shape=($n,))
x = Dense(512, activation="relu")(inputs)
x = Dropout(0.1)(x, training=True)
x = Dense(512, activation="relu")(x)
x = Dropout(0.5)(x, training=True)
outputs = Dense($dim_output)(x)
model = tf.keras.Model(inputs, outputs)
"""
py"""# optimiser
model.compile(loss="mean_squared_error", optimizer="adam")
"""
# training
history_tf = py"model.fit(x_train, y_train, epochs=$Nepoch, verbose=0)"
losses_TF = history_tf.history["loss"]
With Flux, I understand there were some major changes recently, and I was kind of lost as to which training style to use for my model. I decided to use the explicit for loop so that I could save the loss at each iteration (I could not find an option for this with train!).
With Flux.jl
#* Flux *#
model_flux = Chain(
    Dense(n => 512, Flux.relu),
    Flux.Dropout(0.1),
    Dense(512 => 512, Flux.relu),
    Flux.Dropout(0.5),
    Dense(512 => dim_output)   # pair syntax, consistent with the layers above
)
optim_flux = Flux.setup(Adam(), model_flux)
losses_Flux = []
for epoch in 1:Nepoch
    for (x, y) in [(x_train, y_train)]
        loss, grads = Flux.withgradient(model_flux) do m
            # Evaluate model and loss inside gradient context:
            y_hat = m(x)
            Flux.mse(y_hat, y)
        end
        Flux.update!(optim_flux, model_flux, grads[1])
        push!(losses_Flux, loss)
    end
end
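One difference that may matter when comparing against the Keras run: the loop above feeds the whole dataset as a single batch, so it performs one parameter update per epoch. Keras' model.fit splits the data into mini-batches, and the sketch below assumes its default batch_size of 32 (the batchsize variable is introduced here for illustration only). It simply counts how many updates each scheme performs.

```julia
N = 5000          # number of training samples, as above
batchsize = 32    # assumption: Keras' default batch_size for model.fit
Nepoch = 5

# Column indices of each mini-batch; the last batch may be smaller
batches = collect(Iterators.partition(1:N, batchsize))
steps_per_epoch = length(batches)

println("mini-batch updates per epoch: ", steps_per_epoch)
println("mini-batch updates over all epochs: ", steps_per_epoch * Nepoch)
println("full-batch updates over all epochs: ", Nepoch)
```

With these numbers, mini-batching gives 157 updates per epoch against a single one for the full-batch loop, so losses recorded after the same number of epochs are not directly comparable.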
With SimpleChains.jl. Similarly to train!, train_unbatched! does not seem to have an option to save the loss automatically. So, to save it, I have to call train_unbatched! for a single iteration inside a loop and save the result each time.
BTW: it is really unclear from the repository and the docs that the result of model_SC(x_train, p) is the loss and not y_hat.
I had to figure out that model_SC_evaluate(x_train, p) (without the loss) does that.
# * SimpleChains * #
model_SC_evaluate = SimpleChain(
    static(n), # input dimension (optional)
    TurboDense{true}(SimpleChains.relu, 512), # dense layer with bias
    SimpleChains.Dropout(0.1), # dropout layer
    TurboDense{true}(SimpleChains.relu, 512), # dense layer with bias
    SimpleChains.Dropout(0.5), # dropout layer
    TurboDense{false}(identity, dim_output) # dense layer without bias
)
model_SC = SimpleChains.add_loss(model_SC_evaluate, SquaredLoss(y_train))
p = SimpleChains.init_params(model_SC)
g = similar(p)
losses_SC = []
for epoch in 1:Nepoch
    SimpleChains.train_unbatched!(g, p, model_SC, x_train, SimpleChains.ADAM(), 1)
    push!(losses_SC, model_SC(x_train, p))
end
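Since each framework reports its own loss value, and they may not all normalise it the same way (an assumption worth checking in each library's docs), one way to put them on a common scale is to compute the MSE by hand from the predictions. The helper below, mse_by_hand, is a hypothetical name and not part of any of the three libraries; the arrays are stand-ins with the same (outputs × samples) layout as y_train.

```julia
using Statistics

# Plain mean squared error over all entries (hypothetical helper)
mse_by_hand(y_hat, y) = mean(abs2, y_hat .- y)

# Stand-in data: 1 output, 3 samples
y     = Float32[0.4 0.5 0.6]
y_hat = Float32[0.5 0.5 0.5]

println(mse_by_hand(y_hat, y))
```

Here y_hat could be model_SC_evaluate(x_train, p), model_flux(x_train), or the Keras predictions permuted back to the same layout. Dropout may or may not be active in these forward passes depending on the framework, which can also shift the numbers.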
Now the three losses are wildly different. OK, the models have random initializations, but the difference here is too large to be explained by that.
What am I doing wrong?