I currently do a lot of ML in Python using TensorFlow, which works fine, but Julia looks like it could be more than fine, so I'm experimenting with Flux as an alternative. My first step is to train a simple feed-forward NN on a relatively small dataset (2000 samples) with a plain MSE loss. In TensorFlow this works well and the MSE goes down to 10^-5 (synthetic data without noise, so no overfitting), but in Flux I can't get it below about 0.035. My code is below: does anyone have any idea why it plateaus there?
Thanks!
using Flux, Statistics, Random

# Build the input matrix: one (x, t) pair per column
X = [[x, t] for x in data["x"] for t in data["t"]]
X = hcat(X...);
y = reshape(real(data["usol"]), (1, length(data["usol"])))

# Shuffle and take the first 2000 samples for training
idx = randperm(length(y));
X_train = X[:, idx][:, 1:2000];
y_train = y[idx][1:2000];
dataset = [(X_train, y_train)];
model = Chain(Dense(2, 20, tanh),
Dense(20, 20, tanh),
Dense(20, 20, tanh),
Dense(20, 20, tanh),
Dense(20, 20, tanh),
Dense(20, 20, tanh),
Dense(20, 1))
ps = Flux.params(model);
loss(x, y) = mean((model(x) .- y).^2)   # MSE over the whole training batch
opt = ADAM(0.002, (0.99, 0.999))
evalcb() = @show(loss(X_train, y_train))
Flux.@epochs 5000 Flux.train!(loss, ps, dataset, opt, cb = Flux.throttle(evalcb, 5))
I copied my TensorFlow code below (roughly, as the real version contains more extra stuff, but the basics are the same). I just don't see why the model doesn't train past an MSE of 0.035. I also tried significantly more layers and that doesn't help either (in fact, the MSE gets higher!), so it isn't necessarily a TensorFlow-vs-Flux question.
with tf.name_scope("Neural_Network"):
    X = data
    for layer in np.arange(len(config['layers']) - 2):
        X = tf.layers.dense(X, units=config['layers'][layer + 1], activation=tf.nn.tanh,
                            kernel_initializer=tf.constant_initializer(config['initial_weights'][layer]),
                            bias_initializer=tf.constant_initializer(config['initial_biases'][layer]))
    prediction = tf.layers.dense(inputs=X, units=config['layers'][-1], activation=None,
                                 kernel_initializer=tf.constant_initializer(config['initial_weights'][-1]),
                                 bias_initializer=tf.constant_initializer(config['initial_biases'][-1]))
    MSE_costs = tf.reduce_mean(tf.square(target - prediction), axis=0)
and then just the standard ADAM optimizer and dataset input.
I can only conclude something is wrong with my code but I can’t figure out what…
Do you use the same initialization for the weights and biases (there are initW and initb keyword arguments for Dense, similar to the kernel_initializer of TF) and the same parameters for ADAM? Note that TF's Adam defaults to betas of (0.9, 0.999), while your Flux code passes (0.99, 0.999).
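A minimal sketch of what that could look like for the first layer; here W1 (a 20×2 matrix) and b1 (a length-20 vector) are placeholder names for the same arrays TF gets via config['initial_weights'] and config['initial_biases']:

# Sketch: return the fixed arrays instead of a random initialization
Dense(2, 20, tanh,
      initW = (dims...) -> W1,   # ignore the requested dims, use the TF weights
      initb = (dims...) -> b1)

# match TF's Adam defaults instead of β₁ = 0.99
opt = ADAM(0.002, (0.9, 0.999))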
To debug, I would also try to use the exact same training data, i.e. idx (and consequently X_train and y_train) should be identical in Flux and TF.
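One simple way to do that (a sketch; it assumes you generate the permutation in Julia and read the file on the TF side):

using DelimitedFiles
writedlm("idx.csv", idx)   # export the shuffle order
# on the Python side: idx = np.loadtxt("idx.csv").astype(int) - 1   (0-based)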
Does y_train have the right dimensions? I would write X_train = X[:, idx[1:2000]]; y_train = y[:, idx[1:2000]], so that y_train stays a 1×2000 row matrix like the model output.
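The distinction matters because y[idx][1:2000] is a plain Vector while model(x) is a 1×2000 matrix, so the broadcast in the loss produces a 2000×2000 matrix instead of elementwise differences. A quick demonstration:

using Random
y = rand(1, 10); idx = randperm(10)
size(y[idx][1:5])                 # (5,)   -- a Vector
size(y[:, idx[1:5]])              # (1, 5) -- matches the model output
size(rand(1, 5) .- y[idx][1:5])   # (5, 5) -- the MSE broadcast blows up silently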
It won't make a difference numerically, but there is also Flux.mse that could be used instead of your custom MSE loss function.
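For example:

loss(x, y) = Flux.mse(model(x), y)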