Flux results not similar to Tensorflow

Hi everyone,

I currently do a lot of ML in Python using TensorFlow, which works fine, but Julia seems to be more than fine, so I’m experimenting with Flux as an alternative. My first step is just to train a simple feed-forward NN on a relatively small dataset (2000 samples) using a plain MSE loss. In TensorFlow this works fine and my MSE goes down to 10^-5 (synthetic data without noise, so no overfitting), but somehow in Flux I can’t get it past 0.03. My code is below: does anyone have any idea why it’s not going past 0.035?

Thanks!

using Flux, Random, Statistics  # Flux for the model and training, Random for randperm!, Statistics for mean

X = [[x, t] for x in data["x"] for t in data["t"]]
X = hcat(X...);

y = reshape(real(data["usol"]), (1, length(data["usol"])))

idx = randperm!(collect(1:length(y)));
X_train = X[:, idx][:, 1:2000];
y_train = y[idx][1:2000];

dataset = [(X_train, y_train)];

model = Chain(Dense(2, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 1))
ps = params(model);

loss(x, y) = mean((model(x).-y).^2)
opt = ADAM(0.002, (0.99, 0.999))
evalcb() = @show(loss(X_train, y_train))
@Flux.epochs 5000 Flux.train!(loss, ps, dataset, opt, cb = Flux.throttle(evalcb, 5))

Is it possible to share the data or use another example that can be reproduced by others?

Showing the TF code that produces superior results would be helpful as well.

Here’s a link to the data:

https://drive.google.com/file/d/1-CruDQbbc6gz1zPMle_hY8kTxIT4maX4/view?usp=sharing

I copied my TensorFlow code below (roughly, as the real script contains some extra stuff, but the basis is the same). Basically, I don’t get why my model doesn’t train past an MSE of 0.035; I just tried with significantly more layers and that doesn’t work either (in fact, the MSE is higher!), so it’s not necessarily a TensorFlow vs. Flux question.

    with tf.name_scope("Neural_Network"):
        X = data
        for layer in np.arange(len(config['layers']) - 2):
            X = tf.layers.dense(X, units=config['layers'][layer + 1], activation=tf.nn.tanh,
                                kernel_initializer=tf.constant_initializer(config['initial_weights'][layer]),
                                bias_initializer=tf.constant_initializer(config['initial_biases'][layer]))
        prediction = tf.layers.dense(inputs=X, units=config['layers'][-1], activation=None,
                                     kernel_initializer=tf.constant_initializer(config['initial_weights'][-1]),
                                     bias_initializer=tf.constant_initializer(config['initial_biases'][-1]))

MSE_costs = tf.reduce_mean(tf.square(target - prediction), axis=0)

and then just the standard ADAM optimizer and dataset input.

I can only conclude something is wrong with my code, but I can’t figure out what…

  • Do you use the same initialization for weights and biases (there are initW and initb keyword arguments for the Dense function, similar to the kernel_initializer of TF) and the same parameters for ADAM? See the sketch after this list.
  • To debug I would also try to use the exact same training data, i.e. idx and consequently X_train and y_train should be the same in Flux and TF.
  • Does y_train have the right dimensions? I would write X_train = X[:, idx[1:2000]]; y_train = y[:, idx[1:2000]].
  • It won’t make a difference, but there is also Flux.mse that could be used instead of your custom MSE loss function.
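
To make these suggestions concrete, here is a minimal sketch of what the Flux side could look like under those assumptions: it uses the older Dense(in, out, σ; initW, initb) keywords, and the names W1, b1, W2, b2 are hypothetical placeholders for the weight and bias arrays exported from the TensorFlow run (more layers would follow the same pattern).

    using Flux, Random, Statistics

    Random.seed!(1234)                   # fix the shuffle so Flux and TF see the same split
    idx = randperm(length(y))

    # Keep X_train and y_train as matrices with a matching second dimension.
    # With y[idx][1:2000] (a length-2000 vector), model(X_train) .- y_train
    # broadcasts into a 2000×2000 matrix instead of an element-wise difference.
    X_train = X[:, idx[1:2000]]
    y_train = y[:, idx[1:2000]]

    # Hypothetical: W1, b1, W2, b2 are the arrays exported from TensorFlow,
    # passed through initW/initb so both frameworks start from identical parameters.
    model = Chain(Dense(2, 20, tanh; initW = (dims...) -> W1, initb = (dims...) -> b1),
                  Dense(20, 1;       initW = (dims...) -> W2, initb = (dims...) -> b2))

    loss(x, y) = Flux.mse(model(x), y)   # built-in MSE, equivalent to the hand-written version

With y_train kept as a 1×2000 row matrix, the broadcast in the loss stays element-wise, which may well be where the 0.035 plateau comes from.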