I currently do a lot of ML in Python using TensorFlow, which works fine, but Julia looks like it could be more than fine, so I'm experimenting with Flux as an alternative. My first step is to train a simple feed-forward NN on a relatively small dataset (2000 samples) with a plain MSE loss. In TensorFlow this works well and the MSE goes down to 10^-5 (synthetic data without noise, so no overfitting), but in Flux I can't get it below about 0.035. My code is below: does anyone have any idea why it plateaus there?
Thanks!
using Flux, Statistics, Random

# Build the input matrix: one (x, t) pair per column
X = [[x, t] for x in data["x"] for t in data["t"]]
X = hcat(X...);
y = reshape(real(data["usol"]), (1, length(data["usol"])))

# Shuffle and take the first 2000 samples for training
idx = randperm(length(y));
X_train = X[:, idx][:, 1:2000];
y_train = y[idx][1:2000];
dataset = [(X_train, y_train)];
model = Chain(Dense(2, 20, tanh),
Dense(20, 20, tanh),
Dense(20, 20, tanh),
Dense(20, 20, tanh),
Dense(20, 20, tanh),
Dense(20, 20, tanh),
Dense(20, 1))
ps = Flux.params(model);
loss(x, y) = mean((model(x) .- y).^2)   # MSE over the whole training batch
opt = ADAM(0.002, (0.99, 0.999))
evalcb() = @show(loss(X_train, y_train))
Flux.@epochs 5000 Flux.train!(loss, ps, dataset, opt, cb = Flux.throttle(evalcb, 5))
I copied my TensorFlow code below (roughly, as the real version contains more extra stuff, but the basics are the same). I just don't see why the model doesn't train past an MSE of 0.035. I also tried significantly more layers and that doesn't help either (in fact, the MSE gets higher!), so it isn't necessarily a TensorFlow-vs-Flux question.
with tf.name_scope("Neural_Network"):
    X = data
    for layer in np.arange(len(config['layers']) - 2):
        X = tf.layers.dense(X, units=config['layers'][layer + 1], activation=tf.nn.tanh,
                            kernel_initializer=tf.constant_initializer(config['initial_weights'][layer]),
                            bias_initializer=tf.constant_initializer(config['initial_biases'][layer]))
    prediction = tf.layers.dense(inputs=X, units=config['layers'][-1], activation=None,
                                 kernel_initializer=tf.constant_initializer(config['initial_weights'][-1]),
                                 bias_initializer=tf.constant_initializer(config['initial_biases'][-1]))
    MSE_costs = tf.reduce_mean(tf.square(target - prediction), axis=0)
and then just the standard ADAM optimizer and dataset input.
I can only conclude something is wrong with my code but I can’t figure out what…
Do you use the same initialization for the weights and biases (there are initW and initb keyword arguments for Dense, similar to the kernel_initializer of TF) and the same parameters for ADAM? Note that TF's Adam defaults to betas of (0.9, 0.999), while your Flux code passes (0.99, 0.999).
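A minimal sketch of what that could look like for the first layer; here W1 (a 20×2 matrix) and b1 (a length-20 vector) are placeholder names for the same arrays TF gets via config['initial_weights'] and config['initial_biases']:

# Sketch: return the fixed arrays instead of a random initialization
Dense(2, 20, tanh,
      initW = (dims...) -> W1,   # ignore the requested dims, use the TF weights
      initb = (dims...) -> b1)

# match TF's Adam defaults instead of β₁ = 0.99
opt = ADAM(0.002, (0.9, 0.999))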
To debug, I would also try to use the exact same training data, i.e. idx (and consequently X_train and y_train) should be identical in Flux and TF.
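One simple way to do that (a sketch; it assumes you generate the permutation in Julia and read the file on the TF side):

using DelimitedFiles
writedlm("idx.csv", idx)   # export the shuffle order
# on the Python side: idx = np.loadtxt("idx.csv").astype(int) - 1   (0-based)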
Does y_train have the right dimensions? I would write X_train = X[:, idx[1:2000]]; y_train = y[:, idx[1:2000]], so that y_train stays a 1×2000 row matrix like the model output.
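The distinction matters because y[idx][1:2000] is a plain Vector while model(x) is a 1×2000 matrix, so the broadcast in the loss produces a 2000×2000 matrix instead of elementwise differences. A quick demonstration:

using Random
y = rand(1, 10); idx = randperm(10)
size(y[idx][1:5])                 # (5,)   -- a Vector
size(y[:, idx[1:5]])              # (1, 5) -- matches the model output
size(rand(1, 5) .- y[idx][1:5])   # (5, 5) -- the MSE broadcast blows up silently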
It won't make a difference numerically, but there is also Flux.mse that could be used instead of your custom MSE loss function.
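For example:

loss(x, y) = Flux.mse(model(x), y)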