# Flux results not similar to Tensorflow

Hi everyone,

I currently do a lot of ML in Python using TensorFlow, which works fine, but Julia seems to be a bit more than fine, so I'm experimenting with Flux as an alternative. My first step is just to train a simple feed-forward NN on a relatively small dataset (2000 samples) using a plain MSE loss. In TensorFlow this works fine and my MSE goes down to 10^-5 (synthetic data without noise, so no overfitting), but somehow in Flux I can't get it below 0.035. My code is below: does anyone have any idea why it's stuck at 0.035?

Thanks!

```julia
using Flux, Random, Statistics

X = [[x, t] for x in data["x"] for t in data["t"]]
X = hcat(X...);

y = reshape(real(data["usol"]), (1, length(data["usol"])))

idx = randperm!(collect(1:length(y)));
X_train = X[:, idx][:, 1:2000];
y_train = y[idx][1:2000];

dataset = [(X_train, y_train)];

model = Chain(Dense(2, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 1))
ps = params(model);
opt = ADAM();

loss(x, y) = mean((model(x) .- y).^2)
evalcb() = @show(loss(X_train, y_train))
@Flux.epochs 5000 Flux.train!(loss, ps, dataset, opt, cb = Flux.throttle(evalcb, 5))
```

Is it possible to share the data or use another example that can be reproduced by others?

Showing the TF code that produces superior results would be helpful as well.

Here’s a link to the data:

I copied my TensorFlow code below (roughly, as the real code contains some extra stuff, but the basis is the same). Basically, I don't get why my model doesn't train past 0.035 MSE. I just tried with significantly more layers and that doesn't work either (in fact, the MSE is higher!), so it's not necessarily a TensorFlow vs. Flux question.

```python
with tf.name_scope("Neural_Network"):
    X = data
    for layer in np.arange(len(config['layers']) - 2):
        X = tf.layers.dense(X, units=config['layers'][layer + 1],
                            activation=tf.nn.tanh,
                            kernel_initializer=tf.constant_initializer(config['initial_weights'][layer]),
                            bias_initializer=tf.constant_initializer(config['initial_biases'][layer]))
    prediction = tf.layers.dense(inputs=X, units=config['layers'][-1],
                                 activation=None,
                                 kernel_initializer=tf.constant_initializer(config['initial_weights'][-1]),
                                 bias_initializer=tf.constant_initializer(config['initial_biases'][-1]))

MSE_costs = tf.reduce_mean(tf.square(target - prediction), axis=0)
```

and then just the standard ADAM optimizer and dataset input.

I can only conclude something is wrong with my code but I can’t figure out what…

• Do you use the same initialization for weights and biases (there are `initW` and `initb` keyword arguments for `Dense`, similar to TF's `kernel_initializer`) and the same parameters for `ADAM`?
• To debug, I would also try to use the exact same training data, i.e. `idx` and consequently `X_train` and `y_train` should be identical in Flux and TF.
• Does `y_train` have the right dimensions? I would write `X_train = X[:, idx[1:2000]]; y_train = y[:, idx[1:2000]]`.
• It won't make a difference, but there is also `Flux.mse`, which could be used instead of your custom MSE loss function.
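To illustrate the dimensions point: the model returns a 1×N row matrix, but `y[idx][1:2000]` is a plain length-N `Vector`, and broadcasting a 1×N row against an N-vector silently produces an N×N matrix, so the loss averages over all pairwise differences instead of the elementwise ones. A minimal demonstration with toy shapes (not the actual data):

```julia
using Statistics

ŷ = rand(1, 4)                 # stand-in for model(X_train): a 1×4 row matrix
y_vec = vec(ŷ)                 # like y[idx][1:2000]: a plain 4-element Vector
y_row = reshape(y_vec, 1, :)   # like y[:, idx[1:2000]]: a 1×4 matrix

# Row matrix .- Vector broadcasts to a full N×N matrix:
size(ŷ .- y_vec)               # (4, 4) — not what the loss intends

# With matching 1×N shapes the difference is elementwise:
size(ŷ .- y_row)               # (1, 4)
mean((ŷ .- y_row).^2)          # 0.0 here, since ŷ and y_row hold the same values
```

So a loss that looks like an MSE can quietly be averaging N² terms, which would explain training stalling at a plateau.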