 # Flux results not similar to Tensorflow

#1

Hi everyone,

I currently do a lot of ML in Python using TensorFlow, which is working fine, but Julia seems to be a bit more than fine, so I’m experimenting with Flux as an alternative. My first step is just to train a simple feed-forward NN on a relatively small dataset (2000 samples) using a plain MSE loss. In TensorFlow this works fine and my MSE goes down to 10^-5 (synthetic data without noise, so no overfitting), but somehow in Flux I can’t get it below about 0.035. My code is below: does anyone have any idea why it’s not going past 0.035?

Thanks!

```julia
X = [[x, t] for x in data["x"] for t in data["t"]]
X = hcat(X...);

y = reshape(real(data["usol"]), (1, length(data["usol"])))

idx = randperm!(collect(1:length(y)));
X_train = X[:, idx][:, 1:2000];
y_train = y[idx][1:2000];

dataset = [(X_train, y_train)];

model = Chain(Dense(2, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 20, tanh),
              Dense(20, 1))
ps = params(model);

loss(x, y) = mean((model(x) .- y).^2)
opt = ADAM(0.002, (0.99, 0.999))
evalcb() = @show(loss(X_train, y_train))
@Flux.epochs 5000 Flux.train!(loss, ps, dataset, opt, cb = Flux.throttle(evalcb, 5))
```

#2

Is it possible to share the data or use another example that can be reproduced by others?

Showing the TF code that produces superior results would be helpful as well.

#3

Here’s a link to the data:

I copied my TensorFlow code below (roughly, as the full version contains more extra stuff, but the basis is the same). Basically, I don’t get why my model doesn’t train past 0.035 MSE. I just tried with significantly more layers and that doesn’t work either (in fact, the MSE is higher!), so it’s not necessarily a TensorFlow vs. Flux question.

```python
with tf.name_scope("Neural_Network"):
    X = data
    for layer in np.arange(len(config['layers']) - 2):
        X = tf.layers.dense(X, units=config['layers'][layer + 1], activation=tf.nn.tanh,
                            kernel_initializer=tf.constant_initializer(config['initial_weights'][layer]),
                            bias_initializer=tf.constant_initializer(config['initial_biases'][layer]))
    prediction = tf.layers.dense(inputs=X, units=config['layers'][-1], activation=None,
                                 kernel_initializer=tf.constant_initializer(config['initial_weights'][-1]),
                                 bias_initializer=tf.constant_initializer(config['initial_biases'][-1]))

MSE_costs = tf.reduce_mean(tf.square(target - prediction), axis=0)
```

and then just the standard ADAM optimizer and dataset input.

I can only conclude something is wrong with my code but I can’t figure out what…

#4
• Do you use the same initialization for weights and biases (there is an `initW` and `initb` keyword argument for the `Dense` function, similar to the `kernel_initializer` of TF) and the same parameters for `ADAM`?
• To debug, I would also try to use the exact same training data, i.e. `idx` and consequently `X_train` and `y_train` should be the same in Flux and TF.
• Does `y_train` have the right dimensions? I would write `X_train = X[:, idx[1:2000]]; y_train = y[:, idx[1:2000]]`.
• It won’t make a difference, but there is also `Flux.mse` that could be used instead of your custom MSE loss function.
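The dimension question in the third bullet is worth stressing. In Julia, a `Vector` broadcasts like a column, so if `y_train` is a plain length-N vector while `model(X_train)` is a 1×N row matrix, the subtraction in the custom MSE silently expands to an N×N matrix and the loss averages over the wrong quantity. Here is a minimal NumPy sketch of the same pitfall (the shapes are illustrative, not taken from the actual data; a NumPy column vector stands in for a Julia `Vector`):

```python
import numpy as np

pred = np.zeros((1, 5))   # model output: a 1x5 row matrix
y_col = np.ones((5, 1))   # targets as a column, like a Julia Vector broadcasts
y_row = np.ones((1, 5))   # targets with matching row shape

# (1, 5) minus (5, 1) broadcasts to (5, 5): 25 residuals instead of 5,
# so the "MSE" is computed over a full outer-difference matrix.
print((pred - y_col).shape)   # (5, 5)

# With matching shapes the residual stays elementwise, as intended.
print((pred - y_row).shape)   # (1, 5)
```

Keeping `y_train` as a 1×N matrix (e.g. `y[:, idx[1:2000]]` as suggested above) avoids this entirely.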