The same network performs differently in Flux.jl and tensorflow

Hi,

I am writing a toy model to test the performance of Flux.jl. I generated some dummy data with the following code

import numpy as np

traindata=np.random.random((10000,50))
target=np.random.random(10000)

np.savetxt("traindata.csv",traindata,delimiter=',')
np.savetxt("target.csv",target,delimiter=',')

and then wrote a model with a single dense layer and relu activation to perform a non-linear regression.

In Python with tensorflow, the code is

import numpy as np
import tensorflow as tf

traindata=np.loadtxt("traindata.csv",delimiter=',')
target=np.loadtxt("target.csv",delimiter=',')
print(traindata.shape,target.shape)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1,input_shape=(50,),activation='relu',kernel_initializer='glorot_uniform'),
])
model.compile(optimizer='adam',loss='mean_squared_error',metrics=['mean_squared_error'])
model.fit(traindata,target,epochs=100,verbose=2)

and in Julia with Flux.jl, it is

using Base.Iterators: repeated
using CSV,Random,Printf
using Flux
using Flux: glorot_uniform

traindata=Matrix(CSV.read("traindata.csv"; header=false))'
target=Matrix(CSV.read("target.csv"; header=false))'

model=Chain(Dense(50,1,relu,initW = glorot_uniform))
loss(x, y) = Flux.mse(model(x), y)
opt = ADAM()
dataset = repeated((traindata, target),100)
evalcb = () -> @show(loss(traindata, target))
Flux.train!(loss, params(model), dataset, opt, cb=evalcb)

However, the results are very different. In Python with tensorflow, the MSE loss decreases very quickly

Epoch 1/100
10000/10000 - 0s - loss: 0.1981 - mean_squared_error: 0.1981
Epoch 2/100
10000/10000 - 0s - loss: 0.1423 - mean_squared_error: 0.1423
Epoch 3/100
10000/10000 - 0s - loss: 0.1033 - mean_squared_error: 0.1033
Epoch 4/100
10000/10000 - 0s - loss: 0.0896 - mean_squared_error: 0.0896
Epoch 5/100
10000/10000 - 0s - loss: 0.0861 - mean_squared_error: 0.0861
Epoch 6/100
10000/10000 - 0s - loss: 0.0851 - mean_squared_error: 0.0851
Epoch 7/100
10000/10000 - 0s - loss: 0.0845 - mean_squared_error: 0.0845
Epoch 8/100
10000/10000 - 0s - loss: 0.0847 - mean_squared_error: 0.0847
Epoch 9/100
10000/10000 - 0s - loss: 0.0843 - mean_squared_error: 0.0843
Epoch 10/100
10000/10000 - 0s - loss: 0.0844 - mean_squared_error: 0.0844

and the final loss after 100 epochs is about 0.08.

But in Julia with Flux.jl, the loss decreases slowly and seems to be trapped in a local minimum.

loss(traindata, target) = 0.20698824682017267 (tracked)
loss(traindata, target) = 0.20629590458383318 (tracked)
loss(traindata, target) = 0.20560309354360407 (tracked)
loss(traindata, target) = 0.2049097923861889 (tracked)
loss(traindata, target) = 0.20421840230183272 (tracked)
loss(traindata, target) = 0.20352757445130545 (tracked)
loss(traindata, target) = 0.20283026868343568 (tracked)
loss(traindata, target) = 0.20213053943995535 (tracked)
loss(traindata, target) = 0.20142913955620284 (tracked)
loss(traindata, target) = 0.20072485457048353 (tracked)

The final loss after 100 epochs remains 0.17.

The experiment has been repeated several times to rule out the influence of the random seed, but the trend is the same: the model built with tensorflow performs better than the model built with Flux.jl, even though they have the same structure, activation, and initialization. What's the reason behind this frustrating phenomenon?

Thank you very much!

As there is no relationship between input and output, the best the NN can do is to return the mean, i.e. 0.5. So the expected MSE is 1/12 = 0.083333 (the variance of a standard uniform distribution). So it seems that tensorflow gives the correct result. But Flux still seems to give random numbers, which is indeed strange.
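
The 1/12 figure can be checked numerically. Below is a minimal sketch in plain Python, assuming targets drawn uniformly from [0, 1) as in the data-generation script; the seed and the sample size are arbitrary choices made for this illustration.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility

# Targets that are unrelated to any input, uniform on [0, 1)
targets = [random.random() for _ in range(200_000)]

# Under MSE, the best constant prediction is the mean of the targets
prediction = sum(targets) / len(targets)

# The MSE of that constant prediction is the variance of the targets
mse = sum((t - prediction) ** 2 for t in targets) / len(targets)

print(f"mean = {prediction:.4f}, mse = {mse:.4f}, 1/12 = {1 / 12:.4f}")
```

A model that ends near 0.083 has therefore learned about as much as this data allows, while a loss near 0.17 or 0.2 means the fit has not yet reached the constant-mean solution.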

As a test I would try a different activation function, as relu has a zero gradient for negative values.
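
To illustrate the zero-gradient point, here is a minimal sketch in plain Python of a single relu unit under squared error. The weights, input, and target are made-up values chosen so that the pre-activation is negative in one case and positive in the other; they are not taken from the thread.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Derivative of relu: 1 for positive inputs, 0 otherwise
    return 1.0 if z > 0 else 0.0

def weight_gradients(w, b, x, target):
    """Gradient of (relu(w.x + b) - target)**2 with respect to each weight."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    upstream = 2.0 * (relu(z) - target) * relu_grad(z)
    return [upstream * xi for xi in x]

x = [0.3, 0.7]   # hypothetical input, positive like the dummy data
target = 0.5     # hypothetical target

dead = weight_gradients([-1.0, -1.0], 0.0, x, target)   # pre-activation < 0
alive = weight_gradients([1.0, 1.0], 0.0, x, target)    # pre-activation > 0

print("any non-zero gradient with negative pre-activation:", any(g != 0 for g in dead))
print("any non-zero gradient with positive pre-activation:", any(g != 0 for g in alive))
```

When the pre-activation is negative, no weight receives a gradient for that sample, which is why swapping relu for a linear activation is a reasonable diagnostic.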

Thank you. As you suggested, I tried a linear activation (in tensorflow it's 'linear' and in Flux.jl it's identity), but the trend remains the same.

In Flux.jl the loss is

loss(traindata, target) = 0.26388273830343645 (tracked)
loss(traindata, target) = 0.254745100985269 (tracked)
loss(traindata, target) = 0.24702623084351585 (tracked)
loss(traindata, target) = 0.24071751586112944 (tracked)
loss(traindata, target) = 0.23578285877461885 (tracked)
loss(traindata, target) = 0.23215070433838075 (tracked)
loss(traindata, target) = 0.2297068552543899 (tracked)
loss(traindata, target) = 0.22829085699168883 (tracked)
loss(traindata, target) = 0.22769951121435944 (tracked)
loss(traindata, target) = 0.22770057233706334 (tracked)

The final loss after 100 epochs is about 0.17.

In tensorflow the loss is

Epoch 1/100
10000/10000 - 1s - loss: 0.2868 - mean_squared_error: 0.2868
Epoch 2/100
10000/10000 - 1s - loss: 0.1848 - mean_squared_error: 0.1848
Epoch 3/100
10000/10000 - 1s - loss: 0.1390 - mean_squared_error: 0.1390
Epoch 4/100
10000/10000 - 1s - loss: 0.1101 - mean_squared_error: 0.1101
Epoch 5/100
10000/10000 - 1s - loss: 0.0951 - mean_squared_error: 0.0951
Epoch 6/100
10000/10000 - 1s - loss: 0.0883 - mean_squared_error: 0.0883
Epoch 7/100
10000/10000 - 1s - loss: 0.0858 - mean_squared_error: 0.0858
Epoch 8/100
10000/10000 - 1s - loss: 0.0847 - mean_squared_error: 0.0847
Epoch 9/100
10000/10000 - 1s - loss: 0.0844 - mean_squared_error: 0.0844
Epoch 10/100
10000/10000 - 1s - loss: 0.0844 - mean_squared_error: 0.0844

The final loss after 100 epochs is 0.0833.

Could the batch size be an issue? It seems that keras defaults to 32 if unspecified (https://keras.io/models/model/).
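
To see why the batch size could matter here, it helps to count optimizer updates. The sketch below is plain Python, not Flux or Keras code; the partition helper is a hypothetical stand-in written for this illustration, and it assumes one update per batch, with the Flux script above passing the full dataset 100 times.

```python
n_samples = 10_000
epochs = 100
batch_size = 32  # Keras default when batch_size is not given

def partition(indices, size):
    """Split indices into consecutive chunks of at most `size` items."""
    indices = list(indices)
    return [indices[i:i + size] for i in range(0, len(indices), size)]

batches = partition(range(n_samples), batch_size)

# One optimizer update per batch, repeated every epoch
updates_minibatch = len(batches) * epochs

# Passing the whole dataset as a single batch gives one update per epoch
updates_full_batch = 1 * epochs

print("batches per epoch:", len(batches), "(last one has", len(batches[-1]), "samples)")
print("updates with batch size 32:", updates_minibatch)
print("updates with full batch:", updates_full_batch)
```

With the same number of epochs, the minibatch run applies several hundred times more ADAM steps than the full-batch run, so the two loss curves are not directly comparable.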

It seems to work with a batch size of 32 (and still a relu activation function)

using Base.Iterators: repeated
using CSV,Random,Printf
using Flux
using Flux: glorot_uniform

traindata=Matrix(CSV.read("traindata.csv"; header=false))'
target=Matrix(CSV.read("target.csv"; header=false))'

model=Chain(Dense(50,1,relu,initW = glorot_uniform))
loss(x, y) = Flux.mse(model(x), y)
opt = ADAM()

dataset_batch = [(traindata[:,ind],target[:,ind])  for ind in partition(1:length(target),32) ];

for epoch = 1:100
   Flux.train!(loss, params(model), dataset_batch, opt)
  @show epoch,loss(traindata, target)
end

After 10 epochs I now get:

(epoch, loss(traindata, target)) = (1, 0.15714580257711558 (tracked))
(epoch, loss(traindata, target)) = (2, 0.11063598723179667 (tracked))
(epoch, loss(traindata, target)) = (3, 0.09125624982175756 (tracked))
(epoch, loss(traindata, target)) = (4, 0.08590421903194571 (tracked))
(epoch, loss(traindata, target)) = (5, 0.0845466921288617 (tracked))
(epoch, loss(traindata, target)) = (6, 0.08419594869003737 (tracked))
(epoch, loss(traindata, target)) = (7, 0.08407588604495675 (tracked))
(epoch, loss(traindata, target)) = (8, 0.08400488181621797 (tracked))
(epoch, loss(traindata, target)) = (9, 0.08394463978187157 (tracked))
(epoch, loss(traindata, target)) = (10, 0.08388823812964562 (tracked))
[...]

Friends don't let friends use minibatches larger than 32

This is the reason. The official Flux.jl documentation does not seem to mention how to set the batch size. Perhaps I should open an issue asking them to add this information. Thank you very much!

I agree, this information is not easy to find.

Thank you very much, I had the same problem and found your post.

I believe this code also needs using Base.Iterators: partition; otherwise, partition is not defined. Thanks for the nice minibatch example.

I am new to Julia and Flux, and I would like to test a simple neural network.
My training data are

training_X
training_Y

size(training_X) # 10000 pre-calculated profiles (length 80) for three parameters

(10000,80)

size(training_Y)

(10000,3)

My network is like this:

using Flux, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated, partition
using Printf, BSON

model = Chain(
    Dense(80,256,tanh),
    Dense(256,256,tanh),
    Dense(256,256,tanh),
    Dense(256,3),
)

loss(x, y) = Flux.mse(model(x), y)
opt = ADAM()

I used a batch size of 32

batch_size = 32

dataset_batch = [(training_X[ind,:],training_Y[ind,:]) for ind in partition(1:size(training_Y,1),batch_size) ];

dataset_batch[1][1]

32×80 Array{Float16,2}:

Flux.train!(loss, params(model), dataset_batch, opt)

I got this error:
DimensionMismatch("A has dimensions (80,256) but B has dimensions (32,80)")

In keras it is written as follows:

from tensorflow.keras import Model
from tensorflow.keras.layers import LSTM, Input, Dense

inputs = Input(shape=(training_X.shape[1],))
x = Dense(256, activation='tanh')(inputs)
x = Dense(256, activation='tanh')(x)
x = Dense(256, activation='tanh')(x)
outputs = Dense(3, activation='linear')(x) # three outputs for regression

ffn_model = Model(inputs, outputs)

ffn_model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mae'])

Is it possible to specify the input shape as in keras?
Input(shape=(training_X.shape[1],))

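SOLUTION: TRANSPOSE THE INPUT DATA. Flux expects each column to be one sample (features × batch), so each batch should be 80×32 rather than 32×80.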
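
The shape rule behind this fix can be sketched in plain Python. This is only an illustration of the dimension check, not Flux code: dense_output_shape is a hypothetical helper, and it assumes the first layer stores its weights as a 256×80 matrix that multiplies the input from the left.

```python
def shape(m):
    return (len(m), len(m[0]))

def transpose(m):
    return [list(col) for col in zip(*m)]

def dense_output_shape(weight_shape, input_shape):
    """Shape of W * X, or an error if the inner dimensions disagree."""
    out_features, in_features = weight_shape
    rows, cols = input_shape
    if rows != in_features:
        raise ValueError(
            f"weights are {weight_shape} but input is {input_shape}"
        )
    return (out_features, cols)

weight_shape = (256, 80)                      # assumed layout for Dense(80, 256)
batch = [[0.0] * 80 for _ in range(32)]       # 32 samples x 80 features, as stored

try:
    dense_output_shape(weight_shape, shape(batch))
except ValueError as err:
    print("samples in rows:", err)

fixed = transpose(batch)                      # 80 features x 32 samples
print("samples in columns:", dense_output_shape(weight_shape, shape(fixed)))
```

The targets need the same treatment, so that the 3×32 model output is compared with a 3×32 batch of targets.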

Perfect! Thank you.

One more question, about the GPU.

I tried to train the model on the GPU:

training_X = gpu.(training_X)
training_Y = gpu.(training_Y)

model = gpu(model)

N_epochs = 50
loss_train = zeros(N_epochs,1)
loss_test = zeros(N_epochs,1)

for epoch = 1:N_epochs
    Flux.train!(loss, params(model), dataset_batch, opt)
    loss_train[epoch] = loss(training_X, training_Y)
    @show epoch,loss(training_X, training_Y)
end

I got the following error:

ArgumentError: cannot take the CPU address of a CuArray{Float32,2,Nothing}

Stacktrace:
[1] unsafe_convert(::Type{Ptr{Float32}}, ::CuArray{Float32,2,Nothing}) at /home/otobrzo/.julia/packages/CuArrays/ZYCpV/src/array.jl:212
[2] gemm!(::Char, ::Char, ::Float32, ::CuArray{Float32,2,Nothing}, ::Array{Float32,2}, ::Float32, ::Array{Float32,2}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/LinearAlgebra/src/blas.jl:1131
[3] gemm_wrapper!(::Array{Float32,2}, ::Char, ::Char, ::CuArray{Float32,2,Nothing}, ::Array{Float32,2}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/LinearAlgebra/src/matmul.jl:464
[4] * at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/LinearAlgebra/src/matmul.jl:145 [inlined]
[5] forward at /home/otobrzo/.julia/packages/Tracker/SAr25/src/lib/array.jl:415 [inlined]
[6] #track#1 at /home/otobrzo/.julia/packages/Tracker/SAr25/src/Tracker.jl:51 [inlined]
[7] track at /home/otobrzo/.julia/packages/Tracker/SAr25/src/Tracker.jl:51 [inlined]
[8] * at /home/otobrzo/.julia/packages/Tracker/SAr25/src/lib/array.jl:378 [inlined]
[9] Dense at /home/otobrzo/.julia/packages/Flux/qXNjB/src/layers/basic.jl:99 [inlined]
[10] Dense at /home/otobrzo/.julia/packages/Flux/qXNjB/src/layers/basic.jl:110 [inlined]
[11] (::Dense{typeof(tanh),TrackedArray{…,CuArray{Float32,2,Nothing}},TrackedArray{…,CuArray{Float32,1,Nothing}}})(::Array{Float16,2}) at /home/otobrzo/.julia/packages/Flux/qXNjB/src/layers/basic.jl:113
[12] applychain(::Tuple{Dense{typeof(tanh),TrackedArray{…,CuArray{Float32,2,Nothing}},TrackedArray{…,CuArray{Float32,1,Nothing}}},Dense{typeof(tanh),TrackedArray{…,CuArray{Float32,2,Nothing}},TrackedArray{…,CuArray{Float32,1,Nothing}}},Dense{typeof(tanh),TrackedArray{…,CuArray{Float32,2,Nothing}},TrackedArray{…,CuArray{Float32,1,Nothing}}},Dense{typeof(identity),TrackedArray{…,CuArray{Float32,2,Nothing}},TrackedArray{…,CuArray{Float32,1,Nothing}}}}, ::Array{Float16,2}) at /home/otobrzo/.julia/packages/Flux/qXNjB/src/layers/basic.jl:31
[13] (::Chain{Tuple{Dense{typeof(tanh),TrackedArray{…,CuArray{Float32,2,Nothing}},TrackedArray{…,CuArray{Float32,1,Nothing}}},Dense{typeof(tanh),TrackedArray{…,CuArray{Float32,2,Nothing}},TrackedArray{…,CuArray{Float32,1,Nothing}}},Dense{typeof(tanh),TrackedArray{…,CuArray{Float32,2,Nothing}},TrackedArray{…,CuArray{Float32,1,Nothing}}},Dense{typeof(identity),TrackedArray{…,CuArray{Float32,2,Nothing}},TrackedArray{…,CuArray{Float32,1,Nothing}}}}})(::Array{Float16,2}) at /home/otobrzo/.julia/packages/Flux/qXNjB/src/layers/basic.jl:33
[14] loss(::Array{Float16,2}, ::Array{Float16,2}) at ./In[22]:30
[15] #15 at /home/otobrzo/.julia/packages/Flux/qXNjB/src/optimise/train.jl:72 [inlined]
[16] gradient
(::getfield(Flux.Optimise, Symbol("##15#21")){typeof(loss),Tuple{Array{Float16,2},Array{Float16,2}}}, ::Tracker.Params) at /home/otobrzo/.julia/packages/Tracker/SAr25/src/back.jl:97
[17] #gradient#24(::Bool, ::typeof(Tracker.gradient), ::Function, ::Tracker.Params) at /home/otobrzo/.julia/packages/Tracker/SAr25/src/back.jl:164
[18] gradient at /home/otobrzo/.julia/packages/Tracker/SAr25/src/back.jl:164 [inlined]
[19] macro expansion at /home/otobrzo/.julia/packages/Flux/qXNjB/src/optimise/train.jl:71 [inlined]
[20] macro expansion at /home/otobrzo/.julia/packages/Juno/oLB1d/src/progress.jl:134 [inlined]
[21] #train!#12(::getfield(Flux.Optimise, Symbol("##16#22")), ::typeof(Flux.Optimise.train!), ::Function, ::Tracker.Params, ::Array{Tuple{Array{Float16,2},Array{Float16,2}},1}, ::ADAM) at /home/otobrzo/.julia/packages/Flux/qXNjB/src/optimise/train.jl:69
[22] train!(::Function, ::Tracker.Params, ::Array{Tuple{Array{Float16,2},Array{Float16,2}},1}, ::ADAM) at /home/otobrzo/.julia/packages/Flux/qXNjB/src/optimise/train.jl:67
[23] top-level scope at ./In[24]:6