Hi,
I am writing a toy model to test the performance of Flux.jl. I generated some dummy data with the following code:
import numpy as np
traindata = np.random.random((10000, 50))   # 10000 samples, 50 features
target = np.random.random(10000)            # one regression target per sample
np.savetxt("traindata.csv", traindata, delimiter=',')
np.savetxt("target.csv", target, delimiter=',')
and then wrote a model with a single dense layer and ReLU activation to perform a non-linear regression. In Python with TensorFlow, the code is:
import numpy as np
import tensorflow as tf

traindata = np.loadtxt("traindata.csv", delimiter=',')
target = np.loadtxt("target.csv", delimiter=',')
print(traindata.shape, target.shape)  # (10000, 50) (10000,)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, input_shape=(50,), activation='relu',
                          kernel_initializer='glorot_uniform'),
])
model.compile(optimizer='adam', loss='mean_squared_error',
              metrics=['mean_squared_error'])
model.fit(traindata, target, epochs=100, verbose=2)
and in Julia with Flux.jl, it is:
using Base.Iterators: repeated
using CSV, Random, Printf
using Flux
using Flux: glorot_uniform

# Flux expects features × observations, hence the transposes:
traindata = Matrix(CSV.read("traindata.csv"; header=false))'   # 50 × 10000
target = Matrix(CSV.read("target.csv"; header=false))'         # 1 × 10000

model = Chain(Dense(50, 1, relu, initW = glorot_uniform))
loss(x, y) = Flux.mse(model(x), y)
opt = ADAM()

# the full batch repeated 100 times: one gradient update per repetition
dataset = repeated((traindata, target), 100)
evalcb = () -> @show(loss(traindata, target))
Flux.train!(loss, params(model), dataset, opt, cb = evalcb)
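One thing I am not sure about: if I read the Keras docs correctly, model.fit defaults to batch_size=32, so each TensorFlow epoch performs about cld(10000, 32) = 313 gradient updates, while repeated((traindata, target), 100) performs a single full-batch update per iteration. A minibatched Flux loop might be a fairer comparison; here is an untested sketch assuming the same batch size of 32:

using Base.Iterators: partition
using Random: shuffle

batchsize = 32
for epoch in 1:100
    idx = shuffle(1:size(traindata, 2))   # reshuffle samples each epoch
    batches = ((traindata[:, i], target[:, i]) for i in partition(idx, batchsize))
    Flux.train!(loss, params(model), batches, opt)
    evalcb()                              # print the full-batch loss once per epoch
end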
However, their results are very different. In Python with TensorFlow, the MSE loss decreases very quickly:
Epoch 1/100
10000/10000 - 0s - loss: 0.1981 - mean_squared_error: 0.1981
Epoch 2/100
10000/10000 - 0s - loss: 0.1423 - mean_squared_error: 0.1423
Epoch 3/100
10000/10000 - 0s - loss: 0.1033 - mean_squared_error: 0.1033
Epoch 4/100
10000/10000 - 0s - loss: 0.0896 - mean_squared_error: 0.0896
Epoch 5/100
10000/10000 - 0s - loss: 0.0861 - mean_squared_error: 0.0861
Epoch 6/100
10000/10000 - 0s - loss: 0.0851 - mean_squared_error: 0.0851
Epoch 7/100
10000/10000 - 0s - loss: 0.0845 - mean_squared_error: 0.0845
Epoch 8/100
10000/10000 - 0s - loss: 0.0847 - mean_squared_error: 0.0847
Epoch 9/100
10000/10000 - 0s - loss: 0.0843 - mean_squared_error: 0.0843
Epoch 10/100
10000/10000 - 0s - loss: 0.0844 - mean_squared_error: 0.0844
and the final loss after 100 epochs is about 0.08.
But in Julia with Flux.jl, the loss decreases slowly and seems to be trapped in a local minimum:
loss(traindata, target) = 0.20698824682017267 (tracked)
loss(traindata, target) = 0.20629590458383318 (tracked)
loss(traindata, target) = 0.20560309354360407 (tracked)
loss(traindata, target) = 0.2049097923861889 (tracked)
loss(traindata, target) = 0.20421840230183272 (tracked)
loss(traindata, target) = 0.20352757445130545 (tracked)
loss(traindata, target) = 0.20283026868343568 (tracked)
loss(traindata, target) = 0.20213053943995535 (tracked)
loss(traindata, target) = 0.20142913955620284 (tracked)
loss(traindata, target) = 0.20072485457048353 (tracked)
The final loss after 100 epochs is still about 0.17.
I have repeated the experiment several times to rule out the influence of the random seed, but the trend is always the same: the model built with TensorFlow performs better than the one built with Flux.jl, even though they have the same structure, activation, and initialization. What is the reason behind this frustrating phenomenon?
Thank you very much!