# Flux not able to fit a strong signal, based on a standard example that works

#1

I don’t understand why the housing example found here, which works as expected, fails to train on alternative data where I’m intentionally including the target in the observations to make the task trivially easy.

```julia
using Flux.Tracker, Statistics, DelimitedFiles
using Flux.Tracker: Params, gradient, update!
using Flux: gpu

if false
    # Housing branch from the original example (UCI Boston housing data)
    isfile("housing.data") ||
        download("https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data", "housing.data")
    rawdata = readdlm("housing.data")'
    x = rawdata[1:13,:] |> gpu
    y = rawdata[14:14,:] |> gpu
else
    mydata = rand(4,100)
    mydata[1,:] = 1:100      # make row 1 perfectly linear
    y = mydata[1,:] |> gpu
    x = mydata[1:4,:] |> gpu # note that row 1 is included, which should make this easy
end
x = (x .- mean(x, dims = 2)) ./ std(x, dims = 2) # Normalise the data

# The model

W = param(randn(1,size(x,1))/10) |> gpu
b = param([0.]) |> gpu

predict(x) = W*x .+ b
meansquarederror(ŷ, y) = sum((ŷ .- y).^2)/size(y, 2)
loss(x, y) = meansquarederror(predict(x), y)

η = 0.1
θ = Params([W, b])

for i = 1:10
    g = gradient(() -> loss(x, y), θ)
    for p in θ
        update!(p, -g[p]*η)
    end
    @show loss(x, y)
end
```

For the housing data, the loss converges:

```
loss(x, y) = 366.7167121752129 (tracked)
loss(x, y) = 241.6449236032499 (tracked)
loss(x, y) = 162.82818106521944 (tracked)
loss(x, y) = 112.60388537993906 (tracked)
loss(x, y) = 80.53242187380584 (tracked)
loss(x, y) = 60.02268797814429 (tracked)
loss(x, y) = 46.88571428121097 (tracked)
loss(x, y) = 38.45495197071315 (tracked)
loss(x, y) = 33.03116747681997 (tracked)
loss(x, y) = 29.53055357861564 (tracked)
```

For the dummy data with the target doped into the inputs, there is no convergence; the loss diverges:

```
loss(x, y) = 1.019095800916605e14 (tracked)
loss(x, y) = 4.072325124649639e20 (tracked)
loss(x, y) = 1.6273116284718889e27 (tracked)
loss(x, y) = 6.502797472585939e33 (tracked)
loss(x, y) = 2.5985528119199597e40 (tracked)
loss(x, y) = 1.0384021896078046e47 (tracked)
loss(x, y) = 4.1495761865196735e53 (tracked)
loss(x, y) = 1.6582425704352902e60 (tracked)
loss(x, y) = 6.626766565216528e66 (tracked)
loss(x, y) = 2.6483129014595432e73 (tracked)
```

Adjusting the learning rate didn’t solve the issue. After 10 iterations:

- `η = 0.1`: loss increases, ending at `loss(x, y) = 2.6...e73`
- `η = 0.01`: loss increases, ending at `loss(x, y) = 2.4...e53`
- `η = 0.001`: loss increases, ending at `loss(x, y) = 9.5...e32`
- `η = 0.0001`: loss starts and ends at `loss(x, y) = 3.3...e7`

#2

Notice that when you run your code, `y` is a 1-dimensional array where `size(y)` is `(100,)`, whereas `y` in the original code is a 2-dimensional array where `size(y)` is `(1, 100)`. This makes a big difference when you write

```julia
meansquarederror(ŷ, y) = sum((ŷ .- y).^2)/size(y, 2)
loss(x, y) = meansquarederror(predict(x), y)
```

because `predict(x)` is a row vector and `y` is a 1-dimensional (column) vector. This means that `ŷ .- y` is really `predict(x) .- y`, which broadcasts into a matrix of size `(100, 100)`, not what you intended! To fix your code, change the following line:

```julia
# y = mydata[1,:] |> gpu
y = reshape(mydata[1,:], 1, :) |> gpu
```
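To make the shape mismatch concrete, here is a standalone check with random stand-in data (not the original script, just the shapes involved):

```julia
ŷ = rand(1, 100)        # stands in for predict(x): a 1×100 row matrix
y_bad = rand(100)       # mydata[1,:] drops a dimension: a length-100 vector
y_good = rand(1, 100)   # the 2-D shape the housing example uses

# Broadcasting a 1×100 row against a 100-element vector (treated as a
# column) pairs every column with every entry, giving a 100×100 matrix.
size(ŷ .- y_bad)   # (100, 100)
size(ŷ .- y_good)  # (1, 100)
```

With the `(100, 100)` matrix, `sum(...)/size(y, 2)` averages 10,000 mostly meaningless residuals, so the gradient pushes the parameters in the wrong direction and the loss explodes.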

#3

Oh! Right! There’s a difference between `mydata[1:1,:]` and `mydata[1,:]`: the first preserves dimensionality!
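For reference, a scalar index drops the indexed dimension while a unit range keeps it:

```julia
mydata = rand(4, 100)

size(mydata[1, :])    # (100,)   scalar index drops the first dimension
size(mydata[1:1, :])  # (1, 100) range index preserves it
```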

Thanks for catching that!