Flux unable to fit an obvious relationship, based on a standard example that works

I don’t understand why the housing example found here, which works as expected, fails to train on alternative data where I intentionally include the target among the observations to make the problem really easy.

using Flux.Tracker, Statistics, DelimitedFiles
using Flux.Tracker: Params, gradient, update!
using Flux: gpu

if false # set to true for the housing data; false uses the dummy data below
    isfile("housing.data") ||
      download("https://raw.githubusercontent.com/MikeInnes/notebooks/master/housing.data","housing.data")
    rawdata = readdlm("housing.data")'
    x = rawdata[1:13,:] |> gpu
    y = rawdata[14:14,:] |> gpu
else
    mydata = rand(4,100)
    mydata[1,:] = 1:100      # make row 1 perfectly linear
    y = mydata[1,:] |> gpu
    x = mydata[1:4,:] |> gpu # note that row 1 (the target) is included, which should make this easy
end
x = (x .- mean(x, dims = 2)) ./ std(x, dims = 2) # Normalise the data

# The model

W = param(randn(1,size(x,1))/10) |> gpu
b = param([0.]) |> gpu

predict(x) = W*x .+ b
meansquarederror(ŷ, y) = sum((ŷ .- y).^2)/size(y, 2)
loss(x, y) = meansquarederror(predict(x), y)

η = 0.1
θ = Params([W, b])

for i = 1:10
  g = gradient(() -> loss(x, y), θ)
  for p in θ               # p ranges over the parameters W and b
    update!(p, -g[p]*η)
  end
  @show loss(x, y)
end

For the housing data:

loss(x, y) = 366.7167121752129 (tracked)
loss(x, y) = 241.6449236032499 (tracked)
loss(x, y) = 162.82818106521944 (tracked)
loss(x, y) = 112.60388537993906 (tracked)
loss(x, y) = 80.53242187380584 (tracked)
loss(x, y) = 60.02268797814429 (tracked)
loss(x, y) = 46.88571428121097 (tracked)
loss(x, y) = 38.45495197071315 (tracked)
loss(x, y) = 33.03116747681997 (tracked)
loss(x, y) = 29.53055357861564 (tracked)

For the dummy doped data, there is no convergence:

loss(x, y) = 1.019095800916605e14 (tracked)
loss(x, y) = 4.072325124649639e20 (tracked)
loss(x, y) = 1.6273116284718889e27 (tracked)
loss(x, y) = 6.502797472585939e33 (tracked)
loss(x, y) = 2.5985528119199597e40 (tracked)
loss(x, y) = 1.0384021896078046e47 (tracked)
loss(x, y) = 4.1495761865196735e53 (tracked)
loss(x, y) = 1.6582425704352902e60 (tracked)
loss(x, y) = 6.626766565216528e66 (tracked)
loss(x, y) = 2.6483129014595432e73 (tracked)

Adjusting the learning rate didn’t solve the issue. After 10 iterations:
η = 0.1 increases and ends with loss(x, y) = 2.6...e73
η = 0.01 increases and ends with loss(x, y) = 2.4...e53
η = 0.001 increases and ends with loss(x, y) = 9.5...e32
η = 0.0001 starts and ends with loss(x, y) = 3.3...e7

Notice that when you run your code, y is a 1-dimensional array where size(y) is (100,), whereas y in the original code is a 2-dimensional array where size(y) is (1, 100). This makes a big difference when you write

meansquarederror(ŷ, y) = sum((ŷ .- y).^2)/size(y, 2)
loss(x, y) = meansquarederror(predict(x), y)

because predict(x) is a row vector and y is a 1-dimensional (column) vector. This means that ŷ .- y is really predict(x) .- y, which gives you a matrix of size (100, 100), not what you intended! (And since size(y, 2) is 1 for a 1-dimensional array, the loss sums all 10,000 squared entries without averaging, so the gradients are enormous and gradient descent overshoots at any reasonable η.) To fix your code, change the following line:

# y = mydata[1,:] |> gpu
y = reshape(mydata[1,:], 1, :) |> gpu
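
For reference, here is a minimal REPL sketch of the shapes involved (the names y_vec and y_row are just for illustration):

ŷ = ones(1, 100)                  # a row vector, like predict(x)
y_vec = ones(100)                 # a 1-dimensional vector, like mydata[1,:]
y_row = reshape(ones(100), 1, :)  # a 1×100 matrix, like the reshape fix above
size(ŷ .- y_vec)                  # (100, 100): broadcasting pairs every entry of ŷ with every entry of y_vec
size(ŷ .- y_row)                  # (1, 100): the elementwise difference you wanted
size(y_vec, 2)                    # 1, so meansquarederror never divides by 100
size(y_row, 2)                    # 100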

Oh! Right! There’s a difference between mydata[1:1,:] and mydata[1,:]… The first preserves dimensionality!

Thanks for catching that!
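
For anyone who hits this later, a quick illustration of that indexing rule (A here is just an arbitrary matrix):

A = rand(4, 100)
size(A[1, :])    # (100,): a scalar index drops that dimension
size(A[1:1, :])  # (1, 100): a range index preserves it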