# Flux not able to fit a strong signal, based on a standard example that works

#1

I don’t understand why the housing example found here, which works as expected, fails to train on alternative data where I’m intentionally including the target in the observations to make the task trivially easy.

```julia
using Flux.Tracker, Statistics, DelimitedFiles
using Flux.Tracker: Params, gradient, update!
using Flux: gpu

if false
    # Housing branch from the original example (UCI Boston housing data)
    isfile("housing.data") ||
        download("https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data", "housing.data")
    rawdata = readdlm("housing.data")'
    x = rawdata[1:13,:] |> gpu
    y = rawdata[14:14,:] |> gpu
else
    mydata = rand(4,100)
    mydata[1,:] = 1:100      # make row 1 perfectly linear
    y = mydata[1,:] |> gpu
    x = mydata[1:4,:] |> gpu # note that row 1 is included, which should make this easy
end
x = (x .- mean(x, dims = 2)) ./ std(x, dims = 2) # Normalise the data

# The model

W = param(randn(1,size(x,1))/10) |> gpu
b = param([0.]) |> gpu

predict(x) = W*x .+ b
meansquarederror(ŷ, y) = sum((ŷ .- y).^2)/size(y, 2)
loss(x, y) = meansquarederror(predict(x), y)

η = 0.1
θ = Params([W, b])

for i = 1:10
    g = gradient(() -> loss(x, y), θ)
    for p in θ
        update!(p, -g[p]*η)
    end
    @show loss(x, y)
end
```

For the housing data, the loss converges:

```
loss(x, y) = 366.7167121752129 (tracked)
loss(x, y) = 241.6449236032499 (tracked)
loss(x, y) = 162.82818106521944 (tracked)
loss(x, y) = 112.60388537993906 (tracked)
loss(x, y) = 80.53242187380584 (tracked)
loss(x, y) = 60.02268797814429 (tracked)
loss(x, y) = 46.88571428121097 (tracked)
loss(x, y) = 38.45495197071315 (tracked)
loss(x, y) = 33.03116747681997 (tracked)
loss(x, y) = 29.53055357861564 (tracked)
```

For the dummy data with the target doped into the inputs, there is no convergence; the loss diverges:

```
loss(x, y) = 1.019095800916605e14 (tracked)
loss(x, y) = 4.072325124649639e20 (tracked)
loss(x, y) = 1.6273116284718889e27 (tracked)
loss(x, y) = 6.502797472585939e33 (tracked)
loss(x, y) = 2.5985528119199597e40 (tracked)
loss(x, y) = 1.0384021896078046e47 (tracked)
loss(x, y) = 4.1495761865196735e53 (tracked)
loss(x, y) = 1.6582425704352902e60 (tracked)
loss(x, y) = 6.626766565216528e66 (tracked)
loss(x, y) = 2.6483129014595432e73 (tracked)
```

Adjusting the learning rate didn’t solve the issue. After 10 iterations:

- `η = 0.1`: loss increases, ending at `loss(x, y) = 2.6...e73`
- `η = 0.01`: loss increases, ending at `loss(x, y) = 2.4...e53`
- `η = 0.001`: loss increases, ending at `loss(x, y) = 9.5...e32`
- `η = 0.0001`: loss starts and ends at `loss(x, y) = 3.3...e7`

#2

Notice that when you run your code, `y` is a 1-dimensional array where `size(y)` is `(100,)`, whereas `y` in the original code is a 2-dimensional array where `size(y)` is `(1, 100)`. This makes a big difference when you write

```julia
meansquarederror(ŷ, y) = sum((ŷ .- y).^2)/size(y, 2)
loss(x, y) = meansquarederror(predict(x), y)
```

because `predict(x)` is a row vector and `y` is a 1-dimensional (column) vector. This means that `ŷ .- y` is really `predict(x) .- y`, which broadcasts into a matrix of size `(100, 100)`, not what you intended! To fix your code, change the following line:

```julia
# y = mydata[1,:] |> gpu
y = reshape(mydata[1,:], 1, :) |> gpu
```
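To make the shape mismatch concrete, here is a standalone check with random stand-in data (not the original script, just the shapes involved):

```julia
ŷ = rand(1, 100)        # stands in for predict(x): a 1×100 row matrix
y_bad = rand(100)       # mydata[1,:] drops a dimension: a length-100 vector
y_good = rand(1, 100)   # the 2-D shape the housing example uses

# Broadcasting a 1×100 row against a 100-element vector (treated as a
# column) pairs every column with every entry, giving a 100×100 matrix.
size(ŷ .- y_bad)   # (100, 100)
size(ŷ .- y_good)  # (1, 100)
```

With the `(100, 100)` matrix, `sum(...)/size(y, 2)` averages 10,000 mostly meaningless residuals, so the gradient pushes the parameters in the wrong direction and the loss explodes.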

#3

Oh! Right! There’s a difference between `mydata[1:1,:]` and `mydata[1,:]`: the first preserves dimensionality!
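For reference, a scalar index drops the indexed dimension while a unit range keeps it:

```julia
mydata = rand(4, 100)

size(mydata[1, :])    # (100,)   scalar index drops the first dimension
size(mydata[1:1, :])  # (1, 100) range index preserves it
```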

Thanks for catching that!