I don’t understand why the housing example found here, which works as expected, fails to train on alternative data in which I intentionally include the target among the observations to make the problem trivially easy.

```
using Flux.Tracker, Statistics, DelimitedFiles
using Flux.Tracker: Params, gradient, update!
using Flux: gpu

if false
    isfile("housing.data") ||
        download("https://raw.githubusercontent.com/MikeInnes/notebooks/master/housing.data", "housing.data")
    rawdata = readdlm("housing.data")'
    x = rawdata[1:13, :] |> gpu
    y = rawdata[14:14, :] |> gpu
else
    mydata = rand(4, 100)
    mydata[1, :] = 1:100        # make row 1 really linear (1, 2, ..., 100)
    y = mydata[1, :] |> gpu
    x = mydata[1:4, :] |> gpu   # note that row 1 is included, which should make this easy
end

x = (x .- mean(x, dims = 2)) ./ std(x, dims = 2)  # normalise the data

# The model
W = param(randn(1, size(x, 1)) / 10) |> gpu
b = param([0.]) |> gpu

predict(x) = W * x .+ b
meansquarederror(ŷ, y) = sum((ŷ .- y) .^ 2) / size(y, 2)
loss(x, y) = meansquarederror(predict(x), y)

η = 0.1
θ = Params([W, b])

for i = 1:10
    g = gradient(() -> loss(x, y), θ)
    for x in θ
        update!(x, -g[x] * η)
    end
    @show loss(x, y)
end
```
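For context, here are the shapes involved in the dummy branch (a quick standalone check in plain Julia, mirroring the assignments above; no Flux or GPU needed, and the `rand(1, 4)` matrix just stands in for `W`):

```
mydata = rand(4, 100)
mydata[1, :] = 1:100
y = mydata[1, :]      # same assignment as in the script
x = mydata[1:4, :]

@show size(x)                      # (4, 100)
@show size(y)                      # (100,)
@show size(rand(1, 4) * x .+ 0.0)  # shape of W*x .+ b, i.e. predict(x): (1, 100)
```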

For the housing data, the loss decreases as expected:

```
loss(x, y) = 366.7167121752129 (tracked)
loss(x, y) = 241.6449236032499 (tracked)
loss(x, y) = 162.82818106521944 (tracked)
loss(x, y) = 112.60388537993906 (tracked)
loss(x, y) = 80.53242187380584 (tracked)
loss(x, y) = 60.02268797814429 (tracked)
loss(x, y) = 46.88571428121097 (tracked)
loss(x, y) = 38.45495197071315 (tracked)
loss(x, y) = 33.03116747681997 (tracked)
loss(x, y) = 29.53055357861564 (tracked)
```

For the dummy "doped" data, there is no convergence:

```
loss(x, y) = 1.019095800916605e14 (tracked)
loss(x, y) = 4.072325124649639e20 (tracked)
loss(x, y) = 1.6273116284718889e27 (tracked)
loss(x, y) = 6.502797472585939e33 (tracked)
loss(x, y) = 2.5985528119199597e40 (tracked)
loss(x, y) = 1.0384021896078046e47 (tracked)
loss(x, y) = 4.1495761865196735e53 (tracked)
loss(x, y) = 1.6582425704352902e60 (tracked)
loss(x, y) = 6.626766565216528e66 (tracked)
loss(x, y) = 2.6483129014595432e73 (tracked)
```
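In case it helps with the diagnosis, a variant of the training loop that also prints the parameter and gradient norms each step could look like this (a sketch, assuming the same Tracker-era API as the script above; `Tracker.data` strips the tracking for printing, and `LinearAlgebra.norm` is the only extra import):

```
using LinearAlgebra: norm

for i = 1:10
    g = gradient(() -> loss(x, y), θ)
    for p in θ
        # magnitude of each parameter and of its gradient before the update
        @show norm(Tracker.data(p)), norm(Tracker.data(g[p]))
        update!(p, -g[p] * η)
    end
    @show loss(x, y)
end
```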

Adjusting the learning rate didn’t solve the issue. After 10 iterations:

- `η = 0.1`: loss increases, ending at `loss(x, y) = 2.6...e73`
- `η = 0.01`: loss increases, ending at `loss(x, y) = 2.4...e53`
- `η = 0.001`: loss increases, ending at `loss(x, y) = 9.5...e32`
- `η = 0.0001`: loss starts and ends at `loss(x, y) = 3.3...e7`
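For reference, a scripted version of that sweep might look roughly like this (a sketch: it re-initialises `W` and `b` for each `η`, declared `global` so that `predict` picks up the fresh parameters, and reuses the `x`, `y`, and `loss` definitions from the script above):

```
for η in (0.1, 0.01, 0.001, 0.0001)
    # fresh parameters for each learning rate so the runs are independent
    global W = param(randn(1, size(x, 1)) / 10) |> gpu
    global b = param([0.]) |> gpu
    θ = Params([W, b])
    for i = 1:10
        g = gradient(() -> loss(x, y), θ)
        for p in θ
            update!(p, -g[p] * η)
        end
    end
    println("η = $η  =>  final loss = ", loss(x, y))
end
```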