Flux unable to fit an obvious relationship, based on a standard example that works

I don’t understand why the housing example found here, which works as expected, fails to train on alternative data where I intentionally include the target among the observations to make the problem really easy.

using Flux.Tracker, Statistics, DelimitedFiles
using Flux.Tracker: Params, gradient, update!
using Flux: gpu

if false # set to true for the housing data; false uses the dummy data below
    isfile("housing.data") ||
      download("https://raw.githubusercontent.com/MikeInnes/notebooks/master/housing.data","housing.data")
    rawdata = readdlm("housing.data")'
    x = rawdata[1:13,:] |> gpu
    y = rawdata[14:14,:] |> gpu
else
    mydata = rand(4,100)
    mydata[1,:] = 1:100      # make row 1 perfectly linear
    y = mydata[1,:] |> gpu
    x = mydata[1:4,:] |> gpu # note that row 1 (the target) is included, which should make this easy
end
x = (x .- mean(x, dims = 2)) ./ std(x, dims = 2) # Normalise the data

# The model

W = param(randn(1,size(x,1))/10) |> gpu
b = param([0.]) |> gpu

predict(x) = W*x .+ b
meansquarederror(ŷ, y) = sum((ŷ .- y).^2)/size(y, 2)
loss(x, y) = meansquarederror(predict(x), y)

η = 0.1
θ = Params([W, b])

for i = 1:10
  g = gradient(() -> loss(x, y), θ)
  for p in θ               # p ranges over the parameters W and b
    update!(p, -g[p]*η)
  end
  @show loss(x, y)
end

For the housing data:

loss(x, y) = 366.7167121752129 (tracked)
loss(x, y) = 241.6449236032499 (tracked)
loss(x, y) = 162.82818106521944 (tracked)
loss(x, y) = 112.60388537993906 (tracked)
loss(x, y) = 80.53242187380584 (tracked)
loss(x, y) = 60.02268797814429 (tracked)
loss(x, y) = 46.88571428121097 (tracked)
loss(x, y) = 38.45495197071315 (tracked)
loss(x, y) = 33.03116747681997 (tracked)
loss(x, y) = 29.53055357861564 (tracked)

For the dummy doped data, there is no convergence:

loss(x, y) = 1.019095800916605e14 (tracked)
loss(x, y) = 4.072325124649639e20 (tracked)
loss(x, y) = 1.6273116284718889e27 (tracked)
loss(x, y) = 6.502797472585939e33 (tracked)
loss(x, y) = 2.5985528119199597e40 (tracked)
loss(x, y) = 1.0384021896078046e47 (tracked)
loss(x, y) = 4.1495761865196735e53 (tracked)
loss(x, y) = 1.6582425704352902e60 (tracked)
loss(x, y) = 6.626766565216528e66 (tracked)
loss(x, y) = 2.6483129014595432e73 (tracked)

Adjusting the learning rate didn’t solve the issue. After 10 iterations:
η = 0.1 increases and ends with loss(x, y) = 2.6...e73
η = 0.01 increases and ends with loss(x, y) = 2.4...e53
η = 0.001 increases and ends with loss(x, y) = 9.5...e32
η = 0.0001 starts and ends with loss(x, y) = 3.3...e7

Notice that when you run your code, y is a 1-dimensional array where size(y) is (100,), whereas y in the original code is a 2-dimensional array where size(y) is (1, 100). This makes a big difference when you write

meansquarederror(ŷ, y) = sum((ŷ .- y).^2)/size(y, 2)
loss(x, y) = meansquarederror(predict(x), y)

because predict(x) is a row vector and y is a 1-dimensional (column) vector. This means that ŷ .- y is really predict(x) .- y, which gives you a matrix of size (100, 100), not what you intended! (And since size(y, 2) is 1 for a 1-dimensional array, the loss sums all 10,000 squared entries without averaging, so the gradients are enormous and gradient descent overshoots at any reasonable η.) To fix your code, change the following line:

# y = mydata[1,:] |> gpu
y = reshape(mydata[1,:], 1, :) |> gpu
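
For reference, here is a minimal REPL sketch of the shapes involved (the names y_vec and y_row are just for illustration):

ŷ = ones(1, 100)                  # a row vector, like predict(x)
y_vec = ones(100)                 # a 1-dimensional vector, like mydata[1,:]
y_row = reshape(ones(100), 1, :)  # a 1×100 matrix, like the reshape fix above
size(ŷ .- y_vec)                  # (100, 100): broadcasting pairs every entry of ŷ with every entry of y_vec
size(ŷ .- y_row)                  # (1, 100): the elementwise difference you wanted
size(y_vec, 2)                    # 1, so meansquarederror never divides by 100
size(y_row, 2)                    # 100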

Oh! Right! There’s a difference between mydata[1:1,:] and mydata[1,:]… The first preserves dimensionality!

Thanks for catching that!
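
For anyone who hits this later, a quick illustration of that indexing rule (A here is just an arbitrary matrix):

A = rand(4, 100)
size(A[1, :])    # (100,): a scalar index drops that dimension
size(A[1:1, :])  # (1, 100): a range index preserves it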