Hi,
I want to write a custom layer and a custom loss function, so I thought I’d start with the basic linear regression problem from the Flux tutorial and do the SGD from scratch for a custom layer in a Chain.
For the “custom” layer, I copied the source code of Dense from Flux and simply renamed it Linear.
The problem is that the loss goes towards infinity and I don’t really know why.
Any help is greatly appreciated.
Thank you in advance to anybody who takes the time to read this.
Below is the super basic code.
clearconsole()
println("LinearRegression.jl")
using Flux, Flux.Tracker
using Flux.Tracker: grad, update!
using Distributions
using Statistics # for mean
using Plots
num_samples = 50
x_noise_std = 0.01
y_noise_std = 0.5
function generate_linear_data()
    x = reshape(range(-1, stop=1, length=num_samples), num_samples, 1)
    x_noise = rand(Normal(0, x_noise_std), num_samples) # note: currently unused
    y_noise = rand(Normal(0, y_noise_std), num_samples)
    y = 1 .* x .+ 3 .+ y_noise
    if false # set to true to plot the raw data
        display(scatter(x, y))
        error("Exited in function generate_linear_data()")
    end
    x = reshape(x, 1, num_samples)
    y = reshape(y, 1, num_samples)
    return x, y
end
X, Y = generate_linear_data() # Training data of shape (1,50)
# Copied code from Dense layer and simply renamed it
struct Linear{F,S,T}
    W::S
    b::T
    σ::F
end
Linear(W, b) = Linear(W, b, identity)
function Linear(in::Integer, out::Integer, σ = identity)
    return Linear(param(randn(out, in)), param(zeros(out)), σ)
end
Flux.@treelike Linear
function (a::Linear)(x::AbstractArray)
    W, b, σ = a.W, a.b, a.σ
    σ.(W*x .+ b)
end
layer = Linear(1, 1)
# layer = Flux.Dense(1,1)
model = Chain(layer)
criterion(x, y) = mean((model(x) .- y).^2) # MSE; applies the model to x internally
θ = Flux.params(model)
opt = Flux.Descent(0.01)
for itr = 1:100
    pred = layer(X) # Full batch training with size(X) = (1, 50)
    loss = criterion(pred, Y)
    println(loss)
    grads = Tracker.gradient(() -> loss, θ)
    for p in θ
        # println("p ", p)
        update!(opt, p, grads[p])
    end
end
ŷ = Tracker.data(model(X))
scatter([transpose(X) transpose(X)], [transpose(Y) transpose(ŷ)], layout=(2,1))