FluxML Basic Custom Layer with Custom Loss Function

Hi,

I want to write a custom layer and a custom loss function, so I thought I’d start with the basic linear regression problem from the Flux tutorial and do SGD from scratch for a custom layer in a Chain.
For the “custom” layer, I copied the source code of Dense from Flux and simply renamed it Linear.

The problem is that the loss goes towards infinity and I don’t really know why.
Any help is greatly appreciated.

Thank you in advance to anybody who is taking the time to read this.

Below is the super basic code.

clearconsole() # Juno-specific: clears the console
println("LinearRegression.jl")

using Flux, Flux.Tracker
using Distributions
using Plots
using Flux.Tracker: grad, update!
using Statistics: mean # mean lives in Statistics on Julia ≥ 0.7

num_samples = 50
x_noise_std = 0.01
y_noise_std = 0.5

function generate_linear_data()
    x = reshape(range(-1, stop=1, length=num_samples),num_samples,1)
    x_noise = rand(Normal(0,x_noise_std), num_samples)
    y_noise = rand(Normal(0,y_noise_std), num_samples)

    y = 1 .* x .+ 3 .+ y_noise

    if false # flip to true to plot the raw data and stop
        display(scatter(x, y))
        error("Exited in function generate_linear_data()")
    end

    x = reshape(x, 1, num_samples)
    y = reshape(y, 1, num_samples)

    return x, y
end

X, Y = generate_linear_data() # Training data of shape (1,50)

# Copied code from Dense layer and simply renamed it
struct Linear{F,S,T}
    W::S
    b::T
    σ::F
end

Linear(W, b) = Linear(W, b, identity)

function Linear(in::Integer, out::Integer, σ = identity)
    return Linear(param(randn(out, in)), param(zeros(out)), σ)
end

Flux.@treelike Linear

function (a::Linear)(x::AbstractArray)
    W, b, σ = a.W, a.b, a.σ
    σ.(W*x .+ b)
end

layer = Linear(1, 1)
# layer = Flux.Dense(1,1)

criterion(x, y) = mean((model(x) .- y).^2) # note: runs a forward pass on its first argument
model = Chain(layer)
θ = Flux.params(model)
opt = Flux.Descent(0.01)

for itr=1:100

    pred = layer(X) # Full batch training with size(X)=(1,50)
    loss = criterion(pred,Y)
    println(loss)
    grads = Tracker.gradient(() -> loss, θ)
    for p in θ
        # println("p ", p)
        update!(opt, p, grads[p])
    end
end

ŷ = Tracker.data(model(X))
scatter([transpose(X) transpose(X)], [transpose(Y) transpose(ŷ)], layout=(2,1))


I haven’t run your code yet (I will when I’m back in front of my laptop), but any time your loss goes to infinity, you should ask yourself whether you’re missing a negative sign in the gradient updates. Try update!(opt, p, -grads[p]). This also changed recently in the latest Flux, which might explain what’s going on.
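For reference, that change applied to the loop above would look like this (a sketch, using the same Tracker-era API and names as the original post):

for itr = 1:100
    pred = layer(X)
    loss = criterion(pred, Y)
    grads = Tracker.gradient(() -> loss, θ)
    for p in θ
        update!(opt, p, -grads[p]) # minus sign added: step against the gradient
    end
end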


In the current implementation of update! a minus sign is missing, so you need to put it in by hand as @jpsamaroo suggested.
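If you would rather not depend on the sign convention of update!(opt, p, Δ) at all, the two-argument Tracker form makes the descent direction explicit (a sketch; η is just a local name for the learning rate):

η = 0.01
for p in θ
    update!(p, -η .* grads[p]) # shifts p.data by -η·grad and resets the stored gradient
end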


Thank you for your answers.
I found my mistake: my definition criterion(x, y) = mean((model(x) .- y).^2) was the culprit.
Since I passed pred = model(X) into it, I was doing two forward passes through the linear layer while computing the loss.
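Side by side (with pred computed in the training loop in both cases):

criterion(x, y) = mean((model(x) .- y).^2) # buggy: criterion(pred, Y) evaluates model(model(X))
criterion(x, y) = mean((x .- y).^2)        # fixed: the only forward pass is pred = model(X)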

Below is the working code:

clearconsole() # Juno-specific: clears the console
println("LinearRegression.jl")

using Flux, Flux.Tracker
using Distributions
using Plots
using Flux.Tracker: grad, update!
using Statistics: mean # mean lives in Statistics on Julia ≥ 0.7

num_samples = 50
x_noise_std = 0.01
y_noise_std = 0.25

function generate_linear_data()
    x = reshape(range(-1, stop=1, length=num_samples),num_samples,1)
    x_noise = rand(Normal(0,x_noise_std), num_samples)
    y_noise = rand(Normal(0,y_noise_std), num_samples)

    y = 3 .* x .+ y_noise #.+ 3

    if false
        display(scatter(x, y))
        error("Exited in function generate_linear_data()")
    end

    x = transpose(x)
    y = transpose(y)

    return x, y
end

X, Y = generate_linear_data() # Training data of shape (1,50)

# Copied code from Dense layer and simply renamed it
struct Linear{F,S,T}
    W::S
    b::T
    σ::F
end

Linear(W, b) = Linear(W, b, identity)

function Linear(in::Integer, out::Integer, σ = identity)
    return Linear(param(randn(out, in)), param(randn(out)), σ)
end

Flux.@treelike Linear

function (a::Linear)(x::AbstractArray)
    W, b, σ = a.W, a.b, a.σ
    return σ.(W*x .+ b) # σ defaults to identity, so this is just W*x .+ b
end

layer = Linear(1, 1)
# layer = Flux.Dense(1,1)

model = Chain(layer)
criterion(x, y) = mean((x .- y).^2)
θ = Flux.params(model)
opt = Flux.Descent(-0.1) # negative rate compensates for the missing minus sign in update! (see above)
println("θ ", θ)
for itr=1:300

    pred = model(X) # Full batch training with size(X)=(1,50)
    loss = criterion(pred,Y)
    # println(loss)
    grads = Tracker.gradient(() -> loss, θ)
    for p in θ
        # println("p ", p)
        update!(opt, p, grads[p])
    end
end

println(θ)

ŷ = Tracker.data(model(X))
display(scatter([transpose(X) transpose(X)], [transpose(Y) transpose(ŷ)], layout=(1,1)))
println(criterion(model(X),Y))
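As a quick sanity check, the learned parameters can be inspected directly; generate_linear_data produces data with slope 3 and intercept 0, so they should land near those values:

println("W ≈ ", Tracker.data(layer.W)) # should be close to 3
println("b ≈ ", Tracker.data(layer.b)) # should be close to 0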
