FluxML Basic Custom Layer with Custom Loss Function

LudiWin · March 19, 2019, 12:01pm

Hi,

I want to write a custom layer and a custom loss function, and thought I’d start with the basic linear regression problem from the Flux tutorial, doing the SGD from scratch for a custom layer in a Chain.
For the “custom” layer, I copied the source code of Dense from Flux and simply renamed it Linear.

The problem is that the loss goes towards infinity and I don’t really know why.
Any help is greatly appreciated.

Thank you in advance to anybody who takes the time to read this.

Below is the super basic code.

clearconsole()
println("LinearRegression.jl")

using Flux, Flux.Tracker
using Distributions
using Plots
using Flux.Tracker: grad, update!

num_samples = 50
x_noise_std = 0.01
y_noise_std = 0.5

function generate_linear_data()
    x = reshape(range(-1, stop=1, length=num_samples),num_samples,1)
    x_noise = rand(Normal(0,x_noise_std), num_samples)
    y_noise = rand(Normal(0,y_noise_std), num_samples)

    y = 1 .* x .+ 3 .+ y_noise

    if false
        display(scatter(x, y))
        error("Exited in function generate_linear_data()")
    end

    x = reshape(x, 1, num_samples)
    y = reshape(y, 1, num_samples)

    return x, y
end

X, Y = generate_linear_data() # Training data of shape (1,50)

# Copied code from Dense layer and simply renamed it
struct Linear{F,S,T}
    W::S
    b::T
    σ::F
end

Linear(W, b) = Linear(W, b, identity)

function Linear(in::Integer, out::Integer, σ = identity)
    return Linear(param(randn(out, in)), param(zeros(out)), σ)
end

Flux.@treelike Linear

function (a::Linear)(x::AbstractArray)
    W, b, σ = a.W, a.b, a.σ
    σ.(W*x .+ b)
end

layer = Linear(1, 1)
# layer = Flux.Dense(1,1)

criterion(x, y) = mean((model(x) .- y).^2)
model = Chain(layer)
θ = Flux.params(model)
opt = Flux.Descent(0.01)

for itr=1:100

    pred = layer(X) # Full batch training with size(X)=(1,50)
    loss = criterion(pred,Y)
    println(loss)
    grads = Tracker.gradient(() -> loss, θ)
    for p in θ
        # println("p ", p)
        update!(opt, p, grads[p])
    end
end

ŷ = Tracker.data(model(X))
scatter([transpose(X) transpose(X)], [transpose(Y) transpose(ŷ)], layout=(2,1))
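For anyone who wants to see what the copied Dense code does without Flux's parameter tracking, here is a minimal plain-Julia sketch of the same callable-struct pattern. `PlainLinear` and `mse` are hypothetical names used only for illustration. Apart from `Statistics.mean` from the standard library, nothing else is assumed.

```julia
using Statistics  # mean

# Hypothetical Flux-free stand-in for the Linear layer above:
# a callable struct holding weights W, bias b and activation σ.
struct PlainLinear{F,S,T}
    W::S
    b::T
    σ::F
end

# Mirrors Linear(in, out, σ = identity): random weights, zero bias.
PlainLinear(in::Integer, out::Integer, σ = identity) =
    PlainLinear(randn(out, in), zeros(out), σ)

# Forward pass, same expression as Dense: σ.(W*x .+ b).
function (a::PlainLinear)(x::AbstractArray)
    W, b, σ = a.W, a.b, a.σ
    return σ.(W * x .+ b)
end

# The loss takes predictions ŷ, so the layer is applied exactly once.
mse(ŷ, y) = mean((ŷ .- y) .^ 2)

layer = PlainLinear(fill(2.0, 1, 1), [1.0], identity)
X = [0.0 1.0 2.0]          # one feature, three samples (1×3)
ŷ = layer(X)               # [1.0 3.0 5.0]
mse(ŷ, [1.0 3.0 5.0])      # 0.0
```

The bias is a vector of length `out`, so `W * x .+ b` broadcasts it across every column (sample) of the batch.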

jpsamaroo · March 19, 2019, 12:17pm

I haven’t run your code yet (I will when I’m back at my laptop), but whenever your loss goes to infinity, ask yourself whether you’re missing a negative sign in the gradient updates. Try update!(opt, p, -grads[p]). This also changed recently in the latest Flux, which might explain what’s going on.
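A small Flux-free sketch of why the sign matters. It fits toy data y = 3x + 1 (an assumption: 50 points in [-1, 1], no noise) with hand-written MSE gradients, then runs the same loop once with the descent update p ← p − η∇L and once with the sign flipped. The functions `loss`, `grads` and `train` are hypothetical illustrations, not Flux API.

```julia
using Statistics  # mean

# Toy data (assumed for illustration): y = 3x + 1, no noise.
x = collect(range(-1, stop=1, length=50))
y = 3 .* x .+ 1

loss(w, b) = mean((w .* x .+ b .- y) .^ 2)

# Analytic gradients of the mean squared error with respect to w and b.
function grads(w, b)
    r = w .* x .+ b .- y          # residuals
    return 2 * mean(r .* x), 2 * mean(r)
end

# Run `steps` updates of p ← p + sgn * η * ∇L.
# sgn = -1 is gradient descent; sgn = +1 is the "missing minus sign" bug.
function train(sgn; η = 0.1, steps = 100)
    w, b = 0.0, 0.0
    for _ in 1:steps
        gw, gb = grads(w, b)
        w += sgn * η * gw
        b += sgn * η * gb
    end
    return w, b, loss(w, b)
end

w, b, L = train(-1)    # descent: w ≈ 3, b ≈ 1, L ≈ 0
_, _, Lbad = train(+1) # ascent: the loss grows without bound
```

With the sign flipped, every step moves the parameters uphill, so the loss grows geometrically. That matches the "goes towards infinity" symptom.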

lazarusA · March 19, 2019, 1:40pm

In the current implementation for

a minus sign is missing, so you need to add it by hand, as @jpsamaroo suggested.
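If the update rule itself drops the minus sign, flipping the sign of the learning rate cancels it. This is presumably why `Flux.Descent(-0.1)` works in the code posted further down this thread. Below is a minimal sketch with two hypothetical update rules (not Flux's source) showing that the two sign flips cancel exactly.

```julia
# Two hypothetical update rules (not Flux's actual code):
# descent! implements p ← p - η g (correct gradient descent)
# nosign!  forgets the minus sign: p ← p + η g
descent!(p, g, η) = (p .-= η .* g; p)
nosign!(p, g, η)  = (p .+= η .* g; p)

g  = [0.2, 0.4]            # some gradient
p1 = [0.5, -1.0]
p2 = copy(p1)

descent!(p1, g, 0.1)       # correct rule, positive rate
nosign!(p2, g, -0.1)       # buggy rule, negative rate
p1 == p2                   # true: the two sign flips cancel
```

Passing `-grads[p]`, as suggested above, has the same effect. The only difference is whether the sign is flipped on the gradient or on the rate.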

LudiWin · March 19, 2019, 2:01pm

Thank you for your answers.
I found my mistake:
I defined “criterion(x, y) = mean((model(x) .- y).^2)”, and that was the culprit.
Because I passed pred = layer(X) into it, I did two forward passes through the linear layer when computing the loss.
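The effect of this mix-up can be reproduced without Flux. In this sketch the parameters are fixed and hypothetical (W = 2, b = 1), and the targets are exactly what the model predicts. The fixed criterion reports zero loss. The buggy one does not, because it scores model(model(X)).

```julia
using Statistics  # mean

# Hypothetical fixed 1×1 linear model standing in for `model`/`layer`.
W, b = fill(2.0, 1, 1), [1.0]
model(x) = W * x .+ b

# Buggy version: the criterion runs the model itself...
criterion_buggy(x, y) = mean((model(x) .- y) .^ 2)
# ...fixed version: the criterion only compares predictions and targets.
criterion_fixed(ŷ, y) = mean((ŷ .- y) .^ 2)

X = [0.0 1.0 2.0]
Y = [1.0 3.0 5.0]          # exactly what the model predicts
pred = model(X)            # [1.0 3.0 5.0]

criterion_fixed(pred, Y)   # 0.0: the perfect fit is recognised
criterion_buggy(pred, Y)   # ≈ 18.67: scores model(model(X)) = [3.0 7.0 11.0]
```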

Below is the working code:

clearconsole()
println("LinearRegression.jl")

using Flux, Flux.Tracker
using Distributions
using Plots
using Flux.Tracker: grad, update!

num_samples = 50
x_noise_std = 0.01
y_noise_std = 0.25

function generate_linear_data()
    x = reshape(range(-1, stop=1, length=num_samples),num_samples,1)
    x_noise = rand(Normal(0,x_noise_std), num_samples)
    y_noise = rand(Normal(0,y_noise_std), num_samples)

    y = 3 .* x .+ y_noise #.+ 3

    if false
        display(scatter(x, y))
        error("Exited in function generate_linear_data()")
    end

    x = transpose(x)
    y = transpose(y)

    return x, y
end

X, Y = generate_linear_data() # Training data of shape (1,50)

# Copied code from Dense layer and simply renamed it
struct Linear{F,S,T}
    W::S
    b::T
    σ::F
end

Linear(W, b) = Linear(W, b, identity)

function Linear(in::Integer, out::Integer, σ = identity)
    return Linear(param(randn(out, in)), param(randn(out)), σ)
end

Flux.@treelike Linear

function (a::Linear)(x::AbstractArray)
    W, b, σ = a.W, a.b, a.σ
    return σ.(W*x .+ b) # apply σ as Dense does (identity by default)
end

layer = Linear(1, 1)
# layer = Flux.Dense(1,1)

model = Chain(layer)
criterion(x, y) = mean((x .- y).^2)
θ = Flux.params(model)
opt = Flux.Descent(-0.1) # negative rate works around the missing minus sign in update!
println("θ ", θ)
for itr=1:300

    pred = model(X) # Full batch training with size(X)=(1,50)
    loss = criterion(pred,Y)
    # println(loss)
    grads = Tracker.gradient(() -> loss, θ)
    for p in θ
        # println("p ", p)
        update!(opt, p, grads[p])
    end
end

println(θ)

ŷ = Tracker.data(model(X))
display(scatter([transpose(X) transpose(X)], [transpose(Y) transpose(ŷ)], layout=(1,1)))
println(criterion(model(X),Y))
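One way to sanity-check the trained parameters is to compare them with the closed-form least-squares solution. The sketch below is plain Julia and assumes X and Y are plain (untracked) 1×N row matrices, as in the script above. `ols_fit` is a hypothetical helper, and the noise-free data is for illustration only.

```julia
# Closed-form least-squares fit of y ≈ W*x + b, used as a reference
# for the parameters found by gradient descent.
function ols_fit(X::AbstractMatrix, Y::AbstractMatrix)
    x, y = vec(X), vec(Y)             # 1×N row matrices → vectors
    A = [x ones(length(x))]           # design matrix with columns [x 1]
    W, b = A \ y                      # least-squares solve via `\`
    return W, b
end

# Noise-free example data (an assumption): y = 3x on [-1, 1].
X = reshape(collect(range(-1, stop=1, length=50)), 1, :)
Y = 3 .* X
W, b = ols_fit(X, Y)   # W ≈ 3.0, b ≈ 0.0
```

With the noisy data from the script, the result should be close to W = 3 and b = 0 without matching either exactly. It can then be compared with Tracker.data(layer.W) and Tracker.data(layer.b) after training.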
