Training an LSTM model for a time series: poor performance

Hi everyone,
I’m trying to train an LSTM model to forecast a time series.
At this stage training runs, but performance is really poor: the loss values barely change between iterations. I added a picture that illustrates the problem.
(figure: LSTM training loss)

I’m sure something is wrong, but I cannot tell what.
How can I change it so that it trains properly?

using Random, Statistics
using CSV, DataFrames, MLDatasets
using ComponentArrays, DataInterpolations, Lux, MLUtils
using Optimization, Zygote
using Optimisers, OptimizationOptimisers
using Optim, OptimizationOptimJL
using Plots

function get_data()
    data_fn = raw"File csv/Tinterna uncontrolled.csv"
    data_df = CSV.read(data_fn, DataFrame; header = false)

    train_D = convert.(Float32, data_df[1:4320, 1])
    test_D = convert.(Float32, data_df[10000:11800, 1])

    train_D, test_D
end

function get_model()
    Lux.LSTMCell(1 => 1)
    # Lux.Dense(1 => 1, tanh)
end

function fit_it_1(model, train_X, train_Y)
    rng = Random.default_rng()
    ps, st = Lux.setup(rng, model)
    ps = ComponentArray(ps)
    # opt = NewtonTrustRegion()
    opt = Lion()
    optfunc = OptimizationFunction((p, theta) -> loss(model, p, st, train_X, train_Y), AutoZygote())
    optprob = OptimizationProblem(optfunc, ps)
    res = solve(optprob, opt; callback, maxiters = 100)
    ps .= res.u
    ps, st
end

function fit_it_2(model, train_X, train_Y)
    rng = Random.default_rng()
    ps, st = Lux.setup(rng, model)
    ps = ComponentArray(ps)

    learningrate = [0.08, 0.05, 0.01, 0.007, 0.005, 0.001]

    for lr in learningrate
        opt = Optimisers.Adam(lr)
        opt_state = Optimisers.setup(opt, ps)

        for i in 1:30
            gs = gradient(p -> loss(model, p, st, train_X, train_Y), ps)[1]
            opt_state, ps = Optimisers.update!(opt_state, ps, gs)
            @show loss(model, ps, st, train_X, train_Y)
        end
    end

    ps, st
end

function evaluate_it(model, ps, st, test_X, test_Y) end

function plot_it(model, ps, st, test_X, test_Y, test_mean, test_std)
    pred, st = model(test_X', ps, st)
    pred = pred[1]
    pred = pred[1, :]
    pred = (pred .* test_std) .+ test_mean

    plt1 = plot(pred; label = "prediction")
    plt1 = plot!(plt1, test_Y; ylabel = "Temperatura [°C]", label = "data")
    plt1
end

function loss(model, ps, st, train_X, train_Y)
    pred, st = model(train_X', ps, st)
    pred = pred[1]
    pred = pred[1, :]
    mean(abs2, train_Y - pred)
end

function callback(ps, l)
    @show l
    false
end

function main()
    train_D, test_D = get_data()
    train_mean, train_std = mean(train_D), std(train_D)
    test_mean, test_std = mean(test_D), std(test_D)
    train_data = (train_D .- train_mean) ./ train_std
    train_data = vcat(train_data[2:end], train_data[1])
    test_data = (test_D .- test_mean) ./ test_std
    test_data = vcat(test_data[2:end], test_data[1])
    model = get_model()
    ps, st = fit_it_1(model, train_data, train_D)
    evaluate_it(model, ps, st, test_data, test_D)

    plt1 = plot_it(model, ps, st, train_data, train_D, train_mean, train_std)
    savefig(plt1, "fig-train.png")

    plt2 = plot_it(model, ps, st, test_data, test_D, test_mean, test_std)
    savefig(plt2, "fig-test.png")
end

main()

Here is a link to the CSV file:

Tinterna uncontrolled.csv - Google Drive

I guess your model is way too small, e.g., LSTMCell(1 => 1) has just 12 trainable parameters. You probably want to try something much larger and deeper …
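
For reference, a quick way to check this (a hedged sketch, assuming your Lux version exports parameterlength; the exact count depends on how the cell's biases are stored):

using Lux
# On the order of a dozen trainable parameters for the tiny cell ...
Lux.parameterlength(Lux.LSTMCell(1 => 1))
# ... versus considerably more capacity for a wider one.
Lux.parameterlength(Lux.LSTMCell(8 => 16))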

I thought about that, but it doesn’t let me create a Chain of layers. It gives me a “no method matching” error.

What did you try? Something like Lux.Chain(Lux.Dense(1 => 8), Lux.LSTMCell(8 => 16), Lux.Dense(16 => 1)) seems to work for me …

Sorry, looks like I was too quick, as the above does not run.

The following seems to work though:

rng = Random.default_rng()
model = Lux.Chain(Lux.Dense(1 => 4), Lux.Recurrence(Lux.LSTMCell(4 => 8)), Lux.Dense(8 => 1))
ps, st = Lux.setup(rng, model)
# You can then pass an input of inputdim x seqlen x batchsize
x = randn(rng, Float32, 1, 16, 10);
model(x, ps, st)
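
To make the layout concrete, here is a hedged sketch of how a 1-D series could be cut into that inputdim × seqlen × batchsize shape (seqlen and batchsize are arbitrary illustrative values; model, ps, st and rng are the ones set up above):

# Arrange a series of length 160 as 10 sequences of 16 steps, 1 feature each.
series = randn(rng, Float32, 160)
seqlen, batchsize = 16, 10
xseq = reshape(series, 1, seqlen, batchsize)
# Recurrence returns only the last time step by default, so y is 1 × batchsize.
y, _ = model(xseq, ps, st)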

It seems to work, but I am having some trouble while training. In fact, when I try to evaluate the gradients it gives me the following error:

MethodError: no method matching (::var"#29#30"{Array{Float32, 3}})(::NamedTuple{(:weight_i, :weight_h, :bias), Tuple{Matrix{Float32}, Matrix{Float32}, Matrix{Float32}}})

Closest candidates are:
(::var"#29#30")()
@ Main In[242]:17

This is the current version of the code:

using Random, Statistics, CSV, DataFrames, MLDatasets, ComponentArrays, DataInterpolations, Lux, MLUtils, Optimization, Zygote
using Optimisers, OptimizationOptimisers, Optim, OptimizationOptimJL, Plots, Flux
data_fn = raw"File csv/Tinterna uncontrolled.csv"
data_df = CSV.read(data_fn, DataFrame; header = false)

data_fn = raw"File csv/Rad solare.csv"
sun_df = CSV.read(data_fn, DataFrame; header = false)

data_fn = raw"File csv/Testerna totale.csv"
text_df = CSV.read(data_fn, DataFrame; header = false)
train_temp = convert.(Float32, text_df[1:5000, 1])
train_temp = (train_temp .- mean(train_temp)) ./ std(train_temp)

#train_D = convert.(Float32, data_df[1:4320, 1])
train_D = convert.(Float32, data_df[1:5000, 1])
#train_D = convert.(Float32, data_df[1:15000, 1])
test_D = convert.(Float32, data_df[10000:11800, 1])

train_D, test_D

train_mean, train_std = mean(train_D), std(train_D)
test_mean, test_std = mean(test_D), std(test_D)
train_data = (train_D .- train_mean) ./ train_std
train_data = vcat(train_data[2:end], train_data[1])
#temp_data = vcat(train_temp[2:end], train_temp[1])

test_data = (test_D .- test_mean) ./ test_std
test_data = vcat(test_data[2:end], test_data[1]);

#train_data = (hcat(train_data, temp_data));
#train_data = (vcat(train_data, temp_data));
rng = Random.default_rng()
model = Lux.Chain(Lux.Dense(1 => 4), Lux.Recurrence(Lux.LSTMCell(4 => 8)), Lux.Dense(8 => 1))
ps, st = Lux.setup(rng, model)

# You can then pass an input of inputdim x seqlen x batchsize
x = randn(rng, Float32, 1, 16, 10)

ydata = reshape(train_D, (1, 500, 10))

#train_data = reshape(train_data, (1, 200, 10))
train_data = reshape(train_data, (1, 500, 10))

model(train_data, ps, st)[1]
# output:
# 1×10 Matrix{Float32}:
#  0.294091  0.302725  0.309891  0.298465  …  0.336742  0.304669  0.306705
num_samples = 10
batch_size = 1  # each mini-batch contains one sample
loss_fn(y_pred, y_true) = Flux.mse(y_pred, y_true)

opt = Optimisers.Adam(0.05)
opt_state = Optimisers.setup(opt, ps)

for batch_start in 1:batch_size:num_samples
    batch_end = min(batch_start + batch_size - 1, num_samples)
    x_batch = train_data[:, batch_start:batch_end, :]
    y_batch = ydata[:, batch_start:batch_end, :]
    y_batch = reshape(y_batch, (1, size(y_batch, 3)))

    gs = gradient(ps) do
        y_pred = model(x_batch, ps, st)[1]
        loss_value = loss_fn(y_pred, y_batch)
        return loss_value
    end
    opt_state, ps = Optimisers.update!(opt_state, ps, gs)

    # y_pred = model(x_batch, ps, st)[1]
    # l = loss_fn(y_pred, y_batch)
    # println(x_batch)
    # println("Batch: $batch_start - $batch_end")
    println("Loss: $l")
end

Try replacing

gs = gradient(ps) do
[...]

with

gs = gradient(ps) do ps
[...]

The function to differentiate at ps needs to take an argument. It’s what the method error is complaining about; it only found a method that takes no arguments:

Closest candidates are:
(::var"#29#30")() #<-- empty argument list
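
A minimal standalone example of the two equivalent forms (hypothetical function f, just to show the syntax):

using Zygote
f(x) = sum(abs2, x)
gradient(x -> f(x), [1.0, 2.0])   # explicit anonymous function
gradient([1.0, 2.0]) do x         # do-block form: the block must name its argument
    f(x)
end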

If I do it like that, it gives me another type of error:

type Tuple has no field layer_1

I’m sorry but I’m not familiar with this.
Thank you

Ah, I think you need to do

opt_state, ps = Optimisers.update!(opt_state, ps, gs[1])

because gradient always returns a tuple.
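
Putting the two fixes together, a hedged sketch of the corrected training step (same variable names as in your code above; not tested against your data):

for batch_start in 1:batch_size:num_samples
    batch_end = min(batch_start + batch_size - 1, num_samples)
    x_batch = train_data[:, batch_start:batch_end, :]
    y_batch = ydata[:, batch_start:batch_end, :]
    y_batch = reshape(y_batch, (1, size(y_batch, 3)))

    # The do-block takes the parameters as an explicit argument ...
    gs = gradient(ps) do p
        y_pred = model(x_batch, p, st)[1]
        loss_fn(y_pred, y_batch)
    end
    # ... and update! gets the first element of the returned tuple.
    opt_state, ps = Optimisers.update!(opt_state, ps, gs[1])

    l = loss_fn(model(x_batch, ps, st)[1], y_batch)
    println("Loss: $l")
end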


Thank you!