Training an LSTM model for a time series: poor performance

Hi everyone,
I’m trying to train an LSTM model to forecast a time series.
At this stage training runs, but performance is really poor: the loss values barely change between iterations. I added a picture that illustrates the problem.
(figure: LSTM training loss)

I’m sure something is wrong, but I cannot tell what.
How can I change it so that it trains properly?

using Random, Statistics
using CSV, DataFrames, MLDatasets
using ComponentArrays, DataInterpolations, Lux, MLUtils
using Optimization, Zygote
using Optimisers, OptimizationOptimisers
using Optim, OptimizationOptimJL
using Plots

function get_data()
    data_fn = raw"File csv/Tinterna uncontrolled.csv"
    data_df = CSV.read(data_fn, DataFrame; header = false)

    train_D = convert.(Float32, data_df[1:4320, 1])
    test_D = convert.(Float32, data_df[10000:11800, 1])

    train_D, test_D
end

function get_model()
    Lux.LSTMCell(1 => 1)
    # Lux.Dense(1 => 1, tanh)
end

function fit_it_1(model, train_X, train_Y)
    rng = Random.default_rng()
    ps, st = Lux.setup(rng, model)
    ps = ComponentArray(ps)
    # opt = NewtonTrustRegion()
    opt = Lion()
    optfunc = OptimizationFunction((p, theta) -> loss(model, p, st, train_X, train_Y), AutoZygote())
    optprob = OptimizationProblem(optfunc, ps)
    res = solve(optprob, opt; callback, maxiters = 100)
    ps .= res.u
    ps, st
end

function fit_it_2(model, train_X, train_Y)
    rng = Random.default_rng()
    ps, st = Lux.setup(rng, model)
    ps = ComponentArray(ps)

    learningrate = [0.08, 0.05, 0.01, 0.007, 0.005, 0.001]

    for lr in learningrate
        opt = Optimisers.Adam(lr)
        opt_state = Optimisers.setup(opt, ps)

        for i in 1:30
            gs = gradient(p -> loss(model, p, st, train_X, train_Y), ps)[1]
            opt_state, ps = Optimisers.update!(opt_state, ps, gs)
            @show loss(model, ps, st, train_X, train_Y)
        end
    end

    ps, st
end

function evaluate_it(model, ps, st, test_X, test_Y) end

function plot_it(model, ps, st, test_X, test_Y, test_mean, test_std)
    pred, st = model(test_X', ps, st)
    pred = pred[1]
    pred = pred[1, :]
    pred = (pred .* test_std) .+ test_mean

    plt1 = plot(pred; label = "prediction")
    plt1 = plot!(plt1, test_Y; ylabel = "Temperatura [°C]", label = "data")
    plt1
end

function loss(model, ps, st, train_X, train_Y)
    pred, st = model(train_X', ps, st)
    pred = pred[1]
    pred = pred[1, :]
    mean(abs2, train_Y - pred)
end

function callback(ps, l)
    @show l
    false
end

function main()
    train_D, test_D = get_data()
    train_mean, train_std = mean(train_D), std(train_D)
    test_mean, test_std = mean(test_D), std(test_D)
    train_data = (train_D .- train_mean) ./ train_std
    train_data = vcat(train_data[2:end], train_data[1])
    test_data = (test_D .- test_mean) ./ test_std
    test_data = vcat(test_data[2:end], test_data[1])
    model = get_model()
    ps, st = fit_it_1(model, train_data, train_D)
    evaluate_it(model, ps, st, test_data, test_D)

    plt1 = plot_it(model, ps, st, train_data, train_D, train_mean, train_std)
    savefig(plt1, "fig-train.png")

    plt2 = plot_it(model, ps, st, test_data, test_D, test_mean, test_std)
    savefig(plt2, "fig-test.png")
end

main()

Here is a link to the CSV file:

Tinterna uncontrolled.csv - Google Drive

I guess your model is way too small, e.g., LSTMCell(1 => 1) has just 12 trainable parameters. You probably want to try something much larger and deeper …
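
For reference, a quick way to check this (a hedged sketch, assuming your Lux version exports parameterlength; the exact count depends on how the cell's biases are stored):

using Lux
# On the order of a dozen trainable parameters for the tiny cell ...
Lux.parameterlength(Lux.LSTMCell(1 => 1))
# ... versus considerably more capacity for a wider one.
Lux.parameterlength(Lux.LSTMCell(8 => 16))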

I thought about that, but it doesn’t let me create a Chain of layers. It gives me a “no method matching” error.

What did you try? Something like Lux.Chain(Lux.Dense(1 => 8), Lux.LSTMCell(8 => 16), Lux.Dense(16 => 1)) seems to work for me …

Sorry, looks like I was too quick, as the above does not run.

The following seems to work though:

rng = Random.default_rng()
model = Lux.Chain(Lux.Dense(1 => 4), Lux.Recurrence(Lux.LSTMCell(4 => 8)), Lux.Dense(8 => 1))
ps, st = Lux.setup(rng, model)
# You can then pass an input of inputdim x seqlen x batchsize
x = randn(rng, Float32, 1, 16, 10);
model(x, ps, st)
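
To make the layout concrete, here is a hedged sketch of how a 1-D series could be cut into that inputdim × seqlen × batchsize shape (seqlen and batchsize are arbitrary illustrative values; model, ps, st and rng are the ones set up above):

# Arrange a series of length 160 as 10 sequences of 16 steps, 1 feature each.
series = randn(rng, Float32, 160)
seqlen, batchsize = 16, 10
xseq = reshape(series, 1, seqlen, batchsize)
# Recurrence returns only the last time step by default, so y is 1 × batchsize.
y, _ = model(xseq, ps, st)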

It seems to work, but I am having some trouble while training. In fact, when I try to evaluate the gradients it gives me the following error:

MethodError: no method matching (::var"#29#30"{Array{Float32, 3}})(::NamedTuple{(:weight_i, :weight_h, :bias), Tuple{Matrix{Float32}, Matrix{Float32}, Matrix{Float32}}})

Closest candidates are:
(::var"#29#30")()
@ Main In[242]:17

This is the current version of the code:

using Random, Statistics, CSV, DataFrames, MLDatasets, ComponentArrays, DataInterpolations, Lux, MLUtils, Optimization, Zygote
using Optimisers, OptimizationOptimisers, Optim, OptimizationOptimJL, Plots, Flux
data_fn = raw"File csv/Tinterna uncontrolled.csv"
data_df = CSV.read(data_fn, DataFrame; header = false)

data_fn = raw"File csv/Rad solare.csv"
sun_df = CSV.read(data_fn, DataFrame; header = false)

data_fn = raw"File csv/Testerna totale.csv"
text_df = CSV.read(data_fn, DataFrame; header = false)
train_temp = convert.(Float32, text_df[1:5000, 1])
train_temp = (train_temp .- mean(train_temp)) ./ std(train_temp)

#train_D = convert.(Float32, data_df[1:4320, 1])
train_D = convert.(Float32, data_df[1:5000, 1])
#train_D = convert.(Float32, data_df[1:15000, 1])
test_D = convert.(Float32, data_df[10000:11800, 1])

train_D, test_D

train_mean, train_std = mean(train_D), std(train_D)
test_mean, test_std = mean(test_D), std(test_D)
train_data = (train_D .- train_mean) ./ train_std
train_data = vcat(train_data[2:end], train_data[1])
#temp_data = vcat(train_temp[2:end], train_temp[1])

test_data = (test_D .- test_mean) ./ test_std
test_data = vcat(test_data[2:end], test_data[1]);

#train_data = (hcat(train_data, temp_data));
#train_data = (vcat(train_data, temp_data));
rng = Random.default_rng()
model = Lux.Chain(Lux.Dense(1 => 4), Lux.Recurrence(Lux.LSTMCell(4 => 8)), Lux.Dense(8 => 1))
ps, st = Lux.setup(rng, model)

# You can then pass an input of inputdim x seqlen x batchsize
x = randn(rng, Float32, 1, 16, 10)

ydata = reshape(train_D, (1, 500, 10))

#train_data = reshape(train_data, (1, 200, 10))
train_data = reshape(train_data, (1, 500, 10))

model(train_data, ps, st)[1]
# output:
# 1×10 Matrix{Float32}:
#  0.294091  0.302725  0.309891  0.298465  …  0.336742  0.304669  0.306705
num_samples = 10
batch_size = 1  # each mini-batch contains one sample
loss_fn(y_pred, y_true) = Flux.mse(y_pred, y_true)

opt = Optimisers.Adam(0.05)
opt_state = Optimisers.setup(opt, ps)

for batch_start in 1:batch_size:num_samples
    batch_end = min(batch_start + batch_size - 1, num_samples)
    x_batch = train_data[:, batch_start:batch_end, :]
    y_batch = ydata[:, batch_start:batch_end, :]
    y_batch = reshape(y_batch, (1, size(y_batch, 3)))

    gs = gradient(ps) do
        y_pred = model(x_batch, ps, st)[1]
        loss_value = loss_fn(y_pred, y_batch)
        return loss_value
    end
    opt_state, ps = Optimisers.update!(opt_state, ps, gs)

    # y_pred = model(x_batch, ps, st)[1]
    # l = loss_fn(y_pred, y_batch)
    # println(x_batch)
    # println("Batch: $batch_start - $batch_end")
    println("Loss: $l")
end

Try replacing

gs = gradient(ps) do
[...]

with

gs = gradient(ps) do ps
[...]

The function to differentiate at ps needs to take an argument. It’s what the method error is complaining about; it only found a method that takes no arguments:

Closest candidates are:
(::var"#29#30")() #<-- empty argument list
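
A minimal standalone example of the two equivalent forms (hypothetical function f, just to show the syntax):

using Zygote
f(x) = sum(abs2, x)
gradient(x -> f(x), [1.0, 2.0])   # explicit anonymous function
gradient([1.0, 2.0]) do x         # do-block form: the block must name its argument
    f(x)
end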

If I do it like that, it gives me another type of error:

type Tuple has no field layer_1

I’m sorry but I’m not familiar with this.
Thank you

Ah, I think you need to do

opt_state, ps = Optimisers.update!(opt_state, ps, gs[1])

because gradient always returns a tuple.
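
Putting the two fixes together, a hedged sketch of the corrected training step (same variable names as in your code above; not tested against your data):

for batch_start in 1:batch_size:num_samples
    batch_end = min(batch_start + batch_size - 1, num_samples)
    x_batch = train_data[:, batch_start:batch_end, :]
    y_batch = ydata[:, batch_start:batch_end, :]
    y_batch = reshape(y_batch, (1, size(y_batch, 3)))

    # The do-block takes the parameters as an explicit argument ...
    gs = gradient(ps) do p
        y_pred = model(x_batch, p, st)[1]
        loss_fn(y_pred, y_batch)
    end
    # ... and update! gets the first element of the returned tuple.
    opt_state, ps = Optimisers.update!(opt_state, ps, gs[1])

    l = loss_fn(model(x_batch, ps, st)[1], y_batch)
    println("Loss: $l")
end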


Thank you!