Alright so I got there eventually and I’ll explain what I did that got good results for me. Please chime in if I misrepresent anything. I’ve gone into a good bit of detail so others can follow the logic, but the logic may not be correct!
My first problem was using a GRU instead of an LSTM. From what I observed, the GRU just couldn't capture the longer-term patterns, so maybe try an LSTM first. Trying to force the GRU took me on a circuitous but educational path.
Packaging the Data
My data was ~2 months of observations at 30-minute intervals, with 1 column being DateTime, the next 4 being weather (temp, cloud cover, windspeed, winddir), and the last 4 being energy (solar production, wind production, day ahead price, and imbalance price). Some things that helped me:
- Normalize the data. I went with standardizing at first because I wanted the possibility of predicting extreme values for the price data, but for otherwise good models this produced negative energy production, which is nonsensical. Since I wanted to keep this a multi-target LSTM regression task, every target goes through the same final activation, so every target had to be scaled the same way. Once I normalized everything, the results were good enough that I didn't explore whether there's an elegant way to do this and capture more nuance. (There's a small end-to-end sketch of the scaling and batching right after this list.)
- Timestep size and batch size. The Flux docs for recurrence do make this clear, but it took me a minute to really set it up correctly.
- Let X be a vector of your input data, where each element of X is the collection of features at one time step. Each sample is one column with as many rows as there are input features. The reason you may want many columns is so that your gradients are taken over a larger batch and the optimizer makes more general improvements rather than looking at a single time. The order of the columns makes no difference as long as you are consistent across time steps: X[1][:, 1] should be the observation immediately preceding X[2][:, 1], and in general X[t][:, k] should be the observation immediately preceding X[t+1][:, k]. So, looking at an abridged set:
Time | Wind | Day Ahead Price (DAP)
1    | W1   | D1
2    | W2   | D2
3    | W3   | D3
4    | W4   | D4
5    | W5   | D5
6    | W6   | D6
7    | W7   | D7
batched data would look like this, where each element in batched_data is a Matrix of size features × batchsize. The columns are in this order just to show that the order of the columns doesn't matter as long as it is consistent across each Matrix in batched_data.
batched_data = [
[
W1 W3 W4 W2
D1 D3 D4 D2
],
[
W2 W4 W5 W3
D2 D4 D5 D3
],
[
W3 W5 W6 W4
D3 D5 D6 D4
],
]
- Y is the vector representing what you want to predict. Y should have the same overall length as X, and the number of columns in each element should match, but the number of rows is however many targets you want to predict. If you want to make predictions for every feature, then you just take the next time step of X. For me, I only wanted to predict the energy data, or DAP in this abridged example. So Y looks like:
batched_targets = [
[
D2 D4 D5 D3
],
[
D3 D5 D6 D4
],
[
D4 D6 D7 D5
],
]
Finally, I zipped these into a Vector of NamedTuples for my own convenience:
train_data = [(past=x, next=y) for (x,y) in zip(batched_data, batched_targets)]
- One-step ahead. Maybe this is obvious, but it was the crux of my original questions. The most sensible thing to do is to predict one step, then use that prediction plus any other information available for that time to predict the next step, and so on. I got hung up on how to provide the model with all the information I have available right now. For this problem I wanted to make 24 hours of predictions beginning at 8 AM, with the assumption that I would have all weather and energy data up to 8 AM plus weather forecasts for the next 24 hours. The short version of the story is that this was a bad time. (The forecast function at the end shows how the one-step-ahead loop works in practice.)
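Since the scaling and batching above tripped me up the most, here is a minimal end-to-end sketch of that packaging. It uses toy random data standing in for the Wind/DAP columns, and the min-max scaling, sequence length, and batch size here are illustrative assumptions rather than my exact setup:

# toy data: 2 feature rows (Wind, DAP) × 100 time steps, standing in for the real series
raw = rand(2, 100)

# min-max scale each feature row into [0, 1] so a sigmoid output layer can reproduce it
lo = minimum(raw, dims=2)
hi = maximum(raw, dims=2)
scaled = (raw .- lo) ./ (hi .- lo)
# to undo later: raw ≈ scaled .* (hi .- lo) .+ lo

# batching: one features × batchsize Matrix per time step,
# where column k of step t immediately precedes column k of step t+1
seq_len   = 3      # time steps per sequence
batchsize = 4      # columns per Matrix
T = size(scaled, 2)
starts = rand(1:T-seq_len, batchsize)   # starting time index for each column (order doesn't matter)

batched_data    = [scaled[:, starts .+ (t - 1)] for t in 1:seq_len]   # all features at step t
batched_targets = [scaled[2:2, starts .+ t]     for t in 1:seq_len]   # DAP (row 2) one step ahead

train_data = [(past=x, next=y) for (x, y) in zip(batched_data, batched_targets)]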
Building and Training the Model
Model
This is a model that worked for me.
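The dimension variables need to exist before building it; the values below are placeholders (not necessarily what I actually used), so set them to your own feature and target counts:

input_dims = 9    # number of feature rows in each input Matrix
hidden_dim = 32   # size of the LSTM hidden state, picked by trial and error
output_dim = 4    # number of targets (the 4 energy columns for me)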
using Flux

model = Chain(
    LSTM_in = LSTM(input_dims => hidden_dim),
    LSTM_hidden = LSTM(hidden_dim => hidden_dim),
    Dense_hidden1 = Dense(hidden_dim => hidden_dim),
    Dense_out = Dense(hidden_dim => output_dim, σ),
)

# Initialize the loss logs whenever the model is (re)built.
Flux.reset!(model)
train_log = [loss(model, first(Train))]
Flux.reset!(model)
test_log = [loss(model, first(Test))]
Loss
I stored my data as a Vector of NamedTuples so I could access it easily. I used MSE for my loss.
loss(m, X, Y) = Flux.mse(m(X), Y)
loss(m::Chain, traindata::NamedTuple) = loss(m, traindata.past, traindata.next)
Optimizer
Used Adam to optimize:
opt_state = Flux.setup(Adam(), model)
train! Function
function train!(opt_state, model, loss, traindata; train_log=[])
    L, ∇ = Flux.withgradient(loss, model, traindata)
    # Detect a loss of Inf or NaN: print a warning and skip the update
    if !isfinite(L)
        @warn "Loss value, \"$L\", is invalid."
    else
        Flux.update!(opt_state, model, ∇[1])
    end
    push!(train_log, L)
end
Training Loop
using Statistics: mean   # for averaging the per-epoch losses

for e in 1:train_epochs
    # gather the losses so I can take an average for each epoch; Train and Test are not
    # the same length, so per-epoch averages are what I compare between the two
    temp_log = []
    # recurrent networks need to have their internal state reset
    Flux.reset!(model)
    # before training, I condition the recurrent model by just calling it on the first batch in my train set
    model(first(Train).past)
    # now I train using my `train!` function for the rest of the data
    for T in Train[2:end]
        train!(opt_state, model, loss, T, train_log=temp_log)
    end
    # get the mean of my losses for this epoch
    push!(train_log, mean(temp_log))
    # reset the model, condition it on the first Test batch, then average the loss over the rest of the Test set and log it
    Flux.reset!(model)
    model(first(Test).past)
    push!(test_log, mean(loss(model, T) for T in Test[2:end]))
end
using Plots

# plot after each run of epochs to watch for progress and overfitting
plot(
    [train_log test_log],
    title="Loss Logging",
    xlabel="Total Training Epochs", ylabel="Loss (MSE)",
    label=["Train" "Test"],
    xticks=2 .^ (0:8),
    xscale=:log2)
Making Predictions
Below is the function I made to use the LSTM as intended. The point was to condition the model on all data up to the time immediately before the first forecast (t+1) and then:
- make the first forecast, at t+1. Store the result.
- combine that with the available weather data to make a forecast for t+2. Store the result.
- repeat until forecasting period is complete.
- handle all data transformations and outputs.
function forecast(data, firstforecast, lastforecast)
    # convert the forecast timestamps into row indices (reusing the argument names)
    firstforecast = findlast(row -> row ≤ firstforecast, data.timestamp_utc)
    lastforecast = findlast(row -> row < lastforecast, data.timestamp_utc)
    xfrm = standardize_df(data, transforms)
    M = xfrm |> Matrix |> transpose
    # condition the model on every time step before the first forecast
    Flux.reset!(model)
    for i in 1:firstforecast-1
        model(M[:, i])
    end
    # the first forecast uses only observed data
    results = [model(M[:, firstforecast])]
    # every later forecast feeds the known inputs for that step plus the previous prediction
    for j in firstforecast+1:lastforecast
        new_result = model([M[1:5, j]; last(results)])
        push!(results, new_result)
    end
    out_df = DataFrame(hcat(results...) |> transpose, energy_cols)
    out_df.timestamp_utc = data[firstforecast:lastforecast, :timestamp_utc]
    select!(out_df, :timestamp_utc, :)
    out_df = reconstruct_df(out_df, transforms)
end
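Calling it looks roughly like this; the timestamps here are made up for illustration, and data is the full history DataFrame described above with its :timestamp_utc column:

using Dates

firstforecast = DateTime(2023, 5, 1, 8, 0)   # 8 AM on some day near the end of the data
lastforecast  = firstforecast + Day(1)       # 24 hours of 30-minute forecasts
predictions = forecast(data, firstforecast, lastforecast)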
That’s it for my lessons learned for now. Would this be worth making a full tutorial for the model zoo?