Energy Time Series GRU with Historic and Forecasted Variables

Hello, all! I need conceptual assistance with how I should be using a GRU which uses historic data for variables 1:8 and forecasted variables for variables 1:4 to forecast the values for variables 5:8.

I have time series data for renewable energy generation (solar and wind), day-ahead price, and single system price in 30-minute intervals over about 6 weeks. I want to predict all four of these variables for the next h periods (T+1 through T+h). Additionally, I have brought in basic weather data, which I am treating as historic data up to time T and as forecasted values for T+1 through T+h.

E_{T+1:T+h} = m(W_{1:T}, W_{T+1:T+h}, E_{1:T})

  • E_t: Energy variables at time t. (Solar, Wind, DAP, SSP)
  • W_t: Weather variables at time t. (Temperature, Cloud Cover, Windspeed, Wind Direction)
  • T: End of historic data
  • h: forecast horizon
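# faking up some data.
using DataFrames

# historical and forecasted variables
# placeholder index column (assumed): 6 weeks × 7 days × 48 half-hour periods
data = DataFrame(timestamp = 1:2016)
# weather data (historical and forecasted)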
data.temperature = rand(Float32, size(data, 1))
data.cloudcover = rand(Float32, size(data, 1))
data.windspeed = rand(Float32, size(data, 1))
data.winddir = rand(Float32, size(data, 1))

# provided (historical) variables (TARGET)
data.Solar = rand(Float32, size(data, 1))
data.Wind = rand(Float32, size(data, 1))
data.DAP = rand(Float32, size(data, 1))
data.SSP = rand(Float32, size(data, 1))

Then I create a GRU model from Flux with:

m = Chain(
    GRU(8 => 4)
)

When I call m on an 8×n Matrix I get a 4×n Matrix. This makes it seem like the model is meant to predict other variable(s) for the same n periods. But from the reading I’ve done, it seems like I should be able to make predictions about the future, not just make “co-forecasts”, if that term makes sense.

m(data[1:72, 2:end] |> Matrix |> transpose)

Do I need to hardcode the training horizon? So if h = 1 day, then I should add a Dense layer whose output size is h = 48 and then take only the first 4 columns? This does not seem like the most direct or correct approach, but it does provide a useful shape, and I suppose the parameters would still be optimized?

m = Chain(
	GRU(8 => 4),
	Dense(4 => 48),
	x -> x[:, 1:4]
)

Also, even if the above is the case, I am still unclear on how to use both the historic and the forecasted data. Should I be working with two models like:

  • m(f(H), F) where F is forecasted data and f(H) is the output of the model on historic data?

Ultimately, I plan to pass this model to ConformalModels.jl to obtain probabilistic forecasts. If anybody knows a reason that wouldn’t work well please lmk!



Alright so I got there eventually and I’ll explain what I did that got good results for me. Please chime in if I misrepresent anything. I’ve gone into a good bit of detail so others can follow the logic, but the logic may not be correct!

My first problem was using a GRU instead of an LSTM. From what I observed, the GRU just couldn’t capture the longer term patterns. So maybe try an LSTM first. Trying to force the GRU took me on a circuitous, but educational path.

Packaging the Data

My data was ~2 months of data in 30-minute intervals with 1 column being DateTime, the next 4 being weather (temp, cloud cover, windspeed, winddir) and the last 4 being energy (solar production, wind production, day ahead price, and imbalance price). Some things that helped me:

  • Normalize the data. I went with standardizing at first because I wanted to keep the possibility of predicting extreme values for price data. Even for otherwise good models, this resulted in negative energy production, which is nonsensical. I wanted to keep this as a multi-target LSTM regression task, so the final activation had to be the same for every target, which meant every target had to be normalized the same way. The results I got were good enough once I did this, so I didn’t explore whether there’s a more elegant way to do this and capture more nuance.
  • Timestep size and batch size. The Flux docs for recurrence do make this clear, but it took me a minute to really set it up correctly.
    • Let X be a vector of your input data. Each element of X holds the features at one time step, stored as a Matrix in which each column is one sample and each row is one input feature. The reason you may want many columns is so that your gradients are taken over a larger batch and the optimizer makes more general improvements rather than fitting a single sample at a time. The order of the columns makes no difference as long as it is consistent across time steps. So X[1][:, 1] should be the observation immediately preceding X[2][:, 1], and in general X[t][:, k] should be the observation immediately before X[t+1][:, k]. So looking at an abridged set:
Time | Wind | Day Ahead Price (DAP)
1    | W1   | D1
2    | W2   | D2
3    | W3   | D3
4    | W4   | D4
5    | W5   | D5
6    | W6   | D6
7    | W7   | D7

Batched data would look like this, where each element in batched_data is a Matrix of features × batchsize. The columns are in this order just to show that the order of the columns doesn’t matter as long as it is consistent across each Matrix in batched_data.

batched_data = [
	[W1 W3 W4 W2
	 D1 D3 D4 D2],
	[W2 W4 W5 W3
	 D2 D4 D5 D3],
	[W3 W5 W6 W4
	 D3 D5 D6 D4],
]
  • Y is the vector representing what you want to predict. Y should have the same length as X, and each element should have the same number of columns as the matching element of X, but its rows are only the variables you want to predict. If you want to make predictions for every feature, then Y is just X shifted forward by one time step. For me, I only wanted to predict energy data, or DAP in this abridged example. So Y looks like:
batched_targets = [
	[D2 D4 D5 D3],
	[D3 D5 D6 D4],
	[D4 D6 D7 D5],
]

Finally, I zipped these into a Vector of NamedTuples for my own convenience:

train_data = [(past=x, next=y) for (x,y) in zip(batched_data, batched_targets)]
  • One-step ahead. Maybe this is obvious, but this was the crux of my original questions. The most sensible thing to do is predict one step, then use that prediction together with any other information available for that time to predict the next step, and so on. I got hung up on the question of how to provide the model with all the information I have available right now. For this problem I wanted to make 24 hours of predictions beginning at 8 AM, with the assumption that I would have all weather and energy data up to 8 AM and weather forecasts for the next 24 hours. The short version of the story is that this was a bad time.
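
The batching layout described above, where X[t][:, k] immediately precedes X[t+1][:, k] and Y holds the next step of the target rows, can be sketched in plain Julia. This is only an illustration: `make_batches`, `starts`, and the toy numbers are hypothetical and are not taken from my actual code.

```julia
# Hypothetical helper (not from the original post): build time-major batches
# from a features × time matrix. Column k of every batch follows one sequence
# that begins at time index starts[k].
function make_batches(series::AbstractMatrix, starts::AbstractVector{<:Integer},
                      seqlen::Integer, target_rows)
    # each sequence needs seqlen inputs plus one extra step for the last target
    @assert maximum(starts) + seqlen <= size(series, 2)
    X = [series[:, starts .+ (t - 1)] for t in 1:seqlen]
    Y = [series[target_rows, starts .+ t] for t in 1:seqlen]
    return X, Y
end

# Toy series matching the abridged table: row 1 is Wind, row 2 is DAP,
# columns are time steps 1 to 7. Values are made up so the layout is easy to check.
wind = Float32[1, 2, 3, 4, 5, 6, 7]
dap  = Float32[10, 20, 30, 40, 50, 60, 70]
series = permutedims(hcat(wind, dap))   # 2 × 7

starts = [1, 3, 4, 2]                   # same column order as in the example above
X, Y = make_batches(series, starts, 3, 2:2)   # 2:2 keeps the DAP row as a 1 × 4 Matrix
train_data = [(past = x, next = y) for (x, y) in zip(X, Y)]
```

With these toy values, X[1] holds time steps 1, 3, 4, 2 and Y[1] holds the DAP values for steps 2, 4, 5, 3, which mirrors the W/D layout written out above.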

Building and Training the Model


This is a model that worked for me.

model = Chain(
	LSTM_in = LSTM(input_dims => hidden_dim),
	LSTM_hidden = LSTM(hidden_dim => hidden_dim),
	Dense_hidden1 = Dense(hidden_dim => hidden_dim),
	Dense_out = Dense(hidden_dim => output_dim, σ),
)

# Initializing loss logs whenever the model is built.
train_log = [loss(model, first(Train))]

test_log = [loss(model, first(Test))]


I stored my data as a vector of NamedTuple so I could access it easily. I used MSE for my loss.

loss(m, X, Y) = Flux.mse(m(X), Y)
loss(m::Chain, traindata::NamedTuple) = loss(m, traindata.past, traindata.next)


Used Adam to optimize:

opt_state = Flux.setup(Adam(), model)

train! Function

function train!(opt_state, model, loss, traindata; train_log=[])
	L, ∇ = Flux.withgradient(loss, model, traindata)

	# Detect loss of Inf or NaN. Print a warning, and then skip update!
	if !isfinite(L)
		@warn "Loss value, \"$L\", is invalid."
	else
		Flux.update!(opt_state, model, ∇[1])
	end

	push!(train_log, L)
end

Training Loop

for e in 1:train_epochs
	# gather the per-batch losses so each epoch can be averaged and compared to the loss from the test set.
	# The two sets are not the same length, so I want an average at each epoch (`mean` needs `using Statistics`)
	temp_log = []
	# recurrent networks need to have their internal state reset
	Flux.reset!(model)
	# before training, I condition the recurrent model by just calling it on the first batch in my train set
	model(first(Train).past)

	# now I train using my `train!` function for the rest of the data.
	for T in Train[2:end]
		train!(opt_state, model, loss, T, train_log=temp_log)
	end

	# get the mean of my losses for this epoch
	push!(train_log, mean(temp_log))
	# reset the model, condition it on the first Test batch, then average the loss over the rest of the Test set and put it in the test log
	Flux.reset!(model)
	model(first(Test).past)
	push!(test_log, mean(loss(model, T) for T in Test[2:end]))
end

# plotting after each run of epochs to watch for progress and overfitting (assumes Plots.jl is loaded)
plot(
	[train_log test_log],
	title="Loss Logging",
	xlabel="Total Training Epochs", ylabel="Loss (MSE)",
	label=["Train" "Test"],
	xticks=2 .^ (0:8),
)

Making Predictions

Below is the function I made to use the LSTM as intended. The point was to condition the model on all data up to the time step before the first forecast (t+1) and then:

  • make a forecast for the first forecast at t+1. Store the result.
  • combine that with the available weather data to make a forecast for t+2. Store the result.
  • repeat until forecasting period is complete.
  • handle all data transformations and outputs.
function forecast(data, firstforecast, lastforecast)
	# convert the requested timestamps into row indices
	firstforecast = findlast(row -> row ≤ firstforecast, data.timestamp_utc)
	lastforecast = findlast(row -> row < lastforecast, data.timestamp_utc)
	xfrm = standardize_df(data, transforms)
	M = xfrm |> Matrix |> transpose

	# condition the model on every observation before the first forecast
	# (the reset is assumed here so the hidden state starts clean, as in the training loop)
	Flux.reset!(model)
	for i in 1:firstforecast-1
		model(M[:, i])
	end

	results = [model(M[:, firstforecast])]
	for j in firstforecast+1:lastforecast
		# rows 1:5 are the known inputs at step j; the energy rows come from the previous prediction
		new_result = model([M[1:5, j]; last(results)])
		push!(results, new_result)
	end

	out_df = DataFrame(hcat(results...) |> transpose, energy_cols)
	out_df.timestamp_utc = data[firstforecast:lastforecast, :timestamp_utc]
	select!(out_df, :timestamp_utc, :)
	out_df = reconstruct_df(out_df, transforms)
	return out_df
end
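
The forecast function relies on `standardize_df`, `reconstruct_df`, and `transforms`, which I haven't shown here. As a rough sketch only, and assuming min-max scaling to [0, 1] (which keeps the targets inside the range of the σ output layer), a pair of helpers like these could play that role. The names `fit_minmax`, `scale`, and `unscale` are hypothetical stand-ins, and they work on a NamedTuple of columns rather than a DataFrame to keep the example dependency-free.

```julia
# Hypothetical stand-ins for the scaling helpers; names and behavior are assumptions.

# Record the (min, max) of every column so the scaling can be undone later.
fit_minmax(cols::NamedTuple) = map(v -> (lo = minimum(v), hi = maximum(v)), cols)

# Scale each column to [0, 1] using the stored ranges.
# Note: a constant column (hi == lo) would divide by zero and needs special handling.
scale(cols::NamedTuple, ranges) =
    map((v, r) -> (v .- r.lo) ./ (r.hi - r.lo), cols, ranges)

# Map scaled values (for example model outputs) back to the original units.
unscale(cols::NamedTuple, ranges) =
    map((v, r) -> v .* (r.hi - r.lo) .+ r.lo, cols, ranges)

# Made-up values, only to show the round trip.
raw = (Solar = Float32[0, 5, 10], DAP = Float32[-20, 30, 80])
ranges = fit_minmax(raw)          # plays the role of `transforms`
scaled = scale(raw, ranges)       # what the model would be trained on
restored = unscale(scaled, ranges)
```

The ranges should be fitted on the training data only and then reused for the test data and for the forecasts, so that predictions are mapped back with the same transformation the model was trained on.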

That’s it for my lessons learned for now. Would this be worth making a full tutorial for the model zoo?