# How to handle/ignore missing values when fitting differential equations?

Hi,

I have a time series of hourly weather records that you can check here. For every hour in these two days (July 16-17, 2018), we have a measurement of temperature and other variables. I am interested in using Julia to fit differential equations to my data, so I can assess whether this setup would work for short-term weather forecasting (like in this example). However, as often happens with real data, there are some missing values in the CSV; that is, for some hours no measurements were taken.

I was wondering whether the differential equations library can actually work with gaps in the time series, or whether it is mandatory to provide a value at each time step (i.e. each hour in this case). I could impute or interpolate the missing values, but these would not be real measurements, and that could be misleading during the fitting process.

Is it possible to pass data with gaps in such a setup?
If so, can you give any hints on how to do this?

Assuming you're using DiffEqFlux, the missings should only show up in the loss function that you write, so you just need to be careful there. For example, subtract the solution from your data and you'll get missings wherever the data has gaps; then drop the missings before you do `sum(abs2, x)` for the sum of squared errors, and it will do what you're looking for on that kind of loss. For other losses you'd do a similar missing-dropping step after the subtraction against the data.
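
A minimal sketch of that idea in plain Julia, assuming the data is held in a vector with `missing` entries. The numbers and the names `data`, `pred`, and `residual` are made up for illustration, and `pred` is only a stand-in for the ODE solution at the same time points:

```julia
# Hypothetical hourly temperatures with one gap (values invented for illustration).
data = [20.1, 19.8, missing, 18.9]

# Stand-in for the model prediction at the same four time points;
# in practice this would come from solving the ODE.
pred = [20.0, 20.0, 19.0, 19.0]

# Subtraction propagates `missing`, so the residual is missing
# exactly where the data is missing.
residual = data .- pred

# `skipmissing` drops those entries, so only observed hours
# contribute to the sum of squared errors.
loss = sum(abs2, skipmissing(residual))
```

Without `skipmissing`, `sum(abs2, residual)` would itself be `missing`. Note that `NaN` is an ordinary floating-point value rather than `missing`, so `skipmissing` does not drop it; gaps encoded as `NaN` would need something like `filter(!isnan, residual)` instead.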


Hi Chris,

Thanks for your reply! Ok, so it seems there is a way for `DiffEqFlux` to proceed with the fitting regardless of the missing values. Since I am very new to `Julia`+`DiffEqFlux`, I have a couple of extra questions, just to have it clear in my mind:

• Loss function: If I got this correctly, the key would be defining a loss function that can handle these `missing` or `NaN` values for a given timestamp. In this example from DiffEqFlux, an L2 loss function is defined as follows:
```julia
function loss_n_ode(p)
    pred = predict_n_ode(p)
    loss = sum(abs2, ode_data .- pred) # L2 loss, I guess
    loss, pred
end
```

So in the event that this function receives a `missing` / `NaN`, and we add the robust loss function you kindly suggested, it could be rewritten as:

```julia
function loss_n_ode(p)
    clean = filter(!isnan, p)      # <--
    pred = predict_n_ode(clean)    # <--
    loss = sum(abs2, pred)         # <--
    loss, pred
end
```

So this would not crash in the event one of my weather values is `NaN`.

• Delete empty rows of DataFrame? While writing the previous item, I realized that a missing value is also an empty row in my DataFrame: a gap in the temporal axis. Maybe the function `loss_n_ode` should not receive any `NaN` at all, because that row should be removed beforehand. So, if I remove all these empty rows from my DataFrame, would it still be OK to use DiffEqFlux with an irregularly sampled temporal axis? I thought the temporal axis had to be "segmented" in equal units, but perhaps I am wrong on that.
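
For concreteness, here is a rough sketch in plain Julia of what the row-dropping idea could look like, assuming the time stamps and temperatures are available as two vectors. All values and the names `t`, `temps`, `observed`, and `trend` are hypothetical, and `trend` is only a stand-in for a model evaluated at the observed times. As far as I know, `solve` in DifferentialEquations.jl accepts a vector of time points through its `saveat` keyword, so an irregular `t_obs` like the one below could be passed there, but treat that as an assumption to check against the docs:

```julia
# Hypothetical hourly record: time in hours and temperature, with two gaps.
t     = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
temps = [20.1, 19.8, missing, 18.9, missing, 18.2]

# Boolean mask of the hours that actually have a measurement.
observed = .!ismissing.(temps)

# Irregularly spaced observation times and the matching values.
t_obs = t[observed]
y_obs = collect(skipmissing(temps))  # plain Vector{Float64}, no Missing left

# Stand-in for a model evaluated only at the observed times
# (an invented linear trend, not a fitted ODE).
trend(hour) = 20.0 - 0.35 * hour
loss = sum(abs2, y_obs .- trend.(t_obs))
```

The point of the sketch is only that the loss compares prediction and data at whatever times are actually observed; nothing in it requires those times to be equally spaced.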