TSAnalysis: time series analysis and state-space modelling

I am not sure about the N4SID algorithm. However, Aoki worked on the same type of models I am implementing in TSAnalysis.jl. I am not sure whether I will code models that use the frequency domain in the near future (e.g., the Wiener-Kolmogorov filter).

DependentData.jl is too general a name. For example, I deal with dependent data that is indexed by space rather than time in GeoStats.jl. Maybe everyone involved in time series research could contact the maintainers of the TimeSeries.jl package and collaborate there.

This looks great! I’ve been working on a GARCH modeling package which I haven’t released yet because I intend to make some breaking changes, which would be very confusing in the short run.
I’d love to contribute to a general TimeSeries.jl package, but I don’t know if I will have time for it in the near future. Publishing my package so that all the functions can be copied (and improved if necessary) could be a contribution, though. The documentation is hopefully already quite good.
Moreover, Simon Broda and his ARCHModeling.jl package might contribute a lot to this.
TimeSeries.jl seems to me to be the perfect name. My package would simply be GARCH.jl.

I’ve just checked the TimeSeries.jl package, and it has some nice abstractions already. The type has many features along the lines of GeoStats.jl types. I would just contribute models to that package instead of trying to create a new package for time series.

I think MATLAB’s N4SID algorithm works as an extension of Aoki’s work, where you find models of the form

$$
\begin{aligned}
x_{t+1} &= A x_t + B u_t + \Gamma w_t \\
y_t &= C x_t + D u_t + v_t
\end{aligned}
$$

where the algorithm finds $(A, B, \Gamma, C, D)$ as well as the covariance of $w_t$ when the covariance of $v_t$ is assumed (?). Here, $u_t$ is a deterministic input, while $w_t$ and $v_t$ are stochastic. Aoki worked with systems without the deterministic input $u_t$, at least in his 1987 book.
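
To make the notation concrete, here is a minimal sketch that simulates a model of this form. It is not N4SID and it is not taken from any package mentioned in this thread: the function name `simulate_ss` and the scalar noise scales `σw` and `σv` are assumptions made for illustration, and both noises are assumed to be i.i.d. Gaussian.

```julia
using Random

# Simulate  x_{t+1} = A x_t + B u_t + Γ w_t,   y_t = C x_t + D u_t + v_t
# starting from x_1 = 0. Each column of `u` is the deterministic input at time t.
# `σw` and `σv` are hypothetical scalar standard deviations for w_t and v_t.
function simulate_ss(A, B, Γ, C, D, u; rng = Random.default_rng(), σw = 1.0, σv = 1.0)
    n = size(A, 1)              # state dimension
    m = size(C, 1)              # output dimension
    T = size(u, 2)              # number of time steps
    x = zeros(n)
    Y = zeros(m, T)
    for t in 1:T
        ut = u[:, t]
        Y[:, t] = C * x + D * ut + σv * randn(rng, m)             # measurement equation
        x = A * x + B * ut + Γ * (σw * randn(rng, size(Γ, 2)))    # state equation
    end
    return Y
end

# One state, one input, one output, 100 time steps with a constant input.
A = fill(0.5, 1, 1); B = ones(1, 1); Γ = ones(1, 1)
C = ones(1, 1);      D = zeros(1, 1)
Y = simulate_ss(A, B, Γ, C, D, ones(1, 100))
```

An identification method such as N4SID goes in the opposite direction: given `Y` and `u`, it estimates the matrices.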

There are a number of algorithms for finding such models – N4SID is just one of many. I seem to recall that baggepinnen was working on one such package in Julia.

Such realization methods with deterministic inputs are particularly interesting for control people. Aoki has an example in his 1987 book where he tries to predict the price (?) of calf meat, or something, based on data from some US meat producer (Chicago?). He visited my department when I was a student, and I recall asking him about the possibility of manipulating the price by changing some deterministic inputs.

I think we have to agree to disagree on TimeSeries.jl.

Often, regular arrays are enough to estimate time series models and forecast. More complicated structures are nice for EDA, but are superfluous for what I generally do. This is one of the reasons why I prefer to have something simpler like TSAnalysis.jl (other reasons are above and here).

It looks interesting. If you can follow a style similar to that of the ARIMA model in TSAnalysis.jl, we could add it to the package. Of course, I perfectly understand if you prefer to keep it separate :slight_smile:

It should be relatively easy to add exogenous predictors. I will add it to the to-do list! :slight_smile:

Here is a link to a discussion of subspace methods à la N4SID; it is not complete, though. http://people.duke.edu/~hpgavin/SystemID/References/Qin-SubspaceID-2004.pdf

… with an extension of the overview in this paper: An overview of subspace identification - ScienceDirect

I’d rather adjust your code to my design. The arimasettings are not necessary, and there are other shortcomings due to the early stage of your package.

I think you are underestimating the power of abstraction in Julia. You can have a custom type like the one in TimeSeries.jl, which is just data indexed by times (metadata), behave like a generic Julia array. It is important to notice that this definition constrains neither users nor developers; it actually helps a lot. I did similar modeling in GeoStats.jl for spatial types, where the indexing is more complicated and requires various domain types as opposed to points on the real line. They are quite convenient to use. More important than having models is having a nice abstraction over time series, including (1) types (TimeArray is already implemented in TimeSeries.jl and could be renamed to something more general like TimeSeries, although unfortunately the name of the package would then need to change) and (2) a set of verbs for the operations that can be performed on time series, like forecast(ts, newtimes), interpolate(ts, newresolution), simulate(ts, times).

In my opinion, the work of the community here has to concentrate on item (2): defining a set of verbs (a.k.a. an API) for time series that encompasses everything you can possibly do with them. Once these verbs are defined, you can start mapping the existing models to a common package, possibly migrating them to a TimeSeriesModels.jl package that would be re-exported by TimeSeries.jl. That is how I would do it.
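
As a rough illustration of what "a set of verbs" could look like, here is a minimal sketch. Everything in it is hypothetical: the abstract type `AbstractTSModel`, the `RandomWalk` model, and the exact signatures of `forecast` and `simulate` are assumptions for the sake of the example, not the API of TimeSeries.jl, TSAnalysis.jl, or GeoStatsBase.jl.

```julia
# Hypothetical verb-based API: the verbs are declared once, with no methods,
# and each model package adds its own methods via multiple dispatch.
abstract type AbstractTSModel end

function forecast end   # forecast(model, y, h) -> h-step-ahead point forecasts
function simulate end   # simulate(model, n)    -> n simulated observations

# A toy model that implements both verbs.
struct RandomWalk <: AbstractTSModel
    σ::Float64          # standard deviation of the increments
end

# The best point forecast of a driftless random walk is its last observed value.
forecast(::RandomWalk, y::AbstractVector, h::Integer) = fill(float(last(y)), h)

# Simulate by cumulating Gaussian increments.
simulate(m::RandomWalk, n::Integer) = cumsum(m.σ .* randn(n))

rw = RandomWalk(1.0)
forecast(rw, [1, 2, 3], 2)   # two-step-ahead forecast from the last value
simulate(rw, 5)              # five simulated observations
```

The point of the design is that user code calls `forecast` or `simulate` without caring which package supplied the model.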

If you want to get some inspiration, check the GeoStats.jl stack, particularly the GeoStatsBase.jl package that contains the abstractions I mentioned.

I think this is where our opinions differ. Having metadata that keeps track of the data ordering or release dates (e.g., a Date vector) is not necessary to perform most operations. There might be cases where this is advantageous or necessary. However, within my domain, they are relatively few and they can be easily handled by passing an extra argument to a few functions.

Furthermore, the common ground for most packages is the use of Arrays. In my view, inexperienced Julia developers would find it easier to interface multiple packages if the data is all in the same format.

Point 2 is partially implemented in TSAnalysis.jl. In about two weeks, I will open a new dev branch for the VARIMA models. I am re-using the same structure as for the ARIMAs (which will be a special case of the VARIMAs), including the forecast function. simulate will follow.

I implemented some generic methods for estimation of linear state-space models in GitHub - baggepinnen/ControlSystemIdentification.jl: System Identification toolbox for LTI systems, compatible with ControlSystems.jl
I’m not making use of subspace methods like n4sid though.

Edit: n4sid is implemented now.

I think that is precisely where you are underusing the type system. You don’t need to know that there is metadata hanging around, and you should be able to treat a time series type exactly as a default Julia array after you implement the interfaces: Interfaces · The Julia Language
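
A minimal sketch of that idea, using the documented `AbstractArray` interface (`size`, `getindex`, `IndexStyle`). The type name `TimedSeries` and its fields are hypothetical and chosen for this example; this is not the `TimeArray` type from TimeSeries.jl.

```julia
using Dates

# Hypothetical time series type: values plus time stamps carried as metadata.
struct TimedSeries{T} <: AbstractVector{T}
    times::Vector{Date}
    values::Vector{T}
    function TimedSeries(times::Vector{Date}, values::Vector{T}) where {T}
        length(times) == length(values) ||
            throw(DimensionMismatch("times and values must have equal length"))
        return new{T}(times, values)
    end
end

# The read-only array interface: after these three definitions, generic
# array code accepts a TimedSeries like any other vector.
Base.size(ts::TimedSeries) = size(ts.values)
Base.getindex(ts::TimedSeries, i::Int) = ts.values[i]
Base.IndexStyle(::Type{<:TimedSeries}) = IndexLinear()

ts = TimedSeries(Date(2020, 1, 1) .+ Month.(0:3), [1.0, 2.0, 4.0, 8.0])

sum(ts)     # generic reductions work
diff(ts)    # so do generic array functions
ts[2:3]     # range indexing falls back to a plain Vector
```

Note that with only these methods, slicing returns a plain `Vector` and drops the time stamps; keeping the metadata through such operations would need further methods.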

This is not true. Most packages create their own types to leverage multiple-dispatch in non-trivial ways. Furthermore, when it is appropriate, these same types implement the interfaces above and users won’t be able to tell that they are not Julia arrays unless they print the type on the screen.

Can you point out which verbs are implemented? What is your vision for what can be possibly done with time series data? If that vision is not clearly externalized and made explicit with an interface, people won’t have clarity about where things should go in terms of research and development.

I will definitely take a look. I think that it would be interesting to benchmark ControlSystemIdentification.jl against guilhermebodin’s StateSpaceModels.jl and TSAnalysis.jl. I am not suggesting we should merge them, but it would be interesting to understand what the most efficient way to implement these methods is, at least in their simplest form.

I will release more details with the new version - soon :slight_smile:

¹ TimeSeries.jl is a very nice name, but I fear that there might already be a git project with this name. I will check the registry when I come back from the break. I thought about TimeSeriesAnalysis.jl when I started the project, but it sounded a bit too long for my taste. I am open to suggestions though :slight_smile:

I, too, vote for TimeSeriesAnalysis.jl :slightly_smiling_face:

I don’t think that TimeSeriesAnalysis.jl is all that long. It is very clear as to what it is aimed at, it seems to me.

A small update. I released a separate package for penalised vector autoregressions compatible with incomplete data. It is the replication code for the empirical application of a new paper of mine (linked in the Git page). This new package is not registered and it probably never will be. Over time, I will merge some of the new features into TSAnalysis.jl.

I will soon start updating TSAnalysis.jl again. I plan to rename it either at the next update or at the following one.

Did you decide on a name for the package?

In electrical engineering (and probably other fields), a distinction is made between signals and systems, where a system essentially is a transformation between two sets of signals.

“Time Series” sounds like the focus is on signals. In systems theory and control engineering, the focus is more on systems, hence “Systems Identification” and similar phrases are more common.

Even when there is only one signal/an input signal, one might think of the signal as generated from some stochastic signal which is unknown. Thus, control engineers tend to even consider a time series (a signal) to be a system.

In any case, the choice of package name probably reflects the field of the developer/the history of her/his field.

I am undecided between a few names, including TimeSeriesAnalysis. I suppose I will choose on the spot and on the basis of the development plan. Maybe it would be wise to limit the scope of the project a bit to one area of time series, rather than having a package for all possible models.

However, I would really like to have “time series” in the name, as an indication of my field.

I have been following this discussion thread. A time series analysis package for Julia would be an excellent addition to the Julia community!

As I understand it, there are packages in Julia, like TimeModels.jl under JuliaStats, but nothing specific to time series, as you are doing, @fipelle.

Thank you, and I am very much looking forward to this :smiley: .

@fipelle Is there any way you could provide an “auto.arima”-like function, as can be found in R?

I did not like how, after running an MCMC on my differenced time series data, I wasn’t getting the results I was expecting. Sometimes the “standard error” doesn’t apply, in which case I can always try a Student’s t random variable with more degrees of freedom, yet…

It’s tough getting the right p, d, q parameters. I have already plotted the correlograms and found some seasonal behavior in the data. Should I try a periodogram also? For reference, I was following this online guide: ARIMA Models with Turing.jl. Using the Probabilistic Programming… | by Saumya Shah | Towards Data Science.
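
For the autoregressive order alone, one common approach is to rank candidate orders by an information criterion. The sketch below does only that: it fits AR(p) models by least squares to an already-differenced series and picks the p with the lowest AIC. It is a much-reduced illustration, not a port of R's auto.arima, and it does not choose d, q, or seasonal terms. The names `ar_aic` and `select_ar_order` are hypothetical.

```julia
using Random

# AIC of an AR(p) model with intercept, fitted by least squares. The first
# `pmax` observations are held out for every p, so that all candidate
# orders are compared on the same sample.
function ar_aic(y::AbstractVector{<:Real}, p::Integer, pmax::Integer)
    T = length(y)
    target = y[pmax+1:T]
    lags = [y[pmax+1-k:T-k] for k in 1:p]     # lag-k regressors, k = 1..p
    X = hcat(ones(T - pmax), lags...)         # intercept plus p lags
    β = X \ target                            # least-squares coefficients
    σ2 = sum(abs2, target - X * β) / length(target)
    return length(target) * log(σ2) + 2 * (p + 1)
end

# Order in 0:pmax with the smallest AIC.
select_ar_order(y, pmax) = argmin([ar_aic(y, p, pmax) for p in 0:pmax]) - 1

# Usage on a simulated AR(2) series.
rng = MersenneTwister(1)
y = zeros(2000)
for t in 3:2000
    y[t] = 0.6 * y[t-1] - 0.3 * y[t-2] + randn(rng)
end
p̂ = select_ar_order(y, 6)
```

If the correlogram shows seasonal behavior, as described above, a plain AR(p) search like this one will not capture it; seasonal lags or seasonal differencing would have to be handled separately.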
