# DataDrivenDiffEq package - Choosing the Basis

I’m trying to use the package DataDrivenDiffEq.jl, and I’ve worked through many of the examples. My problem is that I’m not sure how to choose a basis for a given problem, or which algorithm to use. I wondered if anyone has any good resources for learning about this, and when each applies?

It really is just domain-specific knowledge. Use one of the basis-free approaches if you don’t have a good idea for one.

Thanks, I will try that, but I’m also interested in really understanding this. My problem is that I’m not a mathematician, so I don’t really understand what the basis is or what it does. That makes it difficult to find the information I need to write the correct code.

For example:

```julia
Ψ = Basis([u; u[1]^2], u, independent_variable = t)
res = solve(prob, Ψ, DMDPINV(), digits = 1)
```

Here I don’t understand what `u[1]^2` is doing, or what `DMDPINV()` is.

```julia
basis = Basis(polynomial_basis(u, 5), u, iv = t)
opt = STLSQ(exp10.(-5:0.1:-1))
```

Here I don’t understand what the `opt =` is doing. I can guess what the polynomial basis is, but I don’t know why I would choose 5 rather than another number. I couldn’t get this to work even with a simple predator–prey model.

My eventual problem will, I think, be a DAE, and I saw a response on another thread suggesting that implicit SINDy could be good for that?

```julia
basis = Basis([u^i for i in 0:4], [u])

Ψ = ISInDy(X, DX, basis, opt = opt, maxiter = 100, rtol = 0.9)
```

Again, I’m not sure how to know if `u^i` would be the right choice, and what `ADM(1e-1)` means. Iterations and tolerance I can probably figure out through trial and error.

I don’t need a heavy mathematical explanation, but it would be great if somewhere there’s a tutorial that just explains simply and conceptually the steps that go into making the choice. It also doesn’t have to be for Julia specifically.

The Bayesian parameter identifier in Turing is also a possibility; I can figure out prior distributions more easily. But I like that this one identifies the equations too, and I’m not sure if the Turing one is even meant for DAEs?

It’s saying that the basis is `u[1], u[2], ..., u[n], u[1]^2`. `DMDPINV()` is the algorithm, and `opt = STLSQ(exp10.(-5:0.1:-1))` is the choice of optimization method.

I’ll get Julius to update this.
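To make the “basis is just a list of candidate features” idea concrete, here is a hand-rolled sketch in plain Julia (no packages; the names `ψ`, `states`, and `Θ` are my own, not the package’s API). For a two-state system, a basis like `[u; u[1]^2]` declares the candidate features `u[1]`, `u[2]`, and `u[1]^2`, and evaluating every feature at every measured state gives a feature matrix:

```julia
# candidate features ψ_k(u) for a 2-state system: u1, u2, u1^2
ψ = [u -> u[1], u -> u[2], u -> u[1]^2]

# three made-up state measurements
states = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]

# evaluate every feature at every state → the feature matrix Θ
# (row = measurement, column = candidate feature)
Θ = [f(u) for u in states, f in ψ]

@show Θ
```

The algorithms then look for a weighted sum of these columns that matches the measured signal; the basis only decides which columns are available.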


The basis, in the case of any of the SINDy or Koopman related algorithms, gives you a lifted space, or, to put it in the terms of machine learning, a feature space. The overall model is assumed to be

y_i = \sum_{k = 1}^n w_{i,k} * \psi_k(x, p, t, c)

where y_i is the i-th measured variable (in the case of an ODE, y = \partial_t x(t), so the time derivative), x denotes the independent variables (the states), p the parameters, t the time, and c a control input.

\psi denotes a function which takes all these inputs and maps them to a scalar. So what we are doing here is finding a linear combination (a weighted sum) of all possible features to approximate the signal we want to find.
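The weighted-sum model above makes fitting the weights a linear least-squares problem, which a short sketch in plain Julia can show (everything here is illustrative and hand-rolled, not the package’s API):

```julia
using LinearAlgebra

# candidate features ψ_k(x): 1, x, x^2
ψ = [x -> 1.0, x -> x, x -> x^2]

# "unknown" true weights: the signal is y = 2x - x^2
w_true = [0.0, 2.0, -1.0]

xs = range(-1.0, 1.0; length = 20)
Θ = [f(x) for x in xs, f in ψ]    # feature (design) matrix
y = Θ * w_true                    # noise-free measurements

# fitting the weights is just linear least squares
w = Θ \ y
@show w
```

With noise-free data the recovered `w` matches `w_true`; the sparse methods (STLSQ etc.) additionally push small weights to exactly zero so that only a few features survive.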

The source code you are using seems to be outdated. Have you looked at the current docs?

In the case of an implicit basis, we have something like

0 = \sum_{k = 1}^n w_{i,k} * \psi_k(y, x, p, t, c)

which is solved by iterating over all possible candidates for the left-hand side and finding the sparse solution of

\psi_j(y, x, p, t, c) = \sum_{k = 1, k \neq j }^n w_{i,k} * \psi_k(y, x, p, t, c)
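That candidate-left-hand-side iteration can be sketched in plain Julia (a toy version of the idea only, not the package’s implementation; the feature names are my own). Here the data lies on the implicit curve x·y = 1, and for each candidate feature we try to explain it with the remaining ones:

```julia
using LinearAlgebra

# data lying on the implicit curve x*y = 1 (so y = 1/x)
xs = [0.5, 1.0, 2.0, 4.0]
pts = [(x, 1 / x) for x in xs]

# candidate features ψ_k(x, y): 1, x, y, x*y
feats = [p -> 1.0, p -> p[1], p -> p[2], p -> p[1] * p[2]]
Θ = [f(p) for p in pts, f in feats]

# try every column j as the left-hand side, fit it with the others,
# and record the residual of that fit
residuals = map(axes(Θ, 2)) do j
    rest = Θ[:, setdiff(1:size(Θ, 2), j)]
    w = rest \ Θ[:, j]
    norm(rest * w - Θ[:, j])
end
@show residuals
</imports>
```

The residual for `x*y` as the left-hand side is essentially zero (the data really satisfies x·y = 1), while e.g. `x` as the left-hand side cannot be explained by the other features, which is how the iteration singles out a valid implicit relation.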

In the context of “meaning”, try to think of the basis as a set of assumptions about the system.

A robot with rotary joints almost certainly has some terms containing trigonometric functions, and maybe a special friction model using a sign, tanh, or something similar that depends on the position and the velocity.
Saturating processes (e.g. in biology or chemistry) will approach a limit, so an implicit function is likely to occur.

I’ll add a little more context in here.


`opt` defines the optimizer, i.e. the underlying optimisation problem and how to solve it.
The 5 in this case is chosen rather arbitrarily, or by trial and error; it is a hyperparameter of the model which might need tuning (depending on the structure of the problem and the basis functions). The same goes for all the other options: preprocessing (e.g. numerical differentiation of the data), denoising, normalisation, etc.
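To show what the optimizer’s threshold hyperparameter does, here is a toy version (my own, in plain Julia, not the package’s code) of sequentially thresholded least squares, the idea behind STLSQ: repeatedly fit, zero out any weight smaller than the threshold λ, and refit on the surviving features. The range `exp10.(-5:0.1:-1)` in the earlier snippet is simply a sweep over candidate values of λ.

```julia
using LinearAlgebra

function stlsq(Θ, y; λ = 0.1, maxiter = 10)
    w = Θ \ y                        # plain least-squares fit
    for _ in 1:maxiter
        small = abs.(w) .< λ         # weights below the threshold ...
        w[small] .= 0.0              # ... are forced to zero
        big = .!small
        any(big) || break
        w[big] = Θ[:, big] \ y       # refit only the surviving features
    end
    return w
end

# true model uses only 2 of the 4 candidate features: y = 1.5x - 0.5x^3
xs = range(-1, 1; length = 30)
Θ = [x^k for x in xs, k in 0:3]                           # 1, x, x², x³
y = Θ * [0.0, 1.5, 0.0, -0.5] .+ 1e-3 .* sin.(7 .* xs)    # small perturbation

w = stlsq(Θ, y; λ = 0.05)
@show w
```

With λ = 0.05 the spurious constant and x² weights (which are only nonzero because of the perturbation) get thresholded away, leaving a sparse model; pick λ too large and real terms disappear too, which is exactly why it needs tuning.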

The predator–prey model is not as simple as one might think, since the result strongly depends on the data and the fit has to match the equations quite closely. I have two students who are looking into the robustness of SINDy on exactly this example, and even with small noise scales (1e-2) or slightly off measurements the identification becomes quite difficult.

To be fair, I also tend to use it as a benchmark quite often, but maybe start with something like a pendulum.
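For the pendulum, here is a bare-bones sketch in plain Julia of what the identification amounts to (everything is illustrative and hand-rolled; a real workflow would use the package and numerically differentiated data). We pretend we have measured states and derivatives of ẋ₁ = x₂, ẋ₂ = −sin(x₁) − 0.1 x₂, and recover the equations by least squares over a basis that includes the trigonometric terms a rotary system suggests:

```julia
using LinearAlgebra

# true dynamics of a damped pendulum: ẋ₁ = x₂, ẋ₂ = -sin(x₁) - 0.1x₂
rhs(x) = (x[2], -sin(x[1]) - 0.1 * x[2])

# fake "measurements": states on a grid, derivatives from the true model
X  = [(θ, ω) for θ in -1.0:0.25:1.0 for ω in -1.0:0.5:1.0]
DX = rhs.(X)

# candidate features: constant, linear terms, and the trig terms
feats = [x -> 1.0, x -> x[1], x -> x[2], x -> sin(x[1]), x -> cos(x[1])]
Θ = [f(x) for x in X, f in feats]

w1 = Θ \ [d[1] for d in DX]   # weights for ẋ₁
w2 = Θ \ [d[2] for d in DX]   # weights for ẋ₂
@show w1 w2
```

The fit picks out exactly `x[2]` for the first equation and `-sin(x[1]) - 0.1 x[2]` for the second; with only polynomial features in the basis, this would instead come back as a Taylor-like approximation of the sine.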


Thanks very much for this!

P.S. For anyone else coming in with the kind of conceptual questions I had above, this video series seems excellent; it runs through it step by step:
