Question about DataDrivenDiffEq / UDE / Symbolic Regression failing for very sparse settings

Hello,

I’m experimenting with symbolic regression and UDEs for a larger problem and have run into some difficulties. For various methods, the symbolic regression simply fails with MethodErrors involving zeros and assorted array-dimension errors (which are rather inscrutable to me). This seems to happen whenever the sparsity-related parameters are set too “high”, which defeats the purpose of the exercise: the only solutions I can successfully obtain are very large expressions.

I’m posting a simple example below where it fails with random matrices, but I could also post my actual MWE / application, which trains a UDE on an ODE that scales with dimension, to check how well interaction terms are recovered for various system sizes. I naively assumed that you could sparsify the regression as much as you wanted, but unless I’m making a silly mistake, that doesn’t seem to be the case.

Any guidance, or pointers to resources on hyperparameter tuning, would be helpful here.

Thanks!


using DataDrivenDiffEq, DataDrivenSparse, ModelingToolkit, StableRNGs

n = 10  # state dimension
T = 15  # number of samples
@variables u[1:n]
b = polynomial_basis(u, 2)
basis = Basis(b, u);

# random X 
X̂ = rand(n,T)

# simple relation to Y 
Ŷ = X̂ .+ 0.15 .* rand(n,T)


nn_problem = DirectDataDrivenProblem(X̂, Ŷ)

#λ_sparse = exp10.(-1:1:3)
#opt = STLSQ(λ_sparse)
opt = SR3(1e-1, 100.0) # <- doesn't work 
#opt = SR3(1e-3,100.0)  # <- works 

options = DataDrivenCommonOptions(maxiters = 10_000,
                                  normalize = DataNormalization(ZScoreTransform),
                                  selector = bic, digits = 2,
                                  data_processing = DataProcessing(split = 0.9,
                                                                   batchsize = 10,
                                                                   shuffle = true,
                                                                   rng = StableRNG(1111)))

nn_res = solve(nn_problem, basis, opt, options = options)
nn_eqs = get_basis(nn_res)
equations(nn_eqs)
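
Not essential, but in case it helps: this is the rough sweep I’ve been running to see where the solve starts failing. The grid of λ values is arbitrary, and everything else reuses the problem, basis, and options defined above.

for λ in exp10.(-3:1:0)
    try
        res = solve(nn_problem, basis, SR3(λ, 100.0), options = options)
        println("λ = $λ succeeded:")
        println(equations(get_basis(res)))
    catch err
        println("λ = $λ failed with ", typeof(err))
    end
end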

This might just be an issue with the way the optimization is done. We need to change it to a convex optimization. Can you share this as an issue on DataDrivenDiffEq.jl so we can track it?

Sure, will post it now! Sorry, I lost track of this.
