Excited to announce SymbolicRegression.jl, a high-performance package for learning equations via regularized evolution! It supports distributed computing, allows user-defined operators (including discontinuous ones), and can export to SymbolicUtils.jl.

Here’s a demo:

You can install it with `Pkg.add("SymbolicRegression")`.

Check out the repo and documentation. There’s also a Python frontend: PySR.

The package is built on lightweight binary trees that allow quick allocation of new equations, together with pre-compiled, memory-efficient evaluation of arbitrary equations built from operators of degree 1 or 2 (unary and binary). The search combines regularized evolution, simulated annealing, and gradient-free optimization (via Optim.jl). Importantly, no gradients are used anywhere in the package, which lets you work with discontinuous functions such as logical operators, and more generally with operators that have singularities.
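To make the tree representation concrete, here is a minimal Python sketch of a binary expression tree with degree-1 and degree-2 operators, including one discontinuous operator. It is only an illustration, not the package's Julia implementation: the `Node` class, the operator tables, and the `greater` operator are all names I made up for this example.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical operator tables: degree-1 (unary) and degree-2 (binary).
UNARY = {"cos": math.cos, "abs": abs}
BINARY = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    # A discontinuous, user-defined operator: 1.0 if a > b, else 0.0.
    "greater": lambda a, b: 1.0 if a > b else 0.0,
}


@dataclass
class Node:
    degree: int                       # 0 = leaf, 1 = unary op, 2 = binary op
    op: Optional[str] = None          # operator name for degree 1 or 2
    constant: Optional[float] = None  # set for constant leaves
    feature: Optional[int] = None     # column index for feature leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def evaluate(node, row):
    """Recursively evaluate the tree on one row of input features."""
    if node.degree == 0:
        return node.constant if node.constant is not None else row[node.feature]
    if node.degree == 1:
        return UNARY[node.op](evaluate(node.left, row))
    return BINARY[node.op](evaluate(node.left, row), evaluate(node.right, row))


# f(x) = greater(x0, 0.5) * (x1 + 2.0)
tree = Node(
    2, op="*",
    left=Node(2, op="greater", left=Node(0, feature=0), right=Node(0, constant=0.5)),
    right=Node(2, op="+", left=Node(0, feature=1), right=Node(0, constant=2.0)),
)

X = [[0.0, 1.0], [1.0, 1.0], [0.9, -2.0]]
predictions = [evaluate(tree, row) for row in X]
print(predictions)
```

Because evaluation never asks for a derivative, the step function `greater` is as easy to use as `+` or `*`.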

SymbolicRegression.jl is built for regression problems: you want to find an analytic function $f:\mathbb{R}^m\rightarrow\mathbb{R}$ such that $f(X_{i, :}) \approx y_i \ \forall i$ for inputs $X$ and targets $y$. For finding differential equations via gradient descent, you should check out SINDy, which is implemented in DataDrivenDiffEq.jl.
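As a rough illustration of this objective, here is a toy Python sketch that searches for such an f with regularized (aging) evolution: sample a few members, mutate the best of the sample, add the child, and drop the oldest member. Everything here is an assumption for illustration and not the package's code: the tuple encoding of expressions, mean squared error as the loss, the operator set, and all parameter values. It also leaves out simulated annealing and constant optimization.

```python
import math
import random
from collections import deque

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def random_expr(n_features, depth, rng):
    """Random expression: ("x", i), ("c", value), or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("x", rng.randrange(n_features))
        return ("c", rng.uniform(-2.0, 2.0))
    op = rng.choice(sorted(OPS))
    return (op, random_expr(n_features, depth - 1, rng),
            random_expr(n_features, depth - 1, rng))


def evaluate(expr, row):
    if expr[0] == "x":
        return row[expr[1]]
    if expr[0] == "c":
        return expr[1]
    return OPS[expr[0]](evaluate(expr[1], row), evaluate(expr[2], row))


def size(expr):
    return 1 if expr[0] in ("x", "c") else 1 + size(expr[1]) + size(expr[2])


def loss(expr, X, y):
    """Mean squared error between f(X[i]) and y[i] (an assumed loss)."""
    total = 0.0
    for row, target in zip(X, y):
        diff = evaluate(expr, row) - target
        total += diff * diff
    mse = total / len(y)
    return mse if math.isfinite(mse) else math.inf


def mutate(expr, n_features, rng):
    """Replace a randomly chosen subtree with a fresh random one."""
    if expr[0] in ("x", "c") or rng.random() < 0.3:
        return random_expr(n_features, 2, rng)
    if rng.random() < 0.5:
        return (expr[0], mutate(expr[1], n_features, rng), expr[2])
    return (expr[0], expr[1], mutate(expr[2], n_features, rng))


def regularized_evolution(X, y, pop_size=50, tournament=5, steps=500,
                          max_size=30, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    population = deque()  # oldest member on the left, youngest on the right
    for _ in range(pop_size):
        expr = random_expr(n_features, 3, rng)
        population.append((loss(expr, X, y), expr))
    initial_best = min(population, key=lambda m: m[0])
    best = initial_best  # best member seen so far (hall of fame of size one)
    for _ in range(steps):
        sample = rng.sample(list(population), tournament)
        parent = min(sample, key=lambda m: m[0])
        child = mutate(parent[1], n_features, rng)
        if size(child) > max_size:  # keep expressions small
            child = parent[1]
        member = (loss(child, X, y), child)
        population.append(member)
        population.popleft()  # remove the oldest member, whatever its loss
        if member[0] < best[0]:
            best = member
    return initial_best, best


# Toy data generated from y = x0 * x0 + x1.
X = [[i / 4.0, j / 4.0] for i in range(-4, 5) for j in range(-4, 5)]
y = [x0 * x0 + x1 for x0, x1 in X]
initial_best, best = regularized_evolution(X, y)
print("initial best loss:", initial_best[0])
print("final best loss:", best[0], "expression:", best[1])
```

Removing the oldest member rather than the worst one is what makes the evolution "regularized": a good expression only survives if its descendants keep doing well.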

Symbolic regression via evolution, as in SymbolicRegression.jl, generally scales quite badly with the number of input features; this is the drawback of avoiding gradients. To get around this, you can either estimate feature importances and include only the most important features, or, if you need to work with a very large number of input features (such as an N-body dataset), try the method in [2006.11287] Discovering Symbolic Models from Deep Learning with Inductive Biases, which has code here (blogpost here). Essentially, you first fit a neural network to your data with a sparsity regularization, and then fit equations to the trained sparse network. This also lets you apply other (differentiable) constraints to a learned model: for example, you can learn Lagrangians by training a Lagrangian Neural Network and then fitting equations to the Lagrangian term in the learned model.
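For the first option, here is a small Python sketch of pre-selecting features before running the search. I am not prescribing a particular importance estimator; the absolute Pearson correlation used here is just an assumed, crude proxy that will miss purely nonlinear dependencies, and the helper names `top_k_features` and `select_columns` are made up for this example.

```python
import math
from itertools import product


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def top_k_features(X, y, k):
    """Indices of the k columns of X with the largest |correlation| to y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def select_columns(X, columns):
    """Keep only the chosen columns of X, in the given order."""
    return [[row[j] for j in columns] for row in X]


# Toy data: four features on a full factorial grid; only x1 and x3 affect y.
X = [list(map(float, bits)) for bits in product((-1, 1), repeat=4)]
y = [3.0 * row[1] - 2.0 * row[3] for row in X]

keep = top_k_features(X, y, k=2)
X_reduced = select_columns(X, keep)
print(keep)
```

The reduced matrix `X_reduced` is then what you would hand to the symbolic regression search in place of the full `X`.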

Thanks to @patrick-kidger and @ChrisRackauckas for helping make the code more generic and easy-to-use, @shashi @mason and @yingboma for building SymbolicUtils.jl (used for internal simplification and export), and @marius311 @Skoffer @Henrique_Becker for advice on this forum which led to big improvements in speed!

Would love to hear any feedback, tips, ideas, etc.

Cheers!

Miles