How to enforce constraints with SymbolicRegression.jl

Hello,

I am interested in whether symbolic regression could be used to learn or approximate the likelihood functions of simulation-based models. One approach might be to approximate the pdf from simulated data with kernel density estimation and then use SymbolicRegression.jl to find a suitable equation. The code below shows that this works in a simple case with an exponential distribution (for simplicity, the code fits the exact pdf directly rather than a KDE).

I was wondering whether it is possible to define constraints. In this particular case, an important constraint is that the function integrates to 1 with respect to the data input x. I would imagine that the ability to add constraints would be useful in a variety of applications.
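One way to approximate such a constraint is a soft penalty in a custom objective: SymbolicRegression.jl accepts a `loss_function` keyword taking `(tree, dataset, options)`, so the normalization can be checked numerically inside the loss. Below is a minimal sketch along those lines; the quadrature grid, the reference rate `λ0`, the truncation of the integral to `[0, 5]`, and the penalty weight `10.0` are all illustrative assumptions, not recommendations.

```julia
using SymbolicRegression
using SymbolicRegression: Dataset, eval_tree_array

# Illustrative choices: quadrature nodes for x, and a fixed reference rate λ0
# at which to check normalization (checking every λ would be expensive).
const xs = collect(range(0.0, 5.0; length=64))
const λ0 = 1.0

function normalized_loss(tree, dataset::Dataset{T,L}, options) where {T,L}
    # Ordinary mean-squared-error data-fit term
    pred, ok = eval_tree_array(tree, dataset.X, options)
    !ok && return L(Inf)
    mse = sum(abs2, pred .- dataset.y) / dataset.n

    # Evaluate the candidate on (xs, λ0) and trapezoid-integrate over x.
    # Note: truncating the integral at x = 5 is an approximation, since the
    # exponential's true support is [0, ∞).
    grid = vcat(xs', fill(λ0, 1, length(xs)))
    vals, ok2 = eval_tree_array(tree, grid, options)
    !ok2 && return L(Inf)
    integral = sum((vals[1:end-1] .+ vals[2:end]) ./ 2 .* diff(xs))

    # Penalty weight 10.0 is a guess; it trades off fit against normalization.
    return L(mse + 10.0 * (integral - one(T))^2)
end

model = SRRegressor(
    niterations=100,
    binary_operators=[*, -],
    unary_operators=[exp],
    loss_function=normalized_loss,
)
```

This only encourages normalization rather than enforcing it exactly, and it says nothing about positivity, but it may be enough to steer the search toward valid densities.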

using SymbolicRegression
import MLJ: machine, fit!, predict, report
using SymbolicUtils

# data input, rate parameter
X = (x = rand(1000) * 5, λ = rand(1000) * 3)

# pdf of exponential distribution 
y = @. X.λ * exp(-X.λ * X.x)

model = SRRegressor(
    niterations=100,
    binary_operators=[*,-],
    unary_operators=[exp],
)

mach = machine(model, X, y)

fit!(mach)

r = report(mach)
r.equations[r.best_idx]

eq = node_to_symbolic(r.equations[r.best_idx], model; variable_names=["x", "λ"])
simplify(eq)
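As a post-hoc sanity check (not a constraint), the fitted model can be numerically integrated over x at a fixed rate. The grid size and the choice λ = 1 below are arbitrary; for the true exponential pdf truncated to [0, 5], the exact value is 1 − exp(−5) ≈ 0.993.

```julia
# Trapezoid-rule integral of the fitted model over x at λ = 1
xs = range(0.0, 5.0; length=200)
Xgrid = (x = collect(xs), λ = fill(1.0, length(xs)))
ŷ = predict(mach, Xgrid)
integral = sum((ŷ[1:end-1] .+ ŷ[2:end]) ./ 2) * step(xs)
```

If `integral` is far from 1, the discovered equation fits the samples but is not a valid density.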

PDFs are terribly hard to just cook up from random functions because of the positivity and normalization constraints. I would suggest using something like kernel density estimates or maximum-entropy densities if you want something flexible and data-driven. You could also try adaptively fitting mixtures of parametric distributions.
