Hello,
I am interested in whether symbolic regression could be used to learn or approximate the likelihood functions of simulation-based models. One approach might be to approximate the pdf from simulated data using kernel density estimation, and then use SymbolicRegression.jl to find a suitable equation for it. The code below shows that this can be done in a simple case with an exponential distribution.
I was wondering if it is possible to define constraints. In this particular case, an important constraint is that the function integrates to 1 with respect to the data input. I would imagine that adding constraints might be useful in a variety of applications.
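One possibility I have been considering is a soft penalty rather than a hard constraint, using the `loss_function` hook in SymbolicRegression.jl, which lets you score a candidate tree directly. The sketch below penalizes the squared deviation of a crude Riemann-sum estimate of the integral from 1, at a single fixed λ. The grid, the fixed λ = 1, and the penalty weight of 100 are all illustrative assumptions, not a worked-out method:

```julia
using SymbolicRegression

# Hypothetical penalized objective: mean squared error on the data plus a
# penalty on the normalization constraint ∫ f(x; λ) dx ≈ 1, approximated
# by a Riemann sum on [0, 10] at the fixed value λ = 1 (an assumption).
function normalized_loss(tree, dataset::Dataset{T,L}, options)::L where {T,L}
    prediction, completed = eval_tree_array(tree, dataset.X, options)
    !completed && return L(Inf)
    mse = sum(abs2, prediction .- dataset.y) / dataset.n

    # evaluate the candidate on a grid: row 1 is x, row 2 is λ (fixed at 1)
    xs = collect(T, range(0, 10, length=200))
    grid = vcat(xs', fill(one(T), 1, length(xs)))
    fx, ok = eval_tree_array(tree, grid, options)
    !ok && return L(Inf)
    dx = xs[2] - xs[1]
    integral = sum(fx) * dx

    # illustrative penalty weight; would need tuning in practice
    return mse + L(100) * (integral - one(T))^2
end

model = SRRegressor(
    niterations=100,
    binary_operators=[*, -],
    unary_operators=[exp],
    loss_function=normalized_loss,
)
```

This only enforces normalization approximately, at one λ and over a truncated domain, so it is more a nudge toward normalized solutions than a guarantee.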
```julia
using SymbolicRegression
using SymbolicUtils
import MLJ: machine, fit!, predict, report

# data input x and rate parameter λ
X = (x = rand(1000) * 5, λ = rand(1000) * 3)

# pdf of the exponential distribution, evaluated pointwise
y = @. X.λ * exp(-X.λ * X.x)

model = SRRegressor(
    niterations=100,
    binary_operators=[*, -],
    unary_operators=[exp],
)

mach = machine(model, X, y)
fit!(mach)

# inspect the best equation found
r = report(mach)
r.equations[r.best_idx]

# convert to a SymbolicUtils expression and simplify
eq = node_to_symbolic(r.equations[r.best_idx], model; variable_names=["x", "λ"])
simplify(eq)
```
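Even without constrained search, the normalization property can at least be checked after the fact. The snippet below does this numerically with a midpoint rule in base Julia; the function `f` is a stand-in for whatever equation the search recovers, and the truncation at x = 20 is an assumption that works for this example's parameter range:

```julia
# Quick numerical check: does the candidate pdf integrate to ~1 in x
# for a fixed rate λ? Midpoint rule on the truncated domain [0, 20].
λ = 2.0
f(x) = λ * exp(-λ * x)   # stand-in for the recovered equation
dx = 1e-4
xs = dx/2:dx:20
integral = sum(f, xs) * dx
# integral close to 1 suggests the normalization constraint holds
```

A check like this could also be folded into model selection, e.g. preferring equations on the Pareto front whose integral is closest to 1.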