SymbolicRegression: Python -> Julia?

I’m trying to use SymbolicRegression from Julia, based on a PySRRegressor model I have in Python. Here is the Python model:

model_SK = PySRRegressor(
    batching=True,
    niterations=400,
    model_selection="accuracy",
    binary_operators=["+", "-", "*", "/", "^"],
    unary_operators=["log", "tan"],
    nested_constraints={'tan': {'tan': 0}, 'log': {'log': 0}},
    maxsize=30,
    timeout_in_seconds=60 * 5,
)

The following Julia code works, but I’ve had to skip two keywords:

model_SK = SRRegressor(
    batching=true,
    niterations=400,
    #model_selection=:accuracy,
    binary_operators=[+,-,*,/,^],
    unary_operators=[log, tan],
    #nested_constraints={tan: {tan: 0}, log: {log: 0}},
    maxsize=30,
    timeout_in_seconds=60.0 * 5, 
)

I can’t find (Julia) documentation for the (Python) keywords model_selection and nested_constraints, nor for what their right-hand sides mean :-o.

Questions:
A. What are the keyword names in Julia for model_selection and nested_constraints?
B. What would be the Julia equivalents of their RHS values (i.e., "accuracy" and {tan: {tan: 0}, log: {log: 0}}, respectively)?

Without these two keywords, the agreement between data and predictions is still pretty good.


@MilesCranmer this one’s for you


For nested_constraints:

nested_constraints=[tan => [tan => 0], log => [log => 0]]

For model_selection, there is instead the parameter selection_method::Function. You pass a function that takes the keyword arguments trees, losses, scores, and complexities, and returns the integer index of the chosen equation.
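As a rough, untested sketch (select_most_accurate is just an illustrative name), you could mimic Python’s model_selection="accuracy" by always choosing the lowest-loss equation:

# Pick the equation with the smallest loss, ignoring complexity
# (roughly what PySR's model_selection="accuracy" does).
function select_most_accurate(; trees, losses, scores, complexities)
    return argmin(losses)   # integer index of the chosen equation
end

model_SK = SRRegressor(
    batching=true,
    niterations=400,
    selection_method=select_most_accurate,
    binary_operators=[+, -, *, /, ^],
    unary_operators=[log, tan],
    nested_constraints=[tan => [tan => 0], log => [log => 0]],
    maxsize=30,
    timeout_in_seconds=60.0 * 5,
)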


Clear answer – I’ll figure out the selection_method thing.

When I fit the model, I get a rather “ugly” expression.

I can get out the function by:

r.equations[r.best_idx]

which gives the expression with variable names equal to the names in the named tuple of the “X” data.

How can I create a function out of this?

yhat(p_r,T_r) = r.equations[r.best_idx]

doesn’t work.

I’m sure there is a simpler way of making this into a function than typing it manually…

r.equations[i] is a callable object, although you need to manually format the inputs into a single array (and transpose it). Also, once PR #326 on MilesCranmer/SymbolicRegression.jl (“BREAKING: Change expression types to `DynamicExpressions.Expression` (from `DynamicExpressions.Node`)”) merges, life will be much simpler. (Just working out some weird performance changes with that PR which have been tedious to track down.)
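For example, with two feature vectors it would look something like this (an untested sketch; the variable names are illustrative):

x = hcat(x1, x2)'                   # single array, transposed to (features, rows)
yhat = r.equations[r.best_idx](x)   # evaluates the best equation for every row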


OK… my input data are:

# Data 
X = (X1 = x1, X2 = x2)

where x1 is a vector of values typically lying in the interval [0,30], while x2 is a vector of values typically lying in the interval [1.5,2.5].

I next want to try with some data point [x1, x2], say [15, 1.5], which was not used in the model fit. I try with:

julia> r.equations[r.best_idx]([15,1.5]')
2-element Vector{Float64}:
 NaN
 NaN

So obviously, I misunderstood something.

OK: this may not be a recommended method, but it seems to work…

julia> eval(Meta.parse("yhat(p_r,T_r) = " * r.equation_strings[r.best_idx]))
julia> yhat(0.8,1.5)
0.9015511099442255

Can you share the specific equation that is outputting NaNs? There are certain situations where it will choose to output NaN rather than propagate an Inf (which might otherwise end up as a finite value in a regular calculation).

Sorry for delay. Here it is:

julia> r.equations[r.best_idx]
log((((1.394181306920604 ^ p_r) - p_r) / (T_r / 0.27927367339196735)) + (T_r / 0.8914537841077675)) ^ 0.3399461332361616

julia> r.equations[r.best_idx]([15,1.5]')
2-element Vector{Float64}:
 NaN
 NaN

julia> eval(Meta.parse("sym_reg(p_r,T_r) = " * r.equation_strings[r.best_idx]))

julia> sym_reg(15,1.5)
1.4946988237609764

Oh, for the callable version you pass the input in as a batched array. So it should be like:

num_rows = 1
num_features = 2
X = ones(num_features, num_rows)
X[1, 1] = 15
X[2, 1] = 1.5

And then pass that X in.

The reason it is returning NaN is that it is accessing undefined memory.
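Putting that together for your data point (an untested sketch, reusing the fitted report r from above; feature 1 is p_r and feature 2 is T_r):

X = ones(2, 1)    # shape (num_features, num_rows)
X[1, 1] = 15.0    # p_r
X[2, 1] = 1.5     # T_r

r.equations[r.best_idx](X)   # should now return a 1-element Vector{Float64}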


Aha. OK.

I’ll probably stick with the eval(Meta.parse(...)) version, as it is simple to use :-o.