SymbolicRegression: Python -> Julia?

I’m trying to use SymbolicRegression from Julia, based on a PySRRegressor model in Python. Here is the Python model:

model_SK = PySRRegressor(
    binary_operators=["+", "-", "*", "/", "^"],
    unary_operators=["log", "tan"],
    model_selection="accuracy",
    nested_constraints={'tan': {'tan': 0}, 'log': {'log': 0}},
    timeout_in_seconds=60 * 5,
)

The following Julia code works – but I’ve had to skip 2 keywords…:

model_SK = SRRegressor(
    unary_operators=[log, tan],
    #model_selection="accuracy",
    #nested_constraints={tan: {tan: 0}, log: {log: 0}},
    timeout_in_seconds=60.0 * 5,
)

I can’t find (Julia) documentation for the (Python) keywords model_selection and nested_constraints, nor for what their right-hand sides mean :-o.

A. What are the keyword names in Julia for model_selection and nested_constraints?
B. What would be the Julia equivalents of their RHS values (i.e., "accuracy" and {tan: {tan: 0}, log: {log: 0}}, respectively)?

Without these two keywords, the data vs. prediction is still pretty good:

@MilesCranmer this one’s for you

For nested_constraints:

nested_constraints=[tan => [tan => 0], log => [log => 0]]

For model_selection, there is instead the parameter selection_method::Function. You pass a function that takes kwargs trees, losses, scores, complexities, and returns an integer for the chosen equation.
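
As a rough illustration of what such a selection function could look like, here is a minimal sketch that mimics the Python option model_selection="accuracy", which picks the candidate with the lowest loss. The function name select_most_accurate and the toy numbers are made up for this example; only the four keyword argument names come from the description above.

```julia
# Hypothetical selector: return the index of the candidate equation
# with the smallest loss (the behaviour of "accuracy" in PySR).
function select_most_accurate(; trees, losses, scores, complexities)
    return argmin(losses)
end

# Toy candidate set: made-up numbers, with placeholder strings
# standing in for real expression trees.
best = select_most_accurate(
    trees = ["x1", "x1 + x2", "log(x1) * x2"],
    losses = [0.50, 0.10, 0.30],
    scores = [0.0, 1.6, 0.2],
    complexities = [1, 3, 4],
)
```

Assuming the keyword works as described, the function would then be passed as selection_method=select_most_accurate when constructing the SRRegressor.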


Clear answer – I’ll figure out the selection_method thing.

When I fit the model, I get an “ugly” expression:

I can get out the function by:


which gives the function with variable names equal to the names in the named tuple of the “X” data.

How can I create a function out of this?

yhat(p_r,T_r) = r.equations[r.best_idx]

doesn’t work.

I’m sure there is a simpler way of making this into a function than typing it manually…

r.equations[i] is a callable object, although you need to manually format the inputs into a single array (and transpose it). Also, once Pull Request #326 in MilesCranmer/SymbolicRegression.jl ("BREAKING: Change expression types to `DynamicExpressions.Expression` (from `DynamicExpressions.Node`)") merges, life will be much simpler. (Just working out some weird performance changes with that PR which have been tedious to track down.)

OK… my input data are:

# Data 
X = (X1 = x1, X2 = x2)

where x1 is a vector of values typically lying in the interval [0,30], while x2 is a vector of values typically lying in the interval [1.5,2.5].

I next want to try with some data point [x1, x2], say [15, 1.5], which was not used in the model fit. I try with:

julia> r.equations[r.best_idx]([15,1.5]')
2-element Vector{Float64}:

So obviously, I misunderstood something.

OK: this may not be a recommended method, but it seems to work…

julia> eval(Meta.parse("yhat(p_r,T_r) = "*r.equation_strings[r.best_idx] ))
julia> yhat(0.8,1.5)

Can you share the specific equation where it is outputting NaNs? There are specific situations where it will choose to output NaN rather than propagate Inf (which might result in a finite value in a regular calculation).

Sorry for delay. Here it is:

julia> r.equations[r.best_idx]
log((((1.394181306920604 ^ p_r) - p_r) / (T_r / 0.27927367339196735)) + (T_r / 0.8914537841077675)) ^ 0.3399461332361616

julia> r.equations[r.best_idx]([15,1.5]')
2-element Vector{Float64}:

julia> eval(Meta.parse("sym_reg(p_r,T_r) = "*r.equation_strings[r.best_idx] ))

julia> sym_reg(15,1.5)

Oh, the callable version expects a batched array as input, with shape (num_features, num_rows). So it should be like

num_rows = 1
num_features = 2
X = ones(num_features, num_rows)
X[1, 1] = 15
X[2, 1] = 1.5

And then pass that X in.

The reason it is returning NaN is because it is accessing undefined memory.
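
To make the shape issue concrete, here is a small sketch in plain Julia (no fitted model needed). It compares the adjoint `[15, 1.5]'` used earlier with a features-by-rows matrix. A plausible reading of the NaN result is that the 1×2 adjoint is interpreted as one feature with two samples, so the second feature does not exist. The feature order (p_r first, T_r second) is assumed to match the order of the named tuple used for fitting, and the extra sample values are made up.

```julia
point = [15.0, 1.5]            # one new sample: (p_r, T_r)

X_wrong = point'               # 1×2: reads as 1 feature, 2 samples
X_new = reshape(point, :, 1)   # 2×1: 2 features, 1 sample

# Several samples at once: one column per sample.
p_r_values = [15.0, 10.0, 0.8]
T_r_values = [1.5, 2.0, 1.5]
X_batch = permutedims(hcat(p_r_values, T_r_values))   # 2×3

# With a fitted report `r` (not defined here), the call would be:
# r.equations[r.best_idx](X_new)
```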

Aha. OK.

I’ll probably stick with the eval(Meta.parse(...)) version, as it is simple to use :-o.
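
For reference, a slightly tidier variant of the string-based approach is sketched below. It builds an anonymous function instead of defining a new global method, using the equation string printed earlier in the thread. The name yhat is only illustrative, and the argument names must match the variable names that appear in the string.

```julia
# Equation string as printed by r.equation_strings[r.best_idx] above.
equation_string = "log((((1.394181306920604 ^ p_r) - p_r) / (T_r / 0.27927367339196735)) + (T_r / 0.8914537841077675)) ^ 0.3399461332361616"

# Parse "(p_r, T_r) -> <equation>" and evaluate it into an anonymous function.
yhat = eval(Meta.parse("(p_r, T_r) -> " * equation_string))

# Base.invokelatest keeps the call valid even when this code runs inside
# a function (world-age rule); at the REPL, yhat(15, 1.5) works directly.
y = Base.invokelatest(yhat, 15, 1.5)
```

Since eval runs whatever the string contains, this is only reasonable for strings produced by your own fit.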