SymbolicRegression: Python -> Julia?

I’m trying to use SymbolicRegression from Julia, based on a PySRRegressor model I have in Python. Here is the Python model:

model_SK = PySRRegressor(
    batching=True,
    niterations=400,
    model_selection="accuracy",
    binary_operators=["+", "-", "*", "/", "^"],
    unary_operators=["log", "tan"],
    nested_constraints={'tan': {'tan': 0}, 'log': {'log': 0}},
    maxsize=30,
    timeout_in_seconds=60 * 5,
)

The following Julia code works, but I’ve had to skip two keywords:

model_SK = SRRegressor(
    batching=true,
    niterations=400,
    #model_selection=:accuracy,
    binary_operators=[+,-,*,/,^],
    unary_operators=[log, tan],
    #nested_constraints={tan: {tan: 0}, log: {log: 0}},
    maxsize=30,
    timeout_in_seconds=60.0 * 5, 
)

I can’t find (Julia) documentation for the (Python) keywords model_selection and nested_constraints, nor for what their right-hand sides mean :-o.

Questions:
A. What are the keyword names in Julia for model_selection and nested_constraints?
B. What would be the Julia equivalents of their RHS values (i.e., "accuracy" and {tan: {tan: 0}, log: {log: 0}}, respectively)?

Without these two keywords, the agreement between data and predictions is still pretty good.


@MilesCranmer this one’s for you


For nested_constraints:

nested_constraints=[tan => [tan => 0], log => [log => 0]]

For model_selection, there is instead the parameter selection_method::Function. You pass a function that takes the keyword arguments trees, losses, scores, and complexities, and returns the integer index of the chosen equation.
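As a rough, untested sketch (select_most_accurate is just an illustrative name), you could mimic Python’s model_selection="accuracy" by always choosing the lowest-loss equation:

# Pick the equation with the smallest loss, ignoring complexity
# (roughly what PySR's model_selection="accuracy" does).
function select_most_accurate(; trees, losses, scores, complexities)
    return argmin(losses)   # integer index of the chosen equation
end

model_SK = SRRegressor(
    batching=true,
    niterations=400,
    selection_method=select_most_accurate,
    binary_operators=[+, -, *, /, ^],
    unary_operators=[log, tan],
    nested_constraints=[tan => [tan => 0], log => [log => 0]],
    maxsize=30,
    timeout_in_seconds=60.0 * 5,
)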


Clear answer – I’ll figure out the selection_method thing.

When I fit the model, I get a rather “ugly” expression.

I can get out the function by:

r.equations[r.best_idx]

which gives the expression with variable names equal to the names in the named tuple of the “X” data.

How can I create a function out of this?

yhat(p_r,T_r) = r.equations[r.best_idx]

doesn’t work.

I’m sure there is a simpler way of making this into a function than typing it manually…

r.equations[i] is a callable object, although you need to manually format the inputs into a single array (and transpose it). Also, once PR #326 on MilesCranmer/SymbolicRegression.jl (“BREAKING: Change expression types to `DynamicExpressions.Expression` (from `DynamicExpressions.Node`)”) merges, life will be much simpler. (Just working out some weird performance changes with that PR which have been tedious to track down.)
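For example, with two feature vectors it would look something like this (an untested sketch; the variable names are illustrative):

x = hcat(x1, x2)'                   # single array, transposed to (features, rows)
yhat = r.equations[r.best_idx](x)   # evaluates the best equation for every row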


OK… my input data are:

# Data 
X = (X1 = x1, X2 = x2)

where x1 is a vector of values typically lying in the interval [0,30], while x2 is a vector of values typically lying in the interval [1.5,2.5].

I next want to try with some data point [x1, x2], say [15, 1.5], which was not used in the model fit. I try with:

julia> r.equations[r.best_idx]([15,1.5]')
2-element Vector{Float64}:
 NaN
 NaN

So obviously, I misunderstood something.

OK: this may not be a recommended method, but it seems to work…

julia> eval(Meta.parse("yhat(p_r,T_r) = " * r.equation_strings[r.best_idx]))
julia> yhat(0.8,1.5)
0.9015511099442255

Can you share the specific equation that is outputting NaNs? There are certain situations where it will choose to output NaN rather than propagate an Inf (which might otherwise end up as a finite value in a regular calculation).

Sorry for delay. Here it is:

julia> r.equations[r.best_idx]
log((((1.394181306920604 ^ p_r) - p_r) / (T_r / 0.27927367339196735)) + (T_r / 0.8914537841077675)) ^ 0.3399461332361616

julia> r.equations[r.best_idx]([15,1.5]')
2-element Vector{Float64}:
 NaN
 NaN

julia> eval(Meta.parse("sym_reg(p_r,T_r) = " * r.equation_strings[r.best_idx]))

julia> sym_reg(15,1.5)
1.4946988237609764

Oh, for the callable version you pass the input in as a batched array. So it should be like:

num_rows = 1
num_features = 2
X = ones(num_features, num_rows)
X[1, 1] = 15
X[2, 1] = 1.5

And then pass that X in.

The reason it is returning NaN is that it is accessing undefined memory.
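Putting that together for your data point (an untested sketch, reusing the fitted report r from above; feature 1 is p_r and feature 2 is T_r):

X = ones(2, 1)    # shape (num_features, num_rows)
X[1, 1] = 15.0    # p_r
X[2, 1] = 1.5     # T_r

r.equations[r.best_idx](X)   # should now return a 1-element Vector{Float64}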


Aha. OK.

I’ll probably stick with the eval(Meta.parse(...)) version, as it is simple to use :-o.