[ANN] DynamicExpressions.jl: Fast evaluation of runtime-generated expressions without compilation

Need to work with a large space of symbolic expressions which are unknown at runtime, but numerically evaluate them just as quickly as if they were hard-coded?

Then you need DynamicExpressions.jl!

dynamic_expressions_logo

DynamicExpressions.jl is the newly-exposed backbone of the symbolic regression libraries SymbolicRegression.jl and PySR. These are machine learning algorithms which discover equations which optimize an objective, rather than tuning parameters in a flexible pre-defined model. Refactoring the underlying data structure into its own minimal library will make it easier for others to use and extend.

A dynamic expression is a snippet of code that changes throughout runtime - compilation is not possible! DynamicExpressions.jl does the following:

  1. Defines an enum over user-specified operators.
  2. Using this enum, it defines a very lightweight and type-stable data structure for arbitrary expressions.
  3. It then generates specialized evaluation kernels for the space of potential operators.
  4. It also generates kernels for the first-order derivatives, using Zygote.jl.
  5. It can also operate on arbitrary other types (vectors, tensors, symbols, strings, or even unions) - see last part below.

It also has import and export functionality with SymbolicUtils.jl, so you can move your runtime expression into a CAS!

using DynamicExpressions

operators = OperatorEnum(; binary_operators=[+, -, *], unary_operators=[cos])

x1 = Node(; feature=1)
x2 = Node(; feature=2)

expression = x1 * cos(x2 - 3.2)

X = randn(Float64, 2, 100);
expression(X) # 100-element Vector{Float64}

(We can construct this expression with normal operators, since calling OperatorEnum() will @eval new functions on Node that use the specified enum.)

Speed

First, what happens if we naively use Julia symbols to define and then evaluate this expression?

@btime eval(:(X[1, :] .* cos.(X[2, :] .- 3.2)))
# 117,000 ns

This is quite slow, meaning it will be hard to quickly search over the space of expressions. Let’s see how DynamicExpressions.jl compares:

@btime expression(X)
# 693 ns

Much faster! And we didn’t even need to compile it. (Internally, this is calling eval_tree_array(expression, X, operators) - where operators has been pre-defined when we called OperatorEnum()).

If we change expression dynamically with a random number generator, it will have the same performance:

@btime begin
    expression.op = rand(1:3)  # random operator in [+, -, *]
    expression(X)
end
# 842 ns

Now, let’s see the performance if we had hard-coded these expressions:

f(X) = X[1, :] .* cos.(X[2, :] .- 3.2)
@btime f(X)
# 708 ns

So, our dynamic expression evaluation is about the same (or even a bit faster) as evaluating a basic hard-coded expression! Let’s see if we can optimize the speed of the hard-coded version:

f_optimized(X) = begin
    y = Vector{Float64}(undef, 100)
    @inbounds @simd for i=1:100
        y[i] = X[1, i] * cos(X[2, i] - 3.2)
    end
    y
end
@btime f_optimized(X)
# 526 ns

The DynamicExpressions.jl version is only 25% slower than one which has been optimized by hand into a single SIMD kernel! Not bad at all.

More importantly: we can change expression throughout runtime, and expect the same performance. This makes this data structure ideal for symbolic regression and other evaluation-based searches over expression trees.

Derivatives

We can also compute gradients with the same speed:

operators = OperatorEnum(;
    binary_operators=[+, -, *],
    unary_operators=[cos],
    enable_autodiff=true,
)
x1 = Node(; feature=1)
x2 = Node(; feature=2)
expression = x1 * cos(x2 - 3.2)

We can take the gradient with respect to inputs with simply the ' character:

grad = expression'(X)

This is quite fast:

@btime expression'(X)
# 2894 ns

and again, we can change this expression at runtime, without loss in performance!

@btime begin
    expression.op = rand(1:3)
    expression'(X)
end
# 3198 ns

Internally, this is calling the eval_grad_tree_array function, which performs forward-mode automatic differentiation on the expression tree with Zygote-compiled kernels. We can also compute the derivative with respect to constants:

result, grad, did_finish = eval_grad_tree_array(expression, X, operators; variable=false)

or with respect to variables, and only in a single direction:

feature = 2
result, grad, did_finish = eval_diff_tree_array(expression, X, operators, feature)

Generic types

Does this work for only scalar operators on real numbers, or will it work for MyCrazyType?

I’m so glad you asked. DynamicExpressions.jl actually will work for arbitrary types! However, to work on operators other than real scalars, you need to use the GenericOperatorEnum <: AbstractOperatorEnum instead of the normal OperatorEnum. Let’s try it with strings!

x1 = Node(String; feature=1) 

This node, will be used to index input data (whatever it may be) with either data[feature] (1D abstract arrays) or selectdim(data, 1, feature) (ND abstract arrays). Let’s now define some operators to use:

my_string_func(x::String) = "Hello $x"

operators = GenericOperatorEnum(;
    binary_operators=[*],
    unary_operators=[my_string_func],
    extend_user_operators=true)

Now, let’s create an expression:

tree = x1 * " World!"
tree(["Hello", "Me?"])
# Hello World!

So indeed it works for arbitrary types. It is a bit slower due to the potential for type instability, but it’s not too bad:

@btime tree(["Hello", "Me?"])
# 1738 ns

Tensors

Does this work for tensors, or even unions of scalars and tensors?

Also yes! Let’s see:

using DynamicExpressions

T = Union{Float64,Vector{Float64}}

c1 = Node(T; val=0.0)  # Scalar constant
c2 = Node(T; val=[1.0, 2.0, 3.0])  # Vector constant
x1 = Node(T; feature=1)

# Some operators on tensors (multiple dispatch can be used for different behavior!)
vec_add(x, y) = x .+ y
vec_square(x) = x .* x

# Set up an operator enum:
operators = GenericOperatorEnum(;binary_operators=[vec_add], unary_operators=[vec_square], extend_user_operators=true)

# Construct the expression:
tree = vec_add(vec_add(vec_square(x1), c2), c1)

X = [[-1.0, 5.2, 0.1], [0.0, 0.0, 0.0]]

# Evaluate!
tree(X)  # [2.0, 29.04, 3.01]

Note that if an operator is not defined for the particular input, nothing will be returned instead.

This is all still pretty fast, too:

@btime tree(X)
# 2,949 ns
@btime eval(:(vec_add(vec_add(vec_square(X[1]), [1.0, 2.0, 3.0]), 0.0)))
# 115,000 ns

Additional notes

Where would I use this instead of GeneralizedGenerated.jl or RuntimeGeneratedFunctions.jl?

These packages have slightly different use cases. DynamicExpressions.jl is more for exploring a very large space of possible functional forms, without ever caching or compiling individual expressions – it only ever compiles the operators used, and an general evaluation function to traverse (very lightweight) expression trees. Thus, the first time evaluating a particular expression will be just as fast as the second time (assuming you’ve already compiled the required operators and traversal code).

DynamicExpressions.jl is less flexible: all expressions need to be stored as binary trees, and potential operators have to be known beforehand. GeneralizedGenerated.jl and RuntimeGeneratedFunctions.jl are for cases where you want to generate and compile entire Julia functions at runtime, and re-use them multiple times. Note that they should be compatible with DynamicExpressions.jl, I think, in cases where you want to generate operators, and then use those operators in generated expressions.


Very interested in hearing people’s thoughts, feedback, and ideas. This library would not be here without the help of many contributors–including code, advice, and tips on this very forum– Kaze Wong, @mkitti, @ngam, @johanbluecreek, @marius311, @Julius_Martensen, @ChrisRackauckas, Charles Fox, @patrick-kidger, @jling, vnikoofard, @Skoffer, @Henrique_Becker, shashi, YingboMa, and probably many others I’m forgetting. Thank you!


Context: DynamicExpressions.jl is actually the result of SymbolicRegression.jl being split in two: the former is exclusively focused on the underlying expression structure for fast evaluation, and the latter will now only worry about evolution-based symbolic optimization (though this splitting is still in-process). There are multiple reasons for the split:

  1. Makes it easier for others to build symbolic regression libraries, as they can now import just the data structure and evaluation scheme, rather than the heavyweight SymbolicRegression.jl. I hope that this also will introduce new contributors to improve DynamicExpressions.jl so that we can all benefit.
  2. Lowers code complexity and increases modularity. It should be easier to explore new interfaces with other Julia libraries.
  3. Will make it easier to include other types of operators, such as tensors (right now, only scalar operators are available), as the modification doesn’t need to immediately work with the entirety of SymbolicRegression.jl.
36 Likes

This is great. Thanks for splitting this out, I think it’s a utility that could be more generally useful for partial evaluations with Symbolics too.

1 Like

I have a new package FastSymbolicDifferention.jl (not yet registered but soon) that generates fast symbolic derivatives. It can compute Jacobians relatively quickly but LLVM chokes on the large expressions that arise when computing the Jacobian of a function with 10⁾-10⁡ expression nodes.

Would DynamicExpressions work well for expressions in this size range?

Also, you say all expressions are stored as binary trees. Does this mean that common subexpressions will be reevaluated? in this expression

(x²+y²)*(x²+y²)

would x²+y² evaluate twice? If so, this would not work since the hard expressions to differentiate tend to have lots of common subexpressions. Blowing them out into a binary tree would exponentially increase computation.

1 Like

Great questions -

I have a new package FastSymbolicDifferention.jl (not yet registered but soon) that generates fast symbolic derivatives. It can compute Jacobians relatively quickly but LLVM chokes on the large expressions that arise when computing the Jacobian of a function with 10⁾-10⁡ expression nodes.

Would DynamicExpressions work well for expressions in this size range?

Seems like it can handle it; here’s an expression with 1,000,000 nodes: random operators, variables, and constants:

julia> using DynamicExpressions

julia> operators = OperatorEnum(; binary_operators=(+, *, /, -), unary_operators=(cos, sin), enable_autodiff=true);

julia> x1, x2, x3 = [Node(Float64; feature=i) for i in 1:3];

julia> c1, c2, c3 = [Node(; val=1.0), Node(; val=1.0), Node(; val=1.0)];

julia> tree = reduce((l, r) -> rand([+, *, -])(l, r), [rand([x1, x2, x3, c1, c2, c3]) for _=1:500000]);

julia> tree = copy(tree);

julia> sum(t -> 1, tree)  # or just count_nodes(tree)
999463

julia> tree_mapreduce(t->1, (p, c...)->p+max(c...), tree)  # or just count_depth(tree)
986

so there are about a million nodes in this expression tree, and it is about a thousand levels deep. We can evaluate it without any simplification:

julia> X = rand(3, 10_000) .* 0.1;

julia> tree(X, operators)[1:5]
5-element Vector{Float64}:
 -72.72113426886858
 -78.97590570024889
 -61.4416796428703
 -71.68626949219757
 -76.85580160707025

The speed isn’t too bad considering nestedness of this expression:

julia> @btime tree(X, operators) seconds=30;
  4.578 s (500755 allocations: 54.34 MiB)

LoopVectorization.jl helps a tiny bit too:

julia> @btime tree(X, operators; turbo=true) seconds=30;
  4.345 s (500755 allocations: 54.34 MiB)

Keep in mind this doesn’t do any sort of symbolic simplification. It is just directly traversing the expanded expression tree.

Here is the gradient of this expression, batched over the 10_000 input points:

julia> @btime tree'(X, operators) seconds=60;
  25.954 s (4497593 allocations: 149.05 GiB)

(maybe not bad for an uncompiled 1e6 expression binary tree? At least it doesn’t crash…)

I haven’t had a chance to implement n-ary nodes yet but it should be easy, especially with the recent change to have a single function handle all the tree traversal: https://github.com/SymbolicML/DynamicExpressions.jl/tree/master/src/base.jl, and all other functions just call it (so to implement an n-ary node, you’d only need to change a few places). That would let you have more nodes with less depth.

Also, you say all expressions are stored as binary trees. Does this mean that common subexpressions will be reevaluated? in this expression

(x²+y²)*(x²+y²)

would x²+y² evaluate twice?

A few notes:

Simplification

DynamicExpressions.jl doesn’t simplify automatically, so if you write this expression out literally like this (rather than (x²+y²)²), it would not check for you to see whether an expression is repeated (would be too slow to need to check every single subtree), and would evaluate it as-is. DynamicExpressions.jl has an interface with SymbolicUtils.jl, so you could simplify the expression before evaluation. You could also write your own custom rules on the space of trees using tree_mapreduce or the other utilities.

(DynamicExpressions.jl isn’t built for any types or operators per se - you can even work with expressions over strings or unions or anything else)

Acyclic graphs

I say binary trees, but note that DynamicExpressions.jl technically represents expressions as acyclic graphs. This lets you have shared subexpressions. For example:

x1 = Node(; feature=1)
base_tree = cos(x1) - 0.3 * x1
tree = exp(base_tree) - base_tree

This expression will only have 7 nodes in memory (=(1+1+1+1+1+1)), rather than 14 nodes (=(2*1+1+1+1+1)*2+1+1). In other words, x1 and base_tree both have multiple parents. In the data structure this means that the exp node and the - node simply reference the same child object.

For example, the operation

copy_node(tree; preserve_sharing=true)

will copy this above expression, also copying the pointer topology (which is performed with an IdDict). Thus the copy will have the same sharing of subexpression. You could use

copy_node(tree; preserve_sharing=false)

to break the pointer topology, resulting an expression that is a literal binary tree with 14 independent nodes.

(Keep in mind that shared nodes like this make simplification much trickier! I turn simplification off by default if I am using this feature)

I am gradually implementing utilities for working on shared nodes, but the main tree traversal function tree_mapreduce is aware of shared nodes.

Evaluation

A lot of the reason the evaluation is fast is because I fuse operators together into compiled kernels. For example, cos(x1 - 3.2) is not solved like

y1 <- x[feature]
y2 <- [constant]
y3 <- y1 - y2
y4 <- cos(y3)

but instead it gets solved like

y1 <- cos(x[feature] - [constant])

inline in a single @turbo loop. What happens here is that DynamicExpressions.jl exploits multiple dispatch to created fused kernels for all possible subexpressions of a certain size, because I assume they appear often enough to be worth compiling kernels for each. However, I only go up to a couple operators at a time because I found fusing 3+ operators hurt performance rather than helped it due to the massive number of possible kernels it needed to compile and then infer.

Right now I do not cache array outputs of nodes. For the expressions I typically work with, it is significantly slower to do so rather than simply re-computing each node (because it doesn’t require additional intermediate allocations). Even if there are many subexpressions shared, it is much faster to just re-traverse rather than create additional cached arrays.

That being said, if you need this we could chat about adding it as a flag, as I’m sure there are some applications where it is necessary!

1 Like

That sounds great. I store expressions as acyclic graphs so it should be straightforward to transform to DynamicExpression’s.

Adding DynamicExpressions output is on my todo list.

1 Like

I have time now to look at adding DynamicExpressions to my fast symbolic differentiation package. How does speed scale as the number of operator types increases? For example, if

OperatorEnum(; binary_operators=(+, *, /, -), unary_operators=(cos, sin)

was more like this:

unary_operators=(deg2rad, rad2deg, transpose, asind, log1p, acsch,
                 acos, asec, acosh, acsc, cscd, log, tand, log10, csch, asinh,
                 abs2, cosh, sin, cos, atan, cospi, cbrt, acosd, acoth, acotd,
                 asecd, exp, acot, sqrt, sind, sinpi, asech, log2, tan, exp10,
                 sech, coth, asin, cotd, cosd, sinh, abs, csc, tanh, secd,
                 atand, sec, acscd, cot, exp2, expm1, atanh, gamma,
                 loggamma, erf, erfc, erfcinv, erfi, erfcx, dawson, digamma,
                 trigamma, invdigamma, polygamma, airyai, airyaiprime, airybi,
                 airybiprime, besselj0, besselj1, bessely0, bessely1)
binary_operators = (+, -, *, /, //, \, ^)

The easiest way to incorporate DynamicExpressions would be to set up the operator types once at package load time. But if this leads to slow code I could examine each graph to extract just the operator types present in the graph and then specify the operator types on the fly.

It won’t slow the evaluation at all. It will just result in more kernels being compiled. There will be roughly O((n + n^2)*m) kernels compiled, where n is the number of operators and m is the number of types. But you can precompile those too.

While the type is called an “enum,” it’s actually a vector of operators rather than a tuple, as I found there was no performance difference, and it made precompilation possible.

For example, here’s the single-operator degree=2 kernel:

You can see that it only dispatches based on the function (op), rather than the OperatorEnum. Thus you can totally precompile all the kernels you need.