Too many precompilations

Hi all,

I am implementing genetic proggraming (GP) in julia. In GP, I need to evaluate mathematical expressions represented by tree structures.
This is extracted and simplified code relating to the evaluation.

struct FuncNode{S, T, U}
    func::S
    args::Tuple{T, U}
end

struct ConstNode{T <: Real}
    value::T
end

function evaluate(node::FuncNode)
    arg1 = evaluate(node.args[1])
    arg2 = evaluate(node.args[2])
    return node.func(arg1, arg2)
end

add_trees(tree1, tree2) = FuncNode(+, (tree1, tree2))
    
evaluate(node::ConstNode) = node.value

tree1 = FuncNode(+,
            (FuncNode(+, 
                (ConstNode(1), 
                 ConstNode(2))),
            FuncNode(-,
                (ConstNode(3),
                 ConstNode(4)))))
tree2 = FuncNode(/, (ConstNode(1), ConstNode(1)))

@show evaluate(tree1)
@show evaluate(add_trees(tree1, tree2))

In an original code, there is a function to create trees and thousands of trees are generated randomly and then evaluated. When I run the original code with --trace-compile=stderr , it seems many precompilation occur when creating and evaluating trees like this:

( I picked up some messages, because there are too many messages......)
precompile(Tuple{Type{JuliaGSGP.FuncNode{T, S, U} where U where S where T}, JuliaGSGP.PrimitiveFunction{typeof(Base.:(+))}, Tuple{JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(*))}, JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(+))}, typeof(JuliaGSGP.protected_div)}, JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(-))}, JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(JuliaGSGP.protected_div)}, typeof(Base.:(+))}, typeof(Base.:(-))}, JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(JuliaGSGP.protected_div)}, JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(-))}, typeof(JuliaGSGP.protected_div)}, JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(JuliaGSGP.protected_div)}, JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(*))}, typeof(Base.:(+))}, typeof(JuliaGSGP.protected_div)}}})
precompile(Tuple{Type{JuliaGSGP.FuncNode{T, S, U} where U where S where T}, JuliaGSGP.PrimitiveFunction{typeof(Base.:(*))}, Tuple{JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(-))}, JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(*))}, typeof(Base.:(+))}, JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(+))}, JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(-))}, typeof(JuliaGSGP.protected_div)}}})
precompile(Tuple{Type{JuliaGSGP.FuncNode{T, S, U} where U where S where T}, JuliaGSGP.PrimitiveFunction{typeof(Base.:(+))}, Tuple{JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(*))}, JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(-))}, typeof(Base.:(-))}, JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(-))}, JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(+))}, typeof(Base.:(+))}}})
precompile(Tuple{Type{JuliaGSGP.FuncNode{T, S, U} where U where S where T}, JuliaGSGP.PrimitiveFunction{typeof(JuliaGSGP.protected_div)}, Tuple{JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(-))}, JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(*))}, typeof(Base.:(+))}, JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(+))}, JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(-))}, typeof(JuliaGSGP.protected_div)}, typeof(Base.:(*))}, JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(*))}, JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(-))}, typeof(Base.:(-))}, JuliaGSGP.FuncNode{JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(-))}, JuliaGSGP.FuncNode{JuliaGSGP.Variable{Int64}, JuliaGSGP.Variable{Int64}, typeof(Base.:(+))}, typeof(Base.:(+))}, typeof(Base.:(+))}}})

This compilations slowing down the performance. I think this is because FuncNode is parameterized by various tree structures, but I’m not sure.
What approach can I take to deal with this problem?

Are you trying to learn the language? Because I am positive that there will be some package for genetic algorithms out there, like this one: GitHub - wildart/Evolutionary.jl: Evolutionary & genetic algorithms for Julia

Also JuliaHub is your friend: JuliaHub

Yes, since the type of a tuple is built from the type of their elements. If you want to keep it more generic, use Vector here instead. If that’s too slow and you know you’ll only deal with operators with a small number of arguments, try StaticArrays.jl.

I know there are some packages for GA and I used them for reference, but I didn’t know JuliaHub. It looks useful! Thanks!

I don’t think using StaticArrays will work,
under the hood a static array is a tuple built up of the type of its elements.

2 Likes

Yes, but I believe SVector{Union{FuncNode,ConstNode},2} is a concrete type and easier to infer than a long chain of nested FuncNode types. Also only leads to two specializations of evaluate - on FuncNode and ConstNode. That of course leads to a degraded inferred total return type.

I think what you should do is just drop the type-parameters.

struct FuncNode
    func
    args::Tuple
end

struct ConstNode
    value::Real
end

The reason to make them concretely typed via type-parameters is so that the compiler can generate specialized code for that data structure,
so it can do things like knowing that node.func(evalate(node.args[1]), evaluate(node.args[2])) is actually +(Int, Int), which can actually be inlined to the LLVM call to the intrinstical for integer addition.

It is to let the compiler compile specialized methods, so that it can make it fast for if you are going to call it a lot of times.
At the cost of much more expensive first call time, as it has to first run compilation, then actually run it.

If it is type parameteric then basically every call to evaluate(node::FuncNode) is going to need to compile new code, since almost every combination of {S, T, U} will be unique.
In contrast, if you don’t have it then only one method specializationm of evaluate(node::FuncNode) will need to be compiled.

Now, revisiting what i said before.
The point of all this compilation is to generate faster code for particular inputs, so that calling them again and again will be fast.
At the cost of much higher cost for first time to run, because of the compilation.

Bui you are doing genetic programming.
Basically every FuncNode repesents program that will only be run exactly once for evaluation.
it doesn’t get run again and again.
So the specialization basically only hurts.
The things that specialization can do in this case are not super powerful anyway, in this case, i don’t think.

Potentially you mighjt want to particually specialize, just on the func and ion the ConstNode,
since you won’t see too many different ones of those.

struct FuncNode{S}
    func::S
    args::Tuple
end

struct ConstNode{T<:Real}
    value::T
end

You could even go a bit further have have FuncNode be an abstract type,
with different concreate types depending on if at a leaf or in a branch in the expression tree.
Since it might well be that you have a lot of outer leafs that occur in many programs that might be worth specalizing

abstract type FuncNode end
struct LeafFuncNode{S, A <: ConstNode, B<:ConstNode} <: FuncNode
    func::S
    args::Tuple{A, B}
end

struct BranchFuncNode{S} <: FuncNode
    func::S
    args::Tuple
end

FuncNode(func, (a, b)) = BranchFuncNode(func, (a, b))
FuncNode(func, (a, b)::Tuple{ConstNode, ConstNode}) = LeafFuncNode(func, (a, b))

struct ConstNode{T<:Real}
    value::T
end

You can keep going down this pass with e.g. parents of 2 LeafFunNodes being specialized.
Which lets you trade off between speed of common subparts, vs compile time.
But i think it won’t do much because there is not all that much the compiler can do in specializeing evaluate to make if faster


I have heard it said that the process of learning julia type parameters/field typing is:

  1. First you think it doesn’t matter, and you have a bunch of abstractly typed fields
  2. Then you learn it matters, and you add a lot of type parameters, to avoid abstractly typed fields
  3. Then you realize that it actually doesn’t matter a lot of the time, and you have some abstractly typed fields and some non-abstractly typed fields.
6 Likes

@oxinabox It still feels conterintuitive to me - you would think that by specifying types you are HELPING the compiler. I guess you are, but probably not in the way that you think you are.

That’s very helpful. Just dropping type parameters, the code became several times faster!
Thanks a lot!

2 Likes