Why are Julia compilation times so long for functions created with Symbolics?

I am relatively new to Julia, so maybe I am missing something trivial.
I created a file containing a function using the Symbolics package:

_, f = build_function(F, x)      # the second return value is the in-place (out, arg) variant
write("function.jl", string(f))  # dump the generated expression to a file
myf = include("function.jl")     # include it back to obtain a callable function
where F is a symbolic polynomial vector-valued function, and x is a symbolic vector.
When I call myf for the first time, Julia takes a very long time (up to 1 h) to give a result. From the second call on, myf only takes a few nanoseconds. The same gigantic runtime is needed if I use precompile, so I think it is the compiler requiring a lot of time to compile myf.
Do you have any clue why this happens? And, if so, how can I mitigate it?

PS: Below you can find a bit of function.jl, to give an idea of what is inside it:

function (ˍ₋out, ˍ₋arg1)
begin
    begin
        @inbounds begin
                    ˍ₋out[1] = 1.0
                    [...]
                    ˍ₋out[48] = ˍ₋arg1[47]
                    ˍ₋out[49] = (+)(-0.7071067811865475, (*)(0.7071067811865475, (^)(ˍ₋arg1[1], 2)))
                    ˍ₋out[50] = (*)(ˍ₋arg1[1], ˍ₋arg1[2])
                    ˍ₋out[51] = (*)(ˍ₋arg1[1], ˍ₋arg1[3])
                    [...]
                    ˍ₋out[74308] = (*)((*)((*)(ˍ₋arg1[36], ˍ₋arg1[4]), ˍ₋arg1[47]), ˍ₋arg1[6])
                    ˍ₋out[74309] = (*)((*)((+)(-0.7071067811865475, (*)(0.7071067811865475, (^)(ˍ₋arg1[37], 2))), ˍ₋arg1[4]), ˍ₋arg1[6])
                    ˍ₋out[74310] = (*)((*)((*)(ˍ₋arg1[37], ˍ₋arg1[38]), ˍ₋arg1[4]), ˍ₋arg1[6])
                    [...]
                    nothing
        end
    end
end
end

Not sure where this comes from, but presumably it is the expression swell that is sometimes inevitable with symbolic computing. You could try FastDifferentiation.jl, which seems designed to scale a little better.

How many ˍ₋out[...] = ... lines are there in total? How large do ˍ₋out and ˍ₋arg1 have to be for this to work?

Unfortunately, ˍ₋out can be quite large, up to 100k elements. ˍ₋arg1 is a much smaller vector, with fewer than 50 elements.

Sounds large enough to be relevant to this, then: Compile time and memory usage grows quickly with function size · Issue #19158 · JuliaLang/julia · GitHub


Can I ask why it makes sense to use a symbolic approach with such large vectors? What is the end goal?

Symbolics allows me to write the function I need automatically. Each element of out is a product of special polynomials, each in a different variable, and the symbolic approach seemed the fastest and easiest way to manipulate them.
Just to clarify what I am doing, this is an MWE of how I compute F:

using Polynomials, SpecialPolynomials, Symbolics
n = 4        # number of variables
N = 1000     # number of output elements
pmax = 10    # maximum polynomial degree

A = rand(0:pmax, N, n)   # random degree for each (output element, variable) pair

x = Symbolics.variables(:x, 1:n)

H = convert.(Polynomial, basis.(Chebyshev, 0:pmax)) # For clarity's sake, I am using only Chebyshev, but it can be any special polynomial
HH = H[A.+1]   # pick the polynomial of degree A[i,j] for each entry
F = prod([HH[i,j](x[j]) for (i,j) in Iterators.product(axes(HH)...)], dims=2)   # each row is a product of n univariate polynomials in the symbolic variables

Any suggestion to improve this approach would be very welcome.

This could be the answer. Thank you.
So I am wondering: since I am not planning to change this function very often, is there any way to compile the function once and for all, and cache it somehow, so that Julia does not recompile it every time I run it?

I’ve seen people successfully abusing GCC to compile megabyte-sized math formulas in C. If it’s possible to translate your code to C, I’d try making a shared library with GCC and loading it in Julia.
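On the Julia side, loading it would look roughly like the sketch below. This is only a sketch: the library path, the exported symbol name (diffeqf), and the vector lengths are placeholders that depend on how the C code is generated and compiled.

const libpath = joinpath(@__DIR__, "function.so")   # placeholder path to the compiled shared library

function myf_c!(out::Vector{Float64}, arg::Vector{Float64})
    # the symbol name and argument layout must match the generated C function
    ccall((:diffeqf, libpath), Cvoid, (Ptr{Cdouble}, Ptr{Cdouble}), out, arg)
    return out
end

out = zeros(74310)   # output length, as in the generated code above
arg = rand(47)       # input length
myf_c!(out, arg)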

You could try putting it in a package and setting it to precompile.

__precompile__(true)
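For example, something along these lines (a rough sketch; the module name and the argument types in the precompile call are made up and would need to match how the function is actually called):

module GeneratedFunction
# precompilation is on by default for packages; __precompile__(true) just makes it explicit

# `include` returns the anonymous function defined in the generated file
const myf! = include("function.jl")

# ask Julia to compile this method while the package itself is precompiled
precompile(myf!, (Vector{Float64}, Vector{Float64}))

end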

Module Initialization and Precompilation

Edit: Added another link


Assuming this is worth caching to reuse many times between edits, precompile and PrecompileTools.jl for packages, and PackageCompiler.jl, are the ways to cache compiled code. A package is necessary to specify the environment and isolate the compiled code to a global scope; there's no point in reusing cached code that fails to change with the environment, or that is invalid because of arbitrary code run before or afterward in the same global scope.
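With PrecompileTools.jl that would look roughly like the sketch below. The module name and the vector lengths are placeholders; on Julia 1.9+ the code exercised by the workload should be cached in the package image.

module GeneratedFunction

using PrecompileTools

const myf! = include("function.jl")

@setup_workload begin
    out = zeros(74310)   # placeholder output length
    arg = rand(47)       # placeholder input length
    @compile_workload begin
        myf!(out, arg)   # runs once during precompilation, so the compiled code gets cached
    end
end

end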

Still, you should continue looking into improving the compile time, or finding out whether that is even feasible; I wouldn't know.

Compilation of functions with 100k lines of code is currently not something type inference is optimized for. One thing you could do is try the CTarget backend:
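In rough terms it would be something like the following sketch (exact keyword arguments and output handling are described in the build_function docs):

using Symbolics

c_code = build_function(F, x; target = Symbolics.CTarget())   # generates the C source as a string
write("function.c", c_code)

# then build a shared library with GCC (optimization flags are discussed further down):
#   gcc -shared -fPIC -o function.so function.c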

I don’t know if that supports ^ operations, but the rest of the polynomial should be handled. So a small contribution to the CTarget generator may be all that's required to get your case to generate C here, and following that tutorial using GCC you should be able to take a noticeable bite out of it.

Barring that, there’s a lot of ongoing work in JuliaSimCompiler around this kind of function building but this exact case is not handled at this point.


Thank you for all the suggestions.
I tried the approach suggested by @ChrisRackauckas (i.e., build the function using CTarget and compile the code using gcc). It works. If no optimization flag is set, compilation takes about 30 seconds and execution times are acceptable. As soon as I activate any optimization (even just -O1), compilation time skyrockets again.
I’ll keep investigating.

Going higher than -O1 isn't helpful. What it does is run an SLP vectorization pass, which (a) takes forever on code with lots of scalarized operations and (b) does not help in these kinds of codes. In fact, we see it slow down the runtime of such code in many instances. So keep the optimization level low, use GCC, and you should be good.


I wonder if a similar effect can be accomplished with the julia command-line flags --compile=min or --optimize=0. The docs aren't detailed enough for me to understand what's really happening there, and this doesn't seem practical anyway because it affects the whole session, not a specific call signature. There is Base.Experimental.@optlevel for modules, which is still a bit broad for comfort.
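To make that concrete, a sketch of the @optlevel variant, with a made-up module name. As far as I understand, this only lowers the LLVM optimization level for code in that module; it does not reduce the cost of type inference.

module GeneratedFunction

Base.Experimental.@optlevel 0   # must appear at the top level of the module

const myf! = include("function.jl")

end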

We are looking into it, but it might take a while to solve.

You can try straight-line programs (SLPs) from Oscar: Introduction · Oscar.jl (oscar-system.org). Though I have not used them myself, and do not know how efficient the compilation is.
