My generated Julia function from a niche DSL language takes 4min to compile

I have an application that generates Julia code from a niche DSL language. The generated code is correct and returns the same results as the DSL interpreter. One problem, though, is that for a real-world application, the main building block function (the inner loop of an aggregate type function) is about 8.3k lines of code. The first time I run this function with some input data, it takes ~4min to compile.

I have attached a very contrived example that can be tested like this:

generated_code.jl.zip

f = include("generated_code.jl")
f(1:1201, nothing, 0.0, "1973-02-25", 113678.112, "EGP", 371.501, 12, 20, "grd", "grd", 79, 3.5, "grd", 0, "VALID", 0.0, "2003-11-03", "2023-11-02")
  • This example uses custom functions like rounddown and yearfrac, so it will crash, but you can still test it successfully since it will still take ~4min to compile.
  • This example is generated from a list of 318 DSL formulas. These formulas have very similar syntax (and functions) to Excel formulas. They represent either time-vectors or a single value. They can reference each other but still be listed in any order (similar to Pluto blocks). Here is an example snippet for a few of these formulas:

Is there a better way for the generated function code to be structured that will lead to faster compilation? It’s already fantastic that Julia can replicate the DSL calculations; I just need some guidance on how to make it practical.

Thoughts:

  1. Within this function, many calculations can be clustered together and isolated into separate functions. Would it help the compiler if this generated function is just a combination of smaller functions?
  2. Is there a way for the compiler to ignore compilation and let the whole thing just be seen as a dynamic function (Any → Any)? And would that help in any way, i.e. allow a tradeoff between compilation time and calculation time?

I think this is fairly hard to answer after looking at the generated code. If you inspect

@code_warntype f(1:1201, nothing, 0.0, "1973-02-25", 113678.112, "EGP", 371.501, 12, 20, "grd", "grd", 79, 3.5, "grd", 0, "VALID", 0.0, "2003-11-03", "2023-11-02")

you’ll also see a lot of red lines and Anys. It’s a single (huuuuge) function, so I assume that type inference is having a hard time juggling all those variables.
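A scaled-down illustration of what those red lines mean, using two hypothetical one-line functions rather than the generated code. Base.return_types is used here instead of @code_warntype only because its result is easy to inspect programmatically; a non-concrete return type is the kind of thing @code_warntype highlights in red.

```julia
# Hypothetical functions, not taken from the generated code
unstable(x) = x > 0 ? 1 : 1.0      # returns an Int or a Float64 depending on the value of x
stable(x)   = x > 0 ? 1.0 : -1.0   # always returns a Float64

# Inferred return type for an Int argument
unstable_type = only(Base.return_types(unstable, (Int,)))   # a Union of Int64 and Float64
stable_type   = only(Base.return_types(stable, (Int,)))     # Float64

isconcretetype(unstable_type)   # false: callers cannot be specialized on the result
isconcretetype(stable_type)     # true
```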

Maybe a bit of a “meta” question: do you need to convert lots of these procedures written in that DSL, or is it just one (or a few) project(s) which you want to port to Julia? In the latter case, I’d certainly recommend spending some hours/days and rewriting it.

Hi @tamasgal, no, the goal is not to port a few projects, but to run existing and future projects through a Julia pipeline rather than the DSL interpreter. Automatic hands-off conversion is key.

Would breaking up the function into smaller functions make a difference? I’m thinking of generating a function for each formula (or clusters of intermixed formulas), with all the dependencies as arguments and all the variables as return values. Then the function would just be a collection of those little functions ordered in the correct evaluation order. I’m already busy working on this approach, so I’m just asking to know beforehand if I’m gonna be disappointed.

I am pretty sure that breaking it up into smaller functions will help, but overdoing it is probably also not the best idea :wink: The key here is generating code that makes type inference easy, read: type-stable code.

Do you have an example of what a small project written in that DSL looks like?

If I may ask: what’s the purpose of that DSL? Why not provide a Julia interface? Julia is great for creating DSLs.

Some observations:

You have tons of these which should be factored out into a function like getindex_with_fallback or so. Basically everything that has repeated logic like this should be factored out, so the compiler doesn’t have to redo all the work.

try
    i_cu_period_som[t]
catch e
    if e isa BoundsError
        0.0
    else
        rethrow(e)
    end
end
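A minimal sketch of such a helper, assuming the name getindex_with_fallback suggested above and plain vectors indexed by an integer t. It checks the bounds up front instead of throwing and catching a BoundsError, and the generated code would then call it everywhere the try/catch pattern appears. The data below is a stand-in, not taken from the real model.

```julia
# Hypothetical helper: return v[t] when t is a valid index, otherwise `default`.
# The bounds check is explicit, so no exception is thrown or caught.
function getindex_with_fallback(v::AbstractVector, t::Integer, default)
    return checkbounds(Bool, v, t) ? v[t] : default
end

i_cu_period_som = [1.5, 2.5, 3.5]   # stand-in data

getindex_with_fallback(i_cu_period_som, 2, 0.0)   # index in range: the stored element
getindex_with_fallback(i_cu_period_som, 0, 0.0)   # index out of range: the fallback
```

Because the helper is a single method, it is compiled once per combination of argument types instead of once per occurrence of the inlined try/catch.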

Then you return a 300 or so element NamedTuple, types like that tend to compile for a loong time as well. Might want to switch to a Dict unless you require that type for other functions. As it is, this doesn’t infer anyway so callers wouldn’t benefit.

Replace all of those with fill(value, n) where you precompute n once because iterating over T so many times is unnecessary.

r_bel_cnv_cf_claims_investment_som = [0 for t in T]
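As a rough sketch, assuming T is the range passed as the first argument in the example call, the replacement could look like this. The second variable name is hypothetical and only illustrates picking the element type up front.

```julia
T = 1:1201      # time range, as in the example call
n = length(T)   # computed once and reused for every constant vector

# Same contents as [0 for t in T], built without iterating over T
r_bel_cnv_cf_claims_investment_som = fill(0, n)

# If a vector will later hold Float64 values, starting from 0.0 avoids a later type change
r_example_float = fill(0.0, n)   # hypothetical variable name
```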

You also have a bunch of these: tpx_som = Vector(undef, length(T)) which are not type-stable. Try to replace those with typed versions.
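For illustration, here is the difference between the two constructors, assuming the formula in question only ever produces Float64 values. The recurrence is made up and only serves to fill the vector.

```julia
n = 1201

untyped = Vector(undef, n)            # Vector{Any}: element types are only known at run time
typed   = Vector{Float64}(undef, n)   # concrete element type, assuming Float64 results

# Hypothetical recurrence, standing in for a generated formula
typed[1] = 1.0
for t in 2:n
    typed[t] = typed[t - 1] * 0.5
end
```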

@tamasgal I’ll try to get a small example going

I’ll also try to best explain the goal of this DSL (or rather the full platform that uses this DSL). It is a bit of a strange one. Basically, my company does a lot of actuarial consulting for different companies. These companies want summarised results that can be extended and played with (like an Excel pivot report), but also be clicked through to the underlying hard numbers and calculations that build up these summarised results. Usually a client’s finances are displayed using a nested company structure where the root is the summarised finances over the whole company, the branches are the aggregation over certain groups/departments/products, and the outermost leaves are single products (like a life insurance scheme or something). The clients will most likely have to share this deliverable with an external auditing firm, and a lot of internal parties will want to work their way through the results (and calculations).

The modeling platform basically parses a collection of variable libraries (each library representing a node in the so-called company hierarchy) and then exports it to a collection of dynamic spreadsheets that then hold all the calculations, all the summarisation, and all the highest-level pivot tables and views. Each number is dynamic in such a way that you can go from the highest summary and click your way through to one of the lowest products that was used in this calculation; you can basically Michael Burry your way through. The clients can, with very little technical knowledge, also make side calculations and change underlying results to see what effect that will have on dependent results, giving them a tool to quickly check simple company scenarios.

Here is an example of one of the lower calculations:

Why not use Julia all the way through?

The main reasons are:

  • Since day one we had plans to move to an alternative/modern approach, but we end up winning so many contracts with this full audit trail approach (the clients end up saving costs on auditing)
  • The clients we work with (and their service providers) are intimately familiar with Excel interfaces, and openly prefer these results to other dashboards and online hosted platforms
  • The clients own the full calculation chain; if our paths diverge they can easily rebuild their models on any platform they like (peace of mind for both parties)
  • All SQL queries and data sources that go into performing the calculations are also stored as tables, creating a type of historical archive of everything that goes into calculating the results, which will still be accessible in 10 or 20 years’ time without hosting anything

Without looking at the context, I believe you should be able to replace this particular block with get(i_cu_period_som,t,0.0), which will behave better anyway since it isn’t based on a try/catch. Try/catch isn’t something you should need to use very often in pure Julia.
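A small sketch of how that reads for a lagged reference, which is a typical place where an index falls outside the vector. The vector x is stand-in data, not one of the generated variables.

```julia
x = [10.0, 20.0, 30.0]   # stand-in time vector

# Lagged reference x[t - 1]: at t = 1 the index 0 is out of bounds, so the default is used
lagged = [get(x, t - 1, 0.0) for t in eachindex(x)]
```

Since the default is a Float64 like the elements, the result is a Vector{Float64} and no exception handling is involved.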

I’ll echo what you and others have suggested: the compiler will likely perform better if you divide your code into functions. We can’t promise this up front, but others have looked at your code and seen that it likely suffers from poor type inference. Proper use of functions can greatly improve this, resulting in faster compilation and execution. This is on top of any benefit from avoiding repeated compilation of repeated code.
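To make the idea concrete, here is a minimal sketch of one possible split, with each formula as its own small function that takes its dependencies as arguments. All function and variable names are hypothetical and the formulas are invented; the real generator would emit one such function per DSL formula or cluster of formulas.

```julia
# A time-vector formula that depends on the time range and a scalar input
discount_factor(T, rate) = [(1 + rate)^(-t) for t in T]

# A time-vector formula that depends on a scalar and another vector
discounted_premium(premium, df) = premium .* df

# A single-value formula that depends on a vector
present_value(cashflows) = sum(cashflows)

# The driver only wires the pieces together in dependency order
function run_model(T, rate, premium)
    df = discount_factor(T, rate)
    dp = discounted_premium(premium, df)
    pv = present_value(dp)
    return (; df, dp, pv)
end

result = run_model(1:3, 0.0, 100.0)
```

Each small function is inferred and compiled on its own for the concrete argument types it receives, so an unstable value in one formula does not have to degrade inference for the rest.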

Thanks @jules, the try/catch BoundsError and the [... for t in T] are currently temporary fixes. Would replacing the try/catch improve compile time? If yes, then I’ll fix that now; if no, then I’m going to finish up with the function-splitting.

  • I changed my tuple return to a Dict; it doesn’t make a difference to compile time (or run time), but I am still going to keep it as a Dict as suggested
  • Unfortunately the Vector(undef, length(T)) cannot easily be generated as something type stable (but taking a good look is on my todo list). Those vectors get filled with a generated pattern. That pattern comes from the input and can be something silly like ["A", 1+1im, 1.0, 1, true][ceil(Int, 5rand())]
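One possible workaround when the element type is only known at run time, sketched under the assumption that most real patterns turn out to be homogeneous: fill the vector as before, narrow its element type once, and hand it to a separate function. The names pattern, narrowed, and accumulate_values are hypothetical.

```julia
# Stand-in for a generated pattern whose element type is only known at run time
pattern = Any[1.0, 2.0, 3.0]

# Broadcasting identity rebuilds the vector with the narrowest element type that fits
narrowed = identity.(pattern)   # Vector{Float64} for this input

# Hypothetical kernel: compiled separately for each concrete vector type it receives
accumulate_values(v) = sum(x -> 2x, v)

total = accumulate_values(narrowed)
```

For a genuinely mixed pattern such as strings next to numbers, the narrowed vector stays loosely typed, so this only helps in the homogeneous case.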

How much of this will contribute to compile-time inefficiency vs. runtime inefficiency?

I’ll give some feedback after the function splitting is done