Here is some discussion about C++ templates, and about a different language (Loci) that addresses some of the pain points of using templates. This section in particular stood out to me:
Run-time Templates
One of the biggest steps taken in Loci is to switch templates from a compile-time feature to a run-time feature (i.e. each function computes its template arguments at run-time). This completely removes the need for the compiler to see template function implementations when it’s compiling their uses. Now instantiation occurs (in principle) at run-time, the compiler just has to emit a call with an extra parameter.
At this point I usually have to explain how run-time templates can be as fast as compile-time templates. Clearly, we’re not just always evaluating templates at run-time because that would have to be slower.
This is when optimisations step into the picture. Loci compiler’s code generation will emit LLVM IR. Then LLVM’s optimisations will start working and, where it can, you’ll see the compile-time template instantiations effectively happening as the optimisation passes inline the templated functions. Importantly, the optimisations will only do this when it benefits performance, so you won’t be paying for unnecessary code bloat or hitting unnecessary cache misses.
And that’s the point: by switching templates from a forced compile-time mechanism to an allowed-to-be-at-run-time mechanism we give the compiler more flexibility. This means that if you want to do a fast incremental build the compiler can omit most optimisations and give you a very speedy compile time. However if you want to sacrifice compile-time to get better run-time performance, we can do even better here as well.
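To check my own reading of "emit a call with an extra parameter", here is a minimal C++ sketch of how I picture it. This is only my interpretation, not Loci's actual implementation: `TypeOps`, `sum_dynamic`, `kIntOps` and `sum_ints` are names I made up for illustration. `sum_static` is an ordinary compile-time template, while `sum_dynamic` is compiled once and receives its "template argument" as a run-time table of operations.

```cpp
#include <cassert>
#include <cstddef>

// Compile-time template: the compiler needs this body at every use site
// and emits one copy of the function per T.
template <typename T>
T sum_static(const T* data, std::size_t n) {
    T total{};
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Hypothetical run-time "template argument": a table of the operations
// the generic code needs for one concrete element type.
struct TypeOps {
    std::size_t size;                          // size of one element in bytes
    void (*zero)(void* out);                   // *out = T{}
    void (*add)(void* acc, const void* elem);  // *acc += *elem
};

// Run-time "template": compiled once, for all element types. A caller only
// needs this declaration, plus the extra `ops` parameter.
void sum_dynamic(const TypeOps* ops, const void* data, std::size_t n, void* out) {
    assert(ops != nullptr && out != nullptr);
    ops->zero(out);
    const unsigned char* p = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < n; ++i) {
        ops->add(out, p + i * ops->size);  // step by the run-time element size
    }
}

// The operations table for int (assumed helper names).
void int_zero(void* out) { *static_cast<int*>(out) = 0; }
void int_add(void* acc, const void* elem) {
    *static_cast<int*>(acc) += *static_cast<const int*>(elem);
}
const TypeOps kIntOps = {sizeof(int), int_zero, int_add};

// A use site: the table is a known constant here, so an optimizer that can
// see sum_dynamic's body could inline it and recover code much like
// sum_static<int>. Without optimisation it stays an ordinary indirect call.
int sum_ints(const int* data, std::size_t n) {
    int out = 0;
    sum_dynamic(&kIntOps, data, n, &out);
    return out;
}
```

If that picture is right, the unoptimized build pays for indirect calls through the table, and the "instantiation" only happens when the optimizer chooses to inline and constant-propagate `kIntOps` at a particular call site.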
I do not think it is saying that the language's runtime environment requires a compiler, but I am not sure. It does seem to be describing staged programming, and I am wondering if anyone can elaborate on this optimization strategy as it applies to the Julia compiler, which can also decide whether to, e.g., specialize a function or fall back to dynamic dispatch.