I’m reading about cross-language LTO between Rust and C/Fortran/etc., and here’s a simple example of constant folding that comes from inlining an integer range summation from Rust into a C program with literal integer inputs. Obviously we don’t routinely deal with linking like that, but I’m wondering if there’s something between a ccall and an llvmcall, where a C library function is called and its LLVM IR is optimized together with the Julia code.
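For concreteness, here is the gap I mean, sketched with the two mechanisms that exist today (the function names are just illustrative): a ccall is opaque to Julia’s optimizer, while Base.llvmcall hands it raw IR that inlines and constant-folds with the surrounding Julia code.

```julia
# ccall: an opaque call into a shared library; LLVM only sees a call
# to an external symbol and cannot fold it, even with literal inputs.
# (The exact library name varies by platform.)
cbrt_ccall(x::Float64) = ccall((:cbrt, "libm"), Float64, (Float64,), x)

# llvmcall: the IR body is spliced into the caller, so it participates
# in Julia's usual optimization pipeline.
add1(x::Int64) = Base.llvmcall("""
    %r = add i64 %0, 1
    ret i64 %r
    """, Int64, Tuple{Int64}, x)

f() = add1(41)  # @code_llvm f() shows this folds to `ret i64 42`
```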
Discussion about other contrasts and parallels (thin vs fat LTO versus the whole-program optimization that already happens within a Julia process) and about when cross-language optimization is actually beneficial (constant folding across the boundary is rare, and inlining often doesn’t help) is welcome; lots to learn.
Is it that simple? Doesn’t this require a lot of internals support to cooperate properly with the existing system?
I think that this would be a huge feature. Compile your shared library with clang -fembed-bitcode=all and have a ccall variant that permits cross-language inlining. Or maybe a flag for Libdl.dlopen, to look for bitcode sections in the library.
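The first half of that is already possible with stock tools; only the inlining-aware ccall (or dlopen flag) is hypothetical. A hedged sketch, assuming an ELF platform with clang and binutils on PATH, and mylib.c standing in for your library source:

```julia
# Embed bitcode alongside the machine code (works with stock clang today):
run(`clang -O2 -fPIC -fembed-bitcode=all -c mylib.c -o mylib.o`)
# On ELF the bitcode lands in a .llvmbc section, which we can confirm:
run(pipeline(`readelf -S mylib.o`, `grep llvmbc`))

# The second half is the hypothetical part, e.g. something like
#   Libdl.dlopen("libmylib.so", Libdl.RTLD_LAZY; load_bitcode = true)
# where Julia would find the bitcode sections and make them available
# for inlining at ccall sites. No such flag exists today.
```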
For starters, we could compile most of the Julia runtime with that, and have most of the cheap runtime calls inlined.
For seconds, this would make a lot of intrinsics programming much nicer, basically because immintrin.h is well documented while LLVM’s processor intrinsics are an underdocumented mess. So this would enable us to just write the kernel in C (ever tried to use the juicy aesenc instructions from Julia?).
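To make that pain concrete, here is roughly what aesenc takes today, as a hedged sketch: you have to dig the intrinsic name and signature out of LLVM’s x86 backend yourself, whereas in C it’s just the documented _mm_aesenc_si128 from immintrin.h. This assumes an x86-64 CPU with AES-NI and Julia’s module form of llvmcall:

```julia
# One AES round via the raw LLVM intrinsic; the name and <2 x i64>
# signature come from reading LLVM's x86 backend, not any Julia docs.
const V2I64 = NTuple{2,VecElement{Int64}}

@inline aesenc(state::V2I64, key::V2I64) =
    Base.llvmcall(("""
        declare <2 x i64> @llvm.x86.aesni.aesenc(<2 x i64>, <2 x i64>)
        define <2 x i64> @entry(<2 x i64> %a, <2 x i64> %b) alwaysinline {
            %r = call <2 x i64> @llvm.x86.aesni.aesenc(<2 x i64> %a, <2 x i64> %b)
            ret <2 x i64> %r
        }
        """, "entry"), V2I64, Tuple{V2I64,V2I64}, state, key)

state = (VecElement(Int64(1)), VecElement(Int64(2)))
key   = (VecElement(Int64(3)), VecElement(Int64(4)))
aesenc(state, key)  # one hardware AES encryption round
```

With bitcode-carrying libraries, that whole dance would collapse to a C kernel using _mm_aesenc_si128 plus an inlinable ccall.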
I’ve never done automatic differentiation, let alone used Enzyme, so a noob question: is that issue implying that Enzyme works, or is intended to work, on LLVM IR across ccall boundaries? What went wrong there exactly? I couldn’t tell how the working and failing functions differed after that Numba name-unmangling.
Yes, Enzyme works at the LLVM level, which means it couldn’t care less about the frontend language: as long as it receives LLVM IR/bitcode, it can do anything. And Enzyme is “just” an optimisation pass, so the same applies to other passes.
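For reference, the single-language case looks like this (a minimal sketch with the Enzyme.jl API as I understand it; square is just an illustrative function). Enzyme differentiates the LLVM IR that Julia generates, which is exactly why the cross-ccall case hinges on whether IR/bitcode for the callee is available at all:

```julia
using Enzyme

square(x) = x * x

# Reverse-mode AD on the IR of `square`; returns a tuple of derivative
# tuples, one entry per Active argument.
Enzyme.autodiff(Reverse, square, Active, Active(3.0))  # ((6.0,),)
```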
It has been suggested a few times in the past that BinaryBuilder enable -fembed-bitcode=all by default, but that’s a significant enough infrastructure change that we never did it in practice; in theory it should be doable.
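In the meantime, a single recipe can opt in without any BinaryBuilder changes, just by exporting the flag in its build script. A hedged sketch of the relevant fragment (the surrounding recipe is the standard build_tarballs skeleton; names and paths are illustrative):

```julia
# Fragment of a build_tarballs.jl script section; the CFLAGS export is
# the only point here, the rest is the usual build boilerplate.
script = raw"""
cd ${WORKSPACE}/srcdir/mylib
export CFLAGS="-fembed-bitcode=all ${CFLAGS}"
make -j${nproc}
make install prefix=${prefix}
"""
```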
It’s possible to do this manually in both directions; I have done a fair amount of experiments, but it involved a lot of manual interaction with Clang and LLD. The biggest problem I ran into was caching and cache invalidation (of Julia code, in the Julia → C direction, which I investigated the most). That might be better now with things like CompilerCaching.jl. Various other problems included mapping between architecture tuples (there are some weird permutations of macOS/Darwin when doing LTO) and managing the bitcode in memory/on disk.
ClangCompiler.jl is/was an attempt at getting some of this working less manually, but I’m not sure of the status of that project now.
BB2 with the local compiler shards would also be a massive help here; managing the toolchain was a huge headache.