Using Base.llvmcall for cross-language LTO

So I use a custom C ABI with MLIR 21.1.8 for newer JIT features not present in MLIR.jl, and when using llvmcall for the saved JIT thunks I get a version mismatch in the IR. I tried sanitizing the unknown features from the IR for Julia, but this is very difficult, and I was wondering: if the wrapper doesn't trigger the MLIR JIT, I should be able to use Julia's internal LLVM completely for C sources, while templates and C++ callbacks still route through the MLIR 21 JIT. I get perfect results from inlining the C++ with the Julia JIT through llvmcall. C++ can hit Julia speed, since it's Julia's JIT doing the work and just casting the data from the .bc and shared library. This also opens the door for AD across language boundaries, since traditional ccall stopped the Julia JIT from following the IR.

For the C++ function `int add(int a, int b)`, the Julia wrapper calls into its bitcode via `Base.llvmcall`, falling back to a plain `ccall` when no bitcode is available.

If we compile C/C++ to LLVM bitcode (.bc) and hand it to Base.llvmcall, Julia's own LLVM JIT compiles it. The C/C++ IR becomes visible to Julia's optimization pipeline: inlining, SROA, vectorization, and AD all work as if the code were native Julia:
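For concreteness, the C++ side can be as simple as this. `mylib.cpp` is a hypothetical reconstruction; `_Z3addii` is just the Itanium mangling of `add(int, int)`:

```cpp
// mylib.cpp -- hypothetical source behind the bitcode used below.
// Compiled with something like: clang++ -c -emit-llvm -O2 mylib.cpp -o mylib_lto.bc
// (the clang's LLVM version must be compatible with Julia's LLVM).
int add(int a, int b) {
    return a + b;   // mangles to _Z3addii under the Itanium C++ ABI
}
```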

Bitcode loaded once at module parse time

```julia
const LTO_IR_PATH = joinpath(@__DIR__, "mylib_lto.bc")
const LTO_IR = isfile(LTO_IR_PATH) ? read(LTO_IR_PATH) : UInt8[]

function add(a::Cint, b::Cint)::Cint
    if !isempty(LTO_IR)
        # Julia's JIT compiles this: full optimization, AD-transparent
        return Base.llvmcall((LTO_IR, "_Z3addii"), Cint, Tuple{Cint, Cint}, a, b)
    else
        # Fallback: traditional opaque FFI
        return ccall((:_Z3addii, LIBRARY_PATH), Cint, (Cint, Cint), a, b)
    end
end
```

The `(LTO_IR, "_Z3addii")` form of `llvmcall` takes a bitcode module and a function name. Julia's LLVM parses the bitcode, finds the function, and inlines it directly into the calling code. The C++ literally runs at Julia speed because it is Julia's JIT doing the work.
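As a side note, mangled names can be sidestepped entirely by exporting the function with `extern "C"`, so the Julia wrapper can reference a stable symbol; `add_c` here is a hypothetical example, not part of the setup above:

```cpp
// Hypothetical: same function exported with C linkage, so the symbol
// is plain "add_c" instead of the Itanium-mangled "_Z3addii".
extern "C" int add_c(int a, int b) {
    return a + b;
}
```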

For complex C++ that needs the MLIR 21 JIT (virtual methods, template instantiations), we compile MLIR to LLVM IR, sanitize it, and assemble AOT thunk bitcode so that even those calls go through llvmcall:

Complex C++ with virtual dispatch: still goes through Julia's JIT

```julia
function call_virtual_method(obj::Ptr{MyClass}, x::Cdouble)::Cdouble
    x_ref = Ref(x)
    inner_ptrs = Ptr{Cvoid}[Ptr{Cvoid}(obj),
                            Ptr{Cvoid}(Base.unsafe_convert(Ptr{Cdouble}, x_ref))]
    # Keep the Ref and the pointer array rooted while raw pointers are in use
    GC.@preserve x_ref inner_ptrs begin
        if !isempty(THUNKS_LTO_IR)
            # llvmcall does not auto-convert a Vector, so pass the raw pointer
            return Base.llvmcall(
                (THUNKS_LTO_IR, "_mlir_ciface_MyClass_method_thunk"),
                Cdouble, Tuple{Ptr{Ptr{Cvoid}}}, pointer(inner_ptrs))
        else
            return ccall((:_mlir_ciface_MyClass_method_thunk, THUNKS_LIBRARY_PATH),
                         Cdouble, (Ptr{Ptr{Cvoid}},), inner_ptrs)
        end
    end
end
```
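A hand-written C++ equivalent of such a thunk might look like the sketch below (hypothetical names and argument layout; the real thunks are emitted by the MLIR pipeline with the `_mlir_ciface_` prefix):

```cpp
// Hypothetical C-ABI thunk: unpacks a pointer array into (object, argument)
// and forwards to the virtual call, so the exported symbol has a plain C ABI.
struct MyClass {
    virtual ~MyClass() = default;
    virtual double method(double x) const = 0;
};

extern "C" double MyClass_method_thunk(void **args) {
    const MyClass *obj = static_cast<const MyClass *>(args[0]);
    const double   x   = *static_cast<const double *>(args[1]);
    return obj->method(x);
}
```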

The real payoff is that AD tools can now differentiate through C/C++ code. Since llvmcall makes the IR visible to Julia's compiler, Enzyme (or any LLVM-level AD) can follow the data flow straight through what used to be an opaque ccall wall. This opens the door to differentiating mixed Julia/C++ codebases without manually writing adjoints for every foreign function.

| Tier | Median (ns) | ns / iter | Note |
| --- | --- | --- | --- |
| pure_julia | 676,738.0 | 0.677 | Julia `@inbounds` loop with native add |
| bare_ccall_loop | 1,800,310.0 | 1.800 | Julia loop, bare `ccall` in a typed function |
| wrapper_ccall_loop | 2,025,930.0 | 2.026 | Julia loop calling `ccall` wrapper (no LTO) |
| lto_llvmcall_loop | 677,078.0 | 0.677 | Julia loop with LTO |
| whole_loop_in_cpp | 997,147.0 | 0.997 | Single `ccall` to C++ `accumulate_array` |
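For context, the `accumulate_array` kernel behind the whole_loop_in_cpp row plausibly has a shape like this (a hypothetical reconstruction; the actual benchmark source is not shown):

```cpp
// Hypothetical shape of the whole-loop-in-C++ benchmark kernel:
// one ccall crosses the boundary, then the loop runs entirely in C++.
extern "C" double accumulate_array(const double *xs, long n) {
    double acc = 0.0;
    for (long i = 0; i < n; ++i)
        acc += xs[i];
    return acc;
}
```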


To be extremely clear, there's nothing novel about this. This approach was demonstrated in the paper Scalable Automatic Differentiation of Multiple Parallel Paradigms through Compiler Augmentation, by @wsmoses, @vchuravy, and their collaborators, and has been discussed a few times already here on Discourse. As mentioned in Is there an equivalent to cross-language link time optimization via LLVM? - #18 by vchuravy (you even intervened in that thread), the main challenge is an infrastructural one: you need to make sure you use compatible versions of LLVM to compile every piece of code, so that all the bitcodes can be merged.

No, but a pkg that does it automatically through a TOML config is really novel.

Quoting from your README:

> This requires the user to have a local toolchain installed

which is the entire problem, and I'm missing how that's solved "automatically".