Using Base.llvmcall for cross-language LTO

So I use a custom C ABI with MLIR 21.1.8 for newer JIT features not present in MLIR.jl. When using llvmcall for the saved JIT thunks, I got a version mismatch in the IR. I tried sanitizing the unknown features out of the IR for Julia, but that is very difficult, and I realized that as long as the wrapper doesn't trigger the MLIR JIT, I can use Julia's internal LLVM entirely for C sources, while templates and C++ callbacks still route through the MLIR 21 JIT. I get perfect results from inlining the C++ into the Julia JIT through llvmcall. The C++ can hit Julia speed since it's Julia's JIT doing the work, just consuming the data from the .bc and the shared library. This also opens the door to AD across language boundaries, since a traditional ccall stops the Julia JIT from following the IR.

For the C++ function int add(int a, int b):


If we compile C/C++ to LLVM bitcode (.bc) and hand it to Base.llvmcall, Julia's own LLVM JIT compiles it. The C/C++ IR becomes visible to Julia's optimization pipeline: inlining, SROA, vectorization, and AD all work as if the code were native Julia:

Bitcode loaded once at module parse time

const LTO_IR_PATH = joinpath(@__DIR__, "mylib_lto.bc")
const LTO_IR = isfile(LTO_IR_PATH) ? read(LTO_IR_PATH) : UInt8[]

function add(a::Cint, b::Cint)::Cint
    if !isempty(LTO_IR)
        # Julia's JIT compiles this: full optimization, AD-transparent
        return Base.llvmcall((LTO_IR, "_Z3addii"), Cint, Tuple{Cint, Cint}, a, b)
    else
        # Fallback: traditional opaque FFI through the shared library
        return ccall((:_Z3addii, LIBRARY_PATH), Cint, (Cint, Cint), a, b)
    end
end
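As a defensive touch (not in the original post), the bytes can be sanity-checked before reaching llvmcall: LLVM bitcode files begin with the magic bytes 0x42 0x43 0xC0 0xDE ("BC\xc0\xde"). A minimal sketch:

```julia
# Hypothetical guard: check the LLVM bitcode magic "BC\xc0\xde" before
# handing the buffer to Base.llvmcall, so a truncated or wrong file fails early.
function looks_like_bitcode(bytes::Vector{UInt8})
    length(bytes) >= 4 || return false
    bytes[1] == 0x42 && bytes[2] == 0x43 && bytes[3] == 0xc0 && bytes[4] == 0xde
end
```

A real loader would also accept LLVM's bitcode-wrapper format, but this catches the common "read the wrong file" mistake.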

The (LTO_IR, "_Z3addii") form of llvmcall takes a bitcode module and a mangled function name. Julia's LLVM parses the bitcode, finds the function, and inlines it directly into the calling code. The C++ literally runs at Julia speed because it is Julia's JIT doing the work.
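For reference, producing the .bc is a single clang invocation. A hedged sketch, assuming a local clang++ on PATH whose LLVM major version matches Julia's (the compatibility constraint that comes up later in this thread):

```julia
# Hypothetical helper: build the bitcode module that llvmcall consumes.
# Assumes `clang++` is installed and version-matched to Julia's LLVM.
function bitcode_cmd(src::AbstractString, out::AbstractString)
    `clang++ -O2 -fPIC -emit-llvm -c $src -o $out`
end

bitcode_cmd("mylib.cpp", "mylib_lto.bc")
```

Running the returned command with `run` produces the bitcode file that the `LTO_IR` constant above reads in.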

For complex C++ that needs the MLIR 21 JIT (virtual methods, template instantiations), we lower MLIR to LLVM IR, sanitize it, and assemble AOT thunk bitcode, so even those calls go through llvmcall:

Complex C++ with virtual dispatch still goes through Julia's JIT:

function call_virtual_method(obj::Ptr{MyClass}, x::Cdouble)::Cdouble
    x_ref = Ref(x)
    GC.@preserve x_ref begin
        # Pack the receiver and argument pointers the way the thunk expects
        inner_ptrs = Ptr{Cvoid}[Ptr{Cvoid}(obj),
                                Ptr{Cvoid}(Base.unsafe_convert(Ptr{Cdouble}, x_ref))]
        if !isempty(THUNKS_LTO_IR)
            return GC.@preserve inner_ptrs Base.llvmcall(
                (THUNKS_LTO_IR, "_mlir_ciface_MyClass_method_thunk"),
                Cdouble, Tuple{Ptr{Ptr{Cvoid}}}, pointer(inner_ptrs))
        else
            return ccall((:_mlir_ciface_MyClass_method_thunk, THUNKS_LIBRARY_PATH),
                         Cdouble, (Ptr{Ptr{Cvoid}},), inner_ptrs)
        end
    end
end

The real payoff is that AD tools can now differentiate through C/C++ code. Since llvmcall makes the IR visible to Julia's compiler, Enzyme (or any LLVM-level AD) can follow the data flow straight through what used to be an opaque ccall wall. This opens the door to differentiating mixed Julia/C++ codebases without manually writing adjoints for every foreign function.
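The visibility claim is easy to check with the other (textual IR) form of Base.llvmcall, which goes through the same path: the foreign IR lands in the caller's LLVM module, so downstream Julia code inlines straight through it. A small self-contained demo, using plain Base with no bitcode file:

```julia
using InteractiveUtils  # for code_llvm

# A "foreign" function written directly as LLVM IR; Julia's JIT compiles it.
llvm_square(x::Float64) = Base.llvmcall("""
    %2 = fmul double %0, %0
    ret double %2""", Float64, Tuple{Float64}, x)

# A pure-Julia caller: the IR above is inlined into this function's module,
# so the optimizer (and any LLVM-level AD) sees the fmul, not an opaque call.
composed(x::Float64) = llvm_square(x) + 1.0
```

`sprint(code_llvm, composed, Tuple{Float64})` shows the fmul inlined directly into `composed`, which is exactly what an LLVM-level AD needs in order to follow the data flow.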

| Tier | Median (ns) | ns / iter | Note |
|---|---|---|---|
| pure_julia | 676,738.0 | 0.677 | Julia @inbounds loop with native add |
| bare_ccall_loop | 1,800,310.0 | 1.800 | Julia loop, bare ccall in a typed function |
| wrapper_ccall_loop | 2,025,930.0 | 2.026 | Julia loop calling ccall wrapper (no LTO) |
| lto_llvmcall_loop | 677,078.0 | 0.677 | Julia loop with LTO llvmcall |
| whole_loop_in_cpp | 997,147.0 | 0.997 | Single ccall to C++ accumulate_array |


To be extremely clear, there's nothing novel about this. This approach was demonstrated in the paper Scalable Automatic Differentiation of Multiple Parallel Paradigms through Compiler Augmentation, by @wsmoses, @vchuravy, and their collaborators, and has been discussed a few times already here on Discourse. As mentioned in Is there an equivalent to cross-language link time optimization via LLVM? - #18 by vchuravy (you even intervened in that thread), the main challenge is an infrastructural one: you need to make sure you use compatible versions of LLVM to compile every piece of code, to be able to merge all bitcodes.

No, but a package that does it automatically through a TOML config is really novel.

Quoting from your README:

> This requires the user to have a local toolchain installed

which is the entire problem, and I'm missing how that's solved "automatically".

Well, if you install the required toolchains, then yes, the toolchains do it automatically, hence needing the toolchains. Regular ccall wrapping still works regardless, through Julia's internal LLVM and Clang.jl, with full_assert fallback tools. But if you want full llvmcall and advanced C++ features in the wrappers, like Boost or Eigen, then you need the extra toolchain and dialect for MLIR. RepliBuild handles the toolchains internally, whether you have the extra toolchains or just standard Julia packages; its architecture has very defensive fallbacks. The extra toolchains are simply LLVM and MLIR, at any major-21 version, from whatever package manager you use. The dialect has a built-in CMake script, since it uses TableGen to build the dialect for the MLIR JIT, which generates the callback thunks and AOT thunks for Julia. These are advanced features not handled by Julia's LLVM or the corresponding .jl C-ABI wrapper versions.

[project]
name = "lua_wrapper"
version = "0.1.0"
root = "."

[dependencies]
# Pin a specific release tag for stability
[dependencies.lua]
type = "git"
url = "https://github.com/lua/lua.git"
tag = "v5.4.6"
# Build the library normally by excluding the amalgamator and the standalone executables
exclude = ["onelua.c", "lua.c", "luac.c", "ltests.c"]

[compile]
flags = ["-O2", "-fPIC"]

[link]
enable_lto = false
optimization_level = "2"

RepliBuild uses a TOML config to build and wrap sources. This is the minimal TOML needed to pull and wrap Lua from source.
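The config is plain TOML, so its shape is easy to inspect with Julia's stdlib parser. A sketch of the format only, not RepliBuild's actual loader:

```julia
using TOML

# Parse a fragment of the config above to show the resulting structure.
config = TOML.parse("""
[project]
name = "lua_wrapper"
version = "0.1.0"

[compile]
flags = ["-O2", "-fPIC"]
""")

config["project"]["name"]    # "lua_wrapper"
config["compile"]["flags"]   # ["-O2", "-fPIC"]
```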

RepliBuild.build("replibuild.toml")

RepliBuild.wrap("replibuild.toml")

include("julia/LuaWrapper.jl")

using .LuaWrapper

L = LuaWrapper.luaL_newstate()
LuaWrapper.luaL_openlibs(L)
LuaWrapper.lua_version(L)

No manual edits at all, full bindings. That's what I mean by automatic, and since this is plain C source, no extra toolchains are needed: pure Julia out of the box.