So I use a custom C ABI with MLIR 21.1.8 for newer JIT features not present in MLIR.jl, and when using llvmcall with the saved JIT thunks I get a version mismatch in the IR. I tried sanitizing the unknown features out of the IR for Julia, but this is very difficult. I was wondering: if the wrapper doesn't trigger the MLIR JIT, I should be able to use Julia's internal LLVM completely for C sources, while templates and C++ callbacks still route through the MLIR 21 JIT. I get perfect results from inlining the C++ into the Julia JIT through llvmcall. C++ can hit Julia speed since it's Julia's JIT doing the work, just loading the data from the .bc and the shared library. This also opens the door for AD across language boundaries, since traditional ccall stopped the Julia JIT from following the IR.
If we compile C/C++ to LLVM bitcode (.bc) and hand it to Base.llvmcall, Julia's own LLVM JIT compiles it. The C/C++ IR becomes visible to Julia's optimization pipeline: inlining, SROA, vectorization, and AD all work as if the code were native Julia:
# Bitcode loaded once at module parse time
const LTO_IR_PATH = joinpath(@__DIR__, "mylib_lto.bc")
const LTO_IR = isfile(LTO_IR_PATH) ? read(LTO_IR_PATH) : UInt8[]
function add(a::Cint, b::Cint)::Cint
    if !isempty(LTO_IR)
        # Julia's JIT compiles this: full optimization, AD-transparent
        return Base.llvmcall((LTO_IR, "_Z3addii"), Cint, Tuple{Cint, Cint}, a, b)
    else
        # Fallback: traditional opaque FFI
        return ccall((:_Z3addii, LIBRARY_PATH), Cint, (Cint, Cint), a, b)
    end
end
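For reference, the bitcode module could be produced with a plain clang invocation. This is a hypothetical sketch (the file names mylib.cpp and mylib_lto.bc are assumptions matching the constants above), shelling out from Julia:

```julia
# Hypothetical source file; `int add(int, int)` mangles to `_Z3addii`
# under the Itanium C++ ABI, matching the symbol passed to llvmcall.
write("mylib.cpp", "int add(int a, int b) { return a + b; }\n")

# Emit LLVM bitcode (-emit-llvm) instead of a native object so that
# Base.llvmcall can parse the module; skip quietly if clang++ is absent.
if Sys.which("clang++") !== nothing
    run(`clang++ -O2 -c -emit-llvm mylib.cpp -o mylib_lto.bc`)
end
```

The key flag is -emit-llvm: a regular native .o cannot be parsed by llvmcall, only an LLVM bitcode module can.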
The (LTO_IR, "_Z3addii") form of llvmcall takes a bitcode module and a function name. Julia's LLVM parses the bitcode, finds the function, and inlines it directly into the calling code. The C++ literally runs at Julia speed because it is Julia's JIT doing the work.
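The same mechanism can be tried without any .bc file: llvmcall also accepts textual IR, which makes for a quick self-contained check that Julia's JIT really is compiling foreign IR (add_ir is a hypothetical stand-in for the bitcode-backed add above):

```julia
# Textual-IR twin of the bitcode call: the arguments arrive as %0 and %1,
# and Julia wraps the snippet in a function before compiling it.
add_ir(a::Cint, b::Cint) = Base.llvmcall(
    """
    %3 = add i32 %1, %0
    ret i32 %3
    """,
    Cint, Tuple{Cint, Cint}, a, b)

add_ir(Cint(2), Cint(3))  # behaves exactly like a native Julia add
```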
For complex C++ that needs the MLIR 21 JIT (virtual methods, template instantiations), we compile MLIR to LLVM IR, sanitize it, and assemble AOT thunk bitcode so that even those calls go through llvmcall:
# Complex C++ with virtual dispatch: still goes through Julia's JIT
function call_virtual_method(obj::Ptr{MyClass}, x::Cdouble)::Cdouble
    xref = Ref(x)
    inner_ptrs = Ptr{Cvoid}[Ptr{Cvoid}(obj),
                            Base.unsafe_convert(Ptr{Cvoid}, xref)]
    # Keep the Ref and the slot array alive while raw pointers are in use
    GC.@preserve xref inner_ptrs begin
        if !isempty(THUNKS_LTO_IR)
            # llvmcall does no cconvert: pass the raw pointer explicitly
            return Base.llvmcall(
                (THUNKS_LTO_IR, "_mlir_ciface_MyClass_method_thunk"),
                Cdouble, Tuple{Ptr{Ptr{Cvoid}}}, pointer(inner_ptrs))
        else
            return ccall((:_mlir_ciface_MyClass_method_thunk, THUNKS_LIBRARY_PATH),
                         Cdouble, (Ptr{Ptr{Cvoid}},), inner_ptrs)
        end
    end
end
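The _mlir_ciface thunks use a packed-pointer calling convention: a single void** whose slots point at the real arguments. A dependency-free sketch of that convention, with a hypothetical Julia function standing in for the thunk:

```julia
# Stand-in "thunk": unpacks two Cint arguments from a void** argv.
function unpack_and_add(argv::Ptr{Ptr{Cvoid}})
    a = unsafe_load(Ptr{Cint}(unsafe_load(argv, 1)))
    b = unsafe_load(Ptr{Cint}(unsafe_load(argv, 2)))
    return a + b
end

a, b = Ref(Cint(2)), Ref(Cint(40))
slots = Ptr{Cvoid}[Base.unsafe_convert(Ptr{Cvoid}, a),
                   Base.unsafe_convert(Ptr{Cvoid}, b)]
# The Refs and the slot array must outlive the raw pointers:
result = GC.@preserve a b slots unpack_and_add(pointer(slots))
```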
The real payoff is that AD tools can now differentiate through C/C++ code. Since llvmcall makes the IR visible to Julia's compiler, Enzyme (or any LLVM-level AD) can follow the data flow straight through what used to be an opaque ccall wall. This opens the door to differentiating mixed Julia/C++ codebases without manually writing adjoints for every foreign function.
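Short of wiring up Enzyme, the visibility claim can be spot-checked directly: if the foreign IR inlines, the optimized IR of a Julia caller shows the arithmetic itself rather than a call. A minimal sketch (mul_ir and square_plus_one are hypothetical names):

```julia
using InteractiveUtils  # for code_llvm

# Foreign IR the compiler can see through:
mul_ir(a::Float64, b::Float64) = Base.llvmcall(
    "%3 = fmul double %1, %0\nret double %3",
    Float64, Tuple{Float64, Float64}, a, b)

square_plus_one(x) = mul_ir(x, x) + 1.0

# The caller's optimized IR contains the fmul inline, which is exactly
# what an LLVM-level AD tool needs to follow the data flow.
ir = sprint(io -> code_llvm(io, square_plus_one, Tuple{Float64}))
square_plus_one(3.0)
```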
Tier               | Median (ns) | ns / iter | Note
pure_julia         | 676,738.0   | 0.677     | Julia @inbounds loop with native add
bare_ccall_loop    | 1,800,310.0 | 1.800     | Julia loop, bare ccall in a typed function
wrapper_ccall_loop | 2,025,930.0 | 2.026     | Julia loop calling the ccall wrapper (no LTO)
lto_llvmcall_loop  | 677,078.0   | 0.677     | Julia loop calling the llvmcall wrapper (LTO bitcode)
whole_loop_in_cpp  | 997,147.0   | 0.997     | Single ccall to C++ accumulate_array