[ANN] RepliBuild.jl - A full C/C++ interop toolkit for tiered FFI generation

RepliBuild.jl is a declarative compiler bridge designed to automatically generate highly optimized FFI bindings directly from C and C++ source code.
Instead of relying on fragile header parsing, RepliBuild drives a local compilation pipeline and extracts structural DWARF debug metadata directly from the generated objects. It uses this pure structural data to automatically synthesize a tiered FFI boundary, generating the safest and fastest possible dispatch for every function.
Tiered FFI Generation

RepliBuild classifies and routes functions automatically based on their complexity:

Zero-Cost Abstractions (Base.llvmcall): When cross-language Link-Time Optimization (LTO) is enabled, the compiler emits LLVM bitcode (.bc). The generated Julia wrapper dynamically loads this bitcode at parse time, routing execution entirely through Base.llvmcall instead of standard ccall. This allows Julia’s JIT to seamlessly inline C/C++ code directly into Julia hot loops.

Tier 1, Safe ccall & GC Preservation: For standard FFI boundaries, it emits highly structured ccall bindings. It automatically generates idiomatic mutable struct wrappers equipped with GC-traced finalizers and Base.unsafe_convert methods to guarantee memory safety across the boundary.

Tier 2, MLIR / AOT Thunks: For complex C++ paradigms (like virtual method dispatch, packed structs, and large struct returns), execution is routed transparently through an MLIR JIT or statically compiled Ahead-Of-Time (AOT) thunks.

Declarative Workflow & Caching

No manual wrapper boilerplate is required. You define the project in a single TOML file:

replibuild.toml

[dependencies.lua]
type = "git"
url = "https://github.com/lua/lua.git"
tag = "v5.4.6"

[compile]
aot_thunks = true

[link]
enable_lto = true

[wrap]
language = "c"

On the Julia side:

# Resolves dependencies, builds, wraps, and globally caches the artifacts
RepliBuild.register("replibuild.toml") 

# Instantly loads the cached module and LLVM bitcode
Lua = RepliBuild.use("lua_wrapper") 

RepliBuild features aggressive project-level content hashing. Successive calls to RepliBuild.use() load the cached payload instantly, completely bypassing the compilation tax and dropping Time-To-First-Plot (TTFP) to near zero.
Current Status

The architecture is heavily modularized into independent C and C++ generators:

The C Pipeline is highly stable, defaults to LTO, and automatically falls back to Julia’s internal LLVM (Clang_unified_jll) to guarantee strict llvmcall compatibility.

The C++ Pipeline natively handles deep layout constraints like bitfields, unions, and template instantiation. (Note: Advanced C++ bridging currently requires a local LLVM 21 toolchain.)

You can find the repository and documentation in the Julia registry or the public repo.

Feedback, edge-case testing, and issue reports are highly welcome! This isn't a half-baked toy: I've worked on it for a year now, with prior experience with the MLIR JIT. Have fun.


Also started working on the rustc generator; I will commit the first skeleton of the generator soon. Rust DWARF is a lot cleaner, and I think RepliBuild.jl and Julia will eventually handle Rust better overall.

EDITED: The rustc integration went really smoothly. It will be a while before RepliBuild.jl can wrap native Rust without the export shims it can skip for C and C++; the Rust compiler is very different. v2.5.0 and up will focus on stabilizing C++ features, hardening the Rust integration, and patching anything the Julia community can poke holes in.


CCing @grasph and @peremato here, who have also done work on auto-generation of C++ bindings.

Wanna collaborate? You could send some details of what you're working with. If it's DWARF, note that I had to write a custom DWARF parser because Julia just doesn't have one… I'm interested: do you compile into a dialect, or merge IR anywhere in your pipeline? If I can help, just let me know.

Hello @obsidianjulua,

Very interesting work. The work Oliver is mentioning is WrapIt!. It has been used to wrap large C++ frameworks, like CERN ROOT (GitHub: grasph/wrapit), Geant4 (GitHub: JuliaHEP/Geant4.jl), and Pythia8 (GitHub: JuliaHEP/PYTHIA8.jl).

The Julia/C++ interface is performed using CxxWrap. The WrapIt! tool parses header files using LLVM/Clang to generate the wrapper code for CxxWrap.

I see that with your tool, we just provide a C++ project directory. How does the tool know how the C++ code is built and which products must be bound to Julia?

Philippe.

To answer your questions: RepliBuild does not try to guess or reverse-engineer an existing build system like CMake. Instead, it acts as the build orchestrator and metadata extractor simultaneously, driven entirely by a declarative replibuild.toml file.
Here is how it handles those two steps:

1. How does it know how the C++ code is built?
The user defines the build explicitly in the replibuild.toml. You point RepliBuild at local source files or a remote Git repository, and supply the necessary include paths, compiler flags, and definitions.

RepliBuild essentially acts as a localized compiler driver. It invokes clang (or rustc for Rust projects) on those sources directly. We bypass the host’s CMake/Make systems to guarantee we have full control over the compilation flags—specifically, ensuring that rich DWARF debug metadata is generated during the build.

2. How does it know which products must be bound to Julia?
This is where RepliBuild fundamentally diverges from the header-parsing approach.
Instead of parsing the C++ text headers via Clang ASTs to generate a CxxWrap C++ shim, RepliBuild analyzes the DWARF debug metadata of the compiled objects. The DWARF tree contains the absolute, ground-truth structural layout of the code exactly as the compiler synthesized it: including vtable offsets, struct padding, bitfields, and template instantiations.

RepliBuild reads this structural data and automatically generates pure-Julia ccall or llvmcall wrappers for the public ABI. If something is present in the DWARF metadata (and not explicitly filtered out in the TOML configuration), it gets mapped into the Julia TypeRegistry and bound automatically.

Because we have this deep structural data, we don’t need an intermediate C++ wrapping layer—the generated Julia code can marshal the memory layout perfectly and execute cross-language LTO directly through Julia’s JIT.
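The kind of layout ground truth that DWARF records can be illustrated outside RepliBuild. In this Python sketch, ctypes stands in for the compiler's layout engine, and the two structs are hypothetical examples (not from RepliBuild): the member offsets ctypes computes are exactly what DWARF's DW_AT_data_member_location attribute would report for the compiled object.

```python
import ctypes

# A naturally aligned struct: on a typical 64-bit ABI the compiler inserts
# 4 bytes of padding after `a` so the double `b` lands on an 8-byte boundary.
class Padded(ctypes.Structure):
    _fields_ = [("a", ctypes.c_int32), ("b", ctypes.c_double)]

# The same members with #pragma pack(1) semantics: no padding at all.
class Packed(ctypes.Structure):
    _pack_ = 1
    _fields_ = [("a", ctypes.c_int32), ("b", ctypes.c_double)]

# These offsets/sizes are the "ground truth" a header parser must re-derive,
# but which DWARF simply states outright.
print(Padded.b.offset, ctypes.sizeof(Padded))
print(Packed.b.offset, ctypes.sizeof(Packed))
```

On x86-64 this prints offsets 8 vs. 4 and sizes 16 vs. 12, i.e. the single `_pack_` line shifts every downstream member, which is precisely the class of mismatch a text-level header parser can silently miss.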
I would love to hear your thoughts on this DWARF-first approach compared to WrapIt!'s header parsing! Also, this lets you force another build system to enforce debug flags and have RepliBuild just wrap the binary. Things are different at the IR level: if you're ever going to use pure Julia, then all the data has to be exposed to Julia's JIT to inline all the code and erase the FFI boundary. If you're relying on just header parsing, then you'll never wrap from Julia and will always need to write code in another language, which is just architecturally wrong.

This does create executables, shared libs, and projects, and even registers wrappers, but I promise those are just supplementary features to secure the source. This is not a build system.

Looking at this closer, there really isn't a comparison: your architecture requires writing foreign source, and mine requires writing no source. This allows Enzyme to target other languages, and it gives Julia, which is a world-class JIT (there isn't a better JIT, period), the tools to act like one.

By extracting the DWARF metadata to understand the structure, and then feeding the raw .bc bitcode directly into Base.llvmcall, you aren’t just calling foreign code; you are mathematically fusing the foreign code into Julia’s LLVM context. You erase the FFI boundary entirely.

This isn't FFI; I'm calling it FFE, or foreign function execution.

RepliBuild Hub

Community-maintained registry of replibuild.toml configs for popular C/C++ libraries. Search, fetch, and build wrappers directly from Julia — no manual setup required.

Repository: github.com/obsidianjulua/RepliBuild-Hub

Usage

using RepliBuild

# Search available packages
RepliBuild.search("lua")

# Install and use — one call does everything
Lua = RepliBuild.use("lua")
Lua.luaL_newstate()

use() checks your local registry first. On a miss, it fetches the TOML from the hub, registers it locally, then runs the full pipeline: dependency resolution → compile → link → DWARF introspect → wrap → load.

Subsequent calls are cached — rebuild only happens when the TOML or source content changes.
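The content-based rebuild rule can be sketched generically. This Python toy is illustrative only (the names `use`, `content_hash`, and the in-memory `CACHE` are stand-ins, not RepliBuild's actual implementation): hashing the TOML plus all source contents means edits trigger rebuilds, while unchanged inputs hit the cache.

```python
import hashlib
import pathlib

CACHE = {}  # project content hash -> built artifact (stand-in for the global cache)

def content_hash(paths):
    """Hash the *contents* of every input file, so edits (not timestamps)
    are what trigger a rebuild."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(pathlib.Path(p).read_bytes())
    return h.hexdigest()

def use(toml_path, sources, build):
    key = content_hash([toml_path, *sources])
    if key not in CACHE:          # cache miss: run the full pipeline once
        CACHE[key] = build()
    return CACHE[key]             # cache hit: load instantly
```

A second `use()` call with identical inputs returns the cached artifact without invoking `build`; touching any source file changes the key and forces one rebuild.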

Hub Repository Structure

RepliBuild-Hub/
  index.toml                    # package listing (used by search())
  packages/
    lua/
      replibuild.toml
    sqlite/
      replibuild.toml
    cjson/
      replibuild.toml

New: DAG Diff — Structural Mismatch Detection Between C++ and Julia IR

Added a DAG-based structural diff algorithm that compares C++ layouts (DWARF ground truth) against Julia’s inferred alignment rules. This extends the existing per-function heuristics in DispatchLogic.jl — heuristics catch the obvious cases (packed returns, unions, STL), while DAGDiff catches what point-wise checks miss: transitive layout drift through by-value containment chains.

Algorithm:

  1. Build C++ graph from DWARF metadata (struct sizes, member offsets, containment edges)
  2. Build Julia graph by computing min(sizeof(field), 8) aligned layouts from the same members
  3. Parallel walk — match nodes structurally, record size and per-member offset mismatches
  4. Propagate mismatches transitively through by-value containment (if Inner is packed and Outer contains Inner by value, Outer is also mismatched)
  5. Flag functions that pass or return mismatched types by value
  6. Topo-sort (Kahn’s algorithm) all thunk sites for safe lowering order — types before the functions that depend on them
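The steps above can be condensed into a language-agnostic sketch. This Python toy uses simplified graph shapes and hypothetical types (`Inner`, `Outer`), not RepliBuild's data structures: it recomputes Julia-style `min(size, 8)` aligned offsets from the same members, flags types whose offsets diverge from the C++ (DWARF) offsets, propagates mismatches through by-value containment, and Kahn-topo-sorts so contained types come first.

```python
from collections import deque

# Each type: members are (name, size, cpp_offset); `contains` lists
# by-value member types. `Inner` is packed (b at offset 4, not 8).
CPP = {
    "Inner": {"members": [("a", 4, 0), ("b", 8, 4)], "contains": []},
    "Outer": {"members": [("inner", 12, 0)], "contains": ["Inner"]},
}

def julia_offsets(members):
    """Julia-style natural layout: align each field to min(size, 8)."""
    off, out = 0, []
    for name, size, _ in members:
        align = min(size, 8)
        off = (off + align - 1) // align * align
        out.append((name, size, off))
        off += size
    return out

def diff(cpp):
    # Direct mismatches: Julia-inferred offsets differ from DWARF offsets.
    mismatched = {t for t, info in cpp.items()
                  if [o for *_, o in julia_offsets(info["members"])]
                     != [o for *_, o in info["members"]]}
    # Propagate transitively through by-value containment (fixed point).
    changed = True
    while changed:
        changed = False
        for t, info in cpp.items():
            if t not in mismatched and any(c in mismatched for c in info["contains"]):
                mismatched.add(t)
                changed = True
    return mismatched

def topo(cpp):
    """Kahn's algorithm: emit contained types before their containers."""
    indeg = {t: len(info["contains"]) for t, info in cpp.items()}
    order, q = [], deque(t for t, d in indeg.items() if d == 0)
    while q:
        t = q.popleft()
        order.append(t)
        for u, info in cpp.items():
            if t in info["contains"]:
                indeg[u] -= 1
                if indeg[u] == 0:
                    q.append(u)
    return order

print(sorted(diff(CPP)))  # ['Inner', 'Outer']: Outer inherits the mismatch by value
print(topo(CPP))          # ['Inner', 'Outer']: types before their dependents
```

`Outer`'s own offsets match, yet it is still flagged, which is exactly the transitive drift that point-wise heuristics miss.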

Integration:

  • DAGDiff.needs_dag_thunk(symbol, result) queries the mismatch map — wrapper generators check this alongside existing heuristics, routing to MLIR thunks if either fires
  • Backward compatible: needs_dag_thunk(_, nothing) returns false when DAG diff is not computed
  • Wired into both C and C++ generator dispatch sites in GeneratorC.jl and GeneratorCpp.jl

Visualization:

  • export_dot(result, path) — Graphviz DOT export with mismatch color-coding (red = layout mismatch, orange = function needs thunk, gray = safe)
  • render_dot(result, path) — renders DOT to SVG/PNG/PDF via the dot command
  • Per-member offset annotations, containment edges, propagation edge coloring
  • Three view modes: :diff (both graphs overlaid), :cpp (DWARF only), :julia (inferred alignment only)
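The color-coding convention is easy to emulate. A minimal Python sketch (a toy stand-in for `export_dot`, with a hypothetical node-status model) emitting Graphviz DOT with the same red/orange/gray scheme:

```python
def to_dot(nodes):
    """nodes: name -> status in {'mismatch', 'thunk', 'safe'} (toy data model).
    Emits Graphviz DOT with red = layout mismatch, orange = needs thunk, gray = safe."""
    color = {"mismatch": "red", "thunk": "orange", "safe": "gray"}
    lines = ["digraph dag {"]
    for name, status in nodes.items():
        lines.append(f'  "{name}" [style=filled, fillcolor={color[status]}];')
    lines.append("}")
    return "\n".join(lines)

print(to_dot({"Inner": "mismatch", "compute_lu": "thunk", "Vec3": "safe"}))
```

The resulting text renders directly with the `dot` command, e.g. `dot -Tsvg diff.dot -o diff.svg`.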

TOML configuration:

[wrap]
dag = true   # exports DAG graphs to <project_root>/dag/

When enabled, the wrap stage automatically exports diff.svg, cpp.svg, julia.svg, and diff.dot to a dag/ folder in the project root.

Files:

  • src/IRGen/DAGDiff.jl — New module (~780 lines): graph types, builders, diff algorithm, topo-sort, query API, DOT visualization
  • src/Builder/ConfigurationManager.jl — Added dag::Bool to WrapConfig
  • src/Wrapper/Generator.jl — DAG diff computed before wrapper generation; graphs exported when dag=true
  • src/Wrapper/C/GeneratorC.jl, src/Wrapper/Cpp/GeneratorCpp.jl — Dispatch sites augmented with needs_dag_thunk check
  • test/dag_test/ — 178 tests covering graph building, structural diff, transitive propagation, topo-sort, query API, DOT export, and a rendered gallery of 7 scenarios

Stress test results (73 functions, test/stress_test/):

  • 25 mismatches detected: 14 types (vtable offsets on polymorphic classes, compound struct padding, bool alignment, STL internals), 5 functions routed to thunks (compute_lu, compute_qr, compute_eigen, solve_ode_rk4, solve_ode_adaptive)

  • Transitive propagation working: uniform_real_distribution<double> flagged solely because it contains param_type by value.


New operations

jlcs.marshal_arg — Julia-aligned struct → C-packed value

Defined in: src/mlir/JLCSOps.td

Lowering: src/mlir/impl/JLCSPasses.cpp (MarshalArgOpLowering)

Reads a Julia-aligned struct through a pointer and reassembles its fields into a C-packed LLVM struct value ready to pass to an external function.

Before v2.5.6, FunctionGen.jl emitted this as an inline sequence of arith.constant / llvm.getelementptr / llvm.load / llvm.insertvalue operations directly in the thunk IR. That sequence was correct but verbose, hard to pattern-match, and scattered the layout-mismatch logic across generated IR text rather than in a verifiable dialect op.

jlcs.marshal_arg lifts the whole pattern into a single named operation that the MLIR verifier can check, the pass pipeline can recognise, and the DOT visualiser can annotate.

MLIR syntax:


%packed = jlcs.marshal_arg %ptr
  { memberTypes = [i32, f64], juliaOffsets = [0 : i64, 8 : i64] }
  : (!llvm.ptr) -> !llvm.struct<packed (i32, f64)>

Lowering (MarshalArgOpLowering):

  1. Emit llvm.mlir.undef for the packed result type.

  2. For each member i:

  • Create an arith.constant for juliaOffsets[i].

  • llvm.getelementptr on srcPtr using i8 element type (byte-addressed).

  • llvm.load with alignment = 1 — unaligned because Julia’s padding may differ from C’s.

  • llvm.insertvalue at position [i] into the accumulator struct.

  3. Replace the op with the final struct value.
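The byte-addressed, unaligned member reads of this lowering can be mimicked in plain Python with the struct module. The offsets and member types below are illustrative assumptions matching the i32/f64 example: read each member at its Julia-aligned byte offset, then re-pack the values contiguously (`<` disables all padding, i.e. alignment 1).

```python
import struct

def marshal_arg(src: bytes, julia_offsets, fmts):
    """Toy analogue of the lowering: per-member byte-offset reads from the
    Julia-aligned buffer, reassembled into a C-packed value."""
    values = [struct.unpack_from("<" + f, src, off)[0]   # unaligned load at byte offset
              for off, f in zip(julia_offsets, fmts)]
    return struct.pack("<" + "".join(fmts), *values)     # packed: no padding

# Julia-aligned layout of (i32, f64): i32 at 0, 4 padding bytes, f64 at 8.
src = struct.pack("<i4xd", 7, 2.5)
packed = marshal_arg(src, [0, 8], ["i", "d"])
assert len(src) == 16 and len(packed) == 12   # the 4 padding bytes are gone
```

The 16-byte aligned buffer collapses to a 12-byte packed value, mirroring what `jlcs.marshal_arg` produces for an external callee expecting the packed struct.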

Effect on generated thunk IR: A 6-line-per-member inline block collapses to one line:

before (v2.5.5, 2 members)


%s_undef_1 = llvm.mlir.undef : !llvm.struct<packed (i32, f64)>
%off_1_1 = arith.constant 0 : i64
%field_ptr_raw_1_1 = llvm.getelementptr %val_ptr_1[%off_1_1] : (!llvm.ptr, i64) -> !llvm.ptr, i8
%field_val_1_1 = llvm.load %field_ptr_raw_1_1 {alignment = 1 : i64} : !llvm.ptr -> i32
%s_packed_1_1 = llvm.insertvalue %field_val_1_1, %s_undef_1[0] : !llvm.struct<packed (i32, f64)>
%off_1_2 = arith.constant 8 : i64
%field_ptr_raw_1_2 = llvm.getelementptr %val_ptr_1[%off_1_2] : (!llvm.ptr, i64) -> !llvm.ptr, i8
%field_val_1_2 = llvm.load %field_ptr_raw_1_2 {alignment = 1 : i64} : !llvm.ptr -> f64
%s_packed_1_2 = llvm.insertvalue %field_val_1_2, %s_packed_1_1[1] : !llvm.struct<packed (i32, f64)>

after (v2.5.6)

%packed_1 = jlcs.marshal_arg %val_ptr_1
  { memberTypes = [i32, f64], juliaOffsets = [0 : i64, 8 : i64] }
  : (!llvm.ptr) -> !llvm.struct<packed (i32, f64)>