Is there an equivalent to cross-language link-time optimization via LLVM?

I’ve been reading about this between Rust and C/Fortran/etc.; a simple example is the constant folding you get from inlining an integer-range summation from Rust into a C program with literal integer inputs. Obviously we don’t routinely deal with linking in Julia, but I’m wondering if there’s something between a ccall and an llvmcall where a C library function is called and its LLVM IR is optimized together with the Julia code.
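For context, a minimal sketch of the two endpoints we have today (the function names here are mine): ccall treats the callee as opaque object code, while Base.llvmcall hands the optimizer IR it can fold through; the hypothetical middle ground would give a C library's functions the llvmcall treatment.

```julia
# The two existing endpoints: `abs` from libc is opaque object code to
# Julia's optimizer, while the llvmcall body below is IR it can see
# through and constant-fold for literal inputs.
c_abs(x::Cint) = ccall(:abs, Cint, (Cint,), x)

add1(x::Int32) = Base.llvmcall("""
    %2 = add i32 %0, 1
    ret i32 %2""", Int32, Tuple{Int32}, x)

c_abs(Cint(-3))   # an opaque call at runtime
add1(Int32(41))   # visible IR: folds to 42 when inlined with a literal
```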

Discussion of other contrasts and parallels (thin vs. fat LTO versus the whole-program optimization that already happens inside a Julia process), and of when cross-language optimization is actually beneficial (constant folding is rare; inlining often doesn’t help), is welcome; there’s a lot to learn.
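For the C-side baseline, this is roughly what thin LTO looks like with clang, driven from Julia here just to stay in one language; it assumes clang and an LTO-capable linker (lld or gold) on PATH, and sum.c/main.c are hypothetical:

```julia
# Classic thin LTO across translation units in C: the object files carry
# bitcode summaries, and the linker does the cross-TU inlining and
# constant folding. Fat LTO (-flto) merges all IR into one module instead.
run(`clang -flto=thin -O2 -c sum.c -o sum.o`)
run(`clang -flto=thin -O2 -c main.c -o main.o`)
run(`clang -flto=thin -O2 sum.o main.o -o main`)
```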


Sounds like a nice idea for a package.

IIRC Enzyme has some version of this. But a package could probably do this without the whole Enzyme autodiff machinery.


Say I have compiled a dynamic library with juliac. Can I treat it as a static library for whole-program optimization when calling it from C++?

Dynamic libraries are not static libraries. Nothing to do with juliac.

Furthermore, for LTO the compiler needs more data (LLVM IR, I guess) than just the object code.
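One way to check that concretely: a plain shared library carries only machine code, so there’s nothing for LTO to chew on unless the compiler also embedded IR. A small sketch, assuming llvm-readelf is installed and an ELF target (the section name differs on Mach-O):

```julia
# Returns true if the library carries an embedded-bitcode section
# (.llvmbc is what clang -fembed-bitcode emits on ELF).
has_bitcode(path::AbstractString) =
    occursin(".llvmbc", read(`llvm-readelf -S $path`, String))

has_bitcode("libfoo.so")  # libfoo.so is a hypothetical library name
```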

Is it that simple? Doesn’t this require a lot of internals support to cooperate properly with the existing system?

I think this would be a huge feature. Compile your shared library with clang -fembed-bitcode=all and have a ccall variant that permits cross-language inlining. Or maybe add a flag for Libdl.dlopen to look for bitcode sections in the library.
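A sketch of what that dlopen-flag idea could look like: load the library normally, and separately pull out the bitcode section for the optimizer. Everything below is hypothetical glue, assuming llvm-objcopy is available and an ELF library built with -fembed-bitcode=all:

```julia
using Libdl

# Hypothetical sketch: dlopen for the usual symbols, plus a side channel
# extracting the embedded bitcode that a cross-language-inlining ccall
# variant would hand to LLVM. On ELF the section is named .llvmbc.
function load_with_bitcode(path::AbstractString)
    handle = dlopen(path)                     # ordinary dynamic loading
    bcfile, scratch = tempname(), tempname()
    run(`llvm-objcopy --dump-section=.llvmbc=$bcfile $path $scratch`)
    return handle, read(bcfile)               # handle + raw bitcode bytes
end

handle, bc = load_with_bitcode("libfoo.so")   # libfoo.so is hypothetical
```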

For starters, we could compile most of the Julia runtime with that and have most cheap runtime calls inlined.
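To name one such cheap runtime call (the wrapper below is just an illustration): jl_egal is the small C routine implementing generic === in the runtime, and today every call to it crosses an opaque ccall boundary.

```julia
# A bitcode-carrying runtime build would let LLVM inline this like any
# other Julia code instead of emitting an opaque call.
is_egal(a, b) = ccall(:jl_egal, Cint, (Any, Any), a, b) != 0
```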

For seconds, this would make a lot of intrinsics programming much nicer – basically because immintrin.h is well documented, while LLVM processor intrinsics are an underdocumented mess. So this would enable us to just write the kernel in C (ever tried to use the juicy aesenc instructions from Julia?).
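To make that concrete, here is roughly how you reach aesenc from Julia today, via the raw LLVM intrinsic in llvmcall’s (declarations, body) form; a sketch for x86-64 with AES-NI, where the intrinsic spelling and the <2 x i64> typing are exactly the underdocumented parts:

```julia
# One round of AES encryption on a 128-bit state held as a SIMD vector.
const V2I64 = NTuple{2, VecElement{Int64}}

aesenc(state::V2I64, key::V2I64) = Base.llvmcall(
    ("declare <2 x i64> @llvm.x86.aesni.aesenc(<2 x i64>, <2 x i64>)",
     """%3 = call <2 x i64> @llvm.x86.aesni.aesenc(<2 x i64> %0, <2 x i64> %1)
        ret <2 x i64> %3"""),
    V2I64, Tuple{V2I64, V2I64}, state, key)
```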


This was an example of attempting to use Enzyme in Julia to autodiff code generated by Numba in Python: Wrong gradient of a `cfunc` decorated Numba function · Issue #2505 · EnzymeAD/Enzyme.jl · GitHub. But in general, once you have LLVM bitcode, the front-end language it was originally written in doesn’t matter anymore, and you can do all the optimisations you want.
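For reference, the Julia side of that setup is ordinary Enzyme.jl usage; a minimal sketch on plain Julia code (in the issue, the differentiated function additionally wraps a Numba-generated cfunc pointer, which is the part that misbehaved):

```julia
using Enzyme

# Reverse-mode gradient of a scalar function. Enzyme operates on the
# LLVM IR of `square`, not on its Julia surface syntax.
square(x) = x * x
grad = autodiff(Reverse, square, Active, Active(3.0))[1][1]  # == 6.0
```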


I’ve never done autodifferentiation, let alone Enzyme, so noob question: is that issue implying that Enzyme works, or is intended to work, on LLVM IR across ccall boundaries? What went wrong there exactly? I couldn’t tell how the working and failing functions differed after that Numba name-unmangling.

Yes, Enzyme works at the LLVM level, which means it couldn’t care less about the frontend language: as long as it receives LLVM IR/bitcode, it can do anything. And Enzyme is “just” an optimisation pass, so the same applies to other passes.

If you want a more positive example:


It has been suggested a few times in the past that BinaryBuilder enable -fembed-bitcode=all by default, but that’s a significant infrastructure change that we never made in practice; in theory, though, it should be doable.


It’s possible to do this manually in both directions; I have done a fair amount of experiments, but it involved a lot of manual interaction with Clang and LLD. The biggest problem I ran into was caching and cache invalidation (of Julia code, in the Julia → C direction, which I investigated the most). That might be better now with things like CompilerCaching.jl. Various other problems included mapping between architecture tuples (there are some weird permutations of macOS/darwin when doing LTO) and managing the bitcode in memory/on disk.
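For anyone curious, the C-side half of that manual workflow looks roughly like this at the command level; file names are hypothetical, and julia_side.bc stands in for bitcode dumped from Julia, which is the part that needed all the manual Clang/LLD interaction:

```julia
# Manual cross-language LTO, one direction: merge C bitcode with
# Julia-emitted bitcode, optimize the combined module, and emit a
# shared library.
run(`clang -O2 -emit-llvm -c kernel.c -o kernel.bc`)
run(`llvm-link kernel.bc julia_side.bc -o combined.bc`)
run(`opt -O2 combined.bc -o combined.opt.bc`)   # cross-language inlining happens here
run(`clang -shared -fPIC combined.opt.bc -o libcombined.so`)
```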

ClangCompiler.jl is/was an attempt at getting some of this working less manually, but I’m not sure of the status of that project now.

BB2 with the local compiler shards would also be a massive help here; managing the toolchain was a huge headache.
