To be extremely clear, there’s nothing novel about this. This approach was already demonstrated in the paper Scalable Automatic Differentiation of Multiple Parallel Paradigms through Compiler Augmentation, by @wsmoses, @vchuravy and their collaborators, and it has been discussed a few times here on Discourse. As mentioned in the thread "Is there an equivalent to cross-language link time optimization via LLVM?" (post #18 by vchuravy; you even took part in that thread), the main challenge is an infrastructural one: every piece of code must be compiled with compatible versions of LLVM so that all the bitcode files can be merged.