Bring Julia code to embedded hardware (ARM)


I’ve been making some progress there too, e.g. there’s now a compiled run-time library as a counterpart to some of the C functions:
It currently supports boxing, allocations, and some exception handling. More to come, if and when I have time to work on it.

Basically, it means reconfiguring the existing compiler to emit GPU-compatible LLVM IR (mainly through regular dispatch, though Cassette would be great for this once it generates better code), combined with a custom back-end and run-time to compile and execute that IR.
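To make the "GPU-compatible LLVM IR" idea concrete, here is a minimal sketch using Julia's standard `InteractiveUtils.code_llvm` to inspect the IR the regular compiler emits for a kernel-style function (no allocations, no exceptions). The function name `saxpy` is just an illustrative example, not part of CUDAnative; a real GPU/embedded pipeline would additionally strip run-time library calls and retarget the IR.

```julia
using InteractiveUtils

# A simple kernel-style function: no boxing, no allocations, no exceptions,
# so the emitted IR is the kind of code that ports to GPUs or embedded targets.
saxpy(a, x, y) = a * x + y

# Capture the LLVM IR that the stock Julia compiler generates for Float32 args.
buf = IOBuffer()
code_llvm(buf, saxpy, Tuple{Float32, Float32, Float32})
ir = String(take!(buf))
print(ir)
```

The printed IR consists of plain floating-point arithmetic with no calls into the Julia run-time, which is why the same approach of re-emitting it through a custom back-end can work for non-GPU targets too.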

Wrt. the lack of documentation on the CUDAnative internals, I’m considering creating a package that isolates and demonstrates the approach, and submitting that as a talk for the next JuliaCon.


Having an isolated package for customised generation of LLVM IR would be great! What do you mean by:

BTW, I guess that LLVM IR which is GPU-compatible can also be compiled for most embedded architectures…? Or do you add stuff to the IR that can only be handled by GPUs?


Currently, some code processed by Cassette results in extra indirection that doesn’t compile down to fast code. It’s getting closer, though: see here.
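For readers unfamiliar with Cassette, here is a minimal sketch of the `overdub` mechanism being discussed, assuming the Cassette.jl package is installed. `TraceCtx` and `square_plus_one` are hypothetical names for illustration; the point is that every call inside the overdubbed function is rerouted through the context, and it is this extra layer of indirection that can currently hurt the generated code.

```julia
using Cassette

# Define a (no-op) context; overdubbing through it recursively rewrites
# every call inside the target function.
Cassette.@context TraceCtx

square_plus_one(x) = x^2 + 1

# Semantically identical to calling square_plus_one(3) directly,
# but compiled through Cassette's recursive call-rewriting machinery.
result = Cassette.overdub(TraceCtx(), square_plus_one, 3)

# To see the effect on code quality, compare the IR of the plain call
# with the overdubbed one:
# @code_llvm square_plus_one(3)
# @code_llvm Cassette.overdub(TraceCtx(), square_plus_one, 3)
```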