Bring Julia code to embedded hardware (ARM)


#21

I’ve been making some progress there too, e.g. there’s now a compiled run-time library as a counterpart to some of the C functions: https://github.com/JuliaGPU/CUDAnative.jl/blob/95fbf9356eaa6c3da3c3321ff35c3ffa5d41f77a/src/device/runtime_intrinsics.jl
It currently supports boxing, allocations, and some exception handling. More to come, if and when I have time to work on it.
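
To see why such a library is needed: on the CPU, things like allocation lower to calls into Julia’s C runtime, which of course isn’t available on a GPU. You can spot those calls with plain upstream Julia (nothing CUDAnative-specific here):

```julia
using InteractiveUtils

# Allocating heap memory lowers to a call into Julia's C runtime, visible
# in the generated IR on the CPU. On a GPU those C entry points don't
# exist, so the runtime library linked above provides Julia-side
# replacements that themselves compile to device code.
allocate(x) = [x, x]

code_llvm(allocate, Tuple{Int64})
```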

Basically, the approach is to reconfigure the existing compiler to emit GPU-compatible LLVM IR (mainly through regular dispatch, though Cassette would be great for this once it generates some better code), combined with a custom back-end and run-time to compile and execute that IR.
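
As a rough illustration of the dispatch part (the names below are mine, not CUDAnative’s actual internals): device-specific methods shadow the host implementations, and ordinary method dispatch picks the right one at compile time:

```julia
# Illustrative sketch only; CUDAnative organizes this differently.
# One generic function, one method per target; regular dispatch then
# selects the target-appropriate lowering.
abstract type Target end
struct Host <: Target end
struct CUDA <: Target end

device_log(::Host, x::Float32) = log(x)  # CPU: plain libm

# On the device, the same call would instead forward to a libdevice
# intrinsic via the llvmcall calling convention, roughly:
#   device_log(::CUDA, x::Float32) =
#       ccall("extern __nv_logf", llvmcall, Cfloat, (Cfloat,), x)
# (that only compiles when targeting NVPTX, so it's left as a comment)

device_log(Host(), 2.0f0)
```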

Wrt. the lack of documentation on the CUDAnative internals, I’m considering creating a package that isolates and demonstrates the approach, and submitting that as a talk for the next JuliaCon.


#22

Having an isolated package for customised generation of LLVM IR would be great! What do you mean by:

> Cassette would be great for this once it generates some better code

BTW, I guess that LLVM IR which is GPU-compatible can also be compiled for most embedded architectures…? Or do you add stuff to the IR which can only be handled by GPUs?
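
For example, I’d naively expect something like this to work (a rough sketch; it assumes LLVM’s `llc` is on the PATH and that the function is simple enough to avoid runtime calls):

```julia
using InteractiveUtils

# Dump the full LLVM module Julia generates for a simple, allocation-free
# function, then hand it to LLVM's static compiler targeting 32-bit ARM.
saxpy(a, x, y) = a * x + y

ir = sprint(io -> code_llvm(io, saxpy, Tuple{Float32,Float32,Float32};
                            dump_module = true))
write("saxpy.ll", ir)

# llc is LLVM's IR-to-assembly compiler; the triple selects the ARM target.
run(`llc -mtriple=armv7-none-linux-gnueabihf -o saxpy.s saxpy.ll`)
```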


#23

Currently, passing code through Cassette introduces extra indirection that doesn’t compile down to fast code yet. It’s getting closer, though: see here.
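
Roughly what’s happening (a minimal, self-contained illustration using Cassette’s public API):

```julia
using Cassette

# Define a do-nothing context; overdub then recursively re-routes every
# call inside `f` through it. That extra indirection is what the compiler
# currently struggles to eliminate.
Cassette.@context NoOpCtx

f(x) = x + one(x)

Cassette.overdub(NoOpCtx(), f, 1.0)  # same result as f(1.0)

# Comparing the generated IR makes the overhead visible:
# @code_llvm f(1.0)
# @code_llvm Cassette.overdub(NoOpCtx(), f, 1.0)
```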