Basically, reconfiguring the existing compiler to emit GPU-compatible LLVM IR (mainly through regular dispatch, though Cassette would be great for this once it generates better code), in combination with a custom back-end and runtime to compile and handle that IR.
Wrt. the lack of documentation on the CUDAnative internals, I’m considering creating a package that isolates and demonstrates the approach, and submitting that for a talk at the next JuliaCon.
Having an isolated package for customised generation of LLVM IR would be great! What do you mean by:
BTW, I guess that LLVM IR which is GPU-compatible can also be compiled for most embedded architectures…? Or do you add stuff to the IR which can only be handled by GPUs?
Can’t we install Julia on a Linux distribution (e.g. Linaro) on an ARM processor, and then run the code?
That’s how a lot of embedded programming is done. This is the straightforward way.
The other way is to use an approach similar to what MATLAB Embedded Coder does and convert the code to C/C++, which is essentially similar to this:
The straightforward way I already mentioned in my original post:
As @Tamas_Papp correctly stated, I also detailed in the OP when this might be an option, and when it’s not:
And, of course, if there isn’t even any Linux OS around, this route is not an option. What’s the status, activity, and roadmap of llvm-cbe? Could this be a viable option for bringing Julia to embedded devices?
I feel like this could be an expedient way to go:
Just replace GPU by anything else;
reconfiguring the existing compiler to emit [ANYTARGET]-compatible LLVM IR
which should become easier by said clean, isolated “demo package”. @maleadt, do you plan to keep non-GPU-specific parts of this package separated to allow others to reuse them to reconfigure the compiler for their specific targets?
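As a starting point for such a demo package, the LLVM IR that the stock compiler emits for a given method can already be inspected from a normal session via `code_llvm` (from the `InteractiveUtils` standard library); a retargeting package would hook into the same compilation pipeline rather than just printing it. A minimal sketch:

```julia
# Inspect the LLVM IR the Julia compiler generates for a method signature.
using InteractiveUtils

f(x, y) = muladd(x, y, 1.0)

# Prints the (optimised) IR for f(::Float64, ::Float64) to stdout
code_llvm(stdout, f, Tuple{Float64,Float64})
```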
I am not sure if you have noticed, but the latest Julia 1.2 has Linux binaries for 32-bit ARM.
I was interested in trying to build some simple programs using Julia, and making some examples for others to use as templates (I haven’t really seen any).
I have an ARM Cortex-A9 with Linux installed on it, and also an ARM Cortex-M3 running FreeRTOS (installing Linux is also possible).
I want to experiment with:

- building executables
- making lib/dll libraries
- embedding Julia in already-written C code (as you know, a lot of the drivers and libraries are written in C, so we could write the main computational function in Julia and call it from C)
- building on desktop (Windows, etc.) for ARM, i.e. cross-compilation
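On the embedding point, the Julia side of that can be sketched with `@cfunction`, which turns a Julia method into a C-callable function pointer (the kernel below is just an illustrative placeholder):

```julia
# Write the main computational function in Julia...
mykernel(x::Cdouble, n::Cint) = x * n + 1.0

# ...and expose it as a function pointer that existing C driver code could
# invoke as `double (*kernel)(double, int)`.
const kernel_ptr = @cfunction(mykernel, Cdouble, (Cdouble, Cint))

# Calling through the pointer from Julia, the same way C would:
ccall(kernel_ptr, Cdouble, (Cdouble, Cint), 2.0, 3)   # → 7.0
```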
I was wondering if anyone is interested to make a repository and try some examples for embedded programming. Let me know if you want to be involved.
I’m working on something that could help target embedded hardware (though that’s not its main goal). While I could support all CPUs, would it be a limitation to only support 32-bit ones?
I’m interested in anything involving standalone compilation and cross compilation. My main immediate interest is WebAssembly (that’s a cross-compiled 32-bit platform, so there are lots of similarities).
I had actually already created a Julia embedded GitHub group a while ago here: https://github.com/juliaembedded. I was starting with just simulating embedded arithmetic, and was going to be building up a type system to model fixed-point and arbitrary floating-point arithmetic with rounding mode selection (including stochastic rounding). The goal is to allow an algorithm written in Julia to be simulated using the reduced precision that would be in embedded devices so designers could see the effect on performance.
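To make the fixed-point simulation idea concrete, here is a minimal sketch of what such a type could look like (the type and function names are illustrative, not taken from the JuliaEmbedded repositories, and rounding-mode selection is omitted):

```julia
# Minimal fixed-point type for simulating reduced-precision arithmetic.
struct Fixed{F}              # F = number of fractional bits
    raw::Int32
end

# Construct from a Float64 by scaling and rounding to the nearest step
Fixed{F}(x::Float64) where {F} = Fixed{F}(Int32(round(x * 2^F)))

tofloat(x::Fixed{F}) where {F} = x.raw / 2^F

Base.:+(a::Fixed{F}, b::Fixed{F}) where {F} = Fixed{F}(a.raw + b.raw)

# Multiplication widens to Int64, then shifts back; `>>` floors, so a real
# implementation would insert an explicit rounding step here.
Base.:*(a::Fixed{F}, b::Fixed{F}) where {F} =
    Fixed{F}(Int32((Int64(a.raw) * b.raw) >> F))

tofloat(Fixed{8}(1.5) * Fixed{8}(0.25))   # → 0.375
```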
The next project was going to be building an embedded compiler for Julia, that will take a Julia function and transform it to runtime-independent code that can be fed into the LLVM backend for the desired target (then the LLVM to C backend could be used to generate direct C code if desired). My main goal was to create static libraries that would contain the compiled Julia functions, and then those could be linked into existing C applications.
That does have its difficulties though, since you have to extract the memory allocations from the compiler and ensure that they are either done with the provided embedded functions, or that all allocations become static (which is actually the preferred way of handling memory in critical embedded systems). I haven’t had a chance to dig around in the generated code yet, but I think this should be doable in a similar way to how the GPU group accomplished it.
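On the desktop side, `@allocated` gives a cheap first check of whether a candidate function heap-allocates at all (a sketch; it only measures heap allocation of the call itself, not the compiler’s internal allocations):

```julia
# Candidate embedded kernel: pure scalar arithmetic, so no heap allocation.
saxpy(a, x, y) = a * x + y

saxpy(2.0, 3.0, 4.0)                     # warm up so compilation is excluded
bytes = @allocated saxpy(2.0, 3.0, 4.0)  # should report 0 bytes
```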
Are you aware of Juliaberry (Julia on Raspberry Pi)?
I played with it a little bit. I could get very fast gpio bit banging speed (about 100 ns per ccall to libpigpio, so it could generate a 5 MHz clock!) but with occasional interruptions by the OS I presume.
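For context, that kind of bit banging via `ccall` looks roughly like the following (a sketch: it assumes pigpio is installed as `libpigpio.so` on the Pi, must run with root privileges, and GPIO 17 is an arbitrary pin choice):

```julia
const libpigpio = "libpigpio.so"   # assumed library name on the system path

# Initialise the library; gpioInitialise returns a negative value on failure
ccall((:gpioInitialise, libpigpio), Cint, ()) >= 0 ||
    error("pigpio initialisation failed")

const PI_OUTPUT = Cuint(1)
ccall((:gpioSetMode, libpigpio), Cint, (Cuint, Cuint), 17, PI_OUTPUT)

# Toggle the pin as fast as the ccall overhead allows (~100 ns per call)
for _ in 1:1_000_000
    ccall((:gpioWrite, libpigpio), Cint, (Cuint, Cuint), 17, 1)
    ccall((:gpioWrite, libpigpio), Cint, (Cuint, Cuint), 17, 0)
end

ccall((:gpioTerminate, libpigpio), Cvoid, ())
```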
I was thinking of doing some embedded systems prototyping in which some control algorithm would be tested and results quickly plotted without having to move data to some other system, but I had trouble with compilation times being really excessive or not finishing at all because of insufficient memory.
I’m not actively messing with it now. If I tried again I would probably divide the problem - let the raspberry pi do the pseudo-real-time stuff in Julia, and let it serve data over Ethernet to a PC doing plotting and analysis, also in Julia.
By runtime-independent, do you mean architecture-independent? So removing all arch-specific stuff from the representation? That’s exactly the idea behind the approach summarised in this post, just with the difference that there, the idea was to target some specific platform(s), instead of fully platform-agnostic code (which could then not utilise any of the platform’s features, such as SIMD?).
BTW, like the logo (JuliaEmbedded · GitHub) and the idea of validating fixed-point versions of Julia code/functions.
That is definitely useful. I think the goal should be to do the simulations and build the program on the desktop, and only run it on the target processor. A lot of processors don’t have enough space, RAM, etc. for running Julia and its compiler.
For using ARM drivers in Julia, I think we can start by writing Julia wrappers for the ARM CMSIS-Driver API, which is part of ARM CMSIS and is supported by a lot of hardware.
It has APIs for many peripherals, as stated in the documentation:
The CMSIS-Driver specification is a software API that describes peripheral driver interfaces for middleware stacks and user applications. The CMSIS-Driver API is designed to be generic and independent of a specific RTOS making it reusable across a wide range of supported microcontroller devices.
The following CMSIS-Driver API groups are defined:
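As a sketch of what such a wrapper could look like: CMSIS-Driver exposes each peripheral as a C struct of function pointers (e.g. `ARM_DRIVER_USART`), so one approach is to mirror that struct in Julia and `ccall` through the pointers. The field names below follow the CMSIS-Driver USART specification; the wrapper function name is hypothetical, and obtaining the driver instance from the firmware is left out:

```julia
# Mirror of the leading fields of CMSIS's ARM_DRIVER_USART access struct
# (a struct of C function pointers); the remaining fields are elided.
struct ARM_DRIVER_USART
    GetVersion::Ptr{Cvoid}
    GetCapabilities::Ptr{Cvoid}
    Initialize::Ptr{Cvoid}
    Uninitialize::Ptr{Cvoid}
    PowerControl::Ptr{Cvoid}
    Send::Ptr{Cvoid}
    # ... Receive, Transfer, etc.
end

# Hypothetical wrapper over: int32_t (*Send)(const void *data, uint32_t num)
function usart_send(drv::ARM_DRIVER_USART, data::Vector{UInt8})
    ccall(drv.Send, Int32, (Ptr{Cvoid}, UInt32), data, length(data))
end
```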
Hello. I’m new to Julia, but not new to the embedded world. I have worked with many AVR and ARM devices, most of which are too small to run Linux, since they have limited RAM and no memory management unit.
I’m just wondering whether this discussion is still going on, since it appeared to halt in 2019. Are there any new developments from any of the folks who have mentioned potential directions for getting Julia onto small embedded platforms?
Seconding the request for more info. Also, I didn’t notice any discussion of verification and validation. Do we now need to target a specific standard? Would you trust Julia to fly a plane or drive a car? Would it need to go through C so it could be checked by some of the existing verification tools? Where does this discussion live?