Bring Julia code to embedded hardware (ARM)

Are there any universities working on cross-compiling Julia to FPGA SoCs with ARM processors, and possibly even to VHDL? I’m primarily interested in universities based in the United States, but I would like to hear about any ongoing research. I thought this Discourse might be a good place to ask.

I am currently testing the feasibility of option 5 in the OP: running Julia on the embedded hardware and using AOT compilation. Our current target platform has a Cortex-A7-based SoC, which can run the official 32-bit ARM binary release of Julia 1.6.1.

Other than the large storage requirements, the biggest problem we have found so far is the very long startup time for even relatively simple scripts. Initially this was largely caused by precompilation times for some standard library functions and operators. For example, the precompilation of an “inverse divide” operator was taking ~14 seconds :zzz:. For comparison, this same statement took ~5 seconds to precompile on a Raspberry Pi 4 (Cortex-A72), and slightly under 1 second on an x86_64 laptop.
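A minimal sketch of how such a first-call delay can be measured, assuming the operator in question is `\` (which the Julia manual lists as “inverse divide”). The operands below are made up for illustration; the post does not say which argument types were used.

```julia
using LinearAlgebra

# Hypothetical inputs, chosen only so the example is self-contained.
A = [4.0 1.0; 2.0 3.0]
b = [1.0, 2.0]

# The first call includes compiling `\` for these argument types.
t_first = @elapsed (A \ b)

# The second call reuses the compiled code, so it times only the solve itself.
t_second = @elapsed (A \ b)

x = A \ b

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The gap between the two timings is roughly the compilation cost that a custom sysimage is meant to remove.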

We can remove a lot of the precompilation delay by using PackageCompiler.create_sysimage. But the remaining startup overhead is still uncomfortably high: on the order of 10 seconds for a test script with fewer than 20 lines, or half of that if the filesystem data is already cached in memory (which we cannot assume).
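For readers following along, a sysimage build of this kind might look roughly like the sketch below. It assumes PackageCompiler is installed in the active project. The package name `MyPackage` and the file names are placeholders, not what was used in this post.

```julia
using PackageCompiler

create_sysimage(
    [:MyPackage];                      # hypothetical: the packages your script loads
    sysimage_path = "sys_custom.so",   # hypothetical output file
    # Running a representative script during the build records which methods
    # get compiled, so that they end up baked into the sysimage.
    precompile_execution_file = "precompile_script.jl",
)
```

The script would then be started with `julia --sysimage sys_custom.so script.jl`.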

Any suggestions on how to further speed up the startup/load time of Julia scripts/modules/libraries? Or any tricks for getting more out of PackageCompiler?

We would be particularly interested in ways to remove unused code from the binaries, which might help with load times, and would reduce the storage requirements. I was hoping that PackageCompiler.create_app might help with that, but it does not seem to improve the startup time. It does help a bit in reducing storage requirements, though.

The filter_stdlibs argument in create_sysimage and create_app sounds promising, but I have not figured out how to use it; create_sysimage fails with compilation errors when I try it, and the documentation talks about “potential pitfalls” without further details. Any hints or pointers on using filter_stdlibs?
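One thing that may be worth checking, as far as the PackageCompiler sources and docs go (this has not been verified on the ARM target discussed here): filter_stdlibs appears to be accepted only for non-incremental builds, and any standard library the code uses, even indirectly, has to be listed as a dependency in the project. A sketch under those assumptions, with placeholder names:

```julia
using PackageCompiler

create_sysimage(
    [:MyPackage];                    # hypothetical package name
    sysimage_path = "sys_small.so",  # hypothetical output file
    incremental = false,             # build from scratch, not on top of the default sysimage
    filter_stdlibs = true,           # keep only the stdlibs the project depends on
)
```

A from-scratch build takes considerably longer than an incremental one, so it is probably best done on a faster machine where possible.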

@enriquer I think the people best suited to answer are the PackageCompiler people, who might not be following this ARM thread. So I might suggest reposting that in a separate thread to get the right audience.

Just to make sure: did you compile everything that is called in the script into the sysimage? That is, is the remaining overhead purely Julia startup and initialisation? If so, the question is whether, and how, it can be eliminated. An embedded system is usually expected (or even required) to start up quickly, so 10 s is a lot of time, and it comes on top of the startup time of the system itself (Linux and so on).
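One way to check this is to run the script against the custom sysimage with compilation tracing switched on. A rough sketch, where the file names are placeholders:

```julia
# Hypothetical file names; adjust to the actual sysimage and script.
sysimage = "sys_custom.so"
script   = "script.jl"

# Path to the julia executable that is running this snippet.
julia = joinpath(Sys.BINDIR, Base.julia_exename())

# Each `precompile(...)` line printed to stderr is a method that was still
# compiled at run time, i.e. one that the sysimage did not cover.
run(`$julia --sysimage=$sysimage --trace-compile=stderr $script`)
```

If nothing is printed, the remaining time is most likely spent loading and initialising the runtime rather than compiling.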
