I am currently testing the feasibility of option 5 in the OP: running Julia on the embedded hardware and using AOT compilation. Our current target platform has a Cortex-A7-based SoC, which can run the official 32-bit ARM binary release of Julia 1.6.1.
Other than the large storage requirements, the biggest problem we have found so far is the very long startup time for even relatively simple scripts. Initially this was largely caused by precompilation of some standard-library functions and operators. For example, precompiling an “inverse divide” operator was taking ~14 seconds. For comparison, the same statement took ~5 seconds to precompile on a Raspberry Pi 4 (Cortex-A72), and slightly under 1 second on an x86_64 laptop.
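For reference, first-call latency of this kind can be measured with a trivial script like the following (a minimal sketch; the matrix sizes here are illustrative, not the ones we actually used):

```julia
using LinearAlgebra

A = rand(100, 100)
b = rand(100)

# First call pays the compilation cost for this method signature.
@time A \ b

# Second call measures pure run time, for comparison.
@time A \ b
```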
We can remove a lot of the precompilation delay by using PackageCompiler.create_sysimage, but the run-time overhead remains uncomfortably high: on the order of 10 seconds for a test script of fewer than 20 lines, or about half that if the filesystem data is already cached in memory (which we cannot assume).
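Our sysimage build looks roughly like this (package name and file paths are placeholders for our actual setup):

```julia
using PackageCompiler

# Bake our dependencies and a precompile workload into a custom system
# image, so their compilation cost is paid once at build time rather
# than at every startup on the target.
create_sysimage(
    ["MyPkg"];                                      # placeholder package
    sysimage_path = "custom_sysimage.so",
    precompile_execution_file = "precompile_script.jl",
)
```

The script is then launched on the target with `julia -J custom_sysimage.so script.jl`.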
Any suggestions on how to further speed up the startup/load time of Julia scripts/modules/libraries? Or any tricks for getting more out of PackageCompiler?
We would be particularly interested in ways to remove unused code from the binaries, which might help with load times and would also reduce the storage requirements. I was hoping that PackageCompiler.create_app might help with that, but it does not seem to improve the startup time. It does help a bit in reducing the storage requirements, though.
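For completeness, the invocation we tried is along these lines ("MyApp" and the output path are placeholders; the project is assumed to define the `julia_main()::Cint` entry point that create_app expects):

```julia
using PackageCompiler

# Bundle the project into a relocatable app directory containing an
# executable, the needed libraries, and a bundled sysimage.
create_app(
    "MyApp",            # placeholder: path to the project directory
    "MyAppCompiled";    # placeholder: output directory for the bundle
    force = true,       # overwrite a previous build
)
```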
The filter_stdlibs argument in create_app sounds promising, but I have not figured out how to use it: create_sysimage fails with compilation errors when I try it, and the documentation talks about “potential pitfalls” without further details. Any hints or pointers on using filter_stdlibs would be appreciated.
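This is how we attempted it (paths are placeholders, and the comment states our guess at the cause, not a confirmed diagnosis):

```julia
using PackageCompiler

# Try to exclude unused standard libraries from the bundle to shrink it.
# Our guess at the documented "pitfall": a stdlib that is only loaded
# indirectly at run time gets filtered out too, which would explain the
# compilation/load errors we see.
create_app(
    "MyApp",            # placeholder project directory
    "MyAppCompiled";    # placeholder output directory
    filter_stdlibs = true,
)
```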