(reposting from a Julia+ARM thread because this is not really ARM specific)
I am testing the feasibility of using Julia on an embedded Linux platform. Our current target hardware has a Cortex-A7-based SoC, which can run the official 32-bit ARM binary release of Julia 1.6.1.
Other than the large storage requirements, the biggest problem we have found so far is the very long startup time, even for relatively simple scripts. Initially this was largely caused by precompilation of some standard-library functions and operators. For example, precompiling an “inverse divide” (`\`) operator call took ~14 seconds. For comparison, the same statement took ~5 seconds to precompile on a Raspberry Pi 4 (Cortex-A72), and slightly under 1 second on an x86_64 laptop.
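For reference, the kind of measurement we are doing looks roughly like this (the array sizes are illustrative, not our exact workload):

```julia
# First call to `\` triggers compilation; subsequent calls are fast.
A = rand(100, 100)
b = rand(100)

t_first  = @elapsed A \ b   # includes compile time (~14 s on the Cortex-A7)
t_second = @elapsed A \ b   # pure runtime, orders of magnitude faster
println("first: $(t_first) s, second: $(t_second) s")
```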
We can remove much of the precompilation delay by using `PackageCompiler.create_sysimage`. But the runtime overhead remains uncomfortably high: on the order of 10 seconds for a test script of fewer than 20 lines, or about half that if the filesystem data is already cached in memory (which we cannot assume).
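For context, our sysimage build is essentially the standard recipe; the package name and file paths below are placeholders, not our real project:

```julia
using PackageCompiler

# Build a custom system image that bakes in precompiled code for the
# packages our script uses. "MyPkg" and the paths are placeholders.
create_sysimage(
    ["MyPkg"];
    sysimage_path = "sys_custom.so",
    # Running a representative script during the build records which
    # method instances get precompiled into the image.
    precompile_execution_file = "precompile_script.jl",
)
```

The script is then launched with `julia --sysimage sys_custom.so script.jl`.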
Any suggestions on how to further speed up the startup/load time of Julia scripts/modules/libraries? Or any tricks for getting more out of `PackageCompiler`?
We would be particularly interested in ways to remove unused code from the binaries, which might help with load times and would reduce the storage requirements. I was hoping that `PackageCompiler.create_app` might help with that, but it does not seem to improve the startup time. It does help a bit in reducing storage requirements, though.
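What we tried with `create_app` is close to the minimal invocation (again, the project name and output directory are placeholders):

```julia
using PackageCompiler

# Turn the project at "MyApp/" (a package that defines a `julia_main`
# entry point) into a relocatable bundle under "MyAppCompiled/".
create_app("MyApp", "MyAppCompiled"; force = true)
```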
The `filter_stdlibs` argument to `create_sysimage` and `create_app` sounds promising, but I have not figured out how to use it: `create_sysimage` fails with compilation errors when I try it, and the documentation talks about “potential pitfalls” without further details. Any hints or pointers on using `filter_stdlibs`?
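In case it helps to be concrete, the failing attempt is roughly the following; my guess (unconfirmed) is that any stdlib a dependency touches at runtime must still be reachable from the project's own dependencies for the filtered build to work:

```julia
using PackageCompiler

# `filter_stdlibs = true` excludes standard libraries that are not
# dependencies of the project, shrinking the image -- but if anything
# ends up needing an excluded stdlib, the build or the app fails.
create_app("MyApp", "MyAppCompiled";
    filter_stdlibs = true,
    force = true,
)
```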