Bring Julia code to embedded hardware (ARM)

Allan_Baker · May 2, 2021, 7:36pm

Are there any universities working on cross-compiling Julia to FPGA SoCs with ARM processors, and possibly even to VHDL? I’m primarily interested in universities based in the United States, but I would like to know about any ongoing research. I thought this Discourse might be a good place to ask.

enriquer · May 12, 2021, 3:54pm

I am currently testing the feasibility of option 5 in the OP: running Julia on the embedded hardware and using AOT compilation. Our current target platform has a Cortex-A7 based SOC which can run the official 32-bit-ARM binary release of Julia 1.6.1.

Other than the large storage requirements, the biggest problem we have found so far is the very long startup time for even relatively simple scripts. Initially this was largely caused by precompilation times for some standard library functions and operators. For example, the precompilation of an “inverse divide” operator was taking ~14 seconds. For comparison, this same statement took ~5 seconds to precompile on a Raspberry Pi 4 (Cortex-A72), and slightly under 1 second on an x86_64 laptop.
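A minimal sketch of how such a first-call delay can be measured, assuming the “inverse divide” in question is the `\` operator applied to a small dense matrix (the post does not show the exact statement). The matrix `A` and vector `b` are made-up values for illustration.

```julia
using LinearAlgebra

# Made-up inputs; the original statement is not shown in the post.
A = [4.0 1.0; 2.0 3.0]
b = [1.0, 2.0]

# The first call includes JIT compilation of `\` for these argument types.
first_call = @elapsed(A \ b)
# The second call reuses the compiled code, so it measures execution only.
second_call = @elapsed(A \ b)

x = A \ b
println("first call: ", first_call, " s, second call: ", second_call, " s")
```

The gap between the two timings is the compilation cost that a system image is meant to remove.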

We can remove a lot of the precompilation delay by using PackageCompiler.create_sysimage. But the run-time overhead remains uncomfortably high, on the order of 10 seconds for a test script with fewer than 20 lines, or half of that time if the filesystem data is cached in memory (which we cannot assume).
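One way to compare startup overhead with and without a custom system image is to time a full process launch. The sketch below assumes a script `test.jl` and a system image `sys.so` already exist; both file names, and the helpers `startup_cmd` and `time_startup`, are hypothetical.

```julia
# Hypothetical helper: build the command that launches Julia on `script`,
# optionally with a custom system image.
function startup_cmd(script::AbstractString; sysimage::Union{Nothing,AbstractString}=nothing)
    julia = joinpath(Sys.BINDIR, Base.julia_exename())
    flags = ["--startup-file=no"]
    sysimage === nothing || push!(flags, "--sysimage=$sysimage")
    return `$julia $flags $script`
end

# Wall-clock time of a whole process launch, including runtime initialisation.
time_startup(cmd::Cmd) = @elapsed run(cmd)

# Example (file names are placeholders):
#   time_startup(startup_cmd("test.jl"))
#   time_startup(startup_cmd("test.jl"; sysimage="sys.so"))
```

Running each variant a few times, with and without a warm filesystem cache, separates load time from compile time.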

Any suggestions on how to further speed up the startup/load time of Julia scripts/modules/libraries? Or any tricks for getting more out of PackageCompiler?

We would be particularly interested in ways to remove unused code from the binaries, which might help with load times, and would reduce the storage requirements. I was hoping that PackageCompiler.create_app might help with that, but it does not seem to improve the startup time. It does help a bit in reducing storage requirements, though.

The filter_stdlibs argument in create_sysimage and create_app sounds promising, but I have not figured out how to use it; create_sysimage fails with compilation errors when I try it, and the documentation talks about “potential pitfalls” without further details. Any hints or pointers on using filter_stdlibs?

jzr · May 12, 2021, 8:00pm

@enriquer I think the people best suited to answer are the PackageCompiler people, who might not be following this ARM thread. So I might suggest reposting that in a separate thread to get the right audience.

asprionj · May 13, 2021, 7:37pm

To make sure: You compiled everything that is called in the script into the sysimage? That is, the remaining overhead is pure “Julia startup / initialisation”…? So the question would be whether (and if yes, how) this can be eliminated. An embedded system is usually expected (or even required) to start up quickly, so 10s is quite a lot of time (that would be added to the startup time of the system, incl. linux etc., itself).

enriquer · May 14, 2021, 2:12pm

Thanks, that is a good point. I have reposted to a new “Performance” topic:

https://discourse.julialang.org/t/looking-for-advice-on-achieving-faster-startup-times

Please follow this link if you want to continue discussing my previous post from two days ago.

enriquer · May 14, 2021, 2:47pm

I just replied to this in the new thread.

0xD3ADBEEF · June 16, 2021, 10:45pm

The problem is that PackageCompiler is not really a compiler: it seems to just dump the code that is JAOT-compiled in memory into a shared object, which means the output includes any core Julia runtime functions and variables that get executed or declared along the way. That is why it takes 5-10 min to compile even simple Julia scripts and why the final system image is >100MB (which includes debugging symbols, and that’s without the extra artifacts and libraries that get generated or copied). In my opinion this is absolutely unacceptable for embedded (that, and the garbage collector).

In my honest opinion, I think it is a complete waste of time trying to get Julia to work on embedded, and the points discussed by Karpinski are moot at best in this domain (I already discuss the “safe” part in another topic: Julia for real time, worried about the garbage collector - #41 by 0xD3ADBEEF). Yes, it has a nice philosophy, but it needs a lot of infrastructure to carry it in the embedded space (a proper AOT compiler toolchain, drivers, HAL, etc…)

I would rather focus my efforts on making a good code generation package.

Gnimuc · March 10, 2022, 6:43am

Hi @maleadt, could you share a link to the JuliaCon talk?

maleadt · March 10, 2022, 7:03am

That didn’t happen, but GPUCompiler.jl is basically all that code isolated into a single package, easier to understand. It can also be used for non-GPU purposes, as StaticCompiler.jl demonstrates.

Allan_Baker · April 24, 2022, 11:23am

Getting Julia code to target FPGA programmable logic and FPGA ARM cores would be huge. I don’t think I would care about Julia’s flexibility and GC; I just want the algorithms to be transcribed into fixed point and FPGA IP on the programmable logic side and the ARM core side. I thought Xilinx open-sourced their LLVM front ends to better allow this sort of thing. Any recent thoughts on how feasible this is?

AMJ · April 24, 2022, 7:07pm

Some work was being done on getting Julia to work on FPGAs (although without LLVM). Here is the paper: [2201.11522] High-level Synthesis using the Julia Language

Palli · March 1, 2023, 4:41pm

Actually, like Python, MicroPython is compiled:

MicroPython consists of a Python compiler to bytecode and a runtime interpreter of that bytecode.

Yes, it’s compiled to bytecode, and I’m not sure, but it seems to be its own format (or a variant of CPython’s?), since it uses the .mpy file extension. I mention this because it is not certain that it is as slow as CPython’s (though it likely is). You also get full-speed capability, e.g. inline assembly, unlike in regular Python (at least by default); for interrupt handling, for example, you really don’t want to be slow or go through an interpreter, though bytecode might be fast enough.

Since it’s fast enough for Python (even MicroPython), I think Julia should consider some bytecode… It doesn’t need to be in official Julia, only some MicroJulia…

I see:

While Numba’s main use case is Just-in-Time compilation, it also provides a facility for Ahead-of-Time compilation (AOT).

I’m not sure whether Numba works with MicroPython too; I didn’t want to spend too much time googling, but it’s at least conceivable that it or some similar technology does. I do find it likely, though, that Numba supports only a few architectures that are rarely used for microcontrollers, e.g. x86 and maybe ARM, but perhaps not ARM Thumb, which is what you would more likely need to target.

cirobr · May 6, 2023, 5:48pm

Cheers,

I’ve been using a GPU-less Linux AArch64 instance with plenty of storage and RAM quite successfully for almost a year. My attempts at porting Julia to a much smaller Linux AArch64 Raspberry Pi 3 got as far as the following:

Basic Julia installations from both snap and jill worked out well. No difference noticed at all.
Code execution of basic Julia functions also worked out smoothly, given that they are all precompiled.
Adding a library on top of the basic functions turned out to be a problem: at the first “using” call, the Raspberry Pi could not finish, possibly due to a lack of resources during compilation.
An attempt to “rsync mirror” the entire “.julia” folder, with all its precompiled content, from the desktop instance to the Raspberry Pi did not work either: although the “Pkg.status()” command says everything is there, at the first “using” call the system attempts to precompile everything again, which ends in a crash.

Hope that someone is able to advise on how to proceed from this point?

Thanks in advance.

ufechner7 · May 6, 2023, 9:49pm

Did you try to create a system image with your packages on the stronger machine and use it on the smaller machine? Did you enable zram?

notinaboat · May 7, 2023, 12:30am

My hardware is like this:

Deployment target: R’Pi 3A+, cortex-a53, 512MB RAM
ARM dev environment: R’Pi 4, cortex-a72, 8GB RAM
Intel dev environment: Mac, 8-core i9, 32GB RAM

All the machines share an NFS-mounted filesystem (rsync should work fine too). The shared filesystem contains a shared Julia Depot directory (configured by the JULIA_DEPOT_PATH=/shared/jl_depot environment variable).

On the R’Pi 4 I use -C cortex-a53 to ensure that code generated there will run on the 3A+.

I do package management and initial development and debugging on Intel. On the ARM machines I set JULIA_PKG_OFFLINE=true to stop the package manager from attempting slow registry updates etc.
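The environment described so far can be captured as a single command. This is a sketch, assuming Julia 1.6 or later (for `addenv`); the helper `build_machine_cmd` and the script name `load.jl` are hypothetical, while the depot path, the CPU target, and the environment variables are the ones named in the post.

```julia
# Hypothetical helper: a Julia invocation for the ARM build machine (R'Pi 4)
# that generates code for the deployment CPU and uses the shared depot.
function build_machine_cmd(script; depot="/shared/jl_depot", cpu_target="cortex-a53")
    julia = joinpath(Sys.BINDIR, Base.julia_exename())
    cmd = `$julia --cpu-target=$cpu_target $script`
    # Add to the inherited environment instead of replacing it.
    return addenv(cmd, "JULIA_DEPOT_PATH" => depot, "JULIA_PKG_OFFLINE" => "true")
end

# Example: run(build_machine_cmd("load.jl")), where load.jl contains `using MyPackage`.
```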

When I’m ready to try something on the R’Pi 3A+ target, I first precompile on the R’Pi 4 8GB machine. Doing using MyPackage on that machine results in the precompile .ji files being generated in the shared Julia Depot (e.g. jl_depot/compiled/v1.x/MyPackage/XXXX_XXXX.ji). After that using MyPackage on the 3A+ loads the package without pre-compilation delay.

When I am ready for a production release, I run PackageCompiler’s create_sysimage on the R’Pi 4 to build the final .sysimage that gets deployed on the 3A+ (using the -J option). (It is important to run the system through all execution paths with --trace-compile before using create_sysimage. Without this step, Julia will attempt very slow JIT compilation on the small ARM machine at runtime.)
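The two steps can be sketched as follows. The helpers `trace_cmd` and `sysimage_options` and the file names `trace.jl`, `main.jl`, and `MyPackage.so` are hypothetical. The keyword names are those I understand PackageCompiler's `create_sysimage` to accept; check them against the version you have installed. The call itself is left as a comment so the snippet runs without PackageCompiler.

```julia
# Step 1 (hypothetical helper): run the workload once and record every method
# that gets compiled, as precompile statements written to `trace_file`.
function trace_cmd(script, trace_file; cpu_target="cortex-a53")
    julia = joinpath(Sys.BINDIR, Base.julia_exename())
    return `$julia --cpu-target=$cpu_target --trace-compile=$trace_file $script`
end

# Step 2 (hypothetical helper): keyword arguments for create_sysimage.
sysimage_options(trace_file, out; cpu_target="cortex-a53") =
    (sysimage_path = out, precompile_statements_file = trace_file, cpu_target = cpu_target)

# On the build machine, with PackageCompiler installed, the call would look like:
#   using PackageCompiler
#   create_sysimage(["MyPackage"]; sysimage_options("trace.jl", "MyPackage.so")...)
# and on the deployment target:
#   julia -J MyPackage.so main.jl
```

Any code path the trace run does not exercise is still JIT-compiled on the target, so the trace script should cover the full workload.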

Important:

The system clocks must be synchronised. If the timestamp of MyPackage.jl is newer than the .ji file it will not load.
The full path of the MyPackage source code must be identical on both ARM machines. When Julia loads .ji files it checks that the full path to MyPackage/src/MyPackage.jl matches.
If you have /home/sam/git/MyPackage on one machine and /home/pi/git/MyPackage on the other it won’t work.
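Both conditions can be checked before deploying. The sketch below is a simplified restatement of the rules as described above, not Julia's actual cache-validation logic; the helpers `cache_is_stale` and `same_source_path` are hypothetical.

```julia
# Simplified version of the timestamp rule: a cache file older than its
# source (or missing, since mtime is 0.0 for a missing file) counts as stale.
cache_is_stale(src, ji) = mtime(src) > mtime(ji)

# The source must sit at the same absolute path on both machines.
same_source_path(build_path, target_path) = abspath(build_path) == abspath(target_path)

# The mismatch from the example above:
println(same_source_path("/home/sam/git/MyPackage", "/home/pi/git/MyPackage"))
```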

See also:
https://github.com/JuliaLang/julia/issues/45215
https://github.com/JuliaLang/julia/issues/47943

cirobr · May 8, 2023, 6:00pm

@notinaboat , @ufechner7, thank you for the insights.

The root cause of the failure was the use of different directory trees on each machine: more precisely, I was using a different user name on each of them. After changing all user names to the same one and mirroring the “.julia/” folder with rsync, the number of libraries that still needed compilation on the Raspberry Pi was reduced by about 80%.

My attempt of using a sysimage has succeeded only after harmonizing the directory trees as well.

In summary, it indeed seems the compilation process Julia adopts is based on absolute paths. As such, for this kind of use case, user names must be identical.

Regards,

ufechner7 · May 8, 2023, 7:56pm

Which libraries still needed recompilation? And any idea why?

cirobr · May 9, 2023, 5:39pm

No library recompiled, actually. Previous run included some old files not properly wiped out.

cirobr · June 7, 2023, 1:08am

Cheers everyone, I hope the team is able to help again. I’m trying to compile, with PackageCompiler, code that uses Flux on embedded ARM hardware with no GPU. Although the compilation succeeds, execution fails. Details follow, from the very beginning. The code is being compiled with the create_app() function.

The first step, to make sure everything works, was to run the code below, with no Flux call, on both x86/NVidia GPU and AArch64/no GPU. That worked. Code as follows:

module app


function julia_main()::Cint
    println("Hello World!")

    return 0
end # function julia_main


end # module

Next step was to add Flux to an empty environment, then PackageCompile the modified code from below, which has an extra “using Flux” instruction:

module app


using Flux

function julia_main()::Cint
    println("Hello World!")

    return 0
end # function julia_main


end # module

The file Project.toml looks like this:

name = "app"
uuid = "a8d786ff-56c3-4553-869b-94c6c1f0e32d"
authors = ["cirobr"]
version = "0.1.0"

[deps]
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"

Part of the Manifest.toml file lists a number of CUDA dependencies, even though CUDA is not declared in Project.toml:

[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "CompilerSupportLibraries_jll", "ExprTools", "GPUArrays", "GPUCompiler", "KernelAbstractions", "LLVM", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "Preferences", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "SpecialFunctions", "UnsafeAtomicsLLVM"]
git-tree-sha1 = "442d989978ed3ff4e174c928ee879dc09d1ef693"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "4.3.2"

[[deps.CUDA_Driver_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg"]
git-tree-sha1 = "498f45593f6ddc0adff64a9310bb6710e851781b"
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"
version = "0.5.0+1"

[[deps.CUDA_Runtime_Discovery]]
deps = ["Libdl"]
git-tree-sha1 = "bcc4a23cbbd99c8535a5318455dcf0f2546ec536"
uuid = "1af6417a-86b4-443c-805f-a4643ffb695f"
version = "0.2.2"

[[deps.CUDA_Runtime_jll]]
deps = ["Artifacts", "CUDA_Driver_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "5248d9c45712e51e27ba9b30eebec65658c6ce29"
uuid = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
version = "0.6.0+0"

[[deps.CUDNN_jll]]
deps = ["Artifacts", "CUDA_Runtime_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "2918fbffb50e3b7a0b9127617587afa76d4276e8"
uuid = "62b44479-cb7b-5706-934f-f13b2eb2e645"
version = "8.8.1+0"
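To see which package pulls CUDA into the environment, the manifest can be searched for entries that list it as a direct dependency. The sketch below uses the TOML standard library and assumes the manifest layout shown above. The helper `direct_dependents` is hypothetical, and the sample is a shortened, made-up excerpt (its dependency lists are illustrative, not copied from the real manifest).

```julia
using TOML

# Hypothetical helper: names of packages whose manifest entry lists
# `target` among its direct dependencies.
function direct_dependents(manifest::AbstractDict, target::AbstractString)
    found = String[]
    for (name, entries) in get(manifest, "deps", Dict())
        for entry in entries
            deps = get(entry, "deps", String[])
            # `deps` may be a list of names or a name => uuid table.
            names = deps isa AbstractDict ? keys(deps) : deps
            target in names && push!(found, name)
        end
    end
    return sort!(unique!(found))
end

# Made-up excerpt for illustration; use TOML.parsefile("Manifest.toml") on a real project.
sample = TOML.parse("""
manifest_format = "2.0"

[[deps.CUDA]]
deps = ["CUDA_Driver_jll", "GPUArrays"]
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"

[[deps.CUDA_Driver_jll]]
deps = ["Artifacts"]
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"

[[deps.Flux]]
deps = ["CUDA", "Zygote"]
uuid = "587475ba-b771-5e3f-ad9e-33799f191a9c"
""")

println(direct_dependents(sample, "CUDA"))
```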

The code is PackageCompiled and executed with absolutely no warnings on x86_64/GPU. On AArch64/noGPU there is a warning during precompilation. Full output as follows:

PackageCompiler: bundled artifacts:
  ├── LLVMExtra_jll - 3.447 MiB
  └── OpenSpecFun_jll - 487.436 KiB
  Total artifact file size: 3.923 MiB
✔ [04m:35s] PackageCompiler: compiling base system image (incremental=false)
Precompiling project...
  112 dependencies successfully precompiled in 704 seconds. 4 already precompiled.
  1 dependency had warnings during precompilation:
┌ Random123 [74087812-796a-5b5d-8853-05524746bad3]
│  ┌ Warning: AES-NI is not enabled, so AESNI and ARS are not available.
│  └ @ Random123 ~/.julia/packages/Random123/u5oEp/src/Random123.jl:55
└  
✔ [04m:04s] PackageCompiler: compiling nonincremental system image

Finally, the error message upon execution on AArch64/no-GPU:

Downloaded artifact: CUDA_Driver
fatal: error thrown and no exception handler available.
InitError(mod=:CUDA_Driver_jll, error=ErrorException("Unable to automatically download/install artifact 'CUDA_Driver' from sources listed in '/home/ciro/.julia/packages/CUDA_Driver_jll/3xFy2/Artifacts.toml'.
Sources attempted:
- https://pkg.julialang.org/artifact/aa72e00d2e54224026ca1148a004b6b991849de9
    Error: IOError: could not spawn setenv(`7z x /tmp/jl_EZstyrgDsl-download.gz -so`,["_CE_M=", "PATH=:/home/ciro/.local/bin:/home/ciro/.vscode-server/bin/b3e4e68a0bc097f0ae7907b217c1119af9e03435/bin/remote-cli:/home/ciro/.local/bin:/home/ciro/.local/bin:/home/ciro/miniconda3/bin:/home/ciro/miniconda3/condabin:/home/ciro/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "CONDA_PYTHON_EXE=/home/ciro/miniconda3/bin/python", "LD_LIBRARY_PATH=/home/ciro/projects/myapp/app_compiled/bin/../lib/julia:/home/ciro/projects/myapp/app_compiled/bin/../lib", "GIT_ASKPASS=/home/ciro/.vscode-server/bin/b3e4e68a0bc097f0ae7907b217c1119af9e03435/extensions/git/dist/askpass.sh", "LC_CTYPE=C.UTF-8", "DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1002/bus", "BROWSER=/home/ciro/.vscode-server/bin/b3e4e68a0bc097f0ae7907b217c1119af9e03435/bin/helpers/browser.sh", "CUDA_VISIBLE_DEVICES=", "LANG=C.UTF-8", "LOGNAME=ciro", "SHLVL=2", "LC_MONETARY=C.UTF-8", "XDG_RUNTIME_DIR=/run/user/1002", "LC_ADDRESS=C.UTF-8", "LC_PAPER=C.UTF-8", "XDG_SESSION_TYPE=tty", "_=./app", "CONDA_DEFAULT_ENV=base", "OPENBLAS_DEFAULT_NUM_THREADS=1", "JULIA_PKG_USE_CLI_GIT=true", "USER=ciro", "LESSCLOSE=/usr/bin/lesspipe %s %s", "LC_TIME=C.UTF-8", "LC_NUMERIC=C.UTF-8", "TERM_PROGRAM_VERSION=1.78.2", "JULIA_NUM_THREADS=4", "LC_MEASUREMENT=C.UTF-8", "JULIA_DEPOT_PATH=/home/ciro/projects/myapp/app_compiled/share/julia", "CONDA_PROMPT_MODIFIER=(base) ", "PWD=/home/ciro/projects/myapp/app_compiled/bin", "XDG_SESSION_CLASS=user", "DISPLAY=:1", "LC_TELEPHONE=C.UTF-8", "TERM_PROGRAM=vscode", "VSCODE_GIT_ASKPASS_NODE=/home/ciro/.vscode-server/bin/b3e4e68a0bc097f0ae7907b217c1119af9e03435/node", "LESSOPEN=| /usr/bin/lesspipe %s", "XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop", "SHELL=/bin/bash", "VSCODE_GIT_ASKPASS_MAIN=/home/ciro/.vscode-server/bin/b3e4e68a0bc097f0ae7907b217c1119af9e03435/extensions/git/dist/askpass-main.js", 
"SSH_CONNECTION=179.100.111.174 36566 10.0.0.87 22", "VSCODE_IPC_HOOK_CLI=/run/user/1002/vscode-ipc-a462b872-9b8a-4a51-b091-12f2d281e9a4.sock", "VSCODE_GIT_ASKPASS_EXTRA_ARGS=", "VSCODE_GIT_IPC_HANDLE=/run/user/1002/vscode-git-6a6214d7ba.sock", "MOTD_SHOWN=pam", "CONDA_PREFIX=/home/ciro/miniconda3", "LC_NAME=C.UTF-8", "XDG_SESSION_ID=166", "LC_IDENTIFICATION=C.UTF-8", "SSH_CLIENT=179.100.111.174 36566 22", "JULIA_LOAD_PATH=/home/ciro/projects/myapp/app_compiled/share/julia", "_CE_CONDA=", "CONDA_SHLVL=1", "CONDA_EXE=/home/ciro/miniconda3/bin/conda", "HOME=/home/ciro", "TERM=xterm-256color", "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mk
a=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:", "COLORTERM=truecolor", "OPENBLAS_MAIN_FREE=1"]): no such file or directory (ENOENT)
- https://github.com/JuliaBinaryWrappers/CUDA_Driver_jll.jl/releases/download/CUDA_Driver-v0.5.0+1/CUDA_Driver.v0.5.0.aarch64-linux-gnu.tar.gz
    Error: IOError: could not spawn setenv(`7z x /tmp/jl_gty1clSJ8r-download.gz -so`,["_CE_M=", "PATH=:/home/ciro/.local/bin:/home/ciro/.vscode-server/bin/b3e4e68a0bc097f0ae7907b217c1119af9e03435/bin/remote-cli:/home/ciro/.local/bin:/home/ciro/.local/bin:/home/ciro/miniconda3/bin:/home/ciro/miniconda3/condabin:/home/ciro/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "CONDA_PYTHON_EXE=/home/ciro/miniconda3/bin/python", "LD_LIBRARY_PATH=/home/ciro/projects/myapp/app_compiled/bin/../lib/julia:/home/ciro/projects/myapp/app_compiled/bin/../lib", "GIT_ASKPASS=/home/ciro/.vscode-server/bin/b3e4e68a0bc097f0ae7907b217c1119af9e03435/extensions/git/dist/askpass.sh", "LC_CTYPE=C.UTF-8", "DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1002/bus", "BROWSER=/home/ciro/.vscode-server/bin/b3e4e68a0bc097f0ae7907b217c1119af9e03435/bin/helpers/browser.sh", "CUDA_VISIBLE_DEVICES=", "LANG=C.UTF-8", "LOGNAME=ciro", "SHLVL=2", "LC_MONETARY=C.UTF-8", "XDG_RUNTIME_DIR=/run/user/1002", "LC_ADDRESS=C.UTF-8", "LC_PAPER=C.UTF-8", "XDG_SESSION_TYPE=tty", "_=./app", "CONDA_DEFAULT_ENV=base", "OPENBLAS_DEFAULT_NUM_THREADS=1", "JULIA_PKG_USE_CLI_GIT=true", "USER=ciro", "LESSCLOSE=/usr/bin/lesspipe %s %s", "LC_TIME=C.UTF-8", "LC_NUMERIC=C.UTF-8", "TERM_PROGRAM_VERSION=1.78.2", "JULIA_NUM_THREADS=4", "LC_MEASUREMENT=C.UTF-8", "JULIA_DEPOT_PATH=/home/ciro/projects/myapp/app_compiled/share/julia", "CONDA_PROMPT_MODIFIER=(base) ", "PWD=/home/ciro/projects/myapp/app_compiled/bin", "XDG_SESSION_CLASS=user", "DISPLAY=:1", "LC_TELEPHONE=C.UTF-8", "TERM_PROGRAM=vscode", "VSCODE_GIT_ASKPASS_NODE=/home/ciro/.vscode-server/bin/b3e4e68a0bc097f0ae7907b217c1119af9e03435/node", "LESSOPEN=| /usr/bin/lesspipe %s", "XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop", "SHELL=/bin/bash", "VSCODE_GIT_ASKPASS_MAIN=/home/ciro/.vscode-server/bin/b3e4e68a0bc097f0ae7907b217c1119af9e03435/extensions/git/dist/askpass-main.js", 
"SSH_CONNECTION=179.100.111.174 36566 10.0.0.87 22", "VSCODE_IPC_HOOK_CLI=/run/user/1002/vscode-ipc-a462b872-9b8a-4a51-b091-12f2d281e9a4.sock", "VSCODE_GIT_ASKPASS_EXTRA_ARGS=", "VSCODE_GIT_IPC_HANDLE=/run/user/1002/vscode-git-6a6214d7ba.sock", "MOTD_SHOWN=pam", "CONDA_PREFIX=/home/ciro/miniconda3", "LC_NAME=C.UTF-8", "XDG_SESSION_ID=166", "LC_IDENTIFICATION=C.UTF-8", "SSH_CLIENT=179.100.111.174 36566 22", "JULIA_LOAD_PATH=/home/ciro/projects/myapp/app_compiled/share/julia", "_CE_CONDA=", "CONDA_SHLVL=1", "CONDA_EXE=/home/ciro/miniconda3/bin/conda", "HOME=/home/ciro", "TERM=xterm-256color", "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mk
a=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:", "COLORTERM=truecolor", "OPENBLAS_MAIN_FREE=1"]): no such file or directory (ENOENT)
"))
error at ./error.jl:35
jfptr_error_44100 at /home/ciro/projects/myapp/app_compiled/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/gf.c:2940
#ensure_artifact_installed#23 at /home/ciro/packages/julias/julia-1.9/share/julia/stdlib/v1.9/Pkg/src/Artifacts.jl:443
ensure_artifact_installed at /home/ciro/packages/julias/julia-1.9/share/julia/stdlib/v1.9/Pkg/src/Artifacts.jl:387
unknown function (ip: 0xffff812acadb)
_jl_invoke at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/gf.c:2940
#ensure_artifact_installed#22 at /home/ciro/packages/julias/julia-1.9/share/julia/stdlib/v1.9/Pkg/src/Artifacts.jl:383
unknown function (ip: 0xffff81299187)
_jl_invoke at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/gf.c:2940
ensure_artifact_installed at /home/ciro/packages/julias/julia-1.9/share/julia/stdlib/v1.9/Pkg/src/Artifacts.jl:372
unknown function (ip: 0xffff81298f07)
_jl_invoke at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/gf.c:2940
_artifact_str at /home/ciro/packages/julias/julia-1.9/share/julia/stdlib/v1.9/Artifacts/src/Artifacts.jl:549
unknown function (ip: 0xffff812914cf)
_jl_invoke at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
jl_f__call_latest at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:816 [inlined]
invokelatest at ./essentials.jl:813 [inlined]
macro expansion at /home/ciro/packages/julias/julia-1.9/share/julia/stdlib/v1.9/Artifacts/src/Artifacts.jl:701 [inlined]
find_artifact_dir at /home/ciro/.julia/packages/JLLWrappers/QpMQW/src/wrapper_generators.jl:17 [inlined]
__init__ at /home/ciro/.julia/packages/CUDA_Driver_jll/3xFy2/src/wrappers/aarch64-linux-gnu.jl:8
jfptr___init___53623 at /home/ciro/projects/myapp/app_compiled/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
jl_module_run_initializer at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/toplevel.c:75
_finish_julia_init at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/init.c:850
julia_init at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/init.c:799
ijl_init_with_image at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/jlapi.c:66 [inlined]
ijl_init_with_image at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/jlapi.c:55
ijl_init at /cache/build/default-armageddon-6/julialang/julia-release-1-dot-9/src/jlapi.c:82
main at ./app (unknown line)
unknown function (ip: 0xffffa46773fb)
__libc_start_main at /lib/aarch64-linux-gnu/libc.so.6 (unknown line)
_start at ./app (unknown line)

As can be seen at the beginning of the error message, execution tries to load CUDA_Driver_jll. My first and only attempt to solve that was to install the CUDA toolkit on the AArch64/no-GPU machine. It can’t be installed.

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#network-repo-installation-for-ubuntu

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#common-installation-instructions-for-ubuntu

Lastly, it is important to mention that the code runs smoothly when simply executed from REPL or from a Jupyter Notebook, on both machines.

Any advice is greatly appreciated. Thanks.

cirobr · June 8, 2023, 1:34pm

I found the paragraph below on the Flux homepage:

Download Julia 1.6 or later, preferably the current stable release. You can add Flux using Julia’s package manager, by typing ] add Flux in the Julia prompt. This will automatically install several other packages, including CUDA.jl for Nvidia GPU support.

I see a number of potential issues with that:

Using Julia for ML applications on embedded ARM could be a dead end through Flux.
If the programmer has, for instance, an AMD GPU, the application will have to bear with useless CUDA.

What I could not find is if Flux effectively uses CUDA at its heart, or if CUDA is loaded due to an assumption that Flux must be used anyway with GPU. Then, again, only NVIDIA? What about simple ML tasks running at multi-core?

Lastly, by checking the github for AMDGPU package, the authors make clear warnings about limitations of this package. I wonder if part of the cause is due to the burden of CUDA being inherently called from Flux. Anyway, just speculating, no facts to support.

Back to the core of bringing Julia to embedded ARM, does anyone have suggestions on how to proceed?

Thanks.
