Bring Julia code to embedded hardware (ARM)

UPDATE

I’ve reached the conclusion that Flux is no longer usable if the system does not have a GPU. Given that Flux is strongly dependent on CUDA, ML apps using Flux on IoT devices such as the Raspberry Pi are no longer viable. The newest CUDA release apparently blocks this (until recently it worked).

I’m using an AArch64 device with a clean Ubuntu 22.04 install, a clean CUDA toolkit install, and a clean Julia 1.9.1 install. Pkg.add("CUDA") works, but using CUDA fails:

julia> using CUDA
┌ Error: Failed to initialize CUDA
│   exception =
│    CUDA error (code 100, CUDA_ERROR_NO_DEVICE)

Thus, the Jetson Nano is the only game in town…


Did you actually open an issue on the CUDA.jl side as recommended on GitHub? using CUDA has intentionally been designed to be a no-op on platforms which don’t have a driver installed, so I’m pretty sure this counts as a bug.
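
As far as I understand it, the intended pattern is to guard GPU paths at runtime with CUDA.functional() rather than at load time; a minimal sketch:

using CUDA

# using CUDA itself should be a no-op without a driver; CUDA.functional()
# reports whether a usable GPU is actually available at runtime.
if CUDA.functional()
    x = CUDA.rand(Float32, 1024)   # allocate on the GPU
else
    x = rand(Float32, 1024)        # CPU fallback
end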

To me it seems like it probably would be a better idea for Flux to make the entire GPU stack a weak dep now that we have those in 1.9. There’s no reason to depend on all the GPU stuff for users that don’t have GPUs.
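
Concretely, that would mean moving CUDA out of [deps] and into [weakdeps] in the Project.toml, with the GPU code living in an extension module. A rough sketch of the mechanism (FluxCUDAExt is a hypothetical name here, not Flux’s actual layout):

[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"

[extensions]
FluxCUDAExt = "CUDA"

# ext/FluxCUDAExt.jl -- loaded only when the user also installs CUDA
module FluxCUDAExt
using Flux, CUDA
# GPU-specific method definitions would go here
end

The extension is compiled and loaded only in environments where the user has added CUDA themselves, so CPU-only users never pay for the GPU stack.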


Easier said than done, unfortunately. We’ve run into a number of roadblocks trying to make package extensions work while maintaining backwards compatibility (Flux supports 1.6, and I see people on 1.7 worryingly often) without creating backport branches on a bunch of repos (which did not work out well the last time it was tried). The latest one we ran into is “make NNlibCUDA an extension by CarloLucibello · Pull Request #492 · FluxML/NNlib.jl · GitHub”. Given that I’ve already read complaints about import word salad when it comes to using FluxML packages, you can see why these blockers are non-trivial to resolve.


As a researcher, I’m open to discussing how to collaborate on a solution for combining Flux and embedded Julia. This is a topic of great interest. If the solution involves a Flux 2.0 with less CUDA dependency, so be it.

As a product developer who planned to use embedded Julia in a product, I’m afraid this path is no longer possible. One option in sight would be Python/TinyML - a number of chip suppliers have already made it work even on the Cortex-M family.


Just did it, and the immediate solution was … not using CUDA!

https://github.com/JuliaGPU/CUDA.jl/issues/1952


No, that’s not the suggestion. What appears to be happening is that CUDA.jl is detecting you have the proprietary Nvidia driver installed on your system for some reason. Since you presumably don’t have a Nvidia GPU attached to said system, you’ll want to uninstall the driver and then the issue should go away. If for some reason you must have the driver installed despite not having a GPU attached to the system (edit: or you don’t have it installed), I would mention that on the GitHub issue.

This is not my understanding. Being the man in the middle is not practical at all. Perhaps both Flux and JuliaGPU could try to reproduce the issue as a team?

There is definitely an issue to be solved by experts.

I know it’s not evident from this discussion, but you’re not the man in the middle here. Tim and I have seen this issue many times already, which is why we’re both asking if you can test a couple more things.

The problem is that we need an environment like yours to replicate the issue! If you know the exact specifications of the machine you’re working on, you could share them here and hope someone else has a similar one to test, but otherwise I’d recommend trying to answer the following questions from my post above:

  • Do you have Nvidia drivers installed on this machine?
  • If so, can you uninstall those drivers? Does uninstalling make the error go away?

I added some details on the issue, but essentially, you’ll want to figure out which package on your system provides libcuda.so. Removing only whatever provides nvidia.ko, i.e. the kernel-level driver, is not sufficient. You also need to remove the user-space driver.
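
As a quick sanity check after uninstalling, you can ask Julia’s own loader whether a user-space driver is still visible; a small sketch:

using Libdl

# find_library returns the first name dlopen can resolve, or "" if none.
lib = Libdl.find_library(["libcuda", "libcuda.so.1"])
if isempty(lib)
    println("no libcuda visible to the loader")
else
    println("user-space driver still present: ", Libdl.dlpath(lib))
end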

But once more, the output you showed in the issue is just an @error message. It should not break Flux. If it does, then please provide more details (what broke? can you provide a backtrace?) so that we can help you better.


This is where, hopefully, package extensions/weakdeps get adopted more widely. I had a related problem where JuliaGNSS depends on CUDA.jl just to check whether the hardware has an Nvidia GPU and suggest enabling GPU acceleration, while I’m running it on a Raspberry Pi.

Another thing: “Enable JITLink in aarch64 linux. by giordano · Pull Request #49745 · JuliaLang/julia · GitHub” is very close to being merged, which fixes a lot of segfaults on at least the Rock Pi 4 and QEMU.
Now that Julia can run on QEMU, cross-compilation (ish) can be done, although QEMU’s overhead in translating ARM to x86 is still too high.

Yeah, we will probably need a tiny package to detect the availability of GPU software and hardware so that people can check whether CUDA.jl is likely to work without having to depend on all of CUDA.jl.
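
Such a check could be dependency-free by just probing for driver artifacts; a rough sketch (Linux-specific heuristics, not an existing package):

# Heuristic Nvidia GPU detection without loading CUDA.jl.
function probably_has_nvidia_gpu()
    return ispath("/dev/nvidia0") ||            # device node from the kernel driver
           ispath("/proc/driver/nvidia") ||     # proc entry created by nvidia.ko
           Sys.which("nvidia-smi") !== nothing  # management tool on PATH
end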

Anyway, this is pretty off-topic to what was originally being discussed here.


Until the issue is fixed, I have downgraded the CUDA package to v3.13.1. Development and compilation remain on a large AArch64 instance with no GPU, then I rsync to the Raspberry Pi.
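
For reference, pinning the older release looks roughly like this:

using Pkg

# Install the last CUDA release that worked here and pin it so that
# Pkg.update() does not move it forward until the issue is fixed.
Pkg.add(name="CUDA", version="3.13.1")
Pkg.pin("CUDA")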