A new ARM CPU is available that is at least three times as powerful as a Raspberry Pi 4. It also has a neural network processing unit (NPU), which is supported from Python. Will it also be supported by Julia? What would be needed to support a new NPU?
“The RK3588S has a built-in NPU which provides up to 6 TOPS (tera operations per second) of neural network processing. The NPU supports mainstream deep learning frameworks, such as TensorFlow, Pytorch, MxNET and so on. The powerful RK3588S brings optimized neural network performance to various A.I. applications.” ( https://www.khadas.com/edge2 )
Perhaps I should ask a more general question: Is any NPU supported by Julia or one of the Julia packages?
There are many of them: https://en.wikichip.org/wiki/neural_processor
The support libraries are available as a binary here: GitHub - rockchip-linux/rknpu2
Writing wrappers should be possible since it is a C library, although I’m not sure what the interface should look like on the Julia side.
The simplest solution would be to create a new array type, e.g. RockchipNPUArray, for which you define custom methods like
*(a::RockchipNPUArray, b::...) = call_to_C_library_rknpu2. It should be relatively straightforward (because the NPU has only a small set of capabilities). An experienced developer could probably do it in under a week, given a computer running this chip and a Julia build that compiles on it without hiccups. The difficulty is finding someone who wants to do the work.
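A minimal sketch of that dispatch pattern, purely to illustrate the shape of the wrapper. The type name, the C entry point `rknn_matmul`, and the argument list are all placeholders I invented; the real rknpu2 library does not expose a matmul call like this:

```julia
# Hypothetical sketch: wrap an NPU-resident buffer in an AbstractMatrix
# subtype and route multiplication through the vendor C library via @ccall.
const librknnrt = "librknnrt"   # vendor runtime shared library (assumed name)

struct RockchipNPUArray{T} <: AbstractMatrix{T}
    handle::Ptr{Cvoid}          # opaque device buffer handle
    dims::Tuple{Int,Int}
end

Base.size(a::RockchipNPUArray) = a.dims

function Base.:*(a::RockchipNPUArray{T}, b::RockchipNPUArray{T}) where {T}
    m, k = size(a); k2, n = size(b)
    k == k2 || throw(DimensionMismatch("inner dimensions must match"))
    out = Ref{Ptr{Cvoid}}(C_NULL)
    # Placeholder C call; the real rknpu2 entry points differ.
    ret = @ccall librknnrt.rknn_matmul(a.handle::Ptr{Cvoid}, b.handle::Ptr{Cvoid},
                                       m::Cint, k::Cint, n::Cint,
                                       out::Ref{Ptr{Cvoid}})::Cint
    ret == 0 || error("NPU matmul failed with code $ret")
    return RockchipNPUArray{T}(out[], (m, n))
end
```

With something like this in place, generic Julia code that only uses the overloaded operations would run on the NPU through ordinary multiple dispatch.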
I think there are two different interfaces, one for creating a model and another one for using a trained model…
The interface does not look so complicated: rknpu2/rknn_api.h at master · rockchip-linux/rknpu2 · GitHub But they should update their copyright info…
I didn’t read that library after I found it; looking at it now, it doesn’t actually seem to expose any fundamental operations, just the ability to load a model and provide input. Perhaps it’s not as useful as I thought!
Well, this is a chip for embedded systems (and laptops). You do not train a model on such a machine; you just run pre-trained models… I think it’s the same with similar parts from NVIDIA or Google…
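An inference-only wrapper could still be worthwhile. A rough sketch of what it might look like, using the function names that appear in rknn_api.h (`rknn_init`, `rknn_run`) but with simplified argument lists — the real header defines extra parameters and structs (`rknn_input`, `rknn_output`) that are omitted here, so treat this as an outline, not the actual API:

```julia
# Hedged sketch of an inference-only rknpu2 wrapper: load a pre-trained
# .rknn model blob into a context, then trigger inference on it.
const librknnrt = "librknnrt"           # assumed runtime library name
const RKNNContext = Ptr{Cvoid}          # opaque context handle

function load_model(path::AbstractString)
    model = read(path)                  # pre-compiled .rknn model file
    ctx = Ref{RKNNContext}(C_NULL)
    # Simplified signature; rknn_api.h declares additional arguments.
    ret = @ccall librknnrt.rknn_init(ctx::Ref{RKNNContext},
                                     model::Ptr{UInt8},
                                     length(model)::UInt32,
                                     0::UInt32)::Cint
    ret == 0 || error("rknn_init failed with code $ret")
    return ctx[]
end

function run_inference!(ctx::RKNNContext)
    # In the real API, inputs are set with rknn_inputs_set and results
    # retrieved with rknn_outputs_get before/after this call.
    ret = @ccall librknnrt.rknn_run(ctx::RKNNContext, C_NULL::Ptr{Cvoid})::Cint
    ret == 0 || error("rknn_run failed with code $ret")
end
```

So the Julia-side interface would be "load model, feed input, read output" rather than array operations — closer to an ONNX-runtime-style wrapper than to a GPU array type.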
For example: USB Accelerator | Coral should offer similar functionality…
That should be straightforward to wrap using CBinding.jl.
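Roughly like this — the include and library paths are placeholders for wherever the rknpu2 release is unpacked, and the exact compiler-flag syntax should be checked against the CBinding.jl README:

```julia
# Hypothetical sketch: auto-generate bindings from the vendor header
# with CBinding.jl. Paths below are placeholders.
module LibRKNN
    using CBinding

    # Point CBinding at the rknpu2 headers and runtime library.
    c`-I/path/to/rknpu2/include -L/path/to/rknpu2/lib -lrknnrt`

    c"""
        #include <rknn_api.h>
    """
end
```

After that, the types and functions from rknn_api.h should be callable from Julia without hand-writing each `ccall`.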