Help to select a Raspberry Pi for embedding a Julia application

(v1.2) pkg> add OrdinaryDiffEq
 Resolving package versions...
 Installed Adapt ────────────────── v1.0.1
 Installed NaNMath ──────────────── v0.3.3
 Installed UnPack ───────────────── v0.1.0
 Installed GenericSVD ───────────── v0.2.2
 Installed FiniteDiff ───────────── v2.3.1
 Installed CpuId ────────────────── v0.2.2
 Installed ExponentialUtilities ─── v1.6.0
 Installed OrdinaryDiffEq ───────── v5.29.0
 Installed DiffResults ──────────── v1.0.2
 Installed Inflate ──────────────── v0.1.2
 Installed VertexSafeGraphs ─────── v0.1.2
 Installed ArnoldiMethod ────────── v0.0.4
 Installed SIMDPirates ──────────── v0.7.25
 Installed OffsetArrays ─────────── v1.0.4
 Installed RecursiveFactorization ─ v0.1.1
 Installed LightGraphs ──────────── v1.3.3
 Installed LoopVectorization ────── v0.7.8
 Installed ForwardDiff ──────────── v0.10.10
 Installed SimpleTraits ─────────── v0.9.2
 Installed SLEEFPirates ─────────── v0.4.8
 Installed SparseDiffTools ──────── v1.8.0
 Installed VectorizationBase ────── v0.11.5
  Updating `~/.julia/environments/v1.2/Project.toml`
  [1dea7af3] + OrdinaryDiffEq v5.29.0
  Updating `~/.julia/environments/v1.2/Manifest.toml`
  [79e6a3ab] + Adapt v1.0.1
  [ec485272] + ArnoldiMethod v0.0.4
  [bbf7d656] + CommonSubexpressions v0.2.0
  [adafc99b] + CpuId v0.2.2
  [163ba53b] + DiffResults v1.0.2
  [b552c78f] + DiffRules v0.0.10
  [d4d017d3] + ExponentialUtilities v1.6.0
  [6a86dc24] + FiniteDiff v2.3.1
  [f6369f11] + ForwardDiff v0.10.10
  [01680d73] + GenericSVD v0.2.2
  [d25df0c9] + Inflate v0.1.2
  [093fc24a] + LightGraphs v1.3.3
  [bdcacae8] + LoopVectorization v0.7.8
  [77ba4419] + NaNMath v0.3.3
  [6fe1bfb0] + OffsetArrays v1.0.4
  [1dea7af3] + OrdinaryDiffEq v5.29.0
  [f2c3362d] ↑ RecursiveFactorization v0.1.0 ⇒ v0.1.1
  [21efa798] + SIMDPirates v0.7.25
  [476501e8] + SLEEFPirates v0.4.8
  [699a6c99] + SimpleTraits v0.9.2
  [47a9eef4] + SparseDiffTools v1.8.0
  [3a884ed6] ↓ UnPack v1.0.1 ⇒ v0.1.0
  [3d5dd08c] + VectorizationBase v0.11.5
  [19fa3120] + VertexSafeGraphs v0.1.2
  Building VectorizationBase → `~/.julia/packages/VectorizationBase/WoChf/deps/build.log`
┌ Error: Error building `VectorizationBase`: 
│ error: couldn't allocate output register for constraint '{ax}'
│ ERROR: LoadError: Failed to precompile CpuId [adafc99b-e345-5852-983c-f28acb93d879] to /home/pi/.julia/compiled/v1.2/CpuId/vMZBF.ji.
│ Stacktrace:
│  [1] error(::String) at ./error.jl:33
│  [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1253
│  [3] _require(::Base.PkgId) at ./loading.jl:1013
│  [4] require(::Base.PkgId) at ./loading.jl:911
│  [5] require(::Module, ::Symbol) at ./loading.jl:906
│  [6] include at ./boot.jl:328 [inlined]
│  [7] include_relative(::Module, ::String) at ./loading.jl:1094
│  [8] include(::Module, ::String) at ./Base.jl:31
│  [9] include(::String) at ./client.jl:431
│  [10] top-level scope at none:5
│ in expression starting at /home/pi/.julia/packages/VectorizationBase/WoChf/deps/build.jl:1
└ @ Pkg.Operations /buildworker/worker/package_linuxarmv7l/build/usr/share/julia/stdlib/v1.2/Pkg/src/backwards_compatible_isolation.jl:647
  Building SLEEFPirates ─────→ `~/.julia/packages/SLEEFPirates/pJY4j/deps/build.log`
┌ Error: Error building `SLEEFPirates`: 
│ ERROR: LoadError: could not open file /home/pi/.julia/packages/VectorizationBase/WoChf/src/cpu_info.jl
│ Stacktrace:
│  [1] include at ./boot.jl:328 [inlined]
│  [2] include_relative(::Module, ::String) at ./loading.jl:1094
│  [3] include at ./Base.jl:31 [inlined]
│  [4] include(::String) at /home/pi/.julia/packages/VectorizationBase/WoChf/src/VectorizationBase.jl:1
│  [5] top-level scope at /home/pi/.julia/packages/VectorizationBase/WoChf/src/VectorizationBase.jl:222
│  [6] include at ./boot.jl:328 [inlined]
│  [7] include_relative(::Module, ::String) at ./loading.jl:1094
│  [8] include(::Module, ::String) at ./Base.jl:31
│  [9] top-level scope at none:2
│  [10] eval at ./boot.jl:330 [inlined]
│  [11] eval(::Expr) at ./client.jl:432
│  [12] top-level scope at ./none:3
│ in expression starting at /home/pi/.julia/packages/VectorizationBase/WoChf/src/VectorizationBase.jl:222
│ ERROR: LoadError: Failed to precompile VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f] to /home/pi/.julia/compiled/v1.2/VectorizationBase/Dto5m.ji.
│ Stacktrace:
│  [1] error(::String) at ./error.jl:33
│  [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1253
│  [3] _require(::Base.PkgId) at ./loading.jl:1013
│  [4] require(::Base.PkgId) at ./loading.jl:911
│  [5] require(::Module, ::Symbol) at ./loading.jl:906
│  [6] include at ./boot.jl:328 [inlined]
│  [7] include_relative(::Module, ::String) at ./loading.jl:1094
│  [8] include(::Module, ::String) at ./Base.jl:31
│  [9] include(::String) at ./client.jl:431
│  [10] top-level scope at none:5
│ in expression starting at /home/pi/.julia/packages/SLEEFPirates/pJY4j/deps/build.jl:1
└ @ Pkg.Operations /buildworker/worker/package_linuxarmv7l/build/usr/share/julia/stdlib/v1.2/Pkg/src/backwards_compatible_isolation.jl:647

I think VectorizationBase is a better place for the issue than CpuId.jl:

When VectorizationBase builds, it’ll create a cpu_info.jl file, e.g.:

const REGISTER_SIZE = 64
const REGISTER_COUNT = 32
const REGISTER_CAPACITY = 2048
const FP256 = true # Is AVX2 fast?
const CACHELINE_SIZE = 64
const CACHE_SIZE = (32768, 1048576, 14417920)
const NUM_CORES = 10
const FMA3 = true
const AVX2 = true
const AVX512F = true
const AVX512ER = false
const AVX512PF = false
const AVX512VL = true
const AVX512BW = true
const AVX512DQ = true
const AVX512CD = true

To support ARM, I would need to:

  1. Detect that it is ARM before using CpuId, to run an ARM-specific build script instead
  2. Have an ARM-specific build script that creates a file filled with the above constants (a sketch is below).
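
A minimal sketch of what that ARM-specific build script (step 2) might look like. This is my guess, not the actual VectorizationBase build code; the NEON numbers (128-bit registers, 32 of them on AArch64, 16 Q registers on 32-bit ARM) and the build_x86.jl file name are assumptions:

# Hypothetical deps/build.jl branch: write cpu_info.jl without touching CpuId on ARM.
if Sys.ARCH === :aarch64 || Sys.ARCH === :arm
    register_size  = 16                                # NEON/ASIMD registers are 128 bits
    register_count = Sys.ARCH === :aarch64 ? 32 : 16   # V0-V31 on AArch64, Q0-Q15 on ARMv7
    open(joinpath(@__DIR__, "..", "src", "cpu_info.jl"), "w") do io
        println(io, "const REGISTER_SIZE = ", register_size)
        println(io, "const REGISTER_COUNT = ", register_count)
        for flag in (:FMA3, :AVX2, :AVX512F)           # x86-only features: just set them false
            println(io, "const ", flag, " = false")
        end
    end
else
    include("build_x86.jl")                            # hypothetical name for the current CpuId-based path
end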

Most can just be set to false. E.g., it won’t have FMA3, so I can’t use an asm call for vfmadd231 and will fall back to @llvm.muladd instead (which generates suboptimal code compared to the asm call in some situations).
AVX2, for example, is currently used to ask whether the CPU supports SIMD integer operations (which on x86 means “has the AVX2 instruction set”). Without it, VectorizationBase.pick_vector_width(::Type{<:Integer}) will return 1.

But these two are absolutely essential to get right:

const REGISTER_SIZE = 64
const REGISTER_COUNT = 32

So I would need a way to query this information on an ARM CPU.

I think I have access to the Travis ARM CPU thanks to Viral and StaticFloat. So if anyone knows how I can query/lookup those values, I should be able to test.
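
For what it’s worth, one low-tech way to look this up on a Linux ARM host is the Features line of /proc/cpuinfo. This is just a generic procfs sketch I’d try, not something any package currently does: NEON shows up as `neon` on 32-bit ARM and as `asimd` on AArch64, and both are 128-bit SIMD, which would suggest REGISTER_SIZE = 16 and (on AArch64) REGISTER_COUNT = 32.

# Sketch: read the SIMD feature flags from /proc/cpuinfo on a Linux ARM host.
function arm_simd_features()
    isfile("/proc/cpuinfo") || return String[]
    for line in eachline("/proc/cpuinfo")
        if startswith(line, "Features")
            return String.(split(last(split(line, ':'))))   # tokens after "Features :"
        end
    end
    return String[]
end

feats = arm_simd_features()
has_neon = "neon" in feats || "asimd" in feats   # "asimd" is AArch64's name for NEON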

Are these really only failures to precompile and not hard failures?

using appeared to work anyway:

julia> @time using OrdinaryDiffEq
[ Info: Precompiling OrdinaryDiffEq [1dea7af3-3e70-54e6-95c3-0bf5283fa5ed]
804.584271 seconds (3.50 M allocations: 124.714 MiB, 2.17% gc time)

julia>

holy pi


Gonna try using DifferentialEquations now.

Where is RecursiveFactorization.jl used? The stiff solvers?
Perhaps for now it could add a check that skips @avx when the host is not x86, since @avx will error if building VectorizationBase failed.
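
Not what LoopVectorization or RecursiveFactorization actually do today, but roughly the kind of guard I have in mind (a hand-written sketch; the fallback macro and the Sys.ARCH test are my own):

# Sketch: only pull in @avx on x86, where VectorizationBase builds; elsewhere make it a
# harmless alias so the same kernel code still runs.
@static if Sys.ARCH === :x86_64 || Sys.ARCH === :i686
    using LoopVectorization: @avx
else
    macro avx(loop)
        esc(:(@inbounds @simd $loop))    # plain fallback, no SIMD-width tricks
    end
end

function scaled_copy!(y, x, a)
    @avx for i in eachindex(x)
        y[i] = a * x[i]
    end
    return y
end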

Yes, the implicit and semi-implicit methods.

Wasn’t Julia created to eliminate the two-language problem (Python/C, MATLAB/Fortran)? Interested in finding out if your experiment is viable, @Ronis_BR.


I didn’t actually do the tests. I watched videos like this one:

https://www.youtube.com/watch?v=zF6vyZlw0Bc

Qualitative observations show a large chunk of aluminum with lots of surface area getting quite warm.
I’m confident the heatsink is doing the job, and simultaneously functioning as a protective case for the device.

The 2GB version is $35US. It works as an adequate desktop with a reasonably nice GUI. The power consumption is a downside, but other than that I find it to be a good device at a very fair price.


Indeed! I am really interested in this proof of concept. Even if it currently requires a much more powerful computer, it can prove that Julia can eventually become the language to rule them all :smiley:

In my case, I will install a text-only OS to reduce RAM usage as much as possible.

SSH + VIM + Tmux + julia-vim. You will have a fine system!


If you thought 800 seconds was a long time to wait …

julia> @time using DifferentialEquations
[ Info: Precompiling DifferentialEquations [0c46a032-eb83-5123-abaf-570d42b7fbaa]
1220.191766 seconds (30.76 M allocations: 1.088 GiB, 1.11% gc time)

julia> 

I recall this not finishing after an hour or more when I tried it many months ago; I’m not sure if I did something differently. This is still not really usable, but at least it finished, on a 1 GB device. I think we’re mainly bound by the CPU here (which stayed cool, as I have a fan on it).


Awesome to have a baseline. We’ll see what happens after the inference fix.

julia> f(u,p,t) = 1.01*u
f (generic function with 1 method)

julia> u0 = 1/2
0.5

julia> tspan = (0.0, 1.0)
(0.0, 1.0)

julia> prob = ODEProblem(f,u0,tspan)
ODEProblem with uType Float64 and tType Float64. In-place: false
timespan: (0.0, 1.0)
u0: 0.5

julia> sol = @time solve(prob, Tsit5(), reltol=1e-8, abstol=1e-8)
 40.663294 seconds (11.00 M allocations: 420.062 MiB, 9.49% gc time)
retcode: Success
Interpolation: specialized 4th order "free" interpolation
t: 17-element Array{Float64,1}:
 0.0                 
 0.012407826196308189
 0.04250125658161484 
 0.08178046092620397 
 0.12887379439591745 
 0.18409790041494495 
 0.24627449404376492 
 0.3147928829168652  
 0.38859624030646006 
 0.46686165530000767 
 0.5487159959104151  
 0.6334345501790717  
 0.7203628343994752  
 0.8089578125953629  
 0.8987653123338385  
 0.9894159840028138  
 1.0                 
u: 17-element Array{Float64,1}:
 0.5               
 0.5063053789114713
 0.5219304636285521
 0.5430526974619144
 0.5695067474049924
 0.6021743238204087
 0.6412025113764279
 0.687147458356146 
 0.7403257567387032
 0.8012222468290549
 0.8702767411264873
 0.9480213225441934
 1.0350184806191094
 1.131902913018661 
 1.239373221095387 
 1.3582036259485553
 1.3728005076225749

Well, the computational workload of the embedded algorithm will be orders of magnitude lower than solving this ODE.


plus mosh if you want to ssh to this thing when it’s in space. :slight_smile:


For his application, i.e. launching into space: once you’ve gotten there a fan will not be effective, and I believe a heatsink also relies on air flow. Even if they did work, the latter is heavier, and every gram launched into space costs about $54 at SpaceX prices. So, e.g., “Weight (including one Raspberry Pi 4): 54 g” costs you $2,916, and a typical desktop heatsink costs much more: the “Arctic Cooling Freezer 13 Pro with weight of 1.05 kg” would set you back $56,700 (excluding the cost of the cooler itself, the CPU, motherboard, etc.).
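
The back-of-envelope numbers above, for reference (the ~$54/g figure is the assumption):

launch_cost(mass_g; usd_per_gram = 54) = mass_g * usd_per_gram

launch_cost(54)     # Pi 4 + 54 g heatsink case   -> 2916 (USD)
launch_cost(1050)   # 1.05 kg desktop tower cooler -> 56700 (USD)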

The good thing is that space is cold, so you may need neither(?), assuming you get there still in a working state. On something like the ISS you have air pressure for the humans; I’m just not sure what’s done for equipment in general.

He’s doing: Spacecraft attitude control - Wikipedia


I know the pain of slow compile times on aarch64 very well.

@staticfloat and I did a pretty deep dive into trying to figure out why the recent Nvidia aarch64 chips were so much slower to compile, when their CPU and disk I/O benchmark results from SystemBenchmark.jl were competitive with modern x86 systems. We concluded that the most likely reason was much slower L1/L2/L3 cache speeds, even though RAM-level bandwidth was competitive. You can test that with the memory-bandwidth tests in SystemBenchmark.jl and note how much slower they are than the reference; in my experience the slowdown factor of the compilecache test was about the same as that of the smaller memory-bandwidth tests.
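
If anyone wants to compare their Pi the same way, the invocation is roughly this (going from memory of SystemBenchmark.jl’s README, so treat the function names as approximate):

julia> using SystemBenchmark

julia> res = runbenchmark();      # CPU, memory-bandwidth, disk I/O and compilecache tests

julia> comparetoref(res)          # each result as a ratio against the reference machine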

I agree that invalidation squashing will likely help hugely here.

The good thing is that once you’re past compilation, these new aarch64 chips do seem competitive.


FYI: a Pi has even gone into space: http://spaceref.com/nasa-hack-space/nasas-raspberry-pi-based-pi-sat-cubesat.html

[…] train new researchers and students as well as to launch novel payloads aboard CubeSats, including a smartphone. The team is proud of the enthusiastic coverage that the BBC and New Scientist magazine gave to the world’s first ‘phone-sat’!

This was all educational, and while I’m not sure how far these went, I very much doubt I would use one for anything critical, at least not for navigation, even with this possibility:

One way of making sure that a Raspberry Pi can operate reliably in space is through redundancy: if multiple Raspberry Pis are used, then if one of them should fail, another can take over (the same system used on the space shuttle).

For some chips that should do, and Julia supports them(?), or would with minor changes:

If you’re building a cubesat, great, just grab a microcontroller off the shelf, you probably don’t need to worry about radiation hardening. If you’re building an experiment for the ISS, just use any old microcontroller. Deep space? That’s a little harder, and you might need to look into radiation tolerant and radiation hardened microcontrollers. Microchip has just announced the release of two micros that meet this spec, in both radiation-tolerant and radiation-hardened varieties.

The new devices are the SAMV71Q21RT (radiation-tolerant) and the SAMRH71 (rad-hard), both ARM Cortex-M7 chips running at around 300 MHz with enough RAM to do pretty much anything you would want to do with a microcontroller.

Note also the comments there, e.g.:

I run an experimental fusor – which makes a lot of neutrons, and a pretty good amount of EMI. […]
While I don’t use rad-hard stuff in the data acquisition […]
A raspberry pi (older models) with external storage rather than SD card…pretty decent, generally killed by some EMP that makes it past all the faraday-cage type shielding, at least so far. Arduino Unos seem to be everything-proof in this environment […]
For those who didn’t think of it – there are some differences between cosmic rays and a neutron flux. The main one is […]

That article also mentions (and links to the piece below) that “it absolutely is possible to build a rad-hard Arduino Mega”, but note that Julia doesn’t support those non-ARM chips; and while some later Arduinos are ARM-based, they would also be very hard to support because of memory limitations:

For every problem, imagined or not, there’s a solution. Now, finally, Atmel has released a rad tolerant AVR for space applications. It’s the ATmegaS128, the space-grade version of the ‘mega128. This chip is in a 64-lead ceramic package, has all the features you would expect from the ATmega128 and is, like any ‘mega128, Arduino compatible.

Atmel has an oddly large space-rated rad-hard portfolio, with space-grade FPGAs, memories, communications ICs, ASICs, memories, and now microcontrollers in their lineup.

While microcontrollers that aren’t radiation tolerant have gone up in cubesats and larger commercial birds over the years, the commercial-grade stuff is usually reserved for low Earth orbit stuff. For venturing more than a few hundred miles above the Earth, into the range of GPS satellites and to geosynchronous orbit 25,000 miles above, radiation shielding is needed.
