[ANN] Introducing BaremetalPi.jl - A package to access Raspberry Pi peripherals without external libraries

I’m comparing this to using MicroPython on pyboards. On those I can do a bunch of work (e.g. a PID loop calculation and an update of PWM or GPIO states) in a few hundred microseconds to a millisecond. I don’t know how much overhead pipes involve as opposed to using ccall to hit a .so file, but the latter is fast.

So in GPIOZero, the most popular user-friendly library for using GPIO pins in Python, the pins are driven through different backends (pin libraries) such as PiGPIO or RPi.GPIO. If it doesn’t find either of those, it falls back to a pure Python interface, which does pretty much what you’ve done here.

The exception is software PWM, which needs the CPU to toggle the pin at the PWM frequency. This will typically need a native thread. Hence, GPIOZero does not contain a pure Python software PWM implementation; it always tries to use the pin libraries. If it can’t find a backend, the pure Python fallback will not support software PWM.

Here is the C implementation of software PWM in RPi.GPIO: https://sourceforge.net/p/raspberry-gpio-python/code/ci/default/tree/source/soft_pwm.c [MIT license]. There is another C implementation in PiGPIO.

Julia does have native threads now, but using them performantly in this fashion may not be trivial. It’s worth trying, but using a small C library for this might be easier?

Hardware PWM can be supported without threads, but that is available on only one pin on the pi, afaik.

[Thanks to Ben, the author of GPIOZero for helping me figure this out]

Can you please tell me which PWM-related functionality you use, so I can focus on it initially?

Apart from the trivial case of changing an LED’s brightness, it’s useful to use PWM to control the speed of a motor through a motor controller (such as the one on the Explorer HAT): https://github.com/JuliaBerry/JuliaBerry.jl/blob/master/src/JuliaBerry.jl#L96

I think I will support sysfs first so that we can use the hardware PWM. After that, let’s see if we can do a software PWM using Julia :slight_smile: However, with the latency I see, I do not think it will be possible to do anything good at high frequency without a lot of jitter.
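
For reference, the Linux sysfs PWM interface is driven by writing nanosecond values to files under /sys/class/pwm. Below is a minimal sketch of what that could look like from Julia, assuming the PWM device-tree overlay is enabled and the channel appears under pwmchip0. The names `pwm_timing_ns` and `sysfs_pwm_start` are hypothetical and are not BaremetalPi.jl functions.

```julia
# Convert a frequency (Hz) and a duty cycle (0..1) into the nanosecond
# values that the sysfs `period` and `duty_cycle` files expect.
function pwm_timing_ns(freq_hz::Real, duty::Real)
    0 <= duty <= 1 || throw(ArgumentError("duty must be between 0 and 1"))
    period_ns = round(Int, 1e9 / freq_hz)
    duty_ns = round(Int, period_ns * duty)
    return period_ns, duty_ns
end

# Export the channel if needed, then set period, duty cycle and enable it.
# `chip` defaults to pwmchip0, which is an assumption about the target board.
function sysfs_pwm_start(channel::Integer, freq_hz::Real, duty::Real;
                         chip::AbstractString = "/sys/class/pwm/pwmchip0")
    period_ns, duty_ns = pwm_timing_ns(freq_hz, duty)
    chan = joinpath(chip, "pwm$channel")
    isdir(chan) || write(joinpath(chip, "export"), string(channel))
    write(joinpath(chan, "period"), string(period_ns))
    write(joinpath(chan, "duty_cycle"), string(duty_ns))
    write(joinpath(chan, "enable"), "1")
    return period_ns, duty_ns
end
```

Something like `sysfs_pwm_start(0, 23_000, 0.5)` would then request a 23 kHz signal at 50% duty on channel 0. Writing these files usually requires root or suitable udev permissions.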

What counts as “high frequency” for you? For most PWM uses, ~10-20 kHz seems like plenty. Above we have:

A Julia thread that bit-bangs out PWM pulses at 0.1 ms intervals would be pretty simple to write and would probably be quite reliable, particularly if it used the nanosleep function via ccall for timing.

Hmm, I saw latencies on the order of a millisecond. Thus, I think that software PWM at frequencies higher than 1 kHz will have a lot of jitter, but we will see :slight_smile:
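
To put numbers on that, here is a small sketch that expresses a scheduling delay as a multiple of the PWM period. The helper names are made up for illustration; the 1 ms delay is the latency figure quoted above.

```julia
# Length of one PWM period in microseconds
period_us(freq_hz) = 1e6 / freq_hz

# A scheduling delay expressed as a number of PWM periods
delay_in_periods(delay_us, freq_hz) = delay_us / period_us(freq_hz)

for f in (120, 1_000, 10_000)
    println(f, " Hz: period ", period_us(f), " us, a 1 ms delay spans ",
            round(delay_in_periods(1_000, f), digits = 2), " periods")
end
```

At 1 kHz a 1 ms delay is already a whole period, which is why frequencies above that look problematic with this latency.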

If the GC pauses all threads, then it can take tens of milliseconds, so that’s a big problem. I actually was thinking about this when I posted this related question.

I wouldn’t want to bit-bang a PWM, I just did that as an exercise to see how fast things are.

I would describe typical PWMs as follows:
Motor and solenoid PWMs which use the inherent inductance of the device usually are just above audible, like 23 kHz.
LEDs are commonly done at rates like 120 Hz, but faster is becoming more common for video compatibility and to make it invisible even in fast saccades (you can see 120 Hz LEDs once you know how).
One can also use PWMs to make DACs, in which much higher frequencies are desirable to make filtering easier.

In short, I think you’d want to expose as much of the underlying hardware capability as possible, and simplifying methods for specific purposes could be layered on top.

My understanding is that there are a few hardware controlled PWMs but if you want more than those few, you have to software bit-bang them. But even for motor or solenoid at 23kHz this should be no problem in a threaded implementation. You’d just spawn off a thread that does bit-banging for you, and then send it messages about when you’d like it to change the duty cycle.
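
A rough sketch of that idea, assuming a busy-wait loop and an atomic variable as the "message" carrying the duty cycle. `set_pin` is a hypothetical callback that drives the GPIO, and `pwm_cycles!` and `spin_until` are illustrative names, not an existing API.

```julia
using Base.Threads: Atomic

# Busy-wait until the monotonic clock reaches `deadline_ns`
function spin_until(deadline_ns::UInt64)
    while time_ns() < deadline_ns
    end
end

# Run `ncycles` PWM periods. `set_pin(true)` drives the pin high and
# `set_pin(false)` drives it low; `duty` may be changed by another task
# at any time and is re-read at the start of every period.
function pwm_cycles!(set_pin, duty::Atomic{Float64},
                     period_ns::Integer, ncycles::Integer)
    period = UInt64(period_ns)
    start = time_ns()
    for _ in 1:ncycles
        d = clamp(duty[], 0.0, 1.0)       # latest requested duty cycle
        high = round(UInt64, d * period)  # time to stay high, in ns
        set_pin(true)
        spin_until(start + high)
        set_pin(false)
        start += period                   # absolute schedule, so no drift
        spin_until(start)
    end
end
```

With Julia started with at least two threads, this could be launched with `Threads.@spawn` and the duty cycle changed from the main task with `duty[] = 0.3`. The loop burns a full core while it runs, and this sketch does nothing about GC pauses or kernel preemption.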

https://www.raspberrypi.org/documentation/usage/gpio/

Hardware PWM is available on GPIO pins 12, 13, 18 and 19.

I am really not certain that we will be able to do this inside a thread, inside Julia, without huge jitter. If we had multiple cores and could dedicate one of them to this task, I would be more optimistic. However, given the jitter I saw, I am not sure this can be achieved with a pure Julia implementation. But let’s see :slight_smile:

Based on the results of that discussion over on the other thread on GC, it seems that an interface for software PWM might be to spawn a very simple C thread that reads some julia globals and constantly bit-bangs the appropriate values to the pins, calling nanosleep and doing some timing adjustments based on the OS clock. It should be fine for up to tens of kilohertz I would think. The thread could take real-time scheduling. I think the jitter should be handled well, even on the RPi zero.
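
The "timing adjustments based on the OS clock" part can be illustrated by sleeping toward absolute deadlines instead of sleeping a fixed amount each iteration. This sketch is in Julia rather than C, and `run_periodic` and its keyword arguments are hypothetical names chosen for illustration.

```julia
# Call `tick(i)` once per period, sleeping toward absolute deadlines so that
# the time spent in `tick` and in waking up does not accumulate as drift.
# `now` and `sleep_ns` are injectable so the logic can be tried with a fake clock.
function run_periodic(tick, period_ns::Integer, nticks::Integer;
                      now = time_ns,
                      sleep_ns = ns -> Libc.systemsleep(ns / 1e9))
    deadline = Int64(now()) + period_ns
    for i in 1:nticks
        remaining = deadline - Int64(now())
        remaining > 0 && sleep_ns(remaining)  # skip the sleep if already late
        tick(i)
        deadline += period_ns
    end
end
```

Because each deadline is derived from the previous deadline rather than from the wake-up time, a late wake-up shortens the next sleep instead of shifting every later edge.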

I would support hardware PWMs first. Software PWMs are a bad idea, IMHO.

I did a small test to verify the possibility using @async.

# usleep takes an unsigned useconds_t argument, hence Cuint rather than Cint
usleep(t::Int) = ccall(:usleep, Cint, (Cuint,), t)

function pwm_test()
    @sync begin

        @async begin
            t0 = time()
            t1 = time()
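            max = 0.0      # Float64 from the start, matching the measured intervals
            min = Inf
            mean = 0.0     # running sum, divided by the sample count when printed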
            for i = 1:200_000
                usleep(100)
                t0 = t1
                t1 = time()
                d  = (t1-t0)*1e6
                mean += d
                max < d && (max = d)
                min > d && (min = d)
            end
            println("Mean: $(mean/200_000)")
            println("Max:  $max")
            println("Min:  $min")
        end

        @async begin
            a = 0
            for i = 1:25
                a = a+1
                sleep(1)
            end
        end
    end
end

This should mimic the loop needed to drive a 10 kHz PWM. What I got (in microseconds) was:

Mean: 150.77258467674255
Max:  303.9836883544922
Min:  127.7923583984375
Task (done) @0xac2e8eb0

The mean time should be very close to 100 µs. Thus, it will not be easy to maintain a software PWM at frequencies like 10 kHz; the jitter will be very bad. I ran this example in a process with the highest possible priority, on a kernel with the PREEMPT_RT patch.

Is this on an rpi zero? I’d be shocked if you couldn’t do better on a 3b or 4

Yes, this is a Pi Zero W.

What is JULIA_NUM_THREADS in this case? It must be at least 2 or there is no real asynchronousness.

I used 2, but the same result is obtained with only the measurement thread.

The mean value is 107 µs on x86 with SCHED_RR and priority 5, with 4 threads, on a stock Debian kernel (tickless, high-res timers, voluntary preempt). So it’s probably much more doable on the RPi 3 or 4.

Mean: 107.01049447059631
Max:  271.08192443847656
Min:  100.85105895996094
Task (done) @0x00007f555efde4a0

But the max value is still crazy high. I’m guessing it’s not from GC because nothing is allocating in the loops. I imagine it’s the kernel doing something to pause the thread due to hardware interrupts or something.
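
For anyone wanting to reproduce the SCHED_RR setup from inside Julia rather than with an external tool, here is a minimal sketch. It assumes Linux with glibc, where `struct sched_param` holds a single int, and it needs root or CAP_SYS_NICE to succeed. `SchedParam` and `set_round_robin` are hypothetical names.

```julia
const SCHED_RR = Cint(2)   # policy constant from the Linux <sched.h> headers

# Mirrors C's `struct sched_param` as laid out by glibc (an assumption)
struct SchedParam
    sched_priority::Cint
end

# Ask the kernel for round-robin real-time scheduling (pid 0 = the caller).
# Returns true on success; it typically fails without root or CAP_SYS_NICE.
function set_round_robin(priority::Integer)
    param = Ref(SchedParam(priority))
    rc = ccall(:sched_setscheduler, Cint, (Cint, Cint, Ref{SchedParam}),
               0, SCHED_RR, param)
    return rc == 0
end
```

Calling `set_round_robin(5)` before the timing loop would correspond to the priority used above. On Linux the call affects only the calling thread, so other Julia threads keep their normal policy.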

It’d be interesting to see more of the statistical distribution of timings. I’ll maybe pre-allocate a big vector and fill it with the timings… tomorrow, gotta get the kids to bed now.


using Queryverse, Statistics

# usleep takes an unsigned useconds_t argument, hence Cuint rather than Cint
usleep(t::Int) = ccall(:usleep, Cint, (Cuint,), t)

function pwm_test()
    # Pre-allocate so that the timing loop itself does not allocate
    times = zeros(200_000)
    @sync begin
        # Measurement task: record the interval between consecutive wake-ups
        @async begin
            t1 = time()
            for i = 1:200_000
                usleep(100)
                t0 = t1
                t1 = time()
                times[i] = (t1 - t0) * 1e6   # interval in microseconds
            end
        end

        # Background task that yields to the scheduler once per second
        @async begin
            a = 0
            for i = 1:25
                a = a+1
                sleep(1)
            end
        end
    end
    display(DataFrame(dt=times) |> @vlplot(mark="line", transform=[{density = :dt, bw=5}],encoding = {x={field="value"}, y={field="density"}}))
    # Pass the probabilities explicitly so the printed values match the label
    println("Quantiles 0.01,.1,.25,.5,.75,.9,.99 are $(Statistics.quantile(times, [0.01, .1, .25, .5, .75, .9, .99]))")
end

pwm_test()

Running on the regular scheduler, it outputs:
Quantiles 0.01,.1,.25,.5,.75,.9,.99 are [103.95050048828125, 156.16416931152344, 159.0251922607422, 162.1246337890625, 1034.0213775634766]

Wow! The outlier is quite impressive, but the consistency isn’t that bad.

I suspect that https://www.man7.org/linux/man-pages/man2/nanosleep.2.html is better, but I’m not proficient yet in figuring out how to call C functions with structs like that.

If you want to use nanosleep, do this:

# C's timespec holds a time_t and a long; both are signed, so Clong on typical Linux targets
mutable struct struct_timespec
    tv_sec::Clong
    tv_nsec::Clong
end

@inline function nanosleep(req::struct_timespec)
    ccall(:nanosleep, Cint, (Ref{struct_timespec}, Ptr{Cvoid}), req, C_NULL)
end

julia> req = struct_timespec(0,100_000)

julia> nanosleep(req)

However, in my first test, it did not change much.

Thanks for the nanosleep code! Here is code that uses about 15% of one core and gives me very reliable timings when using SCHED_RR and priority 5 on my x86 desktop machine.



function pwm_test()
    times = zeros(200000)
    @sync begin
        
        
        @async begin
            t0 = time()
            t1 = time()

            sltime = struct_timespec(0,20_000);
            
            for i = 1:200_000
                t0 = time()
                while (time() - t0 < .00008)
                    nanosleep(sltime)
                end
                t1 = time()
                d  = (t1-t0)*1e6
                times[i] = d;
            end
            display(DataFrame(dt=times) |> @vlplot(mark="line", transform=[{density = :dt, bw=5}],encoding = {x={field="value"}, y={field="density"}}))
            q = [0.01,.1,.25,.5,.75,.9,.99];
            println("Quantiles $q are $(Statistics.quantile(times,q))")

        end

        @async begin
            a = 0
            for i = 1:25
                a = a+1
                sleep(1)
            end
        end
    end
end

Output:

Quantiles [0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99] are [82.0159912109375, 89.88380432128906, 97.03636169433594, 97.99003601074219, 98.94371032714844, 99.89738464355469, 104.9041748046875]

image

I imagine the pi zero might struggle a little, but a Pi 3 or 4 would do well here.
