[ANN] Introducing BaremetalPi.jl - A package to access Raspberry Pi peripherals without external libraries

I think I will support sysfs first so that we can use the hardware PWM. After that, let’s see if we can do a software PWM using Julia :slight_smile: However, with the latency I see, I do not think it will be possible to do anything good at high frequency without a lot of jitter.

What counts as “high frequency” for you? For most PWM uses it seems like ~10-20 kHz would be plenty. Above we have:

A Julia thread that bit-bangs out PWM pulses at 0.1 ms intervals would be pretty simple to write and would probably be quite reliable. Particularly if using the nanosleep function via a ccall for timing?

Hmm, I saw latencies on the order of milliseconds. Thus, I think that software PWM with frequencies higher than 1 kHz will have a lot of jitter, but we will see :slight_smile:

If the GC pauses all threads, then it can take tens of milliseconds, so that’s a big problem. I actually was thinking about this when I posted this related question.

I wouldn’t want to bit-bang a PWM, I just did that as an exercise to see how fast things are.

I would describe typical PWMs as follows:
Motor and solenoid PWMs which use the inherent inductance of the device usually are just above audible, like 23 kHz.
LEDs are commonly done at rates like 120 Hz, but faster is becoming more common for video compatibility and to make it invisible even in fast saccades (you can see 120 Hz LEDs once you know how).
One can also use PWMs to make DACs, in which much higher frequencies are desirable to make filtering easier.

In short, I think you’d want to expose as much of the underlying hardware capability as possible, and simplifying methods for specific purposes could be layered on top.

My understanding is that there are a few hardware controlled PWMs but if you want more than those few, you have to software bit-bang them. But even for motor or solenoid at 23kHz this should be no problem in a threaded implementation. You’d just spawn off a thread that does bit-banging for you, and then send it messages about when you’d like it to change the duty cycle.
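A minimal sketch of that idea, with no hardware involved: the worker task latches a shared duty value once per period and records what it would have written to the pin. The names `softpwm!`, `pin_log`, and `levels` are made up for illustration, and the atomic value stands in for the "message" that changes the duty cycle.

```julia
# Hypothetical sketch: `pin_log` replaces a real GPIO write, and no timing is
# done here. A real loop would write the pin and then wait one tick.
function softpwm!(pin_log::Vector{Bool}, duty::Threads.Atomic{Int};
                  levels::Int = 16, periods::Int = 2)
    for _ in 1:periods
        on_ticks = duty[]                      # latch the requested duty once per period
        for tick in 1:levels
            push!(pin_log, tick <= on_ticks)   # high for the first `on_ticks` ticks
        end
    end
    return pin_log
end

duty = Threads.Atomic{Int}(4)                  # 4 of 16 ticks high = 25 % duty
task = Threads.@spawn softpwm!(Bool[], duty)   # worker task does the bit-banging
pin_log = fetch(task)
```

Changing the duty cycle from another task would then just be `duty[] = 8`, which the worker picks up at the start of its next period.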

Hardware PWM is available on GPIO pins 12, 13, 18, and 19.

I am really not that certain that we will be able to do this inside a thread, inside Julia, without a huge jitter. If we had multiple processors and could dedicate one core to handling this, then I would be more optimistic. However, given the jitter I saw, I am not sure this can be achieved from a pure Julia implementation. But, let’s see :slight_smile:

Based on the results of that discussion over on the other thread on GC, it seems that an interface for software PWM might be to spawn a very simple C thread that reads some julia globals and constantly bit-bangs the appropriate values to the pins, calling nanosleep and doing some timing adjustments based on the OS clock. It should be fine for up to tens of kilohertz I would think. The thread could take real-time scheduling. I think the jitter should be handled well, even on the RPi zero.
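For the real-time scheduling part, here is a rough sketch of how a process could request `SCHED_RR` from Julia through `ccall`. It assumes Linux with glibc, where `SCHED_RR` is 2 and `struct sched_param` is a single `int`. The helper name `try_realtime` is made up, and the call usually fails without root privileges or `CAP_SYS_NICE`.

```julia
# Linux/glibc-only sketch. SCHED_RR = 2 in <sched.h> on Linux, and glibc's
# struct sched_param holds a single int (sched_priority); both are assumptions
# about the target platform.
const SCHED_RR = Cint(2)

# Ask the kernel to schedule the calling thread (pid 0) with SCHED_RR.
# Returns true on success, false otherwise (e.g. missing privileges).
function try_realtime(priority::Integer = 5)
    Sys.islinux() || return false
    rc = ccall(:sched_setscheduler, Cint, (Cint, Cint, Ref{Cint}),
               0, SCHED_RR, Cint(priority))
    return rc == 0
end

ok = try_realtime(5)
println(ok ? "running with SCHED_RR" : "could not switch to SCHED_RR")
```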

I would support H/W PWMs first. S/W PWMs are a bad idea IMHO.

I did a small test to verify the possibility using @async.

# usleep(3) takes its argument as useconds_t (an unsigned int) in microseconds
usleep(t::Int) = ccall(:usleep, Cint, (Cuint,), t)

function pwm_test()
    @sync begin

        @async begin
            t0 = time()
            t1 = time()
            # Float64 accumulators keep the loop type-stable
            max = 0.0
            min = 10_000.0
            mean = 0.0
            for i = 1:200_000
                usleep(100)
                t0 = t1
                t1 = time()
                d  = (t1-t0)*1e6
                mean += d
                max < d && (max = d)
                min > d && (min = d)
            end
            println("Mean: $(mean/200_000)")
            println("Max:  $max")
            println("Min:  $min")
        end

        @async begin
            a = 0
            for i = 1:25
                a = a+1
                sleep(1)
            end
        end
    end
end

This should mimic the code to set a PWM with 10kHz. What I got was:

Mean: 150.77258467674255
Max:  303.9836883544922
Min:  127.7923583984375
Task (done) @0xac2e8eb0

The mean time should be very close to 100us. Thus, it will really not be an easy task to maintain a SW PWM with frequencies like 10kHz. The jitter will be bad, very bad. I ran this example in a process with the highest possible priority, on a kernel with the PREEMPT_RT patch.

Is this on an rpi zero? I’d be shocked if you couldn’t do better on a 3b or 4

Yes, this is a Pi Zero W.

What is JULIA_NUM_THREADS in this case? It must be at least 2 or there is no real asynchronousness.

I used 2, but the same result is obtained with only the measurement thread.

The mean value is 107 on x86 with SCHED_RR and priority 5 with 4 threads on a Debian stock kernel (tickless, high res timers, voluntary preempt). So it’s probably much more doable on the RPi 3 or 4.

Mean: 107.01049447059631
Max:  271.08192443847656
Min:  100.85105895996094
Task (done) @0x00007f555efde4a0

But the max value is still crazy high. I’m guessing it’s not from GC because nothing is allocating in the loops. I imagine it’s the kernel doing something to pause the thread due to hardware interrupts or something.

It’d be interesting to see more of the statistical distribution of timings. I’ll maybe pre-allocate a big vector and fill it with the timings… tomorrow, gotta get the kids to bed now.


using Queryverse,Statistics

# usleep(3) takes its argument as useconds_t (an unsigned int) in microseconds
usleep(t::Int) = ccall(:usleep, Cint, (Cuint,), t)

function pwm_test()
    times = zeros(200000)
    @sync begin
        
        
        @async begin
            t0 = time()
            t1 = time()
            for i = 1:200_000
                usleep(100)
                t0 = t1
                t1 = time()
                d  = (t1-t0)*1e6
                times[i] = d   # store each interval (in us) for the statistics below
            end
        end

        @async begin
            a = 0
            for i = 1:25
                a = a+1
                sleep(1)
            end
        end
    end
    display(DataFrame(dt=times) |> @vlplot(mark="line", transform=[{density = :dt, bw=5}],encoding = {x={field="value"}, y={field="density"}}))
    # Five-number summary: minimum, quartiles, and maximum
    q = [0.0, 0.25, 0.5, 0.75, 1.0]
    println("Quantiles $q are $(Statistics.quantile(times, q))")
end

pwm_test()

Running on the regular scheduler, this outputs:
Quantiles [0.0, 0.25, 0.5, 0.75, 1.0] are [103.95050048828125, 156.16416931152344, 159.0251922607422, 162.1246337890625, 1034.0213775634766]

wow! the outlier is quite impressive, but the consistency isn’t that bad.

I suspect that nanosleep(2) is better, but I’m not yet proficient at calling C functions that take structs like that.

If you want to use nanosleep, do this:

# Mirrors C's struct timespec; both fields are signed longs on typical Linux/glibc targets
mutable struct struct_timespec
    tv_sec::Clong
    tv_nsec::Clong
end

@inline function nanosleep(req::struct_timespec)
    ccall(:nanosleep, Cint, (Ref{struct_timespec}, Ptr{Cvoid}), req, C_NULL)
end

julia> req = struct_timespec(0,100_000)

julia> nanosleep(req)

However, in my first test, it did not change much.

Thanks for the nanosleep code! Here is code that uses about 15% of one core and gives me very reliable timings when using SCHED_RR and priority 5 on my x86 desktop machine.



function pwm_test()
    times = zeros(200000)
    @sync begin
        
        
        @async begin
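            t0 = time()
            t1 = time()

            # Sleep in 20 us slices so each wake-up can re-check the elapsed time
            sltime = struct_timespec(0,20_000);
            
            for i = 1:200_000
                t0 = time()
                # Sleep repeatedly until at least 80 us have elapsed since t0
                while (time() - t0 < .00008)
                    nanosleep(sltime)
                end
                t1 = time()
                d  = (t1-t0)*1e6   # elapsed time in microseconds
                times[i] = d;
            end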
            display(DataFrame(dt=times) |> @vlplot(mark="line", transform=[{density = :dt, bw=5}],encoding = {x={field="value"}, y={field="density"}}))
            q = [0.01,.1,.25,.5,.75,.9,.99];
            println("Quantiles $q are $(Statistics.quantile(times,q))")

        end

        @async begin
            a = 0
            for i = 1:25
                a = a+1
                sleep(1)
            end
        end
    end
end

Output:

Quantiles [0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99] are [82.0159912109375, 89.88380432128906, 97.03636169433594, 97.99003601074219, 98.94371032714844, 99.89738464355469, 104.9041748046875]

[Image: density plot of the measured loop times]

I imagine the pi zero might struggle a little, but a Pi 3 or 4 would do well here.

And, it’s registered now :slight_smile: We can now install it with ]add BaremetalPi.

Looking at the results, notice how bad the jitter will be. 10% of the time, the code executes in less than 90us. If that happens in the iteration that turns off the PWM and you want a 50% duty cycle, then you will get 45% instead (considering a full cycle of 200us). That is OK for things like dimming an LED, but certainly not for some control applications.
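To make the arithmetic explicit, here is a tiny sketch, assuming the period stays fixed at 200us and only the on-phase comes up short. `effective_duty` is a made-up helper name.

```julia
# Duty cycle actually delivered when the on-phase lasts `on_us` out of `period_us`.
effective_duty(on_us, period_us) = on_us / period_us

nominal = effective_duty(100, 200)   # intended 50 % duty cycle
short   = effective_duty(90, 200)    # on-phase cut to 90 us by an early wake-up

println("nominal: ", 100 * nominal, " %, with a 90 us on-phase: ", 100 * short, " %")
```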

Does anyone have any idea how to improve it? I remember I had a big problem back in 2009 when I needed to control several servos using an SBC from EmbeddedArm. Even programming in C, I was not able to keep good control using a SW PWM and had to buy expansion modules with hardware PWM. I thought things would have gotten much better by now, but I am seeing somewhat similar results.

My thinking was that you might want, say, 8-bit levels of PWM, which means one full period would be 255 cycles of 100us = 25.5ms, so that a timing error (dx) of 20us is less than .08% error :wink:

Of course, if you just want 2-bit PWM and are then using a 300us period, a 20us error is about 7%. Still reasonable, I think? Anything less than 2 bits is just on vs off.

Also, in those stats, 80% of the time the timing is between 89.9 and 99.9 us (and 98% of the time between 82.0 and 104.9 us). With some tweaking we could shift it to center on 100us. That seems like decent reliability if the bulk of the cycles land between, say, 95 and 105 us.
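One possible tweak for centering the ticks (an assumption on my part, not something measured in this thread) is to schedule every tick against an absolute deadline, so that an early or late wake-up does not carry over into the next tick. A rough sketch using Julia's monotonic `time_ns()`, with a pure busy-wait for simplicity; `run_ticks` is a made-up name.

```julia
# Sketch: tick i is released at start + i * period, independent of how late
# or early the previous tick was. Busy-waits, so it burns a full core; a real
# loop might nanosleep for most of the interval and spin only at the end.
function run_ticks(n::Integer, period_ns::Integer)
    stamps = Vector{UInt64}(undef, n)
    start = time_ns()
    for i in 1:n
        deadline = start + UInt64(i) * UInt64(period_ns)
        while time_ns() < deadline
            # spin until the absolute deadline for tick i
        end
        stamps[i] = time_ns()   # a real PWM loop would toggle the pin here
    end
    return stamps, start
end

stamps, start = run_ticks(50, 100_000)    # 50 ticks of 100 us each
intervals_us = diff(stamps) ./ 1000       # tick-to-tick spacing in microseconds
```

With this scheme the jitter of individual ticks stays, but the long-run average spacing is pinned to the requested period.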

Even for a motor operating at say 5000RPM = 83.3 rev/s, if you used 4 bits to get 16 speeds, you’d be able to change the speed every 1500us = 0.125 rotations and have jitter variability of only maybe 20 out of 1500 = 1.3%
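The numbers above can be reproduced with a few lines, assuming a b-bit PWM uses 2^b - 1 ticks of 100us per period (255 ticks for 8 bits) and a timing error of 20us. The helper names are made up for illustration.

```julia
# Period of a `bits`-bit software PWM built from ticks of `tick_us` microseconds.
period_us(bits, tick_us) = (2^bits - 1) * tick_us

# Timing error of `jitter_us` expressed as a percentage of one full period.
jitter_pct(bits, tick_us, jitter_us) = 100 * jitter_us / period_us(bits, tick_us)

for bits in (2, 4, 8)
    println(bits, " bits: period = ", period_us(bits, 100), " us, jitter = ",
            round(jitter_pct(bits, 100, 20); digits = 2), " %")
end

# Fraction of a rotation covered by one 4-bit period at 5000 RPM.
rotations_per_period = period_us(4, 100) * 1e-6 * 5000 / 60
```

This gives periods of 300us, 1500us, and 25500us, with the 20us error coming out at roughly 6.7%, 1.3%, and 0.08% respectively.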
