Julia seems an order of magnitude slower than Python when printing to the terminal, because of an issue with "sleep"

I can agree with this statement. But a sleep function that works with reasonable accuracy at low delays (a few ms) is useful in many contexts, for me particularly in simulation and control. If I want to toggle a pin on a Raspberry Pi for 3 ms, I can easily do this with Python, but not with Julia. I find this an annoying and unnecessary limitation of Julia. Luckily there are workarounds: you can always call C functions directly from Julia, but that is not beginner-friendly.

3 Likes

There is Libc.systemsleep.

Well, it is blocking… But otherwise not too bad:

julia> @btime sleep(0.0005)
  1.260 ms (5 allocations: 144 bytes)

julia> @btime Libc.systemsleep(0.0005)
  508.423 μs (0 allocations: 0 bytes)
0

But why can’t the sleep function call systemsleep for the part of the argument that is below 1ms automatically?

Not user friendly.

Because systemsleep is doing a fundamentally different thing — unlike sleep, it doesn’t yield control to other Julia tasks on the same thread.
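
A minimal sketch of what "doesn't yield" means in practice (the function and the 0.5 s values are just for illustration):

```julia
# Sketch: sleep() yields to the scheduler, Libc.systemsleep() blocks
# the whole OS thread.
function demo()
    @async print("A")        # runnable immediately, but won't run until we yield
    Libc.systemsleep(0.5)    # blocks the thread: "A" is NOT printed during this
    sleep(0.5)               # yields to the scheduler: "A" prints during this
end
demo()
```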

1 Like

But who cares? I mean, who is using Julia's green threads in the first place? Not many people.

Asynchronous I/O is pretty common.

4 Likes

That is super easy & reliable to do. As I mentioned, julia has millisecond resolution on its Timer, provided by libuv:

julia> t = Timer(0, interval=0.003) do _
           @show "Trigger every 3ms"
       end

This will accumulate timer skew, but will trigger roughly every 3ms. It’s equivalent to just waiting x milliseconds at the end of a loop. You can go as low as every 1ms, but with those kinds of times you really want to keep your callback extremely short and not do much more than push some message in e.g. a Channel to have some other background task deal with it and then start a new timer with a correctly calculated “new” offset.
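
The Channel pattern described above might look like this (a sketch; the names and the 3 ms interval are illustrative):

```julia
# Sketch: keep the Timer callback tiny — just push a tick into a
# buffered Channel — and let a background task do the real work.
ticks = Channel{Nothing}(100)        # buffered so put! rarely blocks
timer = Timer(0; interval = 0.003) do _
    put!(ticks, nothing)             # keep the callback this short
end

worker = @async for _ in ticks
    # do the actual per-tick work here, off the timer's hot path
end
```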

In fact, earlier today I ran a little analysis to check what sort of resolution python3 even offers, since it doesn't seem to be documented anywhere (in contrast to julia!), or at least I couldn't find it in the time.sleep or asyncio.sleep docs. Additionally, both docs only talk about "seconds", with no mention of higher precision (other than time.sleep noting that you can pass in floats for higher precision, without telling you how far you can go).

With the following script:

time.py
import asyncio
import timeit
import sys
import time

async def func(arg):
    await asyncio.sleep(arg)

f = float(sys.argv[1])

def main():
    asyncio.run(func(f))

def main_sleep():
    time.sleep(f)

runs = 1000
print("Target: ", sys.argv[1])
resmain = (timeit.timeit(main, number=runs) / runs)
ressleep = (timeit.timeit(main_sleep, number=runs) / runs)
print("Sleep async: %.10f" % resmain)
print("Sleep time: %.10f" % ressleep)
abssync = abs(f - resmain)
abstime = abs(f - ressleep)
print("Err abs async: %.10f" % abssync)
print("Err abs time: %.10f" % abstime)
print("Relative error (closer to 1.0 is better)")
print("async: x%.2f" % (resmain / f))
print("time: x%.2f" % (ressleep / f))

we can investigate just how much timer skew & relative error crops up and when it happens. I tested both time.sleep as well as asyncio.sleep, since I felt that testing a single-threaded, single core, non-green thread system (time.sleep) against a green thread system (julia) wasn’t really an apples to apples comparison. The results? Well check them yourself:

Python timing results
$ python3 time.py 0.01
Target:  0.01
Sleep async: 0.0117151672
Sleep time: 0.0104052220
Err abs async: 0.0017151672
Err abs time: 0.0004052220
Relative error (closer to 1.0 is better)
async: x1.17 # 17% error so far, seems tolerable
time: x1.04 # 4% error seems negligible & due to non-realtime guarantees of my kernel/hardware

$ python3 time.py 0.001
Target:  0.001
Sleep async: 0.0017855162
Sleep time: 0.0012482233
Err abs async: 0.0007855162
Err abs time: 0.0002482233
Relative error (closer to 1.0 is better)
async: x1.79  # what's happening here? 80% error?!
time: x1.25 # uh-oh - 25% error at the resolution julia already guarantees?

$ python3 time.py 0.0001
Target:  0.0001
Sleep async: 0.0020051429
Sleep time: 0.0001590123
Err abs async: 0.0019051429
Err abs time: 0.0000590123
Relative error (closer to 1.0 is better)
async: x20.05 # well that isn't good..
time: x1.59  # slowly accumulating more error

$ python3 time.py 0.00001
Target:  0.00001
Sleep async: 0.0007320734
Sleep time: 0.0000666075
Err abs async: 0.0007220734
Err abs time: 0.0000566075
Relative error (closer to 1.0 is better)
async: x73.21 # asyncio just seems to give up
time: x6.66 # woah! why are we suddenly 6 times slower! I thought we'd keep our precision in 10µs requests

$ python3 time.py 0.000001
Target:  0.000001
Sleep async: 0.0001497367
Sleep time: 0.0000538176
Err abs async: 0.0001487367
Err abs time: 0.0000528176
Relative error (closer to 1.0 is better)
async: x149.74 
time: x53.82 # yeah, this isn't realtime either

And yes, my machine does say that I have a nanosecond precision clock available for querying. Personally, I prefer a system with documented guarantees & failure modes to one that just does something in the hopes of being close to right. At least in julia you can very easily do Libc.systemsleep or, with minimal structs to define, call nanosleep yourself:

julia> struct TimeSpec
           tv_sec::Clong   # time_t — Clong, not Cint, on 64-bit Linux
           tv_nsec::Clong
       end

julia> @ccall nanosleep(Ref(TimeSpec(0, 5))::Ptr{TimeSpec}, C_NULL::Ptr{TimeSpec})::Cint

If you actually want to use this when not at the REPL, you will have to interact with the task system to make sure there is only your specific task running on your specific OS-thread, so you’ll have to spawn your task sticky on a specific thread. The only cost you’re still eating is FFI, which in properly compiled julia code shouldn’t be any more than in any other C program that links this dynamically.
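
One way to sketch that, reusing the TimeSpec struct from above (busy_loop and the 3 ms value are illustrative; task.sticky is the same internal field that @async itself sets):

```julia
# Hypothetical sketch: run a nanosleep-based loop as a sticky task so
# it stays on the OS thread that first runs it and never migrates.
function busy_loop()
    for _ in 1:1000
        # ... toggle a pin, etc. ...
        @ccall nanosleep(Ref(TimeSpec(0, 3_000_000))::Ptr{TimeSpec},  # 3 ms
                         C_NULL::Ptr{TimeSpec})::Cint
    end
end

task = Task(busy_loop)
task.sticky = true        # pin to one OS thread; no migration
schedule(task)
wait(task)
```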

What’s with the hyperbole? Just because you don’t use it doesn’t mean no one does. The python community itself didn’t have proper multithreading and exclusively used green threads for the longest time.

If I recall correctly, even the threaded interface from Base.Threads uses green-thread Tasks under the hood; they’re just pinned and stickied to an OS thread to prevent migration and data shuffling. So as a matter of fact, I’d wager almost everyone doing multithreading in julia is using the green-threads feature, even if they don’t notice it.

You’re really not making a good case for your argument here.

2 Likes

The Julia REPL uses two Tasks — one for the frontend and one for evaluating user code. So pretty much everyone is using these, whether they know it or not.

@kdheepak by the way, here’s an example of what happens when Julia’s IO layer is bypassed:

A normal read from stdin can be cleanly interrupted with ^C:

julia> read!(stdin, zeros(UInt8, 100))
^C

ERROR: InterruptException:
...

But a direct read() from the stdin file descriptor can only be interrupted unsafely with multiple presses of ^C:

julia> @ccall read(0::Cint, zeros(UInt8,100)::Ptr{Cvoid}, 100::Csize_t)::Cssize_t
^C^C^C^C^C^CWARNING: Force throwing a SIGINT
ERROR: InterruptException:
...
2 Likes

I hopped through a few more forums because it started to really bug me why a system would have multiple clocks with different tick rates. It made more sense to me for everything to run on one clock with a reasonably high rate for good time resolution. And I still don’t really know why, but I have a few more thoughts now:

  1. On a hardware level, higher tick rates mean the processor has to do work (handle a timer interrupt) more frequently. I think the hardware clocks are already doing this and we can’t do much about it, but tying the software to a higher-rate clock will add even more overhead. I don’t know exactly how much more, but at some point you’d rather the computer spend more of its time and energy on your program than on keeping time. This is somewhat reminiscent of latency-throughput tradeoffs in the topic of garbage collectors.

  2. Something specific that bugged me was that Python’s time module has two monotonic, system-wide (counts sleep) timer functions: time.monotonic and time.perf_counter, the only apparent difference being that the latter has a higher resolution (depending on the hardware). The name perf_counter suggests it’s intended for measuring code performance, which would need high resolution over small time periods. PEP418, which introduces these timers, suggests that higher resolution clocks drift more, drift meaning an accumulating deviation from true time:

Different clocks have different characteristics; for example, a clock with nanosecond precision may start to drift after a few minutes, while a less precise clock remained accurate for days.

  3. Back to Julia: the resolution of performance measurers like @time or @btime indicates that higher time resolution is possible. But the asynchronous Base.sleep doesn’t seem like a good place for it. The number you pass isn’t a hard guarantee, because a task that is done sleeping cannot interrupt running tasks; it has to wait its turn in the task queue. If you want an accurate sleep period, you probably don’t want something that actively tries to keep the processor busy with other tasks. As an aside, async-await/coroutines recently arose in multiple languages as user-level cooperative multitasking, where you write the turn-taking at specific points of your code. For true interrupts, you need preemptive multitasking, which almost all current operating systems opt for. A scheduler automatically decides the turn-taking, so ordering is much more unpredictable and your control much more indirect. You also have to start worrying about your subroutines being reentrant, or as my inexperience interpreted it: "can your code be interrupted at any point, outside your control, by your other code, and still work as expected?"
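
The "no hard guarantee" point is easy to quantify. A sketch that measures the worst-case overshoot of Base.sleep (function name is made up; numbers vary with machine and load, but the benchmarks earlier in the thread suggest roughly 1 ms):

```julia
# Sketch: worst-case overshoot of sleep(t) over n trials.
function sleep_overshoot(t; n = 100)
    worst = 0.0
    for _ in 1:n
        start = time_ns()
        sleep(t)
        elapsed = (time_ns() - start) / 1e9   # seconds
        worst = max(worst, elapsed - t)
    end
    worst
end
# e.g. sleep_overshoot(0.001) — overshoot, not the total elapsed time
```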
3 Likes

I haven’t checked python’s source, but I’d wager these map to CLOCK_MONOTONIC and CLOCK_REALTIME on linux, respectively. There are a number of different clocks exposed; e.g. these exist on my machine in /usr/include/linux/time.h:

/*
 * The IDs of the various system clocks (for POSIX.1b interval timers):
 */
#define CLOCK_REALTIME          0
#define CLOCK_MONOTONIC         1
#define CLOCK_PROCESS_CPUTIME_ID    2
#define CLOCK_THREAD_CPUTIME_ID     3
#define CLOCK_MONOTONIC_RAW     4
#define CLOCK_REALTIME_COARSE       5
#define CLOCK_MONOTONIC_COARSE      6
#define CLOCK_BOOTTIME          7
#define CLOCK_REALTIME_ALARM        8
#define CLOCK_BOOTTIME_ALARM        9

each with different meanings & intended guarantees, though most of them seem to just return nanosecond resolution when queried with clock_getres.
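
Querying that from Julia is a short ccall. A sketch assuming 64-bit Linux (where time_t is Clong) and the CLOCK_* constants from the header above; clock_resolution_ns is a made-up name:

```julia
# Sketch: ask the kernel for a clock's reported resolution via clock_getres(2).
struct TimeSpec
    tv_sec::Clong    # time_t on 64-bit Linux
    tv_nsec::Clong
end

const CLOCK_MONOTONIC = Cint(1)   # from /usr/include/linux/time.h

function clock_resolution_ns(clk::Cint = CLOCK_MONOTONIC)
    res = Ref(TimeSpec(0, 0))
    ret = @ccall clock_getres(clk::Cint, res::Ptr{TimeSpec})::Cint
    ret == 0 || error("clock_getres failed")
    res[].tv_sec * 1_000_000_000 + res[].tv_nsec   # resolution in nanoseconds
end
```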

As long as Julia's sleep function does not get improved, its docstring should at least contain a reference to Base.Libc.systemsleep, with the hint to use it if resolution finer than 1 ms is needed.

8 Likes

With all the hardware variability, I think someone who understands all this stuff should make a whole package dedicated to timing and (synchronous) sleeping. Looking at libc.jl, Base.Libc.systemsleep is conditionally defined to ccall either usleep for Unix systems or Sleep for Windows, and there’s no quick function for giving details on the timer used. It’d be cool if we could get finer details on the clocks on our own OS and computer, and the more portable functions would pick the best available (and we can find out which).

Also, I keep reading that usleep is deprecated somewhere? Way too many unfamiliar things for me to really understand.

1 Like

Sounds like an easy PR for anyone with a GitHub account and an interest in the matter. The docstring is at https://github.com/JuliaLang/julia/blob/master/base/asyncevent.jl#L224 and can be edited directly in the web GUI, which creates a PR with minimal effort. Click the pencil icon at the upper right corner of the file to get started editing.

6 Likes

Sorry to revive this, but I found this thread on google and wanted to share my workaround that is both (1) non-blocking and (2) accurate:

function nonblocking_systemsleep(t)
    task = Threads.@spawn Libc.systemsleep($t)  # blocking sleep runs on another thread
    yield()                                     # let other tasks on this thread run
    fetch(task)                                 # block only this task until the sleep ends
end

This is non-blocking because of the yield and since it uses Libc.systemsleep it’s more accurate than the built-in sleep:

julia> @btime nonblocking_systemsleep(1e-4);
  119.042 μs (5 allocations: 496 bytes)

julia> @btime sleep(1e-4)
  1.154 ms (4 allocations: 112 bytes)

Unfortunately it has some allocations and still isn’t 100% accurate (and doesn’t immediately regain control when the sleep ends). But for me the 1 ms sleep in the head worker was slowing things down, so this workaround was very much worth it.

4 Likes