I can agree with this statement. But a sleep function that works for low delays (a few ms) with reasonable accuracy is useful in many contexts, for me in particular in the fields of simulation and control. If I want to toggle a pin on a Raspberry Pi for 3 ms, I can easily do this with Python, but not with Julia. I find this an annoying and unnecessary limitation of Julia. Luckily there are workarounds: you can always call C functions directly from Julia, but this is not beginner friendly.
There is Libc.systemsleep.
Well, it is blocking… But otherwise not too bad:
julia> @btime sleep(0.0005)
1.260 ms (5 allocations: 144 bytes)
julia> @btime Libc.systemsleep(0.0005)
508.423 μs (0 allocations: 0 bytes)
0
But why can’t the sleep function call systemsleep for the part of the argument that is below 1ms automatically?
Not user friendly.
Because systemsleep is doing a fundamentally different thing: unlike sleep, it doesn't yield control to other Julia tasks on the same thread.
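To make that concrete, here's a small sketch of my own (pasted as one block, so the REPL itself doesn't yield in between):

begin
    @async println("other task")   # schedule a task on the current thread
    Libc.systemsleep(0.5)          # blocks the whole thread; the task cannot run yet
    sleep(0.5)                     # yields to the scheduler; "other task" prints here
end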
But who cares? I mean, who is using Julia's green threads in the first place? Not many people.
Asynchronous I/O is pretty common.
That is super easy & reliable to do. As I mentioned, julia has millisecond resolution on its Timer, provided by libuv:
julia> t = Timer(0, interval=0.003) do _
           @show "Trigger every 3ms"
       end
This will accumulate timer skew, but will trigger roughly every 3ms. It's equivalent to just waiting x milliseconds at the end of a loop. You can go as low as every 1ms, but with those kinds of times you really want to keep your callback extremely short: do not much more than push a message into e.g. a Channel so some other background task can deal with it, and then start a new timer with a correctly calculated "new" offset.
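To illustrate, a drift-compensated version of that loop could look like the following. This is a sketch of my own (the helper name every is made up), computing each deadline from the original start time so skew doesn't accumulate:

function every(f, period; iterations = 10)
    t0 = time()
    for i in 1:iterations
        f()
        remaining = t0 + i * period - time()  # next deadline measured from t0, not from "now"
        remaining > 0 && sleep(remaining)
    end
end

every(() -> println("tick at ", time()), 0.003)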
In fact, earlier today I ran a little analysis to check out what sort of resolution python3 even offers, since it's not even documented anywhere (in contrast to julia!), or I just couldn't find it in the time.sleep or asyncio.sleep docs. Additionally, the docs of both only talk about "seconds", with no concern for higher precision given (other than time.sleep mentioning "you can pass in floats for higher precision", without telling you how far you can go).
With the following script:
time.py
import asyncio
import timeit
import sys
import time

async def func(arg):
    await asyncio.sleep(arg)

f = float(sys.argv[1])

def main():
    asyncio.run(func(f))

def main_sleep():
    time.sleep(f)

runs = 1000
print("Target: ", sys.argv[1])
resmain = (timeit.timeit(main, number=runs) / runs)
ressleep = (timeit.timeit(main_sleep, number=runs) / runs)
print("Sleep async: %.10f" % resmain)
print("Sleep time: %.10f" % ressleep)
abssync = abs(f - resmain)
abstime = abs(f - ressleep)
print("Err abs async: %.10f" % abssync)
print("Err abs time: %.10f" % abstime)
print("Relative error (closer to 1.0 is better)")
print("async: x%.2f" % (resmain / f))
print("time: x%.2f" % (ressleep / f))
we can investigate just how much timer skew & relative error crops up and when it happens. I tested both time.sleep as well as asyncio.sleep, since I felt that testing a single-threaded, single-core, non-green-thread system (time.sleep) against a green thread system (julia) wasn't really an apples-to-apples comparison. The results? Well, check them yourself:
Python timing results
$ python3 time.py 0.01
Target: 0.01
Sleep async: 0.0117151672
Sleep time: 0.0104052220
Err abs async: 0.0017151672
Err abs time: 0.0004052220
Relative error (closer to 1.0 is better)
async: x1.17 # 17% error so far, seems good
time: x1.04 # 4% error seems negligible & due to non-realtime guarantees of my kernel/hardware
$ python3 time.py 0.001
Target: 0.001
Sleep async: 0.0017855162
Sleep time: 0.0012482233
Err abs async: 0.0007855162
Err abs time: 0.0002482233
Relative error (closer to 1.0 is better)
async: x1.79 # what's happening here? 80% error?!
time: x1.25 # uh-oh - 25% error at the resolution julia already guarantees?
$ python3 time.py 0.0001
Target: 0.0001
Sleep async: 0.0020051429
Sleep time: 0.0001590123
Err abs async: 0.0019051429
Err abs time: 0.0000590123
Relative error (closer to 1.0 is better)
async: x20.05 # well that isn't good..
time: x1.59 # slowly accumulating more error
$ python3 time.py 0.00001
Target: 0.00001
Sleep async: 0.0007320734
Sleep time: 0.0000666075
Err abs async: 0.0007220734
Err abs time: 0.0000566075
Relative error (closer to 1.0 is better)
async: x73.21 # asyncio just seems to give up
time: x6.66 # woah! why are we suddenly 6 times slower! I thought we'd keep our precision in 10µs requests
$ python3 time.py 0.000001
Target: 0.000001
Sleep async: 0.0001497367
Sleep time: 0.0000538176
Err abs async: 0.0001487367
Err abs time: 0.0000528176
Relative error (closer to 1.0 is better)
async: x149.74
time: x53.82 # yeah, this isn't realtime either
And yes, my machine does say that I have a nanosecond-precision clock available for querying. Personally, I prefer a system with documented guarantees & failure modes to one that just does something in the hopes of being close to right. At least in julia you can very easily do Libc.systemsleep or, with minimal structs to define, call nanosleep yourself:
julia> struct TimeSpec
           tsec::Clong   # time_t, a long on 64-bit platforms
           tns::Clong
       end

julia> @ccall nanosleep(Ref(TimeSpec(0, 5))::Ptr{TimeSpec}, C_NULL::Ptr{TimeSpec})::Cint
If you actually want to use this when not at the REPL, you will have to interact with the task system to make sure there is only your specific task running on your specific OS thread, so you'll have to spawn your task sticky on a specific thread. The only cost you're still eating is FFI, which in properly compiled julia code shouldn't be any more than in any other C program that links this dynamically.
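A minimal sketch of that, reusing the TimeSpec struct from above (treat this as an illustration of the sticky mechanism, not a vetted pattern; pinned_nanosleep is a made-up name):

function pinned_nanosleep(ns)
    t = @task @ccall nanosleep(Ref(TimeSpec(0, ns))::Ptr{TimeSpec}, C_NULL::Ptr{TimeSpec})::Cint
    t.sticky = true    # pin the task to the thread it gets scheduled on
    schedule(t)
    fetch(t)
end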
What's with the hyperbole? Just because you don't use it doesn't mean no one does. The python community itself didn't have proper multithreading and exclusively used green threads for the longest time.
If I recall correctly, even the threaded interface from Base.Threads uses the green-thread Tasks. They're just pinned & stickied to an OS thread to prevent migration & data shuffling. So as a matter of fact, I'd wager almost everyone doing multithreading in julia is using the green threads feature, even if they don't notice it.
You’re really not making a good case for your argument here.
The Julia REPL uses two Tasks — one for the frontend and one for evaluating user code. So pretty much everyone is using these, whether they know it or not.
@kdheepak by the way, here’s an example of what happens when Julia’s IO layer is bypassed:
A normal read from stdin can be cleanly interrupted with ^C:
julia> read!(stdin, zeros(UInt8, 100))
^C
ERROR: InterruptException:
...
But a direct read() from the stdin file descriptor can only be interrupted unsafely with multiple presses of ^C:
julia> @ccall read(0::Cint, zeros(UInt8,100)::Ptr{Cvoid}, 100::Csize_t)::Cssize_t
^C^C^C^C^C^CWARNING: Force throwing a SIGINT
ERROR: InterruptException:
...
I hopped through a few more forums because it started to really bug me why a system would have multiple clocks with different tick rates. It made more sense to me for everything to run on one clock with a reasonably high rate for good time resolution. And I still don’t really know why, but I have a few more thoughts now:
- On a hardware level, higher tick rates mean the processor has to do work (timer interrupts) more frequently. I think the hardware clocks are already doing this and we can't do much about it, but tying the software to a higher-rate clock will add even more overhead. I don't know exactly how much more, but at some point you'd rather the computer spend more of its time and energy on your program than on keeping time. This is somewhat reminiscent of latency-throughput tradeoffs in the topic of garbage collectors.
- Something specific that bugged me was that Python's time module has two monotonic, system-wide (counting sleep) timer functions: time.monotonic and time.perf_counter, the only apparent difference being that the latter has a higher resolution (depending on the hardware). The name perf_counter suggests it's intended for measuring code performance, which would need high resolution over small time periods. PEP 418, which introduced these timers, suggests that higher-resolution clocks drift more, drift meaning an accumulating deviation from true time:
Different clocks have different characteristics; for example, a clock with nanosecond precision may start to drift after a few minutes, while a less precise clock remained accurate for days.
- Back to Julia: the resolution of performance measurers like @time or @btime indicates that it's possible to have higher time resolution. But the asynchronous Base.sleep doesn't seem like a good place for it. The number you put into it isn't a hard guarantee, because a task that is done sleeping cannot interrupt running tasks; it has to wait its turn in the task queue (see the sketch after this list). If you want an accurate sleep period, you probably don't want something that actively tries to keep the processor busy with other tasks. As an aside, async-await/coroutines recently arose in multiple languages as user-level cooperative multitasking, where you write turn-taking at specific points of your code. For true interrupts, you need preemptive multitasking, which almost all current operating systems opt for. A scheduler automatically decides the turn-taking, so the order is much more unpredictable, and your control is much more indirect. You also have to start worrying about your subroutines being reentrant, or as my inexperience interpreted it: "can your code be interrupted at any point, outside your control, by your other code and still work as expected?"
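For what it's worth, here's a little sketch of my own showing that effect: the sleeper's deadline passes after 1 ms, but it only resumes once the busy task yields, roughly 10 ms in.

function overshoot_demo()
    t0 = time()
    sleeper = @async begin
        sleep(0.001)
        println("resumed after ", round((time() - t0) * 1000; digits=2), " ms")
    end
    yield()                    # let the sleeper start its 1 ms sleep
    while time() - t0 < 0.01   # hog the thread for ~10 ms without yielding
    end
    wait(sleeper)              # only now can the sleeper resume and print
end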
I haven't checked python's source, but I'd wager these map to CLOCK_MONOTONIC and CLOCK_REALTIME on linux respectively. There are a number of different clocks exposed, e.g. these exist on my machine in /usr/include/linux/time.h:
/*
* The IDs of the various system clocks (for POSIX.1b interval timers):
*/
#define CLOCK_REALTIME 0
#define CLOCK_MONOTONIC 1
#define CLOCK_PROCESS_CPUTIME_ID 2
#define CLOCK_THREAD_CPUTIME_ID 3
#define CLOCK_MONOTONIC_RAW 4
#define CLOCK_REALTIME_COARSE 5
#define CLOCK_MONOTONIC_COARSE 6
#define CLOCK_BOOTTIME 7
#define CLOCK_REALTIME_ALARM 8
#define CLOCK_BOOTTIME_ALARM 9
each with different meanings & intended guarantees, though most of them seem to just return nanosecond resolution when queried with clock_getres.
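If you want to check this from Julia directly, the query is only a few lines. A sketch of my own, assuming 64-bit Linux where both timespec fields are C longs:

struct ClockRes
    sec::Clong    # seconds part of the resolution
    nsec::Clong   # nanoseconds part
end

function clock_resolution(clock_id)
    res = Ref(ClockRes(0, 0))
    rc = @ccall clock_getres(clock_id::Cint, res::Ptr{ClockRes})::Cint
    rc == 0 || error("clock_getres failed")
    return res[]
end

clock_resolution(1)   # 1 == CLOCK_MONOTONIC per the header above; typically ClockRes(0, 1)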
As long as the sleep function of Julia does not get improved, its docstring should at least contain a reference to Base.Libc.systemsleep with the hint to use it if a resolution higher than 1 ms is needed.
With all the hardware variability, I think someone who understands all this stuff should make a whole package dedicated to timing and (synchronous) sleeping. Looking at libc.jl, Base.Libc.systemsleep is conditionally defined to ccall either usleep on Unix systems or Sleep on Windows, and there's no quick function for giving details on the timer used. It'd be cool if we could get finer details on the clocks of our own OS and computer, and the more portable functions would pick the best available (and we could find out which).
Also, I keep reading that usleep is deprecated somewhere? Way too many unfamiliar things for me to really understand.
Sounds like an easy PR for anyone with a GitHub account and an interest in the matter. The docstring is at https://github.com/JuliaLang/julia/blob/master/base/asyncevent.jl#L224 and can be edited directly in the web GUI, which creates a PR with minimal effort. Click the pencil icon at the upper right corner of the file to get started editing.
Sorry to revive this, but I found this thread on Google and wanted to share my workaround, which is both (1) non-blocking and (2) accurate:
function nonblocking_systemsleep(t)
    task = Threads.@spawn Libc.systemsleep($t)
    yield()
    fetch(task)
end
This is non-blocking because of the yield, and since it uses Libc.systemsleep it's more accurate than the built-in sleep:
julia> @btime nonblocking_systemsleep(1e-4);
119.042 μs (5 allocations: 496 bytes)
julia> @btime sleep(1e-4)
1.154 ms (4 allocations: 112 bytes)
Unfortunately it has some allocations and still isn’t 100% accurate (and doesn’t immediately regain control when the sleep ends). But for me the 1 ms sleep in the head worker was slowing things down, so this workaround was very much worth it.
This is great, thanks!
Just curious, does anybody know if there are any potential issues with this nonblocking_systemsleep? I am about to switch my package to it.
Specific questions:
- If --threads=1, would this end up blocking anything?
- Would @async be better than Threads.@spawn?
I actually got an overall 30% improvement (!) in speed from making this change. It just means the head node can sleep at shorter intervals between checking the workers. And evidently it turns out that sleeping for 1 millisecond rather than my requested 1 microsecond was bottlenecking things.
I think Julia Base may want to merge something like this, maybe as a keyword like sleep(1e-4, system=true). Or just switch to it as the default.
Yes, since systemsleep puts the OS thread to sleep, i.e. no other code can run on the thread that executes this task while that systemsleep is blocking execution. This is also mentioned in the docstring:
help?> Libc.systemsleep
  systemsleep(s::Real)

  Suspends execution for s seconds. This function does not yield to Julia's scheduler and
  therefore blocks the Julia thread that it is running on for the duration of the sleep
  time.
No, because @async can cause tasks to be pinned to the current thread, possibly making contention of that thread much worse & subsequently limiting parallelism.
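As an aside, the pinning is visible on the Task objects themselves. A quick check (on the Julia versions I've tried; the sticky field is what encodes this):

julia> (@async nothing).sticky      # @async pins the task to the spawning thread
true

julia> (Threads.@spawn nothing).sticky
false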
I've had some trouble with sleep being inaccurate too, but I nevertheless wouldn't want the behavior of systemsleep to be the default. I'd much rather have the scheduler be more flexible & accurate.
Note also that the minimum time for sleep is 1e-3 or 0.001 (again, see the docstring of sleep), while you requested 1e-4. So it shouldn't come as a surprise that sleep didn't wake up before that. If you have tighter requirements than that, there's nothing wrong with a small busy-loop like this:
function busy_sleep(duration::Real)
    duration <= 0.001 || throw(ArgumentError("Duration must be smaller than 0.001 - use `sleep` for longer durations to allow other tasks to execute while sleeping!"))
    t = time()
    while (time() - t) < duration
        # do nothing
    end
end
julia> @time busy_sleep(1e-4)
0.000101 seconds
Of course, this comes at the cost of not allowing other things to run on the thread executing busy_sleep either, just like systemsleep.
You can also use the function sleep_ms() of my package GitHub - ufechner7/Timers.jl: Timers for Julia
Just to check, this behavior doesn't occur if it's inside a Threads.@spawn though, right? (Which is the reason for the nonblocking_systemsleep defined above.) I guess the thing I am asking is whether you actually need a second thread or not for this to be nonblocking (my guess is: yes).
This is precisely what I want to avoid. Libc.systemsleep is nice since it can get down to ~10 microseconds and, if put into @spawn, it doesn't block the main thread! Even if it blocks its own thread, it doesn't use the CPU (unlike a busy loop).
Maybe what I can do is check Threads.nthreads() and, if it's greater than 1, use the Threads.@spawn Libc.systemsleep trick; otherwise, I use the regular sleep. Like:
const USE_SYSTEMSLEEP = Threads.nthreads() > 1

function systemsleep(dt::Number)
    if USE_SYSTEMSLEEP
        task = Threads.@spawn Libc.systemsleep(dt)
        yield()
        fetch(task)
    else
        sleep(dt)
    end
end
So users running with threads = 1 would still have the sleep bottleneck, but users with threads > 1 can take advantage of the lightweight Libc version.
That depends on whether the task created by @spawn is scheduled on a different thread or not. No matter what, the thread actually executing the task will be blocked & doesn't participate in scheduling activity. It won't be available for running other tasks.
That depends on what exactly you mean by "nonblocking". Inherently, systemsleep is blocking, since it prevents the thread currently executing it from doing other, more useful work.
I'm not sure why exactly you want to avoid a busy loop here - in terms of available computational power for your actual work, the two are equivalent. Whether the OS is free to schedule other programs (as would happen with systemsleep, taking up that "unused" CPU time) or you are not relinquishing the CPU & busy-waiting, the effect is the same - your productive computation doesn't run on that thread at all in either scenario.
I'm not sure that's going to work - with thread adoption, Threads.nthreads() is not a constant, so storing it globally won't really have the desired effect.
The Libc version isn't necessarily lightweight either. On my machine (Linux), this ends up calling usleep, which also doesn't guarantee any upper bound on execution time:
DESCRIPTION
The usleep() function suspends execution of the calling thread for (at least) usec
microseconds. The sleep may be lengthened slightly by any system activity or by
the time spent processing the call or by the granularity of system timers.
I haven't looked at the implementation on my machine, but I'd be very surprised if it didn't busy-wait for small enough durations.
On Windows, Libc.systemsleep ends up calling the Sleep function from synchapi.h, which also takes at least 1 ms, just like our regular sleep. The big downside of systemsleep compared to sleep on Windows is of course that the former doesn't allow other julia tasks to run, while the latter does.
Under the hood, pretty much all OS/Julia/programming-language implementations of sleep rely on some form of scheduling & time slicing for their "sleeping". The minimum time available for sleeping is dictated by how small that time slice can be - on Linux this is distribution dependent, but usually somewhere in the microsecond range IIRC; on Windows it is usually a millisecond. As a consequence, if you want to go below that, you generally have to roll your own, ending up with busy-waiting at the smallest level.
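If you do roll your own, the usual shape is a hybrid: let the scheduler sleep away the bulk of the duration, then busy-wait the tail. A sketch, where the 2 ms safety margin is an assumption you'd tune per system:

function hybrid_sleep(duration)
    deadline = time() + duration
    margin = 0.002                   # assumed wakeup jitter margin; tune for your OS
    duration > margin && sleep(duration - margin)
    while time() < deadline          # busy-wait the remainder on this thread
    end
end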