Julia seems an order of magnitude slower than Python when printing to the terminal, because of an issue with "sleep"

Looks like this is due to LibUV (which Julia uses for I/O, tasks and timers) only allowing that as a minimum: its timer resolution is in milliseconds.

Thanks for all the answers! I’ll port my code to use @ccall write. I think because this is going to be part of the TUI, I don’t expect other tasks going on concurrently to write to stdout. I’ll document this behavior too.
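For reference, a minimal sketch of what writing straight to file descriptor 1 could look like, assuming a Unix-like system where libc exposes write. The name raw_write is a hypothetical helper, not something from Base. It bypasses Julia's own stdout stream, so mixing it with print on the same stream may reorder output.

```julia
# Hypothetical helper: write bytes directly to file descriptor 1 (stdout)
# through libc's write(2). Assumes a Unix-like system.
function raw_write(bytes::Vector{UInt8})
    # ccall converts the Vector{UInt8} to a Ptr{UInt8} and keeps it alive for the call.
    return @ccall write(1::Cint, bytes::Ptr{UInt8}, length(bytes)::Csize_t)::Cssize_t
end

if Sys.isunix()
    # Returns the number of bytes written, or -1 on error.
    raw_write(Vector{UInt8}("hello from write(2)\n"))
end
```

Note that the call blocks the calling thread until the kernel has accepted the bytes.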

A couple forums say that Python’s time.sleep uses the operating system’s sleep function, which depends on the OS clock’s tick rate. A couple more specific quotes claim that the minimum sleep is ~10-13 ms on Windows and ~1 ms on Linux, based on tick rates. I don’t know if Julia’s documented minimum of 1ms is also based on the Linux tick rate.

Some other forums also pointed out that the more common OSs are not real-time, so when the exact sleep time is finished, the OS may have scheduled something that has yet to finish. The latency was reported to be on the order of ms.

Correct, Python’s time.sleep simply pauses execution for at least the specified time. For something that yields to other tasks, you need asyncio.sleep.

Just for clarification, by longer startup time, you’re comparing the Julia runtime to CPython runtime/interpreter, yes? Can some of this time be saved by making a PackageCompiler.jl executable (I’m guessing by doing compilation in advance)?

This version runs at a decent speed by busy-waiting, so I think the entire problem here is about the ability to sleep for a controlled amount of time.

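function main()
    delay = 1e9 / 1200              # nanoseconds per character (1200 characters per second)
    data = read(stdin, String)
    base_time = time_ns()
    for (i, b) in enumerate(data)
        print(b)
        # Busy-wait until the i-th deadline, measured from the start so that
        # per-character errors do not accumulate.
        while time_ns() < base_time + delay * i
        end
    end
end

main()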

Yes, that’s what I’m comparing. I’m only superficially familiar with how CPython works internally, but my understanding is that it can start interpreting much sooner than julia can, which often makes up the difference in these kinds of benchmarks. The key phrase to search for on this forum and the issue tracker is “time to first plot”, which has already improved a lot in the past few years.

As for using PackageCompiler - maybe? I’m not a frequent user of it, but from what I’ve heard it can eliminate that startup time to some degree. Personally, I’m still waiting for a “builtin” way with tree shaking and smaller binaries before I dive too deep into it.

Well, the sleep function of Python on Linux is at least one order of magnitude more accurate than the sleep function of Julia.

FYI: This Julia program is 2.3 s faster than python2 (which is slightly faster than python3), despite Julia’s small, but larger, startup overhead. (EDIT: it is the other Julia program that is that much faster, and on “real” only; I didn’t notice “user” the first time around.)

$ time ~/julia-1.9-DEV-76cf1761e6/bin/julia animation3.jl < animation.txt
real	0m16,651s
user	0m1,419s
sys	0m0,899s

Old timing I got, when I was accidentally timing @GunnarFarneback's original program:
real	0m13,709s
user	0m14,385s
sys	0m1,137s

vs. for Python:
real	0m16,021s
user	0m0,392s
sys	0m0,544s

I can’t take much credit for it: I slightly changed @GunnarFarneback’s program to use Libc.systemsleep (since busy-waiting is non-ideal, though looking at “user”, it may still be happening?). It improves on the default sleep (which has two issues that I hope get fixed). See the thread this one reminded me of: Is it necessary for sleep to allocate? - #3 by yashi

function main()
    delay = 1.0 / 1200              # seconds per character
    data = read(stdin, String)
    for (i, b) in enumerate(data)
        print(b)
        # Blocks the whole OS thread, not just the current Julia task.
        Libc.systemsleep(delay)
    end
end

main()

That then depends on the implementation of systemsleep on your platform. On Linux it just calls usleep for microsecond-based sleeping; on Windows it falls back to Sleep (with millisecond resolution, I think, from what I can tell from @edit Libc.systemsleep(0.00001)).
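To see what either function actually delivers on a given machine, a rough measurement along these lines may help. This is only a sketch: mean_wait is a hypothetical helper, and the numbers will vary with OS, load and Julia version.

```julia
# Hypothetical helper: average wall-clock time of `n` calls to `wait_fn(delay)`.
function mean_wait(wait_fn, delay; n = 100)
    t0 = time_ns()
    for _ in 1:n
        wait_fn(delay)
    end
    return (time_ns() - t0) / 1e9 / n   # seconds per call
end

delay = 1 / 1200
println("requested:        ", delay, " s")
println("sleep:            ", mean_wait(sleep, delay), " s")
println("Libc.systemsleep: ", mean_wait(Libc.systemsleep, delay), " s")
```

Comparing the two averages against the requested delay shows how much each call overshoots.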

I know that Libc.systemsleep doesn’t yield to other tasks, but is there a reason that sleep doesn’t use the same clock as Libc.systemsleep, for the most accurate timing the OS is capable of (but uses whatever is in LibUV instead)?

This may be relevant: a discussion from 7 years ago

https://github.com/JuliaLang/julia/issues/12770

and the POSIX interface referenced there (which surely needs updating for modern Julia!)

https://github.com/ibadr/POSIXClock.jl

It’s to allow other julia background tasks to run even on non-multithreaded systems. The OS has no concept of julia tasks, it just sees the OS-threads it provided to the julia process. If you use Libc.systemsleep the whole thread goes to sleep, not just the julia task (which is what happens if you use the regular julia sleep, which is task aware).
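A small experiment can make the difference visible. The sketch below uses a hypothetical helper, ticks_during, which counts how often a background task gets to run while the current task waits. It assumes the background task runs on the same thread, which is what @async does.

```julia
# Hypothetical helper: count how often a background task gets to run
# while the current task waits using `wait_fn(seconds)`.
function ticks_during(wait_fn, seconds)
    ticks = Ref(0)
    running = Ref(true)
    # @async schedules the task on the current thread; it only runs when we yield.
    background = @async while running[]
        ticks[] += 1
        yield()
    end
    wait_fn(seconds)
    n = ticks[]          # read the counter before we yield again
    running[] = false
    wait(background)     # let the background task finish
    return n
end

println("sleep:            ", ticks_during(sleep, 0.05), " ticks")
println("Libc.systemsleep: ", ticks_during(Libc.systemsleep, 0.05), " ticks")
```

With sleep the background task should get plenty of turns, whereas with Libc.systemsleep I would expect it to get none, because the thread it would run on is asleep.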

Be aware that this suffers from the same problem and really only has an extremely narrow use case (busy waiting for very short time spans).

Whoops, I meant to ask why sleep doesn’t use the same clock as Libc.systemsleep, not call it. Guess I could use better sleep, too.

This solution works well for me, and is similar in performance to Python! Thank you for sharing. I guess busy waiting like this is the only cross-platform way to do this in Julia? I’ve only been testing on my Mac.

And even that probably isn’t ideal, since it sounds like that and systemsleep will block the thread. That would mean using async wouldn’t work to perform computations concurrently on a different task while waiting.

Is there a way to do the same thing using yield and async tasks?

This is not a limitation of julia - on the time scales you’re looking for, busy waiting is more or less the only cross platform way of waiting (unless you have specialized hardware that can handle these short time spans efficiently).

Busy waiting also blocks the thread, just with busy work instead of allowing the OS to run a different program on the physical hardware threads/cores.
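If other Julia tasks need to make progress during the wait, one possible variation is to call yield() on every pass of the spin loop. This is only a sketch, with yielding_wait as a hypothetical helper name. It still burns CPU, and the deadline can be overshot if another task holds the thread for a long time.

```julia
# Hypothetical helper: spin until `deadline_ns` (on the time_ns() clock),
# but give other Julia tasks a chance to run on every pass.
function yielding_wait(deadline_ns)
    while time_ns() < deadline_ns
        yield()
    end
end

# A background task that does some unrelated work while we wait.
counter = Ref(0)
worker = @async for _ in 1:1000
    counter[] += 1
    yield()
end

start = time_ns()
yielding_wait(start + 2_000_000)   # wait about 2 ms
println("worker progressed to ", counter[], " while waiting")
wait(worker)
```

In the printing loop, the empty while loop would be replaced by a call such as yielding_wait(base_time + delay * i).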

Under the hood sleep spawns a Task using Timer which is then scheduled. The task spawning the Timer is then yielded from, so pretty much exactly what you’re describing here (which also creates a LibUV task since that’s what’s handling all scheduling stuff in the first place).
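As far as I can tell from Base's source, sleep boils down to waiting on a Timer. The sketch below mirrors that idea under a hypothetical name, task_sleep. It is an approximation for illustration, not a copy of the actual definition.

```julia
# Roughly what Base.sleep appears to do: create a libuv-backed Timer and
# block only the current task until it fires.
# `task_sleep` is a hypothetical name used for illustration.
function task_sleep(seconds::Real)
    seconds >= 0 || throw(ArgumentError("cannot sleep for $seconds seconds"))
    wait(Timer(seconds))
    return nothing
end

elapsed = @elapsed task_sleep(0.01)
println("asked for 0.01 s, waited ", elapsed, " s")
```

Because the wait goes through the scheduler and the event loop, other tasks can run in the meantime, but the resolution is bounded by what the timer backend offers.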

A limitation of Julia is that the Linux function nanosleep() is not well supported. Without any specialized hardware, the resolution WITHOUT busy waiting is about 1 microsecond, and an accuracy of 10 to 100 microseconds can easily be achieved even on a loaded system with the default kernel. It would be nice if this could be supported in the future, even if my bug report, Accuracy and resolution of sleep() on Linux should be improved · Issue #12770 · JuliaLang/julia · GitHub, is already 7 years old.
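In the meantime, nanosleep can be reached through ccall. The following is a minimal sketch, assuming a 64-bit Linux or macOS system where both fields of struct timespec are C longs. Timespec and nanosleep_ns are hypothetical names, not part of Base.

```julia
# Mirrors C's `struct timespec`. Assumption: time_t and long are both
# C longs, as on 64-bit Linux and macOS.
struct Timespec
    tv_sec::Clong
    tv_nsec::Clong
end

# Hypothetical helper: block the calling OS thread for `ns` nanoseconds.
# Returns 0 on success, or -1 if the call failed or was interrupted by a signal.
function nanosleep_ns(ns::Integer)
    sec, nsec = divrem(ns, 1_000_000_000)
    return ccall(:nanosleep, Cint, (Ref{Timespec}, Ptr{Cvoid}),
                 Timespec(sec, nsec), C_NULL)
end

if Sys.isunix()
    delay_ns = 1_000_000_000 ÷ 1200
    elapsed = @elapsed nanosleep_ns(delay_ns)
    println("asked for ", delay_ns / 1e9, " s, waited ", elapsed, " s")
end
```

Like Libc.systemsleep, this puts the whole OS thread to sleep, so other Julia tasks on that thread do not run during the wait.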

I disagree. The problems & arguments for why that’s not directly exposed, raised by very knowledgeable people in that issue, don’t just go away by waiting a few years. You’re not prevented from achieving that goal, should your use case fall into the very narrow band of situations where that behavior is both desirable & achievable due to available hardware support, which just doesn’t exist on general purpose hardware that julia has to support as the lowest common denominator.

The main problem with calling write is not that it’ll mess up concurrent writes to stdout. The deeper problem is that it’ll block the entire hardware thread and not yield to any other Julia task doing unrelated concurrent work.

Well, the wish to have good performance is NOT a very narrow band of situations…

“which just doesn’t exist on general purpose hardware”

This is just wrong. Any modern Desktop or Laptop supports timers with nanosecond or microsecond resolution, and any modern Linux can make use of them efficiently. Why do you repeat a statement that might have been true 20 years ago?

“julia has to support as the lowest common denominator”

Yes, and Julia COULD and SHOULD support the lowest common denominator and at the same time provide better performance/resolution on the platforms that offer it, as Python does (in this case).

Please don’t just pick parts of my answer to argue against a strawman. You’re intentionally omitting the part where I said that julia doesn’t prevent you from achieving your goal, while at the same time (seemingly) arguing that the average julia code requires such high resolution timers in the first place. The case presented here (“how fast can I read & spit out bytes without doing any work on them except for sleeping between read & write”) is so extremely niche that it cannot be considered representative.
