$ time julia +nightly --project=. -e 'for _ in 1:100000 print(Core.stdout,"Hello"); end' > /dev/null
real 0m0,331s
I.e. 0.331 - 0.247 = 0.084 s after subtracting startup.
$ time julia +nightly --project=. -e 'for _ in 1:1000000 print(Core.stdout,"Hello"); end' > /dev/null
real 0m0,965s
0.965 - 0.247 = 0.718 s, so it actually scales: less than 10x slower for 10x more prints, so far.
$ time julia +nightly --project=. -e 'for _ in 1:100000 print("Hello"); end' > /dev/null
real 0m0,263s
0.263 - 0.247 = 0.016 s, starting out much faster.
$ time julia +nightly --project=. -e 'for _ in 1:1000000 print("Hello"); end' > /dev/null
real 0m0,323s
0.323 - 0.247 = 0.076 s, meaning 4.75x slower for 10x more prints: very scalable. BOTH are scalable [the superlinear scalability seemingly has to do with the Linux kernel, because of syscalls, not Julia]. It was a mistake to think Core.stdout wasn't scalable; it is only less scalable than regular stdout, so my calculated 15x difference (and the widening gap) is/seems misleading.
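The per-print cost difference can be reproduced outside Julia: a buffered stream batches many small prints into one write(2) per buffer-full, while an unbuffered stream pays one syscall per print. A minimal Python sketch of that contrast (Python standing in for Julia here; the 32 KiB buffer size mirrors the chunk size seen later in the strace output):

```python
import os
import time

N = 100_000
devnull_fd = os.open(os.devnull, os.O_WRONLY)

# Unbuffered: one write(2) syscall per message (roughly what Core.stdout does).
t0 = time.perf_counter()
for _ in range(N):
    os.write(devnull_fd, b"Hello")
unbuffered = time.perf_counter() - t0

# Buffered: messages accumulate in a userspace buffer and are flushed
# in large chunks (roughly what Julia's default stdout does).
buffered_file = open(os.devnull, "wb", buffering=32768)
t0 = time.perf_counter()
for _ in range(N):
    buffered_file.write(b"Hello")
buffered_file.flush()
buffered = time.perf_counter() - t0

print(f"unbuffered: {unbuffered:.3f}s  buffered: {buffered:.3f}s")
os.close(devnull_fd)
buffered_file.close()
```

The buffered loop issues on the order of N*5/32768 syscalls instead of N, which is where most of the gap comes from.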
Startup cost (very similar for both) to subtract:
$ time julia +nightly --project=. -e 'print("Hello");' > /dev/null
real 0m0,247s
user 0m0,140s
sys 0m0,108s
$ time julia +nightly --project=. -e 'print(Core.stdout, "Hello");' > /dev/null
real 0m0,248s
user 0m0,158s
sys 0m0,092s
EDIT: ignore rest, I think I timed with sleep, will edit after retiming:
If I monitor the program with strace, I see:
...
write(14, "HelloHelloHelloHelloHelloHelloHe"..., 32765) = 32765
write(14, "HelloHelloHelloHelloHelloHelloHe"..., 32765) = 32765
write(14, "HelloHelloHelloHelloHelloHelloHe"..., 32765) = 32765
epoll_pwait(3, [], 1024, 0, NULL, 8) = 0
epoll_pwait(3, [], 1024, 0, NULL, 8) = 0
But then it stops for a bit, and then continues this loopy behavior. It seems very strange, as if the kernel is rate-limiting. But why should it, since I'm redirecting to /dev/null? [Though I AM outputting to the terminal for strace, not a lot each time seemingly, so it's unclear if I'm only seeing rate-limiting there..]
This shows clearly that Julia buffers stdout itself; with Core.stdout, however, I get a syscall for each Hello, which explains why it's slower. So that's not too surprising then; it's only surprising if it gets slower the more you output. Could the kernel be adjusting its rate-limiting to get more aggressive over time? I no longer believe Julia has a memory leak, and have likely ruled it out (monitoring the program externally, I see no increase in memory, though Julia starts very high and may not need to get more memory from the kernel for a while; not sure how to really track mallocs to libc that do not go to the kernel…).
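For the external monitoring mentioned above, one simple approach on Linux is to poll the process's resident set size from /proc, which shows memory the kernel has actually given the process (so, as noted, it won't catch mallocs served from libc's existing arenas). A rough sketch, assuming a Linux /proc filesystem; watching its own PID here, but you would pass the Julia process's PID instead:

```python
import os
import time

def rss_kib(pid: int) -> int:
    """Return the resident set size of `pid` in KiB, read from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # reported by the kernel in kB
    raise RuntimeError("VmRSS not found")

# Poll a few times; a true kernel-level leak shows up as steady growth here.
for _ in range(3):
    print(rss_kib(os.getpid()), "KiB")
    time.sleep(0.1)
```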
Intriguingly, these pauses for epoll_pwait (is it related to locking?) do NOT happen with Core.stdout, since then epoll_pwait is absent. So it's in a way better, just still slower since there are way more write(14, ..) calls.
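For context, an epoll_pwait with a zero timeout, as in the trace above, is just a non-blocking check for pending events on the event loop that Julia's buffered stdout runs through; Core.stdout writes to the fd directly and never touches that loop. A minimal Python sketch of the same epoll call pattern (Linux-only; the socketpair is just a stand-in fd to poll):

```python
import select
import socket

ep = select.epoll()
r, w = socket.socketpair()
ep.register(r.fileno(), select.EPOLLIN)

# Timeout 0 returns immediately, like the epoll_pwait(3, [], 1024, 0, ...) = 0
# lines in the strace output: nothing is ready yet, so the result is empty.
first = ep.poll(0)

w.send(b"x")
# Now the read end has data, so the same non-blocking poll reports one event.
second = ep.poll(0)

print(first, second)
ep.close()
r.close()
w.close()
```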