Makes sense given how the first script’s output shows up all at once. Terminal emulators are really not optimized for large quantities of text output…
I cannot test that, unfortunately.
A similar example has happened to me many times. Let’s say I am working with an array of 10 elements regularly. I am printing each item and in the end I print a value calculated in the loop. I realize I have a mistake in my computation algorithm and test with a large input to get a feeling of what might have happened wrong - thus I am only concerned about the final print and just ignore the preceding mess. But sure, you’re right that this is not the typical use case, @StefanKarpinski.
Yes, I have also tried this. It was my initial inspiration for what might be happening in Julia.
Thanks for bringing this up. If you encounter any other issues, don’t hesitate to post.
I want to add that on Windows this is dreadfully slow. I actually expected this (before I tried it).
The script from the first post takes 122 seconds in cmd.
PowerShell did not seem faster (EDIT: 400 s).
I also tried Windows Terminal (wt): it took ages as well (EDIT: 280 s).
I am not sure which terminal VS Code uses by default, but it was definitely the slowest (EDIT: 1414 s).
Notably, I did run some of these in parallel, so the numbers may not be fully independent.
I wonder if the size of the terminal (full screen, minimized, ‘normal’) has any impact…
Maybe someone with Windows could challenge/confirm my numbers?
VS Code uses a modified version of Xterm.js.
The size of the terminal window should have minimal impact: either the terminal supports fast rendering of large amounts of text regardless of window size, or it doesn't.
Does the Python version run faster? If so, it is most likely the aforementioned buffering.
I have found that using this script produces the same output but in hundreds of milliseconds:
    a = []
    for i in 1:1000000
        push!(a, i)
    end
    println("Any[" * join(a, ", ") * "]")
Thus, it seems like an issue with printing arrays to me rather than IO in general. I am currently going through the code to try to narrow it down. Should I continue on GitHub or here?
I believe that the problem is that when you manually join the numbers into a string, the content of the string is printed with a single write call, whereas when printing a vector you end up calling write once for each number in the vector.
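To make that difference concrete, here is a rough sketch that counts how many write calls reach the underlying stream for each style. `CountingIO` is a throwaway helper of my own, not part of Base:

```julia
# CountingIO: a hypothetical wrapper that tallies how many write calls
# are forwarded to the underlying stream.
mutable struct CountingIO <: IO
    io::IO
    writes::Int
end
CountingIO(io::IO) = CountingIO(io, 0)

Base.write(c::CountingIO, b::UInt8) = (c.writes += 1; write(c.io, b))
function Base.unsafe_write(c::CountingIO, p::Ptr{UInt8}, n::UInt)
    c.writes += 1
    return unsafe_write(c.io, p, n)
end

a = collect(1:1000)

# Printing the vector directly: the element-wise path.
per_element = CountingIO(IOBuffer())
println(per_element, a)

# Joining first: the whole text arrives as essentially one write.
joined = CountingIO(IOBuffer())
println(joined, "[" * join(a, ", ") * "]")

println(per_element.writes, " writes vs ", joined.writes, " writes")
```

On a real terminal each of those forwarded writes can become a syscall, which is presumably why the joined version feels so much faster.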
Yes, Python is actually blazing fast: about seven seconds in cmd according to my watch (my Python know-how is near zero).
The code you posted above takes 8 seconds in Julia/cmd, so this is comparable to Python for me.
The only time I print tons of data to the REPL is by accident. When I am impatient, I just reach for good old Ctrl-C.
So if this can be fixed trivially, then that would be nice, otherwise I don’t think this is something that needs to be optimized.
So, as expected, when printing, each object passed to Python's print is first converted to its string form.
The flow is roughly
PyFile_WriteObject -> … ->
Step 4 of that chain returns the complete list converted to a string, regardless of the type of the objects in the list.
I do not know anything about Julia or Python internals, and proposing a fix is beyond my capacity. I suppose that building the string in memory would raise other challenges. However, this problem would likely be solved by not calling write for each element, but rather building the entire string representation first, as Python does.
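A minimal sketch of that suggestion, assuming nothing beyond Base: render the whole vector into one in-memory String with sprint, then hand it to the stream in a single write:

```julia
a = collect(1:1_000_000)

# What `println(a)` effectively does today: walk the vector and issue
# many small writes, one (or more) per element and separator.

# The buffered alternative: build the full text representation first...
s = sprint(show, a)      # "[1, 2, 3, ...]" as one String in memory
# ...then emit it with a single write, plus one newline.
write(stdout, s)
println(stdout)
```

The obvious trade-off is the temporary allocation: for very large arrays the whole representation has to fit in memory before anything is written.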
A different use of printing to stdout/stderr is to pipe output through
less (or similar pager) to be able to inspect the output interactively without having to do a full (long) run. E.g. grep for certain strings in the output to check a condition, then letting the program continue until the next matching line, etc. I find this very useful during development.
In this case you will also hit upon the slower output of Julia compared to Python. A simple (rough, due to the user keyboard interaction) test: print one integer per line for 1 to 10,000,000, pipe through less, then use q to jump to the end of the output and let the script finish:
    paulm@cmstorm 09:38:/data/examples/julia$ cat print_integers.py
    for i in range(1, 10000001):
        print(i)
    paulm@cmstorm 09:38:/data/examples/julia$ cat print_integers.jl
    for i in 1 : 10000000
        println(i)
    end
    paulm@cmstorm 09:38:/data/examples/julia$ time python print_integers.py | less
    <press ">" followed by "q" and wait for script to finish>

    real    0m3.474s
    user    0m3.723s
    sys     0m0.080s
    paulm@cmstorm 09:38:/data/examples/julia$ time julia -O3 print_integers.jl | less
    <press ">" followed by "q" and wait for script to finish>

    real    0m53.254s
    user    0m39.575s
    sys     0m47.758s
This is actually quite an eye-opener, to see the insane number of syscalls Julia does compared to Python:
    paulm@cmstorm 09:43:/data/examples/julia$ strace -c -o out.python python print_integers.py | less
    <press ">" followed by "q" and wait for script to finish>
    paulm@cmstorm 09:43:/data/examples/julia$ strace -c -o out.julia julia -O3 print_integers.jl | less
    <press ">" followed by "q" and wait for script to finish>
    paulm@cmstorm 09:54:/data/examples/julia$ grep write out.python
     18.68    0.001243           0      9626           write
    paulm@cmstorm 09:54:/data/examples/julia$ grep wait out.python
    paulm@cmstorm 09:54:/data/examples/julia$ grep write out.julia
     41.49   49.752976           2  20000003           write
      0.00    0.000010           0        13           pwrite64
    paulm@cmstorm 09:55:/data/examples/julia$ grep wait out.julia
     58.49   70.127743           1  40000001           epoll_pwait
println is slower than print in Julia, for the same loop (x XXX).
In Python you are using print, so it might be hard to compare directly!
If I join `a` first, I get a 100x performance factor.
What is your point?
Doesn't println in Julia do the same?
That’s not very useful, as that won’t work in general for a program that has prints (or log calls) scattered all over the place.
Well, yes, it must be suboptimal in any case.
I can’t really contribute to the discussion, just wanted to show this benchmark:
    julia> using BenchmarkTools

    julia> @benchmark println($"1") # printing a ton
    BenchmarkTools.Trial:
      memory estimate:  144 bytes
      allocs estimate:  6
      --------------
      minimum time:     123.100 μs (0.00% GC)
      median time:      134.999 μs (0.00% GC)
      mean time:        233.607 μs (0.00% GC)
      maximum time:     10.268 ms (0.00% GC)
      --------------
      samples:          10000
      evals/sample:     1

    julia> @benchmark print($"1\n") # printing a ton
    BenchmarkTools.Trial:
      memory estimate:  48 bytes
      allocs estimate:  2
      --------------
      minimum time:     69.100 μs (0.00% GC)
      median time:      76.000 μs (0.00% GC)
      mean time:        122.459 μs (0.00% GC)
      maximum time:     9.036 ms (0.00% GC)
      --------------
      samples:          10000
      evals/sample:     1
I get that too on Julia 1.5.1, but on my recent Julia master (6 days old) I get a regression:
    julia> @time println("1")
    1
      0.000082 seconds (10 allocations: 240 bytes)

What happens behind the scenes is (while with fewer allocations, for some reason):

    julia> @time print(stdout, "1", '\n')
    1
      0.000053 seconds (8 allocations: 208 bytes)

But even better for allocations is (the time is worse there, but that is probably misleading; when GC kicks in it should be better):

    julia> @time print(stdout, "1", "\n")  # both should work fast, but I'm a bit surprised this isn't the slower version
    1
      0.000066 seconds (4 allocations: 96 bytes)

Both are worse than (almost ideal, though I am not sure zero allocations is possible):

    julia> @time print("1\n")
    1
      0.000054 seconds (2 allocations: 48 bytes)
Well, yes, I already demonstrated joining the array; please read the thread.