Fastest way to print many numbers to text file

PeterB · July 30, 2021, 1:22am

EDIT: I found one speedup myself (see my answer below) but still interested in replies…

I have a critical loop which looks like this:

for j in 1:about_1000
    x = a_simple_calculation()
    @printf(op, " %g", x)   # this is line 241
end
@printf(op, "\n")

All in all, the loop body runs about 12 million times and takes about 27 seconds in total, i.e. about 2 microseconds per iteration. The profiler confirms that the slowest part is the @printf line inside the loop, not the calculation of x. The profiler’s output for this part was:

10007 rfdata_core.jl:241; #rfdata_core#7(::Int64, ::Function, ::Array{Float64,1}, ::Bool, ::Array{Int64,1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Int64, ::...
 [many lines later...]
 6408 ./tuple.jl:60; indexed_iterate(::Tuple{Int32,Int32,Bool}, ::Int64, ::Int64)
  3    ./int.jl:53; +
  2105 ./tuple.jl:60; indexed_iterate
   3 ./int.jl:53; +

I also tried building a string (basically replacing the first @printf with the appropriate @sprintf command, and writing a line at a time), and that was actually slightly slower. So that suggests the problem is not the print buffering itself.

So… is there a fastest way to print (or make strings) in Julia? Or some other obvious improvement to this code?

PeterB · July 30, 2021, 1:36am

Using join() improved it by about a factor of 5 (down to about 5 seconds from 27), but I am still interested in speeding it up further if there is something obvious. I did this:

x = zeros(about_1000)
for j in 1:about_1000
    x[j] = a_simple_calculation()
end
s = join(x, " ")
@printf(op, " %s\n", s)

p.s. It was not character-for-character the same, because join() always shows floats as floats where @printf with “%g” does not (e.g. if the data is 0, join prints “0.0” but @printf with %g prints “0”), but the data appears to be numerically identical, which is what matters.
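The formatting difference described here is easy to reproduce. A minimal sketch, with made-up sample values:

```julia
using Printf

vals = [0.0, 1.5, 100.0]   # made-up sample values

# join() uses Julia's default float printing, which always keeps a decimal point
s_join = join(vals, " ")

# %g drops the decimal point and trailing zeros when they are not needed
s_g = join((@sprintf("%g", v) for v in vals), " ")

println(s_join)   # 0.0 1.5 100.0
println(s_g)      # 0 1.5 100
```

Both strings parse back to the same numbers, so the difference only matters if the reader of the file is sensitive to the exact text.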

dpsanders · July 30, 2021, 1:39am

What happens if you just use print instead of printf? Also, where are you printing to?

You could try using IOBuffer.

PeterB · July 30, 2021, 2:11am

On my second example (using join and then printing a line or part of a line at a time), print is comparable in speed to @printf. In fact looking at the profiler, the join takes far longer (about 30 times as long) than either print or @printf. So the critical path is creating the string, not printing it.

So my question now becomes “What is the fastest way to put many numbers into a string?”, and the current winner is join, but of course I am greedy so I would like to know if there might be something better…
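For anyone wanting to compare the two approaches on their own machine, here is a rough timing sketch. The function names and the random test data are assumptions for illustration, not the original code, and the numbers will depend on the Julia version and hardware.

```julia
using Printf

# Hypothetical helper: format every value with %g into an in-memory buffer
function via_sprintf(x)
    io = IOBuffer()
    for v in x
        @printf(io, " %g", v)
    end
    return String(take!(io))
end

# Hypothetical helper: let join() do the default float formatting
via_join(x) = join(x, " ")

x = rand(1000)                 # stand-in for one row of results

via_sprintf(x); via_join(x)    # warm up so compilation is not timed

t_printf = @elapsed for _ in 1:1000; via_sprintf(x); end
t_join   = @elapsed for _ in 1:1000; via_join(x); end

println("@printf: ", t_printf, " s   join: ", t_join, " s")
```

A package such as BenchmarkTools.jl would give more reliable numbers, but @elapsed is enough for a first comparison.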

PeterB · July 30, 2021, 2:42am

Sorry I missed one of your questions. I am writing to a file, which is read by a 3rd party Python module. (And then immediately deleted, so it is a temporary file).

I have never used IOBuffer before. As far as I can tell, it is a file-like alternative to a temporary file? But I am not sure it would help in this case without a major rewrite.

Sukera · July 30, 2021, 4:20am

An IOBuffer is an in-memory buffer/replacement for IO, so you can write to it like printing to stdout. The advantage is that you don’t hit the disk but stay in memory, so writing to it is usually faster than writing to disk.

Once you’re done with writing your results to it, you can consume all buffered data at once and write it out to disk in one swoop with e.g. write(stdout, take!(io_buf)).
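A minimal sketch of that pattern, writing to a file instead of stdout. The values and the temporary path are made up for illustration:

```julia
io_buf = IOBuffer()            # in-memory, file-like buffer

for x in (1.5, 2.0, 3.25)      # made-up values
    print(io_buf, ' ', x)      # same call you would use with a file handle
end
print(io_buf, '\n')

bytes = take!(io_buf)          # grab everything written so far and empty the buffer
path = tempname()              # hypothetical output location
write(path, bytes)             # one write to disk

println(repr(read(path, String)))   # " 1.5 2.0 3.25\n"
```

After take!, the buffer is empty and can be reused for the next chunk of output.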

Oscar_Smith · July 30, 2021, 4:22am

Are you sure you want to put numbers into a string? A binary file format (or just direct memory transfer) will be much faster.
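As a rough illustration of the binary route, Julia can dump the raw Float64 bytes of an array and read them back without any text formatting. The data and file name below are placeholders:

```julia
x = rand(1000)                       # stand-in for the computed values
path = tempname()                    # hypothetical file name

open(path, "w") do io
    write(io, x)                     # raw Float64 bytes, 8 per value
end

y = Vector{Float64}(undef, length(x))
open(path, "r") do io
    read!(io, y)                     # fill y with the same bytes
end

println(y == x)                      # true
```

Whether the consumer can read such a file is a separate question. On the Python side, something like numpy.fromfile with a float64 dtype should be able to load it, assuming the same byte order on both ends.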

PeterB · July 30, 2021, 5:08am

Yes, I’m sure. The use of join() means the write-out-the-numbers part of the code now takes about 14% of the total run time, compared to about 64% before. I’m sure I could get that 14% down by doing a direct memory transfer (using JuliaPy I guess) or even rewriting the Python 3rd party module in Julia, but I’m up against diminishing returns now.

Sukera · July 30, 2021, 9:18am

I’d like to add here that constructing the String may not be necessary at all when printing to an IOBuffer and writing its entire content to disk afterwards. Strings in Julia are immutable, so joining them requires creating a new one. Using an IOBuffer should avoid all of that, and the output will still have the same representation as if you had printed it directly to a file.

rafael.guerra · July 30, 2021, 10:26am

@sukera, is this as per your advice, and is it the “most efficient” way to print formatted numbers to an ASCII file, or is it better to use some package?

using Printf
# some example
io_buf = IOBuffer()
x = zeros(1000)
for j in 1:1000
    x[j] = j^2/π
    @printf(io_buf, "%10.3f\n", x[j])
end
write("out_x.txt", take!(io_buf))

Sukera · July 30, 2021, 11:02am

Yeah that looks good to me. Of course, only benchmarking can be the final judge for this, but from my intuition this should be pretty much ideal (apart from memory mapping the binary data instead of printing & parsing).

This setup obviously won’t work in a streaming setting, since you’d only write to disk/out of memory at the end, but for this use case of creating a file it should be fine. It’ll also break if your data is too large to fit in memory, but at that point you can’t avoid hitting disk anyway. Batching and partial writing to a known byte offset in a file could be done as well, though once you’re in that territory, it’s really just because you have way too much data to communicate.
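If memory does become a concern, one simple form of batching is to flush the buffer to the file every few rows. This is only a sketch with made-up sizes and a placeholder calculation, and it appends sequentially rather than seeking to a byte offset:

```julia
using Printf

path = tempname()          # hypothetical output file
nrows, ncols = 50, 20      # made-up sizes
flush_every = 10           # hypothetical batch size, in rows

open(path, "w") do op
    buf = IOBuffer()
    for i in 1:nrows
        for j in 1:ncols
            @printf(buf, " %g", i * j / 7)   # stand-in for the real calculation
        end
        print(buf, '\n')
        if i % flush_every == 0
            write(op, take!(buf))            # move the finished batch to disk
        end
    end
    write(op, take!(buf))                    # flush whatever is left
end

println(countlines(path))                    # 50
```

The batch size trades memory for the number of write calls, so it is worth measuring rather than guessing.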

stevengj · July 30, 2021, 11:58am

The CSV.jl package seems to be at least twice as fast on my machine. Part of the reason is perhaps that @printf still does heap allocations, while CSV’s inner loops work hard to be allocation-free.

Unfortunately, CSV.jl doesn’t implement a method that works on simple arrays for some reason (CSV.jl#861), so you have to wrap the array in “table” (which itself requires a matrix (Tables.jl#243), not a vector, so you have to do a reshape):

CSV.write("foo.dat", CSV.Tables.table(reshape(x, :, 1)))

nilshg · July 30, 2021, 12:11pm

Couldn’t it just be a NamedTuple, CSV.write("foo.dat", (x = x,))?

Sukera · July 30, 2021, 12:11pm

Yes, under the assumption that it’s possible to keep the full matrix in memory for CSV.jl to work on, I’m not surprised it’s faster! As far as I remember, CSV.jl already does batching to some extent. That’s not what the OP asked about though: in that post there was a constraint of doing some calculation in each loop iteration.

If keeping all results at the same time in memory is an option, using CSV.jl or similar is going to be faster (though I admit that it’s kind of disappointing to see heap allocations from @sprintf…)

rafael.guerra · July 30, 2021, 12:23pm

Well, the CSV alternative does not seem to format the numbers in the text file. In that case, what is the advantage of CSV versus the simpler:

print(io_buf, x)
write("IOBuffer_out_x.txt", take!(io_buf))

PeterB · July 30, 2021, 12:23pm

Yes, that does the write about another 50% faster (i.e. it takes about 2/3 as long as using join() and writing a line at a time). I wasn’t going to do any more optimisation because it was already fast enough (see my comment 7), but that was a very easy change to do. Thank you!

I might look at CSV.jl later, but it’s getting late here.

DNF · July 30, 2021, 8:46pm

What’s the purpose of the array here?

rafael.guerra · July 30, 2021, 9:02pm

Seeing you coming, said to myself: is there a collect somewhere, again?.. Uff, no…

It is just an example in line with OP’s problem. The key point is that Printf does not broadcast to arrays and a loop is required.
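To make the point concrete, here is a small sketch of the explicit loop over an existing array, using a three-element made-up array so the output is easy to check:

```julia
using Printf

x = [j^2 / π for j in 1:3]      # small made-up array

io_buf = IOBuffer()
for v in x                       # explicit loop: one @printf call per element
    @printf(io_buf, "%10.3f\n", v)
end

out = String(take!(io_buf))
print(out)
#      0.318
#      1.273
#      2.865
```

If the values are not needed afterwards, the same loop could compute each value and print it directly without storing it in x first.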

DNF · July 30, 2021, 9:51pm

I wouldn’t necessarily bother but since it said

I wondered why it’s there. Seemed like a mistake.
