This was unexpected, but show(io::IO, i::Int) does allocate.
It’s clearly happening because printing an integer first converts it to a string and then writes that string to io (see show.jl at v1.6.0 in the JuliaLang/julia repository on GitHub).
Is there any rationale for doing it this way rather than the other way around? Printing the number to an IOBuffer and then converting that to a string seems like a more straightforward way to implement it.
The performance of an IO-based version should theoretically be superior, but as usual it depends on what you measure and how.
My main question, I guess, is whether anyone has run into similar issues, and whether you think there’s room for improving standard integer serialization.
function dec(x::Unsigned, pad::Int, neg::Bool)
    n = neg + ndigits(x, pad=pad)
    io = IOBuffer(fill(UInt8(0), n); write = true, maxsize = n)
    dec_io(io, x, pad, neg)
    String(take!(io))
end

dec_io(io, x, pad, neg) = print(io, '1', '2', '3') # for x = 123
With dec structured like this, you could reuse dec_io() to implement show(io, i::Int) as well as more complex things (for example date serialization, which is the actual problem I’m looking at).
P.S. This specific implementation will not be any faster than the current one; it is for illustration purposes only.
If you look at the current implementation of dec, it computes the digits from right to left, so you’d need a completely different algorithm to output the digits left-to-right into an io stream.
(One option would be to pre-allocate a per-thread buffer, which we used to do for printf and grisu but no longer do for some reason.)
Sure, the algorithm will be different. You don’t need to allocate any buffers; just compute the digits in the opposite order, from left to right.
My point is that converting int to a string is inefficient because it requires memory allocation. You could print the integer without allocating any memory.
In my perf tests it was pretty hard to beat the current implementation of show(io, int) when measured in isolation, but as part of a more complex show(io, date), the allocation-free version does much better.
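To make the left-to-right idea concrete, here is a minimal sketch of one way to do it. The name print_dec is hypothetical (it is not part of Base), padding is ignored, and the code only targets machine integer types. Whether it ends up allocation-free in practice depends on the io type and should be measured.

```julia
# Hypothetical helper, not part of Base: writes the decimal digits of x to io
# from most significant to least significant, without building a String.
# Intended for machine integer types (UInt8 ... UInt128).
function print_dec(io::IO, x::Unsigned)
    # Find the largest power of 10 that does not exceed x (1 when x == 0).
    # p is only multiplied while p * 10 <= x, so it cannot overflow.
    p = one(x)
    while x ÷ p >= 10
        p *= oftype(x, 10)
    end
    # Peel off the leading digit at each step and write it as an ASCII byte.
    while true
        d, x = divrem(x, p)
        write(io, UInt8('0') + (d % UInt8))
        p == 1 && break
        p ÷= oftype(x, 10)
    end
    return nothing
end

# Signed wrapper: write the sign, then the magnitude as an unsigned value.
# unsigned(abs(x)) also gives the right magnitude for typemin.
function print_dec(io::IO, x::Signed)
    x < 0 && write(io, UInt8('-'))
    print_dec(io, unsigned(abs(x)))
end
```

The cost is one extra division per digit (to find the leading power of 10 and then to step it down), which is the trade-off against the right-to-left approach.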
The nice thing about computing digits from right to left is that it is pretty easy to come up with an algorithm that works for any precision simply by a sequence of divrem(n, 10) operations (actually Julia uses divrem(n, 100) to get 2 digits at a time), whereas from left to right it seems trickier to do efficiently.
Note also that we similarly need a buffer for float-to-string conversion, since the Ryu algorithm that we employ does not compute digits from left to right.
It seems like the simplest solution would be to pre-allocate a buffer array (per thread). Then the show method could output bytes directly from the buffer rather than constructing a String.
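A rough sketch of that idea, restricted to 64-bit integers, might look like the following. The names DEC_BUF and show_buffered are hypothetical, and a single global buffer is used only to keep the example short; it is not thread safe, so a real version would need one buffer per thread or task.

```julia
# Hypothetical scratch space. 20 bytes is enough for the longest UInt64
# (20 digits) and for the longest Int64 including its sign (1 + 19).
# NOT thread safe: shared by every caller.
const DEC_BUF = Vector{UInt8}(undef, 20)

# Hypothetical alternative to show(io, x) for 64-bit integers.
function show_buffered(io::IO, x::Union{Int64,UInt64})
    buf = DEC_BUF
    u = unsigned(abs(x))      # magnitude as UInt64 (also correct for typemin)
    i = length(buf)
    # Fill the buffer from right to left with repeated divrem by 10.
    while true
        u, r = divrem(u, UInt64(10))
        buf[i] = UInt8('0') + (r % UInt8)
        u == 0 && break
        i -= 1
    end
    if x < 0
        i -= 1
        buf[i] = UInt8('-')
    end
    # Write the used tail of the buffer to io, with no String in between.
    write(io, view(buf, i:length(buf)))
    return nothing
end
```

This keeps the simple right-to-left digit loop and only changes where the bytes are staged before they reach io.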
Sure, but maybe first try to put together a benchmark demonstrating the benefit of a pre-allocated buffer (you could just hack an alternative show by copy-and-pasting the Base code, and not worrying about thread safety).
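A minimal harness for such a comparison could look like the sketch below. The helper names bytes_allocated and time_per_call are hypothetical, and writing to devnull is an assumption made here so that growth of an IOBuffer is not counted as an allocation. Any alternative show can be passed in place of show.

```julia
# Hypothetical helper: bytes allocated by one call to f(io, x), measured after
# a warm-up call so that compilation is not counted.
function bytes_allocated(f, io::IO, x)
    f(io, x)                      # warm-up
    return @allocated f(io, x)
end

# Hypothetical helper: average wall-clock seconds per call over n calls.
function time_per_call(f, io::IO, x; n::Int = 10^6)
    f(io, x)                      # warm-up
    t = @elapsed for _ in 1:n
        f(io, x)
    end
    return t / n
end

# devnull discards everything written to it, so only the work done by
# the show method itself is measured.
println("bytes per call:   ", bytes_allocated(show, devnull, 123456))
println("seconds per call: ", time_per_call(show, devnull, 123456))
```

For more careful timing a dedicated benchmarking package would be the better tool, and, as noted earlier in the thread, results in isolation may differ from results inside a larger show method.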