show(io, i::Int) allocates

This was unexpected, but show(io::IO, i::Int) does allocate.
It's clearly happening because printing an integer first converts it to a string and then writes that string to io (see julia/show.jl at v1.6.0 · JuliaLang/julia · GitHub).

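using BenchmarkTools
io = IOBuffer()  # assumed setup; the original post does not show how io was created
@btime print(io, 1)
# > 100.949 ns (2 allocations: 96 bytes)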

Is there a rationale for doing it this way rather than the other way around? Printing the number to an IOBuffer and then converting that to a string seems like a more straightforward implementation.

The performance of an IO-based version should theoretically be superior, but as usual it depends on what you measure and how :)

I guess my main question is whether anyone has run into similar issues, and whether you think there is room for improving the standard integer serialization.

Maybe you can give a short example of the other way. Remember that "123" and writing 1, 2 and 3 have totally different representations:

julia> codeunits("123")
3-element Base.CodeUnits{UInt8, String}:
 0x31
 0x32
 0x33

I had something like this in mind, based on the current dec() implementation (julia/intfuncs.jl at v1.6.0 · JuliaLang/julia · GitHub):

function dec(x::Unsigned, pad::Int, neg::Bool)
    n = neg + ndigits(x, pad=pad)
    io = IOBuffer(fill(UInt8(0), n); write = true, maxsize = n)
    dec_io(io, x, pad, neg)
    String(take!(io))
end

dec_io(io, x, pad, neg) = print(io, '1', '2', '3') # for x = 123

Then you could reuse dec_io() to implement show(io, i::Int) as well as more complex things (for example date serialization, which is the actual problem I'm looking at).

P.S. This specific implementation will not be any faster than the current one; it is only meant as an illustration.

I am not entirely sure why print(io, "123") would ever produce anything different from print(io, '1', '2', '3'). Maybe I'm just clueless.
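A quick way to check this is to write both forms into separate buffers and compare the raw bytes. This is only a small sketch; the variable names `a`, `b`, `bytes_a` and `bytes_b` are made up for illustration.

```julia
# Write the same three digits as one String and as three Chars.
a = IOBuffer()
print(a, "123")

b = IOBuffer()
print(b, '1', '2', '3')

bytes_a = take!(a)   # raw bytes written by the String version
bytes_b = take!(b)   # raw bytes written by the Char version

# Both buffers hold the UTF-8 code units 0x31, 0x32, 0x33.
same_bytes = bytes_a == bytes_b
```

So the output is identical; the difference under discussion is how the digits get produced, not what ends up in the stream.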

If you look at the current implementation of dec, it computes the digits from right to left, so you’d need a completely different algorithm to output the digits left-to-right into an io stream.

(One option would be to pre-allocate a per-thread buffer, which we used to do for printf and grisu but no longer do for some reason.)

Sure, the algorithm will be different. You don’t need to allocate any buffers, just compute digits in reverse order.

My point is that converting int to a string is inefficient because it requires memory allocation. You could print the integer without allocating any memory.
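To make the idea concrete, here is a rough sketch of printing digits left to right without an intermediate String. It is not the Base algorithm: `print_dec` is a hypothetical name, and the approach (find the largest power of ten not exceeding `x`, then peel off leading digits) is just one simple option. It pays for the missing buffer with a second pass of divisions, and writing to `io` can of course still allocate if the stream itself has to grow.

```julia
# Hypothetical helper, not part of Base: write the decimal digits of x
# to io from most significant to least significant, one byte at a time.
function print_dec(io::IO, x::Unsigned)
    ten = oftype(x, 10)
    p = one(x)
    # Largest power of ten <= x. The test x ÷ p >= ten guarantees
    # that p * ten <= x, so p never overflows.
    while x ÷ p >= ten
        p *= ten
    end
    while p > 0
        d, x = divrem(x, p)                  # d is the leading digit
        write(io, UInt8('0') + (d % UInt8))  # ASCII digit
        p ÷= ten
    end
    return nothing
end

# Signed wrapper: emit the sign, then print the magnitude.
# unsigned(abs(x)) also yields the right magnitude for typemin(Int).
function print_dec(io::IO, x::Int)
    x < 0 && write(io, UInt8('-'))
    print_dec(io, unsigned(abs(x)))
end
```

For example, `sprint(print_dec, -45)` should give the same text as `string(-45)`.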

In my performance tests it was pretty hard to beat the current implementation of show(io, int) when measured in isolation, but as part of a more complex show(io, date), an allocation-free version does much better.

The nice thing about computing digits from right to left is that it is pretty easy to come up with an algorithm that works for any precision simply by a sequence of divrem(n, 10) operations (actually Julia uses divrem(n, 100) to get 2 digits at a time), whereas from left to right it seems trickier to do efficiently.
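As a simplified illustration of the right-to-left approach (one digit per step, not the two-digits-at-a-time variant that Base uses), something like the following works for any unsigned width. The name `dec_rtl` is hypothetical.

```julia
# Hypothetical helper: build the decimal string of x by filling a byte
# buffer from the last position backwards with repeated divrem(x, 10).
function dec_rtl(x::Unsigned)
    n = ndigits(x)                  # digit count, 1 for x == 0
    buf = Vector{UInt8}(undef, n)   # the buffer this approach needs
    for i in n:-1:1
        x, d = divrem(x, oftype(x, 10))    # d is the least significant digit
        buf[i] = UInt8('0') + (d % UInt8)  # ASCII digit
    end
    return String(buf)
end
```

The least significant digit comes out first, which is why the result has to be staged in a buffer before it can be written in reading order.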

Note also that we similarly need a buffer for float-to-string conversion, since the Ryu algorithm that we employ does not compute digits from left to right.

It seems like the simplest solution would be to pre-allocate a buffer array (per thread). Then the show method could output bytes directly from the buffer rather than constructing a String.
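A minimal sketch of that idea, with loud caveats: `DIGIT_BUF` and `show_dec` are hypothetical names, the code is copied from nothing in Base, and it uses a single shared buffer, so it is not thread-safe. A per-thread version would keep one such buffer per thread and pick the right one on entry.

```julia
# One shared scratch buffer; 20 bytes cover a 64-bit Int
# (19 digits plus an optional '-'). NOT thread-safe as written.
const DIGIT_BUF = Vector{UInt8}(undef, 20)

# Hypothetical alternative to show(io, x::Int): fill the scratch buffer
# from right to left, then write the used tail directly to io.
function show_dec(io::IO, x::Int)
    buf = DIGIT_BUF
    u = unsigned(abs(x))    # magnitude; also correct for typemin(Int)
    i = length(buf)
    while true
        u, d = divrem(u, oftype(u, 10))
        buf[i] = UInt8('0') + (d % UInt8)
        i -= 1
        u == 0 && break
    end
    if x < 0
        buf[i] = UInt8('-')
        i -= 1
    end
    # Output the bytes buf[i+1:end] without constructing a String.
    GC.@preserve buf unsafe_write(io, pointer(buf, i + 1), UInt(length(buf) - i))
    return nothing
end
```

Comparing something like `@allocated show_dec(io, 12345)` against `@allocated print(io, 12345)` on a pre-sized IOBuffer would be one way to see whether the String really was the source of the allocations.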

Sure, that's more versatile, and I'm guessing a smaller change too, so I'm happy with that as well.

Should I create an issue in Julia GitHub for this?

Sure, but maybe first try to put together a benchmark demonstrating the benefit of a pre-allocated buffer (you could just hack an alternative show by copy-and-pasting the Base code, and not worrying about thread safety).