Would it be interesting to optimize print(io, xs...) to be type stable?

I noticed that the varargs version of print/println is not type stable, and it hurts performance. Some test code is below. Interestingly, map, foreach, and fold are implemented in a type-stable way, so println could probably just be a foreach call inside a lock.
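For concreteness, here is a minimal sketch of the "foreach inside a lock" idea. The name `myprintln` is made up, and it relies on Base's `lock`/`unlock` methods for `IO` (which are no-ops for IOs that don't need locking); this is not how Base actually implements println.

```julia
# Hypothetical foreach-based println (sketch, not Base's implementation).
function myprintln(io::IO, xs...)
    lock(io)   # Base provides lock/unlock fallbacks for IO
    try
        foreach(x -> print(io, x), xs)
        print(io, '\n')
    finally
        unlock(io)
    end
    return nothing
end
```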

The custom show methods aren’t directly related: I discovered they were allocating (there’s an issue about it), got sidetracked, and didn’t want that to affect these timings.

using BenchmarkTools

function testPrint()
    io = Base.devnull
    f3 = x -> print(io, x)
    f4 = x -> print(io, x)
    tup = MyTuple((:d, MyInt(7)))
    n1 = MyInt(1)
    n2 = MyInt(18)

    println("#### print")
    display(@benchmark print($io, "a", $n1, :c, $n2, $tup))
    # median ≈ 213 ns

    println("\n#### testVargs1:")
    display(@benchmark testVargs1($io, "a", $n1, :c, $n2, $tup))
    # median ≈ 190 ns : Same as print, but doesn't lock so a touch faster.

    println("\n#### testVargs2:")
    display(@benchmark testVargs2($io, "a", $n1, :c, $n2, $tup))
    # median ≈ 99 ns

    println("\n#### foreach:")
    display(@benchmark foreach($f3, $("a", n1, :c, n2, tup)))
    println("\n#### map:")
    display(@benchmark map($f4, $("a", n1, :c, n2, tup)))

    # println("#### print 1 arg")
    # display(@benchmark print($io, "a"))
end

function testVargs1(io, args...)
    for x in args
        print(io, x)
    end
end

@inline testVargs2(io, arg) = print(io, arg)
@inline function testVargs2(io, arg1, args...)
    print(io, arg1)
    @inline testVargs2(io, args...)  # callsite @inline needs Julia ≥ 1.8
end

struct MyInt
    x::Int
end
struct MyTuple{T}
    x::T
end
@inline Base.show(io::IO, n::MyInt) = show(io, Char(0x30 + n.x % 10))
testMyInt(io) = show(io, MyInt(15))
@inline function Base.show(io::IO, x::MyTuple)
    # This only handles 2 tuples, but that's enough for this test
    print(io, '(')
    print(io, first(x.x))
    print(io, ',')
    print(io, last(x.x))
    print(io, ')')
end
testMyTuple(io) = show(io, MyTuple((:a, MyInt(15))))

In my opinion, probably not. In cases where performance matters, you can print one argument at a time and make everything inferrable. The problem with making it type-stable is excess specialization and longer compile times. The latency may be less of an argument now that we can cache native code, of course. But you can’t get away from larger memory requirements: if you have k types, then print statements with N arguments could require up to k^N specializations. If these were all realized, it doesn’t take very large k and N to eat all the RAM in typical computers. That’s a terrible price to pay for a bit better performance, when there are other ways you can get the same performance benefit.
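You can watch the specialization count grow yourself. The sketch below uses a made-up vararg function `g` and `Base.specializations` (available as an API from Julia 1.10 or so) to count how many method instances get compiled:

```julia
# Hypothetical vararg function: each new combination of argument
# types compiles a fresh specialization.
g(xs...) = length(xs)

g(1); g(1.0); g(1, 2); g(1, "a")   # four distinct signatures

m = only(methods(g))
n = length(collect(Base.specializations(m)))   # Julia ≥ 1.10
# n is at least 4 here. With k possible types and N-argument calls,
# the count can grow toward k^N: e.g. 10 types and 5 arguments
# allow up to 10^5 = 100_000 specializations.
```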

You can experiment with “what life would be like with an inferrable vararg print” by defining

myprint(io::IO, arg) = print(io, arg)
@inline myprint(io::IO, arg1, arg2...) = (myprint(io, arg1); myprint(io, arg2...))

If you find yourself liking it, feel free to use it in your own code. But before you get too excited, do make sure you test something like myprint(stdout, [rand(Char) for _ = 1:1000]...) (though there are tricks for addressing that, search Julia’s base/ for Any16).
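A rough imitation of that trick, with hypothetical names: Base dispatches long tuples via a type alias (the Any16 you'd find by searching), but the effect can be sketched with a length check that the compiler folds away — short argument lists get the inferrable recursion, long ones fall back to a single plain loop so specialization depth stays bounded.

```julia
myprint2(io::IO, arg) = print(io, arg)
@inline function myprint2(io::IO, arg1, args...)
    print(io, arg1)
    if length(args) > 16
        # Long argument lists: plain non-recursive loop, so one
        # specialization covers them all beyond this point.
        for x in args
            print(io, x)
        end
    else
        myprint2(io, args...)
    end
    return nothing
end
```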


Thank you for the reply. A helper like that could live in application code and be used only in the places that profiling shows need the extra performance.