String optimisation in Julia

Came across this benchmark that tries to create large string and save to file.

The difference between Julia and Java is very high. (I’m assuming Java version the string is being saved and reused without allocating?). Does Julia have that optimization?

1 Like

The IOBuffer in the Julia code is created without sizehint:

With, say,

buf = IOBuffer(sizehint = num*(ndigits(num) + 3))

it runs faster. In the rather small examples I tried it took 1/3 40% less time.

EDIT: For Python at least there is nothing like sizehint, either. I don’t know if adding it defies the purpose of the benchmark.

4 Likes

1/3rd less time still isn’t great.

Also changing it to

for i in 1:num
    print(buf, " J ")
    print(buf, i)
end

makes it 38% faster.

5 Likes

And another 40% from

for i in 1:num
    write(buf, " J ")
    write(buf, string(i))
end
7 Likes

I’m surprised to see that

print(buf, " J ")
print(buf, i)

performs differently to

print(buf, " J ", i)

and that

write(buf, string(i))

is better than

print(buf, i)
4 Likes

I moved J above the loop, and it sped up a little.

and maybe also surprisingly considering the context I tested (with sizehint on the buffer)

write(buf, " J ", string(i))

and it performs as

write(buf, " J ")
write(buf, string(i))

seems like the automatic conversion is slow somehow, and slower with more than one argument?

Vararg functions are often type-unstable. For example,

buf = IOBuffer()
@code_warntype print(buf, " J ", 1)

prints out warnings for type instabilities. This doesn’t happen with

buf = IOBuffer()
@code_warntype print(buf, " J ")
@code_warntype print(buf, 1)

Is there some Julia package which adopts an API similar to C++'s iostream, which is type stable when printing more than one argument?

3 Likes

This is because for loops over a heterogeneous tuple are type-unstable. As discussed in For statement type instability - #7 by stevengj, however, it’s possible to fix this by unrolling, and should be a simple 1-line patch to change the print definition to something like:

function myprint(io::IO, xs...)
    lock(io)
    try
        foreach(xs) do x
            print(io, x)
        end
    finally
        unlock(io)
    end
    return nothing
end

which should be type-stable and hopefully should perform better. (Similarly for Base.print_to_string and Base.string_with_env.) Anyone want to submit a PR?

As for write(io, string(n)) vs. print(io, n), I’m not sure what is going on (if there is really a difference?), because print(io, n) calls show(io, n) which calls write(io, string(n)) already?

(However, the fact that writing an integer to a stream requires allocation of a string, for string(n), is certainly hurting us in writing to an IOBuffer where writes are fast. At least in the IOBuffer case, we should in principle be able to use a view into the IOBuffer itself as the necessary buffer. The downside of this would be needing a long list of specialized IOBuffer methods for show, unless we do some clever refactoring.)

6 Likes
8 Likes

This is about 10x faster than the original for me:

function do_thing(num)
    s = Vector{UInt8}(undef, num*(ndigits(num) + 3))
    digits = UInt8[0x30]
    p = 1
    for i in 1:num
        s[p] = UInt8(' ')
        s[p+1] = UInt8('J')
        s[p+2] = UInt8(' ')
        p += 3
        n = 0
        while true
            digits[end-n] += 0x01
            if digits[end-n] == 0x3A
                digits[end-n] = 0x30
                n += 1
                if length(digits) == n
                    pushfirst!(digits, 0x31)
                    break
                end
            else
                break
            end
        end
        copyto!(s, p, digits, 1, length(digits))
        p += length(digits)
    end
    resize!(s, p-1)
    s
end
1 Like

Intriguingly all the languages fail at this eventually, Rust is the last to fail, and Julia and Perl otherwise survive the longest, and Perl and Python can be faster than Julia from the start…

Java and C# are however the first to “Failed to save”, then C++ and Nim. Then C also before Julia, so is this an important benchmark at any iteration count?

Also is this improved in later (nightly) Julia? Or at least has a good workaround in any Julia?

1 Like

The tests were performed on a machine with only 16 GBytes RAM. So I’m not sure what is being tested for the larger cases. Use of paging by Windows?

why does printing an integer require string conversion? This can be improved.

1 Like

I also wonder if small string optimization (SSO) could help by allowing " J " to be fully stack allocated with no need for a pointer lookup.

2 Likes

Yes, it can, is on my TODO list… (a new string type that only live on the stack when small; there’s already a package for such, without my idea of a pointer for a to the heap, only used when longer), but until then not using Strings rather Chars should help:

print(buf, ' ', 'J', ' ', 1)`
1 Like

Will small-string-optimization enable const-folding, i.e compile-time computation of strings?

Strings are already be somewhat consant foldable (although the compiler understandably has some difficulty reasoning about some of the very illegal things we do in Base to construct strings)

Some of the basic testing I’ve done around a potential SSO (latest: Slack) has me hopeful about the potential benefits in replacing the internal structure of String with a short/long form (under the hood). Actually doing so is currently beyond me, but we’ll see what happens :slightly_smiling_face:

1 Like