ANN: StringBuilders.jl

StringBuilders.jl is a very small package that provides a convenient API for building up strings, similar to what you have in .NET and many other languages.

This package is mainly an API exploration. You can get more or less the same functionality by creating an IOBuffer, writing to it, and at the end taking the buffer's contents and converting them into a String. But I, at least, can never remember the details of that for more than a day, and then have to google them again. So this is an attempt to provide a more streamlined API for the same use case. Any feedback would be welcome!
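For reference, the IOBuffer pattern mentioned above looks roughly like this in plain Julia. This is a minimal sketch using only Base functions; it does not show the StringBuilders.jl API.

```julia
# Build a string incrementally with an in-memory buffer.
io = IOBuffer()

print(io, "Hello")             # print accepts any number of values
print(io, ", ", "world", '!')

# take! empties the buffer and returns its bytes as a Vector{UInt8};
# String(...) turns those bytes into a String.
s = String(take!(io))
```

The two steps at the end (take! followed by String) are the part that is easy to forget.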

I don’t think I will do anything with this package beyond what is there. But it might be interesting to play around with different implementations of a StringBuilder and benchmark them against each other. This article has an interesting discussion of how the implementation in .NET changed over time. If anyone wants to experiment with that kind of thing, please open an issue over in the package repository so that we can briefly discuss a good strategy for including experiments like that.


One more thing: if someone has an opinion about https://github.com/davidanthoff/StringBuilders.jl/issues/2, please let me know! It is not clear to me what the right function is for adding content to the string. Right now I use append!, but I’m not sure whether print, write, or something else would be more appropriate.

I was thinking about a mutable AbstractString type along the lines of struct iovec from uio.h. It would be like a lazy string builder. Rather than allocating new strings, it would just keep a list of the strings that were appended. A new concatenated string would only be created by doing convert(String, ...). If the string is written to an IO stream instead, the new string never needs to be created at all: the write function can just write each collected item in turn.

(My use case is splicing changes into JSON texts)

struct SplicedString
    v::Vector{AbstractString}
end

append! just adds to the list:

Base.append!(s::SplicedString, x::AbstractString) = push!(s.v, x)

Splicing into a String creates a SplicedString using SubStrings of the original string:

# nextind(s, i) rather than i+1, so that multi-byte characters are handled
splice(s::AbstractString, i, x::AbstractString) =
    SplicedString([SubString(s, 1, i), x, SubString(s, nextind(s, i))])

splice! would figure out which item in s.v contains index i, then splice x into that item:

Base.splice!(s::SplicedString, i, x) = ...
Base.write(io::IO, s::SplicedString) = write(io, s.v...)
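Putting the pieces above together, here is a small self-contained sketch of the idea. The names LazyPieces, add!, spliced and materialize are hypothetical, chosen only to avoid clashing with Base; this is an illustration under those assumptions, not the implementation being developed in this thread, and the in-place splice! is deliberately left out.

```julia
# Hypothetical illustration: a list of string pieces that is only
# concatenated when a String is explicitly requested.
struct LazyPieces
    pieces::Vector{AbstractString}
end

LazyPieces() = LazyPieces(AbstractString[])

# Appending only records the piece; no string data is copied here.
add!(lp::LazyPieces, x::AbstractString) = (push!(lp.pieces, x); lp)

# Insert x after index i of s, reusing s through SubString views.
# nextind keeps the second view on a valid character boundary.
function spliced(s::AbstractString, i::Integer, x::AbstractString)
    LazyPieces(AbstractString[SubString(s, 1, i), x, SubString(s, nextind(s, i))])
end

# Writing streams each piece in turn and returns the total bytes written.
function Base.write(io::IO, lp::LazyPieces)
    n = 0
    for p in lp.pieces
        n += write(io, p)
    end
    return n
end

# Only this step allocates the concatenated String.
materialize(lp::LazyPieces) = join(lp.pieces)

# Usage: splice a new member into a JSON text, then stream it out.
lp = spliced("{\"a\":1}", 6, ",\"b\":2")
add!(lp, "\n")
io = IOBuffer()
nbytes = write(io, lp)
result = String(take!(io))
```

When the result goes straight to an IO stream, materialize is never called, which is the case where a lazy design like this would be expected to pay off.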

That is a bit similar to the new .NET implementation, but it uses a vector instead of their linked-list-like structure, and I think they also combine very short strings into one buffer right away.

My guess is that each of these might be really good for a specific use case?

This sounds much like the RopeString type that was removed from Base (and put into LegacyStrings).
(I like the idea, personally)

I’ve started fleshing out the SplicedString thing a bit.

I’ve done the indexing using sparse 64-bit indexes (fragment_number << 40 | index), which means that next, nextind, prevind, etc. are all constant time. I haven’t done any benchmarks yet, but so far it feels like it should be fairly efficient.
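To make the index layout concrete, here is a minimal sketch of packing and unpacking such an index. The helper names (pack_index, fragment_of, offset_of) and constants are hypothetical, and the sketch assumes the low 40 bits hold the position within a fragment, as the formula above suggests.

```julia
# Hypothetical helpers for a (fragment_number << 40 | index) layout.
const OFFSET_BITS = 40
const OFFSET_MASK = (Int64(1) << OFFSET_BITS) - 1

# Combine a fragment number and an in-fragment index into one Int64.
pack_index(fragment::Integer, offset::Integer) =
    (Int64(fragment) << OFFSET_BITS) | Int64(offset)

# Recover the two parts with a shift and a mask, both constant time.
fragment_of(idx::Int64) = idx >> OFFSET_BITS
offset_of(idx::Int64) = idx & OFFSET_MASK

idx = pack_index(3, 17)
```

Because the fragment number sits in the high bits, packed indexes sort first by fragment and then by position, so ordinary integer comparison still orders them correctly.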
