Replace char vs string

julia> using BenchmarkTools

julia> @btime replace(" a "^1000, ' ' => "a")
  120.199 μs (7 allocations: 6.94 KiB)
"..."

julia> @btime replace(" a "^1000, " " => "a")
  193.500 μs (7 allocations: 6.94 KiB)
"..."

julia> @btime replace(" a "^1000, ' ' => 'a')
  142.000 μs (6 allocations: 6.91 KiB)
"..."

julia> @btime replace(" a "^1000, " " => 'a')
  206.600 μs (7 allocations: 6.94 KiB)
"..."

Is this behavior documented somewhere? Apparently, for the most performant replace, the old => new pair should be Char => String.
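Before comparing timings, it may be worth confirming that the four pair variants really produce the same result, so that the benchmarks compare like with like. The following is only a minimal sketch; the names `s` and `results` are introduced here for illustration.

```julia
# Build the input once so every variant sees the same string.
s = " a "^1000

# The four old => new combinations from the benchmarks above.
results = [
    replace(s, ' ' => "a"),   # Char   => String
    replace(s, " " => "a"),   # String => String
    replace(s, ' ' => 'a'),   # Char   => Char
    replace(s, " " => 'a'),   # String => Char
]

# All variants should yield an identical string.
println(all(==(results[1]), results))
```

When timing with BenchmarkTools, interpolating the pre-built input (for example `@btime replace($s, ' ' => "a")`) should keep the cost of constructing the string out of the measurement.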

I do not remember seeing this before. Very interesting. One might expect Char => Char to be the most performant. But at least the Char => ... variants are faster than the String => ... ones, which makes complete sense.

This makes sense, because strings in Julia are UTF-8 encoded, whereas a Char is always 32 bits, which means that a char can occupy a different number of bytes once it is inserted into a string. Replacing a char can still be efficient, because replace has to look ahead at most 4 bytes, whereas with a string on the left-hand side it has to look ahead the entire length of that string; this extra logic probably makes string patterns a bit slower. When inserting a char into a string, though, it first has to work out how many bytes that char occupies as UTF-8, whereas for a string this is just the byte length, which probably makes replacement by a string a bit faster here. For more details on how exactly this is implemented, you would have to look at its source code, and it's not impossible that there is still room for some micro-optimization here.
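To make the size difference concrete, here is a small sketch of how a fixed-width Char maps to a variable number of UTF-8 bytes. The sample characters are chosen purely for illustration; this says nothing about how replace is implemented internally.

```julia
# A Char always occupies 4 bytes (32 bits) in memory...
println(sizeof(Char))            # 4

# ...but its UTF-8 encoding inside a String takes 1 to 4 bytes.
for c in ('a', 'é', '€', '😀')
    println(repr(c), " => ", ncodeunits(c), " byte(s) as UTF-8")
end

# For a String, the byte length is stored and can be read directly.
println(ncodeunits(" "))         # 1
println(sizeof("€"))             # 3
```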

(Char is 32 bits.)

Yes, the write(io, ::Char) method could probably use some micro-optimization.
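For reference, a minimal sketch of what writing a Char versus a String to an IO looks like from the caller's side. The variable names are illustrative, and this only shows the public behavior, not the internals that might be optimized.

```julia
io = IOBuffer()

# write returns the number of bytes written to the stream.
n1 = write(io, 'a')    # Char with a 1-byte UTF-8 encoding
n2 = write(io, '€')    # Char with a 3-byte UTF-8 encoding
n3 = write(io, "a")    # String of 1 byte

# Collect what was written into a String.
out = String(take!(io))
println((n1, n2, n3), " ", out)
```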