Super slow string performance

The main difference is that strings are UTF-8-encoded, and UTF-8 is a variable-length encoding: a single character can take anywhere from 1 to 4 bytes. This means that calling length on a string is actually an O(n) operation, because it has to scan the whole string to count the characters. Since you're calling length twice in each loop iteration, your code is suddenly O(n^2) instead of O(n).
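To make the difference concrete, here is a minimal sketch comparing the character count with the byte count. The sample string is arbitrary. It just contains one two-byte character:

```julia
s = "héllo"   # 'é' is one character but two UTF-8 code units

n_chars = length(s)      # walks the string to count characters: O(n)
n_bytes = ncodeunits(s)  # stored alongside the string data: O(1)

println(n_chars, " characters, ", n_bytes, " code units")
# prints "5 characters, 6 code units"
```

If a loop really does need the character count, computing it once before the loop instead of in the loop condition keeps that part O(n).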

From the way you are indexing the strings, you also seem to assume that every character you read in is ASCII, since you always increase your offset by 1. That will throw an error as soon as the input contains a non-ASCII character, because the offset can land in the middle of a multi-byte character. Typically you'll want to use nextind, or just a for loop over eachindex(::String), instead.
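A small sketch of what goes wrong and the two usual fixes. The string and the helper name char_starts are made up for illustration. eachindex only yields indices where a character starts, and nextind jumps over all the code units of the current character:

```julia
s = "héllo"   # 'é' occupies code units 2 and 3

# Stepping the index by 1 lands inside 'é': s[3] throws a StringIndexError.

# eachindex yields only valid character start indices: 1, 2, 4, 5, 6.
for i in eachindex(s)
    println(i, " => ", s[i])
end

# Hypothetical helper: the same traversal written out with nextind,
# for loops that need to manage the index themselves.
function char_starts(s::AbstractString)
    starts = Int[]
    i = firstindex(s)
    while i <= lastindex(s)
        push!(starts, i)
        i = nextind(s, i)   # skip to the first code unit of the next character
    end
    return starts
end
```

If you only need the characters and not their positions, for c in s iterates over the characters directly.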

Julia does also allow you to index strings by single-byte code units, which might be what you want. You can use codeunit(s, i) to get the ith code unit and ncodeunits(s) to get the total number of code units. Both are O(1) operations. If you want all the bytes as a vector-like object, codeunits(s) gives you that view.
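For example, counting an ASCII delimiter can be done entirely at the byte level. This is a sketch with a hypothetical helper count_byte. It is safe for ASCII targets because every byte of a multi-byte UTF-8 character is at least 0x80, so an ASCII byte can never match inside one:

```julia
# Hypothetical helper: count occurrences of a single byte by scanning code units.
function count_byte(s::String, b::UInt8)
    n = 0
    for i in 1:ncodeunits(s)      # ncodeunits is O(1)
        if codeunit(s, i) == b    # codeunit is O(1), no decoding needed
            n += 1
        end
    end
    return n
end

s = "a,b,c,é"
count_byte(s, UInt8(','))                 # 3
count(==(UInt8(',')), codeunits(s))       # same result via the codeunits view
```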
