I would start looking into why so many allocations: [Likely misunderstanding edited out]
just String allocation dominating the scene
No, not really. Yes, there are too many, but allocations aren't really the problem in Julia; each one should cost about as much as in Python (assuming you can get equally few). It's just that deallocation (reference counting, RC) is better in Python, especially for strings, and Julia could do the same or something even better, skipping all allocations:
Mutable strings are better when you want to replace in place with as many bytes, e.g. r"fox" => s"bus". They can actually also be better when the byte count differs, e.g. when the replacement has as many letters but not the same length in bytes, because UTF-8 is variable-length.
What Julia needs is owned objects like in the Pony language; then mutable strings are really good. You could then freeze them into immutable standard strings, since those are also better in some ways (quoting from AI):
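To make the same-byte-count case concrete, here is a minimal sketch on a plain Vector{UInt8} buffer. The helper name replace_inplace! is made up for illustration and is not part of Base; it only handles patterns of equal byte length, so nothing in the buffer has to move and nothing is allocated during the replacement itself.

```julia
# Hypothetical helper (not in Base): overwrite every occurrence of `old`
# with `new` directly in a byte buffer. Only valid when both patterns have
# the same number of bytes, so no bytes need to be shifted.
function replace_inplace!(buf::Vector{UInt8}, old::AbstractString, new::AbstractString)
    o, n = codeunits(old), codeunits(new)
    length(o) == length(n) || throw(ArgumentError("patterns must have equal byte length"))
    isempty(o) && return buf
    i = 1
    stop = length(buf) - length(o) + 1
    while i <= stop
        if view(buf, i:i+length(o)-1) == o
            copyto!(buf, i, n, 1, length(n))  # overwrite the match in place
            i += length(o)
        else
            i += 1
        end
    end
    return buf
end

buf = Vector{UInt8}("the quick brown fox")  # mutable copy of the string's bytes
replace_inplace!(buf, "fox", "bus")
result = String(copy(buf))  # copy, because String(buf) would take over buf
```

Here result is "the quick brown bus". Matching on raw bytes is safe for valid UTF-8 because the encoding is self-synchronizing, but the equal-length check is on bytes, not letters: "o" is one byte and "ö" is two.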
This immutability offers several advantages, including:
- Performance optimizations: Immutable strings can be efficiently shared and cached, as their content is guaranteed not to change.
- Multiple threads can safely access and use the same string without worrying about concurrent modifications.
- The value of a string variable will not unexpectedly change due to actions elsewhere in the program.
Working with mutable string-like data:
While String itself is immutable, there are ways to achieve mutable string-like behavior in Julia for specific use cases:
- Using Vector{UInt8}: For low-level manipulation, you can store string data as a Vector{UInt8} (byte array). This allows direct modification of individual bytes. However, you lose the convenience of built-in string methods and need to handle encoding manually.
```julia
data = UInt8['H', 'e', 'l', 'l', 'o']
# Modify data
data[1] = UInt8('J')
# Convert back to string (if needed); String(data) takes ownership and leaves data empty
mutable_string = String(data)
```
- Using the MutableStrings.jl package: If you require mutable string types with familiar string methods for large-scale text processing, the MutableStrings.jl package provides MutableASCIIString and MutableUTF8String types. This package allows in-place modification of string data.
I believe the MutableStrings package was just a hallucination…
though there is Str.jl. [I was using WeakRefStrings.jl (which most people shouldn't be using unless they really know what they're doing) to test and look into it more, though really only the InlineStrings.jl it reexports, which is safe.]
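On the Vector{UInt8} route quoted above, "handle encoding manually" mostly comes down to byte lengths. A small sketch using only Base functions: replacing one letter with another can change the number of bytes, in which case the buffer has to grow or shrink rather than just be overwritten.

```julia
data = Vector{UInt8}("Hello")

# 'o' is one byte in UTF-8 while 'ö' is two, so a plain data[i] = ... will not
# do; splice! replaces the range and shifts/resizes the buffer as needed.
i = findfirst(==(UInt8('o')), data)
splice!(data, i:i, codeunits("ö"))

s = String(copy(data))  # copy so `data` stays usable afterwards
# Same number of letters as before, but one more byte:
nletters, nbytes = length(s), ncodeunits(s)
```

After this, s is "Hellö", with nletters == 5 and nbytes == 6. With owned, mutable strings this resizing could happen on the string itself; with today's String you pay for at least one copy when converting back.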