I have heard a lot that string performance in Julia is bad. So is it possible to see performant string processing without sacrificing ease of use in Julia one day?
Depends on what you mean by “string processing”. If you don’t create new strings all over the place, string operations are already quite fast in my experience.
I’m curious what specifics you heard, or was it just general sentiment without examples?
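To make the point about not creating new strings all over the place concrete, here is a minimal sketch. The function names `count_long_words` and `join_fields` are made up for illustration, and it assumes Julia 1.8 or later for `eachsplit`. It is one possible way to avoid intermediate allocations, not a claim about what any particular package does.

```julia
# Count words longer than `minlen` without allocating a new String per word.
# `eachsplit` lazily yields SubString views into `text`.
function count_long_words(text::AbstractString, minlen::Int)
    n = 0
    for word in eachsplit(text)
        length(word) > minlen && (n += 1)
    end
    return n
end

# Build one output string through an IOBuffer instead of repeated `*`,
# which would allocate a fresh String on every concatenation.
function join_fields(fields, sep::Char)
    io = IOBuffer()
    isfirst = true
    for f in fields
        isfirst || print(io, sep)
        print(io, f)
        isfirst = false
    end
    return String(take!(io))
end

text = "strings in Julia can be quite fast already"
println(count_long_words(text, 4))
println(join_fields(eachsplit(text), ','))
```

The first function only ever looks at views into the original string, and the second allocates a single result string at the end.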
Yes, you can (already). Strings, being variable-length, are generally stored on the heap, in all languages. Already faster (and likely as fast as possible when the strings are short enough) are the InlineString types.
You can use these strings where they apply. CSV.jl uses them by now, I believe by default, or at least DataFrames does when it imports CSV files, and it has gotten a lot faster since then. It seems the alternative might not use InlineStrings (from the docs, it seems to use regular String); still, it claims “High performance” (would it use CSV.jl and whatever that provides?).
Right now you (or the package you use) need to opt into using alternative string types. I had an idea of my own for a string type where the prefix of a string is stored with InlineString, plus a pointer to the whole string (that would be NULL for all shorter strings, avoiding the heap and only allocating for longer strings). I hope that my idea, if I or someone else ever implements it, would be merged into Julia to replace the standard String type, so that you do not need to opt into my type.
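A rough sketch of the layout described above, in plain Julia. Everything here is hypothetical: `PrefixString`, `PREFIX_BYTES`, and `materialize` are names invented for illustration, the inline prefix is a plain byte tuple rather than an actual InlineString, and `nothing` stands in for the NULL pointer. It only shows the data layout and the comparison shortcut; it does not avoid the initial allocation of the input String and makes no performance claim.

```julia
# Hypothetical type, not part of Julia or InlineStrings.jl.
const PREFIX_BYTES = 15

struct PrefixString
    prefix::NTuple{PREFIX_BYTES,UInt8}  # first bytes of the string, zero-padded
    len::Int                            # total length in bytes
    full::Union{Nothing,String}         # `nothing` for short strings (the "NULL" case)
end

function PrefixString(s::String)
    n = ncodeunits(s)
    prefix = ntuple(i -> i <= n ? codeunit(s, i) : 0x00, Val(PREFIX_BYTES))
    return PrefixString(prefix, n, n <= PREFIX_BYTES ? nothing : s)
end

# Most comparisons are decided by length and prefix alone;
# the full string is only touched when both are long and share a prefix.
function Base.:(==)(a::PrefixString, b::PrefixString)
    a.len == b.len || return false
    a.prefix == b.prefix || return false
    a.full === nothing && return true
    return a.full == b.full
end

# Recover an ordinary String from either representation.
function materialize(p::PrefixString)
    p.full === nothing || return p.full
    return String(UInt8[p.prefix[i] for i in 1:p.len])
end

short = PrefixString("short")
long = PrefixString("a considerably longer string")
println(short.full === nothing)  # short strings carry no reference to a heap String
println(long.full === nothing)
println(materialize(short))
```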
[@Elrod recently mentioned a vector implementation in C++ (from LLVM) with that same idea, 3x faster than Julia's Vector (or std::vector). There doesn’t seem to be any real reason it can’t be done in Julia, and while indexing is different, simpler, a lot is similar, so maybe part of the code could be shared between arrays and strings.]
This is something easy enough to implement at the package level in C++ (aside from the fact that distributing and installing packages in C++ isn’t easy), but it would need compiler support in Julia to be performant.
That said, there are PRs like “Move small arrays to the stack” (JuliaLang/julia pull request #43573, by pchintalapudi), which would help a ton of existing code, especially once the Julia compiler starts making use of escape analysis.
Immutable arrays will help there as well.
Yes, I believe fast strings are possible one day for Julia. There is nothing in Julia’s design that prevents strings from being fast. On the contrary, I think the existing abstractions and interfaces of String have been well designed with performance in mind. It’s just that some important implementations are lacking.
Is there anything in particular folks with interest in this space would suggest as a motivating example for benchmarking?
One benchmark would be Heng Li’s “biofast” benchmark, the fastq one (biofast/fqcnt in lh3/biofast on GitHub), although that is as much IO as strings.
But I agree it would be nice to find a good set of benchmarks for this to measure progress!
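As a starting point for that kind of measurement, here is a minimal, self-contained sketch in the spirit of a FASTQ record counter. It is not Heng Li's fqcnt: the data is synthetic and made up for illustration, the function names `make_fastq` and `count_fastq` are hypothetical, and the input lives in memory, so it measures line splitting and string creation rather than disk IO.

```julia
# Synthetic FASTQ-like data (made up for illustration): four lines per record.
function make_fastq(nrecords::Int)
    io = IOBuffer()
    for i in 1:nrecords
        println(io, "@read", i)
        println(io, "ACGTACGTAC")
        println(io, "+")
        println(io, "IIIIIIIIII")
    end
    return take!(io)   # raw bytes, as if read from a file
end

# Count records, sequence bytes and quality bytes.
# `eachline` allocates one String per line, which is part of what gets measured.
function count_fastq(data::Vector{UInt8})
    nrec = nseq = nqual = 0
    for (i, line) in enumerate(eachline(IOBuffer(data)))
        r = i % 4
        if r == 1
            nrec += 1
        elseif r == 2
            nseq += ncodeunits(line)
        elseif r == 0
            nqual += ncodeunits(line)
        end
    end
    return (records = nrec, seq_bytes = nseq, qual_bytes = nqual)
end

data = make_fastq(100_000)
result = count_fastq(data)           # first call also triggers compilation
t = @elapsed count_fastq(data)       # time a second, already compiled call
println(result, " in ", t, " s")
```

For serious comparisons a dedicated benchmarking package and real input files would be more appropriate than a single `@elapsed` call.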
I’m just encouraging discussion. I am not very knowledgeable about it, just looking at some answers. Though I won’t lie, I have heard some “complaints” from other Julians.