Can you give some concrete examples of that?
I’ve found that for the sorts of things I’ve been doing, where I jump between using a Ptr{UInt64} and a Ptr{UInt8}/Ptr{UInt16}/Ptr{UInt32} in order to perform operations on multiple characters at once, trying to use Julia arrays is simply much too slow.
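To make the pointer-switching idea concrete, here is a minimal sketch that checks whether a string is pure ASCII by reading it eight bytes at a time through a Ptr{UInt64}, then finishing the tail through the original Ptr{UInt8}. The function name `is_ascii_wordwise` is made up for illustration, and this is ordinary word-at-a-time code, not the SIMD version discussed here.

```julia
# Hypothetical helper (not from any library): true if every byte of `s` is < 0x80.
function is_ascii_wordwise(s::String)
    n = sizeof(s)
    GC.@preserve s begin
        p   = pointer(s)                    # Ptr{UInt8} to the string data
        p64 = convert(Ptr{UInt64}, p)       # same address, viewed as 8-byte words
        nwords = n >> 3                     # number of complete 8-byte words
        for i in 1:nwords
            # unsafe_load(p64, i) reads the i-th UInt64 (1-based indexing)
            w = unsafe_load(p64, i)
            # A set high bit in any byte means a non-ASCII code unit
            (w & 0x8080808080808080) == 0 || return false
        end
        # Handle the remaining 0 to 7 bytes one at a time
        for i in ((nwords << 3) + 1):n
            unsafe_load(p, i) < 0x80 || return false
        end
    end
    return true
end
```

The same pattern would apply to Ptr{UInt16} or Ptr{UInt32} code units with a different mask; only the 8-bit case is shown.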
In the future, I’d like to be able to use SIMD instructions to do that on up to 64 bytes at a time with AVX-512 (i.e. 64/32/16 code units at a time for 8-/16-/32-bit code units).
The current Julia SIMD support does not handle loops where you have to stop early, or conditionally execute some code depending on what is found when checking the chunk (for example, when scanning UTF-8, and you find a non-ASCII character in a chunk that needs to be handled, or a surrogate character when scanning UTF-16).
I’m not sure how Julia (or LLVM) could be made smart enough to handle those sorts of loops.
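As an illustration of the kind of loop in question, here is a rough sketch of a chunked scan that has to stop early: it checks eight bytes per step and, as soon as a chunk contains a non-ASCII byte, works out which byte it was and returns. The name `first_non_ascii` is hypothetical, the code assumes a little-endian machine, and it uses plain 64-bit words rather than real SIMD registers; the data-dependent early return is the point.

```julia
# Hypothetical helper: 1-based index of the first non-ASCII byte in `s`, or 0 if none.
# Assumes a little-endian machine (the lowest-order byte of a word is the earliest byte).
function first_non_ascii(s::String)
    n = sizeof(s)
    GC.@preserve s begin
        p   = pointer(s)
        p64 = convert(Ptr{UInt64}, p)
        nwords = n >> 3
        for i in 1:nwords
            hi = unsafe_load(p64, i) & 0x8080808080808080
            if hi != 0
                # Early exit: the lowest set high bit marks the earliest offending byte
                return ((i - 1) << 3) + (trailing_zeros(hi) >> 3) + 1
            end
        end
        # Tail: remaining 0 to 7 bytes, one at a time
        for i in ((nwords << 3) + 1):n
            unsafe_load(p, i) >= 0x80 && return i
        end
    end
    return 0
end
```

For example, `first_non_ascii("abc")` returns 0 and `first_non_ascii("héllo")` returns 2, since the two-byte encoding of é starts at the second byte.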