I’ve never heard that! Can you point to some documentation of that (which Base violates, BTW)?
I’ve been doing this for about 25 years with different platforms’ vector/SIMD instructions.
It’s not that you want a single SIMD instruction to somehow exit without processing all of the parallel operations, it’s that after executing a SIMD instruction, you can check whether you need to conditionally perform some other operation, and then either continue with the next chunk or exit the loop completely.
For example, for counting newline characters in an ASCII, Latin1, or UTF-8 string, you can use one of the SIMD instructions to compare 16/32/64 bytes in parallel, and count the matches, and continue the loop (possibly unrolled if you are dealing with very large strings). You can also use the SIMD instructions to quickly see if a string is all ASCII, for example. That’s the sort of “early-exit” I’m talking about.
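To make that concrete, here is a rough sketch in C rather than Julia, using plain 64-bit words as a portable stand-in for real 16/32/64-byte SIMD registers. The names `count_newlines` and `is_ascii` are just made up for illustration, and a real SIMD version would use wider compares, but the loop shape is the same: the counting loop processes every chunk, while the ASCII check exits the loop as soon as one chunk fails.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define LOW7  UINT64_C(0x7F7F7F7F7F7F7F7F)
#define HIGH  UINT64_C(0x8080808080808080)

/* Count '\n' bytes, 8 at a time. No early exit: every chunk is processed. */
size_t count_newlines(const char *s, size_t len)
{
    size_t count = 0, i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t chunk;
        memcpy(&chunk, s + i, 8);            /* safe unaligned load */
        uint64_t x = chunk ^ (ONES * 0x0A);  /* bytes equal to '\n' become 0x00 */
        uint64_t t = (x & LOW7) + LOW7;      /* bit 7 set if the low 7 bits are nonzero */
        t = ~(t | x | LOW7);                 /* 0x80 exactly where the byte was 0x00 */
        count += (size_t)(((t >> 7) * ONES) >> 56);  /* sum the per-byte flags */
    }
    for (; i < len; i++)                     /* scalar tail, fewer than 8 bytes */
        count += (s[i] == '\n');
    return count;
}

/* Return 1 if every byte is < 0x80, else 0. */
int is_ascii(const char *s, size_t len)
{
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t chunk;
        memcpy(&chunk, s + i, 8);
        if (chunk & HIGH)                    /* some byte has its top bit set */
            return 0;                        /* early exit after this chunk */
    }
    for (; i < len; i++)
        if ((unsigned char)s[i] & 0x80)
            return 0;
    return 1;
}
```

The point is that the check happens once per chunk, not once per byte, so the early exit costs almost nothing on the all-ASCII fast path.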
If you need to return the offset of the first non-ASCII character in the string, a little more work is required, but it can still be done using the faster SIMD instructions. (SIMD isn’t just to speed up floating point operations!)
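A minimal sketch of that case, again in C with 64-bit words standing in for SIMD registers (the name `first_non_ascii` and the choice to return `len` when the string is all ASCII are my own assumptions for the example): scan a chunk at a time, and only drop down to bytes once a chunk is known to contain the offending character.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the offset of the first byte >= 0x80, or len if there is none. */
size_t first_non_ascii(const char *s, size_t len)
{
    const uint64_t high = UINT64_C(0x8080808080808080);
    size_t i = 0;

    /* Fast path: test 8 bytes per iteration. */
    for (; i + 8 <= len; i += 8) {
        uint64_t chunk;
        memcpy(&chunk, s + i, 8);   /* safe unaligned load */
        if (chunk & high)
            break;                  /* the byte we want is in s[i..i+7] */
    }

    /* Slow path: pin down the exact byte in the flagged chunk,
       or scan the tail of fewer than 8 bytes. */
    for (; i < len; i++)
        if ((unsigned char)s[i] & 0x80)
            return i;
    return len;
}
```

The byte scan keeps the sketch independent of endianness. On a little-endian machine you could replace it with a count-trailing-zeros on `chunk & high` and divide by 8, which is roughly what the movemask-plus-bit-scan pattern does with real SIMD registers.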
I’ve done that sort of thing manually, working 64 bits at a time, in my string package, and achieved very nice speedups compared to String, but I’d like to learn how to generate SIMD instructions from Julia if possible, using LLVM IR, so I don’t have to write an assembly language library for each processor to speed things up.
Any help in that direction would be greatly appreciated!