How I learned to stop worrying about being fastest and love microbenchmarks

If you count the developers of the ecosystem as users of Julia, then we agree. I only meant some users: the developers of the algorithms.

It depends on what their goal is. If it's to publish a proof-of-concept algorithm, I agree. But if it's to produce something that other people want to use, it needs some advantage over the alternatives, and performance is a common one. I've used Julia enough to conclude that you do need to be well versed in the language to write performant code.

Having looked into it more, I'm pretty sure it's not possible to express the x86 intrinsic vpshufb in portable LLVM IR. Two important aspects of vpshufb are not preserved by LLVM's shufflevector:

  • The mask vector of shufflevector must be a compile-time constant, whereas my algorithm needs it to be a runtime value: the mask is the input data, and the "arguments" (the vectors being shuffled) are compile-time constants.
  • In vpshufb, if the top bit of a mask byte is set, the corresponding output byte is zero. shufflevector has no such semantics, but ScanByte.jl relies on this property for part of its functionality.
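To make both differences concrete, here is a minimal scalar reference model of the byte shuffle, written as a sketch under stated assumptions: it covers a single 16-byte (128-bit) lane only (the AVX2 form of vpshufb applies the same rule independently within each 128-bit half), and the names `pshufb`, `TABLE` and `mask` are hypothetical, chosen for illustration. It is not ScanByte.jl's code; it only shows that the mask is an ordinary runtime value and that a set top bit yields zero.

```python
# Hypothetical scalar model of x86 pshufb on ONE 16-byte lane (illustration only).
# Assumption: 128-bit lane; AVX2 vpshufb repeats this per 128-bit half.

def pshufb(table: bytes, mask: bytes) -> bytes:
    """Shuffle `table` using the runtime byte indices in `mask`.

    For each output byte i:
      - if bit 7 of mask[i] is set, the output byte is 0;
      - otherwise it is table[mask[i] & 0x0F] (only the low 4 bits index).
    """
    if len(table) != 16 or len(mask) != 16:
        raise ValueError("table and mask must both be 16 bytes")
    out = bytearray(16)
    for i, m in enumerate(mask):
        if m & 0x80:
            out[i] = 0                  # top bit set: zero this lane
        else:
            out[i] = table[m & 0x0F]    # low nibble selects a table byte
    return bytes(out)


# Constant "arguments": a hypothetical compile-time lookup table.
TABLE = bytes(range(100, 116))

# Runtime "mask": in the arrangement described above, this is the input data.
mask = bytes([0x00, 0x01, 0x0F, 0x80, 0x8F, 0x11, 0x1F, 0x02] + [0x00] * 8)

result = pshufb(TABLE, mask)
print(list(result))
```

Here the mask bytes 0x80 and 0x8F produce zero, while 0x11 and 0x1F behave like 0x01 and 0x0F because only the low four bits select a table entry. Neither behaviour (a runtime mask, or zeroing on the top bit) is something shufflevector's constant mask can express.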

That's a pity.

It seems I was too optimistic about that, sorry.

It's unfortunate that LLVM doesn't offer a way to describe your algorithm so that it could also compile for other targets, e.g. ARM.