[ANN] LoopVectorization

I am surprised that LLVM left such an optimization on the table, though for most (all?) of it, it again seems that LLVM does know what the offsets are but didn’t manage to emit the optimal code for them. It can certainly be added to the cases where better codegen is needed, but I don’t think this info has ever been hidden from LLVM IR.

I like to do it when it matters for me ;-p… So yes, when I use AVX512-capable hardware as my main computer.

As far as I’m concerned, you are very welcome to submit that patch to be included in our LLVM build. I basically did that for all of the few LLVM performance-related issues that I’ve fixed/experienced, usually before they were merged upstream. There’s no need to wait for a new LLVM release for it to be used.


And I’ll also add that doing this kind of pattern matching with LLVM IR is not hard. In fact I have always found it easier than working with Julia IR. The switch to SSA for Julia IR has certainly helped (though that’s not the level this package uses), but LLVM still has WAY more utilities for doing all of these analyses.
