I’m curious about this as well.
FWIW, you don’t need `-O3`; all you need is `-instcombine` to get the `fdiv fast`. So it seems to have been a change in the `instcombine` pass between versions.
I also recently ran into an issue with `fdiv` and `instcombine`, starting with LLVM 9, where `instcombine` moves an `fdiv` inside a loop, dramatically worsening performance.
My LLVM issue was closed because that is intended behavior; you’re supposed to run a `licm` pass at some point after the last `instcombine` to move the division back out of the loop.
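To make the loop case concrete, here is a rough C sketch of the two loop shapes involved. The function names are made up for illustration, and this is my reading of the transformation rather than the exact reproducer from my issue: the first function computes the division once before the loop, while the second has the shape you end up with when the division sits inside the loop body.

```c
#include <stddef.h>

/* Hypothetical illustration: the division is computed once, outside the loop.
   Under fast-math, x * (1/d) and x / d are treated as interchangeable. */
void scale_hoisted(double *x, size_t n, double d) {
    double r = 1.0 / d;          /* one fdiv, before the loop */
    for (size_t i = 0; i < n; ++i) {
        x[i] *= r;               /* one fmul per iteration */
    }
}

/* Hypothetical illustration: the shape after the division has been moved
   into the loop body, so every iteration pays for an fdiv. */
void scale_in_loop(double *x, size_t n, double d) {
    for (size_t i = 0; i < n; ++i) {
        x[i] /= d;               /* one fdiv per iteration */
    }
}
```

Both functions give the same answer for well-behaved inputs, but the second does `n` divisions instead of one. With the legacy pass flags from that era, the ordering I was told to use looks roughly like `opt -S -instcombine -licm`, i.e. `licm` after the last `instcombine`.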
Maybe you could file this as an `instcombine` issue with LLVM. It seems reasonably likely the two are connected (`instcombine` getting more aggressive with divisions from LLVM 8 to 9), but your example seems harder for them to close as a Julia-side problem, since it shows up with the default `-O3` optimization pipeline.