Usually, you have to opt out of suboptimal code with many `Any`s to make your code faster. The problem here is unique because you have to opt in to suboptimal code.
I think automatically degrading to dynamic code is a basic property of Julia. That’s how you mock things up quickly and then improve the hot spots later. This property breaks when you combine a loop with a function barrier, but TCO can rescue it.
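To make the loop-plus-function-barrier pattern concrete, here is a minimal sketch of how I read it. The names `sum_barrier` and `_sum` are hypothetical and only for illustration. The inner loop runs with a concretely typed accumulator, and when the accumulator's type would change it hands off to a fresh call in tail position.

```julia
# Hypothetical illustration; `sum_barrier` and `_sum` are made-up names.
function sum_barrier(xs)
    y = iterate(xs)
    y === nothing && return 0  # assumption: an empty input sums to 0
    x, state = y
    return _sum(x, xs, state)
end

# Function barrier: inside this method `acc` has the concrete type `T`.
function _sum(acc::T, xs, state) where {T}
    while true
        y = iterate(xs, state)
        y === nothing && return acc
        x, state = y
        next = acc + x
        # The accumulator type changed: re-dispatch on the new type.
        # This is a call in tail position, but Julia does not currently
        # guarantee that it is eliminated, so each type change may add
        # a stack frame.
        next isa T || return _sum(next, xs, state)
        acc = next
    end
end
```

For example, `sum_barrier(Any[1, 2, 3.0])` starts with an `Int` accumulator and re-dispatches once when it becomes a `Float64`. With guaranteed TCO the recursion depth would not matter; without it, limiting the depth means deliberately falling back to dynamic code after some number of type changes, which is the "opt in" I mean above.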
I am also not sure if this is worth the effort. As I said in the OP, if this kind of optimization could happen in a naive loop without a function barrier, that would be much better. Having said that, I am curious to know whether I managed to increase the priority of TCO (even slightly).