One problem I see is that only the second indexing expression is guaranteed to be inbounds:
I would remove the `@inbounds`. That probably won’t help your performance though.
LoopVectorization.jl probably doesn’t like the jumping around in memory; vectorization is likely hard to achieve with that access pattern.
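To illustrate what I mean, here is a minimal sketch of an indirect-indexing loop. It is not your code: `gather_sum`, `data`, and `idx` are hypothetical names I made up for the example. The idea is to keep `@inbounds` only on the access that is provably safe and leave the bounds check on the one that depends on runtime values.

```julia
# Hypothetical example, not the original poster's code.
# `data` is read at positions stored in `idx` (a gather pattern).
function gather_sum(data::AbstractVector{T}, idx::AbstractVector{<:Integer}) where {T}
    s = zero(T)
    for i in eachindex(idx)
        # Safe: `i` comes from eachindex(idx), so idx[i] is always in bounds.
        j = @inbounds idx[i]
        # Not provably safe: `j` is arbitrary data, so keep the bounds check.
        s += data[j]
    end
    return s
end
```

With the check left in place, a bad index throws a `BoundsError` instead of silently reading invalid memory. The `data[j]` accesses still land at arbitrary positions, which is the part that is hard to vectorize.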