Performance of methods to iterate through an array

I stumbled across this post on Discourse with a very interesting performance benchmark comparing a series of methods for iterating through an array:

Methods (and results without inbounds): How much can hard-coded indexing be avoided in Julia? - #30 by amrods

Results (with and without inbounds): How much can hard-coded indexing be avoided in Julia? - #32 by amrods

Does anyone know why the benchmarks differ so much? My basic intuition was that option B, going through the array with the 3 ranges in the for loop (going down columns), would be the fastest because it is the most basic. Yet option E, which uses CartesianIndices, is faster, even with the extra variable assignment that breaks apart the tuple of indices for use in the functions.
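To make the comparison concrete, here is a minimal sketch of the two loop styles as I understand them. The function names `sum_ranges` and `sum_cartesian` and the summing workload are my own stand-ins, not the code from the linked posts, so treat this as an illustration of the shape of option B and option E rather than a reproduction of the benchmark.

```julia
# Hypothetical stand-ins for the two loop styles; not the code from the linked posts.

# Option B style: three plain ranges. In a multi-range `for`, the last range
# is the innermost loop, so `i` (the first index) varies fastest and the
# traversal goes down columns.
function sum_ranges(A::AbstractArray{T,3}) where {T}
    s = zero(T)
    for k in 1:size(A, 3), j in 1:size(A, 2), i in 1:size(A, 1)
        s += A[i, j, k]
    end
    return s
end

# Option E style: one loop over CartesianIndices, plus the extra assignment
# that unpacks the index into a tuple of integers.
function sum_cartesian(A::AbstractArray{T,3}) where {T}
    s = zero(T)
    for I in CartesianIndices(A)
        i, j, k = Tuple(I)
        s += A[i, j, k]
    end
    return s
end

A = reshape(collect(1.0:24.0), 2, 3, 4)
println(sum_ranges(A) == sum_cartesian(A) == sum(A))
```

Both functions visit the elements in the same order, so any timing difference would come from how the loops compile rather than from memory access order. To time them I would use BenchmarkTools with an interpolated argument, e.g. `@btime sum_ranges($A)`.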

More generally, is there a known performance ranking of the ways to go through an array? The post goes over basic ranges (e.g. 1:n), pairs, and CartesianIndices. Depending on the circumstances, there are also things like eachindex, LinearIndices, and axes, and maybe even more that I don't know about. The circumstances I have in mind are: do I need the exact indices, can I just work linearly through the whole array from beginning to end or am I operating on a subset of the elements, and do I need the element itself? So there's quite a selection, with potentially different performance per the Discourse posts above.
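For reference, this is roughly how I understand each of those tools to be used. It is only a sketch on a small matrix of my own choosing (the variable names are made up for illustration), and it says nothing about which one is fastest.

```julia
A = reshape(collect(1:6), 2, 3)   # 2x3 matrix holding 1..6 in column-major order

# eachindex: one loop over every index of the array
v_each = [A[i] for i in eachindex(A)]

# axes: per-dimension index ranges, here with the first index innermost
v_axes = Int[]
for j in axes(A, 2), i in axes(A, 1)
    push!(v_axes, A[i, j])
end

# CartesianIndices: one loop that yields multidimensional indices
v_cart = Int[]
for I in CartesianIndices(A)
    push!(v_cart, A[I])
end

# LinearIndices: converts a Cartesian position into a linear index
lin = LinearIndices(A)[2, 3]

# pairs: yields index => element, for when both are needed
pairs_ok = all(A[i] == x for (i, x) in pairs(A))

println(v_each == v_axes == v_cart)
```

All three traversals above visit the elements in the same column-major order, so they differ only in what kind of index the loop body receives.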