The optimal iteration order is the same as the storage order, i.e. rc[1, j][k] then rc[1, j][k+1], …, rc[2, j][1], …, rc[1, j+1][1], …
If you only want to generate one run at a time, you can simply call append!(rc.contents, run); push!(rc.colptr, length(rc.contents)+1), and every second run you append will open a new pair of runs. These pairs are accessible e.g. by view(rc, :, j) or by (rc[1, j], rc[2, j]). They are not accessible by rc[:, j] because I did not implement getindex(rc::RunContainer, ::Colon, j).
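A minimal sketch of that append pattern, followed by a loop in storage order. The struct below is only a hypothetical stand-in that assumes the two fields used above (contents and colptr, with colptr starting at [1]); the real RunContainer may differ. The names push_run! and npairs are made up for illustration.

```julia
# Hypothetical stand-in for RunContainer: all runs are stored back to back in
# `contents`, and `colptr[n]` is the index at which the n-th run starts.
struct RunContainer{T}
    contents::Vector{T}
    colptr::Vector{Int}
end

# Empty container: no runs yet, the first run will start at index 1.
RunContainer{T}() where {T} = RunContainer{T}(T[], [1])

# Number of complete run pairs (two consecutive runs form one pair).
npairs(rc::RunContainer) = (length(rc.colptr) - 1) ÷ 2

# Run i (1 or 2) of pair j, as a view into the shared storage.
function Base.getindex(rc::RunContainer, i::Int, j::Int)
    n = 2 * (j - 1) + i
    return view(rc.contents, rc.colptr[n]:(rc.colptr[n + 1] - 1))
end

# Illustrative helper: append one run and record where the next one starts.
function push_run!(rc::RunContainer, run)
    append!(rc.contents, run)
    push!(rc.colptr, length(rc.contents) + 1)
    return rc
end

rc = RunContainer{Int}()
push_run!(rc, [1, 2, 3])   # becomes rc[1, 1]
push_run!(rc, [4, 5])      # becomes rc[2, 1]
push_run!(rc, [6])         # becomes rc[1, 2], i.e. opens the second pair
push_run!(rc, [7, 8, 9])   # becomes rc[2, 2]

# Storage order: k varies fastest, then i, then j.
visited = Int[]
for j in 1:npairs(rc), i in 1:2
    for x in rc[i, j]
        push!(visited, x)
    end
end
```

With this loop nesting, visited ends up in exactly the order of rc.contents, which is what makes it the cache-friendly order.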
Nobody prevents you from implementing e.g.
    Base.@propagate_inbounds function Base.getindex(rc::RunContainer, k, i, j)
        @boundscheck checkbounds(rc, i, j)
        # colptr holds the start index of each run, so a run ends one
        # element before the start of the next run.
        from = rc.colptr[2*(j-1) + i]
        to = rc.colptr[2*(j-1) + i + 1] - 1
        @boundscheck checkbounds(from:to, k)
        return rc.contents[from + k - 1]
    end
so that the optimal memory order looks more Julian (rc[k, i, j] is the same as rc[i, j][k]). This might confuse some readers of your code, though, because three-index access like rc[k, i, j] is idiomatically used on <:AbstractArray{T, 3} where T, not on <:AbstractMatrix.
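To check the equivalence of rc[k, i, j] and rc[i, j][k], here is a self-contained sketch. It assumes a hypothetical RunContainer that subtypes AbstractMatrix and stores run start indices in colptr, so each run ends one element before the next start; the actual type may be defined differently. The alias Run and the Int annotations on the index arguments are my own additions for the sketch.

```julia
# Element type of the matrix: a contiguous view into a Vector{T}.
const Run{T} = SubArray{T,1,Vector{T},Tuple{UnitRange{Int}},true}

# Hypothetical stand-in: a 2 x n matrix whose entries are runs.
struct RunContainer{T} <: AbstractMatrix{Run{T}}
    contents::Vector{T}
    colptr::Vector{Int}   # colptr[n] is the start index of the n-th run
end

Base.size(rc::RunContainer) = (2, (length(rc.colptr) - 1) ÷ 2)

# Two-index access returns a whole run.
function Base.getindex(rc::RunContainer, i::Int, j::Int)
    @boundscheck checkbounds(rc, i, j)
    n = 2 * (j - 1) + i
    return view(rc.contents, rc.colptr[n]:(rc.colptr[n + 1] - 1))
end

# Three-index access returns element k of run (i, j) without creating a view.
Base.@propagate_inbounds function Base.getindex(rc::RunContainer, k::Int, i::Int, j::Int)
    @boundscheck checkbounds(rc, i, j)
    from = rc.colptr[2 * (j - 1) + i]
    to = rc.colptr[2 * (j - 1) + i + 1] - 1
    @boundscheck checkbounds(from:to, k)
    return rc.contents[from + k - 1]
end

# Four runs: [10, 11, 12], [20, 21], [30], [40, 41, 42]
rc = RunContainer([10, 11, 12, 20, 21, 30, 40, 41, 42], [1, 4, 6, 7, 10])

same = all(rc[k, i, j] == rc[i, j][k]
           for j in 1:size(rc, 2) for i in 1:2 for k in eachindex(rc[i, j]))

# A pair of runs is one column of the matrix.
pair = view(rc, :, 2)

# Reading past the end of a run is rejected instead of leaking into the next run.
out_of_run = try
    rc[4, 1, 1]
    false
catch err
    err isa BoundsError
end
```

Here the three-index method shadows the generic AbstractArray fallback, under which trailing indices on a matrix normally have to be 1; that is one more reason to document it clearly if you go this route.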
It is your choice what kind of API you want to use, as long as you provide whatever your later processing steps need.