But I’m fairly sure the result of copy!(xtmp, view(x, inds)) is a contiguous Array{Float64,1}, since it comes back as a plain Array rather than a SubArray:
julia> copy!(xtmp, view(x, inds))
800000-element Array{Float64,1}:
julia> view(x, inds)
800000-element SubArray{Float64,1,Array{Float64,1},Tuple{Array{Int64,1}},false}:
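For anyone who wants to check this themselves, here is a minimal sketch of the same comparison. The sizes are made up and much smaller than in the thread, and it uses `copyto!`, which is the name Julia 0.7 and later give to the two-array `copy!` from 0.6; treat it as an illustration rather than the exact setup used above.

```julia
# Hypothetical, small sizes; the thread uses 800000 indices.
x = rand(1_000)
inds = rand(1:length(x), 800)                 # unsorted, possibly repeated indices
xtmp = Vector{Float64}(undef, length(inds))   # preallocated destination buffer

v = view(x, inds)     # lazy wrapper: no data copied, reads go through inds
copyto!(xtmp, v)      # gathers the selected elements into xtmp

@show typeof(v)             # a SubArray indexed by a Vector{Int}
@show typeof(xtmp)          # a plain, contiguous Vector{Float64}
@show xtmp == x[inds]       # same values as the allocating getindex
```

The gather itself still has to read `x` in whatever order `inds` dictates; only the destination is contiguous.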
I know that the views themselves involve wild pointer-jumping, which prevents SIMD and BLAS optimizations, so that’s why the code below is even slower:
julia> @time sum(view(A, :, inds) * view(x, inds))
0.602032 seconds (41 allocations: 1.391 KiB)
12203.768937947227
It is interesting to see the time needed change so drastically when the indices are sorted, though. Is there some reason that the allocating version doesn’t speed up as much as the view version when the indices are sorted? I would imagine that getindex(…) would also suffer from pointer-jumping problems in the first case and benefit from sorted indices in the second.
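To make the question concrete, this is roughly the comparison I have in mind. It is only a sketch: the array sizes and the helper names `alloc_version` and `view_version` are made up for illustration, and the timings will depend on the machine and Julia version.

```julia
# Hypothetical sizes, smaller than the arrays in the thread.
A = rand(50, 100_000)
x = rand(100_000)
inds = rand(1:length(x), 80_000)   # unsorted indices
sorted_inds = sort(inds)           # same indices, in ascending order

# Allocating version: getindex copies the selected columns/elements first.
alloc_version(A, x, inds) = sum(A[:, inds] * x[inds])
# View version: no copies, the multiply reads through the index vector.
view_version(A, x, inds) = sum(view(A, :, inds) * view(x, inds))

# Warm up so compilation time is not included in the measurements.
for f in (alloc_version, view_version), idx in (inds, sorted_inds)
    f(A, x, idx)
end

for (name, f) in (("alloc", alloc_version), ("view", view_version))
    t_unsorted = @elapsed f(A, x, inds)
    t_sorted = @elapsed f(A, x, sorted_inds)
    println(name, ": unsorted ", t_unsorted, " s, sorted ", t_sorted, " s")
end
```

A single `@elapsed` call is noisy, so repeating each measurement a few times would give a fairer picture.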
Thanks for the in-depth analysis by the way!