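I benchmarked a function that sums two groups of columns of a matrix and takes the dot product of the results. I ran it twice: once indexing the original matrix with non-contiguous column indices, and once indexing a reordered copy with contiguous ranges. A quick test shows a roughly 15% lower mean run time for the non-contiguous case (86.7 μs vs 101.6 μs), which seems huge. Why would this happen?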
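```julia
using Random, BenchmarkTools, LinearAlgebra
rng = MersenneTwister(1234);

# data
a = rand(rng, 1000, 10000)
ind = [mod(i, 2) + 1 for i in 1:100]             # group label (1 or 2) for each of the first 100 columns
ind = [findall(x -> x == i, ind) for i in 1:2]   # ind[1] = even columns, ind[2] = odd columns
a_shuffle = [a[:, ind[1]] a[:, ind[2]]]          # same columns, reordered so each group is contiguous

function redparts(x, i1, i2)
    dot(sum(x[:, i1], dims = 2), sum(x[:, i2], dims = 2))
end
```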
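Both benchmarks report a memory estimate of 797.45 KiB per call. That is almost entirely accounted for by copying two 1000×50 column slices (`x[:,i1]` and `x[:,i2]`, 8 bytes per `Float64`, about 781 KiB) plus the two 1000×1 sums (about 16 KiB). A variant that reads the columns through views avoids the slice copies, which makes it easy to check whether those allocations, and the GC they trigger, drive the timing spread. This is a minimal sketch only: `redparts_views` is a hypothetical name, and the 1000×100 test matrix is an assumption chosen to cover the 100 columns the question indexes.

```julia
using Random, LinearAlgebra

# Original version: x[:, i1] and x[:, i2] each copy the selected columns into a new matrix.
function redparts(x, i1, i2)
    dot(sum(x[:, i1], dims = 2), sum(x[:, i2], dims = 2))
end

# Hypothetical variant: @views turns both slices into views, so the column data is not copied.
# Only the two 1000×1 results of sum(...; dims = 2) are still allocated.
function redparts_views(x, i1, i2)
    @views dot(sum(x[:, i1], dims = 2), sum(x[:, i2], dims = 2))
end

rng = MersenneTwister(1234)
a = rand(rng, 1000, 100)    # assumption: only the 100 columns used by the question
i1 = collect(2:2:100)       # same as ind[1] in the question (even columns)
i2 = collect(1:2:99)        # same as ind[2] in the question (odd columns)

r1 = redparts(a, i1, i2)
r2 = redparts_views(a, i1, i2)
```

With BenchmarkTools loaded, `@benchmark redparts_views($a, $i1, $i2)` could then be compared against the original. Interpolating global variables with `$` is the BenchmarkTools convention for keeping global-variable lookup out of the measurement.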
Non-contiguous indices:

```
julia> @benchmark redparts(a,ind[1],ind[2])
BenchmarkTools.Trial:
  memory estimate:  797.45 KiB
  allocs estimate:  15
  --------------
  minimum time:     58.690 μs (0.00% GC)
  median time:      64.291 μs (0.00% GC)
  mean time:        86.662 μs (22.37% GC)
  maximum time:     2.861 ms (96.42% GC)
  --------------
  samples:          10000
  evals/sample:     1
```
Contiguous indices:

```
julia> @benchmark redparts(a_shuffle,1:50,51:100)
BenchmarkTools.Trial:
  memory estimate:  797.45 KiB
  allocs estimate:  15
  --------------
  minimum time:     56.980 μs (0.00% GC)
  median time:      62.941 μs (0.00% GC)
  mean time:        101.624 μs (22.53% GC)
  maximum time:     3.916 ms (97.64% GC)
  --------------
  samples:          10000
  evals/sample:     1
```
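The 15% figure appears only in the mean. A quick calculation on the numbers posted above (not a new measurement) compares all three summary statistics side by side:

```julia
# Times in μs, copied from the two BenchmarkTools.Trial outputs above
noncontig = (minimum = 58.690, median = 64.291, mean = 86.662)
contig    = (minimum = 56.980, median = 62.941, mean = 101.624)

# Relative change of the non-contiguous run versus the contiguous run, in percent
# (negative = non-contiguous is faster)
relchange(x, y) = 100 * (x - y) / y

for k in keys(noncontig)
    println(rpad(string(k), 8), round(relchange(noncontig[k], contig[k]); digits = 1), " %")
end
```

The non-contiguous case comes out about 3.0% slower at the minimum and 2.1% slower at the median, but 14.7% faster at the mean. Both trials allocate 797.45 KiB per call, and their maximum times (2.861 ms and 3.916 ms) are 96 to 97% GC. That suggests the mean may be dominated by which samples happened to trigger a garbage collection, rather than by the memory layout of the indices.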