A quick test shows a 15% decrease in mean run time for the non-contiguous case, which seems huge. Why would this happen?
using Random,BenchmarkTools,LinearAlgebra
rng = MersenneTwister(1234);
# data
a = rand(rng,1000,10000)
ind = [mod(i,2)+1 for i in 1:100]
ind = [findall(x->x==i,ind) for i in 1:2]
a_shuffle = [a[:,ind[1]] a[:,ind[2]]]
function redparts(x,i1,i2)
dot(sum(x[:,i1],dims=2),sum(x[:,i2],dims=2))
end
Memory latency (it’s expensive to fetch an arbitrary element from memory, but roughly free to get the next one)
Indirection (not only do you need to fetch the element from the array in question, you also need to fetch the index itself and figure out where to go. When you use a range like 1:50 Julia can compute the locations directly instead of looking it up).
Note, though, that most of your time is spent copying the data into contiguous arrays — indexing like x[:,i1] makes a copy. This is a great place for views… and you can very clearly see the difference between latency (not much here — since it’s already column-major) and indirection (using 2:2:100 instead of ind[1] is not only simpler, it’s significantly faster).
julia> @benchmark redparts($a,$ind[1],$ind[2])
BenchmarkTools.Trial:
memory estimate: 797.44 KiB
allocs estimate: 14
--------------
minimum time: 101.507 μs (0.00% GC)
median time: 421.714 μs (0.00% GC)
mean time: 439.045 μs (12.93% GC)
maximum time: 4.833 ms (90.74% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark redparts($a_shuffle,1:50,51:100)
BenchmarkTools.Trial:
memory estimate: 797.44 KiB
allocs estimate: 14
--------------
minimum time: 105.827 μs (0.00% GC)
median time: 426.267 μs (0.00% GC)
mean time: 444.088 μs (13.03% GC)
maximum time: 3.946 ms (88.15% GC)
--------------
samples: 10000
evals/sample: 1
julia> function redparts_view(x,i1,i2)
@views dot(sum(x[:,i1],dims=2),sum(x[:,i2],dims=2))
end
redparts_view (generic function with 1 method)
julia> @benchmark redparts_view($a,$ind[1],$ind[2])
BenchmarkTools.Trial:
memory estimate: 16.03 KiB
allocs estimate: 10
--------------
minimum time: 58.979 μs (0.00% GC)
median time: 70.947 μs (0.00% GC)
mean time: 72.124 μs (1.62% GC)
maximum time: 5.367 ms (98.58% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark redparts_view($a_shuffle, 1:50, 51:100)
BenchmarkTools.Trial:
memory estimate: 16.03 KiB
allocs estimate: 10
--------------
minimum time: 15.081 μs (0.00% GC)
median time: 17.178 μs (0.00% GC)
mean time: 19.884 μs (4.64% GC)
maximum time: 3.478 ms (99.07% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark redparts_view($a, 2:2:100, 1:2:100)
BenchmarkTools.Trial:
memory estimate: 16.03 KiB
allocs estimate: 10
--------------
minimum time: 16.211 μs (0.00% GC)
median time: 17.288 μs (0.00% GC)
mean time: 20.286 μs (3.94% GC)
maximum time: 3.295 ms (98.99% GC)
--------------
samples: 10000
evals/sample: 1