I am working on a project where one function is the performance bottleneck. The problem seems to be related to accessing a large matrix. I'd appreciate any advice on improving the code.
The problem is reproduced by the following MWE. test3() is what I need to do in the project, and I found it slow. test1() and test2() are for comparison.
using Statistics, BenchmarkTools
function test1()
    c = rand(1000)
    M = rand(256, 1000)
    a = zero(eltype(c))
    @inbounds for i in 1:1000
        @views a += mean(c[i] .+ M[:, 1]) # a fixed column of M
    end
    return a
end
function test2()
    c = rand(1000)
    M = rand(256, 1000)
    a = zero(eltype(c))
    @inbounds for i in 1:1000
        @views a += mean(c[i] .+ M[:, 1 + i % 10]) # cycling through the first 10 columns of M
    end
    return a
end
function test3()
    c = rand(1000)
    M = rand(256, 1000)
    a = zero(eltype(c))
    @inbounds for i in 1:1000
        @views a += mean(c[i] .+ M[:, i]) # using each column of M
    end
    return a
end
The execution times and memory allocations are:
julia> @btime test1();
789.300 μs (1003 allocations: 4.04 MiB)
julia> @btime test2();
801.600 μs (1003 allocations: 4.04 MiB)
julia> @btime test3();
985.400 μs (1003 allocations: 4.04 MiB)
test3() is more than 20% slower than test1() and test2(), though the memory allocation is the same. Is there a way to improve test3()'s performance?
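For comparison, here is a minimal sketch of a variant that might help isolate the column-access cost. It rests on two assumptions that are not part of the MWE above: the hypothetical helper name test3_kernel, and the choice to pass c and M in as arguments so that the rand() setup is not timed. It also relies on the identity mean(c[i] .+ col) == c[i] + mean(col), which holds up to floating-point rounding, so the broadcast no longer creates a temporary array on every iteration. Whether this closes the gap between test3() and the other two has not been measured here.

```julia
using Statistics

# Hypothetical helper (not in the original MWE): same computation as test3(),
# but c and M are arguments, so the setup cost is excluded from the loop.
function test3_kernel(c, M)
    a = zero(eltype(c))
    for i in eachindex(c)
        # mean(c[i] .+ col) equals c[i] + mean(col) up to rounding,
        # so no temporary array is created for the broadcast.
        a += c[i] + mean(view(M, :, i))
    end
    return a
end

c = rand(1000)
M = rand(256, 1000)
result = test3_kernel(c, M)
```

With BenchmarkTools, the kernel can be timed on its own with @btime test3_kernel($c, $M); the $ interpolation keeps the global variables out of the measurement.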
Many thanks.