I came across a significant slowdown in my code when attempting to write to too many matrices within one loop. Below is a silly (and slightly long, my apologies) MWE that just writes zeroes. In my actual code, I perform some operations and then store their results.
Basically, the moment I try to write to 9 matrices within the same loop, performance degrades by a factor of >10.
using UnPack
using BenchmarkTools  # provides @benchmark, used for the timings below
len_A = 15
len_B = 400
len_C = 1198
struct BigMatrices
    matrix1::Array{Float64,3}
    matrix2::Array{Float64,3}
    matrix3::Array{Float64,3}
    matrix4::Array{Float64,3}
    matrix5::Array{Float64,3}
    matrix6::Array{Float64,3}
    matrix7::Array{Float64,3}
    matrix8::Array{Float64,3}
    matrix9::Array{Float64,3}
    matrix10::Array{Float64,3}
    matrix11::Array{Float64,3}
    matrix12::Array{Float64,3}
    matrix13::Array{Float64,3}
    matrix14::Array{Float64,3}
end
bigMatrices = BigMatrices(
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A),
    zeros(len_C, len_B, len_A)
);
function doOperations!(len_A, len_B, len_C, bigMatrices::BigMatrices)
    @unpack matrix1, matrix2, matrix3, matrix4, matrix5, matrix6, matrix7,
        matrix8, matrix9, matrix10, matrix11, matrix12, matrix13, matrix14 = bigMatrices;
    # First loop nest: write to eight arrays per iteration.
    for a ∈ 1:len_A
        for b ∈ 1:len_B
            for c ∈ 1:len_C
                matrix1[c, b, a] = 0.0
                matrix2[c, b, a] = 0.0
                matrix3[c, b, a] = 0.0
                matrix4[c, b, a] = 0.0
                matrix5[c, b, a] = 0.0
                matrix6[c, b, a] = 0.0
                matrix7[c, b, a] = 0.0
                matrix8[c, b, a] = 0.0
            end
        end
    end
    # Second loop nest: the ninth array is written separately.
    for a ∈ 1:len_A
        for b ∈ 1:len_B
            for c ∈ 1:len_C
                matrix9[c, b, a] = 0.0
            end
        end
    end
    return nothing
end
function doOperationsSlow!(len_A, len_B, len_C, bigMatrices::BigMatrices)
    @unpack matrix1, matrix2, matrix3, matrix4, matrix5, matrix6, matrix7,
        matrix8, matrix9, matrix10, matrix11, matrix12, matrix13, matrix14 = bigMatrices;
    # Single loop nest: write to all nine arrays per iteration.
    for a ∈ 1:len_A
        for b ∈ 1:len_B
            for c ∈ 1:len_C
                matrix1[c, b, a] = 0.0
                matrix2[c, b, a] = 0.0
                matrix3[c, b, a] = 0.0
                matrix4[c, b, a] = 0.0
                matrix5[c, b, a] = 0.0
                matrix6[c, b, a] = 0.0
                matrix7[c, b, a] = 0.0
                matrix8[c, b, a] = 0.0
                matrix9[c, b, a] = 0.0
            end
        end
    end
    return nothing
end
julia> @benchmark doOperations!(len_A,len_B,len_C,bigMatrices)
BenchmarkTools.Trial: 93 samples with 1 evaluation per sample.
 Range (min … max):  31.240 ms … 66.930 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     55.651 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   54.205 ms ±  7.953 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%
  31.2 ms         Histogram: frequency by time        66.4 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark doOperationsSlow!(len_A,len_B,len_C,bigMatrices)
BenchmarkTools.Trial: 6 samples with 1 evaluation per sample.
 Range (min … max):  813.094 ms … 896.622 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     837.690 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   844.685 ms ±  32.067 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%
  813 ms          Histogram: frequency by time          897 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
I was hoping someone could help me understand the general principle at work here and how to avoid the slowdown. I assume it has to do with memory access, but I am not sure when this is something to be careful about and when it is not.