# 3D Matrix vs Array of 2D Matrices using Comprehensions

I am trying to populate a large, dense 3D matrix (3000 x 18 x 1000). I tried iterating over all positions of the 3D matrix (testing various row/column iteration orders) - see Attempt 1. I also tried array comprehensions on an array of matrices - see Attempt 2. The timing for both is about the same (see below). Does anyone have suggestions for how to further speed this up? The examples below have the first dimension at 50 instead of 3000, but I need to scale up to 3000 (see the `pointNumber` dimension in the code).

Attempt 1 - iterate over all positions of 3D array with nested for loop

Summary
``````using LinearAlgebra
using BenchmarkTools

function greensMatrix3D1(matchCoordinates,axisCoordinates,zArray)

greensMatrix = zeros(Float64,length(matchCoordinates),length(axisCoordinates),length(zArray))

for v = 1:length(matchCoordinates),q = 1:length(axisCoordinates),k=1:length(zArray)
rvec = matchCoordinates[v] - [axisCoordinates[q],axisCoordinates[q],zArray[k]]
rivec = rvec + [0, 2*axisCoordinates[q],0]
r = norm(rvec)
ri = norm(rivec)
greensMatrix[v,q,k] = (1/r - 1/ri)
end

return greensMatrix
end

pointNumber = 50 #target 3000
matchCoordinates = [[10,i*0.02,0] for i in 1:pointNumber]
axisCoordinates = [[i-1,10,0] for i in 1:18]
zArray = range(-100,100,length=1024)

@btime output1 = greensMatrix3D1(matchCoordinates,axisCoordinates,zArray)

``````

Attempt 2 - use comprehensions and an array of matrices

Summary
``````using LinearAlgebra

function oneGreens(xyz1,xyz2)
rvec = xyz1 - xyz2
rivec = rvec + [0, 2*xyz2,0]
r = norm(rvec)
ri = norm(rivec)
greensMatrix = (1/r - 1/ri)
return greensMatrix
end

function greensMatrix3D(matchCoordinates,axisCoordinates,zArray)

greensMatrixArray = [zeros(Float64,length(matchCoordinates),length(axisCoordinates)) for _ in 1:length(zArray)]
for i = 1:length(zArray)
axisTempCoordinates = axisCoordinates
setindex!.(axisTempCoordinates,zArray[i],3)
greensMatrixArray[i] = [oneGreens(j,k) for j in matchCoordinates, k in axisTempCoordinates]
end

return greensMatrixArray
end

pointNumber = 50 #target 3000
matchCoordinates = [[10.0,i*0.02,0.0] for i in 1:pointNumber]
axisCoordinates = [[i-1.0,10.0,0.0] for i in 1:18]
zArray = range(-100.0,100.0,length=1024)

@btime output2 = greensMatrix3D(matchCoordinates,axisCoordinates,zArray)

``````

The timing results for the above are:
Attempt 1:>>> `1.826 s (8294402 allocations: 513.28 MiB)`
Attempt 2:>>> `1.673 s (7375873 allocations: 422.41 MiB)`

When you see that many allocations from your benchmark, that should be the first clue that something is sub-optimal. In particular, your function allocates brand new arrays in several places in your inner loop:

``````rvec = matchCoordinates[v] - [axisCoordinates[q],axisCoordinates[q],zArray[k]]
``````

This allocates two new arrays, one holding the elements on the right-hand-side of the `-` and then another one for the result of the subtraction.

``````rivec = rvec + [0, 2*axisCoordinates[q],0]
``````

likewise, this line allocates two more arrays. Since this is happening inside your loop, you end up seeing a huge number of allocation events when filling your whole matrix.

Since your matches and axes consist of a large number of small arrays, this is a textbook use case for https://github.com/JuliaArrays/StaticArrays.jl

For example, by storing the `matchCoordinates` and `axisCoordinates` as static arrays (`SVector`s, to be specific) and constructing more static arrays inside the loop instead of plain arrays, we can reduce the allocations from 8294402 to just 2:

``````using StaticArrays

function greensMatrix3D3(matchCoordinates,axisCoordinates,zArray)

greensMatrix = zeros(Float64,length(matchCoordinates),length(axisCoordinates),length(zArray))

for v = 1:length(matchCoordinates),q = 1:length(axisCoordinates),k=1:length(zArray)
rvec = matchCoordinates[v] - SVector(axisCoordinates[q], axisCoordinates[q], zArray[k])
rivec = rvec + SVector(0, 2*axisCoordinates[q], 0)
r = norm(rvec)
ri = norm(rivec)
greensMatrix[v,q,k] = (1/r - 1/ri)
end

return greensMatrix
end

pointNumber = 50 #target 3000
matchCoordinates = [SVector(10, i * 0.02, 0) for i in 1:pointNumber]
axisCoordinates = [SVector(i-1, 10, 0) for i in 1:18]
zArray = range(-100,100,length=1024)
@btime greensMatrix3D3(\$matchCoordinates, \$axisCoordinates, \$zArray);
``````

Output:

``````  7.016 ms (2 allocations: 7.03 MiB)
``````

That’s already 250 times faster, but there’s even more we can do.

Adding `@inbounds` helps a little bit:

``````@inbounds for v = 1:length(matchCoordinates),q = 1:length(axisCoordinates),k=1:length(zArray)
...
end
``````

giving a new time of `6.872 ms (2 allocations: 7.03 MiB)`.

Much more helpful is fixing the order of the nested loops. It turns out that your loop order was exactly backwards (see https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-column-major ), but it’s easy to fix:

``````function greensMatrix3D5(matchCoordinates,axisCoordinates,zArray)

greensMatrix = zeros(Float64,length(matchCoordinates),length(axisCoordinates),length(zArray))

@inbounds for k=1:length(zArray), q = 1:length(axisCoordinates), v = 1:length(matchCoordinates)
rvec = matchCoordinates[v] - SVector(axisCoordinates[q], axisCoordinates[q], zArray[k])
rivec = rvec + SVector(0, 2*axisCoordinates[q], 0)
r = norm(rvec)
ri = norm(rivec)
greensMatrix[v,q,k] = (1/r - 1/ri)
end

return greensMatrix
end

pointNumber = 50 #target 3000
matchCoordinates = [SVector(10, i * 0.02, 0) for i in 1:pointNumber]
axisCoordinates = [SVector(i-1, 10, 0) for i in 1:18]
zArray = range(-100,100,length=1024)
@btime greensMatrix3D5(\$matchCoordinates, \$axisCoordinates, \$zArray);
``````

That gives a final result of:

``````  3.301 ms (2 allocations: 7.03 MiB)
``````

or about 500 times faster than the original function. There’s probably even more that can be done, but this is a pretty good return on a small investment of effort.

9 Likes

By the way, with the `StaticArrays` change and the fixed loop order, computing 3000 points takes just 276ms:

``````pointNumber = 3000 #target 3000
matchCoordinates = [SVector(10, i * 0.02, 0) for i in 1:pointNumber]
axisCoordinates = [SVector(i-1, 10, 0) for i in 1:18]
zArray = range(-100,100,length=1024)
@btime greensMatrix3D5(\$matchCoordinates, \$axisCoordinates, \$zArray);
``````

output:

``````  276.100 ms (2 allocations: 421.88 MiB)
``````
7 Likes

Excellent. Thank you!

I’m trying to understand this column major concept a bit better. If the rows and columns are of unequal size and we have flexibility in how those are assigned, is it better to assign the larger dimension to rows?