I have a program I’m writing that reads in two Matrices, performs a bit of linear algebra in a pair of nested loops, then outputs a new Matrix. Because eventually it needs to operate on very large input Matrices, I’ve been testing its performance and found that it is quite a bit slower than I’m expecting, likely due to a very large number of memory allocations that I do not understand.

This is a simplified version of my code:

```
using BenchmarkTools: @benchmark
using LinearAlgebra: norm, dot, cross
A = rand(3, 1000)
B = rand(7, 1000)
function f(A::AbstractArray{Float64}, B::AbstractArray{Float64})
# Preallocate variables outside of loops; for this example, this is a total
# of (3000 + 4*3 Float64) * 8 bytes/Float64 ~= 24kB (?)
C = zeros(size(A))
x = zeros(3)
y = zeros(3)
z = zeros(3)
zcrossx = zeros(3)
for j in axes(B)[2]
for i in axes(A)[2]
x = B[4:6,j] - B[1:3,j]
y = B[1:3,j] - A[:,i]
z = B[4:6,j] - A[:,i]
zcrossx = cross(z,x)
C[:,i] += zcrossx * B[7,j] *(1/(norm(zcrossx)^2)) * (dot(x,z)/norm(z) - dot(x,y)/norm(y))
end
end
return C
end
@benchmark f($A, $B)
```

When running it, I get this output:

```
BenchmarkTools.Trial: 12 samples with 1 evaluation.
Range (min … max): 423.763 ms … 436.830 ms ┊ GC (min … max): 11.80% … 12.91%
Time (median): 426.853 ms ┊ GC (median): 12.24%
Time (mean ± σ): 427.625 ms ± 3.538 ms ┊ GC (mean ± σ): 12.37% ± 0.43%
▁ ▁ █ ▁ ▁ ▁▁ ▁ ▁ ▁ ▁
█▁▁▁█▁█▁▁▁▁█▁█▁██▁▁▁▁▁▁█▁█▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
424 ms Histogram: frequency by time 437 ms <
Memory estimate: 1.12 GiB, allocs estimate: 15000006.
```

Per the quick math in the comments of my script, I assume the heap memory allocation would be on the order of 24 kB, not over 1 GB!

Can anyone explain what’s causing this behavior? Is there a simple fix to dramatically reduce this memory allocation?