# Memory usage in nested loops

Hey everyone,

i would like to have some advise how to perform loops over vector multiplication.
I need to do this for alot of vectors, so i try to use multi threading.
The memory consumption of my function is really high and i do not really understand why.

To be more specific, I have 4 arrays with data (a,b,c,d) with dimension (L, N, N).
I use loops over the indices (N,N) and elementwise multiplication in the direction (L).
For multi threading i creat 2 auxillary arrays (aux1,aux2) for each thread which store elementwise multiplication along L-direction of two element of (a,b,c,d).
Afterwards I take a shifted scalar product of the auxillary arrays.

Increasing N leads to a rapid growth in the memory consumption, which is far larger than the increase in size of the arrays. I would appreciate alot suggestions how to improve here.

The example code looks like this

``````function test()

in_p=100

N=12
L=2^14

a=rand(Complex{Float64},L,N,N)
b=rand(Complex{Float64},L,N,N)
c=rand(Complex{Float64},L,N,N)
d=rand(Complex{Float64},L,N,N)

for i2=1:N , i3=1:N  , i4=1:N
aux1[t] .=  a[:,i1,i2] .* b[:,i3,i4]
aux2[t] .=  c[:,i2,i3] .* d[:,i4 ,i1]
aux[t] +=  transpose(@views aux1[t][1:end-in_p]) * (@views aux2[t][1+in_p:end])
end
end

return sum(aux)
end
``````
``````aux1[t] .= a[:,i1,i2] .* b[:,i3,i4]
aux2[t] .= c[:,i2,i3] .* d[:,i4 ,i1]
``````

causes memory allocation on rhs before broadcasting `*`. It doesn’t seem like but

``````aux[t] += transpose(@views aux1[t][1:end-in_p]) * (@views aux2[t][1+in_p:end])
``````

also allocates in rhs with `aux[t] + transpose(...)*(...)`

You can just put `@views` in front of `function test()` in order to use views for slices everywhere in the function. (Creating a view still allocates a small amount of memory.)

In any case, doing 3 array passes just to compute a dot product at the end seems pretty wasteful, especially when you don’t even use all of the products in your `aux1` and `aux2` arrays. It looks like you can get the same result with just:

``````s = aux[t]
for i = 1:L-in_p
s += a[i,i1,i2]*b[i,i3,i4] * c[i+in_p,i2,i3]*d[i+in_p,i4,i1]
end
aux[t] = s
``````

(You might also want `@simd for ...` on this loop as well as `@inbounds s += ...`, for speed.)

5 Likes