Reduce allocation in nested for loops

I have a function that calculates the expectation and covariance of random variables on a grid with dimensions (nx, ny, nz). The variables y and z are Markovian, with Qy and Qz being their Markov matrices. The functions f and g compute the exact values of the random variables b and a, respectively. The example below is a bit awkward in that it just converts Int to Float, but that is for illustration only.

Given the large scale of the real problem (which has additional layers), preallocating large matrices as and bs outside the loop is impractical due to memory constraints. To avoid data races, I instead allocate these matrices within the threaded loops.

Profiling reveals significant allocations related to tensor contractions and the matrices as and bs. Is there a way to optimize this function to reduce allocations and improve efficiency?

Thank you!

using TensorOperations
using LinearAlgebra

function calc_mats(nx, ny, nz, Qy, Qz)
    ## preallocation, output
    μ_mat = Array{Float64}(undef, (nx, ny, nz))
    C_mat = similar(μ_mat)
    ## loop over all states and decisions today
    # states today
    Threads.@threads for iz in 1:nz
        Threads.@threads for iy in 1:ny
            ## preallocation, temporary variables, has to be within the threads
            # given state and decisions today
            as = Array{Float64}(undef, ny, nz)
            bs = Vector{Float64}(undef, nz)
            abs = similar(as)
            Qziz = @view Qz[iz, :]
            Qyiy = @view Qy[iy, :]
            for ix in 1:nx
                # uncertainties tomorrow
                # prepare return and flow shock matrices
                for izz in 1:nz
                    bs[izz] = f(izz)
                    for iyy in 1:ny
                        as[iyy, izz] = g(iyy, izz, ix, iy, iz)
                        abs[iyy, izz] = as[iyy, izz] * bs[izz]  # b varies only with izz
                    end
                end
                # calculate mean and covariance
                μ = @tensor as[iiyy, iizz] * Qziz[iizz] * Qyiy[iiyy]
                μ_mat[ix, iy, iz] = μ
                Eb = dot(bs, Qziz)
                Eab = @tensor abs[iiyy, iizz] * Qziz[iizz] * Qyiy[iiyy]
                C = Eab - μ * Eb
                C_mat[ix, iy, iz] = C
            end
        end
    end
    return μ_mat, C_mat
end

g(a, b, c, d, e) = a / b + c - c / d + exp(e)
f(x) = sqrt(x)

nx = 3
ny = 5
nz = 7
Qy = randn(ny, ny)
Qz = randn(nz, nz)

calc_mats(nx, ny, nz, Qy, Qz)
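
For reference, timing and allocation counts can be checked with BenchmarkTools (assuming it is available; the $-interpolation keeps global-variable access out of the measurement):

using BenchmarkTools
@btime calc_mats($nx, $ny, $nz, $Qy, $Qz)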

I tried different packages for the tensor contraction, but it seems TensorOperations is already the best solution there. I also tried calculating the Kronecker product of the Markov matrices, reshaping the result, and multiplying it with as and abs, but that was not very helpful either.
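
Since both contractions here just reduce a matrix against two vectors, one alternative that might avoid the temporaries is the three-argument dot from LinearAlgebra, which evaluates x' * A * y without materializing A * y. A minimal sketch of the replacement inside the ix loop (not benchmarked on the full problem):

using LinearAlgebra  # already loaded above

# dot(x, A, y) contracts both indices of A in one pass without a temporary
μ = dot(Qyiy, as, Qziz)      # same scalar as @tensor as[iiyy, iizz] * Qziz[iizz] * Qyiy[iiyy]
Eab = dot(Qyiy, abs, Qziz)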

I haven't figured out a way to bypass the allocation of as, bs, and abs within the threaded loops. Help with that would also be appreciated.

Is there any reason you need to allocate these arrays inside the loops? I would suggest pulling them out.

Somehow I dropped Threads.@threads there, sorry; I have now added it back. I tested two versions: one allocating these variables outside the loop, and one threading over the iz and iy levels and allocating them inside the loop to avoid data races. It turns out the threaded version is faster.
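
For completeness, here is a sketch of a variant that might combine both ideas: spawn one task per chunk of iz values, so each task allocates the temporaries once and reuses them across its whole chunk, and contract with the three-argument dot. It assumes the same f and g as above; the name calc_mats_chunked and the chunk size are illustrative, and I have not tested this on the real problem:

using LinearAlgebra

function calc_mats_chunked(nx, ny, nz, Qy, Qz)
    μ_mat = Array{Float64}(undef, nx, ny, nz)
    C_mat = similar(μ_mat)
    # each task owns a disjoint range of iz, so writes to μ_mat and C_mat cannot race
    @sync for chunk in Iterators.partition(1:nz, cld(nz, Threads.nthreads()))
        Threads.@spawn begin
            # one buffer set per task, reused across the whole chunk
            as = Array{Float64}(undef, ny, nz)
            bs = Vector{Float64}(undef, nz)
            abs = similar(as)
            for iz in chunk, iy in 1:ny
                Qziz = @view Qz[iz, :]
                Qyiy = @view Qy[iy, :]
                for ix in 1:nx
                    for izz in 1:nz
                        bs[izz] = f(izz)
                        for iyy in 1:ny
                            as[iyy, izz] = g(iyy, izz, ix, iy, iz)
                            abs[iyy, izz] = as[iyy, izz] * bs[izz]
                        end
                    end
                    μ = dot(Qyiy, as, Qziz)   # allocation-free contraction
                    μ_mat[ix, iy, iz] = μ
                    C_mat[ix, iy, iz] = dot(Qyiy, abs, Qziz) - μ * dot(bs, Qziz)
                end
            end
        end
    end
    return μ_mat, C_mat
end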