Speeding up a sum involving 3 matrices

structural · August 13, 2022, 6:22pm

Thanks for your answers. It made a 20x improvement in my overall computation time.

May I get your help with another related task? I need to compute a 10,000+ row vector in which the i^{th} row is \sum_{jk} x^{}_{ij}z^{}_{ik}\pi^{}_{jk} where j,k < 10. x, z are data and \pi_{jk} are parameters to be estimated. I have written the following code to implement this sum. Are there any improvements I can make here?

x = data[:,1:8];
z = data[:,9:16];
pi = rand(Float64, (size(x,2), size(z,2)));

function nlsum(pi)
    s = zeros(eltype(pi), size(x, 1), 1)
    for j in axes(pi, 1)
        for k in axes(pi, 2)
            s[:] += x[:, j].*z[:, k].* pi[j, k]
        end
    end
    return s    
end

stevengj · August 13, 2022, 6:55pm

For one thing, your code is allocating like crazy, since s[:] += x[:, j].*z[:, k].* pi[j, k] allocates 4 temporary arrays on every iteration. You could use views, or just write out the loop over i, or use something like Tullio.jl. Also, don’t use global variables — pass x and z as parameters to your function. Just changing your function to @views function nlsum(x,z,pi) speeds it up by a factor of 3x on my machine.

Alternatively, if you think in terms of matrix operations, your function is exactly:

nlsum2(x,z,π)= sum(x .* (z * π'), dims=2)

and this gives me another factor of 25x, for overall almost 100x speedup. (And you could eke out some more performance by optimizing out the extra allocations. Tullio.jl is also worth trying.)

PS. Julia has 1-dimensional arrays, unlike Matlab. You can allocate s as a 1d array instead of a 2d array with 1 column.

DNF · August 14, 2022, 6:16am

pi is a built-in constant (equal to 3.14…) Overwriting it in global scope is probably not a good idea.

stevengj · August 14, 2022, 11:30am

This is totally safe (if a bit unusual in the case of pi) — it won’t affect usage of pi in other modules. (You are shadowing MathConstants.pi with a new binding, not overwriting it.)

(In a real application you’d probably be putting this code into functions anyway.)

It’s actually pretty crucial that this is fine — Base alone exports almost 1000 symbols, so if shadowing names were unsafe you’d see a lot of inadvertent breakage, and it would be dangerous to add new exports in future versions.

Mason · August 14, 2022, 10:41pm

stevengj:

nlsum2(x,z,π)= sum(x .* (z * π'), dims=2)
and this gives me another factor of 25x, for overall almost 100x speedup. (And you could eke out some more performance by optimizing out the extra allocations. Tullio.jl is also worth trying.)

Interesting, I find that this a factor of 2 slower on my machine than the OP’s version with your suggested fixes. What sizes did you use? I took the data to be 10_000 long in its first dimension.

Here’s my code, plus a Tullio.jl version that smokes them both:

function nlsum1(x, z, pi)
    s = zeros(eltype(pi), size(x, 1))
    for j in axes(pi, 1)
        for k in axes(pi, 2)
            @views s[:] .+= x[:, j].*z[:, k].* pi[j, k]
        end
    end
    return s    
end

nlsum2(x,z,π) = sum(x .* (z * π'), dims=2)

using Tullio, LoopVectorization
nlsum_3(x,z,π) = @tullio s[i] := x[i,j]*z[i,k]*π[j, k];

and timings:

julia> let x = randn(10_000, 8), z = randn(10_000, 5), π = rand(8, 5)
           
           out1 = @btime nlsum1($x, $z, $π)
           out2 = @btime nlsum2($x, $z, $π)
           out3 = @btime nlsum3($x, $z, $π)
           out1 ≈ out2 ≈ out3
       end
  111.783 μs (2 allocations: 78.17 KiB)
  198.939 μs (10 allocations: 1.30 MiB)
  28.667 μs (19 allocations: 79.02 KiB)
true

Topic		Replies	Views
Speed up Array computation New to Julia linearalgebra , arrays , matrices	4	355	October 17, 2022
How to improve performance of sum() Performance question , fast-math , tullio , loopvectorization , sum	19	4828	July 19, 2021
Multiplication of vector of matrices and vector of vectors Performance linearalgebra , tullio	3	417	November 7, 2022
Speed up simple product accumulator loop Performance loops , tullio	6	855	August 17, 2020
Assistance in speeding up Array operations and summations New to Julia performance , arrays , sum	12	561	August 23, 2023

Speeding up a sum involving 3 matrices

Related topics