Why are for loops so slow?

To elaborate on @mstewart’s answer above, even for the simpler problem of multiplying two $m \times m$ matrices, which naively takes about 10 lines of code (3 nested loops), the simplest implementations (in any language!) are typically orders of magnitude slower than highly optimized algorithms (which do the “same” $\sim 2m^3$ floating-point operations; the theoretical lower-complexity matrix-multiplication algorithms are virtually never used). Optimized matrix-multiply “BLAS” libraries often devote $\gtrsim 10{,}000$ lines of code to this problem! It’s not a question of the speed of “for loops”: optimized code completely re-organizes the algorithm into “block” operations that have better memory locality, for example.
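For concreteness, here is roughly what that naive triple-loop multiply looks like (the name `naive_matmul!` is just my illustration, not a library function). Note that even the loop *order* matters: the innermost loop below strides across rows of `A`, which is non-contiguous in Julia’s column-major layout, hinting at the memory-locality issues that BLAS libraries address systematically.

```julia
# A rough sketch of the "10 lines of code" naive multiply mentioned above.
function naive_matmul!(C, A, B)
    m, n = size(A)
    size(B, 1) == n || throw(DimensionMismatch("inner dimensions must match"))
    p = size(B, 2)
    fill!(C, zero(eltype(C)))
    @inbounds for i in 1:m, j in 1:p, k in 1:n
        C[i, j] += A[i, k] * B[k, j]  # A[i, k] is a row access: poor locality
    end                               # for Julia's column-major arrays
    return C
end

C = zeros(500, 500)
naive_matmul!(C, rand(500, 500), rand(500, 500))
```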

See also the discussion thread: Julia matrix-multiplication performance

It is very hard to beat (or even come close to) the performance of highly optimized libraries for basic operations on generic dense matrices, even if you understand these performance issues! This is true in any language, even in compiled languages like C. (Where you can do better is if your matrices are very special and you can exploit that for improved algorithms.)
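As a small illustration of exploiting special structure (my example, not from the thread): if you know a matrix is diagonal, the `LinearAlgebra` standard library lets you say so with `Diagonal`, and multiplication then costs $O(m^2)$ operations instead of the $O(m^3)$ of a dense multiply. (This sketch assumes the BenchmarkTools.jl package is installed.)

```julia
using LinearAlgebra
using BenchmarkTools  # assumed installed, for @btime

m = 1000
A = rand(m, m)
d = rand(m)

D_dense = Matrix(Diagonal(d))  # diagonal matrix stored as a dense m×m array
D = Diagonal(d)                # the same matrix, with its structure exposed

@btime $D_dense * $A;  # O(m^3): generic dense BLAS multiply
@btime $D * $A;        # O(m^2): structured multiply, orders of magnitude faster
```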

Note, by the way, that none of these type declarations are required for performance. As long as you initialize your variables in a type-stable way, e.g. `d = zero(eltype(b)); e = zero(eltype(A))`, you can omit all of the type declarations and the code will be just as fast … and more generic, since it will work on matrices of any numeric type. (This is a common misconception.) See “Argument-Type Declarations” in the Julia manual.
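A minimal sketch of what this looks like (my illustration, not the original poster’s code): the function below has no type annotations at all, yet it is type-stable because the output’s element type is computed from the inputs, so the compiler can infer concrete types throughout.

```julia
# Declaration-free, generic matrix-vector multiply.
function matvec(A, b)
    m, n = size(A)
    length(b) == n || throw(DimensionMismatch("size mismatch"))
    c = zeros(promote_type(eltype(A), eltype(b)), m)  # type-stable init
    @inbounds for j in 1:n, i in 1:m
        c[i] += A[i, j] * b[j]
    end
    return c
end

matvec(rand(3, 3), rand(3))            # Float64 result
matvec(rand(Int, 3, 3), rand(Int, 3))  # works for Int too, just as fast
```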
