Common allocation mistakes

jishnub · June 28, 2023, 1:29pm

julia> *(2)
2

julia> @descend *(2)
[ Info: tracking Base
*(x::Number) @ Base ~/packages/julias/julia-1.9/share/julia/base/operators.jl:516
[ Info: This method only fills in default arguments; descend into the body method to see the full source.
516 *(x::Int64::Number)::Int64 = x
[...]

sylvaticus · June 28, 2023, 1:37pm

You are right. The problem is reduce that apply the operation one element at the time:

julia> x = rand(5); y = rand(5);

julia> temp1 = x .* y
5-element Vector{Float64}:
 0.021011105122337788
 0.18748492243282094
 0.29715362865317285
 0.02001732616845591
 0.5826489054162168

julia> z1 = norm(temp1)
0.6810086504806679

julia> z2 = norm(xi*yi for (xi,yi) in zip(x,y))
0.6810086504806679

julia> z1 == z2
true

julia> temp3 = map(*,x,y)
5-element Vector{Float64}:
 0.021011105122337788
 0.18748492243282094
 0.29715362865317285
 0.02001732616845591
 0.5826489054162168

julia> temp3 == temp1
true

julia> z3 = reduce(norm,temp3)
0.021011105122337788

julia> z4 = norm(norm(norm(norm(temp3[1],temp3[2]),temp3[3]),temp3[4]),temp3[5])
0.021011105122337788

julia> z3 == z4
true

The correct version in this case is:

function f2(x,y)
    z = 0.
    for i in 1:10
      z += sum( xi^2 * yi^2 for (xi,yi) in zip(x,y)) |> sqrt 
    end
    z
end

This doesn’t allocate but of course it is specific to the toy example.

lmiq · June 28, 2023, 2:59pm

Hehe, why is that defined at all?

Henrique_Becker · June 28, 2023, 5:05pm

I was also confused until trying:

julia> +(1)
1

julia> *(1)
1

julia> ^(1)
ERROR: MethodError: no method matching ^(::Int64)
Closest candidates are:
  ^(::Number, ::Missing) at missing.jl:124
  ^(::Number, ::Rational) at rational.jl:481
  ^(::T, ::Complex{T}) where T<:Real at complex.jl:852
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1

julia> +(1, 2, 3)
6

julia> *(1, 2, 3)
6

This is described in this footnote here: Mathematical Operations and Elementary Functions · The Julia Language

The operators +, ++ and * are non-associative. a + b + c is parsed as +(a, b, c) not +(+(a, b), c). However, the fallback methods for +(a, b, c, d…) and *(a, b, c, d…) both default to left-associative evaluation.

In other words, + and * are not truly binary operators like ^.

sob · June 29, 2023, 12:09pm

For those who have extensive experience with StaticArrays: as you increase the number of elements, at what point do you see a decrease in performance? The documentation says 100, but I’m curious what people’s real life experience suggests.

IlianPihlajamaa · August 20, 2023, 7:57pm

You can try to run this code https://github.com/JuliaArrays/StaticArrays.jl/blob/master/perf/README_benchmarks.jl

For different sizes ‘N’. I expect the result is highly machine specific but im interested in what you find.

sob · August 20, 2023, 10:57pm

So this is what I get for 3 and 10:

julia> simple_bench(3)
============================================
    Benchmarks for 3×3 Float64 matrices
============================================
Matrix multiplication               -> 9.3x speedup
Matrix multiplication (mutating)    -> 1.4x speedup
Matrix addition                     -> 22.2x speedup
Matrix addition (mutating)          -> 1.7x speedup
Matrix determinant                  -> 87.9x speedup
Matrix inverse                      -> 82.3x speedup
Matrix symmetric eigendecomposition -> 4.4x speedup
Matrix Cholesky decomposition       -> 12.3x speedup
Matrix LU decomposition             -> 7.1x speedup
Matrix QR decomposition             -> 61.8x speedup

julia> simple_bench(10)
============================================
    Benchmarks for 10×10 Float64 matrices
============================================
Matrix multiplication               -> 3.8x speedup
Matrix multiplication (mutating)    -> 2.4x speedup
Matrix addition                     -> 6.1x speedup
Matrix addition (mutating)          -> 2.0x speedup
Matrix determinant                  -> 1.5x speedup
Matrix inverse                      -> 1.8x speedup
Matrix symmetric eigendecomposition -> 1.0x speedup
Matrix Cholesky decomposition       -> 3.4x speedup
Matrix LU decomposition             -> 1.5x speedup
Matrix QR decomposition             -> 17.7x speedup

julia> versioninfo()
Julia Version 1.10.0-beta2
Commit a468aa198d0 (2023-08-17 06:27 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 7 4800U with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
  Threads: 23 on 16 virtual cores
Environment:
  LD_LIBRARY_PATH = /home/***/lib/

I started N=100 3 hours ago and it’s not done yet. So that answers that.

IlianPihlajamaa · August 21, 2023, 5:42am

Yes, for N=100, you are asking the compiler to basically write out matrix methods for 10000-element matrices