Slower with threads

  1. You want static arrays wherever you create lots of them in a loop. They won’t incur allocations because their size is known so they’re put on the stack.
  2. You can’t modify static arrays, but usually you can store a completely new static array where some values are different than in the previous version, and this is usually fast because, again, no allocation is needed. The compiler is pretty smart about making such things fast. You could look at GitHub - JuliaObjects/Accessors.jl: Update immutable data for example which helps with that.
  3. Sparse arrays are something very different. There the optimization is that most elements are zero, so only the non-zero elements are saved with their positions in the array.
  4. MMatrix is slower than SMatrix, the mutability prohibits some performance optimizations.

When I switch to the main job, the optimization stops working! I think my optimization is not optimizing!

What you want to do is avoid creating new arrays, allocating memory, on the heap. If you do need to allocate memory, you should see if you can reuse them. This is what the bang functions are for such as mul!. Another method is to use broadcast assignment .=.

julia> @allocated A = rand(1:10, 1024, 1024) # Allocates 8 MiB
8388656

julia> @allocated B = rand(1:10, 1024, 1024) # Allocates 8 MiB
8388656

julia> @allocated C = zeros(1024, 1024) # Allocates 8 MiB
8388656

julia> @allocated A .+ B # Allocates 8 MiB to store result
8388720

julia> @allocated C .= A .+ B # Allocates only 64 bytes, avoids 8 MiB allocation
64

julia> using LinearAlgebra

julia> @allocated A * B # allocations due to compilation
115689839

julia> @allocated A * B # > 8 MiB allocated to store result
8420144

julia> @allocated mul!(C, A, B) # allocation due to compilation
123465049

julia> @allocated mul!(C, A, B) # ~31 KiB needed for multiplication, avoid allocating 8 MiB
31488

julia> @allocated C .= A .* B # elementwise multiplication avoids allocating 8 MiB
64

StaticArrays.jl uses an optimization for small amounts of memory. The larger strategy to control memory allocation tightly.

@allocated A = rand(1:10, 1024, 1024) is different from A = rand(1:10, 1024, 1024)?

help?> @allocated
  @allocated

  A macro to evaluate an expression, discarding the resulting value, instead
  returning the total number of bytes allocated during evaluation of the
  expression.

  See also @time, @timev, @timed, and @elapsed.

  julia> @allocated rand(10^6)
  8000080

No. I’m just measuring allocations

Got it. I measured, each thread should allocate around 50 KiB. I love learning Julia, but for now I’ll have to program the hard way: just writing physics equations, until I learn more. But I will use C .= A .+ B, in everything possible. Thanks mkitti.

Note that in Julia 1.8 we now have an allocation profiler:

2 Likes