Operations on `StaticArrays` without intermediate allocation

The following line defines a contraction between two vectors and a fourth-dimensional tensor, the result being a matrix.

contract(a::Vec3, C::Ten43, b::Vec3) = Mat3([dot(C[i,:,j,:] * b, a) for i=1:3, j = 1:3])

where Vec3 = SVector{3}, Mat3 = SMatrix{3,3}, Ten43{T} = SArray{(3,3,3,3)}. How would I achieve this without the intermediate allocation of a 3 x 3 matrix? (and ideally without manually writing out all 9 elements of the result)

Yeah, constructing static arrays is still a bit of a pain right now. WIP: Make mutating immutables easier by Keno · Pull Request #21912 · JuliaLang/julia · GitHub is the proposal to address that. Unfortunately, at the moment,
there isn’t too much you can do, other than writing it out manually. However, you can have the compiler generate the manually written out version for you, like so:

julia> @eval f() = SMatrix{3,3}($(Expr(:tuple, (:($i + $j) for i=1:3, j=1:3)...)))
f (generic function with 1 method)

julia> @btime f()
  2.930 ns (0 allocations: 0 bytes)
3×3 StaticArrays.SArray{Tuple{3,3},Int64,2,9}:
 2  3  4
 3  4  5
 4  5  6

Here is a neat way of not materializing things into an array:

test(C, b, a) = SMatrix{3, 3, Float64, 9}(
    NTuple{9, Float64}((dot(C[i,:,j,:] * b, a) for i=1:3, j = 1:3))
)

Benchmarktools confirms allocation goes down from 240 bytes to 80… so there is maybe still some room.
Time just goes down from 140 to 100ns… But if you use this in a hot loop, this might already relieve the gc a bit :slight_smile:

Thank you both for the quick summary + solutions

Interestingly, my version is faster then using @Keno version to unrole the operation… Don’t have time to investigate it further, but here are the results:

using StaticArrays, BenchmarkTools
@eval test1(C, b, a) = SMatrix{3, 3, Float64, 9}($(Expr(:tuple, (:(dot(C[$i,:,$j,:] * b, a)) for i=1:3, j=1:3)...)))
test2(C, b, a) = SMatrix{3, 3, Float64, 9}(NTuple{9, Float64}((dot(C[i,:,j,:] * b, a) for i=1:3, j = 1:3)))
C = rand(SArray{Tuple{3, 3, 3, 3}, Float64})
a = rand(SVector{3, Float64})
b = rand(SVector{3, Float64})
@assert test1(C, b, a) == test2(C, b, a)
b1 = @benchmark test1(C, b, a)
b2 = @benchmark test2(C, b, a)

Maybe just something silly…Or NTuple(::Generator) is actually better optimized than the unrolled version.

There’s some trickyness to using BenchmarkTools properly. Also -O3 is useful in this particular instance:

julia> @benchmark test2($C, $b, $a)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     233.970 ns (0.00% GC)
  median time:      247.103 ns (0.00% GC)
  mean time:        253.813 ns (0.00% GC)
  maximum time:     527.180 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     460

julia> @benchmark test1($C, $b, $a)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     17.383 ns (0.00% GC)
  median time:      18.433 ns (0.00% GC)
  mean time:        18.958 ns (0.00% GC)
  maximum time:     467.755 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

I left out the $, since it seem to me that it would constant fold the result some cases… It gave me results of 5ns, which seems highly unlikely to me… Well, but your numbers look fine, so I guess it doesn’t!

Is it possible that @keno’s version is only performant on v0.6, but not v0.5?

using BenchmarkTools, StaticArrays
typealias Vec3{T} SVector{3, T}
typealias Mat3{T} SMatrix{3,3,T}
contract1(a, C, b) = Mat3([dot(C[i,:,j,:] * b, a) for i=1:3, j = 1:3])
@eval contract3(a, C, b) = Mat3($(Expr(:tuple, (:(dot(C[$i,:,$j,:] * b, a)) for i=1:3, j=1:3)...)))
a, b = rand(Vec3), rand(Vec3)
C = @SArray rand(3,3,3,3)
(@benchmark contract1($a, $C, $b)) |> display; println()
(@benchmark contract3($a, $C, $b)) |> display; println()

results on v0.6:

julia> (@benchmark contract1($a, $C, $b)) |> display; println()
BenchmarkTools.Trial:
  memory estimate:  160 bytes
  allocs estimate:  1
  --------------
  minimum time:     147.303 ns (0.00% GC)
  median time:      152.636 ns (0.00% GC)
  mean time:        163.265 ns (2.51% GC)
  maximum time:     1.335 μs (74.75% GC)
  --------------
  samples:          10000
  evals/sample:     792


julia> (@benchmark contract3($a, $C, $b)) |> display; println()
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     21.301 ns (0.00% GC)
  median time:      21.309 ns (0.00% GC)
  mean time:        21.806 ns (0.00% GC)
  maximum time:     123.591 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

Results on v0.5:

julia> (@benchmark contract1($a, $C, $b)) |> display; println()
BenchmarkTools.Trial:
  memory estimate:  8.45 KiB
  allocs estimate:  28
  --------------
  minimum time:     1.978 μs (0.00% GC)
  median time:      2.043 μs (0.00% GC)
  mean time:        2.353 μs (7.15% GC)
  maximum time:     83.486 μs (81.14% GC)
  --------------
  samples:          10000
  evals/sample:     10


julia>   (@benchmark contract3($a, $C, $b)) |> display; println()
BenchmarkTools.Trial:
  memory estimate:  8.30 KiB
  allocs estimate:  27
  --------------
  minimum time:     1.767 μs (0.00% GC)
  median time:      1.829 μs (0.00% GC)
  mean time:        2.104 μs (7.82% GC)
  maximum time:     63.618 μs (92.24% GC)
  --------------
  samples:          10000
  evals/sample:     10

StaticArrays.jl has undergone a total rewrite between the 0.5 and 0.6-only versions, so very probable.

ok, too bad. but hopefully I can find the time to move my codes to v0.6 soon.