Operations on `StaticArrays` without intermediate allocation

cortner · June 13, 2017, 9:25pm

The following line defines a contraction between two vectors and a fourth-dimensional tensor, the result being a matrix.

contract(a::Vec3, C::Ten43, b::Vec3) = Mat3([dot(C[i,:,j,:] * b, a) for i=1:3, j = 1:3])

where Vec3 = SVector{3}, Mat3 = SMatrix{3,3}, Ten43{T} = SArray{(3,3,3,3)}. How would I achieve this without the intermediate allocation of a 3 x 3 matrix? (and ideally without manually writing out all 9 elements of the result)

Keno · June 13, 2017, 9:49pm

Yeah, constructing static arrays is still a bit of a pain right now. WIP: Make mutating immutables easier by Keno · Pull Request #21912 · JuliaLang/julia · GitHub is the proposal to address that. Unfortunately, at the moment,
there isn’t too much you can do, other than writing it out manually. However, you can have the compiler generate the manually written out version for you, like so:

julia> @eval f() = SMatrix{3,3}($(Expr(:tuple, (:($i + $j) for i=1:3, j=1:3)...)))
f (generic function with 1 method)

julia> @btime f()
  2.930 ns (0 allocations: 0 bytes)
3×3 StaticArrays.SArray{Tuple{3,3},Int64,2,9}:
 2  3  4
 3  4  5
 4  5  6

sdanisch · June 13, 2017, 9:51pm

Here is a neat way of not materializing things into an array:

test(C, b, a) = SMatrix{3, 3, Float64, 9}(
    NTuple{9, Float64}((dot(C[i,:,j,:] * b, a) for i=1:3, j = 1:3))
)

Benchmarktools confirms allocation goes down from 240 bytes to 80… so there is maybe still some room.
Time just goes down from 140 to 100ns… But if you use this in a hot loop, this might already relieve the gc a bit

cortner · June 13, 2017, 9:51pm

Thank you both for the quick summary + solutions

sdanisch · June 13, 2017, 10:00pm

Interestingly, my version is faster then using @Keno version to unrole the operation… Don’t have time to investigate it further, but here are the results:

using StaticArrays, BenchmarkTools
@eval test1(C, b, a) = SMatrix{3, 3, Float64, 9}($(Expr(:tuple, (:(dot(C[$i,:,$j,:] * b, a)) for i=1:3, j=1:3)...)))
test2(C, b, a) = SMatrix{3, 3, Float64, 9}(NTuple{9, Float64}((dot(C[i,:,j,:] * b, a) for i=1:3, j = 1:3)))
C = rand(SArray{Tuple{3, 3, 3, 3}, Float64})
a = rand(SVector{3, Float64})
b = rand(SVector{3, Float64})
@assert test1(C, b, a) == test2(C, b, a)
b1 = @benchmark test1(C, b, a)
b2 = @benchmark test2(C, b, a)

Maybe just something silly…Or NTuple(::Generator) is actually better optimized than the unrolled version.

Keno · June 13, 2017, 10:06pm

There’s some trickyness to using BenchmarkTools properly. Also -O3 is useful in this particular instance:

julia> @benchmark test2($C, $b, $a)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     233.970 ns (0.00% GC)
  median time:      247.103 ns (0.00% GC)
  mean time:        253.813 ns (0.00% GC)
  maximum time:     527.180 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     460

julia> @benchmark test1($C, $b, $a)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     17.383 ns (0.00% GC)
  median time:      18.433 ns (0.00% GC)
  mean time:        18.958 ns (0.00% GC)
  maximum time:     467.755 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

sdanisch · June 13, 2017, 10:09pm

I left out the $, since it seem to me that it would constant fold the result some cases… It gave me results of 5ns, which seems highly unlikely to me… Well, but your numbers look fine, so I guess it doesn’t!

cortner · June 13, 2017, 10:25pm

Is it possible that @keno’s version is only performant on v0.6, but not v0.5?

using BenchmarkTools, StaticArrays
typealias Vec3{T} SVector{3, T}
typealias Mat3{T} SMatrix{3,3,T}
contract1(a, C, b) = Mat3([dot(C[i,:,j,:] * b, a) for i=1:3, j = 1:3])
@eval contract3(a, C, b) = Mat3($(Expr(:tuple, (:(dot(C[$i,:,$j,:] * b, a)) for i=1:3, j=1:3)...)))
a, b = rand(Vec3), rand(Vec3)
C = @SArray rand(3,3,3,3)
(@benchmark contract1($a, $C, $b)) |> display; println()
(@benchmark contract3($a, $C, $b)) |> display; println()

results on v0.6:

julia> (@benchmark contract1($a, $C, $b)) |> display; println()
BenchmarkTools.Trial:
  memory estimate:  160 bytes
  allocs estimate:  1
  --------------
  minimum time:     147.303 ns (0.00% GC)
  median time:      152.636 ns (0.00% GC)
  mean time:        163.265 ns (2.51% GC)
  maximum time:     1.335 μs (74.75% GC)
  --------------
  samples:          10000
  evals/sample:     792


julia> (@benchmark contract3($a, $C, $b)) |> display; println()
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     21.301 ns (0.00% GC)
  median time:      21.309 ns (0.00% GC)
  mean time:        21.806 ns (0.00% GC)
  maximum time:     123.591 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

Results on v0.5:

julia> (@benchmark contract1($a, $C, $b)) |> display; println()
BenchmarkTools.Trial:
  memory estimate:  8.45 KiB
  allocs estimate:  28
  --------------
  minimum time:     1.978 μs (0.00% GC)
  median time:      2.043 μs (0.00% GC)
  mean time:        2.353 μs (7.15% GC)
  maximum time:     83.486 μs (81.14% GC)
  --------------
  samples:          10000
  evals/sample:     10


julia>   (@benchmark contract3($a, $C, $b)) |> display; println()
BenchmarkTools.Trial:
  memory estimate:  8.30 KiB
  allocs estimate:  27
  --------------
  minimum time:     1.767 μs (0.00% GC)
  median time:      1.829 μs (0.00% GC)
  mean time:        2.104 μs (7.82% GC)
  maximum time:     63.618 μs (92.24% GC)
  --------------
  samples:          10000
  evals/sample:     10

dpsanders · June 13, 2017, 10:26pm

StaticArrays.jl has undergone a total rewrite between the 0.5 and 0.6-only versions, so very probable.

cortner · June 13, 2017, 10:27pm

ok, too bad. but hopefully I can find the time to move my codes to v0.6 soon.

Topic		Replies	Views
Creating a Static Array with a loop General Usage	10	1008	April 12, 2022
Inexplicable allocations when summing `StaticArrays` Performance	8	1127	December 3, 2018
Linear algebra performance issue General Usage	20	1093	July 10, 2020
Allocation with StaticArrays New to Julia memory-allocation , staticarrays	19	699	September 15, 2021
Is there a way to build a static matrix programmatically? General Usage staticarrays	6	502	July 23, 2023

Operations on `StaticArrays` without intermediate allocation

Related topics