Help with a strange performance slowdown when using SMatrix

Roger-luo · April 11, 2019, 10:37am

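# Apply the 2x2 matrix U to every pair of entries (i, i + step) of `state`
# whose indices differ only in bit `loc` (1-based).
# Assumes length(state) is a power of two and at least 2^loc.
function instruct2!(state, U, loc)
    a, c, b, d = U      # column-major iteration: U[1,1], U[2,1], U[1,2], U[2,2]
    step = 1 << (loc - 1)
    step_2 = 1 << loc
    for j in 0:step_2:size(state, 1)-step
        for i in j+1:j+step
            u1rows!(state, i, i+step, a, b, c, d)
        end
    end
    return state
end

# `@inbounds` written in front of `function` does not affect the body,
# so it is applied to the indexing expressions instead.
@inline function u1rows!(state::AbstractVector, i::Int, j::Int, a, b, c, d)
    @inbounds begin
        w = state[i]
        v = state[j]
        state[i] = a*w + b*v
        state[j] = c*w + d*v
    end
    return state
end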

I’m using an SMatrix instead of a Matrix for a small (2x2) matrix. The only operation that touches the matrix is destructuring it into a, b, c, d (i.e. iterating over U); the rest of the code only works with those four scalars. Yet the timings do not reflect that: the difference between SMatrix and Matrix grows with the size of state.
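To make the role of the destructuring concrete, here is a small sketch using a plain Matrix with arbitrarily chosen values. Iteration over a matrix runs in column-major order, so `a, c, b, d = U` binds the entries such that each pair update in `u1rows!` is an ordinary 2x2 matrix-vector product.

```julia
# Destructuring a 2x2 matrix iterates over it in column-major order.
U = [1.0+0im 2.0+0im;
     3.0+0im 4.0+0im]

a, c, b, d = U          # a = U[1,1], c = U[2,1], b = U[1,2], d = U[2,2]

w, v = 0.5 + 0.5im, -1.0 + 2.0im
new_w = a*w + b*v       # the value u1rows! writes to state[i]
new_v = c*w + d*v       # the value u1rows! writes to state[j]

# The updated pair is just U applied to the 2-vector [w, v].
println([new_w, new_v] ≈ U * [w, v])
```

Whatever the container type, the loop body only ever sees four ComplexF64 scalars.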

I tested this on the following setup:

Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.2.0)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libimf
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code

julia> @benchmark foreach(k->instruct2!($st, $U, 1), 1:100)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     183.067 ms (0.00% GC)
  median time:      191.323 ms (0.00% GC)
  mean time:        192.796 ms (0.00% GC)
  maximum time:     209.240 ms (0.00% GC)
  --------------
  samples:          26
  evals/sample:     1

julia> @benchmark foreach(k->instruct2!($st, $(Matrix(U)), 1), 1:100)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     178.031 ms (0.00% GC)
  median time:      181.558 ms (0.00% GC)
  mean time:        184.131 ms (0.00% GC)
  maximum time:     219.924 ms (0.00% GC)
  --------------
  samples:          28
  evals/sample:     1

This is unexpected, since the main cost has nothing to do with which matrix type is used.

kristoffer.carlsson · April 11, 2019, 10:45am

Please provide enough code so that the benchmarks can be run, preferably with just copy and paste.

Also, there is no need for a foreach loop when benchmarking; BenchmarkTools does the repetition for you. The time difference seems very small as well.
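Following that suggestion, a sketch of a copy-and-paste benchmark that times a single `instruct2!` call per evaluation. For brevity it inlines the question's `u1rows!` into one function (same arithmetic), uses `M` as a hypothetical name for the `Matrix` copy of `U`, and assumes StaticArrays.jl and BenchmarkTools.jl are installed.

```julia
using StaticArrays, BenchmarkTools

# Kernel from the question with u1rows! inlined and @inbounds moved inside the body.
function instruct2!(state, U, loc)
    a, c, b, d = U                      # column-major: U[1,1], U[2,1], U[1,2], U[2,2]
    step = 1 << (loc - 1)
    step_2 = 1 << loc
    for j in 0:step_2:size(state, 1)-step
        for i in j+1:j+step
            @inbounds begin
                w, v = state[i], state[i+step]
                state[i] = a*w + b*v
                state[i+step] = c*w + d*v
            end
        end
    end
    return state
end

U = @SMatrix rand(ComplexF64, 2, 2)
M = Matrix(U)                           # same entries as a heap-allocated Matrix
st = rand(ComplexF64, 1 << 20)

# BenchmarkTools repeats the call itself; `$` interpolation avoids timing
# access to untyped globals, and the setup gives each sample a fresh copy.
@btime instruct2!(s, $U, 1) setup=(s = copy($st)) evals=1
@btime instruct2!(s, $M, 1) setup=(s = copy($st)) evals=1
```

The `setup`/`evals=1` pair hands every sample its own copy of the state, so the timed calls do not keep transforming the same array with a non-unitary `U`, which over many runs could push its entries toward overflow or underflow.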

Roger-luo · April 11, 2019, 7:00pm

Sorry, I left out the first two lines:

using StaticArrays, BenchmarkTools

U = @SMatrix rand(ComplexF64, 2, 2)
st = rand(ComplexF64, 1<<20)

The overhead is indeed small, but on my laptop it does not seem to be constant: it grows with the total run time as the size of state increases. In my real code, instruct2! is called inside another for loop, so the difference becomes noticeable there, on the order of a few ms.
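One way to check whether the gap really grows with the size of the state is to time both matrix types at a few sizes and print the ratio. This sketch assumes StaticArrays.jl and BenchmarkTools.jl are installed; the exponents 12, 16 and 20 are arbitrary choices, and the question's kernel is inlined into one function.

```julia
using StaticArrays, BenchmarkTools

# Kernel from the question, with @inbounds inside the body.
function instruct2!(state, U, loc)
    a, c, b, d = U
    step = 1 << (loc - 1)
    step_2 = 1 << loc
    for j in 0:step_2:size(state, 1)-step
        for i in j+1:j+step
            @inbounds begin
                w, v = state[i], state[i+step]
                state[i] = a*w + b*v
                state[i+step] = c*w + d*v
            end
        end
    end
    return state
end

U = @SMatrix rand(ComplexF64, 2, 2)
M = Matrix(U)

# Minimum time per call (seconds) for both matrix types at several sizes;
# each sample works on a fresh copy of the state.
for n in (12, 16, 20)
    st = rand(ComplexF64, 1 << n)
    ts = @belapsed instruct2!(s, $U, 1) setup=(s = copy($st)) evals=1
    tm = @belapsed instruct2!(s, $M, 1) setup=(s = copy($st)) evals=1
    println("n = $n: SMatrix $(round(ts * 1e3, digits=3)) ms, ",
            "Matrix $(round(tm * 1e3, digits=3)) ms, ratio $(round(ts / tm, digits=3))")
end
```

If the SMatrix version carried a per-element cost, the ratio would stay above 1 as `n` grows; a fixed per-call overhead would instead make it drift toward 1.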

In fact, if I measure only the cost of a, b, c, d = U, SMatrix is much faster, which makes this look even stranger to me.
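The destructuring step can be timed in isolation with a sketch like the one below. `unpack4` is a hypothetical helper that contains only the `a, c, b, d = U` line from `instruct2!`; StaticArrays.jl and BenchmarkTools.jl are assumed to be installed.

```julia
using StaticArrays, BenchmarkTools

# Hypothetical helper isolating the destructuring done in instruct2!.
function unpack4(U)
    a, c, b, d = U          # same destructuring as in instruct2!
    return (a, b, c, d)
end

U = @SMatrix rand(ComplexF64, 2, 2)
M = Matrix(U)

@btime unpack4($U)          # destructuring an SMatrix
@btime unpack4($M)          # destructuring a Matrix
```

Note that this step runs once per `instruct2!` call, while the loop body runs `length(state) ÷ 2` times, so a difference in the destructuring alone would not be expected to grow with the size of the state.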
