What is the fastest method to shift matrix?

acxz · August 4, 2022, 4:34pm

length(1:25e-6:0.6) = 0

is it 0 instead of 1?

Also it is important to note that after a certain point of shifting (col_dim < iteration) you’ll end up with just a matrix of 0’s, unless I presume you are doing other operations to populate A in the loop.

rocco_sprmnt21 · August 4, 2022, 4:48pm

a slight variant of your script that appears to be even faster (at least for arrays of this size)

begin
  A[:, 1] .= 0
  dropzeros!(A)
  A[1:size(A, 1), vcat(2:size(A, 2), 1)]
end

Amro · August 4, 2022, 4:49pm

Sorry, yes 0:25e-6:0.6.
Yes, I am doing operation later on A.

julia> @bprofile include("Code_withMatrixMul.jl")
  8.476085 seconds (46.76 M allocations: 4.289 GiB, 9.84% gc time, 32.81% compilation time)
  5.236861 seconds (38.98 M allocations: 3.900 GiB, 11.59% gc time)
  4.828120 seconds (38.98 M allocations: 3.900 GiB, 11.54% gc time)
  4.647583 seconds (38.98 M allocations: 3.900 GiB, 7.84% gc time)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 5.780 s (6.58% GC) to evaluate,
 with a memory estimate of 4.11 GiB, over 41170470 allocations.
julia> @bprofile include("Code_withMatrixPermute.jl")
  8.411843 seconds (48.63 M allocations: 4.943 GiB, 10.01% gc time, 28.52% compilation time)
  5.438983 seconds (40.90 M allocations: 4.558 GiB, 12.14% gc time)
  5.436992 seconds (40.90 M allocations: 4.558 GiB, 11.77% gc time)
  5.102955 seconds (40.90 M allocations: 4.558 GiB, 8.03% gc time)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 6.177 s (7.08% GC) to evaluate,
 with a memory estimate of 4.76 GiB, over 43090389 allocations.

BTW, I should take the average as the computation time or the one after the sentence “Single result which took”

acxz · August 4, 2022, 4:51pm

Run @bprofile on your function not the include method.

acxz · August 4, 2022, 4:54pm

Nice!, I guess permute! doesn’t work as well as simple slicing. Looks like it is fast enough over repeated operations to beat the multiplication as well!

rocco_sprmnt21 · August 4, 2022, 5:00pm

I didn’t quite understand what you have to do with this matrix, but it occurs to me that you could change the calculation algorithm a bit, using only a slice of the original matrix A [2: end], without making any shifts, and then take into account that the last column is all of zeros.
The exchange could be convenient. it depends on the operations you have to do with it.

acxz · August 4, 2022, 5:03pm

Updated MWE:

using BenchmarkTools
using LinearAlgebra
using SparseArrays
using Random

function trie()
    row_dim = 44
    col_dim = 33
    A_density = 0.2679063360881543
    rng = MersenneTwister(1)

    A = sprand(rng, row_dim, col_dim, A_density)
    B = sprand(rng, row_dim, col_dim, A_density)

    # Matrix mult
    shift = spdiagm(-1 => ones(size(A, 2) - 1))

    for _ in 1:24001
        # Matrix mult
        A .= A * shift .+ B

        # Slicing
        A[:, 1] .= 0
        dropzeros!(A)
        A[1:size(A, 1), vcat(2:size(A, 2), 1)]
        A .+= B
    end

end

Results in order of method:

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 16 samples with 1 evaluation.
 Range (min … max):  301.004 ms … 324.754 ms  ┊ GC (min … max): 4.78% … 4.60%
 Time  (median):     317.756 ms               ┊ GC (median):    4.59%
 Time  (mean ± σ):   317.456 ms ±   5.221 ms  ┊ GC (mean ± σ):  4.60% ± 0.13%

  ▁                               ▁  ▁  ▁▁ ▁█▁  ▁▁▁▁   ▁▁     ▁  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁██▁███▁▁████▁▁▁██▁▁▁▁▁█ ▁
  301 ms           Histogram: frequency by time          325 ms <

 Memory estimate: 1.13 GiB, allocs estimate: 144044.

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 28 samples with 1 evaluation.
 Range (min … max):  175.798 ms … 184.465 ms  ┊ GC (min … max): 5.95% … 5.29%
 Time  (median):     179.539 ms               ┊ GC (median):    5.99%
 Time  (mean ± σ):   180.077 ms ±   2.321 ms  ┊ GC (mean ± σ):  5.94% ± 0.34%

                     █ █                                      ▃  
  ▇▁▁▇▁▁▁▁▁▁▁▁▇▁▁▁▇▇▇█▇█▁▁▇▁▁▁▇▇▇▁▇▁▁▁▁▇▁▇▁▇▁▇▁▁▇▁▇▁▁▇▁▁▁▁▇▁▁▁█ ▁
  176 ms           Histogram: frequency by time          184 ms <

 Memory estimate: 509.46 MiB, allocs estimate: 168029.

Amro · August 4, 2022, 7:16pm

Excuse me, this line should be:
A .= A[1:size(A, 1), vcat(2:size(A, 2), 1)]
right?
and if so, the speed is slower than multiplication.
I also did as below but gave slower speed:

A[:, 1] .= 0
dropzeros!(A)
A .= A[1:size(A, 1), vcat(2:size(A, 2), 1)] .+ B

acxz · August 4, 2022, 9:05pm

Please post your results of bprofile so we can quantify.

A .= A[1:size(A, 1), vcat(2:size(A, 2), 1)]

Why would u say that?

suavesito · August 4, 2022, 10:12pm

If you don’t mind accessing internals, you can use this version

function shiftleft!(A::SparseMatrixCSC, n)
    sz = size(A)
    n < sz[2] || throw(DimensionMismatch("$(n) bigger than the number of columns $(size[2])"))
    colnz = A.colptr[n+1] - 1
    @view(A.colptr[begin:end-n]) .= @view(A.colptr[begin+n:end])
    A.colptr[end-n+1:end] .= A.colptr[end]
    A.colptr .-= colnz
    deleteat!(A.rowval, 1:colnz)
    deleteat!(A.nzval, 1:colnz)
    A
end

shifleft(A::SparseMatrixCSC, n) = shiftleft!(copy(A), n)

Is in place (shifleft!) and maybe the fastest you can get.

acxz · August 4, 2022, 11:34pm

@suavesito while a clever idea using your function turned out to be a slower than the slicing method.

Results in order of mult, slice, shiftleft

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 16 samples with 1 evaluation.
 Range (min … max):  281.928 ms … 360.109 ms  ┊ GC (min … max): 4.87% … 4.85%
 Time  (median):     312.091 ms               ┊ GC (median):    4.79%
 Time  (mean ± σ):   316.609 ms ±  25.090 ms  ┊ GC (mean ± σ):  4.92% ± 0.52%

  ▁  ▁   ▁    █ ▁   ▁   ▁ ▁     ▁ ▁  ▁       ▁            ▁▁  ▁  
  █▁▁█▁▁▁█▁▁▁▁█▁█▁▁▁█▁▁▁█▁█▁▁▁▁▁█▁█▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁██▁▁█ ▁
  282 ms           Histogram: frequency by time          360 ms <

 Memory estimate: 1.13 GiB, allocs estimate: 144045.

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 23 samples with 1 evaluation.
 Range (min … max):  144.470 ms … 301.354 ms  ┊ GC (min … max): 7.42% … 6.76%
 Time  (median):     234.378 ms               ┊ GC (median):    6.32%
 Time  (mean ± σ):   223.732 ms ±  46.080 ms  ┊ GC (mean ± σ):  6.66% ± 0.57%

  ▁▁         █▁▁▁  ▁      ▁ ▁  ▁    ▁▁▁ ▁    ▁▁▁▁▁  ▁        ▁▁  
  ██▁▁▁▁▁▁▁▁▁████▁▁█▁▁▁▁▁▁█▁█▁▁█▁▁▁▁███▁█▁▁▁▁█████▁▁█▁▁▁▁▁▁▁▁██ ▁
  144 ms           Histogram: frequency by time          301 ms <

 Memory estimate: 509.46 MiB, allocs estimate: 168029.

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 18 samples with 1 evaluation.
 Range (min … max):  212.972 ms … 322.766 ms  ┊ GC (min … max): 4.63% … 4.78%
 Time  (median):     281.343 ms               ┊ GC (median):    4.62%
 Time  (mean ± σ):   283.081 ms ±  25.233 ms  ┊ GC (mean ± σ):  4.74% ± 0.49%

  ▁                     ▁   ▁  ▁    ▁██  ▁      ██ ▁  ▁▁      ▁  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█▁▁█▁▁▁▁███▁▁█▁▁▁▁▁▁██▁█▁▁██▁▁▁▁▁▁█ ▁
  213 ms           Histogram: frequency by time          323 ms <

 Memory estimate: 487.23 MiB, allocs estimate: 96030.

suavesito · August 5, 2022, 12:05am

According to my benchmarks, it looks like it is much faster for the in-place shiftleft!, but still x3 faster for the out-of-place shiftleft. Also, my approach looks like is almost not affected by the percentage of non-zero values, vs. the other that do are affected.

julia> using SparseArrays, LinearAlgebra, BenchmarkTools

julia> function shiftleft!(A::SparseMatrixCSC, n)
           sz = size(A)
           n <= sz[2] || throw(DimensionMismatch("$(n) bigger than the number of columns $(size[2])"))
           colnz = A.colptr[n+1] - 1
           @view(A.colptr[begin:end-n]) .= @view(A.colptr[begin+n:end])
           A.colptr[end-n+1:end] .= A.colptr[end]
           A.colptr .-= colnz
           deleteat!(A.rowval, 1:colnz)
           deleteat!(A.nzval, 1:colnz)
           A
       end
shiftleft! (generic function with 1 method)

julia> shiftleft(A, n) = shiftleft!(copy(A), n)
shiftleft (generic function with 1 method)

julia> shiftleft_mul(A, n) = A * spdiagm(-n => ones(Int, size(A, 2)-n))
shiftleft_mul (generic function with 1 method)

julia> function shiftleft_slide(A::SparseMatrixCSC, n)
           A[:, begin:begin+n-1] .= zero(eltype(A))
           dropzeros!(A)
           A[:, vcat(begin+n:end, begin:begin+n-1)]
       end
shiftleft_slide (generic function with 1 method)

Results in

julia> @benchmark shiftleft!(A, 10) setup=(A=sprand(100, 100, 0.1))
BenchmarkTools.Trial: 10000 samples with 803 evaluations.
 Range (min … max):  150.450 ns …  1.189 μs  ┊ GC (min … max): 0.00% … 66.28%
 Time  (median):     164.403 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   193.630 ns ± 89.107 ns  ┊ GC (mean ± σ):  3.87% ±  8.50%

  ██▆▃▃▂▁▁▁▁▂▄▃▂▃▂▁▁▁ ▁▁                                       ▂
  ██████████████████████▇▆▄▆▄▄▁▃▁▁▁▃▁▁▁▁▃▁▁▃▁▁▁▁▁▃▆▅▆▇▅▅▄▄▆▇▆▇ █
  150 ns        Histogram: log(frequency) by time       697 ns <

 Memory estimate: 816 bytes, allocs estimate: 1.

julia> @benchmark shiftleft!(A, 10) setup=(A=sprand(100, 100, 0.9))
BenchmarkTools.Trial: 10000 samples with 800 evaluations.
 Range (min … max):  145.001 ns …   2.450 μs  ┊ GC (min … max): 0.00% … 84.63%
 Time  (median):     167.564 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   214.467 ns ± 130.594 ns  ┊ GC (mean ± σ):  4.92% ±  8.98%

  ▇█▆▄▃▂▂▂▃▄▃▃▃▃▃▂▂▁▁                                           ▂
  ████████████████████▇▇▇▇▇▅▆▆▆▆▆▅▅▅▅▆▃▄▄▄▇▆▄▅▅▅▄▅▅▅▃▄▃▃▁▃▅▆▆▆▇ █
  145 ns        Histogram: log(frequency) by time        860 ns <

 Memory estimate: 816 bytes, allocs estimate: 1.

julia> @benchmark shiftleft(A, 10) setup=(A=sprand(100, 100, 0.1))
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  1.769 μs … 297.860 μs  ┊ GC (min … max):  0.00% … 96.98%
 Time  (median):     2.857 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   4.596 μs ±  10.683 μs  ┊ GC (mean ± σ):  11.58% ±  5.31%

  ▃█▆▆▆▅▄▄▃▃▂▂▂▁ ▁             ▁▁▁                            ▂
  ████████████████▇▇▆▇▅▅▅▅▅▄▆███████▇▇▇▇▇▇▇▇▇▇▇▆▅▆▆▆▆▅▅▅▄▅▄▅▅ █
  1.77 μs      Histogram: log(frequency) by time      20.2 μs <

 Memory estimate: 15.80 KiB, allocs estimate: 4.

julia> @benchmark shiftleft(A, 10) setup=(A=sprand(100, 100, 0.9))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  11.614 μs … 808.363 μs  ┊ GC (min … max):  0.00% … 94.41%
 Time  (median):     14.659 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   18.201 μs ±  35.230 μs  ┊ GC (mean ± σ):  10.51% ±  5.41%

   ▂▅▇██▆▄▃▂▃▂▂▁▂▁ ▁▁▁▂▂▁                                      ▂
  ▇██████████████████████▇▇▇▇▇▆▇▆▅▇▆▇▆▇▆▆▅▄▄▃▄▃▄▄▄▄▄▅▄▄▃▃▄▃▁▃▃ █
  11.6 μs       Histogram: log(frequency) by time      48.1 μs <

 Memory estimate: 140.52 KiB, allocs estimate: 6.

julia> @benchmark shiftleft_mul(A, 10) setup=(A=sprand(100, 100, 0.1))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  17.919 μs …  1.694 ms  ┊ GC (min … max): 0.00% … 95.68%
 Time  (median):     22.169 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.795 μs ± 36.031 μs  ┊ GC (mean ± σ):  3.66% ±  2.65%

       ▅███▄▄▁
  ▁▁▂▅████████▇▅▄▃▃▃▃▃▃▃▃▃▃▂▂▃▂▂▂▃▂▂▃▃▃▄▃▃▄▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁ ▃
  17.9 μs         Histogram: frequency by time        41.7 μs <

 Memory estimate: 25.09 KiB, allocs estimate: 19.

julia> @benchmark shiftleft_mul(A, 10) setup=(A=sprand(100, 100, 0.9))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  51.419 μs … 938.537 μs  ┊ GC (min … max): 0.00% … 86.56%
 Time  (median):     55.294 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   62.974 μs ±  43.053 μs  ┊ GC (mean ± σ):  4.06% ±  5.75%

  ▃█▇▆▅▅▄▃▃▃▂▂▃▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁                               ▂
  ████████████████████████████████▇▇█▇▇▆▇▆▅▆▅▅▅▄▄▄▅▃▃▃▄▁▄▅▄▃▁▄ █
  51.4 μs       Histogram: log(frequency) by time       127 μs <

 Memory estimate: 246.69 KiB, allocs estimate: 23.

julia> @benchmark shiftleft_slide(A, 10) setup=(A=sprand(100, 100, 0.1))
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.884 μs … 230.295 μs  ┊ GC (min … max): 0.00% … 88.43%
 Time  (median):     5.887 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.586 μs ±   8.344 μs  ┊ GC (mean ± σ):  5.40% ±  4.16%

       ▁▁▃▄▆██▇▅▁
  ▁▂▃▆▇███████████▆▇▅▅▄▄▄▃▃▂▃▃▃▃▃▃▂▂▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁ ▃
  4.88 μs         Histogram: frequency by time        9.59 μs <

 Memory estimate: 14.62 KiB, allocs estimate: 4.

julia> @benchmark shiftleft_slide(A, 10) setup=(A=sprand(100, 100, 0.9))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  27.227 μs … 927.842 μs  ┊ GC (min … max): 0.00% … 90.21%
 Time  (median):     30.218 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   37.314 μs ±  37.987 μs  ┊ GC (mean ± σ):  4.89% ±  4.92%

  ▃▇█▇▅▅▅▃▃▂▁▂▃▂▁▁▁▂▂▂▃▃▃▂▂▂▂▁▁▁▁ ▁▁                           ▂
  ████████████████████████████████████▇▇▇█▇▇▆▆▆▆▆▆▆▆▆▅▅▄▆▆▄▅▅▅ █
  27.2 μs       Histogram: log(frequency) by time        79 μs <

 Memory estimate: 126.72 KiB, allocs estimate: 6.

rocco_sprmnt21 · August 5, 2022, 7:46am

I had also tried to go down this path, but I had problems with the modification of the structure which is immutable.
Now (seeing @suavesito 's example) I understand better how to do it.
Here is a version (of @suavesito 's solution) that saves some allocations.

using  SparseArrays,   ShiftedArrays

function spshiftl(A,n)
cnz=A.colptr[n+1]-1
SparseMatrixCSC{Float64,Int64}(size(A)..., ShiftedArray(A.colptr,-n, default=A.colptr[end]).-cnz,
A.rowval[cnz+1:end], A.nzval[cnz+1:end])
end

rocco_sprmnt21 · August 5, 2022, 9:26am

for example, if we were to multiply spshiftl (A, 1) by a matrix B, we could use the following relations

# A(mXn)	 = A'| A'' = A[:,2:n]|spzeros(n)
# B(nXk) = B' | B'' = B[1:n-1,:]|B[n,:]

# C[i, j] = dot(A[i, :], B[:, j]) = dot(A’[i, :], B’[:, j]) + 0*B[n,j] => A*B == A'*B'

acxz · August 5, 2022, 1:26pm

I ran my benchmarks with your tests and I got similar results that agree with them.

So I think this is the fastest method we have come up so far for shifting a matrix, but I don’t think it will be faster for @Amro 's workload.

Notice that @Amro is shifting the same matrix length(0:25e-6:0.6) times and that he is doing an operation on A. I tried to recreate this scenario with an updated MWE here: What is the fastest method to shift matrix? - #27 by acxz

And under this test, shiftleft! falls short. I’m not sure why, but I guess doing an operation ends up being slower on the shiftleft!'d matrix than the slide’d matrix? I really don’t understand why tho.

Do you mind running your benchmarks with the MWE linked?

I think this is a really useful observation!

Amro · August 5, 2022, 1:41pm

Because:

# Slicing
A[1:size(A, 1), vcat(2:size(A, 2), 1)]

is not updating A. This is what I got.

Amro · August 5, 2022, 2:36pm

Good one to save allocations.

Amro · August 5, 2022, 2:41pm

# A(mXn)	 = A'| A'' = A[:,2:n]|spzeros(n)
# B(nXk) = B' | B'' = B[1:n-1,:]|B[n,:]

# C[i, j] = dot(A[i, :], B[:, j]) = dot(A’[i, :], B’[:, j]) + 0*B[n,j] => A*B == A'*B'

I am not sure if I got it well here.

rocco_sprmnt21 · August 5, 2022, 2:52pm

Do you have any doubts about the correctness of the thesis or is it not clear what the thesis affirms?

Amro · August 5, 2022, 3:04pm

It is not clear for me what the thesis affirms.

Topic		Replies	Views
Shifting the base value of sparse arrays from 0 to 1 New to Julia	9	832	November 20, 2018
How to shift bits faster General Usage	8	3451	January 10, 2019
Julia is slower than matlab when it comes to matrix diagonalization? General Usage question , eigs , eigenvalue-problem	39	1685	January 14, 2024
Matrix-vector multiplication slower than a 'naive' for loop? Performance vector	7	1651	July 30, 2020
Circshift of a SparseMatrix Internals & Design	5	529	December 8, 2018

What is the fastest method to shift matrix?

Related topics