What is the fastest method to shift matrix?

a slight variant of your script that appears to be even faster (at least for arrays of this size)

begin
  A[:, 1] .= 0
  dropzeros!(A)
  A[1:size(A, 1), vcat(2:size(A, 2), 1)]
end
1 Like

Sorry, yes 0:25e-6:0.6.
Yes, I am doing operation later on A.

julia> @bprofile include("Code_withMatrixMul.jl")
  8.476085 seconds (46.76 M allocations: 4.289 GiB, 9.84% gc time, 32.81% compilation time)
  5.236861 seconds (38.98 M allocations: 3.900 GiB, 11.59% gc time)
  4.828120 seconds (38.98 M allocations: 3.900 GiB, 11.54% gc time)
  4.647583 seconds (38.98 M allocations: 3.900 GiB, 7.84% gc time)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 5.780 s (6.58% GC) to evaluate,
 with a memory estimate of 4.11 GiB, over 41170470 allocations.
julia> @bprofile include("Code_withMatrixPermute.jl")
  8.411843 seconds (48.63 M allocations: 4.943 GiB, 10.01% gc time, 28.52% compilation time)
  5.438983 seconds (40.90 M allocations: 4.558 GiB, 12.14% gc time)
  5.436992 seconds (40.90 M allocations: 4.558 GiB, 11.77% gc time)
  5.102955 seconds (40.90 M allocations: 4.558 GiB, 8.03% gc time)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 6.177 s (7.08% GC) to evaluate,
 with a memory estimate of 4.76 GiB, over 43090389 allocations.

BTW, I should take the average as the computation time or the one after the sentence “Single result which took”

Run @bprofile on your function not the include method.

Nice!, I guess permute! doesn’t work as well as simple slicing. Looks like it is fast enough over repeated operations to beat the multiplication as well!

I didn’t quite understand what you have to do with this matrix, but it occurs to me that you could change the calculation algorithm a bit, using only a slice of the original matrix A [2: end], without making any shifts, and then take into account that the last column is all of zeros.
The exchange could be convenient. it depends on the operations you have to do with it.

1 Like

Updated MWE:

using BenchmarkTools
using LinearAlgebra
using SparseArrays
using Random

function trie()
    row_dim = 44
    col_dim = 33
    A_density = 0.2679063360881543
    rng = MersenneTwister(1)

    A = sprand(rng, row_dim, col_dim, A_density)
    B = sprand(rng, row_dim, col_dim, A_density)

    # Matrix mult
    shift = spdiagm(-1 => ones(size(A, 2) - 1))

    for _ in 1:24001
        # Matrix mult
        A .= A * shift .+ B

        # Slicing
        A[:, 1] .= 0
        dropzeros!(A)
        A[1:size(A, 1), vcat(2:size(A, 2), 1)]
        A .+= B
    end

end

Results in order of method:

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 16 samples with 1 evaluation.
 Range (min … max):  301.004 ms … 324.754 ms  ┊ GC (min … max): 4.78% … 4.60%
 Time  (median):     317.756 ms               ┊ GC (median):    4.59%
 Time  (mean ± σ):   317.456 ms ±   5.221 ms  ┊ GC (mean ± σ):  4.60% ± 0.13%

  ▁                               ▁  ▁  ▁▁ ▁█▁  ▁▁▁▁   ▁▁     ▁  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁██▁███▁▁████▁▁▁██▁▁▁▁▁█ ▁
  301 ms           Histogram: frequency by time          325 ms <

 Memory estimate: 1.13 GiB, allocs estimate: 144044.

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 28 samples with 1 evaluation.
 Range (min … max):  175.798 ms … 184.465 ms  ┊ GC (min … max): 5.95% … 5.29%
 Time  (median):     179.539 ms               ┊ GC (median):    5.99%
 Time  (mean ± σ):   180.077 ms ±   2.321 ms  ┊ GC (mean ± σ):  5.94% ± 0.34%

                     █ █                                      ▃  
  ▇▁▁▇▁▁▁▁▁▁▁▁▇▁▁▁▇▇▇█▇█▁▁▇▁▁▁▇▇▇▁▇▁▁▁▁▇▁▇▁▇▁▇▁▁▇▁▇▁▁▇▁▁▁▁▇▁▁▁█ ▁
  176 ms           Histogram: frequency by time          184 ms <

 Memory estimate: 509.46 MiB, allocs estimate: 168029.

Excuse me, this line should be:
A .= A[1:size(A, 1), vcat(2:size(A, 2), 1)]
right?
and if so, the speed is slower than multiplication.
I also did as below but gave slower speed:

A[:, 1] .= 0
dropzeros!(A)
A .= A[1:size(A, 1), vcat(2:size(A, 2), 1)] .+ B

Please post your results of bprofile so we can quantify.

A .= A[1:size(A, 1), vcat(2:size(A, 2), 1)]

Why would u say that?

If you don’t mind accessing internals, you can use this version

function shiftleft!(A::SparseMatrixCSC, n)
    sz = size(A)
    n < sz[2] || throw(DimensionMismatch("$(n) bigger than the number of columns $(size[2])"))
    colnz = A.colptr[n+1] - 1
    @view(A.colptr[begin:end-n]) .= @view(A.colptr[begin+n:end])
    A.colptr[end-n+1:end] .= A.colptr[end]
    A.colptr .-= colnz
    deleteat!(A.rowval, 1:colnz)
    deleteat!(A.nzval, 1:colnz)
    A
end

shifleft(A::SparseMatrixCSC, n) = shiftleft!(copy(A), n)

Is in place (shifleft!) and maybe the fastest you can get.

1 Like

@suavesito while a clever idea using your function turned out to be a slower than the slicing method.

Results in order of mult, slice, shiftleft

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 16 samples with 1 evaluation.
 Range (min … max):  281.928 ms … 360.109 ms  ┊ GC (min … max): 4.87% … 4.85%
 Time  (median):     312.091 ms               ┊ GC (median):    4.79%
 Time  (mean ± σ):   316.609 ms ±  25.090 ms  ┊ GC (mean ± σ):  4.92% ± 0.52%

  ▁  ▁   ▁    █ ▁   ▁   ▁ ▁     ▁ ▁  ▁       ▁            ▁▁  ▁  
  █▁▁█▁▁▁█▁▁▁▁█▁█▁▁▁█▁▁▁█▁█▁▁▁▁▁█▁█▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁██▁▁█ ▁
  282 ms           Histogram: frequency by time          360 ms <

 Memory estimate: 1.13 GiB, allocs estimate: 144045.

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 23 samples with 1 evaluation.
 Range (min … max):  144.470 ms … 301.354 ms  ┊ GC (min … max): 7.42% … 6.76%
 Time  (median):     234.378 ms               ┊ GC (median):    6.32%
 Time  (mean ± σ):   223.732 ms ±  46.080 ms  ┊ GC (mean ± σ):  6.66% ± 0.57%

  ▁▁         █▁▁▁  ▁      ▁ ▁  ▁    ▁▁▁ ▁    ▁▁▁▁▁  ▁        ▁▁  
  ██▁▁▁▁▁▁▁▁▁████▁▁█▁▁▁▁▁▁█▁█▁▁█▁▁▁▁███▁█▁▁▁▁█████▁▁█▁▁▁▁▁▁▁▁██ ▁
  144 ms           Histogram: frequency by time          301 ms <

 Memory estimate: 509.46 MiB, allocs estimate: 168029.

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 18 samples with 1 evaluation.
 Range (min … max):  212.972 ms … 322.766 ms  ┊ GC (min … max): 4.63% … 4.78%
 Time  (median):     281.343 ms               ┊ GC (median):    4.62%
 Time  (mean ± σ):   283.081 ms ±  25.233 ms  ┊ GC (mean ± σ):  4.74% ± 0.49%

  ▁                     ▁   ▁  ▁    ▁██  ▁      ██ ▁  ▁▁      ▁  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█▁▁█▁▁▁▁███▁▁█▁▁▁▁▁▁██▁█▁▁██▁▁▁▁▁▁█ ▁
  213 ms           Histogram: frequency by time          323 ms <

 Memory estimate: 487.23 MiB, allocs estimate: 96030.

According to my benchmarks, it looks like it is much faster for the in-place shiftleft!, but still x3 faster for the out-of-place shiftleft. Also, my approach looks like is almost not affected by the percentage of non-zero values, vs. the other that do are affected.

julia> using SparseArrays, LinearAlgebra, BenchmarkTools

julia> function shiftleft!(A::SparseMatrixCSC, n)
           sz = size(A)
           n <= sz[2] || throw(DimensionMismatch("$(n) bigger than the number of columns $(size[2])"))
           colnz = A.colptr[n+1] - 1
           @view(A.colptr[begin:end-n]) .= @view(A.colptr[begin+n:end])
           A.colptr[end-n+1:end] .= A.colptr[end]
           A.colptr .-= colnz
           deleteat!(A.rowval, 1:colnz)
           deleteat!(A.nzval, 1:colnz)
           A
       end
shiftleft! (generic function with 1 method)

julia> shiftleft(A, n) = shiftleft!(copy(A), n)
shiftleft (generic function with 1 method)

julia> shiftleft_mul(A, n) = A * spdiagm(-n => ones(Int, size(A, 2)-n))
shiftleft_mul (generic function with 1 method)

julia> function shiftleft_slide(A::SparseMatrixCSC, n)
           A[:, begin:begin+n-1] .= zero(eltype(A))
           dropzeros!(A)
           A[:, vcat(begin+n:end, begin:begin+n-1)]
       end
shiftleft_slide (generic function with 1 method)

Results in

julia> @benchmark shiftleft!(A, 10) setup=(A=sprand(100, 100, 0.1))
BenchmarkTools.Trial: 10000 samples with 803 evaluations.
 Range (min … max):  150.450 ns …  1.189 μs  ┊ GC (min … max): 0.00% … 66.28%
 Time  (median):     164.403 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   193.630 ns ± 89.107 ns  ┊ GC (mean ± σ):  3.87% ±  8.50%

  ██▆▃▃▂▁▁▁▁▂▄▃▂▃▂▁▁▁ ▁▁                                       ▂
  ██████████████████████▇▆▄▆▄▄▁▃▁▁▁▃▁▁▁▁▃▁▁▃▁▁▁▁▁▃▆▅▆▇▅▅▄▄▆▇▆▇ █
  150 ns        Histogram: log(frequency) by time       697 ns <

 Memory estimate: 816 bytes, allocs estimate: 1.

julia> @benchmark shiftleft!(A, 10) setup=(A=sprand(100, 100, 0.9))
BenchmarkTools.Trial: 10000 samples with 800 evaluations.
 Range (min … max):  145.001 ns …   2.450 μs  ┊ GC (min … max): 0.00% … 84.63%
 Time  (median):     167.564 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   214.467 ns ± 130.594 ns  ┊ GC (mean ± σ):  4.92% ±  8.98%

  ▇█▆▄▃▂▂▂▃▄▃▃▃▃▃▂▂▁▁                                           ▂
  ████████████████████▇▇▇▇▇▅▆▆▆▆▆▅▅▅▅▆▃▄▄▄▇▆▄▅▅▅▄▅▅▅▃▄▃▃▁▃▅▆▆▆▇ █
  145 ns        Histogram: log(frequency) by time        860 ns <

 Memory estimate: 816 bytes, allocs estimate: 1.

julia> @benchmark shiftleft(A, 10) setup=(A=sprand(100, 100, 0.1))
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  1.769 μs … 297.860 μs  ┊ GC (min … max):  0.00% … 96.98%
 Time  (median):     2.857 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   4.596 μs ±  10.683 μs  ┊ GC (mean ± σ):  11.58% ±  5.31%

  ▃█▆▆▆▅▄▄▃▃▂▂▂▁ ▁             ▁▁▁                            ▂
  ████████████████▇▇▆▇▅▅▅▅▅▄▆███████▇▇▇▇▇▇▇▇▇▇▇▆▅▆▆▆▆▅▅▅▄▅▄▅▅ █
  1.77 μs      Histogram: log(frequency) by time      20.2 μs <

 Memory estimate: 15.80 KiB, allocs estimate: 4.

julia> @benchmark shiftleft(A, 10) setup=(A=sprand(100, 100, 0.9))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  11.614 μs … 808.363 μs  ┊ GC (min … max):  0.00% … 94.41%
 Time  (median):     14.659 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   18.201 μs ±  35.230 μs  ┊ GC (mean ± σ):  10.51% ±  5.41%

   ▂▅▇██▆▄▃▂▃▂▂▁▂▁ ▁▁▁▂▂▁                                      ▂
  ▇██████████████████████▇▇▇▇▇▆▇▆▅▇▆▇▆▇▆▆▅▄▄▃▄▃▄▄▄▄▄▅▄▄▃▃▄▃▁▃▃ █
  11.6 μs       Histogram: log(frequency) by time      48.1 μs <

 Memory estimate: 140.52 KiB, allocs estimate: 6.

julia> @benchmark shiftleft_mul(A, 10) setup=(A=sprand(100, 100, 0.1))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  17.919 μs …  1.694 ms  ┊ GC (min … max): 0.00% … 95.68%
 Time  (median):     22.169 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.795 μs ± 36.031 μs  ┊ GC (mean ± σ):  3.66% ±  2.65%

       ▅███▄▄▁
  ▁▁▂▅████████▇▅▄▃▃▃▃▃▃▃▃▃▃▂▂▃▂▂▂▃▂▂▃▃▃▄▃▃▄▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁ ▃
  17.9 μs         Histogram: frequency by time        41.7 μs <

 Memory estimate: 25.09 KiB, allocs estimate: 19.

julia> @benchmark shiftleft_mul(A, 10) setup=(A=sprand(100, 100, 0.9))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  51.419 μs … 938.537 μs  ┊ GC (min … max): 0.00% … 86.56%
 Time  (median):     55.294 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   62.974 μs ±  43.053 μs  ┊ GC (mean ± σ):  4.06% ±  5.75%

  ▃█▇▆▅▅▄▃▃▃▂▂▃▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁                               ▂
  ████████████████████████████████▇▇█▇▇▆▇▆▅▆▅▅▅▄▄▄▅▃▃▃▄▁▄▅▄▃▁▄ █
  51.4 μs       Histogram: log(frequency) by time       127 μs <

 Memory estimate: 246.69 KiB, allocs estimate: 23.

julia> @benchmark shiftleft_slide(A, 10) setup=(A=sprand(100, 100, 0.1))
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.884 μs … 230.295 μs  ┊ GC (min … max): 0.00% … 88.43%
 Time  (median):     5.887 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.586 μs ±   8.344 μs  ┊ GC (mean ± σ):  5.40% ±  4.16%

       ▁▁▃▄▆██▇▅▁
  ▁▂▃▆▇███████████▆▇▅▅▄▄▄▃▃▂▃▃▃▃▃▃▂▂▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁ ▃
  4.88 μs         Histogram: frequency by time        9.59 μs <

 Memory estimate: 14.62 KiB, allocs estimate: 4.

julia> @benchmark shiftleft_slide(A, 10) setup=(A=sprand(100, 100, 0.9))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  27.227 μs … 927.842 μs  ┊ GC (min … max): 0.00% … 90.21%
 Time  (median):     30.218 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   37.314 μs ±  37.987 μs  ┊ GC (mean ± σ):  4.89% ±  4.92%

  ▃▇█▇▅▅▅▃▃▂▁▂▃▂▁▁▁▂▂▂▃▃▃▂▂▂▂▁▁▁▁ ▁▁                           ▂
  ████████████████████████████████████▇▇▇█▇▇▆▆▆▆▆▆▆▆▆▅▅▄▆▆▄▅▅▅ █
  27.2 μs       Histogram: log(frequency) by time        79 μs <

 Memory estimate: 126.72 KiB, allocs estimate: 6.

I had also tried to go down this path, but I had problems with the modification of the structure which is immutable.
Now (seeing @suavesito 's example) I understand better how to do it.
Here is a version (of @suavesito 's solution) that saves some allocations.

using  SparseArrays,   ShiftedArrays

function spshiftl(A,n)
cnz=A.colptr[n+1]-1
SparseMatrixCSC{Float64,Int64}(size(A)..., ShiftedArray(A.colptr,-n, default=A.colptr[end]).-cnz,
A.rowval[cnz+1:end], A.nzval[cnz+1:end])
end

1 Like

for example, if we were to multiply spshiftl (A, 1) by a matrix B, we could use the following relations

# A(mXn)	 = A'| A'' = A[:,2:n]|spzeros(n)
# B(nXk) = B' | B'' = B[1:n-1,:]|B[n,:]

# C[i, j] = dot(A[i, :], B[:, j]) = dot(A’[i, :], B’[:, j]) + 0*B[n,j] => A*B == A'*B' 

I ran my benchmarks with your tests and I got similar results that agree with them.

So I think this is the fastest method we have come up so far for shifting a matrix, but I don’t think it will be faster for @Amro 's workload.

Notice that @Amro is shifting the same matrix length(0:25e-6:0.6) times and that he is doing an operation on A. I tried to recreate this scenario with an updated MWE here: What is the fastest method to shift matrix? - #27 by acxz

And under this test, shiftleft! falls short. I’m not sure why, but I guess doing an operation ends up being slower on the shiftleft!'d matrix than the slide’d matrix? I really don’t understand why tho.

Do you mind running your benchmarks with the MWE linked?

I think this is a really useful observation!

1 Like

Because:

# Slicing
A[1:size(A, 1), vcat(2:size(A, 2), 1)]

is not updating A. This is what I got.

Good one to save allocations.

# A(mXn)	 = A'| A'' = A[:,2:n]|spzeros(n)
# B(nXk) = B' | B'' = B[1:n-1,:]|B[n,:]

# C[i, j] = dot(A[i, :], B[:, j]) = dot(A’[i, :], B’[:, j]) + 0*B[n,j] => A*B == A'*B' 

I am not sure if I got it well here.

Do you have any doubts about the correctness of the thesis or is it not clear what the thesis affirms?

It is not clear for me what the thesis affirms.

try if this is faster

Spshiftl(A,1) .+ B == hcat(A[:,2:end] .+ B[:1,1:end-1], B[:,end])
1 Like