What is the fastest method to shift matrix?

I want to shift a sparse matrix (A) one digit to left with adding zeros in the first column at the right. For that I am multiplying the matrix with Rotate. However, when the matrix is big in size and diversity of values, it turn on to be slow. Is there a function in Julia to do that more efficiently?

Rotate=sparse([zeros(1,33); Matrix(1.0I, 33-1, 33-1) zeros(33-1,1)])

A=[-3     -4     -4     -4     -4     -3     -3     -3     -4     -4    -4     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
-3782  -3828  -3873  -3919  -3964  -4008  -4052  -4096  -4140  -4184     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 4000   3966   3932   3898   3863   3828   3792   3756   3719   3682     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
    1      1      1      0      0      0      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
  677    667    658    649    639    630    620    610    601      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
  960    967    974    980    987    993   1000   1006   1012      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
    0      0      0      0      0      0      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
  606    604    603    601    599    598      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
  107    113    119    124    130    135      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
    1      1      1      0      0      0      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 -784   -783   -782      0      0      0      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
  -65    -73    -80      0      0      0      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
    1      1      1      1      1      1      1      1      1      1     1     1     1     1     1     1     1     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
  687    674    661    648    635    621    608    595    581    568   555   541   527   514   500   486   473   459   445   431   417   403  0  0  0  0  0  0  0  0  0  0  0;
 1352   1359   1366   1372   1379   1385   1391   1397   1403   1409  1415  1420  1426  1431  1436  1441  1446  1450  1455  1459  1464  1468  0  0  0  0  0  0  0  0  0  0  0;
    0      0      0      0      0      0      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
  588    569    550    531    512    493    474    455      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 1980   1986   1992   1997   2003   2008   2013   2018      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
   -1     -1     -1     -2     -2     -2     -2     -2     -2     -2    -2    -2     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 1081   1079   1077   1075   1073   1071   1068   1066   1064   1061     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
  223    234    244    254    265    275    285    295    306    316     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
   -1     -1     -1     -1     -1     -1     -1     -1     -1      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 1428   1420   1413   1405   1397   1390   1382   1374      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
  742    756    769    783    796    810    823    837      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
    1      1      1      1      0      0      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 1867   1844   1821   1798   1774   1751   1727   1703   1679   1654  1630  1605  1581  1556  1531  1506     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 2458   2476   2493   2511   2528   2545   2562   2579   2595   2611  2627  2643  2658  2674  2688  2703     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
    0      0      0      0      0      0      0      0      0      0     0     0     1     1     1     1     1     1     1     1     1     1  1  1  0  0  0  0  0  0  0  0  0;
 2262   2235   2208   2180   2152   2124   2096   2067   2039   2010  1981  1952  1922  1893  1863  1833  1803  1773  1743  1712     0     0  0  0  0  0  0  0  0  0  0  0  0;
 2918   2939   2961   2982   3003   3023   3044   3064   3083   3103  3122  3141  3160  3178  3196  3214  3232  3249  3266  3283     0     0  0  0  0  0  0  0  0  0  0  0  0;
    0      0     -1     -1     -1     -1     -1     -1     -1      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 1703   1699   1696   1692   1688   1684   1680   1676      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
  313    330    346    363    379    395    411    428      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
    1      1      1      1      0      0      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 -174   -171   -168   -165      0      0      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 -343   -345   -346   -347      0      0      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
    4      4      3      3      3      3      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 -413   -440   -468   -495   -522      0      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 2855   2852   2849   2845   2841      0      0      0      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
   -4     -4     -4     -5     -5     -5     -5     -5     -5      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 1313   1293   1273   1253   1233   1212   1191   1171      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
 2396   2408   2420   2431   2443   2454   2465   2476      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
    1      1      1      1      1      1      1      1      1      1     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0;
-1564  -1557  -1551  -1544  -1537  -1530  -1523  -1516      0      0     0     0     0     0     0     0     0     0     0     0     0     0  0  0  0  0  0  0  0  0  0  0  0]
julia> A*Rotate

Would something like this work for you? Obviously could be written smarter than this, but the point is to use SparseArrays.permute:

B = permute(A, 1:6, vcat(2:5,1))
fill!(view(B, :, size(B,2)), zero(eltype(B)))
dropzeros!(B)

Thanks for your reply.

How?

I have benchedmark but it seems not effecient:

julia> @btime A*Rotate
  266.172 ns (5 allocations: 512 bytes)
julia> @btime dropzeros!(fill!(view(permute(A, 1:6, vcat(2:5,1)), :, size(permute(A, 1:6, vcat(2:5,1)),2)), zero(eltype(permute(A, 1:6, vcat(2:5,1))))))
  9.000 ΞΌs (140 allocations: 5.88 KiB)

You should put the lines of code in a function before using @btime. The results don’t make much sense otherwise.

However, when the matrix is big in size and diversity of values, it turn on to be slow.

Can you be more specific and specify the dimension/density you are working with and update your MWE with it so that we can better benchmark it?

I have updated my MWE.

1 Like

As @acxz mentions, that’s not the most correct way to benchmark. But even with more careful benchmarking, you’re rightβ€”it is slower. I just assumed that that function would be doing something clever enough to beat the matrix-matrix multiply, but clearly that’s not the case. Sorry for suggesting that without actually benchmarking.

I think there is hope with your solution!

That dropzeros is probably quite expensive. Maybe there is a way without using dropzeros.

So looking at the source code of SparseArrays we have the following method: dropstored!

Doing the following might be faster than the just multiplying:

    dropstored!(A, 1:size(A, 1), 1)
    permute!(A, 1:size(A, 1), vcat(2:size(A, 2), 1))

The problem is you can’t use dropstored! atm since its not in the latest released SparseArrays version afaik

The smarter thing to do would be to only dropzeros! everyso often when multiplication starts becoming slow. You just want to drop the extra values you keep around, but drop them every operation will be too costly.

why don’t you try this?

using ShiftedArrays
ShiftedArray(A,(0,-1),default=0)

In that case we’ll lose the sparsity nature of these matrices. Doing operations on sparse matrices will be much slower using ShiftedArrays than with SparseArrays.

Just to check, is A of type SparseMatrixCSC{T, Int}? As your MWE, it is currently a Matrix.

I have no way to test, but does such a thing have a chance to be faster than multiplying A * Rotate?

r,c,v=findnz(A)
c.-=1
n=count(==(0),c)+1  # or n=findfirst(!=(0),c)
sparse(r[n:end], c[n:end],v[n:end], size(A)...)

Just tested it, im afraid it is slower than both of the previous methods discussed.

Here are the results for the methods discussed. Tested for larger matrix sizes and the permute method is noticeably faster, and for the dimensions and density (extrapolated from the dense matrix given) the permute method is still a slightly bit better.
I’d just test the permute solution for your workload @Amro and see if the timings are better on your specific hardware/workload.

And to get more speed out of the permute method I would wait for dropstored! to be released in the SparseArrays library as well as dropzeros! not at every multiplication but every say 5 multiplications.

Summary:

using BenchmarkTools
using LinearAlgebra
using SparseArrays
using Random

function trie()
    row_dim = 44
    col_dim = 33

    A_density = 0.2679063360881543
    rng = MersenneTwister(1)

    A = sprand(rng, row_dim, col_dim, A_density)

    # Matrix mult
    shift = spdiagm(-1 => ones(size(A, 2) - 1))
    A *= shift

    # Permute
    A[:, 1] .= 0
    dropzeros!(A)
    permute!(A, 1:size(A, 1), vcat(2:size(A, 2), 1))

    # New matrix
    rows, cols, vals = findnz(A)
    new_idx = findfirst(!=(1), cols)
    cols .-= 1
    A = sparse(
        rows[new_idx:end],
        cols[new_idx:end],
        vals[new_idx:end],
        size(A)...
    )

end

Results in order of method appearing in function. Other methods commented out when code was ran.

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  21.370 ΞΌs …  2.427 ms  β”Š GC (min … max): 0.00% … 91.08%
 Time  (median):     23.271 ΞΌs              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   26.106 ΞΌs Β± 41.812 ΞΌs  β”Š GC (mean Β± Οƒ):  3.85% Β±  2.55%

  β–†β–ˆβ–ƒβ–β–β–                                                      ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–…β–‡β–‡β–‡β–†β–†β–…β–…β–…β–…β–β–…β–…β–„β–β–…β–„β–„β–†β–„β–β–ƒβ–β–„β–ƒβ–ƒβ–β–ƒβ–β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–ƒβ–ƒβ–…β–…β–†β–†β–…β–…β–… β–ˆ
  21.4 ΞΌs      Histogram: log(frequency) by time      99.2 ΞΌs <

 Memory estimate: 53.78 KiB, allocs estimate: 37.

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  20.606 ΞΌs …  2.020 ms  β”Š GC (min … max): 0.00% … 67.20%
 Time  (median):     22.429 ΞΌs              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   25.574 ΞΌs Β± 47.173 ΞΌs  β”Š GC (mean Β± Οƒ):  4.06% Β±  2.27%

  β–†β–ˆβ–‚β–β–‚β–β–β–β–‚                                                   ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–…β–…β–…β–…β–ƒβ–…β–„β–„β–…β–ƒβ–„β–β–ƒβ–ƒβ–ƒβ–ƒβ–„β–ƒβ–β–β–β–ƒβ–β–β–β–β–β–ƒβ–β–β–β–β–β–β–β–β–β–β–ƒβ–ƒβ–…β–…β–…β–†β–…β–„β–† β–ˆ
  20.6 ΞΌs      Histogram: log(frequency) by time      98.9 ΞΌs <

 Memory estimate: 34.62 KiB, allocs estimate: 23.

julia> include("shift_sparse.jl")
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  29.102 ΞΌs …  2.655 ms  β”Š GC (min … max): 0.00% … 78.84%
 Time  (median):     32.237 ΞΌs              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   39.756 ΞΌs Β± 81.615 ΞΌs  β”Š GC (mean Β± Οƒ):  5.91% Β±  2.98%

  β–„β–ˆ                                                           
  β–ˆβ–ˆβ–ƒβ–ƒβ–ƒβ–‚β–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–β–‚β–β–β–β–β–β–β–β–β–‚β–β–β–β–β–β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚ β–‚
  29.1 ΞΌs         Histogram: frequency by time         179 ΞΌs <

 Memory estimate: 60.55 KiB, allocs estimate: 32.

TLDR: use permute method

1 Like

It is sparse matrix in my real code. But I could not find a way to copy and paste it here. So, I changed it to dense and paste it.

You shouldn’t be copy values directly from your code, but rather try to create your MWE from important properties of your code. For example, using the density/dimensions of the matrix you are working with and creating a random sparse matrix with that information. See my example on how to create such a MWE without hardcoding your values but using properties and random values.

Thank you very much for your tips.

I have tried the permute! with dropzeros! but it is slower. Yes, I think I should not apply the dropzeros! at each time, but this is a bit annoying.:slight_smile:

Can you post the @bprofile output comparison of it? I couldn’t imagine the difference to be that significant if there is one, considering on my PC I get the opposite results.

Ok, I will. Are you using dropzeros!(A) at each time step?

I am only shifting the matrix once as per the MWE. But yea let’s see what the results are for using dropzeros! at each timestep.

I’ll update my tests to try with multiple timesteps as well.

Edit: can you mention how many timesteps you are working with? Based on that you could tune the value of how many times to call dropzeros!

1 Like

time loop simulation 0:25e-6:0.6.