Is there a way to speed up this matrix (with only 1 & 0) multiplication & power operation?

Endeavour · January 17, 2022, 9:46am

I have a matrix (cost_mat[~600,~600]) with only 1 & 0 as its elements. You can think that each element of the matrix is individually put as 1 (with probability p) or 0. I need to find out (cost_mat^N) where N(~600) is a large number. I don’t need the actual value of the elements in (cost_mat^N). I just need to know whether they are positive or not.(1 if +ve else 0,as all elements are anyway positive here so need not consider any negative values).

I didn’t use the Integer matrix as BLAS doesn’t use multicores then. Besides, the values tend to get quite high. So,I had to do the matrix power operation in step (with overflow check in each step).

Here is a sample of my code,

size=553
cost_mat=zeros(Float64, size,size);

 for j in 1:1:size
          for i in 1:1:size
                   cost_mat[i,j] = rand()< 0.7 ? 1 : 0;
                   cost_mat[j,i]=cost_mat[i,j];
            end
  end

cost_mat = cost_mat+LinearAlgebra.I;
println("cost matrix calculated")
c = copy(cost_mat);

N=48
step=24

for i in 1:1:N
       println(i,"/",N," multiplication done ")
       #After doing each power operation,I convert the values  to 1 or 0
        c = !=(0).(c*(cost_mat^(step)));

       end

This operation takes around 10 seconds on a 32 core machine(BLAS seems to use only 20 cores tough 30 cores allotted). I have to repeat this calculation ~1000 times. So, this seems a quite difficult bottleneck.

Besides to assign 1(if element != 0) or 0 (if element is 0), I have checked equality. I think checking (>0.5 or <0.5) instead of equality would make more sense as we have floats here. (Anyway, the elements should have an integer value, so checking greater or less than 1 should also work),
Surprisingly this operation is significantly faster in Numpy. (though it uses all 32 cores)
Taking the power (cost_mat)^(N*step) is faster (is it because of the avoidance of fore loop & offloaded to BLAS?) but I need to take care of the overflow. So, doing it in steps seem safer to me.

I can’t think of any way to speed up this operation. I don’t have much experience with Julia. Any suggestion would help me a lot.

Thank you in advance.Have a nice day.

mcabbott · January 17, 2022, 12:53pm

cost_mat^N is certainly going to overflow. And even cost_mat^2 seems to have no zeros. Is the real problem much more sparse than this?

In terms of code speed, the biggest slowdown is because c = !=(0).(c*(cost_mat^(step))); makes c a boolean array, in fact a BitMatrix, and matrix multiplication between that and an array of floats is slow. c .= ... will be faster. But just writing trues(553, 553) will also be faster!

goerch · January 17, 2022, 3:41pm

Using @mcabbott 's hint and Russian Peasant exponentiation

using LinearAlgebra, BenchmarkTools

function init(size)
    cost_mat=zeros(Float64, size,size);
    for j in 1:size
        for i in j:size
            cost_mat[i,j] = rand()< 0.7 ? 1 : 0;
            cost_mat[j,i] = cost_mat[i,j];
        end
    end
    cost_mat += LinearAlgebra.I;
end

function compute1(cost_mat)
    N=48
    step=24
    c = copy(cost_mat);
    for i in 1:N
        # println(i,"/",N," multiplication done ")
        #After doing each power operation,I convert the values  to 1 or 0
        c = map(cij -> cij != 0.0 ? 1.0 : 0.0, c*(cost_mat^(step)));
    end
    c
end

function compute2(cost_mat)
    N=48*24
    c = copy(cost_mat);
    while N > 0
        if N % 2 == 0
            c *= c
            N ÷= 2
        else
            c *= cost_mat
            N -= 1
        end
        c = map(cij -> cij != 0.0 ? 1.0 : 0.0, c);
    end
    c
end

const cost_mat = init(553)
println("cost matrix calculated")

c1 = compute1(cost_mat)

c2 = compute2(cost_mat)
@assert c2 ≈ c1

@btime compute1(cost_mat)
@btime compute2(cost_mat)

I see

 762.914 ms (674 allocations: 786.30 MiB)
  36.623 ms (50 allocations: 58.33 MiB)

GunnarFarneback · January 17, 2022, 4:18pm

Why do you need to compute this? To me it looks like it will get you the connected components of the corresponding graph, which should be orders of magnitude faster if you use graph algorithms instead.

Oscar_Smith · January 17, 2022, 4:32pm

specifically, you want to use Floyd–Warshall algorithm - Wikipedia

etienne_dg · January 17, 2022, 5:11pm

Why do you need to compute this? To me it looks like it will get you the connected components of the corresponding graph, which should be orders of magnitude faster if you use graph algorithms instead.

Not exactly, if your graph is an even cycle, you will get two classes of vertices, but I agree some graph theory is the way to go. I think we need to find the decomposition of the connected components in classes of vertices under the period of the connected component. Still much faster than with matrices. Edit: : for undirected graphs, the period is either 1 or 2, so we only need to check if the connected component forms a bipartite graph.

specifically, you want to use Floyd–Warshall algorithm - Wikipedia

Floyd–Warshall is for shortest paths, connected components is simply BFS

GunnarFarneback · January 17, 2022, 6:25pm

I don’t follow how this works. Notice that the diagonal is filled.

rocco_sprmnt21 · January 17, 2022, 6:33pm

replacing

c = map(cij -> cij != 0.0 ? 1.0 : 0.0, c);

with

replace!(cij->ifelse(cij != 0.0,1.0,0.0), c);

gives another small speed boost (I don’t know why)

goerch · January 17, 2022, 6:36pm

Nice:

30.199 ms (26 allocations: 30.33 MiB)

Even fewer allocations.

Edit: using MKL shaves off another 10 ms.

etienne_dg · January 17, 2022, 6:45pm

The diagonal can have elements, but I don’t see how it is necessarily filled.
If you take the graph composed of a single edge, the associated matrix A have the antidiagonal filled with ones. A^2 is the identity, and not a matrix full of ones.

GunnarFarneback · January 17, 2022, 6:47pm

I base that on this line from the original post:

cost_mat = cost_mat+LinearAlgebra.I;

GunnarFarneback · January 17, 2022, 7:35pm

Here is an implementation:

using Graphs

function compute3(cost_mat)
    g = SimpleGraph(cost_mat)
    c = zeros(size(cost_mat))
    for component in connected_components(g)
        for i in component, j in component
            c[i, j] = 1
        end
    end
    return c
end

Most of the time is spent on constructing the graph, which, depending on where the real data is coming from, might be done faster than via the matrix.

goerch · January 17, 2022, 7:48pm

This is way superior!

6.096 ms (2794 allocations: 10.44 MiB)

And even better with sparser input.

rafael.guerra · January 17, 2022, 8:56pm

Sorry for asking: why isn’t the inner loop along the faster column-major direction?

GunnarFarneback · January 17, 2022, 9:00pm

Feel free to switch it.

rafael.guerra · January 17, 2022, 9:09pm

@GunnarFarneback, that was a minor thing. The real question is how did you relate the OP’s question (at no point graphs were mentioned) with your amazing graph-theory solution? Any good reference you could recommend for beginners would be appreciated.

GunnarFarneback · January 17, 2022, 9:20pm

No shortcut there I’m afraid. With a sufficiently solid foundation in linear algebra and some exposure to the concept of an adjacency matrix, you can recognize these kinds of structures.

Endeavour · January 18, 2022, 11:29am

Thanks a lot for the really great suggestion. The sparsity of the actual matrix depends on the probability used for the generation of cost_mat(here we used 0.7,it will be more sparse at 0.1). But this is one of the samples I need for the actual use case.
I didn’t know that Bitmatrix slows down the operation. I used it here to take advantage of operator broadcasting which I expected to be vectorized. I shall keep this point in mind.Iust curious,does Bitmatrix multiplication with float matrix don’t get offloaded to BLAS?

Endeavour · January 18, 2022, 11:40am

Thanks a lot. As @goerch pointed out this approach seems much more efficient than using matrices directly. You are exactly right. I have a graph with nodes on a rotated square lattice. Some of the bonds between the nodes are broken and some are not.(The probability 0.7 here implies the probability of the broken bond). I need to find the shortest path to the top/left/right boundary from a starting point at the middle node of the bottom layer such that I need to cross the minimum number of bonds(that are not broken) & I also need to know the associated cost for such a path. I don’t have much experience with graph theory. I shall look into this. Thanks, for pointing out this.

etienne_dg · January 18, 2022, 1:45pm

I need to find the shortest path to the top/left/right boundary from a starting point at the middle node of the bottom layer such that I need to cross the minimum number of bonds(that are not broken) & I also need to know the associated cost for such a path. I don’t have much experience with graph theory. I shall look into this. Thanks, for pointing out this.

You can achieve this by building the dual graph of your lattice (one vertex for each square face of the lattice, and one edge between each two adjacent faces. Each edge of the dual graph is associated with an edge of the original lattice. Walking on an edge of the dual graph is equivalent to crossing an edge of the original graph. Give a weight of 1 on the edges of the dual graph corresponding to the intact edges of the original graph and 0 for the broken ones and find the shortest path (using a shortest path algorithm like djikstra or A* (for A*, you need to provide an heuristic, the Manhattan distance should be a good one in this scenario)

Topic		Replies	Views
Julia matrix-multiplication performance Performance linearalgebra	20	8471	October 30, 2022
Multiplying boolean matrices Performance question , performance , linearalgebra	4	786	November 7, 2021
Fastest way to perform A * B * A’ General Usage question , blas , mkl , linearalgebra , optimization	9	566	August 13, 2024
How to make faster in a matrix multiplication inside a loop? Performance	2	453	July 27, 2020
Matrix Element-wise operations slow down in a certain size interval Numerics	1	561	August 16, 2017

Is there a way to speed up this matrix (with only 1 & 0) multiplication & power operation?

Related topics