Fastest possible way to find the index where the value equals 1 in a matrix column

Hi there everyone. I am currently working on my thesis, which involves an optimization problem with an X decision matrix filled with 0s; each column must contain a 1 value, as it represents an allocation.
In my local search algorithm I iterate over the matrix as follows:

S, B = size(X)
for j₁ in 1:B
  ĩ = findfirst(==(1), X[:, j₁])
  for j₂ in j₁+1:B
    i✶ = findfirst(==(1), X[:, j₂])
    # compute the move with ĩ and i✶ ...
  end
end

However, after timing and benchmarking I find that a significant amount of time is spent on the i✶ = findfirst(==(1), X[:, j₂]) line, which made me ask myself: what is the fastest way out there to find which index equals one across my entire column? I am talking about absolute blazing speed, as this computation is performed hundreds of thousands of times. Any tricks, tips, or memory shenanigans to do so? My matrix is not particularly large, but it’s not trivially small either (e.g. 500 rows, 2650 columns, double that).

For reference, here is a TimerOutputs output:

| Section | ncalls | time   | % tot time | avg time | alloc   | % tot alloc | avg alloc |
|---------|--------|--------|------------|----------|---------|-------------|-----------|
| Loop j1 | 6      | 63.5s  | 100.0%     | 10.6s    | 81.7GiB | 100.0%      | 13.6GiB   |
| ĩ       | 15.9k  | 31.9ms | 0.1%       | 2.01μs   | 63.1MiB | 0.1%        | 4.06KiB   |
| Loop j2 | 15.9k  | 63.4s  | 99.9%      | 3.99ms   | 81.6GiB | 99.9%       | 5.25MiB   |
| i✶      | 21.1M  | 39.1s  | 61.5%      | 1.85μs   | 81.6GiB | 99.9%       | 4.06KiB   |

Edit: wow, I feel dumb. After posting this I saw the number of calls to i✶, and no wonder so much time is spent there: it gets called millions of times ):

Also note that X[:, j₂] makes a copy; you might want to use @view (or the @views macro).
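To illustrate the difference, here is a minimal sketch (the matrix and the function names f_copy/f_view are mine, just for demonstration):

```julia
# Hypothetical 0/1 allocation matrix: one 1 per column.
X = zeros(Int, 500, 2650)
for j in 1:2650
    X[mod1(j, 500), j] = 1
end

f_copy(X, j) = findfirst(==(1), X[:, j])        # X[:, j] allocates a fresh 500-element vector
f_view(X, j) = findfirst(==(1), @view X[:, j])  # @view wraps the column, no copy

# Both return the same index; only the allocation behavior differs.
# @btime f_copy($X, 1)   # allocates on every call
# @btime f_view($X, 1)   # 0 allocations
```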

Also if your matrix has very few 1 values, you should consider using a sparse matrix.
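As a sketch of what sparsity buys you here: in CSC storage the row indices of the stored entries are available directly, so finding the 1 in a column needs no scan over the zeros at all. This assumes every column has at least one stored entry; first_one is a name I made up:

```julia
using SparseArrays

# Tiny hypothetical allocation matrix: 1s at rows 2, 1, 3 of columns 1, 2, 3.
X = sparse([2, 1, 3], [1, 2, 3], [1, 1, 1], 3, 3)

# rowvals(X)[nzrange(X, j)] lists the (sorted) rows of the stored
# entries in column j; the first of them is the answer, no search needed.
first_one(X, j) = rowvals(X)[first(nzrange(X, j))]
```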

What’s a good rule of thumb to know when to use @views? It’s still not very clear to me when or when not to use it. Thank you very much. Using the macro brought times down significantly so it’s very useful towards achieving my goal.

You should use @views whenever you take a slice of an array (i.e. any indexing that has a : in it), unless you actually want a copy. This is one of my least favorite parts about Julia but it is unfortunately pretty much impossible to fix.
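Beyond single slices, the plural @views form rewrites every slice inside a whole expression or function body, which is convenient for loops like the one in the question. A sketch (the function is illustrative, not the original code):

```julia
# @views turns every X[:, j]-style slice in the body into a view.
@views function sum_first_ones(X)
    s = 0
    for j in axes(X, 2)
        i = findfirst(==(1), X[:, j])   # no copy, thanks to @views
        i === nothing || (s += i)       # guard against columns with no 1
    end
    return s
end
```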


Looking at the code, it would be strange if the # compute the move part changed X. Therefore, the for loops basically look for all pairs of initial “1” elements in the columns. The way the pairs are found is by searching for the same “1” many times over a column. Furthermore, there might be a correctness problem if a column does not contain a “1” (findfirst would return nothing, though this may never occur for reasons specific to this case).

In any case, looking for the initial “1” only once (as in dosomething2()) is better, i.e.

julia> using Random

julia> Random.seed!(124)

julia> X = rand(10,10) .< 0.2;

julia> function dosomething(X)
           s=0
           S, B = size(X) 
           for j₁ in 1:B
               ĩ = findfirst(==(1), @view X[:, j₁])
               for j₂ in j₁+1:B
                   i✶ = findfirst(==(1), @view X[:, j₂])
                   s += ĩ * i✶
               end
           end
           s
       end
dosomething (generic function with 1 method)

julia> function dosomething2(X)
           s=0
           S, B = size(X)
           js = [findfirst(==(1), @view X[:, i]) for i in 1:B]
           for j₁ in 1:B
               for j₂ in j₁+1:B
                   s += js[j₁]*js[j₂]
               end
           end
           s
       end
dosomething2 (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime dosomething($X)
  237.532 ns (0 allocations: 0 bytes)
1240

julia> @btime dosomething2($X)
  152.856 ns (1 allocation: 144 bytes)
1240

As shown, this different approach gives a roughly 40% speedup even with a small matrix, and the speedup increases with X’s size!

P.S. js can also be calculated with:

js = findfirst.(==(1), eachcol(X))

which is a bit shorter.


Thank you for your input! However, in my current approach X does change during the iterations, so precalculating which indices are where is off the table.

I see. These mutating algorithms are tricky indeed. In any case, the no-findfirst approach can still be used if the first-1 indices are kept up to date dynamically (when assigning a 1 into X, also fix up the vector of initial 1s). If X mutates too violently, this cannot be done, but I’ve given my input :wink:
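To make that concrete, here is a sketch under the assumption that each column holds exactly one 1 and a move just relocates it within its column (move! and first1 are names I invented):

```julia
# first1[j] caches the row of the 1 in column j; update it together
# with X so that findfirst is never needed inside the search loops.
function move!(X, first1, j, i_new)
    i_old = first1[j]
    X[i_old, j] = 0      # clear the old allocation
    X[i_new, j] = 1      # set the new one
    first1[j] = i_new    # keep the cache in sync with X
    return X
end
```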