# Perform a multi-column, multi-dimensional search quickly

I would like to perform a multi-column search from a multi-dimensional matrix like the one below, is there a fast way to do this? (In the example below, I want to get the rows next to -5.0 3.0 2.0)

Test_matrix =
[-5.0 4.0 2.0
-5.0 3.0 2.0
-5.0 3.0 3.0
-5.0 3.0 1.0
-4.0 3.0 2.0
-6.0 3.0 2.0
-5.0 3.0 2.0
5.0 -9.0 0.0
5.0 -3.0 0.0
5.0 -3.0 2.0]

In the example above, we want to get the columns 2 ,7,10.

This is why I wrote the following.
result_index_can = (Test_matrix[:,]. == -5). *(Test_matrix[:,]. == 3). *(Test_matrix[:,]. == 2).

but in the real data, the speed of the search is a bottleneck due to the large matrix(over 1000000). Does anyone have a good solution for this?

One way of writing this is:

``````eachrow(Test_matrix) .== Ref([-5, 3, 2])
``````

Another is:

``````findall(==([-5, 3, 2]), eachrow(Test_matrix))
``````

NB: enclose your whole code inside triple backticks for proper display in discourse, which also enables the copy button

1 Like

Thank you for your beautiful codes!
But , index_can = eachrow(Test_matrix) .== Ref([-5, 3, 2])
is slower than
result_index_can = (Test_matrix[:,]. == -5). *(Test_matrix[:,]. == 3). *(Test_matrix[:,]. == 2).

Is this missing interpolation, or the very small size of the MWE? I see:

``````julia> f1(x) = (x[:, 1] .== -5.0) .* (x[:, 2] .== 3.0) .* (x[:, 3] .== 2.0);

julia> f2(x) = eachrow(x) .== Ref([-5.0, 3.0, 2.0]);

julia> f3(x) = findall(==([-5.0, 3.0, 2.0]), eachrow(x));

julia> @btime f1(\$x);
10.526 ms (10 allocations: 23.01 MiB)

julia> @btime f2(\$x);
18.738 ms (8 allocations: 38.27 MiB)

julia> @btime f3(\$x);
16.527 ms (1000008 allocations: 76.32 MiB)
``````

consistent with what @Hikaru_Matsuoka says.

Btw is your example wrong? You say you want rows 2, 7, 10, but row 10 has 5 and -3 instead of -5 and 3? All the methods discussed so far wouldn’t return row 10 of `Test_matrix`

On my machine for the 1e6 example I find this to be an order of magnitude faster:

``````julia> function f4(x)
res = Int64[]
for i ∈ 1:size(x, 1)
@inbounds if x[i, 1] == -5 && x[i, 2] == 3.0 && x[i, 3] == 2.0
push!(res, i)
end
end
res
end
f4 (generic function with 1 method)

julia> @btime f4(\$x);
1.694 ms (6 allocations: 21.86 KiB)
``````

Note that `f3` and `f4` above return only the integer indices of the matching rows (i.e. roughly a length 1,000,000/11^3 ≈ 751 vector for the example above with 11 randomly chosen integers from -5 to 5), while versions 1, and 2 give you a `size(x, 1)`-lenght vector of booleans.

To get such a vector with the speed of `f4` you can do:

``````julia> function f5(x)
res = fill(false, size(x, 1))
for i ∈ 1:size(x, 1)
@inbounds if x[i, 1] == -5 && x[i, 2] == 3.0 && x[i, 3] == 2.0
res[i] = true
end
end
res
end
f5 (generic function with 1 method)

julia> @btime f5(\$x);
1.686 ms (2 allocations: 976.67 KiB)
``````
2 Likes

Thanks @nilshg, using @benchmark without interpolation seems to work most of the time (similar results to @btime with interpolation). It works alright for my two expressions but not for OP’s…

Yeah I’ve never quite understood when you’re okay not to interpolate, so I just always do it (I always thought the issue was access to global variables, which I would have expected to affect all three expressions similarly, but clearly not!)

I should also add that while I tend to always sprinkle `@inbounds` into performance sensitive loops, this doesn’t actually seem to have any effect here - the compiler might be smart enough to eliminate the boundscheck for this simple loop.

And finally if one of the actual experts stops by and proposes some sort of `@tturbo` solution that runs in 50ns or something, maybe they could also explain why `fill(false, size(x, 1)` is two allocations?

As a side note, we normally use less brackets to index a matrix:

``````(x[:,1] .== -5) .& (x[:,2] .== 3) .& (x[:,3] .== 2)
``````

And here are some other simple options, using comprehensions, which are faster than your original code and only slightly slower than Nils’ solution above:

``````# outputs only matching integer indices:
[i for i in axes(x,1) if x[i,1]==-5 && x[i,2]==3 && x[i,3]==2]

# outputs all indices, as Vector{Bool} with 0 or 1:
[x[i,1] == -5 && x[i,2] == 3 && x[i,3] == 2 for i in axes(x,1)]
``````
2 Likes

a slight variant of the solution of @rafael.guerra

``````[i for i in axes(\$x,1) if (\$x[i,1]==-5 * \$x[i,2]==3 * \$x[i,3]==2)==1]
``````