I have two DataFrames. The first has size (7390, 2) and is represented below by a small (3, 2) array. The second is much larger, with size (482244, 8), and is represented below by a (7, 4) array.
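For EACH row of the small array (an x, y point), I need to compare its values against ALL rows of the large array (each row giving lower and upper x and y bounds) and find out whether the point falls inside any of them.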
This example obviously runs quickly, but with my real DataFrames (where each comparison involves 8 values from the large one) it takes about 10 minutes, and unfortunately I have to do this more than 24 times. Not knowing any better, I used a for loop, but I wonder if there's a better way. Any suggestions?
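For scale, using the real sizes given above, the brute-force loop checks every (point, row) pair on each run, and each pair involves several comparisons:

$$7390 \times 482\,244 = 3\,563\,783\,160 \approx 3.6 \times 10^{9} \text{ pairs per run}$$

At that volume, per-call allocations and the lack of any pruning dominate the runtime.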
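# Returns true if the point (x, y) lies inside at least one row (box) of `bounds`,
# whose columns are: 1 = x min, 2 = y min, 3 = x max, 4 = y max.
function inbound(x::Float64, y::Float64, bounds::Matrix{Float64})
    lastrow, _ = size(bounds)
    return any(x .>= bounds[1:lastrow, 1] .&&
               x .<= bounds[1:lastrow, 3] .&&
               y .>= bounds[1:lastrow, 2] .&&
               y .<= bounds[1:lastrow, 4])
end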
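One likely cost in `inbound` is allocation: each call copies four columns of `bounds` (indexing with `bounds[1:lastrow, k]` creates a new array), builds a full Boolean array of length 482244, and only then lets `any` inspect it. A minimal sketch of a non-allocating alternative, assuming the same column layout (1 = x min, 2 = y min, 3 = x max, 4 = y max); the name `inbound_loop` is hypothetical:

```julia
# Hypothetical non-allocating variant of `inbound`.
# Assumed column layout of `bounds`: 1 = x min, 2 = y min, 3 = x max, 4 = y max.
function inbound_loop(x::Float64, y::Float64, bounds::AbstractMatrix{Float64})
    # Check the boxes one at a time and stop at the first one containing (x, y),
    # so no temporary column copies or Boolean arrays are created.
    @inbounds for i in axes(bounds, 1)
        if bounds[i, 1] <= x <= bounds[i, 3] && bounds[i, 2] <= y <= bounds[i, 4]
            return true
        end
    end
    return false
end
```

It can be dropped into the existing loop unchanged, e.g. `[inbound_loop(smallarray[i, 1], smallarray[i, 2], largearray) for i in axes(smallarray, 1)]`, which should reproduce the `0, 0, 1` result below. It still scans every box in the worst case, so it removes allocation overhead but not the quadratic amount of work.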
julia> smallarray = [2.2 7.5
                     3.5 8.1
                     1.7 3.6];

julia> nrow_smallarray, _ = size(smallarray);

julia> largearray = [1.6 3.5 1.8 3.7
                     2.0 8.5 2.2 8.7
                     3.0 9.5 3.2 9.7
                     4.0 10.5 4.2 10.7
                     5.0 11.5 5.2 11.7
                     6.0 12.5 6.2 12.7
                     7.0 13.5 7.2 13.7];

julia> result = Vector{Bool}(undef, nrow_smallarray);

julia> for i in 1:nrow_smallarray
           result[i] = inbound(smallarray[i, 1], smallarray[i, 2], largearray)
       end

julia> result
3-element Vector{Bool}:
 0
 0
 1
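To avoid scanning all 482244 rows for every point, one option (a swapped-in technique, not something from the original post) is a simple sort-and-prune index on x min: sort the boxes by x min once, and note that a box can only contain `x` if `x - maxwidth <= xmin <= x`, where `maxwidth` is the largest x extent of any box. Two binary searches then narrow each query to that candidate range. This is a sketch assuming the same column layout as `inbound`; `SortedBoxes`, `inbound_sorted`, and `inbound_all` are hypothetical names. The pruning only pays off when boxes are narrow in x compared with the spread of their x min values (as in the example, where every box is about 0.2 wide); a single very wide box inflates `maxwidth` and brings the cost back toward a full scan.

```julia
# Sketch of a sort-and-prune lookup. Assumed column layout of `bounds`:
# 1 = x min, 2 = y min, 3 = x max, 4 = y max (same as `inbound`).
struct SortedBoxes
    xmin::Vector{Float64}
    ymin::Vector{Float64}
    xmax::Vector{Float64}
    ymax::Vector{Float64}
    maxwidth::Float64   # upper bound on (x max - x min) over all boxes
end

function SortedBoxes(bounds::AbstractMatrix{Float64})
    p = sortperm(view(bounds, :, 1))           # box order by x min
    xmin, ymin = bounds[p, 1], bounds[p, 2]
    xmax, ymax = bounds[p, 3], bounds[p, 4]
    # nextfloat rounds the width up so floating-point error cannot make it too small.
    return SortedBoxes(xmin, ymin, xmax, ymax, nextfloat(maximum(xmax .- xmin)))
end

function inbound_sorted(x::Float64, y::Float64, b::SortedBoxes)
    # Only boxes with x - maxwidth <= xmin <= x can contain x. prevfloat rounds the
    # lower threshold down, so rounding can add candidates but never drop one.
    lo = searchsortedfirst(b.xmin, prevfloat(x - b.maxwidth))
    hi = searchsortedlast(b.xmin, x)
    @inbounds for i in lo:hi
        # xmin <= x already holds for every i <= hi; check the remaining bounds.
        if x <= b.xmax[i] && b.ymin[i] <= y <= b.ymax[i]
            return true
        end
    end
    return false
end

# Points are independent, so the outer loop can be split across threads
# (Julia must be started with more than one thread; see the `--threads` option).
# Each iteration writes a distinct element of a Vector{Bool}, which is thread-safe.
function inbound_all(points::AbstractMatrix{Float64}, b::SortedBoxes)
    result = Vector{Bool}(undef, size(points, 1))
    Threads.@threads for i in axes(points, 1)
        result[i] = inbound_sorted(points[i, 1], points[i, 2], b)
    end
    return result
end
```

With the example data, `inbound_all(smallarray, SortedBoxes(largearray))` should return the same `0, 0, 1` as the loop above. Building `SortedBoxes` once per large DataFrame costs one sort, which can then be reused for all 7390 points.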