This creates unnecessary extra closures - >(0) already creates a curried version ![]()
This has to allocate the resulting mask, which is potentially slower (since it has to iterate twice, once to create the mask and once to find the indices).
julia> using BenchmarkTools
julia> @btime intersect(findall(x -> x>0,$A), findall(x -> x>0,$B))
885.417 ns (19 allocations: 1.84 KiB)
1-element Array{CartesianIndex{2},1}:
CartesianIndex(3, 3)
julia> @btime intersect(findall(>(0),$A), findall(>(0),$B))
868.421 ns (18 allocations: 1.83 KiB)
1-element Array{CartesianIndex{2},1}:
CartesianIndex(3, 3)
julia> greater0(A,B) = findall((A .> 0) .& (B .> 0))
greater0 (generic function with 1 method)
julia> @btime greater0($A,$B)
112.854 ns (3 allocations: 240 bytes)
1-element Array{CartesianIndex{2},1}:
CartesianIndex(3, 3)
If you want to go for speed, something like this might be best, which stays fast even with huge arrays:
julia> findGreater0(A,B) = [ idx for idx in CartesianIndices(A) if @inbounds (A[idx] > 0 && B[idx] > 0) ]
findGreater0 (generic function with 1 method)
julia> A = repeat(-1:1',1,3)
3×3 Array{Int64,2}:
-1 -1 -1
0 0 0
1 1 1
julia> B = A';
julia> @btime findGreater0($A,$B)
93.085 ns (3 allocations: 240 bytes)
1-element Array{CartesianIndex{2},1}:
CartesianIndex(3, 3)
julia> B = copy(A'); # manifest the adjoint
julia> @btime findGreater0($A,$B)
95.364 ns (3 allocations: 240 bytes) # staying fast!
1-element Array{CartesianIndex{2},1}:
CartesianIndex(3, 3)
julia> A = zeros(Int,10000,10000);
julia> A[1,:] .= -1;
julia> A[end,:] .= 1;
julia> B = A';
julia> @btime greater0($A,$B)
1.230 s (5 allocations: 11.93 MiB)
1-element Array{CartesianIndex{2},1}:
CartesianIndex(10000, 10000)
julia> @btime findGreater0($A,$B)
116.800 ms (3 allocations: 240 bytes)
1-element Array{CartesianIndex{2},1}:
CartesianIndex(10000, 10000)
Julias Adjoint is lazy, so that’s why creating and iterating the bitmask becomes very sluggish - the memory is not accessed in an optimal manner. Iterating with CartesianIndices skips that problem entirely.
Manifesting the adjoint makes it almost as fast as the generator version - the difference being the extra iteration that has to be done as well as the big allocation (that might not be possible if we run out of memory):
julia> B = copy(A'); # manifest the adjoint (10000x10000 array)
julia> @btime greater0($A,$B)
128.834 ms (5 allocations: 11.93 MiB)
1-element Array{CartesianIndex{2},1}:
CartesianIndex(10000, 10000)
Note that there’s not really much more that could be done here - since we don’t know when the condition will be true, SIMD will be a little hard to do since it’s effectively a coin flip whether or not a given index will be included or not.