What is the correct syntax for the following:
julia> xV = 1:5; yV = 2:6;
julia> findall((x,y) -> ((x > 2) & (y > 4)), (xV, yV))
The docs only have examples with one argument.
findall(x -> x[1] > 2 && x[2] > 4, collect(zip(xV, yV)))
will do what you want, though this doesn’t answer your question.
You can also write this as:
((x, y),) -> x > 2 && y > 4
i.e. a function which takes a single argument that is unpacked into the tuple (x, y). For example:
julia> f = ((x, y),) -> x > 2 && y > 4
#11 (generic function with 1 method)
julia> args = (1, 2)
(1, 2)
julia> f(args)
false
Thank you for that. Is this any better than
findall((xV .> 2) .& (yV .> 4))
I was trying to avoid this because it would allocate two vectors.
Sorry if I am dense here, but how would I apply this to findall?
Well, you’ll want to use the logical && instead of the bitwise & operator, for one. Other than that, it’s probably not much different in terms of allocations. The collect is necessary because you can’t index a zip (unless there’s some really cool thing I don’t know about), so you’re going to allocate for that.
You’d use that as a replacement for my single-argument function:
findall(f, collect(zip(xV, yV)))
Thanks again.
This is something very common in my line of work. It’s unfortunate that the allocations are not easily avoided. I guess I need to think a bit more.
You can do it with a list comprehension
[i for (i,(x,y)) in enumerate(zip(xV,yV)) if x > 2 && y > 4]
but that gets a bit unwieldy. Maybe just a loop?
You should benchmark to see what’s fastest (BenchmarkTools.jl).
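For completeness, such an explicit loop might look like the following sketch. It collects matching indices without allocating any intermediate boolean vectors; findboth is just an illustrative name, not something from the thread:

```julia
# Sketch: collect indices where both conditions hold, with no intermediate
# boolean arrays. `findboth` is an illustrative name.
function findboth(xV, yV)
    idx = Int[]
    for i in eachindex(xV, yV)   # errors if the two arrays differ in shape
        if xV[i] > 2 && yV[i] > 4
            push!(idx, i)
        end
    end
    return idx
end

findboth(1:5, 2:6)  # -> [4, 5]
```

The push!-based vector grows geometrically, so the loop still allocates the result vector, but nothing else.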
julia> @btime findall(($xV .> 2) .& ($yV .> 4))
101.396 ns (3 allocations: 224 bytes)
2-element Array{Int64,1}:
4
5
julia> @btime findall(x -> x[1] > 2 && x[2] > 4, collect(zip($xV, $yV)))
184.162 ns (7 allocations: 432 bytes)
2-element Array{Int64,1}:
4
5
The first one allocates a bitvector, which is very memory efficient.
This is a great point and the difference between the two approaches will only get more pronounced as the vectors get bigger.
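The memory efficiency of the bitvector can be seen directly. This comparison is illustrative, not from the thread: broadcasting a comparison yields a BitVector, which packs one bit per element, while a Vector{Bool} stores one byte per element:

```julia
# Illustrative: BitVector packs 1 bit/element; Vector{Bool} uses 1 byte/element.
using Random
v = rand(MersenneTwister(1), 10_000) .> 0.5  # broadcast comparison -> BitVector
b = collect(Bool, v)                          # same values as a Vector{Bool}
Base.summarysize(v)  # ≈ 10_000 / 8 bytes of data, plus a small header
Base.summarysize(b)  # ≈ 10_000 bytes of data, plus a small header
```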
I really like this solution. It is a bit unwieldy, but it can be packaged into a function that makes the intention clear.
But before I do this, I need to benchmark the naive solution
findall((xV .> 2) .& (yV .> 4))
against it. The point made by @DNF that this allocates a bitvector perhaps means that I should not worry too much about allocations. Though I am running this on vectors of length around 10,000 in a loop that gets called about 100 times for each model solution.
Again, thanks for all the suggestions.
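Packaging the broadcasted version into a named helper, as suggested above, could look like this sketch (find_matches is an illustrative name, not from the thread):

```julia
# Sketch: wrap the broadcasted findall so the call site states the intent.
# `find_matches` is an illustrative name.
find_matches(xV, yV, x0, y0) = findall((xV .== x0) .& (yV .== y0))

find_matches(UInt8[1, 2, 2, 3], UInt8[4, 4, 5, 4], UInt8(2), UInt8(4))  # -> [2]
```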
By my benchmark, the simple findall wins hands down:
using BenchmarkTools, Random
rng = MersenneTwister(123);
n = 10_000;
xV = rand(rng, UInt8.(1:4), n);
yV = rand(rng, UInt8.(1:5), n);
x0 = UInt8(2);
y0 = UInt8(4);
idxV = findall((xV .== x0) .& (yV .== y0));
idx2V = [i for (i,(x,y)) in enumerate(zip(xV,yV)) if x == x0 && y == y0];
@assert isequal(idxV, idx2V)
println("findall")
@benchmark idxV = findall(($xV .== $x0) .& ($yV .== $y0))
println("comprehension")
@benchmark idx2V = [i for (i,(x,y)) in enumerate(zip($xV,$yV)) if x == $x0 && y == $y0]
with output
findall
BenchmarkTools.Trial:
memory estimate: 9.55 KiB
allocs estimate: 4
--------------
minimum time: 1.892 μs (0.00% GC)
median time: 2.402 μs (0.00% GC)
mean time: 3.665 μs (29.99% GC)
maximum time: 877.968 μs (99.38% GC)
--------------
samples: 10000
evals/sample: 9
comprehension
BenchmarkTools.Trial:
memory estimate: 8.52 KiB
allocs estimate: 15
--------------
minimum time: 31.903 μs (0.00% GC)
median time: 32.940 μs (0.00% GC)
mean time: 34.938 μs (0.00% GC)
maximum time: 135.270 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
I used UInt8 because that’s what I am using in my actual application.
Thanks again to everyone for helping out.
I’ve benchmarked that kind of thing about half a dozen times when trying to improve efficiency. The allocating version was as fast or faster. This may change in the future due to internal changes. And of course, which is faster may depend on the details of your problem, eg size of the vectors.