Anonymous function applied to multiple arguments

What is the correct syntax for the following:

julia> xV = 1:5; yV = 2:6;

julia> findall((x,y) -> ((x > 2) & (y > 4)), (xV, yV))

The docs only have examples with one argument.

findall(x -> x[1] > 2 && x[2] > 4, collect(zip(xV, yV)))

will do what you want, though this doesn’t answer your question.

You can also write this as:

((x, y),) -> x > 2 && y > 4

i.e. a function which takes a single argument which is unpacked into the tuple (x, y). For example:

julia> f = ((x, y),) -> x > 2 && y > 4
#11 (generic function with 1 method)

julia> args = (1, 2)
(1, 2)

julia> f(args)
false

Thank you for that. Is this any better than

findall((xV .> 2) .& (yV .> 4))

I was trying to avoid this because it would allocate two vectors.

Sorry if I am dense here, but how do I apply this to findall?

Well, you’ll want to use the logical && instead of the bitwise & operator, for one. Other than that, it’s probably not much different in terms of allocations. The collect is necessary because you can’t index a zip (unless there’s some really cool thing I don’t know about), so you’re going to allocate for that.
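To illustrate the difference, a minimal sketch: `&&` short-circuits (the right-hand side only runs if the left is true), while `&` always evaluates both operands:

```julia
x = 1

# && short-circuits: error("never reached") is not evaluated here
(x > 2) && error("never reached")   # returns false

# & evaluates both sides (bitwise/elementwise and)
(x > 2) & (x < 5)                   # false
```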

You’d use that as a replacement for my single-argument function:

findall(f, collect(zip(xV, yV)))

Thanks again.

This is something very common in my line of work. It’s unfortunate that the allocations are not easily avoided. I guess I need to think a bit more.

You can do it with a list comprehension

[i for (i,(x,y)) in enumerate(zip(xV,yV)) if x > 2 && y > 4]

but that gets a bit unwieldy. Maybe just a loop?
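A loop version might look like this (a sketch; the name `findgt` is made up for illustration, and the thresholds 2 and 4 are hard-coded from the example above):

```julia
# Collect indices i where xV[i] > 2 && yV[i] > 4, without zip/collect.
function findgt(xV, yV)
    idx = Int[]
    for i in eachindex(xV, yV)   # errors if the vectors have mismatched axes
        if xV[i] > 2 && yV[i] > 4
            push!(idx, i)
        end
    end
    return idx
end

xV = 1:5; yV = 2:6;
findgt(xV, yV)   # == [4, 5]
```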

You should benchmark to see what’s fastest (BenchmarkTools.jl).

julia> @btime findall(($xV .> 2) .& ($yV .> 4))
  101.396 ns (3 allocations: 224 bytes)
2-element Array{Int64,1}:
 4
 5

julia> @btime findall(x -> x[1] > 2 && x[2] > 4, collect(zip($xV, $yV)))
  184.162 ns (7 allocations: 432 bytes)
2-element Array{Int64,1}:
 4
 5

The first one allocates a bitvector, which is very memory efficient.
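For a sense of scale (a rough sketch; exact byte counts depend on 64-bit chunking): a BitVector packs one bit per element, while a Vector{Bool} uses a full byte per element:

```julia
b = falses(10_000)          # BitVector, as produced by broadcast comparisons
sizeof(b)                   # ~1.25 KiB: bits packed into 64-bit chunks
sizeof(Vector{Bool}(b))     # 10_000 bytes, one byte per element
```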


This is a great point and the difference between the two approaches will only get more pronounced as the vectors get bigger.

I really like this solution. It reads as a bit unwieldy, but it can be packaged into a function that makes the intention clear.

But before I do this, I need to benchmark the naive solution

findall((xV .> 2) .& (yV .> 4))

against it. The point made by @DNF that this allocates a bitvector perhaps means that I should not worry too much about allocations. Though I am running this on vectors of length around 10,000 in a loop that gets called about 100 times for each model solution.
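Since the same call happens roughly 100 times per model solution, one option is to reuse a preallocated index buffer across calls. This is a hypothetical sketch (the name `findall_into!` is made up), not something from the thread:

```julia
# Refill a reusable index buffer instead of allocating a new vector each call.
function findall_into!(idx::Vector{Int}, xV, yV, x0, y0)
    empty!(idx)               # keeps the buffer's existing capacity
    @inbounds for i in eachindex(xV, yV)
        if xV[i] == x0 && yV[i] == y0
            push!(idx, i)
        end
    end
    return idx
end

buf = Int[]
sizehint!(buf, 10_000)        # grow the buffer once, up front
findall_into!(buf, [1, 2, 3, 2], [4, 4, 4, 4], 2, 4)   # == [2, 4]
```

After the first few calls the buffer has reached its working size, so subsequent calls allocate nothing.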

Again, thanks for all the suggestions.

By my benchmark, the simple findall wins hands down:

using BenchmarkTools, Random

rng = MersenneTwister(123);
n = 10_000;
xV = rand(rng, UInt8.(1:4), n);
yV = rand(rng, UInt8.(1:5), n);

x0 = UInt8(2);
y0 = UInt8(4);

idxV = findall((xV .== x0) .& (yV .== y0));
idx2V = [i for (i,(x,y)) in enumerate(zip(xV,yV)) if x == x0 && y == y0];
@assert isequal(idxV, idx2V)

println("findall")
@benchmark idxV = findall(($xV .== $x0) .& ($yV .== $y0))

println("comprehension")
@benchmark idx2V = [i for (i,(x,y)) in enumerate(zip($xV,$yV)) if x == $x0 && y == $y0]

with output

findall
BenchmarkTools.Trial:
  memory estimate:  9.55 KiB
  allocs estimate:  4
  --------------
  minimum time:     1.892 μs (0.00% GC)
  median time:      2.402 μs (0.00% GC)
  mean time:        3.665 μs (29.99% GC)
  maximum time:     877.968 μs (99.38% GC)
  --------------
  samples:          10000
  evals/sample:     9

comprehension
BenchmarkTools.Trial:
  memory estimate:  8.52 KiB
  allocs estimate:  15
  --------------
  minimum time:     31.903 μs (0.00% GC)
  median time:      32.940 μs (0.00% GC)
  mean time:        34.938 μs (0.00% GC)
  maximum time:     135.270 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

I used UInt8 because that’s what I am using in my actual application.

Thanks again to everyone for helping out.


I’ve benchmarked that kind of thing about half a dozen times when trying to improve efficiency. The allocating version was as fast or faster. This may change in the future due to internal changes. And of course, which one is faster may depend on the details of your problem, e.g. the size of the vectors.