# Add varargs methods of `any`, `all`, and `count`

It would be helpful to have varargs methods of `any`, `all`, and `count` for comparing collections elementwise, so that e.g. you could write this:

``````julia
all(≡, [1,2,3], [1,2,3]) # true
count(≥, (3,2,1), 1:3)   # 2
``````

Proof of concept:

``````julia
Base.any(f::Function, c1, c2, cs...)   = any(splat(f), zip(c1, c2, cs...))
Base.all(f::Function, c1, c2, cs...)   = all(splat(f), zip(c1, c2, cs...))
Base.count(f::Function, c1, c2, cs...) = count(splat(f), zip(c1, c2, cs...))
``````

This method of `all` might be nicer without calling `length`, but I haven’t yet dug into the best way of doing that. I’m also not sure whether it’s better for `all` to return `false` when collection lengths mismatch, or to throw an error.
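For what it’s worth, here’s a sketch of the length-checking variant (hypothetical name `all_strict`, not part of the proposal), assuming every argument supports `length`, which is not true of all iterators:

``````julia
# Sketch only: a strict variant that errors when lengths differ.
# Assumes every collection supports `length` (not true of all iterators).
function all_strict(f::Function, c1, c2, cs...)
    allequal(map(length, (c1, c2, cs...))) ||
        throw(DimensionMismatch("collections must have equal length"))
    return all(Base.splat(f), zip(c1, c2, cs...))
end
``````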

Thoughts?

5 Likes

It does not feel very natural to me. I find the simple version easier to understand:

``````julia
count(t -> t[1] ≥ t[2], zip((3,2,1), 1:3))
``````

and if that’s too ugly, then be explicit about the splatting (which is the POC implementation in OP):

``````julia
f = (a, b) -> a ≥ b
count(Base.splat(f), zip((3,2,1), 1:3))
``````

I can understand this code immediately, whereas I have to think about what `count(≥, (3,2,1), 1:3)` means.

Overall, I don’t see much added value, given the added complexity. But that may just be me; I am generally conservative about new features.

2 Likes

Here’s a vote in the opposite direction. I find

``````julia
count(≥, (3,2,1), 1:3)
``````

significantly more clear and obvious than

``````julia
count(t -> t[1] ≥ t[2], zip((3,2,1), 1:3))
``````

and dramatically clearer than

``````julia
f = (a, b) -> a ≥ b
count(Base.splat(f), zip((3,2,1), 1:3))
``````

The latter can be simplified, though, to this:

``````julia
count(Base.splat(>=), zip((3,2,1), 1:3))
``````

Still, anything involving the `splat` function feels a bit mind-bending.

Having vararg methods for `any`, `all`, and `count` dovetails nicely with `map`, IMO. If you can parse

``````julia
map(>, [3,2,1], 1:3)
``````

then it’s not hard to see what

``````julia
any(>, [3,2,1], 1:3)
``````

does.
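Concretely, with the POC definition from the OP, the vararg form simply reduces the corresponding `map` result:

``````julia
# POC definition from the OP (a sketch; not in Base today):
Base.any(f::Function, c1, c2, cs...) = any(Base.splat(f), zip(c1, c2, cs...))

xs, ys = [3, 2, 1], 1:3
@assert map(>, xs, ys) == [true, false, false]  # elementwise comparisons
@assert any(>, xs, ys) == any(map(>, xs, ys))   # vararg any reduces the same results
``````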

6 Likes

Yes, I think it is to some extent a matter of personal preference. I was not trying to disprove other people’s opinions.

I guess this might be a better example:

``````julia
a = (2, 2)
b = (1, 2)
count(≥, a, b)
``````

It is not obvious to me that this should return `2` since `a[1] ≥ b[1]` and `a[2] ≥ b[2]`. I could imagine a different semantics where this returns `1` since `a[1] ≥ a[2]` and `b[1] < b[2]`.

I was not aware this existed. I have the same problem with it as I sketched above, though. But I guess I am late to the party and should have objected when this `map` method was discussed.

> imagine a different semantics where this returns `1` since `a[1] ≥ a[2]` and `b[1] < b[2]`

This is a bit like the difference between `max` and `maximum`: you can write `maximum(f, c)` as `mapreduce(f, max, c)`. While `max` acts on the objects in its arguments directly, and is thus a useful comparison operator, `maximum` reduces a collection of objects.
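To make the distinction concrete:

``````julia
@assert max(3, 1, 4) == 4        # max compares its arguments directly
@assert maximum([3, 1, 4]) == 4  # maximum reduces a collection
# maximum(f, c) is the mapreduce form:
@assert maximum(abs, [-3, 1]) == mapreduce(abs, max, [-3, 1])
``````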

It should be clear that the current behavior of `count` is closer to `maximum` than `max`, as `count` acts on a collection. We could also write:

``````julia
count(f, c) = mapreduce(x -> f(x)::Bool, +, c; init=0)
any(f, c)   = mapreduce(x -> f(x)::Bool, |, c; init=false)
all(f, c)   = mapreduce(x -> f(x)::Bool, &, c; init=true)
``````

which makes these relationships clear. Thus, (#1) the OP is more consistent with the current definition.

Additionally, (#2) if the varargs methods behaved as you imagine, it would create an ambiguity with the current definition. In the case of `count(≥, (2,2))`, should it throw an error as it does now, or should it return `1`? Should `count(≥(2), (2,2))` return `2` as it does now, or should it throw an error?
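The current behaviors referenced here are easy to check directly:

``````julia
# count(≥(2), (2, 2)) applies the partially-applied predicate elementwise:
@assert count(≥(2), (2, 2)) == 2
# count(≥, (2, 2)) throws today, since ≥(x) yields a function, not a Bool:
@assert try count(≥, (2, 2)); false catch; true end
``````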

There’s also (#3) an argument to be made for efficiency. For a large collection of objects `c`, it turns out that calling `maximum(c)` is more efficient than calling `max(c...)`, as the latter will require compiling a large number of methods. A similar argument applies here: if the size of the collection will be greater than about 31, splatting it across the function’s arguments becomes expensive.
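The two forms agree in result, but the splatted call pays a compilation cost that grows with the collection size (a small sketch, without timings):

``````julia
c = rand(100)
# Same answer either way, but max(c...) compiles a 100-argument call,
# while maximum(c) is a single generic O(n) reduction:
@assert maximum(c) == max(c...)
``````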

And finally, (#4) the semantics proposed in the OP are consistent with those of varargs `map` and `mapreduce`.

So for these four reasons, I think the semantics of the OP make more sense, even though it does raise the question of what `all` should do for collections of mismatched lengths.

Sidenote…

I don’t know what I expected, but it wasn’t this

``````julia
julia> all(==(0), Iterators.repeated(0))
false
``````

It seems disjoint from this behavior:

``````julia
all(==(()), zip())
``````

That’s UB.

1 Like

On further thought, it seems the best thing for `all` to do when the iterators have different lengths is to terminate at the first exhausted iterator (i.e., the same behavior as `zip` and `map`). I’ve updated the OP accordingly.

Why? For the same reason as `zip`: some iterators go on forever:

``````julia
julia> Base.all(f::Function, c1, c2, cs...) = all(splat(f), zip(c1, c2, cs...));

julia> mod3 = Iterators.cycle(0:2);

julia> all(==, mod3, [0,1,2,0,1,2])
true
``````

If the length of all the collections must be equal, then it makes sense for the user to specify that separately.
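A caller who does need equal lengths can say so explicitly, e.g. with a hypothetical helper like this:

``````julia
# Hypothetical helper: check lengths first, then compare elementwise.
same_all(f, a, b) = length(a) == length(b) && all(Base.splat(f), zip(a, b))

@assert same_all(==, [1, 2, 3], 1:3)
@assert !same_all(==, [1, 2, 3], 1:4)  # lengths differ, so false
``````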

1 Like

But you have

``````julia
julia> all(==(0), Iterators.repeated(0,0))
true
``````

I must confess I don’t understand the motivation for the OP. Julia already supports this:

``````julia
julia> all([1,2,3] .== 1:3)
true

julia> count((3,2,1) .>= 1:3)
2
``````

which is clear and convenient.

1 Like
``````julia
julia> using BenchmarkTools

julia> Base.all(f::Function, c1, c2, cs...) = all(splat(f), zip(c1, c2, cs...));

julia> @btime all(a .== b)  setup=(a=[1,2,3]; b=1:3)
  42.553 ns (2 allocations: 96 bytes)
true

julia> @btime all(==, a, b) setup=(a=[1,2,3]; b=1:3)
  3.700 ns (0 allocations: 0 bytes)
true
``````

If a language is going to call itself performant, then its easiest and most natural idioms should be those that avoid performance traps.

6 Likes

I would never write something like this.

2 Likes

One could argue that `all(a .== b)` is the easiest and most natural. We can also hope that escape analysis can avoid the intermediate array when the code is inside a function. At global scope it is not avoidable, since the argument is evaluated before `all` is called.

But anyway I like your proposal since it is the way this problem is handled everywhere in Julia.

3 Likes

> its easiest and most natural idioms should be those that avoid performance traps

Sure. The OP gave the impression the problem was lack of convenient syntax, not performance. As shown in your post above, `mapreduce` already does the job, but your proposed syntax is admittedly prettier. I wonder how many other collection-reducing functions there are that would benefit from such syntax? There can be such a thing as too many convenience methods, but `any` and `all` are two of the more important functions.
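For reference, vararg `mapreduce` over multiple collections already expresses these reductions today, just less readably:

``````julia
# Vararg mapreduce pairs the collections elementwise before reducing:
@assert mapreduce(≥, +, [3, 2, 1], 1:3) == 2  # a "count"
@assert mapreduce(==, &, [1, 2, 3], 1:3)      # an "all"
``````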

How much better would the world be if broadcasting returned a lazy collection?

``````julia
julia> [1, 2, 3] .== 1:3
3-element BitVector:
 1
 1
 1
``````

“All things are poison and nothing is without poison. Solely the dose determines that a thing is not a poison” - Paracelsus

I don’t think `any` and `all` are so special. If they’re gonna have variadic methods then so should many other reductions. I’m not convinced that’s worth the trouble.

What do you have in mind?

minimum, maximum (which really make any/all unnecessary), extrema, findmin, findmax, argmin, …

I can see how vararg `minimum` and `maximum`, if they existed, could be trickily punned into serving the role of vararg `any` and `all`, but I struggle to see how any of the items on the list {minimum, maximum, extrema, findmin, findmax, argmin, argmax} would actually “make sense,” conceptually/semantically speaking, with vararg methods. Unlike any of those, `count`, `any`, and `all` take logical predicates, which can be meaningfully applied to a collection of collections; I’m not sure I can find an agreeable way for the “maximum” of a collection of collections to be meaningful.

That said, we do find {findfirst, findlast, findnext} which take logical predicates—and there we find conflict because of `findnext(pred::Function, A, i)`. I would point out, though, that {findfirst, findlast, findnext} are expected to return the index of the collection, which isn’t generally a meaningful concept for a vararg method that will accept multiple collections.

So, I don’t think that the slippery slope argument here poses a real challenge; there doesn’t seem to me any reason to extend vararg methods to these other reducing functions.

``````julia
julia> using LazyArrays: @~

julia> using BenchmarkTools

julia> Base.all(f, c1, c2, cs...) = all(splat(f), zip(c1, c2, cs...));

julia> @btime all(a .== b)    setup=(a=[1,2,3]; b=1:3)
  28.533 ns (2 allocations: 96 bytes)
true

julia> @btime all(==, a, b)   setup=(a=[1,2,3]; b=1:3)
  2.809 ns (0 allocations: 0 bytes)
true

julia> @btime all(@~ a .== b) setup=(a=[1,2,3]; b=1:3)
  5.169 ns (0 allocations: 0 bytes)
true
``````
4 Likes