Add varargs methods of `any`, `all`, and `count`

It would be helpful to have varargs methods of any, all, and count for comparing collections elementwise, so that, e.g., you could write this:

all(≡, [1,2,3], [1,2,3]) # true
count(≥, (3,2,1), 1:3) # 2

Proof of concept:

# splat(f) turns f into a function of one tuple argument: splat(f)((a, b)) == f(a, b)
# (splat is exported as of Julia 1.9; on older versions, write Base.splat)
Base.any(f::Function, c1, c2, cs...) = any(splat(f), zip(c1, c2, cs...))
Base.all(f::Function, c1, c2, cs...) = all(splat(f), zip(c1, c2, cs...))
Base.count(f::Function, c1, c2, cs...) = count(splat(f), zip(c1, c2, cs...))

This method of all might be nicer without calling length, but I haven’t yet dug into the best way of doing that. I’m also not sure whether it’s better for all to return false when the collection lengths mismatch, or to throw an error.
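For concreteness, here is a minimal sketch of the two options as naive length-checking variants (the names all_lenient and all_strict are hypothetical, allequal requires Julia 1.8, and an implementation avoiding length would iterate instead):

# Option 1: mismatched lengths simply make the result false.
all_lenient(f, cs...) = allequal(length.(cs)) && all(splat(f), zip(cs...))

# Option 2: mismatched lengths are an error.
function all_strict(f, cs...)
    allequal(length.(cs)) || throw(DimensionMismatch("collections must have equal lengths"))
    all(splat(f), zip(cs...))
end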

Thoughts?

5 Likes

It does not feel very natural to me. I would find it easier to understand the simple

count(t -> t[1] ≥ t[2], zip((3,2,1), 1:3))

and if that’s too ugly, then be explicit about the splatting (which is what the proof-of-concept implementation in the OP does):

f = (a, b) -> a ≥ b
count(Base.splat(f), zip((3,2,1), 1:3))

I can understand this code immediately, whereas I have to think about what count(≥, (3,2,1), 1:3) means.

Overall, I don’t see much added value given the added complexity. But that may just be me; I am generally conservative about new features.

2 Likes

Here’s a vote in the opposite direction. I find

count(≥, (3,2,1), 1:3)

significantly more clear and obvious than

count(t -> t[1] ≥ t[2], zip((3,2,1), 1:3))

and dramatically clearer than

f = (a, b) -> a ≥ b
count(Base.splat(f), zip((3,2,1), 1:3))

The latter can be simplified, though, to this:

count(Base.splat(>=), zip((3,2,1), 1:3))

Still, anything involving the splat function feels a bit mind-bending.

Having vararg methods for any, all, and count dovetails nicely with map, IMO. If you can parse

map(>, [3,2,1], 1:3)

then it’s not hard to see what

any(>, [3,2,1], 1:3)

does.

6 Likes

Yes, I think it is to some extent a matter of personal preference. I was not trying to disprove other people’s opinions.

I guess this might be a better example:

a = (2, 2)
b = (1, 2)
count(≥, a, b)

It is not obvious to me that this should return 2 since a[1] ≥ b[1] and a[2] ≥ b[2]. I could imagine a different semantics where this returns 1 since a[1] ≥ a[2] and b[1] < b[2].
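To make the two readings concrete in today’s Julia (spelling out the splat):

a = (2, 2); b = (1, 2)
count(splat(≥), zip(a, b))  # 2: compare across the collections (the OP's semantics)
count(splat(≥), (a, b))     # 1: apply ≥ within each collection (the alternative reading)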

I was not aware this existed. I have the same problem with it as I sketched above, though. But I guess I am late to the party and should have objected when this map method was discussed 🙂

imagine a different semantics where this returns 1 since a[1] ≥ a[2] and b[1] < b[2]

This is a bit like the difference between max and maximum: you can write maximum(f, c) as mapreduce(f, max, c). While max acts on the objects in its arguments directly, and is thus a useful comparison operator, maximum reduces a collection of objects.
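A quick illustration of that distinction:

max(3, 1, 2)              # 3: max compares its arguments directly
maximum([3, 1, 2])        # 3: maximum reduces a collection
maximum(abs, [-3, 1, 2])  # 3: equivalent to mapreduce(abs, max, [-3, 1, 2])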

It should be clear that the current behavior of count is closer to maximum than max, as count acts on a collection. We could also write:

count(f, c) = mapreduce(x->f(x)::Bool, +, c; init=0)
any(f, c)   = mapreduce(x->f(x)::Bool, |, c; init=false)
all(f, c)   = mapreduce(x->f(x)::Bool, &, c; init=true)

which makes these relationships clear. Thus, (#1) the OP is more consistent with the current definition.

Additionally, (#2) if the varargs methods behaved as you imagine, it would create an ambiguity with the current definition. In the case of count(≥, (2,2)), should it throw an error as it does now, or should it return 1? Should count(≥(2), (2,2)) return 2 as it does now, or should it throw an error?
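Concretely (a sketch of current behavior; the exact error may vary by version):

count(≥(2), (2, 2))  # 2 today: ≥(2) is the unary predicate x -> x ≥ 2
count(≥, (2, 2))     # errors today: ≥ applied to a lone element does not yield a Bool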

There’s also (#3) an argument to be made for efficiency. For a large collection of objects c, it turns out that calling maximum(c) is more efficient than calling max(c...), as the latter will require compiling a large number of methods. A similar argument applies here: if the size of the collection will be greater than about 31, splatting it across the function’s arguments becomes expensive.
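For instance, the difference can be checked with BenchmarkTools (a sketch; timings will vary by machine, so none are shown):

using BenchmarkTools
c = rand(100);
@btime maximum($c)  # a simple reduction loop over the array
@btime max($c...)   # splats 100 arguments into one call; compiles a huge method and scales badly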

And finally, (#4) the semantics proposed in the OP are consistent with those of varargs map and mapreduce.

So for these four reasons, I think the semantics of the OP make more sense, even though it does raise the question of what all should do for collections of mismatched lengths.

Sidenote…

I don’t know what I expected, but it wasn’t this 😅

julia> all(==(0), Iterators.repeated(0))
false

It seems disjoint from this behavior:

all(==(()), zip())

That’s UB, one of these:

1 Like

On further thought, it seems the best thing for all to do when the iterators have different lengths is to stop iterating when the first iterator terminates (i.e., the same behavior as zip and map). I’ve updated the OP accordingly.

Why? For the same reason as zip: some iterators go on forever:

julia> Base.all(f::Function, c1, c2, cs...) = all(splat(f), zip(c1, c2, cs...))
       mod3 = Iterators.cycle(0:2);
       all(==, mod3, [0,1,2,0,1,2])
true

If the length of all the collections must be equal, then it makes sense for the user to specify that separately.
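For example, a caller who needs equal lengths can say so explicitly (a sketch, assuming length is defined for the inputs and the varargs all from above):

a, b = [1, 2, 3], 1:3
length(a) == length(b) && all(==, a, b)  # true: lengths match and elements are equal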

1 Like

But you have

julia> all(==(0), Iterators.repeated(0,0))
true

I must confess I don’t understand the motivation for the OP. Julia already supports this:

julia> all([1,2,3] .== 1:3)
true

julia> count((3,2,1) .>= 1:3)
2  

which is clear and convenient.

1 Like
julia> using BenchmarkTools
       Base.all(f::Function, c1, c2, cs...) = all(splat(f), zip(c1, c2, cs...))
       @btime all(a .== b)  setup=(a=[1,2,3]; b=1:3)
       @btime all(==, a, b) setup=(a=[1,2,3]; b=1:3)
  42.553 ns (2 allocations: 96 bytes)
  3.700 ns (0 allocations: 0 bytes)
true

If a language is going to call itself performant, then its easiest and most natural idioms should be those that avoid performance traps.

6 Likes

I would never write something like this.

2 Likes

One could argue that all(a .== b) is the easiest and most natural. We can also hope that escape analysis can avoid the intermediate array here if the code is inside a function. At global scope it is not avoidable, since the broadcast argument is evaluated before all is called.

But anyway, I like your proposal, since it matches the way this problem is handled elsewhere in Julia.

3 Likes

its easiest and most natural idioms should be those that avoid performance traps

Sure. The OP gave the impression the problem was lack of convenient syntax, not performance. As shown in your post above, mapreduce already does the job, but your proposed syntax is admittedly prettier. I wonder how many other collection-reducing functions there are that would benefit from such syntax? There can be such a thing as too many convenience methods, but any and all are two of the more important functions.
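Indeed, since mapreduce already accepts multiple iterators, the proposed semantics can be spelled today, just without short-circuiting:

mapreduce(≥, &, (3, 2, 1), 1:3)  # false: (3 ≥ 1) & (2 ≥ 2) & (1 ≥ 3)
mapreduce(≥, +, (3, 2, 1), 1:3)  # 2: a non-short-circuiting count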

How much better would the world be if broadcasting returned a lazy collection?

julia> [1, 2, 3] .== 1:3
3-element BroadcastingVector{Bool, Tuple{typeof(==), Vector{Int64}, UnitRange{Int64}}}:
 1
 1
 1

“All things are poison and nothing is without poison. Solely the dose determines that a thing is not a poison” - Paracelsus

I don’t think any and all are so special. If they’re gonna have variadic methods then so should many other reductions. I’m not convinced that’s worth the trouble.

What do you have in mind?

minimum, maximum (which really make any/all unnecessary), extrema, findmin, findmax, argmin, …

I can see how vararg minimum and maximum, if they existed, could be trickily punned into serving the role of vararg any and all, but I struggle to see how any of the items on the list {minimum, maximum, extrema, findmin, findmax, argmin, argmax} would actually “make sense,” conceptually/semantically speaking, with vararg methods. Unlike any of those, count, any, and all take logical predicates, which can be meaningfully applied to a collection of collections; I’m not sure I can find an agreeable way for the “maximum” of a collection of collections to be meaningful.
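For what it’s worth, the pun works because min and max on Bool act as & and |:

a, b = [3, 2, 1], 1:3
minimum(map(≥, a, b)) == all(map(≥, a, b))  # true: Bool minimum behaves like all
maximum(map(≥, a, b)) == any(map(≥, a, b))  # true: Bool maximum behaves like any

(They disagree on empty collections, though, where minimum throws while all returns true.)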

That said, we do find {findfirst, findlast, findnext} which take logical predicates—and there we find conflict because of findnext(pred::Function, A, i). I would point out, though, that {findfirst, findlast, findnext} are expected to return the index of the collection, which isn’t generally a meaningful concept for a vararg method that will accept multiple collections.

So, I don’t think that the slippery slope argument here poses a real challenge; there doesn’t seem to me any reason to extend vararg methods to these other reducing functions.

julia> using LazyArrays: @~

julia> using BenchmarkTools
       Base.all(f, c1, c2, cs...) = all(splat(f), zip(c1, c2, cs...))
       @btime all(a .== b)  setup=(a=[1,2,3]; b=1:3)
       @btime all(==, a, b) setup=(a=[1,2,3]; b=1:3)
       @btime all(@~ a .== b) setup=(a=[1,2,3]; b=1:3)
  28.533 ns (2 allocations: 96 bytes)
  2.809 ns (0 allocations: 0 bytes)
  5.169 ns (0 allocations: 0 bytes)
true
4 Likes