Add varargs methods of `any`, `all`, and `count`

It would be helpful to have varargs methods of any, all, and count for comparing collections elementwise, so that, e.g., you could write this:

all(≡, [1,2,3], [1,2,3]) # true
count(≥, (3,2,1), 1:3) # 2

Proof of concept:

# splat(f) turns f into a function of one tuple argument: splat(f)((a, b)) == f(a, b)
# (splat is exported as of Julia 1.9; on older versions, write Base.splat)
Base.any(f::Function, c1, c2, cs...) = any(splat(f), zip(c1, c2, cs...))
Base.all(f::Function, c1, c2, cs...) = all(splat(f), zip(c1, c2, cs...))
Base.count(f::Function, c1, c2, cs...) = count(splat(f), zip(c1, c2, cs...))

This method of all might be nicer without calling length, but I haven’t yet dug into the best way of doing that. I’m also not sure whether it’s better for all to return false when the collection lengths mismatch, or to throw an error.
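For concreteness, here is a minimal sketch of the two options as naive length-checking variants (the names all_lenient and all_strict are hypothetical, allequal requires Julia 1.8, and an implementation avoiding length would iterate instead):

# Option 1: mismatched lengths simply make the result false.
all_lenient(f, cs...) = allequal(length.(cs)) && all(splat(f), zip(cs...))

# Option 2: mismatched lengths are an error.
function all_strict(f, cs...)
    allequal(length.(cs)) || throw(DimensionMismatch("collections must have equal lengths"))
    all(splat(f), zip(cs...))
end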

Thoughts?

5 Likes

It does not feel very natural to me. I would find it easier to understand the simple

count(t -> t[1] ≥ t[2], zip((3,2,1), 1:3))

and if that’s too ugly, then be explicit about the splatting (which is what the proof-of-concept implementation in the OP does):

f = (a, b) -> a ≥ b
count(Base.splat(f), zip((3,2,1), 1:3))

I can understand this code immediately, whereas I have to think about what count(≥, (3,2,1), 1:3) means.

Overall, I don’t see much added value given the added complexity. But that may just be me; I am generally conservative about new features.

2 Likes

Here’s a vote in the opposite direction. I find

count(≥, (3,2,1), 1:3)

significantly more clear and obvious than

count(t -> t[1] ≥ t[2], zip((3,2,1), 1:3))

and dramatically clearer than

f = (a, b) -> a ≥ b
count(Base.splat(f), zip((3,2,1), 1:3))

The latter can be simplified, though, to this:

count(Base.splat(>=), zip((3,2,1), 1:3))

Still, anything involving the splat function feels a bit mind-bending.

Having vararg methods for any, all, and count dovetails nicely with map, IMO. If you can parse

map(>, [3,2,1], 1:3)

then it’s not hard to see what

any(>, [3,2,1], 1:3)

does.

6 Likes

Yes, I think it is to some extent a matter of personal preference. I was not trying to disprove other people’s opinions.

I guess this might be a better example:

a = (2, 2)
b = (1, 2)
count(≥, a, b)

It is not obvious to me that this should return 2 since a[1] ≥ b[1] and a[2] ≥ b[2]. I could imagine a different semantics where this returns 1 since a[1] ≥ a[2] and b[1] < b[2].
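To make the two readings concrete in today’s Julia (spelling out the splat):

a = (2, 2); b = (1, 2)
count(splat(≥), zip(a, b))  # 2: compare across the collections (the OP's semantics)
count(splat(≥), (a, b))     # 1: apply ≥ within each collection (the alternative reading)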

I was not aware this existed. I have the same problem with it as I sketched above, though. But I guess I am late to the party and should have objected when this map method was discussed 🙂

imagine a different semantics where this returns 1 since a[1] ≥ a[2] and b[1] < b[2]

This is a bit like the difference between max and maximum: you can write maximum(f, c) as mapreduce(f, max, c). While max acts on the objects in its arguments directly, and is thus a useful comparison operator, maximum reduces a collection of objects.
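A quick illustration of that distinction:

max(3, 1, 2)              # 3: max compares its arguments directly
maximum([3, 1, 2])        # 3: maximum reduces a collection
maximum(abs, [-3, 1, 2])  # 3: equivalent to mapreduce(abs, max, [-3, 1, 2])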

It should be clear that the current behavior of count is closer to maximum than max, as count acts on a collection. We could also write:

count(f, c) = mapreduce(x->f(x)::Bool, +, c; init=0)
any(f, c)   = mapreduce(x->f(x)::Bool, |, c; init=false)
all(f, c)   = mapreduce(x->f(x)::Bool, &, c; init=true)

which makes these relationships clear. Thus, (#1) the OP is more consistent with the current definition.

Additionally, (#2) if the varargs methods behaved as you imagine, it would create an ambiguity with the current definition. In the case of count(≥, (2,2)), should it throw an error as it does now, or should it return 1? Should count(≥(2), (2,2)) return 2 as it does now, or should it throw an error?
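Concretely (a sketch of current behavior; the exact error may vary by version):

count(≥(2), (2, 2))  # 2 today: ≥(2) is the unary predicate x -> x ≥ 2
count(≥, (2, 2))     # errors today: ≥ applied to a lone element does not yield a Bool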

There’s also (#3) an argument to be made for efficiency. For a large collection of objects c, it turns out that calling maximum(c) is more efficient than calling max(c...), as the latter will require compiling a large number of methods. A similar argument applies here: if the size of the collection will be greater than about 31, splatting it across the function’s arguments becomes expensive.
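For instance, the difference can be checked with BenchmarkTools (a sketch; timings will vary by machine, so none are shown):

using BenchmarkTools
c = rand(100);
@btime maximum($c)  # a simple reduction loop over the array
@btime max($c...)   # splats 100 arguments into one call; compiles a huge method and scales badly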

And finally, (#4) the semantics proposed in the OP are consistent with those of varargs map and mapreduce.

So for these four reasons, I think the semantics of the OP make more sense, even though it does raise the question of what all should do for collections of mismatched lengths.

Sidenote…

I don’t know what I expected, but it wasn’t this 😅

julia> all(==(0), Iterators.repeated(0))
false

It seems disjoint from this behavior:

all(==(()), zip())

That’s UB, one of these:

1 Like

On further thought, it seems the best thing for all to do when the iterators have different lengths is to stop iterating when the first iterator terminates (i.e., the same behavior as zip and map). I’ve updated the OP accordingly.

Why? For the same reason as zip: some iterators go on forever:

julia> Base.all(f::Function, c1, c2, cs...) = all(splat(f), zip(c1, c2, cs...))
       mod3 = Iterators.cycle(0:2);
       all(==, mod3, [0,1,2,0,1,2])
true

If the length of all the collections must be equal, then it makes sense for the user to specify that separately.
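For example, a caller who needs equal lengths can say so explicitly (a sketch, assuming length is defined for the inputs and the varargs all from above):

a, b = [1, 2, 3], 1:3
length(a) == length(b) && all(==, a, b)  # true: lengths match and elements are equal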

1 Like

But you have

julia> all(==(0), Iterators.repeated(0,0))
true

I must confess I don’t understand the motivation for the OP. Julia already supports this:

julia> all([1,2,3] .== 1:3)
true

julia> count((3,2,1) .>= 1:3)
2  

which is clear and convenient.

1 Like
julia> using BenchmarkTools
       Base.all(f::Function, c1, c2, cs...) = all(splat(f), zip(c1, c2, cs...))
       @btime all(a .== b)  setup=(a=[1,2,3]; b=1:3)
       @btime all(==, a, b) setup=(a=[1,2,3]; b=1:3)
  42.553 ns (2 allocations: 96 bytes)
  3.700 ns (0 allocations: 0 bytes)
true

If a language is going to call itself performant, then its easiest and most natural idioms should be those that avoid performance traps.

6 Likes

I would never write something like this.

2 Likes

One could argue that all(a .== b) is the easiest and most natural. We can also hope that escape analysis can avoid the intermediate array here if the code is inside a function. At global scope it is not avoidable, since the broadcast argument is evaluated before all is called.

But anyway, I like your proposal, since it matches the way this problem is handled elsewhere in Julia.

3 Likes

its easiest and most natural idioms should be those that avoid performance traps

Sure. The OP gave the impression the problem was lack of convenient syntax, not performance. As shown in your post above, mapreduce already does the job, but your proposed syntax is admittedly prettier. I wonder how many other collection-reducing functions there are that would benefit from such syntax? There can be such a thing as too many convenience methods, but any and all are two of the more important functions.
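Indeed, since mapreduce already accepts multiple iterators, the proposed semantics can be spelled today, just without short-circuiting:

mapreduce(≥, &, (3, 2, 1), 1:3)  # false: (3 ≥ 1) & (2 ≥ 2) & (1 ≥ 3)
mapreduce(≥, +, (3, 2, 1), 1:3)  # 2: a non-short-circuiting count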

How much better would the world be if broadcasting returned a lazy collection?

julia> [1, 2, 3] .== 1:3
3-element BroadcastingVector{Bool, Tuple{typeof(==), Vector{Int64}, UnitRange{Int64}}}:
 1
 1
 1

“All things are poison and nothing is without poison. Solely the dose determines that a thing is not a poison” - Paracelsus

I don’t think any and all are so special. If they’re gonna have variadic methods then so should many other reductions. I’m not convinced that’s worth the trouble.

What do you have in mind?

minimum, maximum (which really make any/all unnecessary), extrema, findmin, findmax, argmin, …

I can see how vararg minimum and maximum, if they existed, could be trickily punned into serving the role of vararg any and all, but I struggle to see how any of the items on the list {minimum, maximum, extrema, findmin, findmax, argmin, argmax} would actually “make sense,” conceptually/semantically speaking, with vararg methods. Unlike any of those, count, any, and all take logical predicates, which can be meaningfully applied to a collection of collections; I’m not sure I can find an agreeable way for the “maximum” of a collection of collections to be meaningful.
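For what it’s worth, the pun works because min and max on Bool act as & and |:

a, b = [3, 2, 1], 1:3
minimum(map(≥, a, b)) == all(map(≥, a, b))  # true: Bool minimum behaves like all
maximum(map(≥, a, b)) == any(map(≥, a, b))  # true: Bool maximum behaves like any

(They disagree on empty collections, though, where minimum throws while all returns true.)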

That said, we do find {findfirst, findlast, findnext} which take logical predicates—and there we find conflict because of findnext(pred::Function, A, i). I would point out, though, that {findfirst, findlast, findnext} are expected to return the index of the collection, which isn’t generally a meaningful concept for a vararg method that will accept multiple collections.

So, I don’t think that the slippery slope argument here poses a real challenge; there doesn’t seem to me any reason to extend vararg methods to these other reducing functions.

julia> using LazyArrays: @~

julia> using BenchmarkTools
       Base.all(f, c1, c2, cs...) = all(splat(f), zip(c1, c2, cs...))
       @btime all(a .== b)  setup=(a=[1,2,3]; b=1:3)
       @btime all(==, a, b) setup=(a=[1,2,3]; b=1:3)
       @btime all(@~ a .== b) setup=(a=[1,2,3]; b=1:3)
  28.533 ns (2 allocations: 96 bytes)
  2.809 ns (0 allocations: 0 bytes)
  5.169 ns (0 allocations: 0 bytes)
true
4 Likes