I have an array A. I want to test if there are any values in A satisfying some test:
julia> f1(x) = any(x .> 99)
Easy code to read, but inefficient since we allocate the whole boolean array.
We could also define the following:
julia> function f2(xs)
           for x in xs
               if x > 99
                   return true
               end
           end
           false
       end
which will be faster, but we have to write a new for loop for every boolean expression.
Has anyone explored macros / tricks to get the syntactical lightness of the first with the performance of the second? i.e. to somehow get short-circuited broadcast fusion of elementwise boolean expressions? This should extend naturally to more complex expressions e.g. any((A .> 0) .& (A .< 1) .| (B .== 5)). The same pattern applies to all - i.e. return false as soon as the first false is encountered.
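For the single-predicate case there are already short-circuiting, allocation-free spellings (standard Julia, nothing assumed beyond Base):

```julia
A = rand(100)  # sample data

any(x -> x > 99, A)     # predicate form: short-circuits, no temporary Bool array
any(x > 99 for x in A)  # generator expression: lazy and short-circuiting too
```

Neither of these is as light as the fused dot expression once several arrays are involved, though, which is what motivates the question.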
My impression was that #24990 was some distance away from reaching consensus. I really like that notation, but it’s not obvious to me if it would be equally well-suited to multi-argument lambdas, such as
(_ > 0) && (_ < 0) && (_ == 0)
A really generic solution might be some sort of lazy evaluation, perhaps obviating the need for the reducer(f, itr) pattern entirely.
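One lazy building block that already exists (assuming Julia ≥ 1.6, where `Iterators.map` is available) is a deferred map feeding the reducer; a sketch:

```julia
A = rand(10_000)
B = rand(10_000)

# Iterators.map is lazy: no intermediate array, and `any` still short-circuits.
any(Iterators.map(>(99), A))

# It accepts several iterators, so a multi-array condition works as well.
any(Iterators.map((a, b) -> 0 < a && b < 1, A, B))
```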
For some reason, I haven’t become comfortable with the do-notation. Every time I encounter it, I try to read it as the with/as notation from Python. Sometimes that clarifies it, but sometimes it doesn’t.
Is `(_ > 0) && (_ < 0) && (_ == 0)` a multi-argument lambda — are the `_` placeholders different arguments?
In any case, while I am looking forward to the _ syntax, I think I would mostly use it for simple stuff.
For some of that, a simple partial application construct is also feasible; the question is whether we

1. make some functions PA manually, like `isequal` (this has already happened, should it be extended like you suggested for `>`?),
2. introduce a macro to make (1) more convenient, but still manually applied (i.e. at function definition),
3. make PA automatic (I don’t think this meshes well with Julia’s semantics though, and it would be a large and breaking change),
4. introduce a macro for using PA, like the `@pa` I suggested above.
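For concreteness, option (1) already covers a handful of Base functions, and on recent Julia versions (≥ 1.2, if I recall) the comparison operators as well:

```julia
A = [1, 50, 120]

any(isequal(50), A)  # isequal(50) is a partially applied predicate (a Base.Fix2)
any(==(50), A)       # == has the same curried method
any(>(99), A)        # as do <, <=, >, >=
```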
I am always undecided between these (except 3, which I don’t consider to be a good solution) and alternatives (the `_` syntax for closures, etc). Sometimes I would find them more convenient, but at the same time closures are universal, quite readable, and take just a few extra characters.
Exactly, it’s not obvious what the meaning ought to be. In this case I intended it to be different arguments, but it could just as easily be one argument.
It’s not that there aren’t more concise ways of writing this, but none of these are as general and/or run into corner cases (as shown eg in the discussion of the PR above).
I’ve wondered for a while whether it is possible to abuse the broadcast machinery to do something like this (basically to avoid materialize being called on the lazy broadcasted object). I’ve not found a nice way of doing it (i.e., a way that doesn’t obscure/muddy the semantics of dot broadcasting) but here is a not-so-nice way of doing it.
myany(x) = any(x) # avoid piracy!
Broadcast.broadcasted(::typeof(myany), x) = myany(x) # exploit the fact that the broadcasted object is iterable
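A usage sketch (repeating the two definitions for completeness); the outer dot call hands `myany` the lazy `Broadcasted` object before `materialize` runs:

```julia
myany(x) = any(x)                                     # avoid piracy!
Broadcast.broadcasted(::typeof(myany), x) = myany(x)  # the Broadcasted object is iterable

A = rand(10_000)

# Lowers to roughly materialize(broadcasted(myany, broadcasted(>, A, 0.5))),
# so the comparison is never materialized into a Bool array, and `any`
# short-circuits while iterating the lazy object.
myany.(A .> 0.5)
```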
So much faster than the original case but not as fast as the anonymous function or generator expression versions as mentioned above (and it allocates more).
That would be very nice if there were a way to make that work generically; this pattern comes up pretty frequently, e.g. `dot(x, y) = sum(x .* y)`, `exit = all(abs.(err) .<= tol)` and friends. Can’t all reducers in Base just define `broadcasted` like you do?
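As an aside, when the reduction operator is explicit, multi-iterator `mapreduce` (available since Julia 1.2) already fuses without a temporary array; a sketch of the `dot` example:

```julia
x = rand(1000)
y = rand(1000)

mydot(a, b) = mapreduce(*, +, a, b)  # no temporary a .* b array is allocated

mydot(x, y) ≈ sum(x .* y)  # same value as the allocating version
```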
Wouldn’t that be bad if you want to apply the reduction to a collection of iterables? It seems like it would be better to have a way to signal this explicitly.
Yes, I don’t like this solution since it obscures the meaning of the dot notation. I’ve been trying to find a way to do this explicitly but I can’t find a way to coerce the lowering code to remove the final materialize call that appears when the dot notation is expanded out. (It doesn’t help that I don’t really know enough Scheme to be able to understand the lowering code properly.)
@inline _lazy(x) = x[1] # unwrap the tuple
@inline Broadcast.broadcasted(::typeof(_lazy), x) = (x,) # wrap the Broadcasted object in a tuple to avoid materializing
macro lazy(x)
    return esc(:(_lazy(_lazy.($x))))
end
Which gives
julia> a = rand(10000) ; b = rand(10000) ;
julia> @btime sum(a .+ b)
10.896 μs (5 allocations: 78.27 KiB)
9902.652963666198
julia> @btime sum(@lazy a .+ b)
13.095 μs (3 allocations: 64 bytes)
9902.652963666245
julia> @btime any(a .+ b .> 1.9)
15.549 μs (7 allocations: 5.64 KiB)
true
julia> @btime any(@lazy a .+ b .> 1.9)
1.016 μs (4 allocations: 96 bytes)
true
So not materializing the array isn’t always faster (depends on the operation) but it does save on allocations.