I would like to propose a new syntactic sugar to the list comprehension syntax.
Take the following, somewhat forced example:
[2abs(x)
for x in -20:20
if abs(x) > 10]
Here, the abs function is invoked twice. Of course, in this case, it is not a big deal, but it could be a more costly computation, or just a longer expression, which I don’t want to invoke or write down twice. So I can apply the following trick:
[2abs_x
for x in -20:20
for abs_x in [abs(x)]
if abs_x > 10]
Which works fine, but its syntax does not really help to understand what is going on. In situations like this, I wish I could write something as follows:
[2abs_x
for x in -20:20
let abs_x = abs(x)
if abs_x > 10]
Note that this would only be a syntactic change, which maps to the same generator combination as the above, and it would be a natural extension of the current syntax.
I guess that for pure functions the compiler will recognize that the values of abs(x) result in the same output and that it will only calculate it once even though you call it twice:
using BenchmarkTools
function f()
[2abs(x)
for x in -20:20
if abs(x) > 10]
end
function g()
[2abs_x
for x in -20:20
for abs_x in (abs(x),) # Note that I used a Tuple here to achieve the same performance
if abs_x > 10]
end
@btime f() # 564.104 ns (4 allocations: 544 bytes)
@btime g() # 563.778 ns (4 allocations: 544 bytes)
You are probably right. Not all functions are pure, though, and sometimes, as I mentioned briefly in the original post, the function call itself is just too long so repeating it twice is just plain noisy and more error-prone.
Yeah, sure, the example is oversimplified. There could be multiple variables introduced, and one might need to access the original values as well. E.g.,
[2abs_x + log(x)
for x in -20:20
for abs_x in [abs(x)]
if abs_x > 10]
Obviously, what I’m proposing can be solved in other ways. One can always write a map wrapped in a filter, etc. But IMHO, none of those would be as concise as the list comprehension form.
The benefit of the list comprehension over a standard loop with calls to push! diminishes slightly if it’s no longer short and concise. Is the benefit of adding this special-purpose syntax really worth it, considering that the list comprehension now looks more complicated (IMO) than the corresponding loop below?
vals = Float64[]
for x in -20:20
absx = abs(x)
absx > 10 && push!(vals, 2absx)
end
or on one line
vals = Float64[]
for x in -20:20 (absx = abs(x)) > 10 && push!(vals, 2absx) end
Sometimes the return type is quite involved and changes while your develop your code, like a NamedTuple. Comprehensions infer the eltype making vals = Float64[] unnecessary
The for loop syntax is imperative style programming, and could have a significant impact on performance and memory usage as well. (The push! will reallocate as needed.) I’m not saying it’s wrong, but I wouldn’t call it equivalent either. I very much prefer declarative style programming whenever feasible.
Edit: I was wrong about the difference in performance, see discussion below. I still think that declarative style is favorable, though.
This is a neat idea! It is the closest to what I’m looking for, anyway. It still causes a momentary pause while one decodes what is going on, though, especially if one is familiar with Refs.
How do you think the list comprehension is implemented under the hood when it does not know the length of the resulting array beforehand due to the filtering?
Frankly, I don’t know. It does have an upper limit on the length of the result, though, so I can think of several optimization strategies that could be applied. With the for loop, the correlation between the length of the input and the output vectors is less straightforward. But I believe you if you say that this information is not used by the Julia compiler.
But even if they are equivalent in performance, there is the issue of using imperative vs. declarative style.
Fair point, and thanks for digging deeper. But I think we are getting side-tracked here. The crux of my proposal has nothing to do with performance. It was probably my mistake to argue against the for loop with performance concerns and thus bringing it into the mix.
I’m not a fan personally of adding additional complexity to comprehensions, but I was curious if this idea had been discussed before, so I did some searching. There’s a related idea for allowing binding variables in comprehensions in issue 32905, so I went ahead and posted your proposed syntax there.
@Tamas_Papp, I’m afraid you missed my point. I have encountered in the past more complex cases where the list comprehension syntax was the most concise form, and my personal preference, except for this local declaration part. Like I said before, I, too, can think of various, equally efficient and equivalent, ways of writing the same thing. I’m not looking for a solution (especially not for the rather dumb example above), I’m looking for the ability to choose, for freedom of expression
I rather like this idea and have occasionally wanted something like that. Would you mind opening an issue? Once again I’m reminded that the order of evaluation of comprehensions is kind of weird and confusing, but we were kind of handed that by other languages and this doesn’t make it any worse.
This is a cool thread and a cool idea. Out of curiosity, aren’t transducers also sort of supposed to handle chaining these operations better and generating more efficient loops? When I write this example:
using Transducers
function printing_abs(x)
println("abs...")
abs(x)
end
function abs_transducer(v)
t = Map(printing_abs) |> Filter(x->x>10) |> Map(x->2*x)
collect(t, v)
end
function abs_comprehension(v)
[2*printing_abs(x)
for x in v
if printing_abs(x) > 10]
end
const test_v = [1.0, 2.0, 11.0, -11.0]
and then call
abs_transducer(test_v)
abs_comprehension(test_v)
I see that the transducer version has the effect you want of only calling printing_abs 4 times instead of the 6 in the comprehension.
I don’t know if this is just because it’s a simple enough example that the compiler managed to do something smart, though, so maybe this doesn’t really get at your question. And I don’t really understand transducers very well. Maybe @tkf could comment on whether I am off the mark?
EDIT: I recognize that you aren’t looking for a solution and just want to discuss a syntax. I’m bringing this up because it seems like a similarly appealing syntax and I just kind of vaguely remember from the intro of transducers that a complex comprehension maybe doesn’t always compile down to the best possible loop structure.