Proposal to extend the syntax of list comprehensions with a `let` clause

I would like to propose a new syntactic sugar to the list comprehension syntax.

Take the following, somewhat forced example:

[2abs(x)
 for x in -20:20
 if abs(x) > 10]

Here, the abs function is invoked twice. Of course, in this case, it is not a big deal, but it could be a more costly computation, or just a longer expression, which I don’t want to invoke or write down twice. So I can apply the following trick:

[2abs_x
 for x in -20:20
 for abs_x in [abs(x)]
 if abs_x > 10]

Which works fine, but its syntax does not really help to understand what is going on. In situations like this, I wish I could write something as follows:

[2abs_x
 for x in -20:20
 let abs_x = abs(x)
 if abs_x > 10]

Note that this would only be a syntactic change, which maps to the same generator combination as the above, and it would be a natural extension of the current syntax.

WDYT?

6 Likes

I guess that for pure functions the compiler will recognize that the values of abs(x) result in the same output and that it will only calculate it once even though you call it twice:

using BenchmarkTools
function f()
    [2abs(x)
        for x in -20:20
        if abs(x) > 10]
end
function g()
    [2abs_x
        for x in -20:20
        for abs_x in (abs(x),) # Note that I used a Tuple here to achieve the same performance
        if abs_x > 10]
end
@btime f() # 564.104 ns (4 allocations: 544 bytes)
@btime g() # 563.778 ns (4 allocations: 544 bytes)

You are probably right. Not all functions are pure, though, and sometimes, as I mentioned briefly in the original post, the function call itself is just too long so repeating it twice is just plain noisy and more error-prone.

1 Like

If you know you will use exactly the same evalution, can’t you just do

[2abs_x
 for abs_x in map(abs, -20:20)
 if abs_x > 10]

instead? Seems similarly simple to me. Though I guess if there are yet another spot that would need the original value that could become messier.

Yeah, sure, the example is oversimplified. There could be multiple variables introduced, and one might need to access the original values as well. E.g.,

[2abs_x + log(x)
 for x in -20:20
 for abs_x in [abs(x)]
 if abs_x > 10]

Obviously, what I’m proposing can be solved in other ways. One can always write a map wrapped in a filter, etc. But IMHO, none of those would be as concise as the list comprehension form.

1 Like

The benefit of the list comprehension over a standard loop with calls to push! diminishes slightly if it’s no longer short and concise. Is the benefit of adding this special-purpose syntax really worth it, considering that the list comprehension now looks more complicated (IMO) than the corresponding loop below?

vals = Float64[]
for x in -20:20
    absx = abs(x)
    absx > 10 && push!(vals, 2absx)
end

or on one line

vals = Float64[]
for x in -20:20 (absx = abs(x)) > 10 && push!(vals, 2absx) end
7 Likes

EDIT: this gives incorrect result, it does not change x. Please, disregard =)
ORIGINAL:
You can assign the result of abs(x) without let block

julia> function g()
           [2x for x in -20:20 if (x=abs(x)) > 10]
       end
g (generic function with 1 method)

julia> g()
20-element Vector{Int64}:
 -40
 -38
...

Sometimes the return type is quite involved and changes while your develop your code, like a NamedTuple. Comprehensions infer the eltype making vals = Float64[] unnecessary

Extending your enclose-in-list idea very slightly:

[2a for x in -20:20 for a = Ref(x) if a > 10]

Hopefully that is similar enough to your wished-for syntax in terms of readability.

2 Likes

The for loop syntax is imperative style programming, and could have a significant impact on performance and memory usage as well. (The push! will reallocate as needed.) I’m not saying it’s wrong, but I wouldn’t call it equivalent either. I very much prefer declarative style programming whenever feasible.

Edit: I was wrong about the difference in performance, see discussion below. I still think that declarative style is favorable, though.

1 Like

This is a neat idea! It is the closest to what I’m looking for, anyway. It still causes a momentary pause while one decodes what is going on, though, especially if one is familiar with Refs.

How do you think the list comprehension is implemented under the hood when it does not know the length of the resulting array beforehand due to the filtering?

Frankly, I don’t know. It does have an upper limit on the length of the result, though, so I can think of several optimization strategies that could be applied. With the for loop, the correlation between the length of the input and the output vectors is less straightforward. But I believe you if you say that this information is not used by the Julia compiler.

But even if they are equivalent in performance, there is the issue of using imperative vs. declarative style.

The number of allocations grow with the length of the iterator, indicating that the result array is indeed re-allocated as it grows.

julia> foo(n) = [a for a in 0:n if a % 2 == 0];

julia> @time foo(1);
  0.000001 seconds (3 allocations: 208 bytes)

julia> @time foo(10);
  0.000003 seconds (3 allocations: 208 bytes)

julia> @time foo(1000);
  0.000012 seconds (6 allocations: 7.609 KiB)

julia> @time foo(100000);
  0.000454 seconds (11 allocations: 812.734 KiB)

Maybe Transducers.jl offers you something that you’d find useful to accomplish a declarative style?

6 Likes

Fair point, and thanks for digging deeper. But I think we are getting side-tracked here. The crux of my proposal has nothing to do with performance. It was probably my mistake to argue against the for loop with performance concerns and thus bringing it into the mix.

I’m not a fan personally of adding additional complexity to comprehensions, but I was curious if this idea had been discussed before, so I did some searching. There’s a related idea for allowing binding variables in comprehensions in issue 32905, so I went ahead and posted your proposed syntax there.

6 Likes
collect(IterTools.imap(x -> 2x,
                       Iterators.filter(x -> x > 10,
                                        IterTools.imap(abs, -20:20))))

You can improve the syntax a bit with pipes.

2 Likes

@Tamas_Papp, I’m afraid you missed my point. I have encountered in the past more complex cases where the list comprehension syntax was the most concise form, and my personal preference, except for this local declaration part. Like I said before, I, too, can think of various, equally efficient and equivalent, ways of writing the same thing. I’m not looking for a solution (especially not for the rather dumb example above), I’m looking for the ability to choose, for freedom of expression :slight_smile:

I rather like this idea and have occasionally wanted something like that. Would you mind opening an issue? Once again I’m reminded that the order of evaluation of comprehensions is kind of weird and confusing, but we were kind of handed that by other languages and this doesn’t make it any worse.

2 Likes

This is a cool thread and a cool idea. Out of curiosity, aren’t transducers also sort of supposed to handle chaining these operations better and generating more efficient loops? When I write this example:

using Transducers

function printing_abs(x)
  println("abs...")
  abs(x)
end

function abs_transducer(v)
  t = Map(printing_abs) |> Filter(x->x>10) |> Map(x->2*x)
  collect(t, v)
end

function abs_comprehension(v)
  [2*printing_abs(x)
   for x in v
   if printing_abs(x) > 10]
end

const test_v = [1.0, 2.0, 11.0, -11.0]

and then call

abs_transducer(test_v)
abs_comprehension(test_v)

I see that the transducer version has the effect you want of only calling printing_abs 4 times instead of the 6 in the comprehension.

I don’t know if this is just because it’s a simple enough example that the compiler managed to do something smart, though, so maybe this doesn’t really get at your question. And I don’t really understand transducers very well. Maybe @tkf could comment on whether I am off the mark?

EDIT: I recognize that you aren’t looking for a solution and just want to discuss a syntax. I’m bringing this up because it seems like a similarly appealing syntax and I just kind of vaguely remember from the intro of transducers that a complex comprehension maybe doesn’t always compile down to the best possible loop structure.

1 Like