Array comprehension is inconsistent, given the column-major property of julia

So here’s an annoying bit, if applies a filter to the elements, and it removes the multiple dimensions from the multidimensional syntax:

julia> [(r, c) for r in 1:2, c in 1:3]
2×3 Matrix{Tuple{Int64, Int64}}:
 (1, 1)  (1, 2)  (1, 3)
 (2, 1)  (2, 2)  (2, 3)

julia> [(r, c) for r in 1:2, c in 1:3 if iseven(c+r)]
3-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (2, 2)
 (1, 3)

We just get a vector of the allowed elements. However, this is not falling back to the nested for behavior, which would reverse the axes’ order:

julia> [(r, c) for r in 1:2 for c in 1:3 if iseven(c+r)]
3-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (1, 3)
 (2, 2)

I only use the latter because looking more like the nested for-loop helps, and it never made sense to me to filter a multidimensional syntax and sacrifice the multidimensionality.

We are also unable to filter an axis on its own:

julia> [(r, c) for r in 1:2 if r > 1, c in 1:3]
ERROR: ParseError:
# Error @ REPL[57]:1:30
[(r, c) for r in 1:2 if r > 1, c in 1:3]
#                            └ ── Expected `]`

A filter depending on 2 axes like for i=1:4; j=1:3 if i>j for k=5:6 would go a step further than that.

In my experience, if works in the nested for syntax as expected from the corresponding loop, but you should check your examples in the REPL.

If we’re going that far (in a breaking change), I’d rather not write for at all. for just makes me think for-loop, and I reasonably expect that’s true for most people. Multidimensional array comprehension is an outlier, and taking a cue from 1-dimensional set/list comprehension syntax was going to be weird.

Interesting that you phrase it as iteration because I can’t find a clear specification. The comprehension docs don’t really say how it’s iterated, but the implementation of a generator iterates a composite Iterators.product that changes the leftmost iterators first (column-major order iteration does not imply storage), and both linear indexing and the optional IndexLinear trait specify column-major order iteration for AbstractArray. eachindex seems to leave iteration order open, but I don’t know of an array type that diverges from column-major order, nor do I know what happens if we mix it with typical objects e.g. would RowMajorIterationArray{Tuple{Int,Int}}(undef, 2, 3) .= ((r, c) for r in 1:2, c in 1:3) result in an array equal to a Array comprehension?

3 Likes

Wow, I didn’t even know that this is possible and also it doesn’t make a lot of sense to me. This seems so obscure to me, I wonder whether this ever was used by someone or whether one could just disallow this behavior and noone would complain…

I don’t know whether this is an ‘official’ term. I just wanted to refer to symtax constructs that do/mean something loop/iterator/iteration related. Maybe there exists some more precise term.

see line 237 below

@odow want to revise?

Why? Seems like it works okay to me

3 Likes

Interesting. In this case the order of the elements does not matter. So in this specific case it does not matter at all which form of the iteration is used. So the comma saves 3 whole characters :sweat_smile:

1 Like

The order (1 → 2 → 3 → 4) reads awkward, if you imagine the ordinary programming counterpart.

Using a filter (through the if clause) would necessarily have to collapse the multidimensional structure. Otherwise, what shape would the output have?

It’s the same as if you do:

julia> filter(iseven∘sum, [(r, c) for r in 1:2, c in 1:3])
3-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (2, 2)
 (1, 3)

or

julia> filter(>(0.95), rand(4,3,2,5))
3-element Vector{Float64}:
 0.9696942632241587
 0.9939223654034923
 0.9533802058515822
2 Likes

No. For establishing a JuMP Affine Expression, the iterated for syntax in comprehension fits better (e.g. to the JuMP.Containers.SparseAxisArray).

See this toy example, where the isum is reasonable (and I’m making use of it currently in my realistic JuMP code), whereas the csum has other meanings.

julia> csum(t) = sum((println("Info> t=$t, s=$s"); t) for t=t, s=t);

julia> isum(t) = sum((println("Info> t=$t, s=$s"); t) for t=t for s=t);

julia> csum((1, 2, 3))
Info> t=1, s=1
Info> t=2, s=1
Info> t=3, s=1
Info> t=1, s=2
Info> t=2, s=2
Info> t=3, s=2
Info> t=1, s=3
Info> t=2, s=3
Info> t=3, s=3
18

julia> isum((1, 2, 3))
Info> t=1, s=1
Info> t=2, s=2
Info> t=3, s=3
6

Other julia users can also take a look, the csum here seems to be a completely different construct from the ordinary Cartesian-for loop grammar, which can be very confusing to reason about. Cf.

julia> function c(t)
           for t=t, s=t
               println("Info> t=$t, s=$s")
           end
       end;

julia> c((1,2,3))
Info> t=1, s=1
Info> t=2, s=2
Info> t=3, s=3
1 Like

Using t for both the collection and the elements within is bound to lead to trouble. If you would write

csum2(t) = sum(x for x=t, y=t);
isum2(t) = sum(x for x=t for y=x);

the difference is immediately obvious. This csum2 produces the same results (though in a different order) as

csum3(t) = sum(x for x=t for y=t);

isum2 does not need the second for (looping over a singleton) , i.e. it is equivalent to

isum3(t) = sum(x for x=t);  # (or just sum(t) )

or

isum4(t) = sum((y = x; x) for x=t)

if for whatever reason you really want a second variable.


I do agree that it’s annoying that (a summing version of) c behaves differently from csum. But also here, avoiding for t=t takes away the confusion:

function c_indep(t)
    for x=t, y=t
        println("Info> x=$x, y=$y")
    end
end

function c_dep(t)
    for x=t, y=x
        println("Info> x=$x, y=$y")
    end
end
2 Likes