Confusion with eachrow and pipe

Let’s say we have a simple 2D matrix

a = [0 1 2 3 4; 
    5 6 7 8 9; 
    10 11 12 13 14;
    15 16 17 18 19]

map

It looks like you can apply a function to rows and columns like this

julia> map(sum, eachrow(a))
4-element Vector{Int64}:
 10
 35
 60
 85

and you get the expected results.

list comprehension

You can also use a list comprehension like

julia> [sum(x) for x in eachrow(a)]
4-element Vector{Int64}:
 10
 35
 60
 85

pipe

But if you try to achieve the same thing using pipes, it doesn’t work: it seemingly gives you the result you’d expect if you had asked for the column sums

julia> a |> eachrow |> sum
5-element Vector{Int64}:
 30
 34
 38
 42
 46

and instead using eachcol gives you the row sums

julia> a |> eachcol |> sum
4-element Vector{Int64}:
 10
 35
 60
 85

What’s all that about?

As a relative beginner, these results are quite confusing. This doesn’t seem like it makes sense.

If you broadcast the piped sum over each row it seems to work alright:

a |> eachrow |> x -> sum.(x)
# same values as sum(a, dims=2), which returns a 4×1 Matrix rather than a Vector
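
To make the comparison concrete, here is a minimal sketch that assumes the matrix a from the top of the thread. The names row_sums and row_sums_matrix are only for illustration. It shows that the two approaches agree on the values but not on the shape of the result.

```julia
a = [0 1 2 3 4;
     5 6 7 8 9;
     10 11 12 13 14;
     15 16 17 18 19]

# Broadcasting sum over the rows: one sum per row, returned as a Vector
row_sums = a |> eachrow |> (x -> sum.(x))

# sum with dims=2 collapses the second dimension and returns a 4×1 Matrix
row_sums_matrix = sum(a, dims=2)

# The values agree once the matrix is flattened with vec
println(row_sums == vec(row_sums_matrix))
```

If a plain Vector is what you want, the broadcast form avoids the extra vec call.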

Thanks @rafael.guerra. In my mental model, sum is receiving the rows one at a time, so I’m not sure why you would need to use broadcasting in that case.

Can you offer any insights into why the naive pipe approach did not work, or why it seemed to be operating on columns and not rows?

eachrow(a) is an iterator of vectors, and + is defined for vectors. So sum applies + across those vectors, accumulating them elementwise into a single vector.

julia> [1, 2] + [3, 4]
2-element Vector{Int64}:
 4
 6

julia> sum([[1, 2], [3, 4]]) # vector of vectors kind of like eachrow
2-element Vector{Int64}:
 4
 6
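
The accumulator idea can be written out by hand. This is only a sketch of the behaviour, not how Base actually implements sum, and accumulate_rows is a made-up helper name. It assumes the matrix a from the original post.

```julia
a = [0 1 2 3 4;
     5 6 7 8 9;
     10 11 12 13 14;
     15 16 17 18 19]

# Hypothetical helper: add the rows of m together, one at a time
function accumulate_rows(m)
    acc = zeros(eltype(m), size(m, 2))  # zero vector with one slot per column
    for r in eachrow(m)
        acc = acc + r                   # vector + vector adds elementwise
    end
    return acc
end

# Same result as handing the whole row iterator to sum
println(accumulate_rows(a) == sum(eachrow(a)))
```

Because each row has one entry per column, adding the rows together leaves you with one total per column, which is why the output looks like column sums.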

You should run collect(eachrow(a)) to see the vector of vectors that is piped into sum.

My best take on this would be to have a look at what the iterator created by “eachrow” yields:

julia> b = collect.(eachrow(a))
4-element Vector{Vector{Int64}}:
 [0, 1, 2, 3, 4]
 [5, 6, 7, 8, 9]
 [10, 11, 12, 13, 14]
 [15, 16, 17, 18, 19]

If you collect the result you see that you get a vector of vectors. Now you can check what happens if you apply sum to such an object:

julia> sum(b)
5-element Vector{Int64}:
 30
 34
 38
 42
 46

And you can check which function is called for it:

julia> @which sum(b)
sum(a::AbstractArray; dims, kw...) in Base at reducedim.jl:873

Thanks @DorianT. So in short eachrow is not generating vectors as such, but a generator which pumps out vectors?

But it’s not immediately clear why sum is given something different in a |> eachcol |> sum than in [sum(x) for x in eachrow(a)]. I think that’s what I’m finding particularly confusing.

As in, why the need for collect when using pipes, but not when using a list comprehension?

In the pipe version, sum gets the whole iterator at once and sums its elements. In your list comprehension, you apply sum once to each element of the iterator. You can use .|> sum though.
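
A short sketch of the difference, assuming the matrix a from the original post (the names whole and per_row are just for illustration):

```julia
a = [0 1 2 3 4;
     5 6 7 8 9;
     10 11 12 13 14;
     15 16 17 18 19]

# Plain pipe: sum receives the whole iterator and adds the rows together
whole = a |> eachrow |> sum

# Broadcast pipe: sum is applied to each row separately
per_row = a |> eachrow .|> sum

# The broadcast pipe matches the list comprehension
println(per_row == [sum(x) for x in eachrow(a)])
```

The only change is the dot in front of the last pipe, which turns "call sum on the collection" into "call sum on every element of the collection".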

Let’s say the rows of A are r1, r2, r3. Then the following have the same meaning:

A |> eachrow |> sum

sum(eachrow(A))

sum([r1, r2, r3])  # Sum of a list of 3 elements (each element being a vector)

r1 + r2 + r3

Your list comprehension does something else… These are equivalent:

[sum(r) for r in eachrow(A)]  # Calculate sum(r) for each row r

[sum(r1), sum(r2), sum(r3)]

Here instead of calculating the sum of three arrays we calculate three sums!
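
Both readings can be checked on a small example. This sketch uses a made-up 3×2 matrix A so that there are exactly three rows r1, r2, r3:

```julia
A = [1 2;
     3 4;
     5 6]

# Pull the three rows out of the iterator
r1, r2, r3 = eachrow(A)

# Sum of three arrays: the rows are added elementwise
sum_of_rows = sum(eachrow(A))
println(sum_of_rows == r1 + r2 + r3)

# Three sums: one number per row
sums_per_row = [sum(r) for r in eachrow(A)]
println(sums_per_row == [sum(r1), sum(r2), sum(r3)])
```

Here sum_of_rows has one entry per column of A, while sums_per_row has one entry per row.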
