Unexpected broadcasting behavior involving eachrow()

in the following, the result of y2 is counter-intuitive:

Random.seed!(1);
x = randn(6, 4);

scale = (1 ./ LinearAlgebra.norm.(eachrow(x)) );
y1 = x .* scale;

y2 = copy(x);
y2 .*= (1 ./ LinearAlgebra.norm.(eachrow(y2)) ); # !!! NOT working as expected !!!

julia> LinearAlgebra.norm.(eachrow(y1))
6-element Vector{Float64}:
 1.0
 1.0
 0.9999999999999998
 1.0
 1.0
 1.0

julia> LinearAlgebra.norm.(eachrow(y2))
6-element Vector{Float64}:
 1.009683702121516
 0.9911431758342742
 1.3007113737845422
 1.1961701627839139
 1.1024677374682847
 1.1995854569352082

seems like the broadcasting has something strange to do with eachrow().

Actually a lot of parentheses are put to indicate the norm should be calculated before the multiplication but seems like the parentheses are ignored.

this seems to prevent the problem

y2 .*= 1 ./ collect(LinearAlgebra.norm.(eachrow(y2)))

maybe the norm() function is lazy type too!?

it is not good! Why we need to use collect() or work out scale first?

The parentheses should has given clear enough indication of the order of computation.

I think this behavior causes confusions and bugs.

Note that broadcasting expands to loop. Therefore norm is repeatedly called for each element of the array (not once per row), because y2 is two dimensional. Since y2 is updated in place you do not get what you expect. To see this consider:

julia> y2 = randn(2, 2)
2Ă—2 Matrix{Float64}:
 -0.753134  -0.149061
 -0.597888  -0.0740529

julia> f(x) = (println(x); norm(x))
f (generic function with 1 method)

julia> y2 .*= (1 ./ f.(eachrow(y2)))
[-0.7531338971077326, -0.14906101769437347]
[-0.5978883610219364, -0.07405291834272695]
[-0.9809709136664748, -0.14906101769437347]
[-0.9924168057314952, -0.07405291834272695]
2Ă—2 Matrix{Float64}:
 -0.980971  -0.150228
 -0.992417  -0.0744119
3 Likes

So what’s essentially going on is loop fusion? Where the .*= means everything broadcasts to the size of y2, rather than the 1-dimension of eachrow(y2)? I’m not seeing it

julia> begin
       y2 = copy(x)
       for j in 1:size(y2, 2)
           for i = 1:size(y2, 1)
               row = y2[i, :]
               n = LinearAlgebra.norm(row)
               inv_n = 1 / n
               y2[i, :] .= y2[i, :] * inv_n
           end
       end
       end

julia> LinearAlgebra.norm.(eachrow(y2))
6-element Vector{Float64}:
 1.0
 1.0
 1.0
 1.0000000000000002
 1.0
 1.0000000000000002

I mean its the confusion caused by the trust of using parentheses…

No, that’s not the cause. I’ve figured our what the loop is doing. You can see the way it expands.

julia> begin
       y2 = copy(x)
       y2 .*= 1 ./ LinearAlgebra.norm.(eachrow(y2))
       LinearAlgebra.norm.(eachrow(y2))
       end
6-element Vector{Float64}:
 0.8621336125903504
 1.2096359092602622
 1.0905033928976264
 1.155913402801104
 1.0225581006589461
 1.1695084569556826

julia> begin
       y2 = copy(x)
       for i = 1:size(y2, 1)
           for j in 1:size(y2, 2)
               row = y2[i, :]
               n = LinearAlgebra.norm(row)
               inv_n = 1 / n
               y2[i, j] = y2[i, j] * inv_n
           end
       end
       LinearAlgebra.norm.(eachrow(y2))
       end
6-element Vector{Float64}:
 0.8621336125903504
 1.2096359092602622
 1.0905033928976264
 1.155913402801104
 1.0225581006589461
 1.1695084569556826
2 Likes

These examples seem to work as expected.
But I don’t understand how they differ (structurally) from the original example.

julia> using StatsBase

julia> N=copy(M)
3Ă—2 Matrix{Float64}:
 2.0  1.0
 3.0  1.0
 4.0  1.0
julia> N = N .*  mean.(eachrow(N))
3Ă—2 Matrix{Float64}:
  3.0  1.5
  6.0  2.0
 10.0  2.5

julia> N=copy(M)
3Ă—2 Matrix{Float64}:
 2.0  1.0
 3.0  1.0
 4.0  1.0

julia> N = N .* (1 .*  mean.(eachrow(N)))
3Ă—2 Matrix{Float64}:
  3.0  1.5
  6.0  2.0
 10.0  2.5
1 Like

Yes, which means that norm is called for each element of a row separately and it sees updated values in y2. The problem is that:

julia> x = rand(2,2)
2Ă—2 Matrix{Float64}:
 0.752214  0.337909
 0.190713  0.866113

julia> Base.mightalias(x, eachrow(x))
false

I have opened Improve handling of aliasing of AbstractSlices by bkamins · Pull Request #49182 · JuliaLang/julia · GitHub to fix it.

They are not in place, so there is no issue with aliasing.

1 Like
julia> y3 = copy(x);

julia> y3 .*= (1 ./ Iterators.map(norm, eachrow(y3)));

julia> norm.(eachrow(y3))
6-element Vector{Float64}:
 1.0
 1.0
 0.9999999999999998
 1.0
 1.0
 1.0

works as desired. And since Iterators.map is an iterator, it isn’t evaluated O(n*m) times but evaluated O(n) times and values reused.

besides correctness, this seems like an enormous performance footgun

This has been discussed before. For pure functions, this is a footgun, but when sideeffects of the function are desired, it isn’t. But somebody who particiapated in those previous discussions can weigh in.

using LinearAlgebra: norm
x = rand(6, 4)
x ./= norm.(eachrow(x))
norm.(eachrow(x)) # returns something very surprising!

There’s absolutely no world I want to live in where the above snippet doesn’t normalize the rows’ norms to 1.

If side-effects are desired then surely there is another (more clear, and less magical) implementation that can achieve what the above snippet currently does

1 Like

As commented above - this is the issue with dealiasing by broadcasting. So hopefully my PR to Julia would be merged and the issue will be fixed (still - it would be better to precompute norm to get better performance)

3 Likes

Broadcasting an iterator collects it first, e.g. Broadcast.broadcastable(Iterators.map(sqrt, 1:4)). So this is the same as writing eager map.

It’s worse, as x ./= norm.(eachcol(x)) also fails for other reasons. You need this mouthful: x ./= map(norm, eachslice(x; dims=1, drop=false)).

For sum instead of norm, what works fine is x ./= sum(x; dims=2). IMO we should have norm(x; dims=2), which besides avoiding aliasing problems, can be much faster.

2 Likes

Besides @bkamins 's PR (for which I am not an expert reviewer but it seems the right way to go), would it be feasible to emit a compile-time AliasWarning or something like that when this kind of construction is encountered?

1 Like

yeah. Again, the violation against the intuitive use of parentheses is very dangerous I think.

This doesn’t seem like an issue with parentheses and order of operations at all, it seems like not recognizing that chained dot syntax fuses the operations into 1 loop. For a simpler example, a .* (b .+ c) does a[i] * (b[i] + c[i]) per iteration. Note that the parentheses does not prevent fusing, in fact it sensibly specifies the order of operations per iteration. The only way to split that example into separate loops is to broadcast separately: temp = b .+ c; a .* temp.

Generally, if your input is being changed per iteration, results will be different if loop fusion has a reducing operation like norm(eachrow(...; put plainly, reducing operations depend on the other elements, including ones changed by previous iterations. y1 was computed by broadcasting two times over an input x, which did not change in the process. y2 was changing over its own broadcast. If the iterations’ inputs are different, of course the output is different.

Changing the input to an unchanging x is a simple way to fix the issue: y2 .*= (1 ./ LinearAlgebra.norm.(eachrow(x)) );. A previous comment suggested collect, which split the line into 2 broadcast loops and made unchanging temporary data based on y2 before it was changed.

3 Likes

Again the problem is that the in-place loop has a reducing operation; the changing output is also the input. In the linked PR you argued that the equivalent NumPy code does not have this issue, but it is allocating temporary arrays in separate loops of each operation. Julia’s fused broadcast loops may look similar but are not equivalent, and I think people should understand the difference.

People have suggested manually inserting unbroadcasted identity calls into a single line to unfuse loops and allocate temporary arrays, like x ./= identity(norm.(eachrow(x))). But that’s a lot of bloat with more dots, and a single line could be transformed by a macro like @unfuse x ./= norm.(eachrow(x))