So what’s essentially going on is loop fusion, where the .*= means everything broadcasts to the size of y2 rather than the one dimension of eachrow(y2)? I’m not seeing it.

Note that broadcasting expands into a loop. Therefore norm is called repeatedly for each element of the array (not once per row), because y2 is two-dimensional. Since y2 is updated in place, you do not get what you expect. To see this, consider:
```julia
y2 = copy(x)
for j in 1:size(y2, 2)
    for i in 1:size(y2, 1)
        row = y2[i, :]            # reflects all earlier in-place updates
        n = LinearAlgebra.norm(row)
        y2[i, j] *= 1 / n         # each iteration scales a single element
    end
end
```
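A runnable comparison (with a small made-up 2×2 matrix; the loop is the same manual expansion as above) shows how the in-place version drifts:

```julia
using LinearAlgebra

x = [3.0 4.0; 6.0 8.0]          # made-up example; row norms are 5 and 10

# Correct result: the norms are computed from x, which never changes.
y1 = x .* (1 ./ norm.(eachrow(x)))

# Manual expansion of the fused in-place broadcast: the norm is
# recomputed from the partially updated y2 for every element.
y2 = copy(x)
for j in 1:size(y2, 2), i in 1:size(y2, 1)
    y2[i, j] *= 1 / norm(y2[i, :])
end

norm.(eachrow(y1))   # ≈ [1.0, 1.0]: every row is a unit vector
norm.(eachrow(y2))   # not [1.0, 1.0]: later columns saw corrupted norms
```

Only the first column of y2 is divided by the true row norms; by the time the loop reaches the second column, the rows have already been partially rescaled.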
This has been discussed before. For pure functions this is a footgun, but when side effects of the function are desired, it isn’t. Perhaps somebody who participated in those previous discussions can weigh in.
As commented above, this is the issue with dealiasing in broadcasting. Hopefully my PR to Julia will be merged and the issue will be fixed (still, it would be better to precompute the norm to get better performance).
Besides @bkamins’s PR (for which I am not an expert reviewer, but it seems the right way to go), would it be feasible to emit a compile-time AliasWarning, or something like that, when this kind of construction is encountered?
This doesn’t seem like an issue with parentheses and order of operations at all; it seems like not recognizing that chained dot syntax fuses the operations into one loop. For a simpler example, a .* (b .+ c) computes a[i] * (b[i] + c[i]) per iteration. Note that the parentheses do not prevent fusing; in fact, they sensibly specify the order of operations per iteration. The only way to split that example into separate loops is to broadcast separately: temp = b .+ c; a .* temp.
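For instance, with small made-up vectors, the fused one-loop version and the manually split two-loop version agree, because nothing here is mutated in place:

```julia
a, b, c = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]   # made-up data

# Fused: a single loop computing a[i] * (b[i] + c[i]) per iteration.
fused = a .* (b .+ c)

# Split: a temporary array forces two separate broadcast loops.
temp = b .+ c
unfused = a .* temp

fused == unfused   # true; fusion changes how, not what, is computed
```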
Generally, if your input is being changed per iteration, results will differ when the fused loop contains a reducing operation like norm.(eachrow(...)); put plainly, reducing operations depend on the other elements, including ones changed by previous iterations. y1 was computed by broadcasting twice over an input x, which did not change in the process. y2 was being changed by its own broadcast. If the iterations’ inputs are different, of course the output is different.
Changing the input to the unchanging x is a simple way to fix the issue: y2 .*= 1 ./ LinearAlgebra.norm.(eachrow(x));. A previous comment suggested collect, which splits the line into two broadcast loops and makes unchanging temporary data based on y2 before it is changed.
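To make that fix concrete, here is a small sketch (same made-up 2×2 matrix as before) of broadcasting against the unchanged x:

```julia
using LinearAlgebra

x = [3.0 4.0; 6.0 8.0]   # made-up example; row norms are 5 and 10
y2 = copy(x)

# Safe: the norms are read from x, which the loop never mutates, so every
# element of a row is scaled by the same (correct) factor.  Note that
# norm.(eachrow(x)) is still recomputed per element inside the fused loop;
# precomputing it into a temporary array is faster.
y2 .*= 1 ./ norm.(eachrow(x))

norm.(eachrow(y2))   # ≈ [1.0, 1.0]
```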
Again the problem is that the in-place loop has a reducing operation; the changing output is also the input. In the linked PR you argued that the equivalent NumPy code does not have this issue, but it is allocating temporary arrays in separate loops of each operation. Julia’s fused broadcast loops may look similar but are not equivalent, and I think people should understand the difference.
People have suggested manually inserting unbroadcasted identity calls into a single line to unfuse the loops and allocate temporary arrays, like x ./= identity(norm.(eachrow(x))). But that’s a lot of bloat with more dots, and a single line could be transformed by a macro like @unfuse x ./= norm.(eachrow(x)).
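@unfuse is a hypothetical name here, not an existing macro. A minimal sketch of one possible implementation simply wraps the right-hand side in an unbroadcasted identity call, so the RHS is materialized into a temporary array before the in-place broadcast loop starts:

```julia
using LinearAlgebra

# Hypothetical sketch: rewrite `dest .op= rhs` into `dest .op= identity(rhs)`,
# forcing rhs to be fully evaluated before the in-place loop runs.
macro unfuse(ex)
    ex isa Expr && length(ex.args) == 2 ||
        error("@unfuse expects an updating assignment like `x ./= rhs`")
    dest, rhs = ex.args
    return esc(Expr(ex.head, dest, :(identity($rhs))))
end

x = [3.0 4.0; 6.0 8.0]          # made-up example; row norms are 5 and 10
@unfuse x ./= norm.(eachrow(x)) # norms are computed before x is mutated
norm.(eachrow(x))               # ≈ [1.0, 1.0]
```

A real version would also want to check that ex.head is actually a dotted updating operator before rewriting.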