Unexpected broadcasting behavior involving eachrow()

I’m perfectly willing to believe that there is some subtlety by which this design was determined to be reasonable, but I just can’t possibly accept the conclusion.

x = expr

is, and always will be, interpreted by 99% of people as “evaluate expr, then assign the result to the name x”, and the whole .= shenanigans is Julia for “do this in place” or “broadcast this correctly.”
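That is, the distinction I (and I think most people) have in mind, as a minimal sketch:

julia> x = zeros(2); y = x;   # two names for the same array

julia> x = x .+ 1;            # `=` allocates a new array and rebinds the name x

julia> y                      # the array y still refers to is untouched
2-element Vector{Float64}:
 0.0
 0.0

julia> x .= 0;                # `.=` writes into x's existing array in place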

I would be very surprised if anybody who is not already very familiar with Julia or this specific interaction would ever expect x .= expr(x) to behave in the mutating-input way it does, and I know I certainly would expect the RHS to be allocated first before assigning to LHS.
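To make the surprise concrete, here is a minimal sketch (numbers chosen so the intended answer is obvious; the second column shows the norms being recomputed mid-update):

julia> using LinearAlgebra

julia> x = [3.0 4.0; 6.0 8.0];     # row norms are 5 and 10

julia> x ./= norm.(eachrow(x))     # I would expect [0.6 0.8; 0.6 0.8]
2×2 Matrix{Float64}:
 0.6  0.988936
 0.6  0.997199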

Why shouldn’t this be safe by default, with the current behavior restored by an explicit opt-in like

@fuse x ./= norm.(eachrow(x))

It’s just such a sneaky bug. I guarantee there is code out there falling into this trap.

3 Likes

I don’t think it’s subtle at all; syntactic loop fusion has been explained here. The basic gist is that 1) we can broadcast any operation without having to write boilerplate (the way you would in NumPy, via C or Numba), and 2) we can avoid a lot of allocations by writing implicit loops in one-liners. And contrary to the examples in this thread, this works out for the most part. Purely elementwise operations are totally fine, and even reducing operations alone aren’t a problem; it’s only in combination with in-place broadcasting that we have to be careful to allocate sensibly.

I’m not arguing that this isn’t an advantage. It’s nice to be saved from aliasing problems automatically. But would people really be happy with their code’s performance degrading, even in cases where aliasing would never be a problem?

I just think that having to manually fuse loops is a lot harder than manually unfusing them. Even now, I can write separate or semicolon-delimited lines of code (though I’d need let blocks to avoid lingering variables in the global scope), and that makes it clear what is and isn’t inside a broadcasting line’s loop. An @unfuse macro would just be a nicety for the rare case where I want no adjacent dotted operations to be fused.
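For example, here’s a sketch of the manually “unfused” version I can already write today (x and the names are just illustrative):

using LinearAlgebra
x = rand(4, 3)

# materialize the row norms before touching x; the let block keeps the
# temporary from lingering in global scope
let nrm = norm.(eachrow(x))
    x ./= nrm
end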

I agree with you that

  • the default here prefers performance over correctness (as measured by fewer characters to “fix”)
  • the behavior has been documented somewhere

But I 100% prefer the NumPy approach here, where I am never going to be surprised by computations that appear “obvious” in fact doing something totally wrong, and silently so, under the hood.

this works out for the most part

OK, but when it doesn’t, it’s pretty brutal. If I had written code making this mistake, I think it would take me a very, very long time to identify, since I would never expect this behavior.

Maybe it’s just a difference in priorities. But I think we both agree that the language gives us tools to do this either 1. guaranteed alias-free or 2. guaranteed fused and high-performance, and that switching between the two requires only a few characters.

I think where we disagree is that you are suggesting 2. should be default, whereas I strongly feel that 1. should be the default.

implicit loops in one-liners

Snazzy one-liners are fun, but I prefer more obvious, predictable (and bug-free) behavior.

2 Likes

This is the proposal of the PR. Elsewhere, broadcasting does make some attempt to avoid aliasing surprises.

The downside is that it costs a copy – it ends up the same speed as the out-of-place x ./ norm.(eachrow(x)), which is anyway much slower than the best case:

julia> let x = rand(500, 100)
        @btime $x ./= norm.(eachrow($x))  # with 49182
        @btime $x ./= map(norm, eachrow($x))
        @btime foreach(normalize!, eachrow($x))  # [edit, suggested below]
        @btime $x ./= norm($x; dims=2)   # with 43459
       end;
  min 11.594 ms, mean 11.839 ms (2 allocations, 390.67 KiB)
  min 155.416 μs, mean 173.778 μs (1 allocation, 4.06 KiB)
  min 157.542 μs, mean 158.453 μs (0 allocations)
  min 44.375 μs, mean 51.069 μs (5 allocations, 4.14 KiB)
2 Likes

I think this expectation stems from the comparison between languages, and to be fair the syntax does look similar. But it’s fairly easy for me to see past, because I think of Julia’s dots as an easier way of writing loops. Explicit loops are supposedly avoided in performant Python, but when I had to “fuse loops” for performance, I’d be writing those loops in Numba. Even Pythonistas learn to be wary when mutating and iterating the same object in a loop.

I guess I just don’t see Julia dots this way. I much more frequently use them to mean “do my algebra in the correct way on this linalg object.” If I write r .= X*θ .- y, yes, I suppose that’s technically (mathematically) equivalent to

for i ∈ eachindex(r)
    r[i] = X[i, :]' * θ - y[i]
end

But there’s no reason to me that thinking of it as a loop should be more natural than thinking of it as vector subtraction. I agree that one should be careful when mutating an object being looped over, but it’s usually much more obvious when that is happening than it is here. I don’t think it’s just a Python thing.

How does this work anyway? It looks like mightalias handles it somehow, but it’s not in the docs, and its docstring doesn’t really explain alias handling in general.
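For what it’s worth, here’s what I could piece together by poking at Base.mightalias and Base.unalias directly (my own reading of the internals, not documented behavior):

julia> x = rand(3, 2);

julia> v = view(x, 1, :);         # shares memory with x

julia> Base.mightalias(x, v)      # conservative "could these overlap?" check
true

julia> Base.mightalias(x, copy(v))
false

julia> Base.unalias(x, v) === v   # when they might alias, broadcasting gets a defensive copy
false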

I don’t want to actively enter into the “high-level” part of the discussion (even though I’m following it with interest), much less suggest what a “right” solution satisfying the different needs might be. I just want to share some observations that may be useful to those who, like me, are not very familiar with the internal aspects of Julia, and get confirmation from the experts that I’m not completely off track. I tried to find a form of the problem that is easy to follow “by hand”, with small numbers.

At first I thought the problematic expression was equivalent to this, which, in fact, gives no problems.

   using Statistics  # for mean

   x=[1 2; 3 4]
   x = x .* mean.(eachrow(x))

I then realized that instead it is equivalent to this

   x=[1 3; 2 4]
   x .= x .* mean.(eachrow(x))

which, put this way, actually has several problems.
The first, from what I understand, is that it can’t put non-integer values into an array of integers:

julia>    x .= x .* mean.(eachrow(x))
ERROR: InexactError: Int64(7.5)

The second, actually related to the first, is the unexpected result reported by the OP:

julia>    x=[1. 3; 2 4]
2×2 Matrix{Float64}:
 1.0  3.0
 2.0  4.0

julia>    x .= x .* mean.(eachrow(x))
2×2 Matrix{Float64}:
 2.0   7.5
 6.0  20.0

In the second column one would instead expect the values obtained from the following expression, which also happens to avoid the problem of putting non-integer values into a matrix of integers.

  julia>    x=[1. 3; 2 4]
2×2 Matrix{Float64}:
 1.0  3.0
 2.0  4.0

julia>    x = x .* mean.(eachrow(x))
2×2 Matrix{Float64}:
 2.0   6.0
 6.0  12.0

The fact is that in this last expression the x on the left-hand side (x = …) is not the same array as the x on the right-hand side, while in the previous expression (x .= …) the two x’s are the same memory area. I’m not sure if that’s completely correct and relevant, but I’d like to show that…

julia>    x=[1. 3; 2 4]
2×2 Matrix{Float64}:
 1.0  3.0
 2.0  4.0

julia>    pointer(x)
Ptr{Float64} @0x000001e021f2d840

julia>    x .= x .* mean.(eachrow(x))
2×2 Matrix{Float64}:
 2.0   7.5
 6.0  20.0

julia> pointer(x)
Ptr{Float64} @0x000001e021f2d840  # same pointer as above

#--------------

julia>    x=[1. 2; 3 4]
2×2 Matrix{Float64}:
 1.0  2.0
 3.0  4.0

julia>    pointer(x)
Ptr{Float64} @0x000001e05c1d27a0

julia>    x = x .* mean.(eachrow(x))
2×2 Matrix{Float64}:
  1.5   3.0
 10.5  14.0

julia>    pointer(x)
Ptr{Float64} @0x000001e05c369e40

I have this exact example (but with sum instead of mean, so that we do not have to depend on Statistics) in the testset for the PR :smile: Improve handling of aliasing of AbstractSlices by bkamins · Pull Request #49182 · JuliaLang/julia · GitHub.

1 Like

I don’t really see the problem here at all. The current behaviour is clearly correct. It’s not immediately obvious, so I would probably go wide-eyed for a moment, but after looking closer, it’s more of a facepalm situation. Of course it must behave like that. You are mutating your array while you are calculating properties of it which are used for the mutation step. That should be a huge red flag.

Here’s a better way anyway:

normalize!.(eachrow(X))
4 Likes

The problem is that the implication of parentheses for the ordering of computations differs from 30+ years of programming experience in other languages!

That said, in Julia (and only in Julia), whenever . broadcasting is used, we cannot rely on parentheses to direct the order of computations. Instead, separate computations should be explicitly written out in a suitable order; single-line broadcasting should be avoided for clarity.

To clarify what I mean by “parentheses” and “order of computation”, let’s consider:

case 1:

julia> import LinearAlgebra

julia> y = reshape(collect(1.0:6), 3, 2);

julia> y .*= 1 ./ LinearAlgebra.norm.(eachrow(y))
3×2 Matrix{Float64}:
 0.242536  0.998167
 0.371391  0.997253
 0.447214  0.997234

case 2:

julia> y = reshape(collect(1.0:6), 3, 2);

julia> y .*= ( 1 ./ LinearAlgebra.norm.(eachrow(y)) )   # with ( )
3×2 Matrix{Float64}:
 0.242536  0.998167
 0.371391  0.997253
 0.447214  0.997234

Cases 1 and 2 give the same result!? That means the extra ( ) used in case 2, which are intended to compute the inner expression first, are ignored, overridden!!!

Don’t we see that this is a problem?
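As far as I can tell, the only way to actually force the inner computation to happen first is to materialize it on its own line (same y as above):

julia> import LinearAlgebra

julia> y = reshape(collect(1.0:6), 3, 2);

julia> s = 1 ./ LinearAlgebra.norm.(eachrow(y));   # materialized before y changes

julia> y .*= s
3×2 Matrix{Float64}:
 0.242536  0.970143
 0.371391  0.928477
 0.447214  0.894427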

normalize!.(eachrow(X))

It’s also horrifying that that doesn’t work :grimacing: . What’s the point of having lovely generic and composable functions like normalize! and eachrow if we can’t compose them without fear that they’re silently doing something wrong?

This is not just a familiarity thing. My eyes did go wide when I first read the OP, but they went wider when I saw people trying to justify the interaction. I do now understand the mechanism that causes this behavior to arise, but I feel so strongly that it should be treated as an urgent bugfix that I’m not sure what else to say.

I would put pretty good money on the claim that there is at least one major package (or Base/stdlib) that makes this mistake somewhere. I agree with @tomtom. This is not just a regular language quirk; it breaks established convention across any language I’ve ever used.

1 Like

I haven’t read this entire thread, but this rule that you propose is far too vague for the language to use. It’s an example of “Do what I mean, not what I say.” The language has to adopt a more formal definition of what in-place broadcasting means. Furthermore, how are we supposed to reason about broadcasting code by the rule “do my algebra in the correct way on this object”? It’s just not a coherent API.

1 Like

“.= must allocate the RHS before broadcast-assigning to the LHS, unless the user explicitly opts into fusion” seems coherent and specific.
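Concretely, a sketch of what that rule would mean (the @fuse macro here is hypothetical, just a name for the explicit opt-in):

using LinearAlgebra
x = rand(4, 3)

# Under the proposed default, `x ./= norm.(eachrow(x))` would behave as if written:
tmp = norm.(eachrow(x))   # RHS materialized before any element of x changes
x ./= tmp                 # then the in-place, elementwise update

# Today's fused, allocation-free behavior would need the explicit opt-in, e.g. the
# hypothetical `@fuse x ./= norm.(eachrow(x))` mentioned earlier in the thread.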

How many languages have you used with an in-place broadcasting syntax?

Seems to work for me:

julia> using LinearAlgebra

julia> A = [1.0 2.0
            3.0 4.0];

julia> normalize!.(eachrow(A));

julia> norm.(eachrow(A))
2-element Vector{Float64}:
 0.9999999999999999
 1.0
2 Likes

But that’s exactly what in-place broadcasting is trying to avoid. It’s trying to avoid temporary array allocations!

1 Like

My point was that it works, and is also clearer. What did I miss?

The user did explicitly opt into fusion. And you get your desired behaviour, trivially, by using = instead of .=

Why force .= to behave like =? That ruins the point of it.

4 Likes

You can check the order of operations with Meta.@lower y .*= 1 ./ LinearAlgebra.norm.(eachrow(y)). The operations on the right-hand side are already computed first, but per destination element, not once for the whole array.
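Roughly speaking, the fused in-place form behaves like this hand-written loop (a sketch of the semantics, not the literal lowered code):

using LinearAlgebra
y = reshape(collect(1.0:6), 3, 2)

# roughly what `y .*= 1 ./ norm.(eachrow(y))` does, element by element, in
# column-major order; the norm is recomputed from the current, partly updated y
for j in axes(y, 2), i in axes(y, 1)
    y[i, j] = y[i, j] * (1 / norm(view(y, i, :)))
end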

Even in simpler cases without broadcasting, updating operators are expected to occur last:

julia> x = 2; x *= x + 10        # the update applies last: x becomes x * (x + 10)
24
julia> x = 2; x = x * x + 10     # not this: here * binds before +
14
julia> x = 2; x = x * (x + 10)   # explicit form of the first line
24
julia> x = 2; (x *= x) + 10      # value of the second line (x itself ends up 4)
14

To be fair, that’s not documented in the Updating Operators section right now. But also to be fair, assignment and updating operators’ precedence being the lowest is documented, and this behavior is present in other languages too.

My mistake, I had thought you were implying it was also broken, but I didn’t actually try it. I retract the statement :slight_smile:

Well, no, no I don’t, if my desired behavior was the “in-place” aspect. If I write a function

function normalizerows!(x)
    x ./= norm.(eachrow(x))
end

Then it will be broken. Obviously this is a bad implementation but it should still work.

One of the biggest problems for me is that this violates transitivity with =, in that

x ./= norm.(eachrow(x))

means something different than

val = norm.(eachrow(x))
x ./= val

Transitivity of assignment seems like a pretty inviolable principle to me. Anyway, it seems unlikely we will convince each other this way, so I’ll stop being inflammatory. Let me take a little while to collect my thoughts, and maybe I can come back with a new angle.