Unexpected broadcasting behavior involving eachrow()

Sorry to bring this discussion down a notch, but I need more clarification. I thought I understood, but there is still something I’m missing about the current behavior. Why doesn’t getindex “suffer” from the same “problem” as eachrow? What is different in the following expressions?

julia>     x=[2. 2; 3 4; 5 6]
3×2 Matrix{Float64}:
 2.0  2.0
 3.0  4.0
 5.0  6.0

julia>     x .= x .+ getindex(x,[1,2,3])
3×2 Matrix{Float64}:
  4.0   4.0
  6.0   7.0
 10.0  11.0

julia>     x=[2. 2; 3 4; 5 6]
3×2 Matrix{Float64}:
 2.0  2.0
 3.0  4.0
 5.0  6.0

julia>     x .= x .+ first.(eachrow(x))
3×2 Matrix{Float64}:
  4.0   6.0
  6.0  10.0
 10.0  16.0

Nonononono. Conventionally the ( ) indicates that the whole thing inside should be computed first, not once per iteration.

The computation inside the ( ) is 1 ./ LinearAlgebra.norm.(eachrow(y)), so a vector of norms (one for each row of y) should be created before fusing with anything outside the ( ).


There is no such convention for in-place broadcasting because no other language exposes this kind of feature. That’s not to invalidate your point that it seems strange if you approach it from the perspective of another language, but you can easily come up with examples like this for any language (e.g. why does --i in C not give me back the old i?).

I think one solution to this familiarity problem is to be significantly more up-front about what .= is doing behind the scenes. If I told you that y .*= 1 ./ LinearAlgebra.norm.(eachrow(y)) desugared to copy!(y, lazy_mul(y, lazy_div(1, lazy_norm(eachrow(y))))), it would be super obvious that adding an extra pair of parens does nothing: copy!(y, lazy_mul(y, ( lazy_div(1, lazy_norm(eachrow(y))) ))).
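For the curious, here is (roughly) the same desugaring written with the real names from Base.Broadcast instead of the hypothetical lazy_* helpers; a sketch, not the exact lowering:

```julia
using LinearAlgebra

# Nothing inside the parens is computed up front: Base.broadcasted only
# builds lazy wrappers, and materialize! runs a single fused loop that
# writes into y while reading rows of y through the lazy norm wrapper.
y = [3.0 4.0; 6.0 8.0]
bc = Base.broadcasted(*, y,
         Base.broadcasted(/, 1,
             Base.broadcasted(norm, eachrow(y))))
Base.materialize!(y, bc)   # behaves like y .*= 1 ./ norm.(eachrow(y))
```

Seen this way, the extra parentheses clearly cannot change anything: they only group the lazy wrappers, which are not evaluated until materialize! runs.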

Another avenue to tackle this would be more proactive warnings/errors in the broadcast machinery when it detects that this kind of overlapping access is happening. The linked PR from @bkamins is arguably a start, but it would be nice to have something like Java’s ConcurrentModificationException (just for broadcasts instead of iterators).


x /= np.linalg.norm(x, axis=1).reshape(-1, 1) works just fine in numpy. yes the spelling is a little more verbose & clunky than in Julia, but at least it computes the correct answer!

To be fair, this also works as a more literal translation of the numpy version:

x ./= mapslices(norm, x, dims=2)

But this just adds another layer of surprise: why doesn’t mapslices(norm, x, dims=2) compute the same thing as norm.(eachrow(x)) ?
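A small experiment (values chosen so the row norms come out to 5 and 10) showing both the shape difference and the fusion difference:

```julia
using LinearAlgebra

x = [3.0 4.0; 6.0 8.0]

# Standalone they agree up to shape: mapslices returns a 2×1 Matrix,
# norm.(eachrow(x)) a length-2 Vector, both containing the norms 5 and 10.
m = mapslices(norm, x, dims=2)
v = norm.(eachrow(x))

# In the ./= expression the difference that matters is eagerness:
# mapslices is a plain (non-dotted) call, fully evaluated before the
# in-place loop starts, while the dotted version fuses into the loop
# and re-reads rows that earlier iterations have already scaled.
a = copy(x); a ./= m
b = copy(x); b ./= norm.(eachrow(b))
```

So the standalone results only differ in shape, but inside an in-place broadcast the two spellings genuinely compute different things.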

Again, I understand the desire for the fusing assignment, but I think it just breaks too many rules about what (the royal) “we” have come to expect from the behavior of =. I’m not saying the functionality should be removed, but it should be much more clearly demarcated in the code that this use of = does not follow normal assignment conventions.

FWIW, rather than “transitivity of assignment”, I think you’re actually referring to “referential transparency”.

Your observation that

x ./= norm.(eachrow(x))

is different from

val = norm.(eachrow(x))
x ./= val

is very relevant, but also note that this latter line is not an assignment: it’s an in-place mutation of x. Referential transparency is preserved (at least in this case) if there are only real assignments:

x = x ./ norm.(eachrow(x))

is the same as

val = norm.(eachrow(x))
x = x ./ val

I believe I was, thank you for the correction. As just one more:

x[:, :] = x ./ norm.(eachrow(x))

works fine (and in-place!), so the original example also breaks the rule that x (op)= expr is equivalent to x = op(x, expr).

getindex(x,[1,2,3]) is not broadcasted.
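To spell that out: getindex(x, [1, 2, 3]) is an ordinary (non-dotted) call, so its result is a fresh vector computed before the broadcast loop starts, whereas first.(eachrow(x)) is itself dotted and fuses into the loop, re-reading x after earlier iterations have mutated it. A rough model of the fused version (iteration order is column-major, as for array broadcasts):

```julia
x = [2.0 2.0; 3.0 4.0; 5.0 6.0]

# Rough model of `x .= x .+ first.(eachrow(x))`: the row views read the
# same x that the loop is writing into, so the second column sees the
# values already written during the first column.
for j in axes(x, 2), i in axes(x, 1)
    x[i, j] = x[i, j] + first(view(x, i, :))   # first element of row i of the *current* x
end
```

Compare with the outputs above: column 1 matches the getindex version, while column 2 picks up the values written while processing column 1.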


@ffevotte basically pointed this out, but it’s not equivalent because the RHS is always materialized and thus incurs another allocation. As noted, writing an equivalent Julia one-liner also avoids any issues.

A proper equivalent would be np.divide(x, np.linalg.norm(x, axis=1, out=x), out=x), where the out= parameter for norm is hypothetical. As you can see, an implementation of either function which doesn’t account for aliasing will exhibit the same behaviour as the Julia in-place broadcast. Thus, I see syntax confusion as only part of a larger picture. Regardless of what syntax is used, possible aliasing footguns like this should be caught at runtime and reported to the user, ideally with an explanation of how to avoid them.
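Until such a check exists, one way to sidestep the footgun (a sketch, not the only option) is to make the aliasing impossible by reading from a copy:

```julia
using LinearAlgebra

x = [3.0 4.0; 6.0 8.0]   # row norms are 5 and 10
# copy(x) is an ordinary (non-dotted) call, evaluated once before the
# fused loop starts, so every norm is taken over the original rows.
x ./= norm.(eachrow(copy(x)))
```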

No disagreement there. I think it would help to really emphasize in the docs that . inserts a loop around all dotted operators and = is not special. Once people start seeing a wrapping for ... end in their heads every time this syntax comes up, there should be far fewer surprises.


I cannot understand how this would be the expected behaviour. Function calls are also known (in most, if not all, other languages) to work like parentheses: the full function is evaluated before a result is returned. However, loop fusion (which is the reason the broadcast notation exists) already breaks that assumption, because it does not evaluate the innermost function call over all of its input (followed by the second-innermost over all of its input, and so on); instead it interleaves the calls element-wise. So in f.(g.(list)) you can have calls to f happening before all calls to g have happened, and calls to g happening after some calls to f already did.

In other words, if you understand how broadcast works over functions, then it should be no surprise how it works over parentheses.

It only “works in-place” in the sense that you are not explicitly allocating an intermediate array before copying into the original array; but for it to work (i.e., not change the original until the end of the procedure), it is clear that an intermediate array is being created by Julia. In fact, this expression just implicitly creates an unnamed intermediate array before copying it into the original array.

I have to say I’m a bit surprised by the following behavior:

julia> function allocate_vec()
           println("Allocating vector")
           [1, 2, 3]
       end

julia> function return_1(x)
           println("Returning 1")
           1
       end

julia> x = rand(3, 2);

julia> x ./ return_1.(allocate_vec())
Allocating vector
Returning 1
Returning 1
Returning 1
Returning 1
Returning 1
Returning 1
3×2 Matrix{Float64}:
 0.174193  0.50535
 0.798471  0.323255
 0.205496  0.281456

Let’s say return_1 was a very slow function. What’s the best practice here for getting it not to evaluate 6 times? wrap in an identity call?
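One answer, sketched with a call counter (slow and CALLS are illustrative names, nothing from Base): any non-dotted wrapper, identity included, forces the inner broadcast to be materialized once before the outer one runs.

```julia
const CALLS = Ref(0)
slow(v) = (CALLS[] += 1; v)   # stand-in for an expensive elementwise function

x = rand(3, 2)
v = [1, 2, 3]

CALLS[] = 0; x ./ slow.(v);           nfused = CALLS[]   # fused into the 3×2 loop
CALLS[] = 0; x ./ identity(slow.(v)); neager = CALLS[]   # slow.(v) materialized first
```

Binding the inner result to a variable first (val = slow.(v); x ./ val) has the same effect, at the cost of one intermediate array either way.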

EDIT: I guess the right mental model for dot-fusion is

  1. Recursively search through the expression for calls marked with `.`, collecting the underlying non-broadcasted objects.
  2. Determine the shared output shape X for all of those objects.
  3. Call each function marked with `.` once per element of the (non-materialized) shape X.

@pdeffebach Another good point… so I guess what I’m learning here is that chained broadcasting breaks referential transparency all over the place w.r.t. both outcome and performance. I’ve used the pattern you just noted many many times and didn’t realize it was probably hurting performance.

tbh, I thought this was the whole point of the @. macro: to fuse broadcast calls together. I had always just seen dot-broadcast as sugar for map (which would allocate at every intermediate step!)


Then an expression like this:

(x .+ 1) .* 2

would also have to allocate an extra temporary array (so two allocations in all). You are basically saying you don’t want dot broadcasting to fuse at all, in any situation.
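That trade-off is measurable; a rough sketch (allocation numbers vary with array size, but the ordering is stable once both functions are compiled):

```julia
fused(x)   = (x .+ 1) .* 2          # one fused loop, one result array
unfused(x) = (t = x .+ 1; t .* 2)   # explicit temporary, then a second array

x = rand(10_000)
fused(x); unfused(x)                # compile both before measuring
a_fused   = @allocated fused(x)     # roughly one array's worth
a_unfused = @allocated unfused(x)   # roughly two arrays' worth
```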

Can you stop saying this, please. The answer Julia calculates is perfectly correct. It’s just needlessly inflammatory to keep repeating this.

This calculation is in fact not in-place. It allocates a temporary array.


I view it as incorrect. I think it takes a pretty tortured interpretation of = to see it as anything else. But I would rather be constructive than inflammatory so I will stop using the word.

When I say in-place I mean it mutates the array pointed to by that variable, I don’t mean it doesn’t allocate any new memory.

But you don’t have any problems with this:

x = 2
x = 3

How can 2 equal 3? Because = means something different in mathematics and in programming. You got used to the extremely weird meaning of = in programming, so it should be possible to adjust to .= being different.

Is this in-place, then?

temp = x ./ norm.(eachrow(x))
x[:, :] = temp

That’s what’s actually going on. I would not call that in-place, it’s a fully out-of-place operation, followed by a copy operation. Wikipedia, at least, agrees:

In computer science, an in-place algorithm is an algorithm which transforms input using no auxiliary data structure. However, a small amount of extra storage space is allowed for auxiliary variables. The input is usually overwritten by the output as the algorithm executes. An in-place algorithm updates its input sequence only through replacement or swapping of elements. An algorithm which is not in-place is sometimes called not-in-place or out-of-place.

The “small amount of extra storage space” is not compatible with allocating a full copy of the data.

These two sentences seem to be contradictory. Yes, the point is to fuse broadcast calls. And no, for that reason it cannot be the same as a sequence of maps.

I’m a bit stumped. It seems like in-place broadcasting is not for everyone. Now you know that you should use = instead of .=. Problem solved?
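For completeness, the = version really is free of the aliasing problem, because the RHS is materialized into a fresh array before the name is rebound:

```julia
using LinearAlgebra

x = [3.0 4.0; 6.0 8.0]      # row norms are 5 and 10
# The division writes into a brand-new array while reading the old x;
# only afterwards is the name x rebound to that new array.
x = x ./ norm.(eachrow(x))
```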


To advance the conversation, it may be of interest to see how the initial expression here is lowered.

julia> begin
           using Random, LinearAlgebra
           y2 = x = randn(6,4)
           Meta.@lower y2 .*= (1 ./ LinearAlgebra.norm.(eachrow(y2)) )
       end
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope`
1 ─ %1  = y2
│   %2  = *
│   %3  = y2
│   %4  = /
│   %5  = Base.getproperty(LinearAlgebra, :norm)
│   %6  = eachrow(y2)
│   %7  = Base.broadcasted(%5, %6)
│   %8  = Base.broadcasted(%4, 1, %7)
│   %9  = Base.broadcasted(%2, %3, %8)
│   %10 = Base.materialize!(%1, %9)
└──       return %10
))))

The issue, as has been noted above, is that .= is only superficially similar to =. It is in fact quite different. It’s not an assignment, it’s an in-place mutation of a pre-existing variable.

The .= symbol is actually syntax sugar for a call to broadcast!. So this,

x .+= y

is equivalent to this,

broadcast!(+, x, x, y)

as we can see by running the latter in the REPL:

julia> x = [1, 2, 3]; y = [1, 2, 3];

julia> broadcast!(+, x, x, y);

julia> x
3-element Vector{Int64}:
 2
 4
 6
my answer is: yes, an allocation should be made for (x .+ 1) first. Then this vector could be modified in place by .* 2. So, one allocation in total.

More importantly, in this manner the conventional meaning of ( ) is well preserved.

Note that following conventions makes a programming language easier to learn & understand and thus more popular.

What I would like to suggest is: introduce a new kind of bracket that broadcast is allowed to fuse into, e.g. {{ and }}. I believe this kind of explicit distinction would help reduce a lot of bugs and confusion. It would be a subtle but important language improvement, maybe in 2.0?


I think the lesson from this thread is that the precise behavior of broadcasting for confusing cases like x ./= norm.(eachrow(x)) is not very well documented in the manual. We could use a dedicated page that really explains the logic by which various broadcasting expressions should be interpreted.

For instance, I’m still trying to figure out how to translate

x ./= norm.(eachrow(x))

into a call to broadcast!. Ten internet points to anyone who figures it out.
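One hedged attempt at the ten points: broadcast!(f, dest, args...) boils down to materialize!(dest, broadcasted(f, args...)), and an argument that is already a lazy Broadcasted nests into the fused loop rather than being materialized. So the following appears to reproduce the fused (aliasing) behavior:

```julia
using LinearAlgebra

x = [3.0 4.0; 6.0 8.0]
y = copy(x)

# Passing a lazy Broadcasted argument keeps norm inside the fused loop:
broadcast!(/, x, x, Base.broadcasted(norm, eachrow(x)))
y ./= norm.(eachrow(y))   # the original expression, for comparison
```

If instead you pass the eagerly computed broadcast(norm, eachrow(x)), you get the safe, unfused result.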


I agree. I also tended to equate the dot-fusion approach and the map approach, because in typical cases they produce the same output, but when the same variable appears on both the left and the right they clearly do not. This has definite “gotcha” potential (as evidenced by this thread), so it should maybe be stated more explicitly and more often.