I just want to point out that only computer scientists know the meaning of the word “aliasing”, so we should use other wording in the documentation.
well, not if x
and y
alias you can’t!
related docs pr: [docs] add performance tip concerning overly-fused broadcast loops by adienes · Pull Request #49228 · JuliaLang/julia · GitHub and previous discussion Unexpected broadcasting behavior involving eachrow()
I feel that if my post had been read more closely, including the prompt I quoted immediately above that statement, one would recognize my statement had nothing to do with aliasing but rather motivated the existence of mutating reduction functions. Also note that in my examples I created non-aliasing variables for this purpose.
That said, so long as x
is row-aligned with y
, x .= sum.(eachrow(y))
still definitely works. For example, x = view(y,:,rand(axes(y,2)))
will work flawlessly. This is because the entire row is read by the non-mutating sum
, the result is computed, and only then is a value in that row overwritten. This is the same reason that map!(foo,x,x)
and x .= foo.(x)
still work despite aliasing. It’s when aliasing creates an order-dependent calculation (basically a race condition) that one gets in trouble.
I agree that the story for the !
functions can be very different than that for the rest of Julia — and I think that there we should document it.
And I think that this is very different from the “implicit” !
s: indexed and broadcasted assignment. For example, the following used to segfault Julia (and still might for some custom array types):
C = [2, 1]
C[C] .= -2^30
This is where it’s very important to firm things up; I’m trying to resurrect some of my old work on this in #26237.
I would go as far as saying that the terminology in the pending PR doc is also CompSci speak:
target
A
must not share memory with the sourceX
or the indicesinds
, otherwise the result is undefined.
It’s better, an improvement, I know what it (and aliasing) means but do scientists? I.e. “share memory”. As opposed to e.g. “SMP” (which is good). And do they really know what “undefined” actually means [here]…
I support the doc fix, or even better wording, as a stop-gap, but I want aliasing problems gone, and thus not mentioning shared memory (aliasing) bad; and later also get rid of mutable arrays and structs…
Exactly, and I checked and confirmed sum! is not used by Julia itself (and that likely goes for most ! functions), so arguably should have been left out, what I meant by why have it “at all”.
ALL ! functions are dangerous (mutability, i.e arrays if you write to them… and mutable structs in general…), even without aliasing, but I don’t want them extra dangerous with aliasing. I.e. we need e.g. push! (and sort!) and we’ve trained ! is normal in Julia, not dangerous, well actually, no more dangerous than in other mainstream-languages. push! (and sort!) and all mutability (arrays etc) is a problem, i.e. Rick Hickey’s dangerous place-oriented programming idea (PLOP) most languages use, rather than persistent-data structures (as Clojure).
I would have thought the opposite. It’s possible we disagree on what large is, with allocations it implies scanning memory twice, two O(n), still O(n). For small n however you could stack allocate (but still scan twice).
My proposal with allocating (malloc/free) was to eliminate the aliasing problem, relatively simple PR for sum! (and other similar) I could even do if people are ok with allowing allocation. In a later update the allocations could probably be optimized away. I’m not sure if that’s always possible or just in the common case. And my using aliasing_ok
idea would also be possible with or as a complement.
I would also like to not that having completely non-allocating functions is necessary for things like embedded systems and real-time systems, so even if the difference in performance is not very relevant, these non-allocating alternatives should exist.
Yeah, I’d favor “1” or “3”. I do not like “2”, because there may be intentional behavior users get from aliasing:
julia> function add!(c,a,b)
for i = eachindex(c,a,b)
c[i] = a[i] + b[i]
end
end
add! (generic function with 1 method)
julia> x = rand(44); y = copy(x); z = cumsum(x);
julia> add!(@view(y[2:end]), @view(y[2:end]), @view(y[1:end-1]))
julia> y ≈ z
true
julia> @view(x[2:end]) .= @view(x[2:end]) .+ @view(x[1:end-1]);
julia> x ≈ z
false
julia> x'
1×44 adjoint(::Vector{Float64}) with eltype Float64:
0.929882 1.68613 1.5451 1.12712 0.583532 … 1.28532 1.47994 1.48781 1.17836
julia> y'
1×44 adjoint(::Vector{Float64}) with eltype Float64:
0.929882 1.68613 2.47499 2.81326 3.05852 … 20.8214 21.5226 22.3092 22.7009
julia> z'
1×44 adjoint(::Vector{Float64}) with eltype Float64:
0.929882 1.68613 2.47499 2.81326 3.05852 … 20.8214 21.5226 22.3092 22.7009
in which case they are expecting a specific order.
If the intent is preallocation, then having a temporary allocated to copy over to the preallocation was almost certainly not intentional, so throwing an error is better so users become aware.
Not exactly. If a broadcasted operation has >1 outputs, the out
must have that many output arrays, and it can’t do anything about you providing aliased output arrays. However, if you provided inputs aliased with outputs, it will use a heuristic to see if that likely deviates from the unaliased case. If so, intermediate output arrays are allocated and written to during broadcasting, after which the results are copied over to the provided output arrays. There’s no way to opt into aliased-input-output broadcasting in NumPy.
Why is that? C[C] .= -2^30
should be (and is) equivalent to the explicit broadcast!(identity, @view(C[C]), -2^30)
, right? @view(C[C])
doesn’t result in the catastrophic C[-2^30] = -2^30
iteration because the particular view
method unalias
es the indices from the parent array:
julia> out = @view(C[C])
2-element view(::Vector{Int64}, [2, 1]) with eltype Int64:
1
2
julia> out.parent === C, out.indices === C
(true, false)
julia> out[1] = -2^30; out
2-element view(::Vector{Int64}, [2, 1]) with eltype Int64:
-1073741824
2
The aliasing issue bears resemblence to the overflow issue in integers. The initial no-check overflow decision in the name of performance, would drive a similar no-check decision for aliasing.
One method of solving the overflow issue is using a “safe” type in GitHub - JeffreySarnoff/SaferIntegers.jl: These integer types use checked arithmetic, otherwise they are as system types. . A similar approach can be taken with aliasing and have a type which would enable the checks and raise the warnings/errors in the necessary functions. Having such a type might also allow further optimizations down the line and allows programmers to express their desired level of safety even before a safer implementation is provided.
Admittedly, this may be a bit cumbersome and require more effort for an ecosystem-wide fix.
That’s exactly my point — the paths that follow from non-!
usages (like .=
and views) are the ones that should do unaliasing for you (or at least try to). I see this as different from explicitly using a !
function for several reasons:
- You didn’t write a
!
. The!
s are there for a reason — and it’s to alert you to exactly this sort of funny stuff that might happen. That’s why I really like just documenting the!
functions as @gdalle has undertaken in #50824. - Some of these usages do up-front checkbounds of arrays of indices that the caller can control assuming that they don’t change… and then use
@inbounds
. That makes this especially pernicious and dangerous in a way thatmap!
orsum!
isn’t. - They’re more common usages with syntaxes directly built-in to the language.
That’s exactly my point — the paths that follow from non-
!
usages (like.=
and views) are the ones that should do unaliasing for you (or at least try to).
The unaliasing is really the work of (earlier mentioned example):view
on its inputs, not the broadcasting on its inputs vs outputs. In-place broadcasting will happily give alias-affected results, even if the operation isn’t reduction
julia> x = ones(Int, 4); println(map!(+, @view(x[2:4]), @view(x[2:4]), x[1:3]), x) # unproblematic aliasing
[2, 2, 2][1, 2, 2, 2]
julia> x = ones(Int, 4); println((@views map!(+, x[2:4], x[2:4], x[1:3])), x) # aliasing makes iterations affect each other
[2, 3, 4][1, 2, 3, 4]
julia> x = ones(Int, 4); println(broadcast!(+, @view(x[2:4]), @view(x[2:4]), x[1:3]), x)
[2, 2, 2][1, 2, 2, 2]
julia> x = ones(Int, 4); println(@view(x[2:4]) .+= @view(x[1:3]), x)
[2, 2, 2][1, 2, 2, 2]
I’m not entirely sure about the discrepancy there, it’s not simd as I first guessed, I overwrote So broadcasting itself, not the view steps, does unalias its arguments from the destinations, the relevant lines areBroadcast.copyto!
without @simd
and the result didn’t change.
broadcast_unalias(dest, src) = dest === src ? src : unalias(dest, src)
...
preprocess(dest, x) = extrude(broadcast_unalias(dest, x))
I can make broadcasting act like map!
if I overwrite @eval Broadcast preprocess(dest, x) = extrude(x)
.
I’m not entirely sure about the discrepancy there, I’m guessing it’s because broadcasting is
@simd
andmap!
isn’t.
This has nothing to do with @simd
— it’s exactly the unaliasing we’re talking about here.
map!
and broadcast!
on a non-view x[2:4]
will of course not affect x
; they mutate the returned copy from x[2:4]
. So that’s why those are [1,1,1,1]
.
The [1, 2, 2, 2]
is what you’d expect from adding [1,1,1] + [1,1,1]
and storing it in the last three values of the array. That’s what happens when the input is unaliased from the output (as broadcast!
and .+=
does).
The [1, 2, 3, 4]
is what you’d expect from an “alias-affected” result — you’re adding one, then adding the result of that to the next location, and then again.
I just realized that when I removed the @views
from the example to unalias, printing the original array x
was pointless for comparison. Edited the example, still wrapping my head around it. figured it out.
You can see the defensive copies being made in the allocations reported by @time
:
julia> f1(x) = (map!(+, @view(x[2:4]), @view(x[2:4]), x[1:3]); nothing);
julia> f2(x) = ((@views map!(+, x[2:4], x[2:4], x[1:3])); nothing);
julia> f3(x) = (broadcast!(+, @view(x[2:4]), @view(x[2:4]), x[1:3]); nothing);
julia> f4(x) = (@view(x[2:4]) .+= @view(x[1:3]); nothing);
julia> f1(x); @time f1(x) # allocation for the written non-view x[1:3]
0.000003 seconds (1 allocation: 80 bytes)
julia> f2(x); @time f2(x) # no allocations; input aliases the output
0.000003 seconds
julia> f3(x); @time f3(x) # allocation for the written non-view x[1:3]
0.000003 seconds (1 allocation: 80 bytes)
julia> f4(x); @time f4(x) # allocation because broadcast made a copy of the `view(x, 1:3)`
0.000003 seconds (1 allocation: 80 bytes)
It’s worth noting that broadcast didn’t make a copy of the input argument view(x, 2:4)
in either f3 or f4. It identified that it is exactly the output and thus the memory access pattern will exactly match between input and output so you shouldn’t have results clobbering inputs. This is a hugely important performance optimization… but it turns out it has some holes in it. I should revitalize #31407 as well.
I don’t know if that would be considered a breaking change (because of performance drawbacks), but, besides just documenting and letting the user decide, other alternative could be that the in-place functions always copy the output before overwriting.
The !
does not mean “non-allocating”, it means that the output array will be mutated. We have some BLAS
routine calls that are mutating and allocate intermediates, for example. The unsafe version of these functions (non-copying) could receive a new name, or a type-parameter, or even don’t exist, given that for that level of performance optimization the user can just write a custom function, loop, or whatever, using the known information of the provided data structure optimally.
It’s worth noting that broadcast didn’t make a copy of the input argument
view(x, 2:4)
in either f3 or f4. It identified that it is exactly the output and thus the memory access pattern will exactly match between input and output so you shouldn’t have results clobbering inputs.
That logic being in this broadcast-specific unalias:
# v this aliasing ok v handle this problem
broadcast_unalias(dest, src) = dest === src ? src : unalias(dest, src)
That unalias-elision does not affect results for elementwise operations (without repeated aliasing in the destination, that is), but it does for operations with a reduction step. Have seen a thread complaining about .=
not handling dest === src
when the fused operation had a reducing sum
step.
This is a hugely important performance optimization… but it turns out it has some holes in it.
Just goes to show that unaliasing can be more complicated than just checking mightalias
at the start and allocating temporary outputs. I also noticed that broadcast-unaliasing is making temporary copies of the inputs rather than the outputs/destination, which I’m assuming is because a broadcasted output often is much larger than its inputs e.g. a MxN >> M+N. I bet NumPy has specific unaliasing considerations for its functions, too, and they don’t need a generic fallback because it’s impossible to make custom types and it’s difficult to just make custom elementwise functions. It’s not like Julia where I could (and did) tweak the broadcasting mechanism to factcheck myself.
…other alternative could be that the in-place functions always copy the output before overwriting.
…
The unsafe version of these functions (non-copying) could receive a new name, or a type-parameter, or even don’t exist, given that for that level of performance optimization the user can just write a custom function, loop, or whatever
I’d rather not have to reimplement a non-copying function by hand, and it doesn’t seem worth replacing an “unsafe” version with an unaliasing version, when an output-allocating version usually already exists like sum(A; dims)
for sum!(r, A)
. Throwing away performance for redundant safety seems backwards, like asking for @inbounds
to check bounds when you accidentally wrote out-of-bounds code.
Depending on the function, it could usually be cheaper to copy the inputs instead, or some kinds of aliasing is ok; broadcast!
needed its own broadcast_unalias
to do that. A generic unaliasing would be useful (well, the copying step would still be specific to the array type), but it’d throw away a lot of function-specific unaliasing optimizations, just like the copying versions. On the other hand, function-specific unaliasing would be a lot of work, and a guide on how to implement it for custom !
functions would probably be a lot more complicated than a guide on manual alias-checking/avoidance.
In my experience, people usually instinctively do the latter by preallocating zeroed output arrays entirely separate from their inputs, and they’d copy from output to input if they need to e.g. input .= output; sum!(output, input)
. The sum!(output, output)
issue would only show up when someone is going overboard with optimization.