What's the aliasing story in Julia

So let’s say I’m looking at a function that takes multiple mutable arguments, say any!, extrema!, sortperm!, sum!, copyto!, or mul!. Who is responsible for handling aliasing? Am I ever allowed to pass in two objects that share memory, and what can I expect if I do?

(c.f. aliasing in Julia · Issue #8087 · JuliaLang/julia · GitHub, sum!, prod!, any!, and all! may silently return incorrect results · Issue #39385 · JuliaLang/julia · GitHub, mitigate in-place reducedim alias bug by Moelf · Pull Request #45821 · JuliaLang/julia · GitHub)

4 Likes

Just noting in advance before the conversation inevitably gets heated that the following two things can simultaneously be true:

  • Aliasing input arguments to mutating functions opens the door to surprising behaviors, and ultimately the onus is on the user to ensure their code is doing what they think it is
  • Julia’s aliasing story can almost certainly be improved and the language provides very few safeguards against encountering (or documentation for) said surprising behavior

I think a reasonable benchmark might be Numpy with its out= keyword on many functions, e.g.

>>> np.add(x, x, out=x)
10 Likes

In my experience, it is sometimes taken into account and sometimes not. I think the best way forward would be to go through each function one by one and add tests that it works correctly with aliasing.

For those functions that cannot be implemented efficiently for aliasing arrays, a warning could be added to the docs, and Base.mightalias (which is efficiently implemented) can be used to check for aliasing at runtime and throw an exception.
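As a sketch of that runtime-check pattern, a wrapper like the following could guard sum! with Base.mightalias (checked_sum! is a made-up name for illustration, not an existing API):

```julia
# Hypothetical alias-checking wrapper around sum!; throws instead of
# silently producing wrong results when dest and src may share memory.
function checked_sum!(dest::AbstractArray, src::AbstractArray)
    Base.mightalias(dest, src) &&
        throw(ArgumentError("destination and source may share memory"))
    return sum!(dest, src)
end

x = [1 2; 3 4]
d = zeros(Int, 2, 1)
checked_sum!(d, x)      # fine: d now holds the row sums [3; 7;;]
# checked_sum!(x, x)    # would throw an ArgumentError
```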

4 Likes

We likely need to update the interfaces documentation to say that types which wrap arrays and allow mutation should also overload Base.mightalias, since the wrapper arrays in the stdlib get it right but the ecosystem doesn’t all get it right: OffsetArrays.jl does, but NamedDims.jl doesn’t.

julia> x = [1,2,3];

julia> Base.mightalias(x, x)
true

julia> Base.mightalias(x, view(x, :))
true

julia> Base.mightalias(x, x')
true

julia> Base.mightalias(x, view(x', :))
true

julia> using OffsetArrays; Base.mightalias(x, OffsetArray(x))
true

julia> using NamedDims; Base.mightalias(x, NamedDimsArray{(:foo,)}(x))
false

Still, this seems a reasonable request to put on package authors, assuming mightalias is indeed the correct function to use for this.

1 Like

Numpy’s aliasing detection is indeed quite interesting — but it’s also a wildly different class of problem than what we could potentially address as their rough analog of an “abstract array” is the strided array interface. I looked at what they do as a reference when implementing the beginnings of our “aliasing story”, but as you might expect their detection is explicitly targeted at (and limited to) strided arrays. The trade-off is, of course, that they can’t represent non-strided arrays, so sometimes things are views and sometimes things aren’t.

They say this in their docs:

A potential drawback is that writing to a view can alter the original as well. If this is a problem, NumPy instead needs to create a physically distinct array – a copy.

Some NumPy routines always return views, some always return copies, some may return one or the other, and for some the choice can be specified. Responsibility for managing views and copies falls to the programmer.

I couldn’t find any documentation about aliasing (or sharing memory) in the context of out= kwargs, and I don’t see them trying to detect aliasing in such cases, but I may be missing it.

4 Likes

The correct function to overload to opt-in to having a custom array participate in aliasing detection is Base.dataids.
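For a custom wrapper type, that might look like the following sketch (MyWrapper is a made-up type for illustration; forwarding Base.dataids to the parent is what lets Base.mightalias see through the wrapper):

```julia
# Minimal mutable array wrapper that participates in alias detection.
struct MyWrapper{T,N,P<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::P
end
Base.size(w::MyWrapper) = size(w.parent)
Base.getindex(w::MyWrapper, i...) = w.parent[i...]
Base.setindex!(w::MyWrapper, v, i...) = (w.parent[i...] = v)

# Without this overload, Base.mightalias(x, MyWrapper(x)) returns false,
# because the fallback dataids is based on the wrapper's own objectid.
Base.dataids(w::MyWrapper) = Base.dataids(w.parent)

x = [1, 2, 3]
Base.mightalias(x, MyWrapper(x))   # true, thanks to the dataids overload
```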

4 Likes

Having every function that accepts multiple mutable arguments and wants to assume they are not aliased call mightalias seems like a bad solution: mightalias has false positives and false negatives, adds runtime overhead, and is verbose.

I like the numpy policy of “Responsibility for managing views and copies falls to the programmer.” coupled with a good way to document (maybe formally) when it is acceptable for arguments to alias each other. Perhaps we can typically assume that aliasing is okay among arguments that are not to be mutated and not okay between an argument that is to be mutated and any other argument unless documented otherwise? With that default norm, #39385 is not a bug at all.

10 Likes

Just note that it’s not a simple dichotomy: map!(f, x, x) works but @views map!(f, x[(begin + 1):end], x[begin:(end - 1)]) does not (per https://github.com/JuliaLang/julia/pull/50824#issuecomment-1668634706).
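To illustrate that distinction, and under the assumption that map! fills the destination in index order (an implementation detail), the fully aliased call is fine while the shifted overlapping views are not:

```julia
# Fully aliased: each element depends only on itself, so this is safe.
x = collect(1:5)
map!(y -> 2y, x, x)
x == [2, 4, 6, 8, 10]      # true

# Partially overlapping, shifted views: each write clobbers an element
# that a later iteration still needs to read.
x = collect(1:5)
@views map!(identity, x[2:end], x[1:end-1])
# On current Julia this leaves x == [1, 1, 1, 1, 1],
# not the intended shifted copy [1, 1, 2, 3, 4].
```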

1 Like

That seems like it’ll involve a whole lot of other problems. These in-place versions of methods exist because we want no allocations. Numpy allocates copies to ensure the results are the same as results from unaliased inputs, which is more complicated than just copying upon alias detection, and may allocate needless copies because it’s expensive to fully prove:

Operations where ufunc input and output operands have memory overlap are defined to be the same as for equivalent operations where there is no memory overlap. Operations affected make temporary copies as needed to eliminate data dependency. As detecting these cases is computationally expensive, a heuristic is used, which may in rare cases result in needless temporary copies. For operations where the data dependency is simple enough for the heuristic to analyze, temporary copies will not be made even if the arrays overlap, if it can be deduced copies are not necessary. As an example, np.add(a, b, out=a) will not involve copies.

Since alias-checking isn’t guaranteed to be cheap, I’d rather there be a method that doesn’t check. If I’m repeating a call in a loop (results can change with mutated inputs), I could elect to check aliasing once prior to the loop to save time. My first thought is adding a keyword argument defaulting to no checks, but I’m not sure: checking true/false may not be cost-free, and dispatching on singleton types can be type-unstable if I vary whether I check or not.

1 Like

But does anyone have a clue what that list of functions is? I found some in the PR below, but I can’t go through every fun! in the codebase, so I’m afraid the list is rather partial:

1 Like
julia> filter(map(String, names(Base))) do i
           endswith(i, '!')
       end

This list has only 74 elements - and many of them such as ! and take! can be disregarded straight away. It’s not complete, because some functions are hiding in submodules of Base, and some are not exported but documented nonetheless. Anyway, it’s a good place to start.
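A rough sketch of extending that scan to a few submodules (the submodules chosen here are just examples, and names(m; all = true) would additionally surface unexported bindings):

```julia
# Collect exported names ending in '!' from Base and some submodules.
mutators = String[]
for m in (Base, Base.Iterators, Base.Threads)
    append!(mutators, filter(endswith('!'), map(String, names(m))))
end
sort!(unique!(mutators))   # e.g. contains "map!", "sum!", "push!", ...
```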

5 Likes

Would broadcast! be an important one? It should have the same issue as map! pointed out before, and I recall another aliasing frustration thread proposing automatic intermediate copies because they were broadcasting a reduction over axes including previous iterations. The tricky thing is that in-place broadcasting is more often done with .= or .+= expressions, which are lowered to other function calls rather than broadcast!, and it’s a function without a !, broadcasted, that would need an alias check.

I was thinking about having a version without aliasing checks and a version that checks for aliasing and then calls the first one. But then it occurs to me: can alias-handling even be a general call? As pointed out, sometimes aliasing doesn’t cause a different result from nonaliased inputs, depending on the function (and I can’t rule out the input types), and NumPy uses a heuristic to reduce some (not all) unnecessary copying rather than a straightforward deep copy upon detecting aliasing. But it doesn’t seem good to implement alias handling by call signature in a language with easy custom functions, types, and composition.

So I’m thinking maybe we can get away with simple deep copies upon detecting aliasing; at least that doesn’t depend on the function, though the alias check does depend on the input types. If incorporating this into alias-handling versions is too expensive (like when the call is in a loop), maybe it should just stay a separate manual step like sum!(unalias(a, a)...) or a1, a2 = unalias(a, a); for _ in 1:1000 sum!(a1, a2) end. This should probably exist anyway as an option if alias-handling versions don’t exist, yet or ever.
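For what it’s worth, Base already contains internal machinery along these lines (undocumented, so subject to change): Base.unalias(dest, A) returns A unchanged, or a copy when A might alias dest.

```julia
# Base.unalias copies only when needed (internal API, sketch only).
dest = zeros(3)
src = view(dest, :)
src2 = Base.unalias(dest, src)       # a copy: safe to read while writing dest
src2 === src                          # false, because they might alias

other = ones(3)
Base.unalias(dest, other) === other   # true: no aliasing, so no copy is made
```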

A tricky aspect of the first call is how to specify which results not to copy. It’s usually the first argument, but a mutating function can write to more than one argument. If multiple destinations alias, it doesn’t even make sense to write results to them. Here’s a NumPy example of what happens if result arrays are aliased:

import numpy as np

# make a (slow) +- ufunc that broadcasts over arguments
def plusminus(x,y): return (x-y,x+y)
uplusminus = np.frompyfunc(plusminus, 2, 2) # 2 in, 2 out

# write 1+-1 to 2 aliased vectors
a = np.zeros(3)
a01 = a[:2]
a12 = a[1:]
uplusminus(1, 1, out = (a01, a12))
print((a01, a12))
# (array([0., 2.]), array([2., 2.]))

# write 1+-1 to 2 non-aliased vectors
b01 = np.zeros(2)
c01 = np.zeros(2)
uplusminus(1, 1, out = (b01, c01))
print((b01, c01))
# (array([0., 0.]), array([2., 2.]))
#             ^ in the aliased case above, this element was overwritten with 2.

As much as NumPy copies to avoid the effects of aliasing, it can’t do anything if you specify aliased result arrays.

1 Like

What is the correct behavior? Semantically writing to a temporary, and then copying it over when done?

2 Likes

That’s the only way to make it correct unfortunately.
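As a sketch of that temporary-then-copy pattern for sum! (safe_sum! is a made-up name, not an existing API), with a fast path when no aliasing is detected:

```julia
# Writes row sums of src into dest; falls back to a temporary when
# dest and src might share memory.
function safe_sum!(dest::AbstractArray, src::AbstractArray)
    if Base.mightalias(dest, src)
        tmp = similar(dest)
        sum!(tmp, src)
        copyto!(dest, tmp)   # one temporary allocation, but correct results
    else
        sum!(dest, src)      # fast path: no allocation
    end
    return dest
end

x = [1 2; 3 4]
safe_sum!(view(x, :, 1), x)   # aliased, yet still yields the row sums [3, 7]
```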

1 Like

One of them, yes. They expect to write to preallocated outputs as if the inputs do not change, but inputs do change between iterations when aliased with outputs. This affects results if overwritten iterations are used in subsequent iterations, so reducing operations are more susceptible. Aliasing unchanging inputs with each other would not affect results, so people don’t really care about that. People seem to lean toward one of these three:

  1. throw an error to stop them from aliasing inputs with outputs,
  2. automatically decouple the inputs from the outputs, whether by copying the aliased inputs or by writing to intermediate outputs,
  3. find and document the mutating methods affected by aliasing, giving the responsibility to the user. Maybe we can have a function to facilitate and standardize de-aliasing.

A frequent argument against (2) is that if we’re using mutating methods, we’re probably going for zero allocations, and there are usually (not always) nonmutating versions that allocate outputs we can reassign to variables. I see a couple of problems: a) (1) and (2) stop us from doing weird aliased iterations on purpose to write a loop more simply; b) (2) may mislead people into thinking all aliasing is handled automatically, but it can’t do anything about specified outputs aliasing each other. If we must manually unalias outputs from each other, why not unalias outputs from the inputs while we’re at it?

5 Likes

After following this discussion, I am somewhat confused about the purpose of these mutating methods. It seems to me that some people just use them whenever they want the result written back to some variable, which they could achieve with an (in-place) assignment. Then sometimes stuff breaks because inputs were aliased, and that’s why they want the changes outlined above.

For me these mutating methods are something I apply when optimizing code. I essentially never use them initially or when writing one-shot code. That’s my understanding of these functions: they are more low-level and useful for optimization. So adding additional, maybe expensive, maybe imperfect, checks kind of defeats their purpose. I’d say, if you want safety, just use the normal version and only touch the “!” version if you really need to.

If this understanding is correct, then I think we should go the documentation/education route. Teach people that the “!” means danger and fewer safety nets, and should not be used by default.

17 Likes

I do agree with being confused by this discussion. One should always assume aliasing is not allowed if a mutable object is passed to more than one parameter (where at least one of them modifies it). It is the responsibility of the caller to guarantee this never happens, and the responsibility of an implementer of a function that allows aliasing to make this clear in the documentation. There is little sense in trying to make every function deal with this, as detection is not even guaranteed to be possible (distinct objects can hold the same mutable object in some field which is changed by a function).

4 Likes

I’m wondering whether we could actually do that, not just semantically (i.e. make an actual allocation; it could be thrown away immediately afterwards so as not to stress the GC). You would think people using and seeing sum! (and mul!) etc. would know about mutability, and maybe aliasing, but that’s probably not true. Scientists are not necessarily computer scientists.

I didn’t look into all these functions, but why do we even have sum! at all? It seems unneeded, semantically redundant with sum. I think we could have:

using aliasing_ok

and it would override sum! etc. to the old behavior (or even add these functions back in case we were to consider 2.0 and excision), and CompSci people could opt into that once at the top of their program. If scientists do that too, then they should know about, and consent to, what they are doing… We could name that module differently, or have it as a function, enable_scary_aliasing()…

I see at NumPy for out= (for np.add):

A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned.

It seems that you need to opt into aliasing there that way. I think we could do that too, with both the local and the global option.

One definition of what should be in Base (or anywhere in Julia) is what Julia itself needs. sum! doesn’t fit the bill (does even sum?), and is only in for historical reasons. [Neither does LinearAlgebra, which I’ve excised privately in my personal “Julia 2.0” for 8% faster startup thanks to a smaller sysimage; using LinearAlgebra recovers all compatibility.]

I also share the point of view that it is sufficient to just add a warning to the docstrings asserting that the result is undefined when there is aliasing between the inputs and the output. If you want safety from aliasing effects, use the non-mutating versions. If you need performance and you are confident that your output array is not aliased with any of the input arrays, then go ahead and use the mutating version.

4 Likes

One could write x .= sum.(eachrow(y)) to sum each row of y and store the result to x. However, this traverses y row-wise, which is highly suboptimal due to constant cache misses. The sum! function reorders the calculations so that it can traverse the argument efficiently.

julia> using BenchmarkTools

julia> y = randn(500,500); x = randn(500,1);

julia> @btime x .= sum.(eachrow($y)); # bad for cache
  206.900 μs (1 allocation: 32 bytes)

julia> @btime x .= sum.(eachcol($y)); # good for cache
  31.300 μs (1 allocation: 32 bytes)

julia> @btime sum!($x,$y); # compare to eachrow
  32.100 μs (0 allocations: 0 bytes)

julia> @btime sum!($x',$y); # compare to eachcol
  31.500 μs (0 allocations: 0 bytes)

julia> @btime sum($y;dims=2); # compare to eachrow
  38.400 μs (5 allocations: 4.14 KiB)

julia> @btime sum($y;dims=1); # compare to eachcol
  32.600 μs (1 allocation: 4.06 KiB)

Is this a critical function? No. And the allocation has a pretty minimal impact on performance except at small sizes. But some users want an allocation-free, efficient sum, and it’s annoying to write oneself, so I guess we have sum!.

Personally, I don’t think I’ve ever used sum! and seldom use sum(_;dims) since Julia makes arrays-of-arrays so usable. But that’s my take on the reason in-place reduction functions exists.

2 Likes