I’m thinking about making a custom array type that defaults to creating a view when sliced. Is this a bad idea? Would it break things (if so, what things)?
For clarity, this question is definitely not meant to resurrect the debate about views for Array (Slices: should they default to views?, Arraypocalypse Now and Then · Issue #255 · JuliaLang/LinearAlgebra.jl · GitHub, Array slices as views - what's the state of affairs?). The question is only about having a custom array type for users who find that they need `@views` (almost) everywhere, find this inconvenient, and would like to explicitly opt in to views as the default.
More, maybe unnecessary, details: inspired by a comment on a question about removing bounds-checking on arrays in a more convenient way (Removing bounds checking for HPC - `--check-bounds=unsafe`? - #17 by greatpet) I started developing an InboundsArrays.jl package that makes all array accesses `@inbounds`. It seems like it might be simple to extend that package to (optionally, controlled by a type parameter) make slices return views by default, so I’m wondering if there are gotchas that are obvious to anyone before I start working on that feature.
The biggest risk with a views-by-default array is that the array “interface” (if somewhat informal) assumes that slicing an array with `getindex` creates a new array that does not alias the parent. Correctness issues will likely arise when this default-views array is passed to other code. It might be that in many places the new type is simply unusable.
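To make the aliasing assumption concrete, here is a minimal sketch using only Base. The variable names are just for illustration.

```julia
x = [1 2; 3 4]

s = x[:, 1]        # default getindex: an independent copy of column 1
v = @view x[:, 1]  # a view: shares memory with x

x[1, 1] = 100      # mutate the parent after taking both slices

println(s[1])  # the copy still holds the old value
println(v[1])  # the view sees the new value
```

Generic code written against `AbstractArray` usually relies on the behaviour of `s`, so handing it something that behaves like `v` can change results without any error.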
The issue with always-inbounds is that it turns code that people expect to be “safe” (have defined behavior) into undefined behavior without the usual hazard sign. I recall some discussion around `--check-bounds=false` also indicating that this might occasionally reduce performance due to interactions with compiler assumptions, but am not equipped to give a full explanation of that.
In general, it’s better to write code in a way that lets the compiler itself recognize accesses as inbounds (e.g., use `eachindex` to iterate arrays). You should only need `@inbounds` in somewhat-rare cases where the compiler “can’t” “reasonably” infer that an access is inbounds (and also where it’s actually performance sensitive). For example, indexing by the results of `findall` (although that is usually not performance-sensitive because it is unlikely to SIMD).
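A rough sketch of the two situations described above. The function names `sum_eachindex` and `sum_found` are made up for this illustration, and whether the bounds checks are actually elided in the first one depends on the Julia version and the array type.

```julia
# Iterating with eachindex: the indices come from the array itself,
# so no @inbounds annotation is needed in the source.
function sum_eachindex(x::AbstractVector{<:Number})
    s = zero(eltype(x))
    for i in eachindex(x)
        s += x[i]
    end
    return s
end

# Indexing with the result of findall: the indices are valid by construction,
# but the compiler generally cannot prove that, so @inbounds is a manual promise.
function sum_found(pred, x::AbstractVector{<:Number})
    idx = findall(pred, x)
    s = zero(eltype(x))
    for i in idx
        @inbounds s += x[i]
    end
    return s
end

println(sum_eachindex([1, 2, 3]))
println(sum_found(iseven, [1, 2, 3, 4]))
```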
As someone who evaluated the impact of doing this many years ago, I can say that aliasing is pernicious and very challenging. There is one thing that’s slightly better than it was back then: we do have rudimentary aliasing checks (and automatic unaliasing) for some operations now. But it’s far from a panacea: Add some aliasing warnings to docstrings for mutating functions in Base by gdalle · Pull Request #50824 · JuliaLang/julia · GitHub. Any function you call will be assuming that `x[:,1]` makes a copy. As just one example, I know some folks use idioms like `sum!(x[:,1], x)` for dimensional reductions. These will silently break.
There are many such examples.
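Here is a sketch of how this kind of idiom can go wrong. To keep the outcome deterministic it uses a hand-written reduction, `rowsum!`, which is a hypothetical stand-in and not Base's implementation of `sum!`; it zeroes the destination and then accumulates along the second dimension.

```julia
# Hypothetical stand-in for an in-place row sum: r[i] = sum of A[i, :].
function rowsum!(r, A)
    fill!(r, zero(eltype(r)))            # zero the destination first
    for j in axes(A, 2), i in axes(A, 1)
        r[i] += A[i, j]
    end
    return r
end

x = [1.0 2.0; 3.0 4.0]
safe = rowsum!(x[:, 1], x)               # destination is a copy of column 1

y = copy(x)
aliased = rowsum!(view(y, :, 1), y)      # destination is column 1 of y itself

println(safe)
println(aliased)
```

With the view, zeroing the destination also wipes the first column of the input before it has been read, so that column never contributes to the sums and the input array is overwritten as a side effect.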
If I were you I’d make this new array type immutable, then I think it’s safe and array-API-compliant.
To avoid aliasing issues, you’ll need to ensure that the array being wrapped cannot be modified elsewhere (i.e. the wrapper array “owns” the wrapped array). This could mean that the user explicitly promises that the wrapped array will not be modified, that the wrapper makes a copy of the wrapped array, or that the wrapped array is already immutable.
You can always get a mutable version of the wrapped array out by calling `copy` on it.
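A minimal sketch of the “wrapper makes a copy” option, assuming a hypothetical type name `FrozenArray`. It implements only `size` and scalar `getindex`; because no `setindex!` method is defined, writes fall through to Base's fallback, which throws.

```julia
# Hypothetical read-only wrapper that owns a private copy of its data.
struct FrozenArray{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    data::A
    function FrozenArray(a::AbstractArray{T,N}) where {T,N}
        c = copy(a)                      # take ownership by copying
        return new{T,N,typeof(c)}(c)
    end
end

Base.size(f::FrozenArray) = size(f.data)
Base.getindex(f::FrozenArray{T,N}, I::Vararg{Int,N}) where {T,N} = f.data[I...]
# No setindex! method on purpose: the type is read-only.

a = [1, 2, 3]
f = FrozenArray(a)
a[1] = 99        # changing the original does not affect the wrapper

m = copy(f)      # an ordinary mutable array with the same contents
m[1] = 5
```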
I’m (at least for the moment) ignoring issues about whether my array type ‘owns’ the wrapped array, and just assuming that it does actually have the only copy - that’s the way I intend to use it (was that what you meant by “the user explicitly says the wrapped array cannot be modified”?).
I think this is a cunning idea. Actually I want to make a slight modification to your suggestion (I think it’s a modification): a separate type to be returned by the slice operation, which is similar to my original type but immutable (immutable just means `setindex!()` errors, right?). Then the ‘main’ type (that I proposed originally) can be mutable, so I can use it just like Array, but rather than just hoping that my auto-produced slices aren’t aliased and used wrong, I get an error if they are, which can then be fixed.
This seems much better than returning a view and crossing my fingers! Thanks!
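One way the read-only slice type might look, as a sketch only. The names `ReadOnlySlice` and `readonly_slice` are hypothetical, and a real package would hook this into `getindex` on the main type rather than use a separate helper function.

```julia
# Hypothetical read-only wrapper around a view of the parent array.
struct ReadOnlySlice{T,N,V<:AbstractArray{T,N}} <: AbstractArray{T,N}
    data::V
    ReadOnlySlice(data::AbstractArray{T,N}) where {T,N} = new{T,N,typeof(data)}(data)
end

Base.size(s::ReadOnlySlice) = size(s.data)
Base.getindex(s::ReadOnlySlice{T,N}, I::Vararg{Int,N}) where {T,N} = s.data[I...]
# No setindex! method: writing to a slice raises an error instead of
# silently modifying the parent.

# Hypothetical helper: slice without copying, but return a read-only result.
readonly_slice(A::AbstractArray, I...) = ReadOnlySlice(view(A, I...))

x = [1 2; 3 4]
col = readonly_slice(x, :, 1)

write_failed = try
    col[1] = 0
    false
catch
    true
end
```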
The problem is exactly the same as with `sum!(x[:, 1], x)`, but in the opposite order. Something like `LinearAlgebra.mul!(x, x[:, 1:1], x[1:1, :])` is perfectly valid and correct with the normal semantics, but will silently give the wrong answer with views.
It’s probably less common to do something like that (particularly as a one-liner mutating function like the above), but it can definitely still cause issues. Any function you call might assume that mutating `x` won’t affect a slice it previously took.
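A small sketch of both halves of this point. The first part runs the `mul!` call with ordinary copying slices; the second shows, with a plain view, that a slice taken earlier changes when the parent is mutated, whether or not the slice itself can be written to. The aliased `mul!` call is deliberately not run here, since its result is not something to rely on.

```julia
using LinearAlgebra

x = [1.0 2.0; 3.0 4.0]
# With copying slices, both inputs are snapshots taken before x is overwritten.
mul!(x, x[:, 1:1], x[1:1, :])
println(x)   # outer product of the old first column and the old first row

y = [1.0 2.0; 3.0 4.0]
col = @view y[:, 1:1]    # a slice that aliases y
before = col[2, 1]
y[2, 1] = -1.0           # mutate the parent, not the slice
after = col[2, 1]        # the previously taken slice has changed
```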
OK, so another slightly bonkers alternative to make this safe…
In my own package, I know that I always want views of my ‘Array-like’ struct `AutoView`, and I can make it a rule that I must avoid aliasing, etc. However, I cannot trust any other package to follow those rules. Therefore I don’t make `AutoView` a subtype of `AbstractArray`. Any time I pass an `AutoView` to a function from another package, I therefore either explicitly extract the wrapped `Array` and pass that, or (more conveniently) add a method for the function from the other package that extracts the array for me. Whatever custom array type I define, I probably often need that kind of wrapper anyway, because there are often optimised implementations (e.g. for `Array` rather than `AbstractArray`) that I would want to use. This wrapper requirement probably makes it impossible to distribute `AutoView` as a general-purpose package, but it could make my life easier.
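A rough sketch of what this could look like. The field name `data` and the choice of `LinearAlgebra.norm` as the forwarded function are assumptions made for illustration; only the idea of a non-`AbstractArray` wrapper with explicit forwarding methods comes from the description above.

```julia
using LinearAlgebra

# Deliberately NOT a subtype of AbstractArray, so generic array code
# from other packages cannot be called on it by accident.
struct AutoView{T,N}
    data::Array{T,N}
end

# Scalar indexing returns the element; any other indexing returns a view.
Base.getindex(a::AutoView, I::Int...) = a.data[I...]
Base.getindex(a::AutoView, I...) = view(a.data, I...)
Base.setindex!(a::AutoView, v, I...) = setindex!(a.data, v, I...)
Base.size(a::AutoView) = size(a.data)

# Explicit forwarding method for a function from another package:
# unwrap and pass the plain Array.
LinearAlgebra.norm(a::AutoView) = norm(a.data)

a = AutoView([1.0 2.0; 3.0 4.0])
c = a[:, 1]      # a SubArray that aliases a.data
c[1] = 10.0      # writes through to the wrapped array
```

Calling a function without a forwarding method on an `AutoView` gives a `MethodError` rather than a silently aliased result, which is the point of leaving it outside the `AbstractArray` hierarchy.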