Making views the default indexing behavior in 2.0?

What would people think about making views the default indexing behavior in 2.0? It would be a pretty big change, but given that views may be non-allocating as of 1.4, it might make sense.

5 Likes

Users coming from R already seem to find this surprising:

julia> A = [1 2 3 4 5];

julia> B = A;

julia> B[2] = 7;

julia> A
1×5 Array{Int64,2}:
 1  7  3  4  5

Perhaps the same lesson that explains this aliasing could also explain that slices would share memory with their parent in the same way, while scalar getindex would not.
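For comparison, here is a small sketch of how things behave today in 1.x: assignment binds a second name to the same array, a slice makes a copy, and `view` shares memory with its parent. The names `C` and `V` are added here purely for illustration.

```julia
A = [1 2 3 4 5]

B = A                  # assignment: B and A are the same array
C = A[:, 2:3]          # slicing: C is an independent copy
V = view(A, :, 2:3)    # view: V shares memory with A

A[2] = 7               # mutate the parent after creating B, C, and V

B[2]    # 7, because B is A
C[1]    # 2, the copy is unaffected
V[1]    # 7, the view sees the change
```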

Personally, I love optimizing code and cutting down on allocations, so I would like slicing as views. But I also don’t find @views hard to write, so I do think it’s worth considering what people would and would not find intuitive.
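As a rough illustration of how little `@views` asks of the author, the two functions below have identical bodies. The function names are made up for this sketch.

```julia
# Slicing as written today: each A[:, j] allocates a new column vector.
column_sums_copy(A) = [sum(A[:, j]) for j in axes(A, 2)]

# Same body, but @views turns every slice in the expression into a view.
@views column_sums_view(A) = [sum(A[:, j]) for j in axes(A, 2)]

A = [1.0 2.0; 3.0 4.0]
column_sums_copy(A) == column_sums_view(A)   # true
```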

3 Likes

Even if views are non-allocating, they can involve an indirection on lookup or a suboptimal memory access pattern, so this could make some code (a lot) slower. Cf.

https://docs.julialang.org/en/v1/manual/performance-tips/#Copying-data-is-not-always-bad-1
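A minimal sketch of the situation that section describes, assuming a scattered index vector: the view avoids the copy but pays an indirection on every access, while the copy pays once and is then contiguous. The helper `repeated_sum` is hypothetical, and no timings are claimed here; which one wins depends on the workload and the machine.

```julia
using Random

x = randn(100_000)
inds = shuffle(1:length(x))   # a scattered, non-contiguous access pattern

v = view(x, inds)   # no copy, but every v[i] goes through inds[i]
c = x[inds]         # one up-front copy into contiguous memory

# Both hold the same values; only the memory layout differs.
v == c   # true

# Hypothetical workload that reads the data many times.
repeated_sum(a, n) = sum(sum(a) for _ in 1:n)

# Compare on your own machine, e.g. with @time or BenchmarkTools:
# @time repeated_sum(v, 100)
# @time repeated_sum(c, 100)
```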

3 Likes

That is true, but in general I feel that views will never be more than a few times slower, so they should be the default. If you want the extra performance in the (IMO less frequent) case where a copy is faster, you can always just copy manually.

1 Like

Optimized BLAS libraries pack their inputs, copying elements into preallocated blocks for better locality.
Here is a comment explaining how important it is to performance, by mratsim who implemented a high performance BLAS in Nim.

In Fortran, the gfortran compiler uses views when they are contiguous, but copies otherwise. This is often a good heuristic, but can prevent vectorization if not inlined when using a struct-of-arrays style memory layout (where what would be the fields of the struct are distributed across the columns of a matrix). If inlined, the compiler will hopefully make the correct decision about whether to perform or eliminate the copy.
If the calling function mutates the view, gfortran also emits an unpack, to maintain the same behavior in both versions.
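The "view if contiguous, copy otherwise" heuristic can be sketched in Julia roughly as follows. This is only an illustration of the idea, not what gfortran or any BLAS actually does: `pack_unless_contiguous` is a hypothetical helper, it only handles vectors that support `stride`, and it skips the unpack step entirely.

```julia
# Hypothetical helper, not part of Base: keep unit-stride vectors as they are,
# copy ("pack") anything else into a fresh dense Vector.
pack_unless_contiguous(v::AbstractVector) = stride(v, 1) == 1 ? v : copy(v)

A = reshape(collect(1.0:12.0), 3, 4)   # column-major 3×4 matrix

col = view(A, :, 2)   # contiguous in memory, stride 1
row = view(A, 2, :)   # strided, stride 3

pack_unless_contiguous(col) === col      # true, passed through as a view
pack_unless_contiguous(row) isa Vector   # true, copied into dense storage
```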

Perhaps a view is only ever up to several times slower than a copy, but couldn’t the same normally be said about a copy, barring fairly extreme cases, such as

foo(x) = x[1] + 1

function bar(x)
    s = zero(eltype(x))
    @inbounds @simd for i ∈ eachindex(x)
        #s += @views foo(x[i:i])
        s += foo(x[i:i])
    end
    s
end

1 Like

For better or worse, most programming languages only copy a reference/pointer when arrays are assigned. This may be surprising for R developers, but it’s the expected behavior for most of us.

I split this out from its previous thread since it’s fairly tangential.

I don’t anticipate this changing in 2.0; we did a fairly thorough evaluation back around 0.4 and 0.5. While there are still places where views might get faster in the near future (by always putting them on the stack, for example), the power of contiguous accesses and the heavy penalty for discontiguous accesses are not things that will ever change.

As crazy as it sounds, I think it’d be more likely for non-scalar indexing to go away entirely than for it to ever return views. Seriously.

10 Likes

You know, had views always been the default from the get-go, we might still have views as the default. I could see us doing the same sort of inverse evaluation during the Arraypocalypse (considering returning copies, that is), finding it a mixed bag, and deciding it’s not worth the churn. And similarly, we might have someone coming to discourse during 1.x asking if we could change them to copies in 2.0 because it’s faster in some use-cases… and I could see me giving that person the same answer I just gave you!

That’s just the thing: it’s gotta be pretty compelling for it to be worth changing the status quo. We’re not going to make breaking changes in 2.0 that are of marginal utility. It’s gotta be fairly universally compelling IMO. Maybe by the time 3.0 comes around — with one breaking release under our belt — we’ll feel a little more cavalier, but there’s still gotta be a carrot to get folks to update.

6 Likes

Isn’t view-by-default strictly more flexible, since functions can always materialize the views they are given as dense arrays if necessary? I can imagine it would make Base and stdlib much more complex, though.

1 Like

Yes, that was expressly one of the points we weighed back in 2016:

https://github.com/JuliaLang/julia/issues/13157#issuecomment-187247540

1 Like

I really like the idea of using broadcast for this. In fact Broadcasted could play the role of a view. This is similar to some ideas toyed with in LazyArrays where getindex is accomplished by materialising a view.
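To give a flavor of what that could look like, here is a small sketch using the existing `Base.Broadcast.broadcasted` and `Base.Broadcast.materialize` functions. It is my own illustration of a lazy `getindex`, not the design from the linked issue or from LazyArrays.

```julia
using Base.Broadcast: broadcasted, materialize

x = [10, 20, 30, 40, 50]

# Lazy description of getindex.(Ref(x), 2:4); nothing is copied yet.
lazy = broadcasted(getindex, Ref(x), 2:4)

x[3] = 99            # mutate the parent before materializing

materialize(lazy)    # [20, 99, 40], computed at this point
```

Like a view, the lazy object still refers to `x`, so it picks up the mutation; the copy only happens when `materialize` is called.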

3 Likes

Broadcasting is great but I think supporting boolean indexing is tricky.
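One way to read the difficulty (this is an interpretation, the post does not spell it out): the shape of a broadcast result follows from the shapes of its arguments, whereas the length of a boolean-indexed result depends on the values in the mask.

```julia
x = [3, -1, 4, -1, 5]
mask = x .> 0                    # Bool vector, same shape as x

x[mask]                          # [3, 4, 5]
length(x[mask]) == count(mask)   # true: length depends on the values in mask

# An elementwise broadcast over the same arguments keeps the input shape.
size(ifelse.(mask, x, 0)) == size(x)   # true
```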

1 Like