Making views the default indexing behavior in 2.0?

Oscar_Smith · October 12, 2019, 11:10pm

What would people think about making views the default indexing behavior in 2.0? It would be a pretty big change, but given that views may become non-allocating as of 1.4, it might make sense.

Elrod · October 13, 2019, 3:05am

Users coming from R already seem to find this surprising:

julia> A = [1 2 3 4 5];

julia> B = A;

julia> B[2] = 7;

julia> A
1×5 Array{Int64,2}:
 1  7  3  4  5

Perhaps the same lesson that explains this aliasing could also explain that slicing would share memory in the same way, while scalar getindex would not.

Personally, I love optimizing code and cutting down on allocations, so I would like slicing as views. But I also don’t find @views hard to write, so I do think it’s worth considering what people would and would not find intuitive.
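For reference, a minimal sketch of the current 1.x behavior being discussed: a slice copies, while a view aliases its parent the way B = A does above. The variable names are only illustrative.

```julia
A = [1, 2, 3, 4, 5]

S = A[2:4]          # slicing allocates a new Vector (a copy)
S[1] = 70           # A is unchanged: A[2] is still 2

V = @views A[2:4]   # a SubArray that shares memory with A
V[1] = 7            # writes through: A[2] is now 7

x = A[2]            # scalar getindex returns the element, not a 1-element array or view
```

Under a view-by-default rule, the first assignment would behave like the second.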

Tamas_Papp · October 13, 2019, 6:40am

Even if views are non-allocating, they can involve an indirection for lookup or suboptimal memory access, so this could make some code (a lot) slower. Cf.

https://docs.julialang.org/en/v1/manual/performance-tips/#Copying-data-is-not-always-bad-1
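A rough sketch in the spirit of that manual section, assuming a randomly permuted index vector as the "suboptimal memory access" case. The names inds and total are hypothetical, and no timings are claimed since they depend on the machine and Julia version.

```julia
using Random

x = rand(10_000)
inds = randperm(length(x))   # scattered (non-contiguous) access pattern

v = view(x, inds)            # no data copied; every v[i] looks up x[inds[i]]
c = x[inds]                  # one up-front copy into contiguous memory

# Both hold the same values; only the memory access pattern differs.
# Hypothetical helper one could time on v and c, e.g. with BenchmarkTools.
total(a) = sum(abs2, a)
```

Whether the copy pays for itself depends on how many times the data is traversed afterwards.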

Oscar_Smith · October 13, 2019, 6:52am

That is true, but in general, I feel like views will never be more than a few times slower, so they should be the default. If you want the extra performance in the (imo less frequent) case, you can always just copy manually.

Elrod · October 13, 2019, 8:30am

Optimized BLAS libraries pack their operands, copying elements into preallocated blocks for better locality.
Here is a comment by mratsim, who implemented a high-performance BLAS in Nim, explaining how important this is to performance.

In Fortran, the gfortran compiler uses views when they are contiguous, but copies otherwise. This is often a good heuristic, but it can prevent vectorization if not inlined when using a struct-of-arrays style memory layout (where what would be the fields of the struct are distributed across the columns of a matrix). If inlined, the compiler will hopefully make the correct decision between performing and eliminating the copy.
If the called function mutates the view, gfortran also emits an unpack, to maintain the same behavior in both versions.

Perhaps a view is only ever up to several times slower than a copy, but couldn’t the same normally be said about a copy, barring fairly extreme cases such as the following?

foo(x) = x[1] + 1

function bar(x)
    s = zero(eltype(x))
    @inbounds @simd for i ∈ eachindex(x)
        #s += @views foo(x[i:i])
        s += foo(x[i:i])
    end
    s
end

ninjaaron · October 15, 2019, 9:14pm

For better or worse, most programming languages only copy a reference/pointer when arrays are assigned. This may be surprising for R developers, but it’s the expected behavior for most of us.

mbauman · October 15, 2019, 9:31pm

I split this out from its previous thread since it’s fairly tangential.

I don’t anticipate this changing in 2.0 — we did a fairly thorough evaluation back around 0.4 and 0.5. While there are still places where views might get faster in the near future (by always putting them on the stack, for example), the power of contiguous accesses and heavy penalty for discontiguous accesses is not something that will ever change.

As crazy as it sounds, I think it’d be more likely for non-scalar indexing to go away entirely than for it to ever return views. Seriously.

mbauman · October 15, 2019, 9:43pm

You know, had views always been the default from the get-go, we might still have views as the default. I could see us doing the same sort of inverse evaluation during the Arraypocalypse (considering returning copies, that is), finding it a mixed bag, and deciding it’s not worth the churn. And similarly, we might have someone coming to discourse during 1.x asking if we could change them to copies in 2.0 because it’s faster in some use-cases… and I could see me giving that person the same answer I just gave you!

That’s just the thing: it’s gotta be pretty compelling for it to be worth changing the status quo. We’re not going to make breaking changes in 2.0 that are of marginal utility. It’s gotta be fairly universally compelling IMO. Maybe by the time 3.0 comes around — with one breaking release under our belt — we’ll feel a little more cavalier, but there’s still gotta be a carrot to get folks to update.

tkf · October 15, 2019, 10:01pm

Isn’t view-by-default strictly more flexible, as functions can always materialize given views as dense arrays if necessary? I can imagine it would make Base and stdlib much more complex, though.
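A minimal sketch of what "materialize if necessary" could look like on the callee side. dense_sum is a hypothetical function, not something from Base.

```julia
# Hypothetical function that wants contiguous storage for its inner loop.
function dense_sum(a::AbstractVector)
    d = a isa Array ? a : collect(a)   # materialize views (or other lazy arrays) once
    s = zero(eltype(d))
    for v in d
        s += v
    end
    return s
end

A = collect(1:10)
v = view(A, 1:2:9)     # strided view: elements 1, 3, 5, 7, 9
dense_sum(v)           # copies internally; the caller's A is untouched
```

The cost is that every function that cares about layout has to make this decision itself, which is presumably part of the added complexity.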

mbauman · October 15, 2019, 10:38pm

Yes, that was expressly one of the points we weighed back in 2016:

https://github.com/JuliaLang/julia/issues/13157#issuecomment-187247540

dlfivefifty · October 21, 2019, 10:09pm

I really like the idea of using broadcast for this. In fact Broadcasted could play the role of a view. This is similar to some ideas toyed with in LazyArrays where getindex is accomplished by materialising a view.
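One way to picture this, purely as an illustration: a lazy Broadcasted object can describe an indexing operation, and materializing it yields the same result as an eager slice. This is an assumption about the idea, not how LazyArrays implements it.

```julia
A = [10, 20, 30, 40, 50]

# Lazy description of "A[i] for i in 2:4"; nothing is copied yet.
# Ref(A) keeps A from being broadcast over elementwise.
lazy = Broadcast.broadcasted(getindex, Ref(A), 2:4)

# Materializing produces an ordinary Array, equal to the eager slice A[2:4].
dense = Broadcast.materialize(lazy)
```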

tkf · October 21, 2019, 11:09pm

Broadcasting is great but I think supporting boolean indexing is tricky.
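One plausible reading of the difficulty (an assumption, since the post does not spell it out): with a boolean mask, the length of the result depends on the data rather than on the shapes of the arguments.

```julia
A = [10, 20, 30, 40, 50]
mask = A .> 25               # BitVector: false false true true true

picked = A[mask]             # length is count(mask), known only after inspecting the data

# An elementwise broadcast over A and mask yields length(A) results instead,
# so it masks values rather than selecting them.
kept = ifelse.(mask, A, 0)
```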
