Surprised that broadcast does not use views?

I am using a simple broadcast expression, and getting a surprising (to me) result depending on whether I use views or not.

#setup
a = rand(UInt64, 1000, 1000)
b = rand(UInt64, 1000, 1000)
c = rand(UInt64, 1000);

Doing a broadcast op without views:

@benchmark $c .= $a[10,:] .& $b[:,10]

BenchmarkTools.Trial: 
  memory estimate:  15.88 KiB
  allocs estimate:  2
  --------------
  minimum time:     3.607 μs (0.00% GC)

Doing the op when employing views is much faster:

@benchmark $c .= (@view $a[10,:]) .& (@view $b[:,10])

BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  2
  --------------
  minimum time:     1.491 μs (0.00% GC)

Obviously, the explicit use of views does not need to allocate copies of the subarrays I want to operate on. However, I was very surprised that the broadcasting does not use views automatically.

I was hoping someone can help me build a more “julia philosophy coherent” mental image of why broadcast does not automatically use views. Is this something that might be automated in future versions of Julia?

Broadcast uses whatever you give it — you made the decision to use a view (or not) before you invoked the broadcast.

That is, when you write c .= a[10,:] .& b[:,10], you are essentially writing broadcast!(&, c, a[10,:], b[:,10]). So a[10,:] is the argument to the broadcast, and it is nonsensical to ask why broadcast! didn’t use a view, because the choice was made before it was called.

The real question is why Julia doesn’t use views by default. This was discussed for a long time, see e.g. range indexing should produce a subarray, not a copy · Issue #3701 · JuliaLang/julia · GitHub, Arraypocalypse Now and Then · Issue #13157 · JuliaLang/julia · GitHub, etcetera. Partly this is a historical thing — once people got used to slices giving copies, it was painful to change the default. The issue became somewhat moot once the @views macro allowed you to opt-in to views for a large block of code (e.g. an entire function).

7 Likes

The other part was that it wasn’t a clear performance win in all cases.

Note that ideally, we’d use neither views nor allocating indexing — we’d fuse the indexing operation directly into the kernel. There’s a bit of an impedance mismatch, though, making that a tough thing to do, but you can manually do it:

julia> @btime $c .= $a[10,:] .& $b[:,10];
  4.636 μs (2 allocations: 15.88 KiB)

julia> @btime @views $c .= $a[10,:] .& $b[:,10];
  2.210 μs (2 allocations: 96 bytes)

julia> @btime $c .= getindex.(($a,), 10, axes($a, 2)) .& getindex.(($b,), axes($b, 1), 10);
  2.658 μs (0 allocations: 0 bytes)

Which will win will in any given case will just depend upon what exactly you’re doing.

8 Likes

Thank you both for the explanations! I guess the root of my confusion was that I was expecting the broadcast syntactic sugar to go one step beyond what it is doing currently and provide indexing information to the broadcast function, instead of just evaluating the indexing operation. I guess this would be too restrictive though…

If views were non-allocating, views and fused indexing would be equivalent, no?

Theoretically yes, but in practice there are differences. Take the example above — that’s allocating for its views but still faster than the fused indexing! The difference is actually real and meaningful: the views pre-compute a fast linear indexing transformation and use linear indexing instead of cartesian indexing (I’m guessing that’s the difference without looking, anyways — edit: the bigger difference is bounds checking, see below!).

Fusion will be much more likely to win if you can also fuse the whole way into the computation of an index vector, whereas views would require allocating the index vector before doing the operation. There may be other differences, too.

3 Likes

I think the difference is just due to bounds-checking happening at each getindex, whereas before it was done once for the entire view.

@btime $c .= getindex.(($a,), 10, axes($a, 2)) .& getindex.(($b,), axes($b, 1), 10);
#  1.852 μs (0 allocations: 0 bytes)

g(x, i, j) = @inbounds x[i,j]
@btime $c .= g.(($a,), 10, axes($a, 2)) .& g.(($b,), axes($b, 1), 10);
#  1.492 μs (0 allocations: 0 bytes)
2 Likes

The syntax a.[10,:] is available. Would it make sense to turn this into syntactic sugar for getindex.((a,), 10, axes(a, 2)) ?

2 Likes

See https://github.com/JuliaLang/julia/issues/22858 and linked issues.

2 Likes