Surprised that broadcast does not use views?

Krastanov · June 25, 2019, 6:42pm

I am using a simple broadcast expression, and getting a surprising (to me) result depending on whether I use views or not.

#setup
a = rand(UInt64, 1000, 1000)
b = rand(UInt64, 1000, 1000)
c = rand(UInt64, 1000);

Doing a broadcast op without views:

@benchmark $c .= $a[10,:] .& $b[:,10]

BenchmarkTools.Trial: 
  memory estimate:  15.88 KiB
  allocs estimate:  2
  --------------
  minimum time:     3.607 μs (0.00% GC)

Doing the op when employing views is much faster:

@benchmark $c .= (@view $a[10,:]) .& (@view $b[:,10])

BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  2
  --------------
  minimum time:     1.491 μs (0.00% GC)

Obviously, the explicit use of views does not need to allocate copies of the subarrays I want to operate on. However, I was very surprised that the broadcasting does not use views automatically.

I was hoping someone can help me build a more “julia philosophy coherent” mental image of why broadcast does not automatically use views. Is this something that might be automated in future versions of Julia?

stevengj · June 25, 2019, 6:57pm

Broadcast uses whatever you give it — you made the decision to use a view (or not) before you invoked the broadcast.

That is, when you write c .= a[10,:] .& b[:,10], you are essentially writing broadcast!(&, c, a[10,:], b[:,10]). So a[10,:] is the argument to the broadcast, and it is nonsensical to ask why broadcast! didn’t use a view, because the choice was made before it was called.

The real question is why Julia doesn’t use views by default. This was discussed for a long time, see e.g. range indexing should produce a subarray, not a copy · Issue #3701 · JuliaLang/julia · GitHub, Arraypocalypse Now and Then · Issue #13157 · JuliaLang/julia · GitHub, etcetera. Partly this is a historical thing — once people got used to slices giving copies, it was painful to change the default. The issue became somewhat moot once the @views macro allowed you to opt-in to views for a large block of code (e.g. an entire function).

mbauman · June 25, 2019, 7:21pm

The other part was that it wasn’t a clear performance win in all cases.

Note that ideally, we’d use neither views nor allocating indexing — we’d fuse the indexing operation directly into the kernel. There’s a bit of an impedance mismatch, though, making that a tough thing to do, but you can manually do it:

julia> @btime $c .= $a[10,:] .& $b[:,10];
  4.636 μs (2 allocations: 15.88 KiB)

julia> @btime @views $c .= $a[10,:] .& $b[:,10];
  2.210 μs (2 allocations: 96 bytes)

julia> @btime $c .= getindex.(($a,), 10, axes($a, 2)) .& getindex.(($b,), axes($b, 1), 10);
  2.658 μs (0 allocations: 0 bytes)

Which will win will in any given case will just depend upon what exactly you’re doing.

Krastanov · June 25, 2019, 7:32pm

Thank you both for the explanations! I guess the root of my confusion was that I was expecting the broadcast syntactic sugar to go one step beyond what it is doing currently and provide indexing information to the broadcast function, instead of just evaluating the indexing operation. I guess this would be too restrictive though…

cstjean · June 25, 2019, 7:52pm

If views were non-allocating, views and fused indexing would be equivalent, no?

mbauman · June 25, 2019, 8:02pm

Theoretically yes, but in practice there are differences. Take the example above — that’s allocating for its views but still faster than the fused indexing! The difference is actually real and meaningful: the views pre-compute a fast linear indexing transformation and use linear indexing instead of cartesian indexing (I’m guessing that’s the difference without looking, anyways — edit: the bigger difference is bounds checking, see below!).

Fusion will be much more likely to win if you can also fuse the whole way into the computation of an index vector, whereas views would require allocating the index vector before doing the operation. There may be other differences, too.

Per · June 26, 2019, 6:35am

I think the difference is just due to bounds-checking happening at each getindex, whereas before it was done once for the entire view.

@btime $c .= getindex.(($a,), 10, axes($a, 2)) .& getindex.(($b,), axes($b, 1), 10);
#  1.852 μs (0 allocations: 0 bytes)

g(x, i, j) = @inbounds x[i,j]
@btime $c .= g.(($a,), 10, axes($a, 2)) .& g.(($b,), axes($b, 1), 10);
#  1.492 μs (0 allocations: 0 bytes)

Per · June 26, 2019, 7:18am

The syntax a.[10,:] is available. Would it make sense to turn this into syntactic sugar for getindex.((a,), 10, axes(a, 2)) ?

fredrikekre · June 26, 2019, 7:35am

See https://github.com/JuliaLang/julia/issues/22858 and linked issues.

Topic		Replies	Views
Can I avoid allocations when broadcasting over slices? Performance	3	172	August 4, 2023
Broadcasting on views General Usage broadcast , views	2	152	August 22, 2024
Array broadcasting slower than numpy? Performance	20	722	June 4, 2022
Why is broadcasting allocating so much memory? Performance	8	726	May 21, 2020
Why is there memory allocation and time difference in @views and view? Performance memory-allocation	9	485	March 29, 2021

Surprised that broadcast does not use views?

Related topics