Performance of views vs. explicit index referencing

I spent some time trying to understand the advantages and disadvantages of using views in my code. My main reason for using views was to simplify the code, for example by having a single method to compute the distance between two points in space, instead of several functions depending on the type of data received (a vector and an array, two vectors, two arrays). Views simplify the code considerably, because otherwise the indexes of some arrays have to be passed through several functions before they are used.
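A minimal sketch of that idea, with an illustrative function name (`dist`) and made-up sample data: one method accepts anything that can be indexed with 1, 2 and 3, so plain vectors and views of array rows go through the same code path.

```julia
# One distance method for any pair of indexable 3D points.
# `dist`, `a` and `m` are illustrative names, not part of the benchmark below.
dist(x, y) = sqrt((x[1] - y[1])^2 + (x[2] - y[2])^2 + (x[3] - y[3])^2)

a = [0.0, 0.0, 0.0]              # a plain vector
m = [3.0 4.0 0.0; 1.0 0.0 0.0]   # an array with one point per row

d_vec_row = dist(a, @view(m[1, :]))               # vector vs. row of an array
d_row_row = dist(@view(m[1, :]), @view(m[2, :]))  # two rows of the same array
```

No index arguments need to be threaded through the call chain; the caller decides which slice to pass.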

However, I am noticing that using views leads to considerably slower code than passing the indexes of the elements one wants to consider.

An illustrative example is below, where I compute the sum of the distances between all pairs of points in two arrays of 3D vectors. With views, the code is about 10x slower.

I am not sure I have a specific question, but perhaps someone can point to a better way of dealing with these situations.

d(x,y,i,j) = sqrt( (x[i,1]-y[j,1])^2 + (x[i,2]-y[j,2])^2 + (x[i,3]-y[j,3])^2 )
function f1(x,y)
  dsum = 0.
  nx = size(x,1)
  ny = size(y,1)
  for i in 1:nx
    for j in 1:ny
      dsum = dsum + d(x,y,i,j)
    end
  end
  return dsum
end

d(x,y) = sqrt( (x[1]-y[1])^2 + (x[2]-y[2])^2 + (x[3]-y[3])^2 )
function f2(x,y)
  dsum = 0.
  nx = size(x,1)
  ny = size(y,1)
  for i in 1:nx
    for j in 1:ny
      dsum = dsum + d(@view(x[i,1:3]),@view(y[j,1:3]))
    end
  end
  return dsum
end

x = rand(1000,3)
y = rand(1000,3)

println(f1(x,y))

println(f2(x,y))

using BenchmarkTools

println(" With indexes: ")
@btime f1($x,$y)

println(" With views: ")
@btime f2($x,$y)

Result:

661731.9520584571
661731.9520584571
 With indexes:
  2.011 ms (0 allocations: 0 bytes)
 With views:
  19.450 ms (2000000 allocations: 122.07 MiB)

FWIW, this is greatly improved on Julia 1.5+

 With indexes:
  1.600 ms (0 allocations: 0 bytes)
 With views:
  6.563 ms (0 allocations: 0 bytes)

If you explicitly inline that d function, then you see the same performance:

julia> @inline d(x,y) = sqrt( (x[1]-y[1])^2 + (x[2]-y[2])^2 + (x[3]-y[3])^2 )
d (generic function with 2 methods)

julia> @btime f2($x,$y)
  1.502 ms (0 allocations: 0 bytes)
661146.8809553175

Edit: the core reason for this difference is that d(x, y, i, j) happens to land just under the inlining threshold (and automatically inlines), whereas d(x, y) is just above (thus needing the manual annotation). It’s a pretty interesting case since the two are effectively doing the same thing, but apparently Julia thinks the SubArray indexing is costlier.
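For reference, here is a self-contained sketch of the two variants side by side, with the view-based distance function annotated explicitly. The names `d_idx`, `d_view`, `sum_idx` and `sum_view` are made up for this sketch, and the arrays are smaller than in the benchmark above. It only checks that both variants agree; whether the annotation is needed for equal speed depends on the Julia version.

```julia
# Index-passing variant: small enough to be inlined automatically (per the post above).
d_idx(x, y, i, j) = sqrt((x[i,1] - y[j,1])^2 + (x[i,2] - y[j,2])^2 + (x[i,3] - y[j,3])^2)

# View-based variant: @inline asks the compiler to inline it at its call sites.
@inline d_view(x, y) = sqrt((x[1] - y[1])^2 + (x[2] - y[2])^2 + (x[3] - y[3])^2)

function sum_idx(x, y)
    dsum = 0.0
    for i in 1:size(x, 1), j in 1:size(y, 1)
        dsum += d_idx(x, y, i, j)
    end
    return dsum
end

function sum_view(x, y)
    dsum = 0.0
    for i in 1:size(x, 1), j in 1:size(y, 1)
        # Each view wraps one row; no copy of the data is made.
        dsum += d_view(@view(x[i, 1:3]), @view(y[j, 1:3]))
    end
    return dsum
end

x = rand(50, 3)
y = rand(40, 3)
println(sum_idx(x, y) ≈ sum_view(x, y))
```

To compare timings, both functions can be run through `@btime` from BenchmarkTools exactly as in the original post.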

Fantastic. Thank you very much. With that change it actually became faster than index passing (still on Julia 1.4). One more thing to learn.