Thanks for the inputs. I believe the Performance Tips in the manual might need an update. As of now, the pros and cons of view() are not very clear. Also, the recommendation to use view(), often repeated on Discourse, should perhaps be qualified.
Perhaps the relative merits of copies and views depend on OS/hardware (cache size, latency, etc.), but this is what I understand from some simple benchmarks on an old Win10 notebook:
Views can improve performance when the indices are contiguous and supplied as a UnitRange; with a StepRange the two are roughly on par. In the other cases I tried (index vectors and a BitVector), a copy performs better.
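For readers less familiar with the distinction, here is a minimal sketch of what the two forms do (the variable names are only for illustration): x[inds] allocates a new array, while view(x,inds) wraps the original memory.

```julia
x = collect(1.0:10.0)
inds = 2:4

c = x[inds]         # copy: allocates a new Vector{Float64}
v = view(x, inds)   # view: a SubArray that refers back to x

x[2] = 100.0        # mutate the parent array

c[1]                # still 2.0, the copy is independent of x
v[1]                # now 100.0, the view sees the change
parent(v) === x     # true, the view wraps the original array
```

So the copy pays for an allocation up front, while the view pays for an extra index translation on every access.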
An illustration of these claims:
using BenchmarkTools   # provides @btime

N = 8_000
x = rand(N)
f1(x,inds) = sum(x[inds])
f2(x,inds) = sum(view(x,inds))
println("vector of length $N, 1st is 'x[inds]', 2nd is view(x,inds)")
println("\ninds is UnitRange")
inds = 1:cld(N,2)
@btime f1($x,$inds)
@btime f2($x,$inds)
println("\ninds is StepRange")
inds = 1:10:N
@btime f1($x,$inds)
@btime f2($x,$inds)
println("\ninds is a contiguous vector")
inds = collect(1:cld(N,2))
@btime f1($x,$inds)
@btime f2($x,$inds)
println("\ninds is a vector of random indices")
inds = rand(1:N,cld(N,2))
@btime f1($x,$inds)
@btime f2($x,$inds)
println("\ninds is a BitVector")
inds = (1:N) .> cld(N,2)
@btime f1($x,$inds)
@btime f2($x,$inds)
which gives (on my slow Win10 notebook):
vector of length 8000, 1st is 'x[inds]', 2nd is view(x,inds)
inds is UnitRange
2.200 μs (2 allocations: 31.30 KiB)
350.237 ns (0 allocations: 0 bytes)
inds is StepRange
970.833 ns (1 allocation: 6.38 KiB)
958.333 ns (0 allocations: 0 bytes)
inds is a contiguous vector
4.586 μs (2 allocations: 31.30 KiB)
5.917 μs (0 allocations: 0 bytes)
inds is a vector of random indices
4.833 μs (2 allocations: 31.30 KiB)
5.917 μs (0 allocations: 0 bytes)
inds is a BitVector
3.900 μs (2 allocations: 31.30 KiB)
7.400 μs (2 allocations: 31.30 KiB)
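
If the same pattern holds on other machines, one way to act on it is to pick a view or a copy by dispatching on the index type. This is only a sketch under that assumption; slice_or_copy is a hypothetical helper name, not something in Base, and the cutoff (ranges get a view, everything else gets a copy) simply mirrors the timings above.

```julia
# Hypothetical helper: use a view for unit ranges, a copy for everything else.
slice_or_copy(x::AbstractVector, inds::AbstractUnitRange{<:Integer}) = view(x, inds)
slice_or_copy(x::AbstractVector, inds) = x[inds]

x = rand(8_000)

a = slice_or_copy(x, 1:4_000)            # UnitRange: returns a SubArray (no copy)
b = slice_or_copy(x, collect(1:4_000))   # Vector of indices: returns a new Vector

s1 = sum(a)
s2 = sum(b)
```

Whether a StepRange should go to the view method as well is unclear from my numbers, so I have left it on the copy side here.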