 # Declare vector type or not / views / subarrays

Consider this example, in which I want to perform some calculations on part of a vector:

``````julia
julia> f( x :: Vector{Float64} ) = x[1]^2 + x[2]^2 + x[3]^2
f (generic function with 1 method)

julia> g( x :: AbstractArray ) = x[1]^2 + x[2]^2 + x[3]^2
g (generic function with 1 method)

julia> x = rand(10) ;

julia> @btime f(x[1:3])
36.707 ns (1 allocation: 112 bytes)
0.5715412873805833

julia> @btime g(@view(x[1:3]))
31.029 ns (2 allocations: 64 bytes)
0.5715412873805833

julia> @btime g(x[1:3])
51.352 ns (2 allocations: 128 bytes)
0.5715412873805833

``````

The question is: the optimal choice appears to be to accept abstract arrays in the function and pass views. However, if the input vector is not a subarray, the version with the explicit type declaration (`Vector{Float64}`) appears to be faster:

``````julia
julia> y = rand(3);

julia> @btime f(y)
3.808 ns (0 allocations: 0 bytes)
0.6112547604892745

julia> @btime g(y)
15.278 ns (1 allocation: 16 bytes)
0.6112547604892745

``````
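For context, the difference between the two call styles above is that slicing copies while `@view` shares memory. A minimal sketch of that distinction (the values here are made up):

```julia
x = collect(1.0:10.0)

s = x[1:3]          # slicing allocates a new Vector{Float64} (a copy)
v = @view x[1:3]    # @view wraps x in a SubArray sharing its memory

s[1] = -1.0         # mutating the copy leaves x untouched
v[2] = -2.0         # mutating the view writes through to x

x[1] == 1.0         # true: still the original value
x[2] == -2.0        # true: changed through the view
```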

The “problem” is that the function with the explicit type declaration does not accept views as input. I am not sure how to deal with this, except by duplicating the functions, but it is not clear to me which method will be called in each case, in a situation like this:

``````julia
julia> f( x :: AbstractArray ) = x[1]^2 + x[2]^2 + x[3]^2
f (generic function with 2 methods)

julia> f( x :: Vector{Float64} ) = x[1]^2 + x[2]^2 + x[3]^2
f (generic function with 2 methods)

julia> x = rand(3);

julia> @btime f(x)
7.033 ns (1 allocation: 16 bytes)
0.6452099587041162

julia> @btime f(x[1:3])
44.356 ns (2 allocations: 128 bytes)
0.6452099587041162

julia> @btime f(@view(x[1:3]))
34.107 ns (2 allocations: 64 bytes)
0.6452099587041162

``````

Edit: it seems, from the allocations, that the method accepting abstract arrays is being called in all three cases in this last example. Thus, the method with the explicit declaration is not used at all, and one loses performance.

No, it does dispatch to the most specific method (`Vector{Float64}` in the first two cases):

``````julia
julia> @which f(x)
f(x::Array{Float64,1}) in Main at REPL:1

julia> @which f(x[1:3])
f(x::Array{Float64,1}) in Main at REPL:1

julia> @which f(@view(x[1:3]))
f(x::AbstractArray) in Main at REPL:1
``````

I don’t understand exactly how it works, but the difference in allocations you see in `@btime` when `f` has 1 or 2 methods disappears if you use the recommended syntax, interpolating the arguments:

``````julia
julia> @btime f($x)
2.399 ns (0 allocations: 0 bytes)
1.7296597608499495
``````

Yes, this is just a spurious result, due to erroneous benchmarking. Remember to interpolate.
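The reason, as far as I understand it: without `$`, the benchmarked expression references a non-constant global, whose type is unknown at compile time, so the timing includes dynamic dispatch and boxing of the result; `$` splices the value in as a typed local. A sketch using `@allocated` instead of BenchmarkTools (the helper names are made up):

```julia
f(v) = v[1]^2 + v[2]^2 + v[3]^2

x = rand(3)                      # non-constant global: type unknown to the compiler

call_global() = f(x)             # x is an untyped global here: dynamic call, boxed result
call_local(v) = f(v)             # v has a concrete type: the call is fully inferred

bytes_global() = @allocated call_global()
bytes_local(v) = @allocated call_local(v)

bytes_global(); bytes_local(x)   # warm up (compile) before measuring
bytes_global() > 0               # true: overhead from the untyped global
bytes_local(x) == 0              # true: no allocations once types are known
```

This is the same overhead that `$`-interpolation in `@btime` removes.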


Oh, thank you. With that the times are almost identical in all cases:

``````julia
julia> f( x :: Vector{Float64} ) = x[1]^2 + x[2]^2 + x[3]^2
f (generic function with 1 method)

julia> g( x :: AbstractArray ) = x[1]^2 + x[2]^2 + x[3]^2
g (generic function with 1 method)

julia> @btime f($(x[1:3]))
2.250 ns (0 allocations: 0 bytes)
1.8551075747910126

julia> @btime g($(x[1:3]))
2.248 ns (0 allocations: 0 bytes)
1.8551075747910126

julia> @btime g($(@view(x[1:3])))
2.236 ns (0 allocations: 0 bytes)
1.8551075747910126

julia> y = rand(3);

julia> @btime f($y)
2.380 ns (0 allocations: 0 bytes)
1.1112077517574401

julia> @btime g($y)
2.311 ns (0 allocations: 0 bytes)
1.1112077517574401
``````

Yes. Judging from the description of `SubArray`s, I don’t expect `f(x)` to perform very differently whether `x` is an array or a contiguous subarray. On the other hand, more complex subarrays, with nonuniform strides, could be more expensive than a plain array. What matters is the balance between the overhead of creating and striding through subarrays vs. copying an array slice. This may depend on how many times you use the subarray/slice, as stated in the performance tips.

Since there is a trade-off between the two options, micro-benchmarks like the one you presented may be misleading. It may be better to benchmark a function that works with arrays of the actual size you want to use, slices with the strides you actually need, and loops similar to the ones you use to operate on the subarrays/slices.
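A sketch of the kind of benchmark meant here (the workload, size, and stride are made up; substitute your own):

```julia
# Made-up workload that reuses its input n times, so the cost of striding
# through a view vs. copying a slice once is actually exercised.
work(v, n) = sum(sum(v) for _ in 1:n)

run_view(x, n) = work(@view(x[1:2:end]), n)   # no copy, strided access on every pass
run_copy(x, n) = work(x[1:2:end], n)          # one copy up front, then contiguous access

x = rand(10_000)
run_view(x, 100) ≈ run_copy(x, 100)           # true: same result, different cost profile
```

Timing `run_view` vs. `run_copy` (e.g. with `@time` or BenchmarkTools) on realistic sizes shows which side of the trade-off your actual code is on.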

By the way, in the toy example, I think you could have defined only `f(x::AbstractArray)`. Since your definition of `f(x::Vector{Float64})` performs the same operation, the specialized code generated for `f(x::AbstractArray)` when called with a `Vector{Float64}` should be the same. Cf.:

``````julia
julia> f( x :: Vector{Float64} ) = x[1]^2 + x[2]^2 + x[3]^2
f (generic function with 1 method)

julia> g( x :: AbstractArray ) = x[1]^2 + x[2]^2 + x[3]^2
g (generic function with 1 method)

julia> x = rand(10);

julia> @code_typed f(x)
CodeInfo(
1 ─ %1 = Base.arrayref(true, x, 1)::Float64
│   %2 = Base.mul_float(%1, %1)::Float64
│   %3 = Base.arrayref(true, x, 2)::Float64
│   %4 = Base.mul_float(%3, %3)::Float64
│   %5 = Base.arrayref(true, x, 3)::Float64
│   %6 = Base.mul_float(%5, %5)::Float64
│   %7 = Base.add_float(%2, %4)::Float64
│   %8 = Base.add_float(%7, %6)::Float64
└──      return %8
) => Float64

julia> @code_typed g(x) # (exactly the same)
CodeInfo(
1 ─ %1 = Base.arrayref(true, x, 1)::Float64
│   %2 = Base.mul_float(%1, %1)::Float64
│   %3 = Base.arrayref(true, x, 2)::Float64
│   %4 = Base.mul_float(%3, %3)::Float64
│   %5 = Base.arrayref(true, x, 3)::Float64
│   %6 = Base.mul_float(%5, %5)::Float64
│   %7 = Base.add_float(%2, %4)::Float64
│   %8 = Base.add_float(%7, %6)::Float64
└──      return %8
) => Float64
``````