Hello,
I was playing with both SVector and Vec, from StaticArrays and SIMD respectively.
I was surprised to get a speedup of roughly 6x using SVector over Vec. Am I using Vec in an unsuitable example, or am I not using it properly?
Do I get these results only because x and y don't change, which lets StaticArrays apply some sort of optimization?
Benchmark results
SIMD vectors
Trial(247.513 ns)
SVector vectors
Trial(39.496 ns)
Code to benchmark both functions
using StaticArrays
using BenchmarkTools
using SIMD
function make_n_sums_vec(x::Vec, y::Vec, n::Int)
    aux = zero(x)
    for i in 1:n
        aux += x + y
    end
    return aux
end

function make_n_sums_sarr(x::SArray, y::SArray, n::Int)
    aux = zero(x)
    for i in 1:n
        aux += x + y
    end
    return aux
end

x_vec = Vec{4,Int64}((1,2,3,4))
x_sarr = SVector{4,Int64}([1,2,3,4]);
# The result should be `x_vec * 2 * 100`
println("SIMD vectors")
println(@benchmark make_n_sums_vec(x_vec, x_vec, 100))
println("SVector vectors")
println(@benchmark make_n_sums_sarr(x_sarr, x_sarr, 100))
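One thing worth ruling out before comparing the two types: x_vec and x_sarr are non-constant globals, and the BenchmarkTools documentation recommends interpolating such variables with `$` so that global-variable access is not part of the measurement. Below is a minimal sketch of the same benchmark with interpolation. The generic function name `make_n_sums` is hypothetical (it just merges the two functions above), and no timings are shown because they depend on the machine and package versions.

```julia
using StaticArrays
using BenchmarkTools
using SIMD

# Hypothetical generic version of the two functions above: it works for any
# type that supports `zero` and `+`, so both Vec and SVector run the same loop.
function make_n_sums(x, y, n::Int)
    aux = zero(x)
    for i in 1:n
        aux += x + y
    end
    return aux
end

x_vec = Vec{4,Int64}((1, 2, 3, 4))
x_sarr = SVector{4,Int64}(1, 2, 3, 4)

# `$` splices the current value into the benchmark expression, so the
# non-constant globals are not looked up inside the timed code.
@btime make_n_sums($x_vec, $x_vec, 100)
@btime make_n_sums($x_sarr, $x_sarr, 100)
```

If the gap between the two timings changes noticeably with interpolation, part of the original difference may come from how the arguments reach the function rather than from the additions themselves.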
Looking at the native code, it seems both use the vector instruction vpaddq to perform the additions:
@code_native x_vec + x_vec
    .section __TEXT,__text,regular,pure_instructions
; ┌ @ SIMD.jl:1020 within `+'
; │┌ @ SIMD.jl:604 within `llvmwrap' @ SIMD.jl:604
; ││┌ @ SIMD.jl:1020 within `macro expansion'
    vmovdqa (%edx), %xmm0
    vmovdqa 16(%edx), %xmm1
    vpaddq 16(%esi), %xmm1, %xmm1
    vpaddq (%esi), %xmm0, %xmm0
    vinsertf128 $1, %xmm1, %ymm0, %ymm0
; │└└
    vmovaps %ymm0, (%edi)
    decl %eax
    movl %edi, %eax
    vzeroupper
    retl
    nopw %cs:(%eax,%eax)
; └

@code_native x_sarr + x_sarr
    .section __TEXT,__text,regular,pure_instructions
; ┌ @ linalg.jl:10 within `+'
; │┌ @ mapreduce.jl:17 within `map'
; ││┌ @ mapreduce.jl:21 within `_map'
; │││┌ @ mapreduce.jl:41 within `macro expansion'
; ││││┌ @ linalg.jl:10 within `+'
    vmovdqu (%edx), %xmm0
    vmovdqu 16(%edx), %xmm1
    vpaddq (%esi), %xmm0, %xmm0
    vpaddq 16(%esi), %xmm1, %xmm1
; │└└└└
    vmovdqu %xmm1, 16(%edi)
    vmovdqu %xmm0, (%edi)
    decl %eax
    movl %edi, %eax
    retl
    nop
; └