Trigonometric functions do not use SIMD - Solved: use LoopVectorization.jl

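That’s a problem with the documentation. LoopVectorization uses `check_args` to check whether the arguments are supported, and for these wrapper types it returns `false`, so the vectorized code path isn't used: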

```julia
julia> v1 = IntParam{Float64}(rand(80_000));

julia> v2 = IntParam{Float64}(rand(80_000));

julia> out = similar(v1);

julia> LoopVectorization.check_args(v1, v2, out)
false
```

The next major release of LoopVectorization (which I hope to have out by the end of the year) will use ArrayInterface.jl for this, which will change the recommended way of adding support.

But for now, defining these two methods should work:

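```julia
# Treat a VComponent as supported whenever its element type is supported.
LoopVectorization.check_args(::VComponent{T}) where {T} = LoopVectorization.check_type(T)
# Expose the memory of the wrapped array so it can be loaded directly.
Base.pointer(v::VComponent) = pointer(v.v)
```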
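To make the two methods above concrete, here is a minimal, Base-only sketch of the kind of wrapper they target. The original question's `VComponent` and `IntParam` definitions aren't reproduced in this reply, so the layout below (an abstract `VComponent{T} <: AbstractVector{T}` and an `IntParam` that stores its data in a `Vector` field named `v`) is an assumption inferred from `pointer(v.v)`. The `check_args` method is left out so the sketch runs without LoopVectorization installed.

```julia
# Hypothetical reconstruction for illustration only: the thread's actual
# VComponent/IntParam definitions are not shown, so these are assumptions.
abstract type VComponent{T} <: AbstractVector{T} end

# A thin wrapper that keeps its data in a plain Vector stored in field `v`.
struct IntParam{T} <: VComponent{T}
    v::Vector{T}
end

# Minimal AbstractArray interface, forwarding everything to the wrapped Vector.
Base.size(p::IntParam) = size(p.v)
Base.getindex(p::IntParam, i::Int) = p.v[i]
Base.setindex!(p::IntParam, x, i::Int) = (p.v[i] = x)
Base.IndexStyle(::Type{<:IntParam}) = IndexLinear()

# Make `similar(v1)` return another IntParam rather than a plain Vector.
Base.similar(p::IntParam{T}) where {T} = IntParam{T}(similar(p.v))

# Without this method, `pointer(p)` hits Base's generic AbstractArray fallback,
# which throws an error for a custom wrapper; forwarding to the wrapped
# Vector exposes its contiguous memory.
Base.pointer(v::VComponent) = pointer(v.v)

v1 = IntParam{Float64}(rand(8))
out = similar(v1)
```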

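For example, benchmarking `two_mul_sin_avx!` before and after defining them (the median time drops from about 481 μs to about 82 μs, roughly 5.8 times faster):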

```julia
julia> v1 = IntParam{Float64}(rand(80_000));

julia> v2 = IntParam{Float64}(rand(80_000));

julia> out = similar(v1);

julia> @benchmark two_mul_sin_avx!($out, $v1, $v2)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     477.467 μs (0.00% GC)
  median time:      480.595 μs (0.00% GC)
  mean time:        480.921 μs (0.00% GC)
  maximum time:     519.928 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> LoopVectorization.check_args(::VComponent{T}) where {T} = LoopVectorization.check_type(T)

julia> Base.pointer(v::VComponent) = pointer(v.v)

julia> @benchmark two_mul_sin_avx!($out, $v1, $v2)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     81.493 μs (0.00% GC)
  median time:      82.477 μs (0.00% GC)
  mean time:        82.725 μs (0.00% GC)
  maximum time:     152.587 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```