# How to use SIMD with a very simple example

Hello,

I am trying to understand how to use SIMD.jl.
After trying the `vadd!` example in the documentation I wanted use SIMD in a different example. No success so far.

This is what I would like to simdify:

``````function myfunc(a, b)
if a > b
return a - b
else
return a + b
end
end
x = rand(1_000_000);
# do myfunc.(x, 2.) with explicit simd calls
myfunc.(x, 2.)
``````

I tried the following

``````function myfunc_simd(x::Vector{T}, value::T, ::Type{Vec{N,T}}) where {N, T}
@assert length(x) % N == 0
result = Array{T}(undef, length(x))
lane   = VecRange{N}(0)
@inbounds for i in 1:N:length(x)
result[lane + i] += x[lane+i] > value ? x[k+lane] - value : x[k+lane] +value
end
end
``````

But I gon an error

``````x = rand(Float32,1_000_000);
myfunc_simd(x, Float32(2.), Vec{8,Float32})

TypeError: non-boolean (Vec{8,Bool}) used in boolean context
``````

Any ideas on how to do something like this?

Do you know about any tutorial containing examples using SIMD.jl ?

I have seen https://kristofferc.github.io/post/intrinsics/ , which is pretty awesome, but does not contain any example with “flag-boolean vectors” (masks) used to control an if-else statement like the one I have here.

Please ensure your code runs before you post it; it doesn’t produce the same error. You’ll need `vifelse`.

2 Likes

To quote from @kristoffer.carlsson s blog post

Any type of control flow inside the loop will likely mean the loop will not vectorize.

I guess that might be your problem here?

Maybe this is of some little help:

``````julia> using SIMD

julia> x = 8 * rand(Float32, 8)
8-element Array{Float32,1}:
0.10612774
5.7371054
0.8144512
6.429695
4.938607
7.0792313
0.9309225
4.3026505

julia> v = vload(Vec{8, Float32}, x, 1)
<8 x Float32>[0.10612774, 5.7371054, 0.8144512, 6.429695, 4.938607, 7.0792313, 0.9309225, 4.3026505]

julia> vifelse(v > 2, v - 2, v + 2)
<8 x Float32>[2.1061277, 3.7371054, 2.8144512, 4.429695, 2.9386072, 5.0792313, 2.9309225, 2.3026505]
``````
1 Like

You are right, post updated.

``````julia> function myfunc_simd(x::Vector{T}, value::T, ::Type{Vec{N,T}}) where {N, T}
@assert length(x) % N == 0
result = Array{T}(undef, length(x))
lane   = VecRange{N}(0)
@inbounds for i in 1:N:length(x)
x_vslice    = vload(Vec{N, T}, x, i) # i = 2*k+1 where k=1,2,3,4,...
result[lane + i] = vifelse(x_vslice > 2, x_vslice - value, x_vslice + value)
end
return result
end
myfunc_simd (generic function with 1 method)
``````

Using the `vifelse` I managed to do the function that does what I want

``````julia> x = rand(Float32,1_000_000);

julia> result_1 = myfunc.(x,1);

julia> result_2 = myfunc_simd(x, Float32(1), Vec{8,Float32});

julia> result_1 == result_2
true
``````

Nevertheless, both versions take a similar amount of time:

``````
julia> using BenchmarkTools

jjulia> @btime myfunc.(x,1);
887.890 μs (4 allocations: 3.81 MiB)

julia> @btime myfunc_simd(x, Float32(1), Vec{8,Float32});
757.392 μs (2 allocations: 3.81 MiB)

``````

Maybe for this example the compiler is smart enough to vectorize the result. Or there is not enough compute to be done.

In any case a good exercise to better understand how to do his manually.

``````julia> @btime vadd!(\$x,\$y,Vec{32,Float32})
660.512 μs (0 allocations: 0 bytes)

julia> @btime \$x .= \$x .+ \$y;
750.731 μs (0 allocations: 0 bytes)
``````

You forgot to dot the plus sign. Also, you should interpolate the variables.

1 Like

I think that’s what’s happening in this case. In the `code_llvm` for `broadcast(myfunc, x, 1)`, there are already occurrences of `<8 x float>`, and the native code uses `ymm` registers and `vblendvps` masked SIMD instructions.

And for the `vadd!` case, try `@btime \$x .+= \$y`.