LoopVectorization.jl vmap gives an error ::VectorizationBase.Vec{4, Int64}

Storopoli · July 21, 2021, 6:30pm

I am trying to do a vmap on a function. map and ThreadsX.map both work.

collatz(x::Int64) =
    if iseven(x)
        x ÷ 2
    else
        3x + 1
    end

function collatz_sequencia(x::Int64)
    n = 0
    while true
        x == 1 && return n
        n += 1
        x = collatz(x)
    end
    return n
end

Then:

vmap(collatz_sequencia, 1:10)

MethodError: no method matching zero_offsets(::VectorizationBase.FastRange{Int64, Static.StaticInt{0}, Static.StaticInt{1}, Int64})
Closest candidates are:
zero_offsets(!Matched::Static.StaticInt{N}) where N at /home/storopoli/.julia/packages/VectorizationBase/geEQH/src/static.jl:130
zero_offsets(!Matched::VectorizationBase.StridedPointer{T, N, C, B, R, X, O} where {X, O}) where {T, N, C, B, R} at /home/storopoli/.julia/packages/VectorizationBase/geEQH/src/strided_pointers/stridedpointers.jl:115

I tried to remove the range, but to no avail:

vmap(collatz_sequencia, collect(1:10))

MethodError: no method matching collatz_sequencia(::VectorizationBase.Vec{4, Int64})
Closest candidates are:
collatz_sequencia(!Matched::Int64) at /home/storopoli/Documents/Julia/Computacao-Cientifica/notebooks/3_Parallel.jl#==#a7be2174-a7dd-4259-aab9-64cdcc749fb0:1

Elrod · July 21, 2021, 7:29pm

At least two issues here.

1. Add vmap support for ranges. This is a bug that should be easy to fix.
2. collatz_sequencia is restricted to ::Int64.

The simplest fix for “2.” would be to redefine it to work correctly with VectorizationBase.Vec{4,Int64} inputs. This would require loosening the signatures, but also adjusting the while loop.
You can see, for example, how gcd is defined for AbstractSIMD types and compare to the definition in Base.
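
To illustrate the signature part in isolation, here is a minimal sketch in plain Julia (no packages). Int32 is only a stand-in for "some type other than Int64"; the names step_strict and step_loose are made up for this example. A SIMD vector would hit the same MethodError as the Int32 call, although the loosened annotation it needs may have to be broader than Integer (or dropped entirely).

```julia
# Restricted signature: only Int64 arguments dispatch to this method.
step_strict(x::Int64) = iseven(x) ? x ÷ 2 : 3x + 1

# Hypothetical loosened variant: any Integer subtype is accepted.
step_loose(x::Integer) = iseven(x) ? x ÷ 2 : 3x + 1

# Calling the restricted method with another integer type fails at dispatch.
strict_failed = try
    step_strict(Int32(6))
    false
catch err
    err isa MethodError
end

loose_result = step_loose(Int32(6))
```

Note that loosening the signature only gets past dispatch. The `? :` branch still needs a single Bool, which is why the loop and the branch also have to change for SIMD inputs.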

Some day, it’d be cool to work on an SPMD-style program transformer for Julia that can automate this.

Storopoli · July 21, 2021, 8:35pm

Am I going in the right direction?

using VectorizationBase: AbstractSIMDVector, vany

collatz_SIMD(x) =
    if x % 2 == 0
        x ÷ 2
    else
        3x + 1
    end

function collatz_sequencia_SIMD(x::AbstractSIMDVector{W,I}) where {W,I<:Base.HWReal}
    n = 0
    while vany(x ≠ 1)
        n += 1
        x = collatz_SIMD(x)
    end
    return n
end

vmap(collatz_sequencia_SIMD, [1,2]) # just a test

TypeError: non-boolean (VectorizationBase.Mask{2, UInt8}) used in boolean context

collatz_SIMD@Other: 1[inlined]
collatz_sequencia_SIMD(::VectorizationBase.Vec{2, Int64})@Other: 5
vmap_singlethread!(::typeof(Main.workspace59.collatz_sequencia_SIMD), ::VectorizationBase.StridedPointer{Union{}, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{0}}}, ::Static.StaticInt{0}, ::Int64, ::Val{false}, ::Tuple{VectorizationBase.StridedPointer{Int64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{0}}}})@map.jl:91
vmap_singlethread!@map.jl:58[inlined]
macro expansion@map.jl:224[inlined]
gc_preserve_vmap!@map.jl:224[inlined]
vmap!@map.jl:273[inlined]
vmap_call@map.jl:375[inlined]
vmap(::typeof(Main.workspace59.collatz_sequencia_SIMD), ::Vector{Int64})@map.jl:384
top-level scope@Local: 1[inlined]

Elrod · July 21, 2021, 8:42pm

That is in the right direction, but a few more changes:

To avoid the “non-boolean” error, use IfElse.ifelse.
collatz_sequencia will be called with multiple inputs, and will also have to return multiple outputs/lanes. I’d initialize n with n = zero(x). Then you’ll have to manage a mask m indicating which of these lanes are still unfinished, and n += m. You can control the loop with while vany(m), and update the mask with m &= x ≠ 1. You can’t determine breaking out of the loop with vany(x ≠ 1) because collatz(1) == 4, so a lane that has already reached 1 moves away from it again while the other lanes keep running.
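
The masked loop described above can be emulated with plain tuples, which may make the bookkeeping easier to follow before bringing in VectorizationBase. This is only a sketch: the tuple `lanes` stands in for a SIMD vector, the tuple of Bools `active` stands in for the mask, and the names collatz_step and collatz_lengths are made up here. It assumes every input is a positive integer.

```julia
# Branch-free single step; both sides are computed, then one is selected.
collatz_step(x::Integer) = ifelse(iseven(x), x ÷ 2, 3x + 1)

function collatz_lengths(lanes::NTuple{W,Int}) where {W}
    x = lanes
    n = ntuple(_ -> 0, Val(W))            # per-lane counters, like zero(x)
    active = map(!=(1), x)                # lanes that still have work to do
    while any(active)                     # plays the role of vany(m)
        n = map((ni, a) -> ni + a, n, active)              # n += m
        x = map(collatz_step, x)          # every lane steps, finished or not
        active = map((a, xi) -> a & (xi != 1), active, x)  # m &= x ≠ 1
    end
    return n
end

lengths = collatz_lengths((1, 2, 3, 4))
```

Finished lanes keep cycling through 1, 4, 2, 1 while the slowest lane runs, but their counters stop because their mask entry stays false once cleared.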

Storopoli · July 21, 2021, 8:53pm

Sorry, I am still having a hard time. I cannot find documentation on masks in VectorizationBase.jl. How do I define a mask?

function collatz_sequencia_SIMD(x::AbstractSIMDVector{W,I}) where {W,I<:Base.HWReal}
    n = zero(x)
    m = ifelse(collatz_SIMD(x) ≠ 1, true, false)
    while vany(m)
        n += 1
        x = collatz_SIMD(x)
        m &= x ≠ 1
    end
    return n
end

vmap(collatz_sequencia_SIMD, [1,2])

TypeError: non-boolean (VectorizationBase.Mask{2, UInt8}) used in boolean context

collatz_SIMD@Other: 1[inlined]
collatz_sequencia_SIMD(::VectorizationBase.Vec{2, Int64})@Other: 3
vmap_singlethread!(::typeof(Main.workspace77.collatz_sequencia_SIMD), ::VectorizationBase.StridedPointer{Union{}, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{0}}}, ::Static.StaticInt{0}, ::Int64, ::Val{false}, ::Tuple{VectorizationBase.StridedPointer{Int64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{0}}}})@map.jl:91
vmap_singlethread!@map.jl:58[inlined]
macro expansion@map.jl:224[inlined]
gc_preserve_vmap!@map.jl:224[inlined]
vmap!@map.jl:273[inlined]
vmap_call@map.jl:375[inlined]
vmap(::typeof(Main.workspace77.collatz_sequencia_SIMD), ::Vector{Int64})@map.jl:384
top-level scope@Local: 1[inlined]

Elrod · July 21, 2021, 9:01pm

Any comparison with AbstractSIMDs will result in a mask.

Use IfElse.ifelse to avoid the non-boolean errors.
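
The same failure mode can be reproduced with ordinary arrays, which might help to see why the error appears. In this sketch a BitVector from a broadcast comparison stands in for a Mask, and Base's broadcast `ifelse.` stands in for IfElse.ifelse; it is an analogy, not what VectorizationBase does internally.

```julia
x = [1, 2, 3, 4]
mask = x .> 2                  # one Bool per element, not a single Bool

# Branching on the whole mask fails, just like `if` on a Mask does.
threw = try
    mask ? 1 : 0
    false
catch err
    err isa TypeError
end

# Selecting per element works: keep x where the mask is set, else 0.
selected = ifelse.(mask, x, 0)
```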

Storopoli · July 21, 2021, 9:19pm

Ok got that part:

function collatz_sequencia_SIMD(x::AbstractSIMDVector{W,I}) where {W,I<:Base.HWReal}
    n = zero(x)
    m = IfElse.ifelse(collatz_SIMD(x) ≠ one(x), one(x), zero(x))
    while vany(m)
        n += m
        x = collatz_SIMD(x)
        m &= IfElse.ifelse(x ≠ one(x), one(x), zero(x))
    end
    return n
end

vmap(collatz_sequencia_SIMD, [1, 2, 3, 4])

Somehow it complains with non-boolean errors:

TypeError: non-boolean (VectorizationBase.Mask{2, UInt8}) used in boolean context

collatz_SIMD@Other: 1[inlined]
collatz_sequencia_SIMD(::VectorizationBase.Vec{2, Int64})@Other: 3
vmap_singlethread!(::typeof(Main.workspace168.collatz_sequencia_SIMD), ::VectorizationBase.StridedPointer{Union{}, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{0}}}, ::Static.StaticInt{0}, ::Int64, ::Val{false}, ::Tuple{VectorizationBase.StridedPointer{Int64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{0}}}})@map.jl:91
vmap_singlethread!@map.jl:58[inlined]
macro expansion@map.jl:224[inlined]
gc_preserve_vmap!@map.jl:224[inlined]
vmap!@map.jl:273[inlined]
vmap_call@map.jl:375[inlined]
vmap(::typeof(Main.workspace168.collatz_sequencia_SIMD), ::Vector{Int64})@map.jl:384
top-level scope@Local: 1[inlined]

Elrod · July 21, 2021, 9:22pm

The mask should be collatz_SIMD(x) ≠ one(x) or, if you upgrade to VectorizationBase 0.20.23 (just released a few minutes ago), VectorizationBase.max_mask(x).
Use IfElse.ifelse in collatz_SIMD.

Also, is your CPU an M1 Mac, or does it not have AVX2?
Currently, VectorizationBase decides to use a vector width of 2 for Int64 on CPUs without AVX2.
I have an M1, but I don’t have a CPU with AVX but not AVX2, so I can’t test what works best on the latter.

Elrod · July 21, 2021, 9:35pm

Also, on LoopVectorization master, I added support and tests for vmap with ranges. I’ll tag a new release in a few hours.

Storopoli · July 21, 2021, 9:36pm

This is a Pluto Notebook for a graduate course on scientific computing using Julia (Ciência de Dados e Computação Científica com Julia). I will run it on Linux, but I am writing the content on a mix of an M1 Mac and a Linux machine with AVX2.

I saw the new release, and I also saw that you defined the iseven function, so I’ve updated VectorizationBase.jl to 0.20.23.

I am still getting errors:

collatz_SIMD(x) =
    if IfElse.ifelse(VectorizationBase.iseven(x), one(x), zero(x))
        x ÷ 2
    else
        3x + 1
    end

function collatz_sequencia_SIMD(x::VectorizationBase.AbstractSIMDVector{W,I}) where {W,I<:Base.HWReal}
    n = zero(x)
    m = VectorizationBase.max_mask(x)
    while VectorizationBase.vany(m)
        n += m
        x = collatz_SIMD(x)
        m &= x ≠ 1
    end
    return n
end

vmap(collatz_sequencia_SIMD, [1, 2, 3, 4])

TypeError: non-boolean (VectorizationBase.Vec{2, Int64}) used in boolean context

collatz_SIMD(::VectorizationBase.Vec{2, Int64})@Other: 1
collatz_sequencia_SIMD(::VectorizationBase.Vec{2, Int64})@Other: 6
vmap_singlethread!(::typeof(Main.workspace16.collatz_sequencia_SIMD), ::VectorizationBase.StridedPointer{Union{}, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{0}}}, ::Static.StaticInt{0}, ::Int64, ::Val{false}, ::Tuple{VectorizationBase.StridedPointer{Int64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{0}}}})@map.jl:91
vmap_singlethread!@map.jl:58[inlined]
macro expansion@map.jl:224[inlined]
gc_preserve_vmap!@map.jl:224[inlined]
vmap!@map.jl:273[inlined]
vmap_call@map.jl:375[inlined]
vmap(::typeof(Main.workspace16.collatz_sequencia_SIMD), ::Vector{Int64})@map.jl:384
top-level scope@Local: 1[inlined]

Thank you!

Elrod · July 21, 2021, 9:37pm

collatz_SIMD(x) =
    IfElse.ifelse(iseven(x), x ÷ 2, 3x + 1)
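
As a quick sanity check on scalars, the select form gives the same values as the branching form. The sketch below uses Base's `ifelse` as a stand-in for IfElse.ifelse, and the names collatz_branchy and collatz_select are made up for the comparison. One difference worth keeping in mind: `ifelse` is an ordinary function, so both x ÷ 2 and 3x + 1 are computed for every input before one of them is picked.

```julia
# Branching version: only one side is evaluated per call.
collatz_branchy(x) = iseven(x) ? x ÷ 2 : 3x + 1

# Select version: both sides are evaluated, then one is chosen.
collatz_select(x) = ifelse(iseven(x), x ÷ 2, 3x + 1)

# The two agree on every input in this range.
agree = all(collatz_branchy(x) == collatz_select(x) for x in 1:10_000)
```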

Storopoli · July 21, 2021, 9:43pm

Of course! Makes total sense. Thanks.

OK, but now I think I need to define some sort of conversion:

collatz_SIMD(x) =
    IfElse.ifelse(VectorizationBase.iseven(x), x ÷ 2, 3x + 1)

function collatz_sequencia_SIMD(x::VectorizationBase.AbstractSIMDVector{W,I}) where {W,I<:Base.HWReal}
    n = zero(x)
    m = VectorizationBase.max_mask(x)
    while VectorizationBase.vany(m)
        n += m
        x = collatz_SIMD(x)
        m &= x ≠ 1
    end
    return n
end

vmap(collatz_sequencia_SIMD, [1,2,3,4])

MethodError: vconvert(::Type{VectorizationBase.Vec{2, Union{}}}, ::VectorizationBase.Vec{2, Int64}) is ambiguous. Candidates:

vconvert(::Type{VectorizationBase.Vec{W, F}}, v::VectorizationBase.Vec{W, T}) where {W, F<:Union{Float32, Float64}, T<:Union{Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8}} in VectorizationBase at /Users/storopoli/.julia/packages/VectorizationBase/kTRxL/src/llvm_intrin/conversion.jl:29

vconvert(::Type{VectorizationBase.Vec{W, T1}}, v::VectorizationBase.Vec{W, T2}) where {W, T1<:Union{Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8}, T2<:Union{Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8}} in VectorizationBase at /Users/storopoli/.julia/packages/VectorizationBase/kTRxL/src/llvm_intrin/conversion.jl:36

Possible fix, define

vconvert(::Type{VectorizationBase.Vec{W, Union{}}}, ::VectorizationBase.Vec{W, T2}) where {W, T2<:Union{Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8}}

convert@base_defs.jl:152[inlined]
macro expansion@memory_addr.jl:0[inlined]
__vstore!@memory_addr.jl:810[inlined]
_vstore!@stridedpointers.jl:229[inlined]
vmap_singlethread!(::typeof(Main.workspace17.collatz_sequencia_SIMD), ::VectorizationBase.StridedPointer{Union{}, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{0}}}, ::Static.StaticInt{0}, ::Int64, ::Val{false}, ::Tuple{VectorizationBase.StridedPointer{Int64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{0}}}})@map.jl:95
vmap_singlethread!@map.jl:58[inlined]
macro expansion@map.jl:224[inlined]
gc_preserve_vmap!@map.jl:224[inlined]
vmap!@map.jl:273[inlined]
vmap_call@map.jl:375[inlined]
vmap(::typeof(Main.workspace17.collatz_sequencia_SIMD), ::Vector{Int64})@map.jl:384
top-level scope@Local: 1[inlined]

Elrod · July 21, 2021, 10:17pm

This requires the master branch of both VectorizationBase and LoopVectorization.
They should be released within the next few hours.

using VectorizationBase, LoopVectorization, IfElse

collatz_SIMD(x) =
    IfElse.ifelse(VectorizationBase.iseven(x), x ÷ 2, 3x + 1)

function collatz_sequencia_SIMD(x)
    n = zero(x)
    m = x ≠ 0
    while VectorizationBase.vany(VectorizationBase.collapse_or(m))
        n += m
        x = collatz_SIMD(x)
        m &= x ≠ 1
    end
    return n
end

vmap(collatz_sequencia_SIMD, 1:100) == map(collatz_sequencia_SIMD, 1:100)

Performance seems better for large ranges, but worse for small ones:

julia> dest = Vector{Int}(undef, 100);

julia> @benchmark vmap!(collatz_sequencia_SIMD, $dest, axes($dest,1))
BechmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.552 μs …  8.688 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.563 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.570 μs ± 77.390 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▂▆██▆▂                                                ▁   ▂
  ▆██████▇▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▅▆▇█████ █
  4.55 μs      Histogram: log(frequency) by time     4.72 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark map!(collatz_sequencia_SIMD, $dest, axes($dest,1))
BechmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.461 μs …  4.313 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.499 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.512 μs ± 51.718 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▂▅▇█▅▄▂▁▁▁
  ▁▁▂▄▇███████████▇▇▅▅▄▄▃▃▃▃▃▃▄▄▄▅▅▅▅▄▄▃▃▃▂▂▂▂▁▁▂▁▁▂▁▁▁▁▁▁▁▁ ▃
  2.46 μs        Histogram: frequency by time        2.63 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> dest = Vector{Int}(undef, 1000);

julia> @benchmark vmap!(collatz_sequencia_SIMD, $dest, axes($dest,1))
BechmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  44.796 μs …  76.785 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     45.153 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.152 μs ± 573.617 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▃▅▆▆▅▄▂   ▂▆██▆▁ ▁▂                                   ▁▁▁▁ ▂
  ▅████████▇▅▄██████▆███▃▁▁▁▁▄███▅▄▁▁▁▁▁▁▁▁▃▁▄▆▆▇█▇▇█▇▅▆▆▇████ █
  44.8 μs       Histogram: log(frequency) by time      46.3 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark map!(collatz_sequencia_SIMD, $dest, axes($dest,1))
BechmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  87.214 μs … 105.938 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     88.666 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   88.719 μs ± 614.135 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                    ▁▁▂▃▅▅▆▆▆▇█▇▇▆▇▅▄▄▂▂▁
  ▁▁▁▁▁▁▁▂▂▂▂▃▃▄▅▆▆▇█████████████████████▇▇▆▆▄▄▅▃▄▃▃▃▃▃▂▂▂▂▂▂▂ ▄
  87.2 μs         Histogram: frequency by time         90.2 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

This is with a computer that has AVX512. Performance will probably be a lot worse without it, because (aside from 512-bit vectors), AVX512 is needed for SIMD Int64 multiplication.

Using Int32 gives a roughly 2x performance boost for vmap, while making map slower:

julia> dest = Vector{Int32}(undef, 1000);

julia> @benchmark vmap!(collatz_sequencia_SIMD, $dest, Int32(1):Int32(1_000))
BechmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  22.573 μs …  60.918 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     22.658 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   22.721 μs ± 571.101 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▁█▇
  ▂▃███▆▃▂▃▅█▆▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂ ▃
  22.6 μs         Histogram: frequency by time         23.8 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark map!(collatz_sequencia_SIMD, $dest, Int32(1):Int32(1_000))
BechmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):  133.721 μs … 151.036 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     135.518 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   135.627 μs ± 668.463 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                         ▂▄▅▅▆█▇▆▅▄▃▃
  ▁▁▁▁▁▁▁▁▁▂▂▁▂▂▂▂▃▂▃▄▅▆██████████████▇▆▆▅▅▅▄▄▄▄▄▄▄▃▃▂▃▂▂▂▂▁▂▁▁ ▃
  134 μs           Histogram: frequency by time          137 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Elrod · July 21, 2021, 10:24pm

A couple more comments:

vmap also requires the function be defined for scalars, because that is how it determines the element type of the returned array. vmap! doesn’t have that limitation (since it mutates an existing vector, instead of returning a new one).
I’m using m = x ≠ 0 because vmap(!) will often call the function with 0 as an input, even if 0 isn’t in your vector. The result of this isn’t used for anything, it’s just padding for when the vector length isn’t divisible by the chunk sizes vmap! uses. Thus, it has to not error or get caught in an infinite loop.
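
The effect of the initial mask can be checked on scalars with plain Julia, since a Bool behaves like a one-lane mask. The sketch below transcribes the loop posted above with a plain `while m` in place of vany/collapse_or; the names steps_thread and steps_strict are made up here. If I am tracing it correctly, m = x ≠ 0 makes a padding input of 0 return immediately, but it also treats an input of 1 as unfinished, so 1 is counted as 1 → 4 → 2 → 1 (three steps), whereas the original collatz_sequencia(1) returns 0. A stricter initial mask, shown as a hypothetical variant, would be one way to line the two up.

```julia
collatz_step(x) = ifelse(iseven(x), x ÷ 2, 3x + 1)

# Scalar transcription of the thread's final loop; `m` is a plain Bool.
function steps_thread(x::Integer)
    n = zero(x)
    m = x != 0                 # padding inputs (x == 0) start out finished
    while m
        n += m
        x = collatz_step(x)
        m &= x != 1
    end
    return n
end

# Hypothetical variant: an input of 1 also starts out finished.
function steps_strict(x::Integer)
    n = zero(x)
    m = (x != 0) & (x != 1)
    while m
        n += m
        x = collatz_step(x)
        m &= x != 1
    end
    return n
end

same_from_two = all(steps_thread(x) == steps_strict(x) for x in 2:1_000)
```

Without the x ≠ 0 guard, an input of 0 would never terminate, because 0 is even and 0 ÷ 2 stays 0.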

Storopoli · July 21, 2021, 10:26pm

Thank you! I will update in a few hours…

Elrod · July 21, 2021, 10:39pm

No problem.

Also, because it wasn’t clear and Mask isn’t documented (I/someone else should add some documentation…):
Mask acts like a bunch of booleans.

using VectorizationBase

julia> vxi = Vec(ntuple(Int, Val(4))...)
Vec{4, Int64}<1, 2, 3, 4>

julia> vyi = Vec{4}(2)
Vec{4, Int64}<2, 2, 2, 2>

julia> vxi > vyi
Mask{4,Bit}<0, 0, 1, 1>

julia> vxi ≥ vyi
Mask{4,Bit}<0, 1, 1, 1>

julia> vxi == vyi
Mask{4,Bit}<0, 1, 0, 0>

When ordinary code deals with Bools, you need to replace branches with IfElse.ifelse so that it works with masks.
I.e.,

res = if cmp # cmp is a bool
   iftruebranch
else
   iffalsebranch
end

becomes

# cmp can be a `Bool` or a `Mask`
res = IfElse.ifelse(cmp, iftruebranch, iffalsebranch)

because with AbstractSIMD inputs, cmp will be a Mask instead of a Bool.
If one side of the branch is much more likely than the other, so that it’s still fairly probable that, even with many inputs, every single one of them takes the same side (and the other side of the branch is also very expensive), you could do something like

# cmp is almost always true
res = iftruebranch
if !vall(collapse_and(cmp))
    res = IfElse.ifelse(cmp, res, iffalsebranch)
end

You can think of calling a function with an AbstractSIMD input as calling it a bunch of times, except that each call has to follow the same sequence of instructions and take the same path through branches. (And because the compiler isn’t handling it, you have to do that manually.)
You can use masks to control/combine results from different paths/conditions.
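
The rare-branch pattern above can be emulated with plain tuples to see the control flow without any SIMD types. This is a sketch under assumptions: `all` plays the role of vall(collapse_and(cmp)), mapping `ifelse` over tuples plays the role of IfElse.ifelse on a Mask, and select_lanes with its "double or negate" payload is a made-up example.

```julia
function select_lanes(x::NTuple{W,Int}) where {W}
    cmp = map(>(0), x)                   # expected to be true in almost every lane
    res = map(xi -> 2xi, x)              # common path, computed for all lanes
    took_slow_path = !all(cmp)           # like !vall(collapse_and(cmp))
    if took_slow_path
        alt = map(xi -> -xi, x)          # stand-in for the expensive rare path
        res = map(ifelse, cmp, res, alt) # per-lane select between the two paths
    end
    return (result = res, took_slow_path = took_slow_path)
end

common = select_lanes((1, 2, 3, 4))
mixed = select_lanes((1, -2, 3, 4))
```

When every lane agrees, the rare path is skipped for the whole group; a single disagreeing lane makes all lanes pay for both paths.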

Storopoli · July 21, 2021, 10:48pm

A crash course in LoopVectorization SIMD stuff… Thanks!

Now it all makes sense: you call the function with an AbstractSIMD, and you should expect it to run as a Single Instruction on Multiple Data…

Storopoli · July 22, 2021, 8:12am

Works like a charm. Just an FYI:

@benchmark map(collatz_sequencia, 1:100_000)

BenchmarkTools.Trial: 223 samples with 1 evaluation.
 Range (min … max):  21.896 ms …  24.059 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     22.450 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   22.513 ms ± 311.141 μs  ┊ GC (mean ± σ):  0.06% ± 0.47%

             ▅▂█▂▁ ▂▂▃                                          
  ▃▁▁▁▁▁▃▃▁▅▇█████████▇▆▇▆▆▄▄▅▃▄▃▃▄▃▁▃▁▃▁▁▃▁▁▁▃▁▁▁▁▃▁▃▁▃▁▃▁▁▁▃ ▃
  21.9 ms         Histogram: frequency by time         23.8 ms <

 Memory estimate: 781.33 KiB, allocs estimate: 2.

@benchmark ThreadsX.map(collatz_sequencia, 1:100_000)

BenchmarkTools.Trial: 1310 samples with 1 evaluation.
 Range (min … max):  3.221 ms …   7.911 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.596 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.804 ms ± 687.383 μs  ┊ GC (mean ± σ):  2.62% ± 7.84%

    ▃█▆▃▁                                                      
  ▃▆██████▇█▆▆▅▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▂▁▁▂▂▂▃▃▃▂▂▂▂▂▂▂▂▂▁▂▂▂▁▂▁▂ ▃
  3.22 ms         Histogram: frequency by time        6.76 ms <

 Memory estimate: 9.12 MiB, allocs estimate: 2310.

@benchmark vmapntt(collatz_sequencia_SIMD, 1:100_000)

BenchmarkTools.Trial: 1505 samples with 1 evaluation.
 Range (min … max):  2.200 ms … 45.955 ms  ┊ GC (min … max): 0.00% … 2.91%
 Time  (median):     2.590 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.316 ms ±  4.827 ms  ┊ GC (mean ± σ):  0.50% ± 0.32%

  █▆                                                          
  ███▄▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆ █
  2.2 ms       Histogram: log(frequency) by time     44.5 ms <

 Memory estimate: 781.33 KiB, allocs estimate: 2.
