Many thanks, we are down to 628.492 μs now.
When I try `unroll=(1,6)` with a tuple, I get this error:
**ERROR:** MethodError: no method matching bitstore!(::VectorizationBase.PackedStridedBitPointer{1,2}, ::Mask{8,UInt8}, ::SVec{8,Int32})
Stacktrace:
[1] vnoaliasstore!(ptr::VectorizationBase.PackedStridedBitPointer{1,2}, v::Mask{8,UInt8}, i::Tuple{VectorizationBase.Static{0},VectorizationBase._MM{8,VectorizationBase.Static{0}}})
@ VectorizationBase ~/.julia/packages/VectorizationBase/kIoqa/src/masks.jl:424
In a loop like this, is there any way to insist that `seen` really remains an integer, not an `SVec`?
    @avx unroll=4 for c in axes(x,2)
        seen = 0
        for r in axes(x,1)
            flag = onlyone(x[r,c] == y[c], seen)
            mask[r,c] = flag
            seen += any(flag)
        end
    end
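For reference, the scalar behaviour I am after can be sketched as a plain loop without `@avx`, where `seen` is an ordinary `Int` throughout. This is only a sketch: the definition of `onlyone` below is an assumed stand-in (true only for the first match in a column), and `firstmatch!` is a hypothetical wrapper name used for illustration.

```julia
# Assumed stand-in for onlyone: accept a match only if none has been seen yet.
onlyone(cond::Bool, seen::Int) = cond && iszero(seen)

# Hypothetical wrapper: marks the first row in each column of x that equals y[c].
function firstmatch!(mask::AbstractMatrix{Bool}, x::AbstractMatrix, y::AbstractVector)
    for c in axes(x, 2)
        seen = 0                                   # plain Int, reset for every column
        for r in axes(x, 1)
            flag = onlyone(x[r, c] == y[c], seen)  # Bool in the scalar case
            mask[r, c] = flag
            seen += flag                           # Bool promotes to Int
        end
    end
    return mask
end

x = [1 2; 3 2; 1 2]
y = [1, 2]
mask = firstmatch!(falses(3, 2), x, y)
```

With this scalar version the inner loop carries a dependency on `seen` from one row to the next, which is presumably what makes it awkward to vectorise over `r`.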
I tried for instance moving `@avx` to the inner loop, but then I get
ERROR: MethodError: no method matching subsetview(::VectorizationBase.PackedStridedBitPointer{1,2}, ::Val{2}, ::Int64)
This was with `Threads.nthreads() == 4`, possibly unwisely, on a 2-core laptop.