using Statistics, LoopVectorization
function case1(predictor, threshold)
ind_leq = predictor .<= threshold;
ind_grt = .!ind_leq;
return ind_leq, ind_grt
end
function case2(predictor, threshold)
ind_leq = BitVector(undef, length(predictor));
ind_grt = similar(ind_leq);
@inbounds for i in eachindex(predictor)
ind_leq[i] = predictor[i] <= threshold;
ind_grt[i] = !ind_leq[i];
end
return ind_leq, ind_grt
end
function case3(predictor, threshold)
ind_leq = BitVector(undef, length(predictor));
ind_grt = similar(ind_leq);
@turbo for i in eachindex(predictor)
ind_leq[i] = predictor[i] <= threshold;
ind_grt[i] = !ind_leq[i];
end
return ind_leq, ind_grt
end
# Generate data
n, m = 10^3, 5
features = randn(n, m)
weights = rand(-2:2, m)
labels = features * weights;
# Transpose data
targets = permutedims(labels);
predictor = view(features, :, 1);
threshold = quantile(predictor, 0.1);
case1(predictor, threshold) == case2(predictor, threshold) == case3(predictor, threshold)
@btime case1($predictor, $threshold);
@btime case2($predictor, $threshold);
@btime case3($predictor, $threshold);
I get
julia> case1(predictor, threshold) == case2(predictor, threshold) == case3(predictor, threshold)
true
julia> @btime case1($predictor, $threshold);
492.605 ns (5 allocations: 4.62 KiB)
julia> @btime case2($predictor, $threshold);
1.849 μs (4 allocations: 448 bytes)
julia> @btime case3($predictor, $threshold);
145.379 ns (4 allocations: 448 bytes)
@turbo
batches the operations.
This is actually something tricky to do at a low level, meaning it will take some time before the rewrite supports it.
It is easy to batch at a high level though.
At a high level, we’re just indexing an array.
At a low level, we’re doing a ton of bit manipulations and then indexing into the same memory address across many loop iterations.
You’d have to
- look at the individual bit assignments and realize they’re incrementing, so separate iterations index separate bits.
- that you can batch these and in so doing actually delete all those bit manipulations.
Eventually, I plan on having the rewrite handle this as well while working on a low level, but it still doesn’t handle anything yet, so that’s a ways away.
Meaning for now, this means either use something that works on a relatively high level – that is LoopVectorization.@turbo
or broadcasting – or use Vector{Bool}
instead.
julia> function case2bool(predictor, threshold)
ind_leq = Vector{Bool}(undef, length(predictor));
ind_grt = similar(ind_leq);
@inbounds for i in eachindex(predictor)
ind_leq[i] = predictor[i] <= threshold;
ind_grt[i] = !ind_leq[i];
end
return ind_leq, ind_grt
end
case2bool (generic function with 1 method)
julia> @btime case2bool($predictor, $threshold);
316.359 ns (2 allocations: 2.12 KiB)
@turbo
would probably be the fastest (like it is here), but it doesn’t work on all loops, and it is also a weighty dependency that introduces latency.