What you’re benchmarking there is Vector{Bool}. I had thought there might be some performance left on the table without a BitArray-specific bit-fiddling method… but this looks pretty good: the BitArray is beating a Vector{Bool} by about 9x.
julia> using BenchmarkTools
julia> x = rand(Bool, 1_000_000);
julia> y = BitArray(x);
julia> @btime findall($x);
4.126 ms (2 allocations: 3.82 MiB)
julia> @btime findall($y);
460.060 μs (2 allocations: 3.82 MiB)
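For reference, here is a rough sketch of the kind of chunk-wise bit fiddling I had in mind. It is not what Base does: `findall_chunks` is a made-up name, and it reads the internal `chunks` field of a BitVector, which is not public API. It also assumes the unused bits of the last chunk are zero.

```julia
# Hypothetical helper, not Base's implementation: collect the indices of the
# set bits by scanning the 64-bit chunks that back a BitVector.
function findall_chunks(B::BitVector)
    out = Vector{Int}(undef, count(B))      # one slot per set bit
    k = 0
    for (ci, chunk) in enumerate(B.chunks)  # `chunks` is an internal field
        base = (ci - 1) * 64                # index offset of this chunk
        while chunk != 0
            k += 1
            # the lowest set bit gives the next index (bits are stored LSB-first)
            out[k] = base + trailing_zeros(chunk) + 1
            chunk &= chunk - 1              # clear the lowest set bit
        end
    end
    return out
end
```

It should return the same indices as `findall(y)`, so benchmarking it with `@btime findall_chunks($y);` would show whether a hand-rolled loop gains anything over the built-in method.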
I don’t see quite such a large discrepancy (more like 5x; it might be CPU-dependent?), but I do see findall outperforming the index-into-a-range solution for BitArray: