The findall(::BitArray) optimization really shines here compared to the iterator-based findall, despite the need to allocate an intermediate BitArray:
```julia
julia> @btime findall(A .> 0.1) setup=(A=rand(1000))
1.833 μs (4 allocations: 11.30 KiB)

julia> @btime findall(>(0.1), A) setup=(A=rand(1000))
8.533 μs (14 allocations: 16.55 KiB)
```
It’s faster in almost every case I tested; the only exception was small arrays in which few elements satisfy the condition. The advantage likely comes down to SIMD execution of the comparison and to avoiding repeatedly growing and copying the output array. It’d be great if the iterator-based implementation specialized on strided numeric arrays to take advantage of SIMD.
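As a rough illustration of the kind of specialization suggested above, here is a minimal sketch that assumes a dense `AbstractVector{<:Real}` and a cheap, side-effect-free predicate. `findall_twopass` is a hypothetical name, not a Base function. The first pass counts matches with a branch-free loop the compiler can vectorize. The second pass fills an output vector allocated once at the exact size, so nothing is grown or copied. It evaluates the predicate twice instead of storing a bit mask, and I haven't benchmarked it against either method above.

```julia
# Hypothetical two-pass findall (not part of Base), assuming a dense numeric
# vector and a cheap, pure predicate `f` that returns a Bool.
function findall_twopass(f, A::AbstractVector{<:Real})
    # Pass 1: count matches. Adding a Bool to an Int gives an Int, so the loop
    # body has no branch and @simd lets the compiler vectorize the reduction.
    n = 0
    @inbounds @simd for i in eachindex(A)
        n += f(A[i])
    end
    # Allocate the result once, at its exact final size.
    out = Vector{Int}(undef, n)
    # Pass 2: write the indices of matching elements in order.
    j = 1
    @inbounds for i in eachindex(A)
        if f(A[i])
            out[j] = i
            j += 1
        end
    end
    return out
end

A = [0.05, 0.3, 0.1, 0.7, 0.0]
findall_twopass(>(0.1), A)  # [2, 4], same as findall(A .> 0.1)
```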