Unexpectedly slow performance of iszero for BitVector

brianguenter · August 15, 2024, 8:20pm

The builtin function iszero for BitVector seems unreasonably slow:

julia> a = BitVector(fill(0,10000));

julia> @benchmark iszero($a)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.750 μs …  8.588 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.775 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.806 μs ± 95.423 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    █ █
  ▂▁█▁█▁▁▅▁█▁▁▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▆▁▂▁▁▁▁▁▁▁▂▁▁▁▁▁▁▂▁▂▁▁▂▁▂▁▂ ▂
  3.75 μs        Histogram: frequency by time        4.05 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

I implemented my own version of this


is_zero(a::BitVector) = !any(a)

which is about 100x faster:

julia> using FastDifferentiation #has my implementation of is_zero for BitVector
Precompiling FastDifferentiation...
  1 dependency successfully precompiled in 2 seconds. 47 already precompiled.

julia> @benchmark FastDifferentiation.is_zero($a)
BenchmarkTools.Trial: 10000 samples with 993 evaluations.
 Range (min … max):  35.549 ns … 74.119 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     35.750 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   36.031 ns ±  0.875 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    █
  ▂▇█▁▄▃▁▁▂▁▂▁▁▁▁▁▁▁▁▁▁▅▁▃▁▁▁▁▁▁▁▂▂▁▂▂▁▂▂▂▁▂▁▂▁▂▁▁▁▂▁▁▁▂▁▁▁▂▂ ▂
  35.5 ns         Histogram: frequency by time        39.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Am I making a benchmarking error? Or is iszero(a::BitVector) slower than it should be? I’d rather use the builtin iszero than my own, makes life simpler and less error prone.

Here’s my versioninfo:

Julia Version 1.11.0-rc2
Commit 34c3a63147 (2024-07-29 06:24 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 32 × AMD Ryzen 9 7950X 16-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
Environment:
  JULIA_NUM_THREADS = 1
  JULIA_EDITOR = code.cmd

Results are the same for 1.10 so this doesn’t seem to be a performance regression:

julia> a = BitVector(fill(0,10000));

julia> @benchmark iszero($a)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.750 μs …   6.800 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.788 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.810 μs ± 103.563 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▄ █
  ▂██▁██▁▃▃▂▁▂▂▁▂▂▄▁▃▂▁▂▂▂▁▂▂▁▂▂▂▁▂▂▁▂▂▂▁▂▂▁▂▂▂▁▂▂▁▂▂▂▁▂▂▁▂▂▂ ▂
  3.75 μs         Histogram: frequency by time        4.28 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd4843 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 32 × AMD Ryzen 9 7950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 16 default, 0 interactive, 8 GC (on 32 virtual cores)
Environment:
  JULIA_EDITOR = code.cmd
  JULIA_NUM_THREADS = 16

kristoffer.carlsson · August 15, 2024, 8:31pm

any is specialized for BitArray while iszero (implemented as all(iszero, x)) is not and will just iterate element by element.

brianguenter · August 15, 2024, 9:07pm

Is there a technical reason why iszero can’t be specialized for BitVector? Does the function

iszero(a::Bitvector) = !any(a)

fail under some circumstances? Or is it just that this particular operation never got optimized?

mbauman · August 15, 2024, 9:22pm

Nope, this would be a great first pull request to the project!

There’s nothing fundamentally special about Julia’s “built-ins”. In fact, they’re no different than code you can write. BitArray has some hand-written specializations that can operate on 64 elements (or more!) at a time… but if they don’t exist they fall back to generic implementations. That’s all that’s happening here.

DNF · August 15, 2024, 10:31pm

But what is the right approach? Is it to specialize iszero(::BitArray), or is it to specialize all(f, ::BitArray), or something else?

Oscar_Smith · August 15, 2024, 11:06pm

specializing all seems good if possible, but if complicated, specializing iszero is definitely easy.

Henrique_Becker · August 15, 2024, 11:13pm

I am surprised that any is not defined in terms of all, or vice-versa. It is literally the same check just with a condition negated.

jling · August 16, 2024, 2:50am

For one reason at least, the early return/exit condition is different

Henrique_Becker · August 16, 2024, 11:16am

I mean, for the single-parameter method, yes. But for the one that takes a condition, all(f, xs) = !any(!f, xs) and vice-versa.

Topic		Replies	Views
Dividing array by float: is multiplying with precalculated inverse worth it Performance broadcast	12	855	November 30, 2022
A problem about performance Performance	30	717	October 3, 2022
Matrix-vector multiplication slower than a 'naive' for loop? Performance vector	7	1651	July 30, 2020
Compiler Performance Internals & Design	20	2026	June 2, 2018
BitVector vs Vector{Bool} as default on comparison operations Performance	8	9145	November 2, 2020

Unexpectedly slow performance of iszero for BitVector

Related topics