Bounds check outside loop affects loop performance

wheeheee · January 31, 2024, 4:10pm

While playing around with a function, I noticed that adding a bounds check (si_check) before the loop had a drastic impact on its performance. For N <= 18 there is no noticeable impact, and the emitted LLVM IR shows that most of the exception branches from @nextract ... have been elided. For N > 18 (my benchmarks use N = 56), however, @inbounds on the loop improves performance, while adding the same bounds check slows the function down drastically, from a median of about 27.0 μs to 68.5 μs. Apart from the exception branches, the @code_llvm output of the two versions is also very different.
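For reference, Base.@nextract 3 si siarr unrolls into separate assignments si_1 = siarr[1], si_2 = siarr[2], si_3 = siarr[3]. Each one is an ordinary bounds-checked getindex unless it runs under @inbounds, which is where per-element exception branches in the IR can come from. A minimal sketch, using a small hard-coded N = 3 chosen only for illustration:

```julia
# Base.@nextract N esym isym unrolls to esym_1 = isym[1], ..., esym_N = isym[N].
# N must be a literal Int; in the generated function it is interpolated as $silen.
siarr = [0.1, 0.2, 0.3]

Base.@nextract 3 si siarr      # defines si_1, si_2, si_3 (each a checked siarr[j])
println((si_1, si_2, si_3))

# Inspect the expansion: a block of plain assignments, with no @inbounds applied.
ex = @macroexpand Base.@nextract 3 si siarr
println(ex)
```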

Is this expected? Should I file an issue?

@generated function _filt_fir!(out, b::NTuple{N,T}, x, siarr, col) where {N,T}
    silen = N - 1
    si_end = Symbol(:si_, silen)
    SMALL_FILT_VECT_CUTOFF = 18
    # small N: assert the length of siarr once before the loop; large N: emit no check
    si_check = N > SMALL_FILT_VECT_CUTOFF ? :(nothing) : :(@assert length(siarr) == $silen)

    q = quote
        $si_check
        Base.@nextract $silen si siarr
        for i in axes(x, 1)
            xi = x[i, col]
            val = muladd(xi, b[1], si_1)
            Base.@nexprs $(silen-1) j -> (si_j = muladd(xi, b[j+1], si_{j+1}))
            $si_end = b[N] * xi
            out[i, col] = val
        end
    end

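    if N > SMALL_FILT_VECT_CUTOFF
        # q.args[6] is the for loop and .args[2] its body block; LineNumberNodes occupy the
        # odd slots, so indices 2 and 10 are the load xi = x[i, col] and the store out[i, col] = val
        loop_args = q.args[6].args[2].args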
        for i in (2, 10)
            loop_args[i] = :(@inbounds $(loop_args[i]))
        end
    end
    q
end
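The hard-coded indices in q.args[6].args[2].args work because every statement in a quote block is preceded by a LineNumberNode. The standalone sketch below uses made-up placeholder symbols (check, setup, load_stmt, store_stmt) to show that layout and applies the same @inbounds wrapping. These indices shift whenever statements are added or removed, so the approach is fragile.

```julia
# Statements in a quote block are interleaved with LineNumberNodes
# (comments are dropped by the parser and do not occupy slots).
q = quote
    check                  # q.args[2]   (q.args[1] is a LineNumberNode)
    setup                  # q.args[4]
    for i in 1:3           # q.args[6]
        load_stmt          # body.args[2]
        store_stmt         # body.args[4]
    end
end

loop = q.args[6]
body = loop.args[2]        # a :for Expr holds (iteration spec, body block)

# Same wrapping as in the generated function: replace one statement in place.
body.args[2] = :(@inbounds $(body.args[2]))
println(loop)
```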

Benchmark with the function above unmodified (for N = 56, si_check is :(nothing), so there is no @assert ... in the body):

julia> x = rand(10_000); out = similar(x);

julia> a = 1.; b = Tuple(rand(56)); si = zeros(55);

julia> @benchmark _filt_fir!($out, $b, $x, $si, $1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  26.900 μs … 112.600 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     27.000 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   27.320 μs ±   1.496 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██ ▁        ▆ ▄                      ▁▄                      ▂
  ██▁█▁█▁▇▇▁▇▁█▁█▇▁▅▁▄▁▃▅▁▆▁▇▁▇▆▁▆▁▆▁▆▁██▁█▁█▁▇▇▁▇▁▇▁▇▆▁▆▁█▁▇▄ █
  26.9 μs       Histogram: log(frequency) by time      30.3 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Benchmark after changing si_check to an unconditional @assert ...

julia> @benchmark _filt_fir!($out, $b, $x, $si, $1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  68.400 μs … 109.700 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     68.500 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   68.968 μs ±   1.968 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃                                         ▂▁                ▁
  ██▃▁▃▁▃▃▁▁▃▆▅▄▆▅▆▆▇▆▆▅▆▆▅▅▇▅▆▇▆▅▅▅▅▄▅▆▅▆▅▅▆██▅▆▄▆▅▆▆▇▆▄▄▅▄▆▅ █
  68.4 μs       Histogram: log(frequency) by time        76 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
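As a point of comparison rather than a confirmed fix, the length check could instead be written as a single checkbounds call over the whole index range, with the extraction itself under @inbounds. The helper name sum_first3 and the fixed N = 3 are hypothetical. Note that checkbounds(v, 1:3) only requires indices 1:3 to be valid, which is weaker than the original @assert length(siarr) == $silen. Whether this changes the N = 56 timings above has not been measured.

```julia
# Hypothetical helper for illustration only (not from the original post).
function sum_first3(v::AbstractVector)
    checkbounds(v, 1:3)            # one upfront check; throws BoundsError if v is too short
    @inbounds begin
        Base.@nextract 3 e v       # e_1 = v[1]; e_2 = v[2]; e_3 = v[3], without per-element checks
    end
    return e_1 + e_2 + e_3
end

println(sum_first3([1, 2, 3, 4]))
```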

Platform details (the generated code uses AVX-512 instructions):

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 11 on 8 virtual cores
Environment:
  JULIA_CONDAPKG_BACKEND = Null
  JULIA_NUM_THREADS = auto
