Hey Julia experts,
I have a performance puzzle. This function is designed to be completely allocation-free, but @benchmark
tells a different story. Iβm hoping a fresh pair of eyes can spot the problem.
This is the function:
using StatsBase
_default_reducer(::Val{:median}) = median!
_default_reducer(::Val{:mean}) = mean
_default_reducer(::Val{:std}) = std
function sigma_clip!(x::AbstractVector{T},
mask::BitVector,
buffer::AbstractVector{T};
upper::Real=3,
lower::Real=3,
cent_func::Symbol=:median,
std_func::Symbol=:std,
maxiter::Int=5) where {T<:Real}
maxiter > 0 || throw(ArgumentError("maxiter must be positive"))
upper > 0 && lower > 0 || throw(ArgumentError("upper and lower must be positive"))
cent_reducer = _default_reducer(Val(cent_func))
std_reducer = _default_reducer(Val(std_func))
# initialize buffer following mask (isfinite) to avoid cal reducer over NaN or Inf
N = 0
@inbounds for i in eachindex(x)
xi = x[i]
if isfinite(xi)
N += 1
buffer[N] = xi
mask[i] = true
else
mask[i] = false
end
end
N == 0 && return mask
iter = 0
@inbounds while iter < maxiter
# calculate statistic
w = view(buf, 1:N)
c = cent_reducer(w)
s = std_reducer(w)
iszero(s) && return mask
upper_bound = c + s * upper
lower_bound = c - s * lower
N = 0
n_clipped = 0
@inbounds for i in eachindex(x)
if mask[i]
xi = x[i]
if xi > upper_bound || xi < lower_bound
mask[i] = false
n_clipped += 1
else
N += 1
buffer[N] = xi
end
end
end
n_clipped == 0 && return mask
iter += 1
end
return mask
end
The Evidence (The Benchmark):
@benchmark sigma_clip!(x, y, buf;upper=1.5, lower=1.5, cent_func=:mean) setup=begin
x = rand(100)
y = trues(100)
buf = rand(100)
end
BenchmarkTools.Trial: 10000 samples with 8 evaluations per sample.
Range (min β¦ max): 3.302 ΞΌs β¦ 254.620 ΞΌs β GC (min β¦ max): 0.00% β¦ 97.24%
Time (median): 5.255 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 5.725 ΞΌs Β± 4.792 ΞΌs β GC (mean Β± Ο): 3.23% Β± 4.42%
ββββ β
βββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββ β
3.3 ΞΌs Histogram: frequency by time 8.42 ΞΌs <
Memory estimate: 5.50 KiB, allocs estimate: 346.
Iβve ruled out the usual suspects (global variables, benchmark setup for mutating functions). What am I missing?
Thanks for any clues!