Help optimizing my generic sigma clipping function in Julia

Hi everyone,

I’m working on a small personal project and wrote this sigma clipping function in Julia. It works fine so far, but I’m definitely not an expert and I feel like it could be way more efficient — especially to reduce memory allocations and speed up runtime.

The function is meant to be generic, so it should work both on vectors and matrices, and also allow the user to choose:

  • which functions to use for the center (mean, median, etc.)
  • which function to use for the standard deviation (std, mad, etc.)
  • and which clamp value to apply when masking outliers (clampvalue)

Implementation details

Currently, I’m using a boolean mask (BitVector) to keep track of which elements are considered “good” or clipped out. This mask is updated at each iteration of the clipping loop.

To compute the center and standard deviation on the “good” subset, I use view(x, mask) — i.e., a view of the array selecting only the elements where the mask is true.

Inside the loop, I iterate over every element and update the mask based on the clipping criteria.

Finally, I apply the mask to the output array, replacing clipped values with the chosen clampvalue.

using Statistics

function mclip!(a, mask, cval)
    @inbounds @simd for p in eachindex(mask)
        if !mask[p]
            a[p] = cval
        end
    end
end

function sigma_clip!(x::AbstractArray{T},
    out::AbstractArray{T};
    sigma_low = 3,
    sigma_upp = 3,
    cent_func = mean,
    std_func = std,
    maxiters::Int = 5,
    clampvalue = NaN) where {T}

    mask = trues(size(x))
    m = cent_func(x)
    s = std_func(x)
    r = 1 # number of clipped elements
    n = 0
    while (n <= maxiters) && (r != 0)
        r = 0
        @inbounds for p in eachindex(x)
            if x[p] > m + (s * sigma_upp) || x[p] < m - (s * sigma_low)
               mask[p] = false
               r += 1
            end
        end
        
        m = cent_func(view(x, mask))
        s = std_func(view(x, mask))
        n += 1
    end

    mclip!(out, mask, clampvalue)
    return out
end

I’d appreciate any tips on how to:

Cut down temporary allocations inside the loop

Avoid unnecessary copies if possible

Or anything else that could make this faster, cleaner, or more Julian

Thanks so much for your patience and help with a newbie like me!

Hi,

The only performance issue I see is that view(x, mask) is allocating:

julia> x = rand(10); mask = rand(Bool, length(x)); @btime view($x, $mask);
  58.232 ns (2 allocations: 96 bytes)
4-element view(::Vector{Float64}, [3, 5, 6, 8]) with eltype Float64:
 0.6461785676274421
 0.7066252429548666
 0.22816455870960406
 0.5279362929418385

As you can see, the reason is that it needs finds the indices [3, 5, 6, 8] where the mask is true (to enable fast indexing), for which it needs to allocate an array.


You could keep track of these indices yourself, using only one allocation (well, two below):

function sigma_clip2!(
    x::AbstractVector{T},    # (Could stay as AbstractArray, though I'm not sure
    out::AbstractVector{T};  #  what would be the point of higher dimensions)
    sigma_low = 3,
    sigma_upp = 3,
    cent_func = mean,
    std_func = std,
    maxiters::Int = 5,
    clampvalue::T = NaN) where {T}  # Best to ensure typeof(clampvalue) == eltype(out)

    ids_to_keep_buffer = collect(eachindex(x))  # Must be collect(1:length(x)), doesn't support e.g. OffsetArrays
	nb_ids_to_keep = length(x)
    new_ids_to_keep_buffer = Vector{Int}(undef, length(x))
    for _ = 1:maxiters
		ids_to_keep = view(ids_to_keep_buffer, 1:nb_ids_to_keep)
		m = cent_func(view(x, ids_to_keep))
        s = std_func(view(x, ids_to_keep))
		
		# Could precompute lower and upper bound
		# 	lb = m - (s * sigma_low)
		# 	ub = m + (s * sigma_upp)
		# and use these in the loop, but that seems slightly slower for some reason. 

		nb_new_ids_to_keep = 0
        @inbounds for p in ids_to_keep
			if m - (s * sigma_low) <= x[p] <= m + (s * sigma_upp)
				nb_new_ids_to_keep += 1
				new_ids_to_keep_buffer[nb_new_ids_to_keep] = p
			else
				out[p] = clampvalue
			end
		end
		
        nb_new_ids_to_keep < nb_ids_to_keep || break
		ids_to_keep_buffer, new_ids_to_keep_buffer = new_ids_to_keep_buffer, ids_to_keep_buffer
		nb_ids_to_keep = nb_new_ids_to_keep
    end
	
    return out
end

Here we have two overallocated buffers of valid indices (corresponding to non-clipped entries); new_ids_to_keep_buffer will be built up this iteration and replace ids_to_keep_buffer. We keep track of the number of valid entries (the complement of your r) and use a view to get an AbstractVector{Int} of the real valid indices ids_to_keep. Then view(x, ids_to_keep) is the same as your view(x, mask), only already computed. We also only iterate over the entries p in ids_to_keep, as all others are already clipped anyway.

julia> x = randn(1_000_000); out = similar(x);

julia> @btime sigma_clip!($x, $out);
  47.079 ms (40 allocations: 91.40 MiB)

julia> @btime sigma_clip2!($x, $out);
  29.441 ms (6 allocations: 15.26 MiB)

Potentially we can make do with a single buffer, but we would need to be very careful when modifying a collection we are currently iterating over.


This is not the most elegant and presumably not the most performant implementation possible. I imagine we could just create a non-allocating iterator equivalent to view(x, mask). It would not support arbitrary indexing, but we don’t need that for mean or std. (For median it might be more important? Also median does support iterators.)

I couldn’t seem to find an existing implementation for such an iterator, so here’s my own version:

struct MaskedIterator{I, M}
    it::I
    mask_it::M
end

function Base.iterate(mi::MaskedIterator)
    # Try the first entry. If it is not masked out, return it and the iterator states.
    # Otherwise, fall back to the general case using these states.

    iit = iterate(mi.it)
    mit = iterate(mi.mask_it)

    (isnothing(iit) || isnothing(mit)) && return nothing
    if first(mit)
        return first(iit), (iit[2], mit[2])
    else
        return iterate(mi, (iit[2], mit[2]))
    end
end

function Base.iterate(mi::MaskedIterator, (istate, mstate))
    # Keep iterating until we get a true mask entry (or we're done)

    iit = iterate(mi.it, istate)
    mit = iterate(mi.mask_it, mstate)

    while !isnothing(iit) && !isnothing(mit) && !first(mit)
        iit = iterate(mi.it, iit[2])
        mit = iterate(mi.mask_it, mit[2])
    end

    (isnothing(iit) || isnothing(mit)) && return nothing
    return first(iit), (iit[2], mit[2])
end
julia> x = rand(5)
5-element Vector{Float64}:
 0.7634859118664
 0.19910795036201456
 0.6617345518525386
 0.6275731947824943
 0.6967566385365754

julia> m = rand(Bool, 5)
5-element Vector{Bool}:
 0
 1
 0
 0
 1

julia> mi = MaskedIterator(x, m)
MaskedIterator{Vector{Float64}, Vector{Bool}}([0.7634859118664, 0.19910795036201456, 0.6617345518525386, 0.6275731947824943, 0.6967566385365754], Bool[0, 1, 0, 0, 1])

julia> for a in mi
           println(a)
       end
0.19910795036201456
0.6967566385365754

julia> mean(mi)
0.447932294449295
function sigma_clip3!(x::AbstractArray{T},
    out::AbstractArray{T};
    sigma_low = 3,
    sigma_upp = 3,
    cent_func = mean,
    std_func = std,
    maxiters::Int = 5,
    clampvalue = NaN) where {T}

    mask = trues(size(x))
    m = cent_func(x)
    s = std_func(x)
    r = 1 # number of clipped elements
    n = 0
    while (n <= maxiters) && (r != 0)
        r = 0
        @inbounds for p in eachindex(x)
            if x[p] > m + (s * sigma_upp) || x[p] < m - (s * sigma_low)
               mask[p] = false
               r += 1
            end
        end
        
        m = cent_func(MaskedIterator(x, mask))  # <-- only change
        s = std_func(MaskedIterator(x, mask))   # <--
        n += 1
    end

    mclip!(out, mask, clampvalue)
    return out
end
julia> x = randn(1_000_000); out = similar(x);

julia> out .= 0; sigma_clip!(x, out); count(isnan, out)
3045

julia> out .= 0; sigma_clip2!(x, out); count(isnan, out)  # as a correctness check
3045

julia> out .= 0; sigma_clip3!(x, out); count(isnan, out)
3045


julia> @btime sigma_clip!($x, $out);
  43.162 ms (40 allocations: 91.40 MiB)

julia> @btime sigma_clip2!($x, $out);
  29.016 ms (6 allocations: 15.26 MiB)

julia> @btime sigma_clip3!($x, $out);
  47.516 ms (4 allocations: 122.23 KiB)

So it does reduce the number of allocations (though I’m not sure where the second one comes from)*, but the overhead is too significant, so it’s slower regardless :slight_smile: .

* trues(size(x)) simply already yields 4 allocations.