I often need to compute a statistical range (the difference between maximum and minimum). Is there a function in the Statistics standard library to compute the range? Or should we use maximum(x) - minimum(x)?
Since extrema is slightly faster than calling both functions separately, you could do this:
rangeof(x) = (tmp = extrema(x); return tmp[2] - tmp[1])
or
rangeof(x) = ((low, high) = extrema(x); return high - low)
julia> using BenchmarkTools
julia> datarange(x) = -(-(extrema(x)...))
datarange (generic function with 1 method)
julia> @btime datarange($x)
161.395 ns (0 allocations: 0 bytes)
9.91438063817977
julia> @btime maximum($x) - minimum($x)
257.181 ns (0 allocations: 0 bytes)
9.91438063817977
Is this computation done for scaling a given set of samples? If yes, you may be interested in using TableTransforms.jl’s MinMax or Scale:
https://juliaml.github.io/TableTransforms.jl/stable/transforms/#MinMax
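If you only need the scaling itself without the table machinery, min-max scaling is easy to sketch in plain Julia. The helper below (minmax_scale is a hypothetical name, not part of TableTransforms.jl or any standard library) just illustrates the idea:

```julia
# Hypothetical helper sketching what min-max scaling does conceptually;
# this is NOT the TableTransforms.jl implementation.
function minmax_scale(x::AbstractVector{<:Real})
    lo, hi = extrema(x)            # one pass for both bounds
    hi == lo && error("cannot scale a constant vector (range is zero)")
    return (x .- lo) ./ (hi - lo)  # maps values into [0, 1]
end

minmax_scale([2.0, 4.0, 6.0])  # -> [0.0, 0.5, 1.0]
```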
Thanks, so I guess I better define one of the above range functions in my startup.jl, if there is nothing ready-made.
FYI, I found this minmax algorithm here, which seems to perform better than the previous solutions:
function range2(x)
    min = max = first(x)
    for xi in view(x, 2:length(x))
        min > xi ? min = xi : max < xi && (max = xi)
    end
    return max - min
end
x = 10 * rand(100)
@btime range2($x) # 123 ns (0 allocs: 0 bytes)
Since range is one of the important measures of variability, an optimized version should be available in the standard library, IMHO.
For performance, branches (as generated by the ? : operator) are bad. An alternate version:
function range3(x)
    min = typemax(eltype(x))
    max = typemin(eltype(x))
    for xi in x
        min = ifelse(min > xi, xi, min)
        max = ifelse(max < xi, xi, max)
    end
    return max - min
end
is faster on my machine.
julia> @btime range3($x) # 123
98.743 ns (0 allocations: 0 bytes)
Thanks @Dan for the insights, I see 40% speedup here!
This just replicates the functionality of extrema followed by a subtraction. If this is faster than extrema, that means there's something wrong with the latter.
Hmm, I amend my statement a bit. extrema handles NaNs, which range3 does not (nor does it handle empty collections):
julia> range3([1.0, NaN, 2.0])
1.0
julia> extrema([1.0, NaN, 2.0])
(NaN, NaN)
julia> range3(Float64[])
-Inf
julia> range3(Int[])
1
If you want this to be in a stdlib, it should be robust against things like this.
Fixing this does not entail so much overhead:
function range4(x)
    isempty(x) && error("Does not support empty vectors")
    min = typemax(eltype(x))
    max = typemin(eltype(x))
    hasnan = false
    for xi in x
        hasnan |= isnan(xi)
        min = ifelse(min > xi, xi, xi == xi ? min : min)
        max = ifelse(max < xi, xi, max)
    end
    hasnan && error("Does not support NaNs in vectors")
    return max - min
end
julia> x = 10 * rand(100);
julia> y = 10 * rand(1000000);
julia> @btime range4($x);
107.822 ns (0 allocations: 0 bytes)
julia> @btime extrema($x);
318.175 ns (0 allocations: 0 bytes)
julia> @btime range4($y);
1.397 ms (0 allocations: 0 bytes)
julia> @btime extrema($y);
3.285 ms (0 allocations: 0 bytes)
There is a ~3X gap here (which persists for large vectors), and this trivial (and popular) operation could probably be optimized further (SIMD, threads, etc.).
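As a rough illustration of the SIMD point, here is a hedged sketch (range_simd is a hypothetical name; the ifelse updates keep the loop branch-free so @simd can vectorize the two reductions). It carries the same NaN and empty-collection caveats discussed above:

```julia
# Hypothetical single-pass range with SIMD hints; same NaN caveats as range3.
function range_simd(x::AbstractVector{<:AbstractFloat})
    isempty(x) && error("Does not support empty vectors")
    lo = typemax(eltype(x))
    hi = typemin(eltype(x))
    @inbounds @simd for i in eachindex(x)
        xi = x[i]
        lo = ifelse(xi < lo, xi, lo)  # branch-free min update
        hi = ifelse(xi > hi, xi, hi)  # branch-free max update
    end
    return hi - lo
end

range_simd([1.0, 5.0, 3.0])  # -> 4.0
```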
Can you take a look at Base.extrema and see if you can tell what, if anything, is wrong with it, performance-wise? Because I really think the right approach is to use extrema for this, and fix any problems there.
Dan, sorry to ask, but is hasnan required? Could we write range4() as follows:
function range4b(x)
    isempty(x) && error("Does not support empty vectors")
    min = typemax(eltype(x))
    max = typemin(eltype(x))
    for xi in x
        isnan(xi) && error("Does not support NaNs in vectors")
        min = ifelse(min > xi, xi, min)
        max = ifelse(max < xi, xi, max)
    end
    return max - min
end
You probably want to avoid branches.
You wouldn’t want to read the vector from memory twice. The memory transfers are critical for performance.
Whoops, the code changed… but now DNF is right, the branch is bad.
Sorry, you seem to have captured a temporary change in the code, the original snippet is restored.
But benchmarking is the best disinfectant. You are welcome to post results (my machine’s results are up there already).
Had a quick glance, but will look some more. Basically, there is an attempt to capture many reduce-like operations in one template, and perhaps the overhead of reduce with support for the dims option is taking a toll.
On the NaN / empty-vector front, I actually think a vector type that prohibits NaNs could be useful (i.e., one that throws when an operation produces a NaN). Specialized methods could then be dispatched on this type.
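A minimal sketch of what such a type could look like (NoNaNVector is a hypothetical name, and only construction is checked here, not operations that later produce NaNs):

```julia
# Hypothetical wrapper that rejects NaNs at construction time.
struct NoNaNVector{T<:AbstractFloat} <: AbstractVector{T}
    data::Vector{T}
    function NoNaNVector(data::Vector{T}) where {T<:AbstractFloat}
        any(isnan, data) && throw(ArgumentError("NaN values are not allowed"))
        return new{T}(data)
    end
end

# Minimal AbstractVector interface so generic code (e.g. extrema) works.
Base.size(v::NoNaNVector) = size(v.data)
Base.getindex(v::NoNaNVector, i::Int) = v.data[i]
```

A range method dispatched on NoNaNVector could then skip the NaN handling entirely.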
In your examples which do not contain NaN, for the small array case range4b() is ~5% faster, and for the large array case range4() is ~2% faster.
…and compared to extrema? Is it still 3X?