# Range (statistics)

I often need to compute a statistical range (difference between maximum and minimum).
Is there a function in the `Statistics` standard library to compute the range? Or should we use: ` maximum(x) - minimum(x)`?

Since `extrema` is slightly faster than calling both functions separately, you could do this

`rangeof(x) = (tmp = extrema(x); return tmp[2] - tmp[1])`

or

`rangeof(x) = ((low, high) = extrema(x); return high - low)`

1 Like
``````julia> using BenchmarkTools

julia> datarange(x) = -(-(extrema(x)...))
datarange (generic function with 1 method)

julia> @btime datarange(\$x)
161.395 ns (0 allocations: 0 bytes)
9.91438063817977

julia> @btime maximum(\$x) - minimum(\$x)
257.181 ns (0 allocations: 0 bytes)
9.91438063817977
``````
1 Like

Is this computation done for scaling a given set of samples? If yes, you may be interested in using TableTransforms.jlâ€™s MinMax or Scale:

https://juliaml.github.io/TableTransforms.jl/stable/transforms/#MinMax

1 Like

Thanks, so I guess I better define one of the above range functions in my startup.jl, if there is nothing ready-made.

Fyi, I found this minmax algorithm here, which seems to perform better than the previous solutions:

``````function range2(x)
min = max = first(x)
for xi in view(x, 2:length(x))
min > xi ? min = xi : max < xi && (max = xi)
end
return max - min
end

x = 10 * rand(100)
@btime range2(\$x)        # 123 ns (0 allocs: 0 bytes)
``````

Since range is one of the important measures of variability, an optimized version should be available in the standard library, IMHO.

For performance, branches (as generated by `? :` operator) are bad. An alternate version:

``````function range3(x)
min = typemax(eltype(x))
max = typemin(eltype(x))
for xi in x
min = ifelse(min > xi, xi, min)
max = ifelse(max < xi, xi, max)
end
return max - min
end
``````

is faster on my machine.

``````julia> @btime range3(\$x)        # 123
98.743 ns (0 allocations: 0 bytes)
``````
4 Likes

Thanks @Dan for the insights, I see 40% speedup here!

This just replicates the functionality of `extrema` followed by a subtraction. If this is faster than `extrema`, that means thereâ€™s something wrong with the latter.

3 Likes

Hmm, I amend my statement a bit. `extrema` handles `NaN`s, which `range3` does not (nor does it handle empty collections):

``````julia> range3([1.0, NaN, 2.0])
1.0

julia> extrema([1.0, NaN, 2.0])
(NaN, NaN)

julia> range3(Float64[])
-Inf

julia> range3(Int[])
1
``````

If you want this to be in a stdlib, it should be robust against things like this.

3 Likes

Fixing this does not entail so much overhead:

``````function range4(x)
isempty(x) && error("Does not support empty vectors")
min = typemax(eltype(x))
max = typemin(eltype(x))
hasnan = false
for xi in x
hasnan |= isnan(xi)
min = ifelse(min > xi, xi, min)
max = ifelse(max < xi, xi, max)
end
hasnan && error("Does not support NaNs in vectors")
return max - min
end
``````
``````julia> x = 10 * rand(100);

julia> y = 10 * rand(1000000);

julia> @btime range4(\$x);
107.822 ns (0 allocations: 0 bytes)

julia> @btime extrema(\$x);
318.175 ns (0 allocations: 0 bytes)

julia> @btime range4(\$y);
1.397 ms (0 allocations: 0 bytes)

julia> @btime extrema(\$y);
3.285 ms (0 allocations: 0 bytes)
``````

There is a ~3X gap here (persisting into large vectors), and this trivial (and popular) operation could probably be optimized further (SIMD, threads etc.)

2 Likes

Can you take a look at `Base.extrema` and see if you can tell what, if anything, is wrong with it, performance-wise?

Because I really think the right approach is to use `extrema` for this, and fix any problems there.

2 Likes

Dan, sorry to ask, but is `hasnan` required?
Could we write `range4()` as follows:

``````function range4b(x)
isempty(x) && error("Does not support empty vectors")
min = typemax(eltype(x))
max = typemin(eltype(x))
for xi in x
isnan(xi) && error("Does not support NaNs in vectors")
min = ifelse(min > xi, xi, min)
max = ifelse(max < xi, xi, max)
end
return max - min
end
``````

You probably want to avoid branches.

You wouldnâ€™t want to read the vector from memory twice. The memory transfers are critical for performance.

Whoops, the code changedâ€¦ but now DNF is right, the branch is bad.

Sorry, you seem to have captured a temporary change in the code, the original snippet is restored.

But benchmarking is the best disinfectant. You are welcome to post results (my machineâ€™s results are up there already).

Had a quick glance, but will look some more. Basically, there is an attempt to capture many reduce-like operations in one template, and perhaps the overhead of `reduce` with support to `dims` option is taking a toll.

On the NaN / empty-vec front, I think actually a vector type which prohibits NaNs can be useful (i.e. throws when operations generate a NaN). With this type, specialized cases can be dispatched.

In your examples which do not contain `NaN`, for the small array case, `range4b()` is ~5% faster, and for the large array case, `range4()` is ~2% faster.

1 Like

â€¦and compared to `extrema`? is it still 3X?