Strange behavior of intersect() when applied to ranges

When trying to intersect two ranges I stumbled upon the following behavior:

julia> intersect(2:8, 4:10)
4:8

julia> intersect(2:7, 5:10)
5:7

julia> intersect(2:6, 6:10)
6:6

julia> intersect(2:5, 7:10)
7:6

julia> intersect(2:4, 8:10)
8:7

julia> intersect(2:3, 9:10)
9:8

Is this expected? It works fine for arrays:

julia> intersect(collect(2:8), collect(4:10))
5-element Array{Int64,1}:
 4
 5
 6
 7
 8

julia> intersect(collect(2:7), collect(5:10))
3-element Array{Int64,1}:
 5
 6
 7

julia> intersect(collect(2:6), collect(6:10))
1-element Array{Int64,1}:
 6

julia> intersect(collect(2:5), collect(7:10))
0-element Array{Int64,1}

julia> intersect(collect(2:4), collect(8:10))
0-element Array{Int64,1}

julia> intersect(collect(2:3), collect(9:10))
0-element Array{Int64,1}

Note that if you do e.g. collect(intersect(1:2,4:5)) it returns a 0-element Array{...} too, so if you compare the collected results you get the same, i.e. collect(intersect(r1,r2)) == intersect(collect(r1),collect(r2)) for unitranges r1 and r2.
Effectively, something like 4:2 is an empty range since the step is +1 (not to be confused with a non-empty range like 4:-1:2). The advantage is that intersect of unitranges seems to always return unitranges which is important for type stability. Quite nice if you ask me! (And you might wanna take a look at the implementation - very simple and understandable)

4 Likes

Ok, that makes sense. I don’t know why, but for a moment I was thinking of 4:2 as 4:-1:2. The argument about type stability makes total sense. Thanks a lot.

The automatic modification of range bounds still seems somewhat strange:

julia> 4:2
4:3

*edit: this is definitely intentional behavior, but I don’t understand the rationale.

unitrange_last(start::T, stop::T) where {T<:Integer} =
    ifelse(stop >= start, stop, convert(T,start-oneunit(stop-start)))

I also noticed that intersect!() throws an error when applied to unit ranges:

julia> s = 1:5
1:5

julia> s = intersect(s, 4:5)
4:5

julia> s = 1:5
1:5

julia> intersect!(s, 4:5)
ERROR: setindex! not defined for UnitRange{Int64}
Stacktrace:
 [1] error(::String, ::Type) at ./error.jl:42
 [2] error_if_canonical_setindex(::IndexLinear, ::UnitRange{Int64}, ::Int64) at ./abstractarray.jl:1028
 [3] setindex! at ./abstractarray.jl:1019 [inlined]
 [4] filter!(::getfield(Base, Symbol("##83#84")){typeof(∉),typeof(push!),Set{Int64}}, ::UnitRange{Int64}) at ./array.jl:2339
 [5] _shrink!(::Function, ::UnitRange{Int64}, ::Tuple{UnitRange{Int64}}) at ./array.jl:2384
 [6] intersect!(::UnitRange{Int64}, ::UnitRange{Int64}) at ./array.jl:2389
 [7] top-level scope at none:0

That’s not surprising: unit ranges aren’t mutable objects, so a mutating method shouldn’t exist.

struct UnitRange{T<:Real} <: AbstractUnitRange{T}
    start::T
    stop::T
    UnitRange{T}(start, stop) where {T<:Real} = new(start, unitrange_last(start,stop))
end
2 Likes

Right. I was wondering why the method below is defined then:

[6] intersect!(::UnitRange{Int64}, ::UnitRange{Int64}) at ./array.jl:2389

But then I noticed it comes from here:

intersect!(v::AbstractVector, itrs...) = _shrink!(intersect!, v, itrs)

Yeah, that method may be too broadly typed; there are a lot of immutable concrete types <: AbstractVector. This is a case where trait-based dispatch would be useful.

I suppose that empty ranges could retain independent start and stop values but what would the advantage be? The current behavior allows the length of a range to always be computed as stop-start+1. It also normalizes empty ranges starting at a given index.

4 Likes

We don’t do the clever trick anymore for computing the length of the range, so we could change this now (https://github.com/JuliaLang/julia/pull/43058#issuecomment-973218652), if there is support for that.

And there is also now this issue https://github.com/JuliaLang/julia/issues/40331 for printing these empty ranges more clearly.

2 Likes

Sorry if this doesn’t make sense, but wouldn’t it be clear to output the existing form: UnitRange[]