Intersect of 2 arrays

I have 2 arrays of Dict containing from and to items, both of type ZonedDateTime (although for our example they could be integers, doesn’t matter). They are basically 2 arrays of ranges.

What would be the simplest/the most efficient way to determine whether they intersect?

Thank you,

Intersect in which sense? Have the same keys? Have the same values? Have the same pairs of key and value? The intersection is about the Dicts having the same keys/values, or the keys/values itself intersecting with the keys/values of another Dict?

1 Like

Can you provide an MWE, please? It would be much easier if you give a small example of what you have and what you are trying to achieve.

1 Like

Let’s assume “from” and “to” are integers (so that’s easier for me to write an example).

A = [Dict(“from”=>1, “to”=>20), Dict(“from”=>51, “to”=>100)]
B = [Dict(“from”=>21, “to”=>50), Dict(“from”=>99, “to”=>105)]

In this case, they do intersect with 2 values, 99 and 100.

In another example:

C = [Dict(“from”=>1, “to”=>20), Dict(“from”=>51, “to”=>100)]
D = [Dict(“from”=>21, “to”=>50), Dict(“from”=>101, “to”=>125)]

They don’t intersect.

Thank you.

Why are you using Dicts for the ranges? Why not use UnitRange{Int64} (or in your real case UnitRange{ZonedDateTime}).

The ranges inside the same array are mutually exclusive (i.e., do not share any numbers?) the arrays are sorted? All of this is relevant for knowing what is the most efficient algorithm. If the answer to both is “yes”, then the code below should suffice.

A = [1:20, 51:100]
B = [21:50, 99:105]
C = [1:20, 51:100]
D = [21:50, 101:125]

function list_intersections(A, B)
    idx_a, idx_b = 1, 1
    len_a, len_b = length.((a, b)) 
    intersects = UnitRange{Int64}[]
    while idx_a <= len_a && idx_b <= b
        if last(A[idx_a]) < first(B[idx_b])
            idx_a += 1
        elseif last(B[idx_b]) < first(A[idx_a])
            idx_b += 1
        else
            r = intersect(A[idx_a], B[idx_b])
            !isempty(r) && push!(intersects, r)
            if last(A[idx_a]) < last(B[idx_b])
                idx_a += 1
            else
                idx_b += 1
            end 
        end
    end   
    return intersects
end

list_intersections(A, B)
list_intersections(C, D)

Gives the expected results.

2 Likes

If you can do not mind use other structures than Dict (which is rather inefficient in this case), you can use nice Intervals.jl package. Something like this

function combineintervals(A, B)
    res = Vector{eltype(A)}()
    for a in A, b in B
        c = intersect(a, b)
        isempty(c) || push!(res, c)
    end
    return res
end

A = [1..20, 51..100]
B = [21..50, 101..125]

julia> combineintervals(A, B)
Interval{Int64, Closed, Closed}[]

julia> combineintervals(A, [21..50, 99..125])
1-element Vector{Interval{Int64, Closed, Closed}}:
 Interval{Int64, Closed, Closed}(99, 100)

Using this package makes sense also if you are working with seconds resolution, since you can use for example close, open intervals, for example

using Dates

julia> Interval{Closed, Open}(DateTime("2021-01-01"), DateTime("2021-02-01"))
Interval{DateTime, Closed, Open}(DateTime("2021-01-01T00:00:00"), DateTime("2021-02-01T00:00:00"
))

and still be able to get correct results without worrying of edge cases.

4 Likes

I had some time at lunch; and always nice to learn never-heard packages like Intervals.

struct FromTo
    from
    to
    function FromTo(f,t)
        @assert t>=f "`To` has to be >= `From`"
        new(f,t)
    end
end

struct FromToCollections
    l::Array{FromTo}
end

A = FromToCollections([FromTo(1,20),FromTo(51,100)])
B = FromToCollections([FromTo(21,50),FromTo(99,105)])
C = FromToCollections([FromTo(1,20), FromTo(51, 100)])
D = FromToCollections([FromTo(21, 50), FromTo(101,125)])

function intersect(A::FromTo, B::FromTo)
    # The complement case seems to be easier to define
    return !((A.to < B.from) ||(B.to < A.from))
end

function intersect(As::FromToCollections, Bs::FromToCollections)
    for a in As.l, b in Bs.l
        intersect(a,b) && return true
    end
    return false
end
julia> intersect(A,B)
true

julia> intersect(C,D)
false

Generally I like to define my own type, then I can defined what I meant intersect for my own type at a very unclutter way, and build my way up.

Edit:
on a second thought, probably should use a different name for intersect since it return “are they intersect” and doesn’t return “the intersect”.

Also, intersect is already defined by Base because Base already has UnitRanges.

Sincerely, unless there is a good reason, this seems to be reinventing the wheel. Often an interval is just two values that have ordering functions defined for them, and there is a lot of code that already works over both UnitRange and the types of Intervals.jl (that also support the concept of closed and open ranges).

1 Like