Some questions about eachoverlap in GenomicFeatures.jl

During using the eachoverlap function in GenomicFeatures,I cannot know how to solve this problem.
My problem is :if each interval in “col” contains any interval in “hhh” ,I will output interval "col ".Here is the picture(below). For example col’s 10628-10683 contains 10631 in hhh,I will output the first line in col.Next 10643-10779 in col contains hhh’s 10648 .If col’s interval does not contain any hhh’s interval,we will discard this col’s line


I typed in this code.and cannot solve it.

eachoverlap(col,hhh,isless)


And I also tried isoverlappiing,unfortunately it was slow.

k=IntervalCollection{String}()
for a in col ,b in hhh
    if isoverlapping(a,b)
        if (a in k)==false
            push!(k,a)
        end
    end
end

So would you please tell me how I can handle this issue?I will be grateful to you.

I may not understand your question. However, below are some approaches for determining whether intervals in col have at least one intersection with intervals in hhh.

result = Vector{eltype(col)}()

for (c, h) in eachoverlap(col, hhh)
	push!(result, c)
end

result |> unique

result = Dict{eltype(col), Vector{eltype(hhh)}}()

for (c, h) in eachoverlap(col, hhh)
	intervals = get!(result, c, [])
	push!(intervals, h)
end

keys(result) |> collect |> sort
function hasintersection(interval, col)

	# Return early if chromosome is not in the interval collection.
	if !haskey(col.trees, seqname(interval))
		return false
	end

	# Setup intersection iterator.
	iter = GenomicFeatures.IntervalTrees.intersect(col.trees[seqname(interval)], (leftposition(interval), rightposition(interval)))

	# Attempt first iteration.
	if iterate(iter) === nothing
		return false
	end

	# Intersection exists.
	return true 

end

result = Vector{eltype(col)}()

for interval in col
	if hasintersection(interval, hhh)
		push!(result, interval)
	end
end

result
1 Like

Many thanks.And I made it in this way. But there is another question:How can I count the intervals in hhh in col’s interval(such as hhh’s first interval is in col’s first interval,hhh’s second interval is in col’s first interval,too)?
For example ,the hhh’s 10631 10633 10636 … are in col’s first line like the picture below.Probably 10644 in hhh is also in col’s first line and second line, we will return like the second picture.(It means the fifth and sixth line in hhh are in the first line’s interval in col and also in the second line’s interval in col.) But I cannot solve the problem by myself.Would you please to help me?
image

That output does not appear to count the intervals; rather it lists the pairs of enumerated intervals that overlap.

using GenomicFeatures
using DataFrames


col = [
	Interval("chr1", 10628, 10683, '?', "abc")
	Interval("chr1", 10643, 10779, '?', "abc")
	Interval("chr1", 10645, 10748, '?', "abc")
	Interval("chr1", 10648, 10786, '?', "abc")
	Interval("chr1", 10676, 10767, '?', "abc")
	Interval("chr1", 10690, 10731, '?', "abc")
	Interval("chr1", 10707, 10793, '?', "abc")
	Interval("chr1", 10742, 10814, '?', "abc")
] |> IntervalCollection

hhh = [
	Interval("chr1", 10631, 10631)
	Interval("chr1", 10633, 10633)
	Interval("chr1", 10636, 10636)
	Interval("chr1", 10638, 10638)
	Interval("chr1", 10641, 10641)
	Interval("chr1", 10644, 10644)
	Interval("chr1", 10645, 10645)
	Interval("chr1", 10650, 10650)
] |> IntervalCollection

function number_interval(tp::Tuple)
	# Unpack Tuple.
	(i, interval) = tp

	# Setup numbered metadata.
	new_metadata = (
		i = i,
		original = metadata(interval)
	)

	# Create new interval with numbered metadata.
	return Interval(
		seqname(interval),
		leftposition(interval),
		rightposition(interval),
		strand(interval),
		new_metadata
	)
end

# Convert intervals.
col_numbered = col |> enumerate .|> number_interval |> IntervalCollection
hhh_numbered = hhh |> enumerate .|> number_interval |> IntervalCollection


# Reproduce example.
df = DataFrame()

for (c, h) in eachoverlap(col_numbered, hhh_numbered)

	result = (
		queryHits = GenomicFeatures.metadata(c).i, 
		subjectHits = GenomicFeatures.metadata(h).i
	)

	push!(df, result)
end

df

Thanks for helping me again.It works.Best wishes.