I have a collection of interval events. Each interval has a (physical) location, a start time, and a stop time. There’s also a non-negative kernel function, e.g.
kernel(dist) = ifelse(dist <= 100, 1, 0).
For each interval
i, I want to find
sum(kernel(dist(i, j)) for j in <intervals ongoing when i ends>). Here’s a very simple but inefficient implementation:
function interval_counts(locs, starts, stops) n = length(locs) return map(1:n) do i inds_of_active_intervals = [j for j in 1:n if starts[j] < stops[i] <= stops[j]] distances = [distance(locs[i], locs[j]) for j in inds_of_active_intervals] return sum(kernel, dists) end end
(Typically there are on the order of a million intervals, and at any time about 500 of them are ongoing.)
I keep feeling like there’s a simple way to make this much faster, probably relating to sorting intervals by their start or stop times. But I’ve not found any meaningful algorithmic speedup. (I have found that combining the body of the anonymous function in the example code into one line avoids allocating the index and distance vectors, but it’s not a big effect.)