Ok I think this one is better, I got nerdsniped:
Now I compute a single vector holding the cumsum of each day's full business hours in seconds, so each day's business-hours interval has to be looked up only once. The business seconds on all the full days between two dates are then just the difference of two entries in that vector, and only the partial first and last days have to be computed separately. I didn’t spend much time checking this beyond the one example from above, but it runs in about 2.5 ms for 40,000 items. One could also add more complicated logic for the business hours; I went with the simple example from above, without holidays etc.
using Dates
using Intervals
using DataFrames
function business_intervals(date::Date)
    weekday = dayofweek(date)
    if 1 <= weekday <= 5
        DateTime(date, Time(8, 00)) .. DateTime(date, Time(20, 00))
    else
        DateTime(date, Time(10, 00)) .. DateTime(date, Time(18, 00))
    end
end
function business_seconds(d1s, d2s)
    # find first and last dates
    mi, ma = extrema([extrema(d1s)..., extrema(d2s)...])
    dmi = Date(mi)
    dma = Date(ma)
    all_days = dmi:Day(1):dma
    # query each day's business hours once
    all_time_intervals = business_intervals.(all_days)
    # accumulate durations in seconds over all days
    # durations between two full days can then be computed with two lookups and a difference
    cumulative_business_seconds = cumsum(Second(span(int)) for int in all_time_intervals)
    map(d1s, d2s) do d1, d2
        interval = d1 .. d2
        # compute day indices for lookup
        i1 = Dates.days(Date(d1) - dmi) + 1
        i2 = Dates.days(Date(d2) - dmi) + 1
        # business seconds on the (possibly partial) first day
        total = Second(span(intersect(all_time_intervals[i1], interval)))
        # the remaining terms only apply if the interval ends on a later day; on the same day
        # the lookup below would subtract a full day (or index out of bounds for i1 == 1)
        if i2 > i1
            # full days strictly between the first and last day, by direct lookup
            total += cumulative_business_seconds[i2 - 1] - cumulative_business_seconds[i1]
            # business seconds on the (possibly partial) last day
            total += Second(span(intersect(all_time_intervals[i2], interval)))
        end
        total
    end
end
julia> df = DataFrame(
           date1 = rand(DateTime(2018):Day(1):DateTime(2019), 40_000) .+ Second.(rand.(Ref(1:86400))),
           date2 = rand(DateTime(2020):Day(1):DateTime(2021), 40_000) .+ Second.(rand.(Ref(1:86400)))
       );
julia> business_seconds(
           [DateTime(2021, 11, 25, 19, 30, 00)],
           [DateTime(2021, 11, 26, 08, 30, 00)],
       )
1-element Vector{Second}:
3600 seconds
julia> @time business_seconds(df.date1, df.date2);
0.002505 seconds (6 allocations: 338.703 KiB)
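To see the cumsum lookup in isolation, here is a minimal sketch that only needs Dates. The names hours_on, first_day, day_index and full_days_between are made up for this illustration and are not part of the code above; the business hours are assumed to be the same 12 h on weekdays and 8 h on weekends.

```julia
using Dates

# Hypothetical stand-in for the per-day business hours:
# 12 h (08:00-20:00) on weekdays, 8 h (10:00-18:00) on weekends.
hours_on(date::Date) = dayofweek(date) <= 5 ? 12 : 8

first_day = Date(2021, 11, 22)                    # a Monday
all_days = first_day:Day(1):Date(2021, 11, 28)    # one full week

# running total of business seconds up to and including each day
cumulative = cumsum([Second(3600 * hours_on(d)) for d in all_days])

# index of a timestamp's calendar day within all_days
day_index(t) = Dates.days(Date(t) - first_day) + 1

# business seconds on the full days strictly between the days of t1 and t2;
# only meaningful when t1 and t2 fall on different days
function full_days_between(t1, t2)
    i1, i2 = day_index(t1), day_index(t2)
    cumulative[i2 - 1] - cumulative[i1]
end

t1 = DateTime(2021, 11, 22, 19, 30)   # Monday evening
t2 = DateTime(2021, 11, 26, 8, 30)    # Friday morning
# Tuesday, Wednesday and Thursday are counted in full: 3 * 12 h
full_days_between(t1, t2) == Second(3 * 12 * 3600)
```

For two consecutive days the two lookups hit the same entry and the difference is zero, so only the partial first and last days contribute, as in the 19:30 to 08:30 example.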