Faster date parsing?

I wonder if there is a faster date parsing package in Julia. For example, the default `Date` parsing below is quite slow compared with a custom algorithm.

``````
using Dates
date = Dates.format(Date(2020, 08, 09), "mm/dd/yyyy")

using BenchmarkTools
@benchmark Date($date, "mm/dd/yyyy")
# BenchmarkTools.Trial:
#   memory estimate:  2.64 KiB
#   allocs estimate:  44
#   --------------
#   minimum time:     13.301 μs (0.00% GC)
#   median time:      14.300 μs (0.00% GC)
#   mean time:        15.055 μs (0.00% GC)
#   maximum time:     81.101 μs (0.00% GC)
#   --------------
#   samples:          10000
#   evals/sample:     1

function fastdate(str)
    @inbounds Date(parse(Int, str[7:10]), parse(Int, str[1:2]), parse(Int, str[4:5]))
end

@benchmark fastdate($date)
# BenchmarkTools.Trial:
#   memory estimate:  96 bytes
#   allocs estimate:  3
#   --------------
#   minimum time:     150.308 ns (0.00% GC)
#   median time:      155.118 ns (0.00% GC)
#   mean time:        163.088 ns (2.75% GC)
#   maximum time:     3.275 μs (94.30% GC)
#   --------------
#   samples:          10000
#   evals/sample:     811
``````

It feels like there could be a faster way to do date parsing from strings.

Yes. It is called `Dates`.

``````
using Dates, BenchmarkTools
date = Dates.format(Date(2020, 08, 09), "mm/dd/yyyy")
df = DateFormat("mm/dd/yyyy")
function fastdate(str)
    @inbounds Date(parse(Int, str[7:10]), parse(Int, str[1:2]), parse(Int, str[4:5]))
end
julia> @btime Date($date, "mm/dd/yyyy")
13.288 μs (44 allocations: 2.63 KiB)
2020-08-09

julia> @btime Date($date, $df)
66.069 ns (0 allocations: 0 bytes)
2020-08-09

julia> @btime fastdate($date)
145.365 ns (3 allocations: 96 bytes)
2020-08-09
``````

Isn’t this misleading, since you just extracted the slow part from the slow code?

``````
julia> @btime df = DateFormat("mm/dd/yyyy")
20.000 μs (43 allocations: 2.61 KiB)
dateformat"mm/dd/yyyy"
``````
``````
julia> @btime begin df = DateFormat("mm/dd/yyyy"); Date($date, df) end
20.299 μs (44 allocations: 2.63 KiB)
2020-08-09
``````

That’s kind of the point. Note that the `fastdate` above is also handcrafted to a particular date format, so I am not sure in what sense this could be considered “misleading”.
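To make the "construct once, reuse" pattern concrete, here is a minimal sketch. The helper name `parse_dates` is hypothetical and not from this thread; it only assumes the `DateFormat` constructor and the broadcast call already shown above.

```julia
using Dates

# Hypothetical helper: build the DateFormat once, then reuse it for every string.
function parse_dates(strs::AbstractVector{<:AbstractString}, fmt::AbstractString)
    df = DateFormat(fmt)      # the expensive step, paid once per call
    return Date.(strs, df)    # the cheap step, repeated for each element
end

dates = parse_dates(["08/09/2020", "12/31/1999"], "mm/dd/yyyy")
```

With a format string that is only known at runtime, this keeps the construction cost out of the per-element work.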


This is a nice trick I didn’t know about.

My use case was 1 billion rows, so I needed something fast.

You could also use the dateformat string macro:

``````
Date(date, dateformat"mm/dd/yyyy")
``````

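This should be as fast at runtime as defining a date format separately, because the `dateformat"..."` macro constructs the `DateFormat` object once, when the macro is expanded, rather than on every call.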
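A small sketch of how the macro might be used inside a function. The wrapper name `parse_mdy` is hypothetical; the only assumption is that the format is a literal known when the code is written.

```julia
using Dates

# Hypothetical wrapper: the literal format becomes a DateFormat object
# when the macro expands, so calls do not rebuild it.
parse_mdy(str::AbstractString) = Date(str, dateformat"mm/dd/yyyy")

d = parse_mdy("08/09/2020")
```

The string macro only accepts a literal, so a format string that arrives at runtime still needs `DateFormat(fmt)`.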


Parsing many dates with a single format is probably the normal use case, which is why it is best benchmarked the way Tamas did: you incur the cost of constructing a `DateFormat` once, but that is easily swamped by the actual date parsing for larger vectors.

Here’s an example with a million `Date`s which also demonstrates @lungben’s point:

``````
julia> using Dates, BenchmarkTools

julia> function fastdate(str)
           @inbounds Date(parse(Int, str[7:10]), parse(Int, str[1:2]), parse(Int, str[4:5]))
       end
fastdate (generic function with 1 method)

julia> x = Dates.format.(rand(Date(2010,1,1):Day(1):Date(2030,1,1), 1_000_000), "mm/dd/yyyy");

julia> @btime fastdate.($x);
287.560 ms (3000002 allocations: 99.18 MiB)

julia> df = DateFormat("mm/dd/yyyy")
dateformat"mm/dd/yyyy"

julia> @btime Date.($x, $df);
138.532 ms (3 allocations: 7.63 MiB)

julia> @btime Date.($x, dateformat"mm/dd/yyyy");
137.874 ms (3 allocations: 7.63 MiB)
``````
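The `fastdate` benchmarks report three allocations per string, presumably one for each slice such as `str[7:10]`, since slicing a `String` creates a new `String`. A possible variant, sketched here under the assumption of ASCII input in exactly the `mm/dd/yyyy` layout, uses `SubString` views instead of copies. The name `fastdate_view` is hypothetical, and no timing is claimed for it.

```julia
using Dates

# Hypothetical variant of `fastdate`: SubString refers to the original string
# instead of copying the characters into a new String.
# Assumes an ASCII string in exactly the "mm/dd/yyyy" layout.
function fastdate_view(str::AbstractString)
    month = parse(Int, SubString(str, 1, 2))
    day   = parse(Int, SubString(str, 4, 5))
    year  = parse(Int, SubString(str, 7, 10))
    return Date(year, month, day)
end

d = fastdate_view("08/09/2020")
```

Like the original `fastdate`, this is tied to one fixed layout, so it is worth benchmarking against `Date.(x, df)` on real data before preferring it.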