Faster date parsing?

I wonder if there is a faster date-parsing package for Julia. The default Date parsing is quite slow compared with a custom algorithm, for example:

using Dates
date = Dates.format(Date(2020, 08, 09), "mm/dd/yyyy")

using BenchmarkTools
@benchmark Date($date, "mm/dd/yyyy")
# BenchmarkTools.Trial:
#   memory estimate:  2.64 KiB
#   allocs estimate:  44
#   --------------
#   minimum time:     13.301 μs (0.00% GC)
#   median time:      14.300 μs (0.00% GC)
#   mean time:        15.055 μs (0.00% GC)
#   maximum time:     81.101 μs (0.00% GC)
#   --------------
#   samples:          10000
#   evals/sample:     1

function fastdate(str)
    @inbounds Date(parse(Int, str[7:10]), parse(Int, str[1:2]), parse(Int, str[4:5]))
end

@benchmark fastdate($date)
# BenchmarkTools.Trial:
#   memory estimate:  96 bytes
#   allocs estimate:  3
#   --------------
#   minimum time:     150.308 ns (0.00% GC)
#   median time:      155.118 ns (0.00% GC)
#   mean time:        163.088 ns (2.75% GC)
#   maximum time:     3.275 μs (94.30% GC)
#   --------------
#   samples:          10000
#   evals/sample:     811

It feels like there could be a faster way to do date parsing from strings.

Yes. It is called Dates.

using Dates, BenchmarkTools
date = Dates.format(Date(2020, 08, 09), "mm/dd/yyyy")
df = DateFormat("mm/dd/yyyy")
function fastdate(str)
    @inbounds Date(parse(Int, str[7:10]), parse(Int, str[1:2]), parse(Int, str[4:5]))
end

julia> @btime Date($date, "mm/dd/yyyy")
  13.288 μs (44 allocations: 2.63 KiB)
2020-08-09

julia> @btime Date($date, $df)
  66.069 ns (0 allocations: 0 bytes)
2020-08-09

julia> @btime fastdate($date)
  145.365 ns (3 allocations: 96 bytes)
2020-08-09

Isn’t this misleading, since you just moved the slow part out of the code being timed?

julia> @btime df = DateFormat("mm/dd/yyyy")
  20.000 μs (43 allocations: 2.61 KiB)
dateformat"mm/dd/yyyy"

julia> @btime begin df = DateFormat("mm/dd/yyyy"); Date($date, df) end
  20.299 μs (44 allocations: 2.63 KiB)
2020-08-09

That’s kind of the point. Note that the fastdate above is also handcrafted to a particular date format, so I am not sure in what sense this could be considered “misleading”.
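
To illustrate how format-specific such a handcrafted parser is, here is a minimal sketch that reads the bytes of the string directly instead of slicing it. Each slice such as str[7:10] creates a new String, which is consistent with the 3 allocations reported for fastdate. The names fastdate_bytes and _digit are hypothetical helpers, not part of Dates, and the sketch assumes ASCII input in exactly the mm/dd/yyyy layout. I have not benchmarked it here.

```julia
using Dates

# Hypothetical helper: convert the ASCII digit at byte position i to an Int.
@inline function _digit(b, i)
    d = Int(b[i]) - Int('0')
    0 <= d <= 9 || throw(ArgumentError("non-digit byte at position $i"))
    return d
end

# Hypothetical helper, not part of Dates: parse an ASCII "mm/dd/yyyy" string
# by reading its code units, so no substrings are created.
function fastdate_bytes(str::String)
    b = codeunits(str)
    (length(b) == 10 && b[3] == UInt8('/') && b[6] == UInt8('/')) ||
        throw(ArgumentError("expected mm/dd/yyyy"))
    m = 10 * _digit(b, 1) + _digit(b, 2)
    d = 10 * _digit(b, 4) + _digit(b, 5)
    y = 1000 * _digit(b, 7) + 100 * _digit(b, 8) + 10 * _digit(b, 9) + _digit(b, 10)
    return Date(y, m, d)  # the Date constructor still validates month and day
end

fastdate_bytes("08/09/2020") == Date(2020, 8, 9)
```

Any other layout (a different separator, two-digit years, surrounding whitespace) is rejected rather than parsed, which is the trade-off against a general DateFormat.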

This is a nice trick I didn’t know about.

My use case was 1 billion rows, so I needed something fast.

You could also use the dateformat string macro:

Date(date, dateformat"mm/dd/yyyy")

The macro builds the DateFormat once, when the code is expanded, so at runtime this should be as fast as defining the date format separately.
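
As a minimal sketch of how that looks inside a function (parse_us_date is a hypothetical name used only for illustration), the format literal can sit directly in the call without being rebuilt on every invocation:

```julia
using Dates

# Hypothetical helper: the dateformat"..." literal becomes a DateFormat when
# the macro is expanded, so repeated calls do not reconstruct the format.
parse_us_date(str) = Date(str, dateformat"mm/dd/yyyy")

# The macro yields the same kind of object as the explicit constructor.
dateformat"mm/dd/yyyy" isa DateFormat

parse_us_date("08/09/2020") == Date(2020, 8, 9)
```

This avoids a separate global df variable, which would otherwise need to be const or passed as an argument to stay fast.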

This is probably the normal use case, which is why it is best benchmarked the way Tamas did: you incur the cost of constructing a DateFormat once, but for larger vectors that cost is easily swamped by the actual date parsing.

Here’s an example with a million Dates which also demonstrates @lungben’s point:

julia> using Dates, BenchmarkTools

julia> function fastdate(str)
           @inbounds Date(parse(Int, str[7:10]), parse(Int, str[1:2]), parse(Int, str[4:5]))
       end
fastdate (generic function with 1 method)

julia> x = Dates.format.(rand(Date(2010,1,1):Day(1):Date(2030,1,1), 1_000_000), "mm/dd/yyyy");

julia> @btime fastdate.($x);
  287.560 ms (3000002 allocations: 99.18 MiB)

julia> df = DateFormat("mm/dd/yyyy")
dateformat"mm/dd/yyyy"

julia> @btime Date.($x, $df);
  138.532 ms (3 allocations: 7.63 MiB)

julia> @btime Date.($x, dateformat"mm/dd/yyyy");
  137.874 ms (3 allocations: 7.63 MiB)