Given a string of numbers grouped in blocks, I want to parse the numbers, find the total of each block, and then return the largest of the totals. Blocks are separated by two newline chars, and each number within a block is separated by a single newline char, e.g.

```julia
data = "1\n2\n\n3\n4"
```
My solution is extremely simple:
```julia
function solve(data)
    maximum(eachsplit(data, "\n\n")) do block
        sum(parse(Int64, num) for num in eachsplit(block, "\n"))
    end
end
```
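As a sanity check, here is the function applied to the tiny example from above (note that `eachsplit` requires Julia 1.8+):

```julia
data = "1\n2\n\n3\n4"

function solve(data)
    maximum(eachsplit(data, "\n\n")) do block
        sum(parse(Int64, num) for num in eachsplit(block, "\n"))
    end
end

solve(data)  # the two blocks total 3 and 7, so this returns 7
```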
## The Problem
But it’s slow. Here’s the benchmark result on my dataset (255 blocks, 2000 numbers overall):
```
julia> @btime solve($data)
  309.959 μs (3757 allocations: 234.77 KiB)
```
The exact same algorithm in Rust is 10x faster:
```rust
pub fn solve() -> i64 {
    return include_str!("../input.txt")
        .split("\n\n")
        .map(|e| e.lines().map(|c| c.parse::<i64>().unwrap()).sum::<i64>())
        .max()
        .unwrap();
}
```
The Rust Criterion benchmark result is as follows:

```
solve time: [30.432 µs 30.455 µs 30.481 µs]
```
## An iterative solution
Of course, we Julians can always make our code run fast! So I rewrote the code to solve the problem manually, without any iterators or `eachsplit`:
```julia
function solve2(data)
    buf = IOBuffer(data)
    largest = 0
    total = 0
    for line in eachline(buf)
        if line == ""  # in between two newline characters
            # Update largest number and reset for next block
            largest = max(largest, total)
            total = 0
            continue
        end
        val = parse(Int, line)
        total += val
    end
    # handle last one
    largest = max(largest, total)
    return largest
end
```
The result is better, but still not great: it’s off by about 4x from the Rust benchmark:
```
julia> @btime solve2($data)
  120.917 μs (4520 allocations: 335.45 KiB)
```
If I just use the regular `split` function rather than creating an `IOBuffer` and using `eachline`, then the number of allocations is much smaller but the runtime is comparable.
```julia
function solve3(str)
    largest = 0
    total = 0
    for line in split(str, "\n")
        if line == ""  # in between two newline characters
            # Update largest number and reset for next block
            largest = max(largest, total)
            total = 0
            continue
        end
        val = parse(Int, line)
        total += val
    end
    # handle last one
    largest = max(largest, total)
    return largest
end
```
```
julia> @btime solve3($data)
  125.000 μs (7 allocations: 160.94 KiB)
```
## Too good to be true
Interestingly, if I create the `IOBuffer` separately and pass it in, the runtime improves dramatically, by nearly 1,000x.
```julia
function solve1(buf)
    largest = 0
    total = 0
    for line in eachline(buf)
        if line == ""  # in between two newline characters
            # Update largest number and reset for next block
            largest = max(largest, total)
            total = 0
            continue
        end
        val = parse(Int, line)
        total += val
    end
    # handle last one
    largest = max(largest, total)
    return largest
end
```
```
julia> @btime solve1($buf) setup=(seekstart($buf))
  138.951 ns (5 allocations: 383 bytes)
```
That seems too good to be true… and it does not add up: just creating an `IOBuffer` wouldn’t take 120 μs. Maybe I’m not measuring it correctly? I don’t really trust this result.
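My guess, and this is only an assumption I haven’t verified: `setup` runs once per sample while `@btime` runs the body for many evaluations per sample, so every evaluation after the first sees an already-exhausted buffer and returns almost immediately. Forcing `evals=1` pairs each evaluation with its own `setup`, so every run gets a fresh buffer (shown here on the tiny sample data for brevity):

```julia
using BenchmarkTools

# A condensed copy of solve1 from above
function solve1(buf)
    largest = 0
    total = 0
    for line in eachline(buf)
        if line == ""  # blank line marks the end of a block
            largest = max(largest, total)
            total = 0
            continue
        end
        total += parse(Int, line)
    end
    return max(largest, total)  # handle the last block
end

data = "1\n2\n\n3\n4"
# evals=1: one evaluation per setup, so the buffer is never reused after being drained
@btime solve1(buf) setup=(buf = IOBuffer($data)) evals=1
```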
## Optimization ideas?
The final question is how to make it run as fast as Rust. I’d be fine if it were just 50% slower, but at this point there’s a 4x difference.
You can find the test data here: https://gist.githubusercontent.com/tk3369/566642a9fb13a8f705c7822ce14a8a16/raw/ccdfed4f63b58a75431677a4cd2b73a351e084c2/gistfile1.txt
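One direction I’ve been considering, though it’s only a sketch that I haven’t benchmarked against the real dataset: skip the per-line `String` allocations entirely and parse the digits straight out of `codeunits`. The function name `solve4` and the digit-accumulation details are my own invention, not from any library:

```julia
# Parse byte-by-byte: accumulate digits into `num`, flush `num` into the
# block `total` at each newline, and close a block on a blank line.
function solve4(data::String)
    largest = 0
    total = 0
    num = 0
    prev_newline = false
    for b in codeunits(data)
        if b == UInt8('\n')
            if prev_newline            # two newlines in a row: block boundary
                largest = max(largest, total)
                total = 0
            else                       # end of a number
                total += num
                num = 0
            end
            prev_newline = true
        else
            num = 10 * num + (b - UInt8('0'))  # assumes digits only
            prev_newline = false
        end
    end
    total += num                       # flush a trailing number, if any
    return max(largest, total)         # handle the last block
end
```

This assumes the input contains only digits and newlines; any other byte would silently corrupt the parse, so it’s a sketch rather than a robust solution.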