After seeing this several times, I’m no longer surprised. It’s important to note that this is perfectly idiomatic Rust, and it uses only the nice “functional” abstractions. And it is only about two times slower than the best (maybe?) hand-tuned Julia. Of course, you could write a hand-tuned version in Rust too (I wonder what that would look like). I think this is a big challenge for Julia. It might be difficult to close the gap, given the dynamic semantics of Julia, but my guess is that the dynamic semantics aren’t really the problem here.
String manipulation is also not an area where Julia exactly shines at the moment performance-wise.
Okay, now this version is really high-level:
import Base.Iterators as Itr
using IterTools
newsolve(data) = begin
    # parse each line; blank lines fail tryparse and become the sentinel -1
    itr = Itr.Stateful(Itr.map(s -> something(tryparse(Int, s), -1), eachsplit(data, '\n')))
    # sum one block per call; an exhausted iterator sums to 0, which stops the outer takewhile
    maximum(Itr.takewhile(>(0), repeatedly(() -> sum(Itr.takewhile(!=(-1), itr); init=0))))
end
and it actually works:
julia> data = "11\n21\n\n3\n34\n5\n\n16\n17\n10"
"11\n21\n\n3\n34\n5\n\n16\n17\n10"
julia> bigdata=(data*"\n\n")^255;
julia> newsolve(bigdata)
43
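The reason this terminates: the Stateful wrapper shares a single iterator across all the inner takewhile calls, each of which consumes one block plus its -1 sentinel, and sum(...; init=0) returns 0 once the stream is exhausted, which is what the outer takewhile(>(0)) keys on. A toy illustration of the same trick (made-up data):
import Base.Iterators as Itr
it = Itr.Stateful([1, 2, -1, 3, -1])
sum(Itr.takewhile(!=(-1), it); init=0)  # 3: consumes 1, 2 and the -1 sentinel
sum(Itr.takewhile(!=(-1), it); init=0)  # 3: consumes 3 and the trailing -1
sum(Itr.takewhile(!=(-1), it); init=0)  # 0: iterator exhausted, so takewhile(>(0)) stops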
Performance-wise it is totally competitive. The performance problem with the initial solve function was eachsplit(..., "\n\n"): eachsplit isn’t optimized for long patterns and needs some optimization work.
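To see the delimiter effect in isolation, one could time the two forms directly; a minimal sketch, re-using the synthetic bigdata construction from above (numbers are machine-dependent):
using BenchmarkTools
data = "11\n21\n\n3\n34\n5\n\n16\n17\n10"
bigdata = (data * "\n\n")^255
@btime collect(eachsplit($bigdata, '\n'));    # single-Char delimiter
@btime collect(eachsplit($bigdata, "\n\n"));  # two-character pattern, the slow case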
My test shows this is not very competitive on the data set from the OP.
@btime newsolve($s)
131.668 μs (3 allocations: 144 bytes)
Can you add some other baselines? What is the fastest Julia version now?
(because of machine differences, a single benchmark is usually uninformative)
Yes. These are taken from previous posts. (I added @inline annotations to the one I posted.)
julia> @btime solve($s)
197.365 μs (2263 allocations: 141.36 KiB)
75501
julia> @btime solve2($s)
148.081 μs (4520 allocations: 335.45 KiB)
75501
# The following reported 12.6 μs on one btime (!)
julia> @btime hilox($s)
20.572 μs (0 allocations: 0 bytes)
75501
julia> @btime max_num($s)
22.548 μs (0 allocations: 0 bytes)
75501
julia> @btime solvelox($s)
16.187 μs (0 allocations: 0 bytes)
75501
Why didn’t you use codeunits() instead of transcode()? I am not sure, but when I benchmarked the two functions, codeunits() was faster (by a few tens/hundreds of ns, which wouldn’t make a tangible difference here).
Because, looking for some function that transforms characters into numbers, I found transcode first. I saw that it took less than 1 µs to do the operation, and therefore I concentrated on finding an algorithm that did everything needed in one step. Indeed codeunits is 10 times faster than transcode, but this weighs little on the final result:
julia> @btime transcode(UInt8,bigdata);
222.391 ns (1 allocation: 16 bytes)
julia>
julia> @btime codeunits(bigdata);
21.910 ns (1 allocation: 16 bytes)
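(The two are interchangeable here: for a String both expose the same UTF-8 byte sequence, codeunits just returns it as a zero-copy view. A quick check one can run:
transcode(UInt8, bigdata) == codeunits(bigdata)  # true: same bytes either way
)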
julia> function solvelox(data)
           v = codeunits(data)
           largest = total = totpar = 0
           pre_nl = true                      # was the previous byte a newline?
           for e in v
               if (e == 0x0a) && pre_nl       # two newlines in a row: a block ends
                   largest = max(largest, total)
                   total = totpar = 0
               elseif (e == 0x0a) && !pre_nl  # end of a line: add the parsed number
                   total = total + totpar
                   pre_nl = true
                   totpar = 0
               else                           # ASCII digit: accumulate the number
                   totpar = totpar*10 + e - 0x30
                   pre_nl = false
               end
           end
           return max(largest, total + totpar)  # handle a missing trailing newline
       end
solvelox (generic function with 1 method)
julia> @btime solvelox($bigdata)
15.100 μs (0 allocations: 0 bytes)
75501
julia> @btime max_num($bigdata)
28.700 μs (0 allocations: 0 bytes)
75501
You’re right. I had forgotten that BenchmarkTools runs multiple evaluations per sample.
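For reference, the per-sample evaluation count is recorded on the benchmark result itself; a small sketch, relying on the fields of BenchmarkTools’ Trial and Parameters types:
using BenchmarkTools
b = @benchmark sum($(rand(100)))
b.params.evals, b.params.samples  # evaluations folded into each sample, and the sample count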
I also tried using the functions from IterTools, but tests on pieces of code did not encourage me to continue:
using IterTools
v = transcode(UInt8, data)
p = partition(v, 2, 1)        # sliding pairs of consecutive bytes
ps = Iterators.Stateful(p)
Iterators.reset!(ps)
@time reduce((s, c) -> s*10 + first(c) - 0x30, takewhile(t -> first(t) != 0x0a, takewhile(!=((0x0a, 0x0a)), ps)), init=0)
@time takewhile(t -> first(t) != 0x0a, takewhile(!=((0x0a, 0x0a)), ps))
although I enjoyed doing something like this …
bigdata = codeunits(data)
# group bytes into runs of digits / runs of newlines, then split those at the "\n\n" runs
blk = groupby(!=([0x0a, 0x0a]), groupby(!=(0x0a), bigdata))
# sum the numbers in one block (newline runs contribute 0)
dig2num(vofv) = sum(v -> reduce((s, c) -> c == 0x0a ? s : s*10 + c - 0x30, v, init=0), vofv)
maximum(dig2num, blk)
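On the small sample data from earlier this also gives 43, the same answer as newsolve.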