Performance of splitting string and parsing numbers

After seeing this several times, I’m no longer surprised. It’s important to note that this is perfectly idiomatic rust, and it uses only the nice “functional” abstractions. And it is only two times slower than the best (maybe ?) hand-tuned Julia. Of course, you could write a hand-tuned version in rust (wonder what that would look like). I think this is a big challenge for Julia. It might be difficult, given the the dynamic semantics of Julia. But my guess is that the latter isn’t really the problem here.

String manipulation is also not an area where Julia exactly shines at the moment performance-wise.

3 Likes

Okay, now this version is really high-level:

import Base.Iterators as Itr
using IterTools

newsolve(data) = begin
  itr = Itr.Stateful(Itr.map(s->something(tryparse(Int, s), -1), eachsplit(data, '\n')))
  maximum(Itr.takewhile(>(0), repeatedly(()->sum(Itr.takewhile(!=(-1), itr); init=0))))
end

and it actually works:

julia> data = "11\n21\n\n3\n34\n5\n\n16\n17\n10"
"11\n21\n\n3\n34\n5\n\n16\n17\n10"

julia> bigdata=(data*"\n\n")^255;

julia> newsolve(bigdata)
43

Performance-wise it is totally competitive. The problem with performance with the initial solve function was eachsplit(..., "\n\n"). The eachsplit isn’t optimized for long patterns and needs some optimization work.

2 Likes

My test shows this is not very competitive on the data set from the OP.

 @btime newsolve($s)
  131.668 μs (3 allocations: 144 bytes)

Can you add some other baselines? What is the fastest Julia version now?
(because of machine differences, a single benchmark is usually uninformative)

Yes. These are taken from previous posts. (I added @inline annotations to the one i posted)

julia> @btime solve($s)
  197.365 μs (2263 allocations: 141.36 KiB)
75501

julia> @btime solve2($s)
  148.081 μs (4520 allocations: 335.45 KiB)
75501

# The following reported 12.6 μs on one btime (!)
julia> @btime hilox($s)
  20.572 μs (0 allocations: 0 bytes)
75501


julia> @btime max_num($s)
  22.548 μs (0 allocations: 0 bytes)
75501

julia> @btime solvelox($s)
  16.187 μs (0 allocations: 0 bytes)
75501

Why didn’t you use codeunits() instead of transcode() ? I am not sure but when I benchmarked the two functions codeunits() were faster (few tens/hundreds ns faster, wouldn’t make tangible difference here).

because looking for some function that transforms characters into numbers, I found transcode first.
I saw that it took less than 1us to do the operation and therefore I concentrated on finding an algorithm that did everything needed in one step.
Indeed codeunits is 10 times faster than transcode, but this weighs little on the final result:

julia> @btime transcode(UInt8,bigdata);
  222.391 ns (1 allocation: 16 bytes)

julia> 

julia> @btime codeunits(bigdata);
  21.910 ns (1 allocation: 16 bytes)

julia> function solvelox(data)
           v=codeunits(data)
           largest = total = totpar= 0
           pre_nl=true
           for e in v
               if (e == 0x0a)  && pre_nl
                   largest = max(largest, total)
                   total = totpar= 0
               elseif (e == 0x0a)  && !pre_nl
                   total=total+totpar
                   pre_nl=true
                   totpar=0
               else
                   totpar = totpar*10+e-0x30
                   pre_nl=false
               end

           end
           return max(largest, total+totpar)
       end
solvelox (generic function with 1 method)

julia> @btime solvelox($bigdata)
  15.100 μs (0 allocations: 0 bytes)
75501

julia> @btime max_num($bigdata)
  28.700 μs (0 allocations: 0 bytes)
75501
1 Like

You’re right. I had forgotten that BenchmarkTools runs multiple evaluations per sample.

I also tried to use the functions of IterTools, but tests on pieces of code did not encourage me to continue

v = transcode(UInt8,data)

using IterTools

p=partition(v,2,1)

ps=Iterators.Stateful(p)

Iterators.reset!(ps)

@time reduce((s,c)->s*10+first(c)-0x30 ,takewhile(t->first(t)!=0x0a,takewhile(!=((0x0a, 0x0a)), ps)),init=0)

@time takewhile(t->first(t)!=0x0a,takewhile(!=((0x0a, 0x0a)), ps))

although I enjoyed doing something like this …

bigdata=codeunits(data)
blk=groupby(!=[0x0a,0x0a]),groupby(!=(0x0a),bigdata))
dig2num=sum(vofv)sum(v->reduce((s,c)->c==0x0a ? s : s*10+c-0x30, v, init=0), vofv)
maximum(dig2num,blk)