Python's replace faster than Julia's replace?

Hi, I am currently working on some regex code, and I noticed that Python’s replace seems to be faster than Julia’s replace.

function test(iter=100_000)
    for _ in 1:iter
        replace("The quick foxes run quickly.", r"fox" => s"bus")
    end
end

@time test(10_000_000)
# 6.720898 seconds (90.00 M allocations: 4.321 GiB, 2.32% gc time)

and for Python,

import timeit

def test(iter=100_000):
    for i in range(iter):
        "The quick foxes run quickly.".replace(u"fox", u"bus")

timeit.timeit('test(10_000_000)', setup="from __main__ import test", number=1)
# 1.6722020990000601 seconds

Am I missing something?

Your code uses a regex in Julia, but not in Python.

In [3]: import re

In [4]: def test(iter=100_000):
   ...:     for i in range(iter):
   ...:         re.sub(u"fox", u"bus", "The quick foxes run quickly.")
   ...: 
   ...: 

In [5]: %timeit test(10_000_000)
6.38 s ± 883 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
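As a side note, `re` caches compiled patterns internally, but hoisting the compilation out with `re.compile` avoids the per-call cache lookup. A minimal sketch (the name `test_compiled` is mine, not from the thread):

```python
import re

# Hypothetical variant of the benchmark above: compile the pattern once
# instead of letting re.sub consult its internal cache on every call.
PATTERN = re.compile("fox")

def test_compiled(iter=100_000):
    for _ in range(iter):
        PATTERN.sub("bus", "The quick foxes run quickly.")
```

On short patterns like this the cache lookup is cheap, so don't expect a dramatic change; it mostly removes noise from the comparison.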

But even if you don’t use a regex in Julia, it’s still slower.

TBH I’m not that surprised: Julia’s String is not the fastest among languages, and Python’s replace is implemented in C, so in this case you’re just calling a C function in a loop.


Thanks, but as @jling pointed out, it is still slower.

function test(iter=100_000)
    for _ in 1:iter
        replace("The quick foxes run quickly.", "fox" => "bus")
    end
end

@time test(10_000_000)
# 2.315239 seconds (80.00 M allocations: 4.768 GiB, 2.48% gc time)

You should use BenchmarkTools.@btime rather than @time. Not sure that will change anything, but @time is almost never the tool you want.


Indeed, you are right, though I have no idea why this is the case.
Interestingly, with a regex the time is about the same for Julia and Python.


doesn’t really matter in this case:

julia> @btime test(10_000_000)
  2.155 s (80000000 allocations: 4.77 GiB)

julia> @time test(10_000_000)
  2.061217 seconds (80.00 M allocations: 4.768 GiB, 2.02% gc time)

just String allocation dominating the scene


I actually tried, but nothing changed:

@btime test(10_000_000)
# 2.321 s (80000000 allocations: 4.77 GiB)

Because in the regex case, passing the string to and processing it with a Perl-compatible regex library costs most of the time in both languages, I’m guessing.
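For anyone who wants to see that gap on the Python side alone, here is a quick sketch comparing the literal and regex paths (timings are machine-dependent, and the variable names are mine):

```python
import re
import timeit

s = "The quick foxes run quickly."

# Literal path: str.replace is a plain substring scan in C.
literal = timeit.timeit(lambda: s.replace("fox", "bus"), number=100_000)

# Regex path: re.sub goes through the regex engine on every call.
regex = timeit.timeit(lambda: re.sub("fox", "bus", s), number=100_000)

print(f"str.replace: {literal:.3f}s  re.sub: {regex:.3f}s")
```

On typical machines the regex path is several times slower, which is consistent with the numbers earlier in the thread.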


I think the “always use @btime” advice is a bit overused for long-running functions (:
Especially now that @time reports both GC and compilation time.


It’s within about 50% on my machine, which is pretty close.

I suspect that most of the remaining difference is not due to string processing, per se, but rather to memory management. Python’s reference-counted memory management is highly optimized for code that continually allocates and discards small objects (strings, in this case), whereas Julia’s mark-and-sweep garbage collection is optimized for code that completely eliminates allocation in critical inner loops (in which case reference counting imposes an unacceptable overhead; Python doesn’t worry about this because it’s nearly impossible to eliminate allocations in pure Python code).
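To make the “allocate and immediately discard” pattern concrete on the Python side, here is a sketch using the stdlib tracemalloc module (assuming CPython’s reference counting; exact byte counts will vary):

```python
import tracemalloc

s = "The quick foxes run quickly."

tracemalloc.start()
for _ in range(1_000):
    s.replace("fox", "bus")  # temporary result is discarded right away
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Under CPython's reference counting, each temporary string is freed as
# soon as the statement finishes, so the peak traced memory stays near
# the size of one string rather than growing with the iteration count.
print(f"current={current} B, peak={peak} B")
```

A mark-and-sweep collector, by contrast, lets those temporaries pile up between collections, which is part of the overhead being discussed here.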

People writing highly optimized string-processing code in Julia (e.g. JuliaData/CSV.jl, a utility library for working with CSV and other delimited files) typically work hard to avoid allocating zillions of temporary strings, and in doing so have been able to match or exceed the performance of highly optimized C libraries (see “CSV Reader Benchmarks: Julia Reads CSVs 10-20x Faster than Python and R” on JuliaHub).
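In that spirit, the toy benchmark itself collapses once you notice its inputs never change. A deliberately silly Python sketch (the name `test_hoisted` is hypothetical) that hoists the allocation out of the loop:

```python
def test_hoisted(iter=100_000):
    # The inputs are constant, so do the replacement once and
    # allocate nothing inside the loop.
    result = "The quick foxes run quickly.".replace("fox", "bus")
    for _ in range(iter):
        pass
    return result
```

Real code can rarely hoist the whole computation like this, but the same idea, moving repeated temporary allocations out of the hot loop, is what the CSV.jl-style optimizations amount to.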


For me it’s 0.87 sec in Python vs 3.2 sec in Julia - so, 3-4 times slower.


Weird! With Julia 1.6.1 and Python 3.9.5 on a 2.7 GHz Intel Core i7:

julia> function test(iter=100_000)
           for _ in 1:iter
               replace("The quick foxes run quickly.", "fox" => "bus")
           end
       end
test (generic function with 2 methods)

julia> using BenchmarkTools

julia> @btime test(10_000_000);
  2.514 s (80000000 allocations: 4.77 GiB)

and

>>> import timeit
>>> 
>>> def test(iter=100_000):
...     for i in range(iter):
...         "The quick foxes run quickly.".replace(u"fox", u"bus")
... 
>>> timeit.timeit('test(10_000_000)', setup="from __main__ import test", number=1)
1.586207215

Yes, I just ran literally the same code as yours. Julia is also 1.6.1, python 3.9.1, laptop with Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz.

julia> @btime test(10_000_000);
  2.931 s (80000000 allocations: 4.77 GiB)

>>> timeit.timeit('test(10_000_000)', setup="from __main__ import test", number=1)
0.8384239412844181