Hi, I am currently working on some regex code, and I noticed that Python’s replace seems to be faster than Julia’s replace.
function test(iter=100_000)
    for _ in 1:iter
        replace("The quick foxes run quickly.", r"fox" => s"bus")
    end
end
@time test(10_000_000)
# 6.720898 seconds (90.00 M allocations: 4.321 GiB, 2.32% gc time)
and for Python,
import timeit
def test(iter=100_000):
    for i in range(iter):
        "The quick foxes run quickly.".replace(u"fox", u"bus")
timeit.timeit('test(10_000_000)', setup="from __main__ import test", number=1)
# 1.6722020990000601 seconds
In [3]: import re
In [4]: def test(iter=100_000):
   ...:     for i in range(iter):
   ...:         re.sub(u"fox", u"bus", "The quick foxes run quickly.")
   ...:
   ...:
In [5]: %timeit test(10_000_000)
6.38 s ± 883 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
But even if you don’t use regex in Julia, it’s still slower.
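For a like-for-like check on the Python side, a minimal sketch along these lines times str.replace and re.sub in the same script. The names `bench_str_replace` and `bench_re_sub` and the reduced default iteration count are illustrative choices, not something from the posts above, and the timings will differ from machine to machine.

```python
import re
import timeit

TEXT = "The quick foxes run quickly."


def bench_str_replace(n=100_000):
    # Plain substring replacement via the built-in str.replace method.
    result = ""
    for _ in range(n):
        result = TEXT.replace("fox", "bus")
    return result


def bench_re_sub(n=100_000):
    # Regex replacement via re.sub with a string pattern on every call.
    result = ""
    for _ in range(n):
        result = re.sub("fox", "bus", TEXT)
    return result


if __name__ == "__main__":
    # timeit.timeit accepts a zero-argument callable; number=1 runs it once.
    for fn in (bench_str_replace, bench_re_sub):
        seconds = timeit.timeit(fn, number=1)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

Comparing Julia's regex replace against `bench_re_sub` and Julia's string replace against `bench_str_replace` keeps the two kinds of replacement separate.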
TBH I’m not that surprised. Julia’s String is not the fastest among languages, and Python’s replace is implemented in C, so in this case you’re just calling a C function in a loop.
Thanks, but as @jling pointed out, it is still slower.
function test(iter=100_000)
    for _ in 1:iter
        replace("The quick foxes run quickly.", "fox" => "bus")
    end
end
@time test(10_000_000)
# 2.315239 seconds (80.00 M allocations: 4.768 GiB, 2.48% gc time)
It’s within about 50% on my machine, which is pretty close.
I suspect that most of the remaining difference is not due to string processing, per se, but rather to memory management. Python’s reference-counted memory management is highly optimized for code that continually allocates and discards small objects (strings, in this case), whereas Julia’s mark-and-sweep garbage collection is optimized for code that completely eliminates allocation in critical inner loops (in which case reference counting imposes an unacceptable overhead; Python doesn’t worry about this because it’s nearly impossible to eliminate allocations in pure Python code).
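To make the reference-counting point concrete, here is a small sketch, assuming CPython (other Python implementations such as PyPy behave differently). The `Tracked` class and the `refcount_frees_immediately` function are hypothetical helpers invented for this illustration. It only shows that an object is released the moment its last reference goes away, with the cycle collector switched off. It does not measure how much that costs or saves in the benchmark above.

```python
import gc


class Tracked:
    """Hypothetical helper: records in a shared list when it is finalized."""

    def __init__(self, log):
        self.log = log

    def __del__(self):
        self.log.append("freed")


def refcount_frees_immediately():
    # Disable the cyclic garbage collector so that only reference counting
    # can reclaim the object (assumption: running on CPython).
    gc.disable()
    try:
        log = []
        obj = Tracked(log)
        del obj  # drops the last reference to the Tracked instance
        # On CPython the finalizer has already run at this point.
        return log == ["freed"]
    finally:
        gc.enable()


if __name__ == "__main__":
    print(refcount_frees_immediately())
```

The temporary strings produced by each `replace` call in the Python loop are discarded in the same way, as soon as nothing refers to them.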
Weird! With Julia 1.6.1 and Python 3.9.5 on a 2.7 GHz Intel Core i7:
julia> function test(iter=100_000)
           for _ in 1:iter
               replace("The quick foxes run quickly.", "fox" => "bus")
           end
       end
test (generic function with 2 methods)
julia> using BenchmarkTools
julia> @btime test(10_000_000);
2.514 s (80000000 allocations: 4.77 GiB)
and
>>> import timeit
>>>
>>> def test(iter=100_000):
...     for i in range(iter):
...         "The quick foxes run quickly.".replace(u"fox", u"bus")
...
>>> timeit.timeit('test(10_000_000)', setup="from __main__ import test", number=1)
1.586207215