Python's replace faster than Julia's replace?

Hi, I am currently working on some regex tasks, and I noticed that Python’s replace seems to be faster than Julia’s replace.

function test(iter=100_000)
    for _ in 1:iter
        replace("The quick foxes run quickly.", r"fox" => s"bus")
    end
end

@time test(10_000_000)
# 6.720898 seconds (90.00 M allocations: 4.321 GiB, 2.32% gc time)

and for Python,

import timeit

def test(iter=100_000):
    for i in range(iter):
        "The quick foxes run quickly.".replace(u"fox", u"bus")

timeit.timeit('test(10_000_000)', setup="from __main__ import test", number=1)
# 1.6722020990000601 seconds

Am I missing something?

Your code uses a regex in Julia, but not in Python.

In [3]: import re

In [4]: def test(iter=100_000):
   ...:     for i in range(iter):
   ...:         re.sub(u"fox", u"bus", "The quick foxes run quickly.")
   ...: 
   ...: 

In [5]: %timeit test(10_000_000)
6.38 s ± 883 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

but even if you don’t use regex in Julia, it’s still slower.

TBH I’m not that surprised: Julia’s String is not the fastest among languages, and Python’s replace is implemented in C, so in this case you’re just calling a C function in a loop.


Thanks, but as @jling pointed out, it is still slower.

function test(iter=100_000)
    for _ in 1:iter
        replace("The quick foxes run quickly.", "fox" => "bus")
    end
end

@time test(10_000_000)
# 2.315239 seconds (80.00 M allocations: 4.768 GiB, 2.48% gc time)

You should use BenchmarkTools.@btime rather than @time. Not sure that will change anything, but @time is almost never the tool you want.


Indeed, you are right. I have no idea why this is the case, though.
Interestingly, with a regex the time is the same for Julia and Python.


doesn’t really matter in this case:

julia> @btime test(10_000_000)
  2.155 s (80000000 allocations: 4.77 GiB)

julia> @time test(10_000_000)
  2.061217 seconds (80.00 M allocations: 4.768 GiB, 2.02% gc time)

just String allocation dominating the scene


I actually tried, but nothing changed:

@btime test(10_000_000)
# 2.321 s (80000000 allocations: 4.77 GiB)

because in the regex case, passing the string to, and processing it with, a Perl-compatible regex library is what costs most of the time in both languages, I’m guessing
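For what it’s worth, on the Python side you can at least amortize the pattern-compilation step by precompiling with re.compile — a sketch of a variant of the earlier test (note that re also caches compiled patterns internally, so the gain is mostly skipping the per-call cache lookup, not the match work itself):

```python
import re

# Hypothetical variant of the Python test: compile the pattern once
# instead of passing a pattern string to re.sub on every call.
pattern = re.compile("fox")

def test(iter=100_000):
    for _ in range(iter):
        pattern.sub("bus", "The quick foxes run quickly.")
```

Even so, the match-and-copy work inside the regex engine remains, which is presumably the dominant cost here.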


I think the “always use @btime” advice is a bit overused for long-running functions (:
Especially now that @time reports both GC and compilation time.


It’s within about 50% on my machine, which is pretty close.

I suspect that most of the remaining difference is not due to string processing, per se, but rather to memory management. Python’s reference-counted memory management is highly optimized for code that continually allocates and discards small objects (strings, in this case), whereas Julia’s mark-and-sweep garbage collection is optimized for code that completely eliminates allocation in critical inner loops (in which case reference counting imposes an unacceptable overhead; Python doesn’t worry about this because it’s nearly impossible to eliminate allocations in pure Python code).
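The reference-counting behavior can be observed directly in CPython — a small sketch using a weakref-able class as a stand-in, since built-in strings cannot carry weak references:

```python
import weakref

class Tmp:
    """Stand-in for a small temporary object, like a string result."""
    pass

obj = Tmp()
ref = weakref.ref(obj)
assert ref() is obj   # alive while a strong reference exists

del obj               # refcount drops to zero...
assert ref() is None  # ...and CPython reclaims it immediately, no GC pause
```

This immediate reclamation is CPython-specific; a tracing collector (PyPy, or Julia’s mark-and-sweep) would free the object at some later collection instead.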

People writing highly optimized string-processing code in Julia (e.g. JuliaData/CSV.jl) typically work hard to avoid allocating zillions of temporary strings, and in doing so have been able to match or exceed the performance of highly optimized C libraries (https://juliacomputing.com/blog/2020/06/fast-csv/).


For me it’s 0.87 sec in Python vs 3.2 sec in Julia — so, Julia is 3-4 times slower.


Weird! With Julia 1.6.1 and Python 3.9.5 on a 2.7 GHz Intel Core i7:

julia> function test(iter=100_000)
           for _ in 1:iter
               replace("The quick foxes run quickly.", "fox" => "bus")
           end
       end
test (generic function with 2 methods)

julia> using BenchmarkTools

julia> @btime test(10_000_000);
  2.514 s (80000000 allocations: 4.77 GiB)

and

>>> import timeit
>>> 
>>> def test(iter=100_000):
...     for i in range(iter):
...         "The quick foxes run quickly.".replace(u"fox", u"bus")
... 
>>> timeit.timeit('test(10_000_000)', setup="from __main__ import test", number=1)
1.586207215

Yes, I just ran literally the same code as yours. Julia is also 1.6.1, python 3.9.1, laptop with Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz.

julia> @btime test(10_000_000);
  2.931 s (80000000 allocations: 4.77 GiB)

>>> timeit.timeit('test(10_000_000)', setup="from __main__ import test", number=1)
0.8384239412844181

replace in Julia is still slower than in Python, even though it is now 2025.

I don’t know if there’s been much change in the implementations, though on my Apple M3 laptop Julia (1.11) is now only about 50% slower than Python (3.10):

julia> @btime test(10_000_000);
  695.733 ms (40000000 allocations: 1.71 GiB)

vs.

>>> timeit.timeit('test(10_000_000)', setup="from __main__ import test", number=1)
0.476581749971956

As I wrote above, I don’t think there is any big mystery here. Python is faster than Julia at allocating zillions of tiny objects, thanks in part to reference counting, whereas Julia lets you write fast non-allocating inner loops. Performance-critical code should generally be of the latter variety. (The core replace function in Python is probably written in C too.)


Can you try on 1.12? I get results very close to Python’s (Julia is 6% slower), maybe because the compiler understands it doesn’t have to allocate the two strings in the Pair?
Just for fun, I tried to write a faster yet more limited version in Julia:

@inline function matchat(src::AbstractVector{UInt8}, k::AbstractVector{UInt8}, i::Int)
    @inbounds for j in Base.OneTo(length(k))
        if src[i+j-1] != k[j]
            return false
        end
    end
    return true
end

function fastreplace(s::String, kv::Pair{String,String})
    k = codeunits(first(kv))
    v = codeunits(last(kv))
    src = codeunits(s)

    nk, nv, ns = length(k), length(v), length(src)

    # Preallocate maximum possible size
    maxout = ns + (ns ÷ max(nk, 1)) * max(nv - nk, 0)
    out = Memory{UInt8}(undef, maxout)

    i, j = 1, 1
    N = ns - nk + 1
    @views @inbounds while i <= ns
        if i <= N && matchat(src, k, i)
            out[j:j+nv-1] = v
            i += nk
            j += nv
        else
            out[j] = src[i]
            i += 1
            j += 1
        end
    end
    return @inbounds String(@view out[1:j-1])
end

I’m sure it has tons of issues but:

julia> @btime fastreplace($s,$p)
  52.888 ns (3 allocations: 144 bytes)
"The quick buses run quickly."

julia> @btime replace($s,$p)
  69.570 ns (4 allocations: 192 bytes)
"The quick buses run quickly."

I would start looking into why so many allocations: [Likely misunderstanding edited out]

just String allocation dominating the scene

No, not really. Yes, there are too many, but allocations as such aren’t really a problem in Julia; each one should cost about as much as in Python (assuming you can get down to as few). It’s just that deallocation (reference counting) is better in Python, especially for strings, and Julia could do the same — or something even better, skipping the allocations altogether:

Mutable strings are better when you want to replace in place with the same number of bytes, e.g. r"fox" => s"bus", but they can actually also help when the byte counts differ — e.g. when the replacement has the same number of letters but not the same length, because UTF-8 is variable-length.
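As an illustration in Python, which does ship a mutable byte string: a same-byte-length replacement ("fox" -> "bus", both 3 bytes) can be done in place with no per-edit allocation. This is only a sketch of the idea, not how str.replace actually works:

```python
# A mutable byte buffer lets a same-length replacement happen in place.
buf = bytearray(b"The quick foxes run quickly.")

start = 0
while (i := buf.find(b"fox", start)) != -1:
    buf[i:i+3] = b"bus"   # slice assignment mutates the existing buffer
    start = i + 3

# buf is now bytearray(b"The quick buses run quickly.")
```

If the replacement had a different byte length, the slice assignment would still work, but it would shift the tail of the buffer, losing the in-place advantage.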

What Julia needs is owned objects like in the Pony language; then mutable strings are really good. You could then freeze them into immutable standard strings, since those are also in some ways better (quoting from AI):

This immutability offers several advantages, including:

  • Performance optimizations:

Immutable strings can be efficiently shared and cached, as their content is guaranteed not to change.

  • Thread safety:

Multiple threads can safely access and use the same string without worrying about concurrent modifications.

  • Predictability:

The value of a string variable will not unexpectedly change due to actions elsewhere in the program.

Working with mutable string-like data:

While String itself is immutable, there are ways to achieve mutable string-like behavior in Julia for specific use cases:

  • Using Vector{UInt8}: For low-level manipulation, you can store string data as a Vector{UInt8} (byte array). This allows direct modification of individual bytes. However, you lose the convenience of built-in string methods and need to handle encoding manually.

Code

    data = UInt8['H', 'e', 'l', 'l', 'o']
    # Modify data
    data[1] = UInt8('J')
    # Convert back to string (if needed)
    mutable_string = String(data)
  • Using the MutableStrings.jl package: If you require mutable string types with familiar string methods for large-scale text processing, the MutableStrings.jl package provides MutableASCIIString and MutableUTF8String types. This package allows in-place modification of string data.

I believe the MutableStrings package was just a hallucination… :slight_smile: though there is Str.jl. [I was using WeakRefStrings.jl (which most shouldn’t be using, unless they really know what they are doing) to test/look into this more, though really only InlineStrings.jl, which it reexports, and which is safe.]

1 alloc for the long string, 1 for the regex, 1 for the s string, 1 for the result, 1 because you’re in global scope, and 4 within replace?

Thanks. I was confused about why this is slower (100_000_000 / 10_000_000 = 10 allocations per iteration?!), and I’m still confused:

function test(iter=100_000)
  for _ in 1:iter
    replace(InlineString31("The quick foxes run quickly."), r"fox" => s"bus")
  end
end

@time test(10_000_000)
 11.287954 seconds (100.00 M allocations: 4.023 GiB, 6.63% gc time)

than with a standard String, from the OP’s post:

julia> @time test(10_000_000)
  8.191371 seconds (90.00 M allocations: 3.576 GiB, 7.78% gc time)