Gotta love how this never ever fails
Yes, I wanted to try out StaticCompiler anyway, so here is how to make a statically compiled version of Elrod’s fastest suggestion:
using StaticCompiler
using StaticTools

function ispangram(argc::Int, argv::Ptr{Ptr{UInt8}})
    if argc != 2
        println(c"You should provide exactly one argument")
        return Cint(-1)
    end
    a, A, z, Z = UInt8('a'), UInt8('A'), UInt8('z'), UInt8('Z')
    appears::UInt32 = ~((one(UInt32)<<26)-one(UInt32))
    arg1 = unsafe_load(argv, 2)
    ichar = 1
    while (b = unsafe_load(arg1, ichar)) != 0x00
        if A ≤ b ≤ Z
            appears |= one(UInt32) << (b-A)
        elseif a ≤ b ≤ z
            appears |= one(UInt32) << (b-a)
        end
        if appears == (-1%UInt32)
            println(c"Is a Pangram")
            return Cint(0)
        end
        ichar = ichar + 1
    end
    println(c"Not a Pangram")
    return Cint(0)
end

compile_executable(ispangram, (Int, Ptr{Ptr{UInt8}}), "./")
Here’s a correct version that avoids branching on the byte value; it gets a 2x speedup on my machine:
julia> function ispangram12a(input::String)
           a = UInt8('a')
           appears::UInt32 = ~((one(UInt32)<<26)-one(UInt32))
           for b in codeunits(input)
               appears |= one(UInt32) << ((b | 0x20) - a)
               appears == (-1%UInt32) && return true
           end
           false
       end
ispangram12a (generic function with 1 method)

julia> @btime ispangram5($input)
  121.657 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram10($input)
  116.376 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram12a($input)
  62.245 ns (0 allocations: 0 bytes)
true
It uses the | 0x20 trick to ensure that if the byte is an uppercase ASCII letter it becomes a lowercase one. Then we unconditionally subtract 'a' and shift. This works because anything that is not a letter turns into a shift of more than 25 (since we’re using unsigned ints), and those shifts have no effect: they either hit bits we already set in the original value of appears, or shift out of the 32-bit word entirely. I also removed the unnecessary @inbounds, which had no effect.
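As a rough illustration, here is a hypothetical Python port of the same branch-free bitmask idea (the function name is mine, not from the thread; Python integers don’t wrap like UInt8/UInt32, so the modulo and 32-bit mask below emulate the Julia semantics explicitly):

```python
def is_pangram_bitmask(s: str) -> bool:
    # Start with bits 26-31 already set, mirroring ~((one(UInt32)<<26)-one(UInt32)).
    appears = 0xFFFFFFFF & ~((1 << 26) - 1)
    for b in s.encode("ascii", "ignore"):
        # OR-ing 0x20 lowercases ASCII letters; % 256 emulates UInt8 wraparound.
        shift = ((b | 0x20) - ord("a")) % 256
        # Letters give shifts 0-25; non-letters give shifts of 26 or more,
        # which either hit the pre-set bits 26-31 or vanish under the 32-bit mask.
        appears |= (1 << shift) & 0xFFFFFFFF
    return appears == 0xFFFFFFFF
```

The branch on the byte value is gone: every byte goes through the same OR/subtract/shift path, and the pre-set high bits absorb the non-letter cases.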
Below is a version that removes the early-exit branch and instead checks just once after the loop. It happens to be about the same speed for the given input (410 characters), but for shorter inputs it can be significantly faster; on a 200-character string, for example, it is 2x faster.
julia> function ispangram12b(input::String)
           a = UInt8('a')
           appears::UInt32 = ~((one(UInt32)<<26)-one(UInt32))
           for b in codeunits(input)
               appears |= one(UInt32) << ((b | 0x20) - a)
           end
           appears == (-1%UInt32)
       end
ispangram12b (generic function with 1 method)

julia> @btime ispangram12a($input)
  63.673 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram12b($input)
  66.496 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram12a($(input[1:200]))
  61.429 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram12b($(input[1:200]))
  31.791 ns (0 allocations: 0 bytes)
true
Edit: Interestingly, it is managing to process about 6.2 bytes per nanosecond. I assume the compiler has automatically SIMD’d the loop?
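That throughput figure presumably comes from dividing the 410-byte input by the ispangram12b timing above; a quick sanity check:

```python
# Numbers taken from the benchmark output above.
input_len = 410     # bytes in the benchmark string
time_ns = 66.496    # @btime result for ispangram12b on the full input
throughput = input_len / time_ns
print(round(throughput, 1))  # ≈ 6.2 bytes per nanosecond
```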
To be fair to Python, whose timing in the OP also included startup time:
import sys
import timeit

def verifica_letras(string):
    letras_usadas = set(string.lower())
    letras_alfabeto = set('abcdefghijklmnopqrstuvwxyz')
    return letras_usadas == letras_alfabeto
input = "wjjqevffkkgbcehhiqpvqutmwxawzvjnbvukmlzxyhkgfddzfjhcujnlkjbdfgghjhujkiuytghjioplkjhgfdsaqwertyujioplkjhgfdsaqwertzuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkj"
v = verifica_letras(input)
print(timeit.timeit(lambda: verifica_letras(input), setup="pass", number=1_000_000))
# 3.89342975
which amounts to 3.9µs per execution.
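The per-execution figure is just the timeit total divided by the iteration count:

```python
# Converting the timeit total above to a per-call time.
total_s = 3.89342975            # seconds for number=1_000_000 calls
per_call_us = total_s / 1_000_000 * 1e6
print(round(per_call_us, 1))    # ≈ 3.9 µs per call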
using BenchmarkTools

verifica_letras(str) = Set(str |> lowercase) == Set("abcdefghijklmnopqrstuvwxyz")
ispangram(input) = ('a':'z' ⊆ lowercase(input))

function ispangram12b(input::String)
    a = UInt8('a')
    appears::UInt32 = ~((one(UInt32)<<26)-one(UInt32))
    for b in codeunits(input)
        appears |= one(UInt32) << ((b | 0x20) - a)
    end
    appears == (-1%UInt32)
end
input = "wjjqevffkkgbcehhiqpvqutmwxawzvjnbvukmlzxyhkgfddzfjhcujnlkjbdfgghjhujkiuytghjioplkjhgfdsaqwertyujioplkjhgfdsaqwertzuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkj"
@btime verifica_letras($input)
# 3.982 μs (17 allocations: 7.77 KiB)
@btime ispangram($input);
# 3.620 μs (10 allocations: 7.01 KiB)
@btime ispangram12b($input);
# 51.840 ns (0 allocations: 0 bytes)
# Mac Mini M1
Here Julia’s verifica_letras corresponds one-to-one to the Python code, just converted to a one-liner (ispangram was made a one-liner, too). We see that in this case Julia and Python are equally fast.
What I like about Julia: Quite often I start writing a multiline function, then step-by-step remove unnecessary intermediate steps until I have just one (well-readable) LOC.
Isn’t the conclusion that the beautiful and simple 'a':'z' ⊆ lowercase(input) is just fine, and that @Flavio_Barros simply timed compilation and runtime jointly?
Yeah, when doing SIMD you normally want to do batches (e.g., 64 at a time) in between checks.
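A sketch of that batched early-exit pattern (my own Python illustration, not from the thread): run the branch-free bitmask update over a fixed-size chunk, and only test for completion between chunks, so the inner loop stays free of data-dependent branches and remains vectorizable in a compiled language.

```python
def is_pangram_batched(s: str, batch: int = 64) -> bool:
    # Bits 26-31 pre-set, as in the Julia versions above.
    appears = 0xFFFFFFFF & ~((1 << 26) - 1)
    data = s.encode("ascii", "ignore")
    for start in range(0, len(data), batch):
        # Branch-free update over the whole chunk (a compiler could SIMD this).
        for b in data[start:start + batch]:
            appears |= (1 << (((b | 0x20) - 97) % 256)) & 0xFFFFFFFF
        # Completion check only once per batch, not once per byte.
        if appears == 0xFFFFFFFF:
            return True
    return False
```

The trade-off is that an early exit can come at most one batch late, in exchange for a tighter inner loop.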
Yeah, it looks to me like we’ve nailed down the source of the time difference. Then, as is tradition for this community, a gang of wild Julians showed up and beat the problem into algorithmic submission to prove that it could be computed even faster if code complexity, readability, and robustness aren’t a concern.
And as usual I enjoy and learn a lot from the wild Julians.