Julia slower than Python for finding pangrams?

Gotta love how this never ever fails

25 Likes

Yes, I wanted to try out StaticCompiler anyway, so here is how to make a statically compiled version of Elrod’s fastest suggestion:

using StaticCompiler
using StaticTools

function ispangram(argc::Int, argv::Ptr{Ptr{UInt8}})
    if argc != 2
        println(c"You should provide exactly one argument")
        return Cint(-1)
    end
    a,A,z,Z = UInt8('a'),UInt8('A'),UInt8('z'),UInt8('Z')
    appears::UInt32 = ~((one(UInt32)<<26)-one(UInt32))
    arg1 = unsafe_load(argv,2)
    ichar = 1
    while (b = unsafe_load(arg1,ichar)) != 0x00
        if A ≤ b ≤ Z
            appears |= one(UInt32) << (b-A)
        elseif a ≤ b ≤ z
            appears |= one(UInt32) << (b-a)
        end
        if appears == (-1%UInt32) 
            println(c"Is a Pangram")
            return Cint(0)
        end
        ichar = ichar+1
    end
    println(c"Not a Pangram")
    return Cint(0)
end
compile_executable(ispangram, (Int, Ptr{Ptr{UInt8}}), "./")
3 Likes

Here’s a correct version which avoids branching on the byte value, it gets a 2x speedup on my machine:

julia> function ispangram12a(input::String)
           a = UInt8('a')
           appears::UInt32 = ~((one(UInt32)<<26)-one(UInt32))
           for b in codeunits(input)
               appears |= one(UInt32) << ((b | 0x20) - a)
               appears == (-1%UInt32) && return true
           end
           false
       end
ispangram12a (generic function with 1 method)

julia> @btime ispangram5($input)
  121.657 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram10($input)
  116.376 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram12a($input)
  62.245 ns (0 allocations: 0 bytes)
true

It uses the | 0x20 trick to ensure that if the byte is an uppercase ASCII letter then it becomes a lowercase one. Then we unconditionally subtract 'a' and shift. This works because anything which is not a letter turns into a shift of something more than 25 (since we’re using unsigned ints), and those have no effect because we already set all those bits in the original value of appears.

I also removed the unnecessary @inbounds which had no effect.

Below is a version which removes the branch to exit early and instead just checks once after the loop. It happens to be about the same speed for the given input (which is 410 characters) but for shorter inputs it can be significantly faster. For example on a 200-character string it is 2x faster.

julia> function ispangram12b(input::String)
           a = UInt8('a')
           appears::UInt32 = ~((one(UInt32)<<26)-one(UInt32))
           for b in codeunits(input)
               appears |= one(UInt32) << ((b | 0x20) - a)
           end
           appears == (-1%UInt32)
       end
ispangram12b (generic function with 1 method)

julia> @btime ispangram12a($input)
  63.673 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram12b($input)
  66.496 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram12a($(input[1:200]))
  61.429 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram12b($(input[1:200]))
  31.791 ns (0 allocations: 0 bytes)
true

Edit: Interesting it is managing to process 6.2 bytes per nanosecond. I assume the compiler has automatically SIMD’d the loop?

7 Likes

To be fair with Python, which also included startup time in OP:

import sys
import timeit

def verifica_letras(string):
   letras_usadas = set(string.lower())
   letras_alfabeto = set('abcdefghijklmnopqrstuvwxyz')
   return letras_usadas == letras_alfabeto

input = "wjjqevffkkgbcehhiqpvqutmwxawzvjnbvukmlzxyhkgfddzfjhcujnlkjbdfgghjhujkiuytghjioplkjhgfdsaqwertyujioplkjhgfdsaqwertzuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkj"

v = verifica_letras(input)

print(timeit.timeit(lambda: verifica_letras(input), setup="pass", number=1_000_000))
# 3.89342975

which amounts to 3.9µs per execution.

using BenchmarkTools

verifica_letras(str) = Set(str |> lowercase) == Set("abcdefghijklmnopqrstuvwxyz")

ispangram(input) = ('a':'z' ⊆ lowercase(input))

function ispangram12b(input::String)
    a = UInt8('a')
    appears::UInt32 = ~((one(UInt32)<<26)-one(UInt32))
    for b in codeunits(input)
        appears |= one(UInt32) << ((b | 0x20) - a)
    end
    appears == (-1%UInt32)
end

input = "wjjqevffkkgbcehhiqpvqutmwxawzvjnbvukmlzxyhkgfddzfjhcujnlkjbdfgghjhujkiuytghjioplkjhgfdsaqwertyujioplkjhgfdsaqwertzuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkj"
    
@btime verifica_letras($input)
#  3.982 μs (17 allocations: 7.77 KiB)
@btime ispangram($input);
#  3.620 μs (10 allocations: 7.01 KiB)
@btime ispangram12b($input);
#  51.840 ns (0 allocations: 0 bytes)

# Mac Mini M1

Here Julia’s verifica_letras code corresponds one-to-one to the python’s one, just converted to one-liner (ispangram made one-liner, too). We see, in this case Julia and Python are equally fast.

5 Likes

What I like about Julia: Quite often I start writing a multiline function, then step-by-step remove unnecessary intermediate steps until I have just one (well-readable) LOC.

2 Likes

Isn’t the conclusion that the beautiful and simple 'a':'z' ⊆ lowercase(input) is just fine, just that @Flavio_Barros did the timing of run- and compilation time jointly?

5 Likes

Yeah, when doing SIMD you normally want to do batches (e.g., 64 at a time) in between checks.

1 Like

Yeah, it looks to me like we’ve nailed down the source of the time difference. Then, as is tradition for this community, a gang of wild Julians showed up and beat the problem into algorithmic submission to prove that it could be computed even faster if code complexity, readability, and robustness aren’t a concern.

11 Likes

And as usual I enjoy and learn a lot from the wild :yum:

5 Likes