# Julia slower than Python for finding pangrams?

I came up with a neat Julia solution using sets to solve that simple pangram. However, I’m a bit perplexed… C benchmarks show it running over 200 times faster, and even Python outperforms my Julia code substantially. Curious if anyone has insights into why this might be happening. Here’s my Julia implementation, along with the others:

In C

``````
#include <stdio.h>
#include <stdlib.h>

#define ALPHABET_SIZE 26

int isPangram(const char *str) {
    int alphabet[ALPHABET_SIZE] = {0};

    for (int i = 0; str[i] != '\0'; i++) {
        if (str[i] >= 'A' && str[i] <= 'Z') {
            alphabet[str[i] - 'A'] = 1;
        } else if (str[i] >= 'a' && str[i] <= 'z') {
            alphabet[str[i] - 'a'] = 1;
        }
    }

    for (int i = 0; i < ALPHABET_SIZE; i++) {
        if (alphabet[i] == 0) {
            return 0;
        }
    }

    return 1;
}

int main(int argc, char **argv) {
    if (argc < 2) {
        printf("Provide at least one letter!\n");
        return 0;
    }

    for (int i = 1; i < argc; i++) {
        if (isPangram(argv[i])) {
            printf("yes, it's a pangram\n");
        } else {
            printf("no, it's not a pangram\n");
        }
    }

    return 0;
}
``````

In Python

``````
import sys

def verifica_letras(string):
    # True when every letter a-z appears in the (lowercased) string.
    letras_alfabeto = set('abcdefghijklmnopqrstuvwxyz')
    return letras_alfabeto <= set(string.lower())

if len(sys.argv) <= 1:
    print("Provide at least one letter!")
    sys.exit()

if verifica_letras(sys.argv[1]):
    print("yes, it's a pangram \n")
else:
    print("no, it's not a pangram \n")
``````

In Julia

``````
function ispangram(input)
    'a':'z' ⊆ lowercase(input) ? println("Yes, it's a pangram.") : println("No, it's not a pangram.")
end

length(ARGS) > 0 ? ispangram(ARGS[1]) : println("Provide at least one letter!")
``````

Benchmarks with hyperfine - C

``````> hyperfine --shell=none --warmup 30 "./a.out wjjqevffkkgbcehhiqpvqutmwxawzvjnbvukmlzxyhkgfddzfjhcujnlkjbdfgghjhujkiuytghjioplkjhgfdsaqwertyujioplkjhgfdsaqwertzuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkj"
``````

Result

Time (mean ± σ): 701.0 µs ± 152.1 µs [User: 525.3 µs, System: 67.9 µs]
Range (min … max): 523.8 µs … 3765.6 µs 3569 runs

Benchmarks with hyperfine - Python

``````> hyperfine --shell=none --warmup 30 "python3 pangram.py wjjqevffkkgbcehhiqpvqutmwxawzvjnbvukmlzxyhkgfddzfjhcujnlkjbdfgghjhujkiuytghjioplkjhgfdsaqwertyujioplkjhgfdsaqwertzuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkj"

``````

Result

Time (mean ± σ): 15.9 ms ± 3.8 ms [User: 11.6 ms, System: 3.8 ms]
Range (min … max): 12.4 ms … 37.7 ms 210 runs

Benchmarks with hyperfine - Julia

``````hyperfine --shell=none --warmup 30 "julia --compile=all pangram.jl wjjqevffkkgbcehhiqpvqutmwxawzvjnbvukmlzxyhkgfddzfjhcujnlkjbdfgghjhujkiuytghjioplkjhgfdsaqwertyujioplkjhgfdsaqwertzuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkj"
``````

Result

Time (mean ± σ): 266.0 ms ± 23.4 ms [User: 228.5 ms, System: 112.0 ms]
Range (min … max): 240.1 ms … 314.1 ms 12 runs

Puzzled

C is expectedly much faster, but I’m surprised by the performance gap between my Python and Julia implementations. Python finished in about 16 ms, while Julia took roughly 17 times as long. Could anyone familiar with both languages help me understand why Julia might be lagging here?

Hi! I took the liberty of renaming your post to make it more specific. Very generic titles like “Julia slower than Python!!!” have a tendency to attract pile-ons, and we only need a few expert opinions here.

5 Likes

Fair enough… I’m new around here, sorry for that.

No worries! And now we get to sit back and watch while everyone you nerdsniped comes together to find a blazingly fast Julia implementation.

Although the first thing I would say is that you’re including startup and compilation time in your benchmark, which is typically the wrong way to go about this. If you plan to run your function many times (as is typical for most functions), this one-time cost becomes negligible, and so you shouldn’t measure it. In Python it’s not too visible, but Julia has much higher latency.

4 Likes

You are measuring startup time, and Julia (still, despite major improvements) is known to have a slow startup: it compiles your code every time you run a new Julia process. Basically, launching `julia pangram.jl` from the shell is not how you should run Julia programs, at least not if the runtime of `pangram.jl` is less than a few minutes. Instead, you start a REPL / IJulia kernel or some other long-running process and then run / include your Julia code in that.

In this case, in a new Julia REPL:

``````
julia> function ispangram(input)
           return ('a':'z' ⊆ lowercase(input))
       end
ispangram (generic function with 1 method)

julia> input = "wjjqevffkkgbcehhiqpvqutmwxawzvjnbvukmlzxyhkgfddzfjhcujnlkjbdfgghjhujkiuytghjioplkjhgfdsaqwertyujioplkjhgfdsaqwertzuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkj"
"wjjqevffkkgbcehhiqpvqutmwxawzvjnbvukmlzxyhkgfddzfjhcujnlkjbdfgghjhujkiuytghjioplkjhgfdsaqwertyujioplkjhgfdsaqwertzuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkj"

julia> ispangram(input)
true

julia> using BenchmarkTools

julia> @benchmark ispangram($input)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
Range (min … max):  3.641 μs … 245.906 μs  ┊ GC (min … max): 0.00% … 96.62%
Time  (median):     4.031 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   4.366 μs ±   7.129 μs  ┊ GC (mean ± σ):  5.55% ±  3.35%

▃▆█▄▂
▂▅█▆▄▄▆▆▅▆█████▇▇▆▄▄▄▄▄▃▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
3.64 μs         Histogram: frequency by time        5.41 μs <

Memory estimate: 7.01 KiB, allocs estimate: 10.
``````
11 Likes

Using a more explicit `for` loop:

``````
function ispangram3(input)
    appears = falses(26)
    for c in input
        isletter(c) || continue
        appears[lowercase(c)-'a'+1] |= true
    end
    return all(appears)
end
``````

is a bit faster.
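For readers following along in Python, the same explicit-loop idea can be sketched like this (a rough port, not code from the thread; the name `is_pangram_loop` is mine):

```python
def is_pangram_loop(s: str) -> bool:
    """Track which letters have been seen in a 26-slot boolean list."""
    appears = [False] * 26
    for c in s:
        if 'a' <= c <= 'z':
            appears[ord(c) - ord('a')] = True
        elif 'A' <= c <= 'Z':
            appears[ord(c) - ord('A')] = True
    return all(appears)

print(is_pangram_loop("The quick brown fox jumps over the lazy dog"))  # True
print(is_pangram_loop("hello world"))  # False
```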

1 Like

Yeah, but this is ~100% startup/compilation time. It’s kinda like timing:

``````
hyperfine --shell=none --warmup 30 "cc pangram.c && ./a.out wjjqevffkkgbcehhiqpvqutmwxawzvjnbvukmlzxyhkgfddzfjhcujnlkjbdfgghjhujkiuytghjioplkjhgfdsaqwertyujioplkjhgfdsaqwertzuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsazxcvbnmlkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkjhgfdsaqwertyuioplkj"
``````

You might be saying “well don’t do that” — and that’s exactly what Julians will say about timing `julia pangram.jl` like that, too.

The good news is that we’re getting closer to having a real `juliac` that can output compiled stand-alone executables with significantly lower startup times like `cc` does.

29 Likes

Since you are only looking for ASCII characters, it’s even faster still to search the raw bytes of the UTF-8 encoding, via `codeunits(input)`, analogous to the C implementation.

``````
function ispangram4(input::String)
    a, A, z, Z = UInt8('a'), UInt8('A'), UInt8('z'), UInt8('Z')
    appears = fill(false, 26) # Vector{Bool} is faster than BitVector falses(26)
    @inbounds for b in codeunits(input)
        if A ≤ b ≤ Z
            appears[b-(A-0x01)] = true
        elseif a ≤ b ≤ z
            appears[b-(a-0x01)] = true
        end
    end
    return all(appears)
end
``````

Note that a `Vector{Bool}` is faster to work with than a `BitVector`, which is why I didn’t use `falses(26)` here. You could save a bit more time by pre-allocating the `appears` array, though hopefully someday Julia will stack-allocate it automatically (Move small arrays to the stack by pchintalapudi · Pull Request #43573 · JuliaLang/julia · GitHub).

But the first-order issue here is how you do the benchmarking, as commented above.

8 Likes

I appreciate all the optimization suggestions! My surprise was that simple Python and C solutions outperformed my initial Julia code. I realize now that my Julia speed test wasn’t accurate.

5 Likes

Bitsets allow for efficient early stopping. Here, I manually use a `UInt32` as a bitset.

``````
julia> function ispangram5(input::String)
           a, A, z, Z = UInt8('a'), UInt8('A'), UInt8('z'), UInt8('Z')
           appears::UInt32 = ~((one(UInt32) << 26) - one(UInt32))
           @inbounds for b in codeunits(input)
               if A ≤ b ≤ Z
                   appears |= one(UInt32) << (b - A)
               elseif a ≤ b ≤ z
                   appears |= one(UInt32) << (b - a)
               end
               appears == (-1 % UInt32) && return true
           end
           false
       end
ispangram5 (generic function with 1 method)

julia> @btime ispangram4($input)
470.774 ns (1 allocation: 80 bytes)
true

julia> @btime ispangram5($input)
123.972 ns (0 allocations: 0 bytes)
true
``````
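The bitset-with-early-exit trick ports to any language with integer bitwise operations. Here is a hedged Python sketch of the same idea (my own illustration, not from the thread; since Python ints are arbitrary precision, a plain comparison against a 26-bit mask replaces the UInt32 complement trick):

```python
def is_pangram_bitset(s: str) -> bool:
    """Accumulate one bit per letter in an int; stop as soon as all 26 are set."""
    FULL = (1 << 26) - 1  # 26 low bits set
    seen = 0
    for c in s:
        b = ord(c)
        if 65 <= b <= 90:      # 'A'..'Z'
            seen |= 1 << (b - 65)
        elif 97 <= b <= 122:   # 'a'..'z'
            seen |= 1 << (b - 97)
        if seen == FULL:       # early exit once every letter has appeared
            return True
    return False

print(is_pangram_bitset("Sphinx of black quartz, judge my vow"))  # True
```

The early exit matters most when the input completes the alphabet long before its end, as with the long benchmark string here.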

Lack of allocations can also make a difference for shorter inputs

``````julia> @btime ispangram4("abcdefghijklmnopqrstuvwxyz")
57.158 ns (1 allocation: 80 bytes)
true

julia> @btime ispangram5("abcdefghijklmnopqrstuvwxyz")
34.608 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram4("abcdefghijklmnopqrstuvwxy")
55.704 ns (1 allocation: 80 bytes)
false

julia> @btime ispangram5("abcdefghijklmnopqrstuvwxy")
33.771 ns (0 allocations: 0 bytes)
false
``````

With some more effort, we could SIMD this…

10 Likes

We can shave this one a bit with a mask:

``````julia> mask = UInt32(2^26 - 1)
0x03ffffff

julia> function ispangram6(input::String)
           a, A, z, Z = UInt8('a'), UInt8('A'), UInt8('z'), UInt8('Z')
           appears::UInt32 = 0
           @inbounds for b in codeunits(input)
               if A ≤ b ≤ Z
                   appears |= one(UInt32) << (b - A)
               elseif a ≤ b ≤ z
                   appears |= one(UInt32) << (b - a)
               end
               iszero(appears ⊻ mask) && return true
           end
           false
       end
ispangram6 (generic function with 1 method)

julia> @btime ispangram5($input)
82.210 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram6($input)
3.135 μs (91 allocations: 1.42 KiB)
true

julia> ispangram6("abcdeghijklmnopqrstuvwxyz")
false
``````
1 Like

Belatedly realized that `iszero(a ⊻ b)` is just a fancy way of saying `a == b` that burns a few more cycles.

``````
julia> function ispangram7(input::String)
           a, A, z, Z = UInt8('a'), UInt8('A'), UInt8('z'), UInt8('Z')
           appears::UInt32 = 0
           @inbounds for b in codeunits(input)
               if A ≤ b ≤ Z
                   appears |= one(UInt32) << (b - A)
               elseif a ≤ b ≤ z
                   appears |= one(UInt32) << (b - a)
               end
               appears == mask && return true
           end
           false
       end
ispangram7 (generic function with 1 method)

julia> @btime ispangram7($input)
1.300 μs (92 allocations: 1.44 KiB)
true

julia> ispangram7("abcdeghijklmnopqrstuvwxyz")
false
``````

The allocations are presumably because the mask is global?

1 Like

Yes:

``````
julia> function ispangram8(input::String, mask::UInt32)
           a, A, z, Z = UInt8('a'), UInt8('A'), UInt8('z'), UInt8('Z')
           appears::UInt32 = 0
           @inbounds for b in codeunits(input)
               if A ≤ b ≤ Z
                   appears |= one(UInt32) << (b - A)
               elseif a ≤ b ≤ z
                   appears |= one(UInt32) << (b - a)
               end
               appears == mask && return true
           end
           false
       end
ispangram8 (generic function with 2 methods)

julia> @btime ispangram8($input, mask)
  97.457 ns (0 allocations: 0 bytes)
``````

Edit: so I did, just now, notice that I was comparing nanoseconds with microseconds earlier. Oops.

2 Likes

That’s still slower than `ispangram5`?
But we should just be testing vs `0`, except that’s slower for me, too:

julia> @btime ispangram5($input)
85.659 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram9($input)
123.144 ns (0 allocations: 0 bytes)
true
``````

Which isn’t surprising

``````	cmp	ecx, -1
jne	.LBB0_10
``````

macro-op fuses into one uop, so we shouldn’t really expect some other comparison to be noticeably faster.

5 Likes

Yeah, wasn’t paying attention there.

This, on the other hand, just barely squeezes past it:

``````julia> @inline shift_safe(::Type{T}, s::Integer) where {T} = s & (8 * sizeof(T) - 1)
shift_safe (generic function with 1 method)

julia> function ispangram9(input::String)
           a, A, z, Z = UInt8('a'), UInt8('A'), UInt8('z'), UInt8('Z')
           appears::UInt32 = 0
           @inbounds for b in codeunits(input)
               if A ≤ b ≤ Z
                   appears |= one(UInt32) << shift_safe(UInt32, b - A)
               elseif a ≤ b ≤ z
                   appears |= one(UInt32) << shift_safe(UInt32, b - a)
               end
               appears == mask && return true
           end
           false
       end
ispangram9 (generic function with 1 method)

97.501 ns (0 allocations: 0 bytes)
true

80.317 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram5($input)
82.212 ns (0 allocations: 0 bytes)
``````

Borrowed that from BitPermutations.jl, credit where credit is due.

But of course, combining this with pangram5 is the real win:

``````
julia> function ispangram10(input::String)
           a, A, z, Z = UInt8('a'), UInt8('A'), UInt8('z'), UInt8('Z')
           appears::UInt32 = ~((one(UInt32) << 26) - one(UInt32))
           @inbounds for b in codeunits(input)
               if A ≤ b ≤ Z
                   appears |= one(UInt32) << shift_safe(UInt32, b - A)
               elseif a ≤ b ≤ z
                   appears |= one(UInt32) << shift_safe(UInt32, b - a)
               end
               appears == (-1 % UInt32) && return true
           end
           false
       end
ispangram10 (generic function with 1 method)

julia> @btime ispangram10($input)
57.249 ns (0 allocations: 0 bytes)
``````
3 Likes

I wonder if StaticCompiler.jl could be applied here.

4 Likes

This `if` is very cumbersome in source and perhaps generates branching code. Since we are dealing with UInt8s we might as well do bitwise lowercasing:

``````b2 = b | 0x20
``````

and use `b2` for `appears`.
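The trick works because ASCII places each uppercase letter exactly `0x20` below its lowercase counterpart (`'A'` is `0x41`, `'a'` is `0x61`), so OR-ing in that bit lowercases unconditionally, with no branch. A quick Python check of the bit pattern:

```python
# Setting bit 0x20 maps 'A'..'Z' onto 'a'..'z' and leaves lowercase unchanged.
for ch in "AZqz":
    print(ch, "->", chr(ord(ch) | 0x20))
```

Note that this maps some non-letter bytes too (e.g. `'@' | 0x20` is a backtick), which is why the range check on the result still matters.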

4 Likes

Good call! There’s no need for a branch at all, because ASCII is very cleverly designed to help us out.

``````
julia> function ispangram11(input::String)
           appears::UInt32 = ~((one(UInt32) << 26) - one(UInt32))
           @inbounds for b in codeunits(input)
               appears |= one(UInt32) << shift_safe(UInt32, (b & 0x9f) - 1)
               appears == (-1 % UInt32) && return true
           end
           println(bitstring(appears))
           false
       end
ispangram11 (generic function with 1 method)

julia> @btime ispangram11($input)
45.202 ns (0 allocations: 0 bytes)
true

julia> @btime ispangram10($input)
57.266 ns (0 allocations: 0 bytes)
true
``````
3 Likes

That’s not correct if there are characters other than ASCII letters. e.g.

``````julia> b = UInt8('\r'); (b & 0x9f) - 1
12
``````
3 Likes
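To make the failure mode concrete, here is a small Python check (the helper `bit_index` is my own naming): under the branch-free `(b & 0x9f) - 1` mapping, the carriage-return byte lands on the same bit index as `'m'`/`'M'`, so a non-letter byte can falsely mark a letter as seen.

```python
def bit_index(b: int) -> int:
    # Same mapping as the branch-free version: mask to 5 low bits, shift to 0-based.
    return (b & 0x9f) - 1

print(bit_index(ord('\r')), bit_index(ord('m')), bit_index(ord('M')))  # 12 12 12
```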