Matt Parker's Crappy Python Challenge

I found this amusing and perhaps you will too… and maybe someone with Julia will make their mark.

(The scattergram does show a Julia competitor looking pretty good.)

5 Likes

I have searched a bit in the net, but was not lucky to find something related to Julia and this challenge.
Is there somewhere a performance scattergram published that includes a Julia code?

I thought this video was interesting. The main theme of the video was just describing the “two-language problem”, but not in those words. It seems like someone did have a go in Julia, and did much better than the original implementations, but still fell short of the C/C++ speeds.

1 Like

0.0068 sec. (the C++ code, I’m up to 5 min timestamp) is hard to beat in Julia, because Julia starts in 0.177 sec. for me (I’ve though cut startup in half with a sysimage; and you could time without startup or compilation). [EDIT: startup can be eliminated, see my comment further down.]

Would the benchmarking on our end include Julia startup time?

No, it’s just a predictable complaint people could have (e.g. insist on timing that way at Debian’s Benchmark game), while unfair.

1 Like

@HexSpin, @ellocco, Isn’t this related (or same type of code? I’ve yet to watch the full video): Rust / Julia Comparison Post and Optimal Quordle

Here’s Julia code that Parker linked to:

It uses Polyester threads, while good still far behind the best, which will be hard to beat. It’s a good challenge (excluding Julia startup/compilation overhead), and if compiling to binary with StaticCompiler.jl, then currently I think impossible to beat since it doesn’t support threads.

So under the videos there are a bunch of links; most of the benchmarking is done by Benjamin Paassen, here there is a spreadsheet of implementations he curates

1 Like

Hmm, we’re beaten by Python (and JavaScript, Go and Haskell) on time, even Python is 16x faster, so I think we can do better. And intriguingly the smallest code is Python at 18 lines (Julia’s is 447), and even that code is also 10x faster than Julia. It’s also intriguing to compete on size (lines of code or bytes): 5words538 - Python Repl - Replit

The Julia code could easily be a ton faster. It is using Set{Char} so just switching to the bitmasked datastructure will be a massive improvement.

5 Likes

Perhaps - but I’m hardly qualified to comment. I’m what you might call a “Matt Parker kinda coder, but in Julia”.

Seeing the video, I was surprised there was only one Julia version, and thought that there will be something done from the community to defend what is ours :wink:

1 Like

I had the same reaction - but not having that skill level, I thought I’d offer it up for the more able.

As I commented on Slack in #performance-helpdesk:

algorithmically it’s nothing special — the whole point was to do the algorithm done by Ben Paaßen in Python and show that you get a big speed boost just by moving to Julia and doing some fairly naive things
[it includes] None of the bit twiddling done by the fastest C implementations nor any of the scheduling of loading (if their run time is < 500ms, then I’m curious how they’re loading the file and doing the actual analysis).

This is also discussed in the README in that repo.

It’s also a bit more general in its formulation than Paaßen’s – it’s written as a recursive call so search for cliques of any size and with any (constant) number of letters in the constituent words.

Related to “what counts as time?”, I had also commented in the same Slack thread:

And given comments on Patreon and elsewhere, I also don’t think people knew how to do timing in Julia, so I have no idea how they counted e.g. compilation, etc. If somebody has a C program with -O3 that takes several tens of seconds to compile but runs in < 1 second, then I think Julia comparable for total runtime of ~40 seconds

2 Likes

I only briefly looked at standup5x5/525.c at master · stew675/standup5x5 · GitHub (what seems to be one of the fastest solutions, of that challenge - not sure, I missed something, though). This kind of C(++) code is so close to assembler, nailing the relevant portions of the algo to specific instructions (avx, popcnt, “regular” boolean logic, …), for a specific pieces of silicon, that I don’t think compiler-options even contribute all that much, to performance. Probably, the main-reason, for the author, to not code assembler, directly, was, to have a (relatively) nice way to code the utility (non-performance-critical) stuff, not loose track of multithreading-issues and maybe the overall control-flow, but which doesn’t even look too complicated, in this piece of code. :wink:

My guess is: Best bet in Julia would be to use something along those lines of bit-fiddling, & using intrinsics with as little overhead as possible, from julia, for max. performance. But then, the question really also is: What’s even the point, as it is just competing for which language is more suited to doing assembler…

I think, often, people forget, there is only so much, any given cpu can do, within a given time, with a fixed instruction-set, memory-bandwidth, etc. And any high-level-language can shine, when they are able to save the cpu some work, through clever algorithms. BUT, also the problem has to be complex enough, for that, to even provide these opportunities of “work-pruning”. Once, optimisation of algorithms and heuristics is exhausted, it’s “just” going to be a matter of how to tell the cpu, what to do, with as little overhead, as possible.

7 Likes

Your observations seem correct to me. You ask what the point is … and I’d say pride : being beaten by C++ okay, but crushed by Java and Python… ouch. I guess this is why pride is a sin.

4 Likes

Your observations seem correct to me. You ask what the point is … and I’d say pride : being beaten by C++ okay, but crushed by Java and Python… ouch. I guess this is why pride is a sin.

Haha, I totally get that. My 2 cents would be: Efforts in that direction are probably invested more wisely, by tackling one of the high-profile benchmarks, even when they’re evaluated with flawed metrics, often, currently, to: Gain more traction and then question (some of the) established metrics. This (nonetheless fun) challenge will likely be forgotten in 2…3 weeks.

3 Likes

Startup-time is avoidable, see e.g. this benchmark (also its intriguing accuracy column):

@brenhinkeller I believe it uses threading, so matching (and beating?) Parker’s challenge should then be possible. Can someone also look into making that Julia code a tiny bit faster, to close the 0.46% speed-gap (less accurate if needed?), to claim top spot there?

EDIT: It doesn’t use threading (can’t?), I misremembered, it used @simd. The tricks used in the best Parker’s challenge code use threading (and interesting tricks, and improved algorithms) so very hard to beat. How difficult and far off might it be to support threading (Julia’s or polyster-threads) in StaticCompiler.jl?

This must be a new(er) implementation, that hadn’t been reported, before. Can’t imagine, it doesn’t use threading, already, with that result and being almost on par, with the best C(++) contenders. :sweat_smile:
Can’t seem to find a link, anywhere - do we have a repo-link, for the julia-code?

lol, just realized, this is not regarding the matt-parker-challenge.

This change alone would also help: