Hey guys, I recently looked at the Julia benchmarks on the Benchmarks Game website. Overall, the Julia results are seriously impressive. I dare say a lot of the Julia examples, besides being blazing fast, are written in comparatively few lines of code. If that isn't a good advertisement for Julia, I don't know what is.
Anyway, I noticed that the results for the binary-trees benchmark lag far behind the others, and I figure it should be possible to fix that. For one, the current program uses `Distributed`, and I was wondering whether `Base.Threads` would be faster.
First of all, for me the existing `Distributed` code runs way faster than the 20 s reported on the site:

```julia
julia> @btime binary_trees(io, 21);
  3.977 s (12586662 allocations: 384.16 MiB)
```
I'm not sure where the discrepancy comes from (I'm using 4 processes, just like on the site). By the way, 3.977 s is only slightly slower than the C implementation.
I started messing around with a single-process threaded version of this and quickly realized I have no clue what I'm doing. The biggest problem in translating it over to threads is that there's no obvious threaded alternative to the `@distributed` map-reduce. The parallel map-reduce from Transducers is incredibly fast on smaller examples, but it wouldn't be allowed in the Benchmarks Game, so I'd have to reverse-engineer what that map-reduce is doing in these cases. In my testing, however, it was a lot slower than using `@distributed`:
```julia
julia> @btime binary_trees($io, $21);
  10.184 s (613767861 allocations: 18.29 GiB)
```
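For reference, the Transducers attempt looked roughly like this (a sketch, not the exact code — `check_trees` here is a placeholder for whatever per-depth work function the program uses):

```julia
using Transducers

# Sketch of the Transducers-based parallel map-reduce I tried:
# `foldxt` does a thread-parallel fold over the collection, and
# `Map` applies the per-depth work function to each element.
# `check_trees`, `min_depth`, and `max_depth` are placeholder names.
total = foldxt(+, Map(d -> check_trees(d)), min_depth:2:max_depth)
```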
but again, I’m already seeing the original example be way faster than the 20 s reported on the site.
So, I have the following questions, if anyone can help:
- Why am I seeing such a big discrepancy with the results on the site?
- Is it likely to be possible to make the program any faster using `Threads`, or is this a misconception? My thinking is that any tiny amount of IPC is an unnecessary overhead.
- Can anyone suggest a simple, succinct threaded map-reduce that gets good results?