Julia's position in the Debian Benchmarks Game can be improved, and the categorization of some Julia programs there is unfair

EDIT: I was looking at the graph here, thinking Julia might be missing: Measured : Which programming language is fastest? (Benchmarks Game). It's actually still in the other graph, and I still argue the categorization (for "naked ffi") is unfair, seemingly at least vs. Rust's code. I also still think we can improve Julia's numbers; after realizing my mistake, it seems Julia's ranking isn't changed, though it could be if we improve the numbers. I edited the title.

Whatever the cause, the result is a misleading and unfair showing of Julia:

It seems Julia takes 6.1× longer than the fastest (Rust, there), not competing with Go and F# at 1.2× or Java at 1.6× (to name some GC languages), but that ignores the category:

  • possible hand-written vector instructions | “unsafe”

where Julia #3 is only 1.3× slower (still slower than Go…). But C goes down to 0.3×, so Julia is still 4.36× slower.

Is it "unsafe"? I mean yes, it sure is, but no more or less so than e.g. C/C++/Rust, because of:

Base.@propagate_inbounds @inline

But I see @inline is considered ok for the former category…, so the categorization seems to hinge only on @propagate_inbounds (is that at all [more] problematic?). Is there a good way to write fast code without either?
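(For what it's worth, a minimal sketch of one alternative, assuming the hot loops allow it: iterating with eachindex lets the compiler prove the index is in bounds and elide the checks without any annotation, though it doesn't cover every access pattern the benchmark programs use.)

# Bounds checks elided without @inbounds: eachindex proves i is valid
# for xs, so the compiler can drop the checks and still vectorize.
function scale!(xs::Vector{Float64}, a::Float64)
    for i in eachindex(xs)
        xs[i] *= a
    end
    return xs
end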

But Julia isn't using "hand-written vector instructions" like C, so can we do two things:

a) Get Julia programs there that are hand-optimized with all we've got?
b) Also improve the default category. I think, but I'm not sure, that it controls what shows in the graph we dropped off of.

Here Julia is 1.5×, but it is only listed in the non-default category with "naked ffi":

It’s likely because of:

using Base.GMP.MPZ: add_ui!, mul_ui!, add!, tdiv_q!
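These are Julia's own internal wrappers around GMP's in-place mpz_* functions. As a minimal sketch of what they buy (hypothetical usage, not from the submitted program; MPZ is internal and unexported, so the convenience signatures assumed here may shift between Julia versions):

using Base.GMP.MPZ: add!, mul_ui!

# In-place BigInt arithmetic: no fresh BigInt is allocated per step,
# which is the whole point of calling the mpz_* wrappers directly.
acc = big(0)
t   = big(1)
for _ in 1:10
    mul_ui!(t, UInt(3))   # t = t * 3, mutating t
    add!(acc, t)          # acc = acc + t, mutating acc
end
# acc == sum(big(3)^k for k in 1:10) == 88572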

I'm pretty sure Julia had programs in the default category before (for all the programs) and was just re-categorized, but it seems very unfair given that the fastest programs, in C++ and Rust, use the same code:

https://crates.io/crates/rug

The approved C++ one uses #include <gmpxx.h>, under:

  • possible hand-written vector instructions | “unsafe” | naked ffi

If anyone can submit a fully "safe" version without "naked ffi", just in case that's what's stopping the listing, that would be great!

I don't recall the numbers for these or whether they regressed in 1.11.1 (I don't think so, at least not by much, but they can at least be made better; Julia's error bar was always larger than wanted, I think because of those outliers):

7.3× (vs 2.7× for Java…, 2.2× for Haskell):

3.5×:

1.7×:

1.5× (why is F# doing better while also rounding to 1.5×? Seemingly because of threading, which Julia doesn't use in that category):
fasta - Which programs are fastest? (Benchmarks Game) (note the "unsafe" Julia version there at 2.4× is worse despite using threading; threading just seems to make it worse for that category)

3 Likes

The most immediate difference I see in the top 4 is that they are parallel programs using the 4 cores. This is what the C version documents about the blocks number:

// This value controls how many blocks the workload is broken up into (as long
// as the value is less than or equal to the factorial of the argument to this
// program) in order to allow the blocks to be processed in parallel if
// possible. PREFERRED_NUMBER_OF_BLOCKS_TO_USE should be some number which
// divides evenly into all factorials larger than it. It should also be around
// 2-8 times the amount of threads you want to use in order to create enough
// blocks to more evenly distribute the workload amongst the threads.

Go at #5 is structured differently, so I'm not sure whether it breaks the program up into more blocks or not; it could be very different because of goroutines.

The only Julia version there is serial, corroborated by the sum of the CPU loads. The fastest serial program is C#'s, so that's the fairest thing the serial Julia program could try to live up to.
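For illustration, a minimal sketch (my own, not benchmark code) of the blocking idea the C comment describes, in Julia; count_for_block is a hypothetical per-block kernel:

using Base.Threads

# Split the workload into more blocks than threads (the C comment
# suggests 2-8x) so the scheduler can even out uneven block costs.
function run_blocks(work, nblocks)
    tasks = [Threads.@spawn work(b, nblocks) for b in 1:nblocks]
    return sum(fetch, tasks)
end

# e.g. run_blocks(count_for_block, 4 * nthreads())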

I'm guessing the boundscheck removal is not about safety but about allowing autovectorization, to compare with other languages' manual vectorization; I suppose no competitor knew their way around SIMD.jl. However, there's no bounds-checked benchmark to eyeball a possible effect from autovectorization, nor is it documented to have occurred or been intended. The unsafety is probably the unsafe_copyto! being used.
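To illustrate the distinction, a minimal sketch (not from any submitted program): eliding bounds checks is typically what unlocks autovectorization in a plain loop, no hand-written vector instructions needed.

# With checks elided, LLVM can autovectorize this loop; @simd also
# allows reassociating the floating-point reduction across lanes.
function dot_fast(a::Vector{Float64}, b::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    return s
end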

Another thing worth mentioning is that all the Julia scripts were based on those in other languages; this isn't considered unrepresentative of the language, because many top entries are documented to be based on chains of candidates across several languages. Instead, the issue is inheriting less optimal algorithms. The serial Julia program is documented to be based on a JavaScript program that isn't even on the list, and evidently that hasn't been rectified yet. Similarly, old code is all over that list; the top Java entry was written back in 2010. It does not surprise me that older code and fewer updates fall behind, and I wouldn't call it unfair unless there's a pattern of newer idiomatic code being rejected.

2 Likes

Incidentally, this got me to notice that unsafe_copyto! errors at the Memory level in v1.11(.1):

julia> unsafe_copyto!(collect(1:5), 2, collect(11:15), 1, 5)
ERROR: BoundsError: attempt to access MemoryRef{Int64} at index [5]

whereas you need the safe version to get the error in v1.10(.6):

julia> unsafe_copyto!(collect(1:5), 2, collect(11:15), 1, 5)
5-element Vector{Int64}:
  1
 11
 12
 13
 14

julia> copyto!(collect(1:5), 2, collect(11:15), 1, 5)
ERROR: BoundsError: attempt to access 5-element Vector{Int64} at index [2:6]

I’m on Windows, but assuming the same thing happens on Linux, it appears that:

  1. unsafe_copyto! now contradicts its documented purpose of skipping bounds checks for Arrays; hopefully there's an issue out there about this
  2. those benchmarks didn’t need it anyway
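For reference, the bounds-respecting call those benchmarks could use instead (assuming a destination long enough for the copied range):

julia> copyto!(collect(1:6), 2, collect(11:15), 1, 5)
6-element Vector{Int64}:
  1
 11
 12
 13
 14
 15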
2 Likes

It really feels like they don't want Julia to excel at the benchmarks, or simply don't care.
I invested quite some time years ago to match C performance, which wasn't that hard by just translating the C versions, but those versions never made it into the game for various reasons, all of which felt unfair, considering they were 1:1 translations of the C versions that take first place, seemingly without the same penalties being applied.

At this point, it could make more sense to fork the project than to keep fighting with them and investing time into this. At least a few years ago, Julia was matching all the fastest solutions, but even though people worked a lot on getting better versions into the benchmark, Julia is still trailing behind by a lot.

7 Likes

By the way, incidentally related to this: I recently played around with making some of these benchmarks run with Julia's new AOT compiler mode, and the results are quite promising: GitHub - MasonProtter/juliac-bench

My plan was to wait for 1.12 to release and then open a PR to submit these. If accepted, Julia would be topping the charts for a lot of these benchmarks (which makes me assume it won’t be accepted, but oh well)

19 Likes

Julia dropped of Debian Benchmark Game (graph), unfairly

Please show your claim is true.

Here are the charts.

Please show that your claim is not “misleading and unfair”.

1 Like

Is their argument not in the body of the post?

Look at the charts page. Do you see Julia in those charts or not?

1 Like

Hi @igouy, glad to have you here! I believe the concern here isn’t that Julia was dropped, but rather that the way in which solutions are being accepted/categorized has led to an effective decrease in the ranking — meaning that Julia “drops” off some “top” lists.

4 Likes

If that is the case, perhaps someone needs to edit the post, so that it isn’t “misleading and unfair”.

decrease in the ranking

The post does not seem to provide before/after changes in ranking?

1 Like

I don’t know, I’ve never contributed to the BenchmarkGames or looked too in-depth at the rules and solutions. I’m just a friendly moderator, trying to help mediate the discussion a bit here. I don’t think it’s necessarily a temporal thing, just that it’s tricky to know how the rules are applied.

It’s also just tricky to know how Julia fits in the first place; this has been discussed before. Hopefully a real juliac workflow will help clear things up in the future.

9 Likes

I'm actually uncertain whether OP is even talking about those overall summary charts in particular, because only individual benchmark premises are cited, and the title as written doesn't suggest a claim that Julia isn't on those charts. It would if "of" were a typo for "off", but it could be a typo for "on", or not a typo at all (though the meaning is less clear in that case). Either way, good evidence for a drop would also require a record of the site's past results, and I don't immediately see a page where that's available. OP is forthcoming about relying on their memory instead, which the rest of us can only take, and do take, with a grain of salt. The clearer and more interesting point here is that Julia's contributions have room for improvement.

1 Like

If that is the case, perhaps someone needs to edit the post, so that it isn’t “misleading and unfair”.

All we can do as commenters is provide adequate grounds for our disagreements and suggest specific edits to improve the conversation; it's ultimately up to people to edit their own words. I also notice that your username matches that of this benchmarks game's maintainer (not sure if that's the right title), who understandably has a vested interest in the benchmarks game being represented fairly, and who has very welcome expertise for providing data, e.g. past graphs, that most people don't have on hand.

So, it's worth pointing out that this is a casual forum for discussing opinions, not a rigorous peer-reviewed publication. Nobody is taking initial posts as gospel, and disagreements are routine. I'll highlight that my first comment also pushed back on the "unfair" description, because it seemed fair for a serial reimplementation with no published updates in one benchmark to fall behind parallel programs lightly tuned to the input, and OP was evidently receptive to that feedback. There's also no rule on this forum requiring every acknowledged error to be edited out, let alone making that anybody's responsibility besides OP's, because the priority is for the conversation to make sense when read. For what it's worth, I think an edit along the lines of what you're saying could keep the conversation legible, because the main point was about improving participation; it'll just be up to a more detailed discussion with, and agreement from, OP.

8 Likes

In what capacity are you involved with the “Benchmarks Game”?

As pointed out, the title is not grammatically correct; it seems to have a typo, or something else. I think the most parsimonious correction someone can make is to add an "f" to get "off". Yeah, I know it's not the only one, but it's the best. I certainly instantly read it to mean that Julia had been removed entirely from the benchmarks. I don't recall even registering any error in the title.

So there is a title with some kind of language error that will be widely interpreted as saying something that is completely and manifestly untrue. And the untrue statement is potentially damaging to someone’s reputation. I am not surprised that someone objects strongly.

All of this is completely independent of whether Julia is treated fairly in the game.

1 Like

I changed the title to

Julia’s position changed in Debian Benchmark Game (graph), unfairly?

which is a bit less ambiguous and more open to discussion

5 Likes

I really meant "dropped off", but I was simply mistaken in my untrue claim of it dropping off, or changing rank, there (I've edited my original post and title).

I didn't mean Julia was intentionally dropped off the list; I thought the unfair categorization of code, if it was a recent change, might be the (automated) reason Julia did not show in a graph, which was simply the wrong graph to be looking at.

The rules for Julia are the same as for other GC languages, and the rules within that category of languages aren't unfair, but across all languages they are (IMHO; though the unfair rule seems not to be applied to Haskell):

The rule for one of the programs (Julia's outlier) is:

When possible, use default GC; … As a practical matter, the myriad ways to custom allocate memory will not be accepted.

This is unfair to GC languages like Julia (I could also nitpick some other things). I would like to see someone's fastest Julia implementation with pools, as is possible, allowed, and used for C etc. (see the sketch after the Haskell links below). [I would also like to see a faster implementation that conforms to all the rules, including the unfair one.] It actually seems that Haskell, a GC language, uses some such technique, breaking that unfair rule:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-ghc-8.html

https://downloads.haskell.org/ghc/latest/docs/libraries/ghc-compact-0.1.0.0-1216/GHC-Compact.html

This module provides a data structure, called a Compact, for holding immutable, fully evaluated data in a consecutive block of memory. Compact regions are good for two things:

  1. Data in a compact region is not traversed during GC;
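To make concrete what "pools" would mean here, a hedged sketch (my own, not a submitted program): the nodes of the binary-trees tree live in two flat vectors and indices replace pointers, so the GC sees only two allocations. This is exactly the kind of "memory pool" the rule forbids for most languages; it's shown only to illustrate what some accepted entries effectively do.

# A pooled perfect binary tree: index 0 marks "no child".
struct Pool
    left::Vector{Int32}
    right::Vector{Int32}
end
Pool(depth) = (n = 2^(depth + 1) - 1;
               Pool(Vector{Int32}(undef, n), Vector{Int32}(undef, n)))

# Build a perfect tree of the given depth rooted at slot i,
# returning the next free slot.
function build!(p::Pool, i::Int32, depth::Int)
    if depth == 0
        p.left[i] = 0; p.right[i] = 0
        return i + Int32(1)
    end
    p.left[i] = i + Int32(1)
    next = build!(p, p.left[i], depth - 1)
    p.right[i] = next
    return build!(p, next, depth - 1)
end

# The benchmark's check: count the nodes by walking the tree.
check(p::Pool, i::Int32) =
    i == 0 ? 0 : 1 + check(p, p.left[i]) + check(p, p.right[i])

# p = Pool(10); build!(p, Int32(1), 10); check(p, Int32(1)) == 2^11 - 1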
5 Likes

I've run across old threads discussing this, and I don't think it's fruitful to question the rules, and I'm saying that as someone who thinks that a GC benchmark should entirely disallow non-GC implementations and that a separate non-GC benchmark should exist for comparisons within a language. This is, after all, a game, not a perfectly objective and practical comparison of performance, so if you don't like the rules or disagree with the enforcers, then either put up with the caveats or play a different game. There are in fact different benchmarks games enforced by different people, including reimplementations of this one.

On the other hand, I think it’s fair to question a specific implementation skirting the rules, in this case using a library memory pool with a documented intention to evade the default GC.

6 Likes

This specific rule is selectively applied. For example, the rule also says:

As a practical matter, the myriad ways to custom allocate memory will not be accepted.
Please don’t implement your own custom “arena” or “memory pool” or “free list” - they will not be accepted.

And yet the top Rust result (the #2 solution overall) uses a non-default bump allocator.
So it's not just about weird rules; they're selectively applied.

10 Likes