Funny Benchmark with Julia (no longer) at the bottom

My last (faster) algorithm completely changes how the method works. As such, I don’t know whether it will satisfy the constraints as is.

I’m also unsure if my method is faster in their environment :frowning:

I think the other language implementations should pick up your sorting algorithm soon

If someone gives me a mapping from language names to colors, I will happily use it

You need tokei installed

Or you can use Tokei_jll
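For example (a minimal sketch; it assumes the JLL exports the usual tokei() command wrapper):

using Tokei_jll   # ships a prebuilt tokei binary, no system install needed

# run the bundled binary on the repo root; any regular tokei CLI flags work
run(`$(tokei()) .`)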

Ah man, that thing is missing Odin… anyway, I have updated the Gist

now I really want to also shrink code size …

diff --git a/julia/related.jl b/julia/related.jl
index e21c8a2..b79b866 100644
--- a/julia/related.jl
+++ b/julia/related.jl
@@ -1,9 +1,4 @@
-using JSON3
-using StructTypes
-using Dates
-using StaticArrays
-
-# warmup is done by hyperfine
+using JSON3, Dates, StaticArrays
 
 function relatedIO()
     json_string = read("../posts.json", String)
@@ -33,12 +28,9 @@ struct RelatedPost
     related::SVector{5,PostData}
 end
 
-StructTypes.StructType(::Type{PostData}) = StructTypes.Struct()
-
-function fastmaxindex!(xs::Vector, topn, maxn, maxv)
-    maxn .= 1
-    maxv .= 0
-    top = maxv[1]
+function fastmaxindex!(xs, topn, maxn::AbstractVector{T}, maxv) where T
+    maxn .= one(T)
+    maxv .= top = zero(T)
     for (i, x) in enumerate(xs)
         if x > top
             maxv[1] = x
@@ -59,13 +51,7 @@ function fastmaxindex!(xs::Vector, topn, maxn, maxv)
 end
 
 function related(posts)
-    for T in (UInt8, UInt16, UInt32, UInt64)
-        if length(posts) < typemax(T)
-            return related(T, posts)
-        end
-    end
-end
-function related(::Type{T}, posts) where {T}
+    T = UInt32
     topn = 5
     # key is every possible "tag" used in all posts
     # value is indicies of all "post"s that used this tag

Code size is typically better measured by compressed file size than by number of lines of code.

If you have a way to extract that information, that would be great.

There are folders with configuration files for Swift and Java.

You’d need a way to extract only the source code files, exclude the comments, and compress them.

Just gzip the source file. Comment filtering would be a bit annoying, though, yeah.

I can’t, because I don’t know which files are source files (i.e. I’m not going to code that up manually). I rely on Tokei to count only source files, and only lines with code.
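For what it’s worth, pulling those per-language counts out programmatically could look like this (a sketch; it assumes tokei’s usual JSON schema, one top-level key per language plus a Total entry):

using JSON3, Tokei_jll

stats = JSON3.read(read(`$(tokei()) --output json .`, String))
for (lang, s) in pairs(stats)
    lang == :Total && continue                      # skip the aggregate entry
    println(lang, ": ", s.code, " lines of code")   # code lines only, no comments/blanks
end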

The size of the source is kind of a murky metric for a piece of code, compressed or uncompressed. It is tempting to have a 2-axis chart, but either a regular 1-axis bar chart is enough, or a different 2nd axis can be chosen. For example:

  • x-axis = time/post
  • y-axis = # of posts

and see how the programs scale with the number of posts (a sketch of collecting that data follows below).
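Something like this could collect that scaling data (a rough sketch; it assumes the related function and the ../posts.json file from the benchmark script above):

using JSON3

posts = JSON3.read(read("../posts.json", String))
for n in (1_000, 2_000, 5_000, length(posts))
    subset = posts[1:n]
    related(subset)                      # warm up / trigger compilation
    t = @elapsed related(subset)
    println(n, " posts: ", 1e6 * t / n, " µs/post")
end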

Code size in that benchmark is measured as follows:

How source code size is measured

We start with the source-code markup you can see, remove comments, remove duplicate whitespace characters, and then apply minimum GZip compression. The measurement is the size in bytes of that GZip compressed source-code file.

Thanks to Brian Hurt for the idea of using size of compressed source code instead of lines of code.
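In Julia, that measurement could be approximated like so (a sketch using CodecZlib; comment stripping is language-specific and omitted here, and the compression level may not match theirs exactly):

using CodecZlib

# collapse runs of whitespace to a single character, then measure the
# size of the gzip-compressed result in bytes
squeeze(src) = replace(src, r"\s+" => " ")
gzsize(path) = length(transcode(GzipCompressor, codeunits(squeeze(read(path, String)))))

gzsize("related.jl")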

Size of compressed code isn’t that great a metric if you’re concerned about human readability and writability. If you’ve got a lot of boilerplate that you have to repeat often, it’s easy to compress it away, but you can’t not read it or not write it… it’s annoying crap in the code file.

So from a human-readability perspective you’d also want something like the compression ratio. A language with a very high compression ratio is basically a language that sucks to write or read.
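That ratio is trivial to compute next to the compressed size (a sketch; the file name is just a placeholder):

using CodecZlib

raw = read("Pet.java")   # hypothetical file
ratio = length(raw) / length(transcode(GzipCompressor, raw))
# a high ratio means lots of redundancy the reader still has to wade through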

I mean, nobody likes reading repeated boilerplate, but compressing it also seems pretty fair. It doesn’t completely ignore the boilerplate, but it reduces its cost to the code-length metric, which is also how I tend to interact with boilerplate when I’m reading it.

I glance at it, skim it, and if I see something repeated a bunch, I look at the repeated code less and less each time it’s repeated.


As a concrete example, consider this part of the diff @jling posted:

-using JSON3
-using StructTypes
-using Dates
-using StaticArrays

+using JSON3, Dates, StaticArrays

When the word using is repeated a bunch of times in a very predictable way like this, it’s very easy for my eye to just completely skip it and only look at the words to the right of the using on each line. I don’t need to figure out what is going on each time; I immediately see that all of those are using statements.

In fact, if there were a few more usings in there, I might actually find the verbose version more pleasant to read than the one-liner version.

Yes, but consider this example from the Wikipedia article on boilerplate code, and then multiply it by 25 for some code with 25 different subclasses of Pet.

public class Pet {
    private String name;
    private Person owner;

    public Pet(String name, Person owner) {
        this.name = name;
        this.owner = owner;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Person getOwner() {
        return owner;
    }

    public void setOwner(Person owner) {
        this.owner = owner;
    }
}

Okay, but is the experience of reading that really so significantly different from the experience of reading this hypothetical syntax?

public class Pet {
    private String name
    private Person owner
    Pet(String name, Person owner) = {this.name = name; this.owner = owner}
    String getName() = return name
    void setName(String name) = this.name = name
    Person getOwner() = return owner
    void setOwner(Person owner) = this.owner = owner
}

I’m not really convinced it’s such a big difference.

I am not a Java programmer, but my impression was that Java allowed inheriting interfaces but not implementations. So if you have 25 pets, say Dog, Cat, Bird, Fish, etc., then you have this boilerplate 25 times even though it does absolutely nothing different each time. Then when you gzip it, it will compress at about 50 to 1, which means that when you’re reading through this code, only 2% of it is even worth looking at, in some sense.

So it seems to me that if you want a “human complexity” metric, it should be something that takes into account that when the compression ratio is very high, reading or writing that code really sucks.

EDIT: I looked this up, and apparently I’m wrong: Java does allow some implementation inheritance. But I do think there are languages that don’t, and there you wind up with lots of heavily repeated boilerplate.

I think non-compressed metrics reward golfing too much (e.g. getting everything onto one line if the metric is per line, or using all one-character variable names if it is per character). Compressing devalues these approaches, which I think is appropriate.

(Also, the data-compression limit is the Shannon entropy, so there’s a nice information-theoretic flavor to compression-based metrics.)
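For reference: for a source emitting symbols with probabilities $p(x)$, that limit is

$$H(X) = -\sum_x p(x) \log_2 p(x)$$

bits per symbol on average, which no lossless compressor can beat.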

To be clear, what I’m actually saying is not anything directly against compression as a metric; I like using compression. It’s more that it’s insufficient by itself: the compression ratio is an additional important dimensionless quantity in the question.
