My last (faster) algorithm completely changes how the method works, so I don't know whether it will satisfy the constraints as is.
I'm also unsure whether my method is faster in their environment.
I think other languages' implementations should pick up your sorting algorithms soon
If someone gives me a mapping from language names to colors, I will happily use it
You need `tokei` installed.
Or you can use Tokei_jll.
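Something like this should work, assuming the usual JLL convention where `tokei()` returns a runnable `Cmd`, and assuming tokei's JSON schema is as I remember it (keyed by language name, with a `Total` summary entry):

using Tokei_jll, JSON3

# Run tokei over a source tree and read its per-language JSON report.
stats = JSON3.read(read(`$(tokei()) --output json .`, String))

# Each language entry carries `code`, `comments`, and `blanks` line counts.
for (lang, s) in pairs(stats)
    lang == :Total && continue
    println(lang, ": ", s.code, " lines of code")
end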
now I really want to also shrink code size …
diff --git a/julia/related.jl b/julia/related.jl
index e21c8a2..b79b866 100644
--- a/julia/related.jl
+++ b/julia/related.jl
@@ -1,9 +1,4 @@
-using JSON3
-using StructTypes
-using Dates
-using StaticArrays
-
-# warmup is done by hyperfine
+using JSON3, Dates, StaticArrays
function relatedIO()
json_string = read("../posts.json", String)
@@ -33,12 +28,9 @@ struct RelatedPost
related::SVector{5,PostData}
end
-StructTypes.StructType(::Type{PostData}) = StructTypes.Struct()
-
-function fastmaxindex!(xs::Vector, topn, maxn, maxv)
- maxn .= 1
- maxv .= 0
- top = maxv[1]
+function fastmaxindex!(xs, topn, maxn::AbstractVector{T}, maxv) where T
+ maxn .= one(T)
+ maxv .= top = zero(T)
for (i, x) in enumerate(xs)
if x > top
maxv[1] = x
@@ -59,13 +51,7 @@ function fastmaxindex!(xs::Vector, topn, maxn, maxv)
end
function related(posts)
- for T in (UInt8, UInt16, UInt32, UInt64)
- if length(posts) < typemax(T)
- return related(T, posts)
- end
- end
-end
-function related(::Type{T}, posts) where {T}
+ T = UInt32
topn = 5
# key is every possible "tag" used in all posts
# value is indicies of all "post"s that used this tag
Code size is typically better measured by compressed file size than by the number of lines of code.
If you have a way to extract that information, that would be great.
There are folders with configuration files for Swift and Java, so you need a way to extract only the source code files, exclude comments, and compress them.
Just gzip the source file. Comment filtering would be a bit annoying though, yeah.
I can't, because I don't know what counts as a source file (i.e. I'm not going to code that up manually). I rely on Tokei to count only source files and only lines with code.
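For reference, a rough sketch of how the two could be combined: tokei identifies the source files, gzip does the compression. I'm assuming here that `--files` puts per-file reports into tokei's JSON output, and comment filtering is still omitted, as discussed above:

using Tokei_jll, JSON3, CodecZlib

function gzipped_source_size(dir)
    stats = JSON3.read(read(`$(tokei()) --output json --files $dir`, String))
    total = 0
    for (lang, s) in pairs(stats)
        lang == :Total && continue
        for report in s.reports
            src = read(report.name)  # raw bytes of one source file
            total += length(transcode(GzipCompressor, src))
        end
    end
    return total
end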
The size of the source is kind of a murky metric for a piece of code, compressed or uncompressed. It is tempting to have a 2-axis chart, but either a regular 1-axis bar chart is enough or a different second axis can be chosen. For example:
Code size in that benchmark is measured like this:
How source code size is measured
We start with the source-code markup you can see, remove comments, remove duplicate whitespace characters, and then apply minimum GZip compression. The measurement is the size in bytes of that GZip compressed source-code file.
Thanks to Brian Hurt for the idea of using size of compressed source code instead of lines of code.
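Translated into Julia, that measurement would look roughly like this. I'm assuming "minimum GZip compression" means compression level 1, and the comment stripping here is a naive regex for Julia comments only; a real implementation would need per-language handling:

using CodecZlib, TranscodingStreams

function benchmarksgame_size(path)
    src = read(path, String)
    src = replace(src, r"#=.*?=#"s => "")  # strip block comments (naive)
    src = replace(src, r"#.*" => "")       # strip line comments (naive; trips on '#' inside strings)
    src = replace(src, r"\s+" => " ")      # remove duplicate whitespace characters
    codec = GzipCompressor(level=1)        # "minimum GZip compression"
    TranscodingStreams.initialize(codec)
    n = length(transcode(codec, codeunits(src)))
    TranscodingStreams.finalize(codec)
    return n                               # size in bytes of the compressed source
end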
Size of compressed code isn't that great a metric if you're concerned about human readability and writeability. If you've got a lot of boilerplate that you have to repeat often, it's easy to compress this away, but you can't not read it or not write it… it's annoying crap in the code file.
So from a human-readability perspective you'd also want something like the compression ratio. A language with a very high compression ratio is basically a language that sucks to write or read.
I mean, nobody likes reading repeated boilerplate, but also compressing it seems pretty fair. It doesn't completely ignore the boilerplate, but it reduces its cost in the code-length metric, which is also how I tend to interact with boilerplate when I'm reading it.
I glance at it, skim it, and if I see something repeated a bunch, I look at the repeated code less and less each time it’s repeated.
As a concrete example, consider this part of the diff @jling posted:
-using JSON3
-using StructTypes
-using Dates
-using StaticArrays
+using JSON3, Dates, StaticArrays
When the word `using` is repeated a bunch of times in a very predictable way like this, it's very easy for my eye to just completely skip it and only look at the words to the right of the `using` on each line. I don't need to figure out what is going on each time; I immediately see that all of those are `using` statements.
In fact, if there were a few more `using`s in there, I might actually find the verbose version more pleasant to read than the one-liner version.
Yes, but consider this example from the Wikipedia article on boilerplate code, and then multiply by 25 for some code with 25 different subclasses of Pet.
public class Pet {
    private String name;
    private Person owner;

    public Pet(String name, Person owner) {
        this.name = name;
        this.owner = owner;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Person getOwner() {
        return owner;
    }

    public void setOwner(Person owner) {
        this.owner = owner;
    }
}
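A quick sanity check of that ratio in Julia, if you want to try it: identical repetition is an upper bound on what 25 merely similar subclasses would achieve, and `Pet.java` is a hypothetical path for the class above.

using CodecZlib

pet = read("Pet.java", String)  # hypothetical file containing the class above
raw = repeat(pet, 25)           # stand-in for 25 near-identical subclasses
compressed = transcode(GzipCompressor, codeunits(raw))
println(length(codeunits(raw)) / length(compressed))  # compression ratio of the boilerplate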
Okay, but is the experience of reading that really so significantly different from the experience of reading this hypothetical syntax?
public class Pet {
    private String name
    private Person owner
    Pet(String name, Person owner) = {this.name = name; this.owner = owner}
    String getName() = return name
    void setName(String name) = this.name = name
    Person getOwner() = return owner
    void setOwner(Person owner) = this.owner = owner
}
I’m not really convinced it’s such a big difference.
I am not a Java programmer, but my impression was that Java allowed inheriting interface but not implementation. So if you have 25 pets, say Dog, Cat, Bird, Fish, etc., then you have this boilerplate 25 times even though it does absolutely nothing different each time. Then when you gzip it, it will compress at about 50 to 1, which means that when you're reading through this code, only 2% of it is even worth looking at, in some sense.
So it seems to me that if you want a "human complexity" metric, it should take into account that when the compression ratio is very high, reading or writing that code really sucks.
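Purely as an illustration of what I mean (the weighting here is invented, not a proposal I'm attached to), something that folds the ratio back in:

using CodecZlib

compressed_size(src) = length(transcode(GzipCompressor, codeunits(src)))
compression_ratio(src) = length(codeunits(src)) / compressed_size(src)

# Penalize code that only looks small because it compresses well: the
# geometric mean of raw and compressed size grows with the ratio for a
# given compressed size.
human_cost(src) = sqrt(length(codeunits(src)) * compressed_size(src))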
EDIT: I looked this up, and apparently maybe I’m wrong, Java does allow some implementation inheritance, but I do think there are languages that don’t and you wind up with lots of heavily repeated boilerplate.
I think non-compressed metrics reward golfing too much (e.g. get everything onto one line if the metric is per line, or use all 1-character variable names if it is by character). Compressing devalues these approaches, which I think is appropriate.
(Also, the data compression limit is the Shannon entropy, so there's a nice information-theoretic flavor to compression-based metrics.)
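A toy version of that connection, if anyone wants to play with it: the zeroth-order byte entropy is the limit for an i.i.d. byte model, and gzip beats that estimate on code precisely by exploiting repeated substrings like boilerplate. The file path below is just an example.

using CodecZlib

# Zeroth-order entropy of a byte vector, in bits per byte.
function byte_entropy(data::AbstractVector{UInt8})
    counts = zeros(Int, 256)
    for b in data
        counts[b + 1] += 1
    end
    n = length(data)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)
end

data = codeunits(read("related.jl", String))   # example source file
@show byte_entropy(data) * length(data) / 8    # i.i.d.-model size in bytes
@show length(transcode(GzipCompressor, data))  # actual gzip size in bytes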
To be clear, what I'm actually saying is not anything directly against compression as a metric; I like using compression. It's more that it's insufficient by itself: the compression ratio is an additional important dimensionless quantity in the question.