Run the half-life of code analysis on Julia?


#1

Not quite a language development post (so maybe it will get recategorized), but it would be interesting to see the results of this “half-life of code” analysis on Julia’s git repo:

I thought the results would be interesting, but realistically I won’t get around to it so I figured I’d throw it out here and see if anyone wants to take a crack at it.


#2

Here it goes,



#3

It would be great to get some insider explanations on the upper graph.

A few outsider observations:

  • code of 2011 has almost all perished whereas that of 2010 has not.
  • what’s that huge spike in 2012?
  • LOCs was around 50,000 in 2013 and 2014, then doubled in 2015.
  • Feb 2012: Julia announced, afterwards LOCs increase sharply.
  • Feb 2013: 0.1 released: this is around the end of those big drops in LOCs after the huge spike
  • Aug 2014: 0.3 released, before quite a big drop.
  • Now: the imminent merge of #18588 would produce a visible dent of -20,000 LOCs.

#4

These are all guesses, but here are some plausible answers:

  • code of 2011 has almost all perished whereas that of 2010 has not.

In 2010 the code was mostly really basic things that you need to bootstrap a language – parsing, core C runtime functionality, how to do arithmetic on integers and floats, etc. – there’s not much reason to change these once they work. By 2011, we were building out an initial cut of a standard library which is more malleable and definitely doesn’t have one “right” answer. So I’m guessing most of the initial stdlib code has been replaced over time. By the same sort of thinking, 2009 was pretty much just “sketching” and prototyping, so almost all of that code was replaced pretty quickly.

The mini spike in 2011 was @viral checking in the source of msun (and a bunch of HTML pages where wget had gotten a 404 or something :grimacing:). That was on August 8th and then after some work getting it into shape and working, he moved the whole thing into a separate repo as OpenLibm on August 13th.

  • what’s that huge spike in 2012?
  • Feb 2012: Julia announced, afterwards LOCs increase sharply.
  • Feb 2013: 0.1 released: this is around the end of those big drops in LOCs after the huge spike

Back then we didn’t have packages, so there was a bunch of additional package-like code checked into the JuliaLang/julia repo under the extras directory:

  • editor support
  • plotting (Winston)
  • bindings to graphics libraries (Cairo)
  • data structures (trie)
  • color support
  • GLPK
  • image formats (DICOM, FITS)
  • ode solvers
  • Rmath (that one took a while to get rid of!)
  • polynomials
  • a web server!
  • linear programming
  • statistical distributions
  • HDFS support
  • memoization
  • zlib support
  • Tk support
  • argument parsing
  • WAV file support
  • ICU library bindings
  • JSON parsing
  • units
  • text wrapping

This was getting pretty out of control, so I wrote a package manager and on Nov 20th deleted a ton of stuff that had been moved into packages. We kept on moving things into packages so that by Julia 0.1 most of this lovely menagerie of stuff was no longer in Base Julia.

Somewhere in that same time period, Mike Nolta moved the manual from the JuliaLang.org website to the julia repo as well. Interestingly enough, this move was from Markdown hosted on GitHub to reStructuredText hosted on Read the Docs, which is the exact inverse of what we just did today.

  • LOCs was around 50,000 in 2013 and 2014, then doubled in 2015.

I suspect a lot of this is increased test coverage and lots more documentation. Overall the growth in LOC looks roughly linear. I’d be interested in seeing this analysis broken down by directory: src, base, doc, etc.

  • Aug 2014: 0.3 released, before quite a big drop.

Not sure what this is. Maybe some other chunk of functionality that we moved out? Anyone remember? I don’t see anything in the release notes that jogs my memory.

  • Now: the imminent merge of #18588 would produce a visible dent of -20,000 LOCs.

That’s the way @jeff.bezanson rolls – his net LOC contribution to the project is negative :grin:. #18588 is the best kind of change: it adds functionality, generalizes things, removes tons of code.

Anyone know what the spike in early 2016 is? Looking through the git log, I don’t see anything obvious.


#5

Now: the imminent merge of #18588 would produce a visible dent of -20,000 LOCs.

That’s the way @jeff.bezanson rolls – his net LOC contribution to the project is negative :grin:. #18588 is the best kind of change: it adds functionality, generalizes things, removes tons of code.

Anyone know what the spike in early 2016 is? Looking through the git log, I don’t see anything obvious.

Actually, #18588 is Michael Hatherly’s Manual conversion.

Presumably you were thinking of #18457, which has a net gain of ~600 lines so far… although you’re right that he has a net negative LOC. :wink:


#6

That’s actually true for the top 3 contributors


#7

Most impressive … in order of appearance there, who takes away more lines of code than they leave?

JeffBezanson, StephanKarpinski, ViralBShah, MichaelHatherly, pao, carnaval, avkis, jcorbin, KDr2, zhmz90
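For anyone who wants to check the "net negative" claim themselves, here is a minimal sketch that sums added minus deleted lines per author from `git log --numstat` output. The function names and the `@` marker placed in front of each author name are my own choices for illustration, not part of any existing tool, and the raw numbers count every file in the repo (docs, vendored code, and so on), so treat the result as a rough indicator only.

```python
import subprocess
from collections import Counter


def net_loc_by_author(log_text):
    """Sum (added - deleted) lines per author.

    Expects the text produced by:
        git log --numstat --pretty=format:@%an
    where each commit starts with a line "@<author name>" followed by
    tab-separated "added<TAB>deleted<TAB>path" lines.
    """
    net = Counter()
    author = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            # Start of a new commit: remember who wrote it.
            author = line[1:].strip()
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue  # blank separator line or anything unexpected
        added, deleted, _path = parts
        if added == "-" or deleted == "-":
            continue  # git prints "-" for binary files
        net[author] += int(added) - int(deleted)
    return dict(net)


def git_log_numstat(repo_path):
    """Run git in repo_path and return the numstat log as text."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--pretty=format:@%an"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Usage would be something like `net_loc_by_author(git_log_numstat("julia"))`, then sorting the dictionary by value to see who sits below zero.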


#8
$> python git-of-theseus/analyze.py julia --outdir "julia_test" --only "test/*"
$> cd julia_test
$> python ../git-of-theseus/stack_plot.py cohorts.json
$> python ../git-of-theseus/survival_plot.py survival.json
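If you want a single "half-life" number out of a survival curve like the one plotted above, one rough way is to assume exponential decay and fit it by least squares on the log of the surviving fraction. The sketch below does only that; it takes plain lists of ages and fractions, which you would have to pull out of the analysis output yourself (I am not assuming anything about the layout of survival.json), and the function name is hypothetical.

```python
import math


def half_life(ages, fractions):
    """Estimate the half-life under the model fraction(t) = exp(-k * t).

    ages      -- code ages, e.g. in years (any consistent unit)
    fractions -- fraction of lines still alive at each age, in (0, 1]

    Fits k by least squares through the origin on log(fraction),
    then returns ln(2) / k in the same unit as `ages`.
    """
    num = 0.0
    den = 0.0
    for t, f in zip(ages, fractions):
        if t <= 0 or f <= 0:
            continue  # log is undefined at f <= 0; t = 0 carries no information
        num += t * math.log(f)
        den += t * t
    if den == 0 or num >= 0:
        raise ValueError("need at least one point with t > 0 and decaying fraction < 1")
    k = -num / den  # decay rate
    return math.log(2) / k
```

Real survival curves are often not cleanly exponential (the 2010 vs 2011 cohorts discussed earlier in the thread are a good example), so this is only a summary statistic under that stated assumption.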


#9
$> python git-of-theseus/analyze.py julia --outdir "julia_doc" --only "doc/*"


#10
$> python git-of-theseus/analyze.py julia --outdir "julia_src" --only "src/*"


#11
$> python git-of-theseus/analyze.py julia --outdir "julia_base" --only "base/*"
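The four runs above differ only in the directory name, so they can be generated in a loop. This is a small sketch that builds the same command lines, using only the `--outdir` and `--only` flags shown in the posts above; the helper name and its default arguments are made up for illustration, and it assumes the same relative paths (a `julia` checkout next to `git-of-theseus`).

```python
import subprocess


def analyze_commands(repo="julia",
                     dirs=("test", "doc", "src", "base"),
                     script="git-of-theseus/analyze.py"):
    """Build one analyze.py command line per top-level directory."""
    return [
        ["python", script, repo, "--outdir", f"{repo}_{d}", "--only", f"{d}/*"]
        for d in dirs
    ]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        print("$>", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_all(analyze_commands())` would then reproduce the per-directory runs in one go; the plotting scripts still need to be run inside each output directory as in the first example.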