As Julia develops and moves toward the 1.0 milestone, I wanted to get some opinions about whether it might be worthwhile to rewrite various packages. Many widely used packages date from the early days (e.g., DataFrames 2013-02-14, GLM 2013-07-14). Legacy code is pervasive in Base and in many packages, as is natural in open source projects and in code generally. I myself will be working this summer to rewrite and update a NumFOCUS Julia project once 0.7 is up and running.
One main reason for sub-optimal code is legacy kept for backward compatibility, but since the only officially supported version is Julia 0.6, it makes sense to drop those constraints in the transition to 0.7/1.0. One potential benefit is taking full advantage of advances in the language that were not available five years ago, or even a few months ago. Moreover, it would also allow a restructuring of ecosystems (supported in part by the new package manager). Many packages have been left to gather virtual dust for some time, and retiring them in favor of new ones would be a gain. One example is JSON → LazyJSON or JSON2.
Not necessarily. While bit-rot is a thing, code doesn't just go bad. Old code has the advantage that it has been battle-tested for a long time and its bugs have been shaken out. Rewriting for the sake of rewriting is unlikely to be time well spent, imo. If you feel you can make, e.g., a new DataFrames that is better than the current one, then you should write that package, and people will likely use it. But trying to "rally the masses" to abandon (and rewrite) a bunch of well-used and well-tested packages might not be time well spent.
I am not following the development of these packages closely, but I see activity from 8 days ago in JSON.jl. In light of a recent discussion, I think we should be more cautious about pronouncing packages (semi-)abandoned or "gathering dust".
As for the main question, I think it is important to distinguish between:
making sure that packages work smoothly with v0.7 (no deprecation warnings etc), which can be done now if one has the time, but can wait until it is released (especially if one wants to use magic like FemtoCleaner.jl),
making use of new language features, which should probably just happen automatically when someone touches a particular piece of code,
API redesign, some of which will be possible because of new features in v0.7 (I have big plans for NamedTuples, for example), but some of which is orthogonal to language features and will just happen naturally in due time,
complete rewrites of the same thing from scratch, which is rarely a productive activity.
Those were really good, informative reads. Using that framework, it seems a rewrite might only be worthwhile in very specific cases (e.g., a native Julia RMath, though that is not too likely?). However, I do think there are some instances in which it is worthwhile. One example is breaking a batteries-included package into smaller, more compact packages (e.g., DataFrames → StatsModels, similar to how stdlib was structured). For instance, I think the IRLS routines in GLM could be moved to a separate package that can best develop and maintain them (e.g., dense, sparse, mixed, and distributed linear predictors). When GLM was developed, DataFrames was the only tabular data package, and those routines didn't support distributed parallel processing (this is coming from a guy who only uses DataFrames for tabular data in Julia). I don't want to advocate a rewriting frenzy, only to analyze whether there are cases where a rewrite would be worthwhile and productive.
For example, JSON2 / LazyJSON is a "rewrite" happening in a different repository, but in spirit it is a development branch that may eventually replace JSON. When I talk about retiring packages that are gathering dust, I don't mean veterans (old by no means implies bad, and in most cases it is the opposite), but literally packages in active organizations that have been broken since Julia 0.4 (e.g., JuliaStats/RegERMs).
Cleanly abandoning open source projects is always a difficult problem: you want to keep the source available, yet clearly signal that it is not maintained.
GitHub has an archive feature which may be useful for this. Perhaps you could open an issue about
appending an explanation to the README,
de-registering,
then archiving the package.
It would be great if there were some guidelines on when and how to do these things.
Most definitely! I’m in whole-hearted agreement with this.
A lot of old packages (and even code in Base) would benefit from a “how could we have done this better in v0.7/v1.0” review.
Doing this sooner rather than later would also allow breaking changes to APIs, and for those you really need time for experimentation, testing (including performance testing!), and user feedback.
While I would agree with that assessment when something was written in a stable language, that's not at all the case with Julia. Code developed for v0.3, v0.4, or even v0.5 was essentially developed for a quite different language. Much code that still seems to work in v0.7 could cause serious problems (for example, not using the new GC.@preserve macro in places where pointers are used outside of a ccall).
I dealt with something similar in a previous life, after we made a major revamp of the M/Mumps language that added programming structures and object-oriented programming. In that case, it was pretty much always well worth it to rewrite old library routines to take advantage of the new syntax and capabilities (in some cases, rewriting was required to close serious security holes).
The project I'll be rewriting and updating is QuantEcon.jl, which was first developed under Julia 0.3-DEV.
@StefanKarpinski, is there a guideline for retiring packages under Pkg3? Could Pkg3 detect that the GitHub repo is archived, parse a repostatus.org shield, or read a special value in the project file?