Adding file length as constraint to style guide


#1

I think most Julia packages are culprit to long file lengths. And as such, different cohesive blocks of logic are usually put into the same file

Bugs hide in dirty code. And no open-source developer wants to touch an ugly codebase.

From a very quick search in ~20 mouse clicks, I found these 3 examples.

I think avoiding this would help in the long run


#2

That’s an interesting file to pick for OrdinaryDiffEq.jl. While the line lengths are a little long and it should probably be squashed a bit, that file is really concise and easy to understand. If you wanted to pick on me, pick on this file:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/integrators/unrolled_tableaus.jl

Or choose its big brother:

Then again, I don’t know of a good way to store huge tables of numbers other than like this.


#3

I apologize, but I’m not trying to pick on anyone.

If more than one person does something, it’s nobody’s fault.

I just thought somebody should make the point instead of letting it go unchecked forever.


#4

What don’t you like about the OrdinaryDiffEq.jl code you linked? Or for that matter, the ApproxFun.jl code? I would consider those easy to read and pretty standard Julia files. I generally don’t care about the style too much as long as the algorithm is easy to read, so I’d take suggestions/PRs.


#5

I’m not sure what you mean by this.

To be more correct, people are focused on writing packages for the things they care about and will use. And frequently the functionality is incomplete and non-scalable. I agree with the sentiment, but until package developers are on a payroll, they frequently will not build much more than they need.

I assume you see some obviously horrible code here, but I think they were strange choices to make your point…

http://docs.julialang.org/en/release-0.5/manual/style-guide/


#6

As long as you think that is a satisfactory answer, consider this closed.

I just don’t think the style guide has any really good examples of why most pieces of ugly code are ugly.


Also, I don’t think payroll is a valid excuse for registering code that can’t easily be developed on by others.

For the rest of time, people are always just going to be like “XXX.jl already does YYY, so just put your functionality there.”

That sucks. I don’t want to touch a ugly project that somebody only developed to have more public-facing code…


Lastly, when somebody breaks up code with 2+ new line characters, it usually means the file should be split up there. When you do this in every file, that’s bad practice.


#7

EDIT: as much of the OP and the title of the thread has been edited (to the better :slight_smile: ), much of what I say here may not make sense any more.

I have learned a few lessons from my time in the open source community, and I think some of them are relevant here:

  1. Vague critiques are seldom welcomed. The reason is they convey a negative sentiment without giving a clear course of action to alleviate it.
  2. Identifying specific problems, and identifying specific solutions to those problems, are generally welcomed in an open spirit. In this case, it seems that the problem you are identifying (in your bullet point 1 and 2) is the lack of succinct advice in the julia style guide on file length.
  3. Actions count – words don’t. In this case, an approach you might take is to write a recommendation on file length and submit it as a PR on the style guide.

With regards to your bullet point 3, it is a little unclear to me what exactly you would propose as a solution. And I disagree with your critique. I think the focus on writing good packages is very clear in a fairly large group of important packages, especially when compared to other highly decentralized open source projects such as R.


#8

Payroll is a valid excuse for everything in this non-utopian world we live in. :frowning:

Actually this is the opposite complaint. You don’t want to improve existing code, so you make your own out of stubbornness or laziness.


In general, I think there should be non-profit money that is funneled to develop some of the most important packages and maintain robust/clean code, and there should be other sources for the community to register packages that are limited/prototype/untested/etc.

Hopefully the organization and registration issues are fixed by Pkg3: https://github.com/JuliaLang/Juleps/issues?q=is%3Aissue+is%3Aopen+label%3APkg3


#9

I think this is a really interesting problem for mathematical codes. I don’t think that any amount of comments will make this code:

understandable to someone who doesn’t know how PI-stepsize adaptivity works: it’s a transformation of a very in-depth algorithm, and has not been implemented very many places because not many people know it / know how to do it. But just because this code will not be understood by the vast majority of potential developers, does that mean it should be replaced with something similar but easier? No, this is production-quality differential equation solver code, and the benchmarks show that PI-stabilization helps a lot in terms of runtime and stability properties, so it’s staying.

That’s the nature of academic software. Sometimes you have to write a whole paper just to explain and test a small portion of the code. So by that very fact, the number of people who will understand the full code is minuscule (I would bet the answer for “who knows all of the algorithms in JuliaDiffEq” is… just me). And even for those who understand it, it will probably require developer documentation, some academic papers, and some pencil/paper to workout what’s going on, because it’s inherently a math problem.

But then the fact that it’s a mature software ecosystem means that, any algorithm that any undergraduate has ever learned has some optimized implementation somewhere. So the barrier to entry is already going to be much higher than “okay I took an undergraduate class and I’d like to implement ____” since anything at that level is a quick weekend project (and I probably got all of them done already, or Sundials + ODEInterface picks up the slack).

It’s still an issue. I am trying to address it by continuing to keep up with devdocs, being very accessible in my Gitter channel, and writing blog posts along the way to explain the overarching ideas. There’s other methods which address this problem in parallel. One way is the modular architecture which means that someone can contribute to the ecosystem by creating a brand new package, so there’s no reason to force someone to adding to a specific package (though in this case, for it to be useful, it probably needs all kinds of optimizations, callback interfaces, advanced stepsize algorithms, all of which you get for free with the *DiffEq packages). For this reason I am also leaving around ODE.jl as a non-production quality (but decent) ODE package which people can hack away on with a much lower barrier to entry (I would definitely use that for teaching).

But line spacing / breaking up files is the least of my worries. My main worry is writing more of the papers to explain what some of the new algorithms are, and continuing to test/benchmark every detail. That stuff will let me advance to candidacy. Moving code around to look a little (subjectively) prettier won’t.


#10

@ChrisRackauckas May I recommend looking at my @with macro? It would allow you to drop the integrator. that is repeated 1000 times. See for example:

which operates on a beast of a type:


The macro is defined here and should probably have a different name and home… suggestions welcome.


#11

I think it’s very similar to using Parameters.jl’s @pack and @unpack. I think it would be an interesting addition to that set of macros. But is

But yeah, that code definitely has a case of "it’s not broken, it has a ton of tests to make it stays that way, and I have other stuff to work on, so I’m not going to touch it until I both have the time to implement the beautification, but also properly test it out to make sure that this very core piece of code doesn’t break. However, I instead have to spend my time wowing academics and implementing novel algorithms, otherwise I’ll be out of a job. It’s a sad reality, and yes it does always go back to the fact that open source software is woefully underfunded (if at all…).


#12

Yikes, is a file of 57 lines (OrdinaryDiff) considered too long? I wonder why…


#13

A file length constraint has nothing to say about the a hard maximum number of lines.

It says that those 56 lines:

  • could be split into multiple files very easily
  • their separation could improve the code’s clarity (if done well)

I literally found these files by:

  • clicking the 3 most recently updated (registered) packages
  • finding the first file with more than a one-page scroll of text (only clicked like 3/pkg)

#14

I did the whole MATLAB thing already. I’m not going back.


#15

:laughing:

Edit: for those who haven’t programmed in Matlab, it imposes a one function per file constraint (I think this got relaxed somewhat recently) leading to very many files.


#16

if a programmer feels the need to separate logical units into smaller files to increase readability, you don’t need suggested limit. if a programmer does not feel such need, numerical limits won’t help. also, there are good arguments for and against chopping up the code base. if you have good tools to navigate large files, what is your argument against it? at the end, we are dealing with types and functions. let us have some freedom where to put them. (i always hated languages that impose a file structure on you.)


#17

:laughing:


#18

It appears to me that a big reason for disliking long files is the difficulty of navigating through them. This is an issue of imperfect IDEs and not of files lengths. From a user experience perspective, an ideal IDE or editor lets you navigate between functions easily, no matter whether they are in the same file or in another file. Functions are the natural logical and navigational units. Files are irrelevant.

UX-wise, and maybe for the style guide, one should therefore think about function length rather than file length. But, as several have pointed out, there are constraints on how short one can make a function that implements a complicated algorithm.


#19

Even if you managed to get this highly contentious restriction into the style guide, note that the whole guide is a suggestion, not anything binding. So people could still write code as they please. Which, when you think about it, is great.

There is no Style Guide Police. In the open source world, the only person whose coding style you can constrain is yourself. You can try submitting PRs to the abovementioned repos, but aesthetic reorganizations or mere reshuffling of code into separate files might not be a priority. Which is also great, when you realize that developer time is a scarce resource.


#20

How fine a granularity would you suggest? One file per function, or type declaration, or something else?

I looked at the OrdinaryDiffEq example and did not immediately see how it could be split, unless you go with the Matlab approach. Also, it’s already a really short file.