I wouldn’t give too much thought to such analyses; they’re very specific to the languages used:
At this point it’s worth noting that both datasets are in Java, which is the promised language of one-liner methods.
On top of that, 2 datasets? That’s a very small amount, probably not very representative.
As such, software developers should be wary of breaking their code into too small pieces, and actively avoid introducing very short (1-3 lines) functions when given the choice. At the very least unnecessary single-line functions (ie. excluding getters, setters etc.) should be all but banned.
I don’t get this. If what you want to write is logically a one-liner, or something very short, like defining a polynomial or some cost function, what should you do? Pad the function with garbage to achieve sufficient length? Combine unrelated functionality in a single function?
The article, or at least the conclusions, seemed completely absurd to me. Functions should be logical units.
John Ousterhout suggests in the great book “Philosophy of Software Design” to use “deep modules”, i.e. functions (or classes, modules, etc.) where the functionality is large compared to its interface.
The functionality is here not directly related to the lines of code - even a very short function could provide sufficient “functionality”.
Furthermore, I think that long functions should be split up into (usually private) sub-functions (again being “deep modules” with a simple API), especially to make variable dependencies more transparent. In my experience this reduces the probability of bugs rather than increasing it.
The counterpoint to this is that I have seen actual deployed code with a series of 20 functions, each of which had a name starting with a number indicating roughly (but not exactly) the order in which they executed. They were also all out of order in the file, so you had to jump back and forth to follow the control flow. Furthermore, each function repeated the same null checks, roughly doubling its length. It took me several hours to understand what could have been a single 100-line function.
I try to use assignments and functions only in cases where I know I’ll be using/doing the same thing twice. I don’t have a particularly good reason for it, except to avoid cluttering up the namespace.
I have some hypotheses.
The bug density of very small functions (e.g., one-liners) is greater than the bug density of longer functions because longer functions have more comments and spacing.
Note that some of the studies cited count comments and spacing, others do not account for them, and in the first graphic in the section “Data set analysis” it is not clear which is the case.
Two-line methods have half the bug density of one-line methods because both commonly have the same number of errors (related to just creating methods) but one has twice as many lines as the other.
Two-line methods have half the bug density of one-line methods because one-line methods are, in fact, two-line methods that people decided to be clever about and squeeze onto a single line.
Two-line methods have half the bug density of one-line methods because we are counting when setters and getters change.
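To make the arithmetic behind the "same number of errors" hypothesis concrete, here is a tiny sketch. The defect count is made up purely for illustration and `defect_density` is a hypothetical helper, not something from the article:

```julia
# Defect density as defects per line of code (hypothetical helper).
defect_density(defects, lines) = defects / lines

# Assumption for illustration only: every method carries the same fixed
# number of defects just for existing, regardless of its length.
defects_per_method = 1

one_line = defect_density(defects_per_method, 1)
two_line = defect_density(defects_per_method, 2)

# Under this assumption the two-line method shows half the density.
ratio = two_line / one_line
```

If that assumption held, the halving would be an artifact of the denominator and would say nothing about whether short methods are riskier.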
The section “Criticism of defect density/function size connection” also sums up my criticism. Basically, what is shown is a lot of correlations based on limited data. No causal explanation is given, and no such hypothesis (i.e., the cause) is tested. So while the authors make a claim about science (which I find entirely valid; software engineering is really more engineering than science), what is shown is not necessarily good science. It is a good text to instigate further scientific investigation, but I find the arguments for preferring long or short method bodies to be kinda weak.
The reason a function is short or long seems important.
When most of us start writing a function, we don’t call `round(Int,200randexp())` to figure out how many lines to make it; line length is not independent of function or purpose.
@bramtayl waiting for two uses will often be reasonable, because you’ll then have a better idea about how code should actually be organized, what the API should be, or appropriate abstractions. I recall reading an article where someone said they copy and paste the first time, waiting for the third data point before refactoring.
But why avoid assignments?
Perhaps in Julia, what would have been a bunch of short methods in other languages is done with loops and `@eval` instead – easier, and much less error prone.
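For anyone who hasn't seen the loop-plus-`@eval` pattern, a minimal sketch along the lines of the code generation example in the Julia manual. `Wrapped` is a hypothetical type invented for this illustration:

```julia
# Hypothetical wrapper type, used only for illustration.
struct Wrapped
    x::Float64
end

# Generate one short forwarding method per function instead of
# writing (and possibly mistyping) each one by hand.
for op in (:sin, :cos, :abs)
    @eval Base.$op(w::Wrapped) = Wrapped($op(w.x))
end

abs(Wrapped(-2.0))  # dispatches to the generated method
```

Each generated method is still a one-liner, but there is only one place to get it wrong.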
I would say that waiting for the 3rd time would be really bad because it could lead to many people copy-pasting their second time, but I’ve also seen multiple people implement the same non-trivial function multiple times in different parts of a code base, so I’m not sure what the real answer is.
Short functions are idiomatic for Julia, especially because of multiple dispatch design patterns. I would say that in Julia code, long functions are the code smell. They are sometimes inevitable, but should be treated with suspicion.
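As a rough sketch of what this looks like in practice (all type and function names here are hypothetical), each method is a one-liner and dispatch picks the right one:

```julia
# Hypothetical shape hierarchy, for illustration only.
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Rect <: Shape
    w::Float64
    h::Float64
end

# One short method per concrete type; no if/else on the type.
area(c::Circle) = pi * c.r^2
area(r::Rect) = r.w * r.h

# Generic code stays short too, since dispatch selects the method.
total_area(shapes) = sum(area, shapes)
```

In a language without multiple dispatch the same logic would more likely end up as one longer function with a type switch inside.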
It should be kept in mind that these questions depend on the language, and results from other languages with different paradigms do not necessarily carry over to Julia.
Nobody said it explicitly, but the boundaries of a function have “special” meaning for the Julia JIT compiler, so there may be reasons to write a short method that is not used anywhere else just to deal with type instability and improve performance.
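A minimal sketch of such a "function barrier", assuming a deliberately type-unstable starting value. The names are made up, and for a union this small the compiler may well cope on its own, so take it as an illustration of the pattern only:

```julia
# Returns an Int or a Float64 depending on a runtime value, so the
# compiler cannot infer a single concrete return type.
unstable_start(flag::Bool) = flag ? 0 : 0.0

# Short kernel called from only one place. It is compiled separately
# for each concrete type of `acc`, so the loop itself is type-stable.
function _accumulate(acc, n)
    for i in 1:n
        acc += i
    end
    return acc
end

# The instability is confined to this single call boundary.
sum_to(n, flag) = _accumulate(unstable_start(flag), n)
```

`_accumulate` exists purely for the compiler's benefit, which is a reason for a short function that a line-count heuristic would not capture.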
While I do believe this is a fact and, consequently, completely agree, I also think it is a little naive and misses the point. The line length of a method, when recursively expanding all of its calls, is clearly not independent of function or purpose, but you can have the exact same method written with many different line lengths depending on how much of that method’s code you wrap in sub-functions that are called only there.
I, for example, avoid having methods with more than 30 lines (i.e., ones that do not fit on one vim screen). If a method has more than 30 lines, I try to break it into smaller parts, which I name with a leading underscore (because they are internal) and call only a single time. It is an arbitrary threshold, but I find my code much easier to reason about by doing so.
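A toy sketch of that style (the pipeline and all of its names are hypothetical, and far shorter than the 30-line case): each underscore-prefixed helper is called exactly once, from the public function.

```julia
# Internal helpers, each called from a single place.
_parse(line) = parse.(Float64, split(line, ','))
_normalize(xs) = xs ./ sum(xs)
_format(xs) = join(round.(xs; digits = 2), ", ")

# Public entry point: reads as a summary of the steps.
process(line) = _format(_normalize(_parse(line)))

process("1,1,2")
```

The same logic could be written as one longer body, which is why the line count of a single method says little on its own.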
Then the comparison should be between code bases following different styles, not within code bases.
FWIW, the Java code bases found errors to be minimized at around 5-10 lines.
Again, I completely agree, but not all coding styles or guidelines specify a “preferred method line length”, and even if they do, a large enough code base probably fails to follow this suggestion consistently.
Just throwing my hat in with Tamas here. I’ve answered several StackOverflow questions where I’ve pointed out that multiple dispatch with low/zero overhead encourages shorter functions, while MATLAB’s syntactical and structural norms encourage longer functions. So heuristics for one are not necessarily heuristics for the other.