Code style consultation: Should we strive to avoid code duplication?

Someone can probably speak more extensively than me, but I’ll give a statement of my understanding as it applies to general use:

The same code split over more functions will usually not increase compile time and can reduce it by avoiding duplicate work (if functions are actually used multiple times). It can also make the compiler’s job easier by allowing it to compile multiple simpler functions rather than one large and complicated one. However, this reduction is usually pretty minor for typical code.

More functions for the same code will usually not improve the run time. However, because the compiler can choose to inline functions, it will often not degrade it noticeably either. Roughly speaking, if a function is so small that the call overhead would be substantial relative to its work, the compiler will essentially copy-paste its contents into the calling function. People work hard at making the compiler good at this decision, so you shouldn't need the next part often, but you can nudge the compiler one way or the other with @inline/@noinline.
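For illustration, here is a minimal Julia sketch of both nudges. The names `square`, `throw_negative`, and `sum_of_squares` are hypothetical. Putting `@inline` on a one-liner mostly restates what the compiler would likely do anyway. Putting `@noinline` on an error-throwing helper is a common way to keep a rarely taken branch from bloating a hot loop. Treat this as a demonstration of the syntax, not a recommendation to annotate your own code.

```julia
# Hypothetical helpers, for illustration only.

# Tiny body: the compiler would very likely inline this without being asked;
# @inline just makes the request explicit.
@inline square(x) = x * x

# Cold error path: kept out-of-line so the loop body below stays small.
@noinline throw_negative(i) = throw(ArgumentError("element $i is negative"))

function sum_of_squares(v::AbstractVector{<:Real})
    s = zero(eltype(v))
    for i in eachindex(v)
        x = v[i]
        x < 0 && throw_negative(i)   # rarely taken branch
        s += square(x)
    end
    return s
end

sum_of_squares([1, 2, 3])  # 14
```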

While inlining is often faster (no function call overhead), it also duplicates the compiled code. This means more compilation work and more memory taken up by the code. Eventually, this bloated code can tax your system resources (the instruction cache, in particular) and reduce performance.

Function calls can be expensive in languages like MATLAB (and Python, I think, although I don’t have experience with it). In compiled languages like Julia, C, FORTRAN, or Rust, the function call overhead is usually smaller. Further, these compiled languages will often inline small functions to remove the overhead entirely.

If you write a function and mark every function it calls with @inline, it will have essentially the same compilation and runtime performance (benefits and drawbacks) as if you had copy-pasted every called function into its body. But at least you'll get the code organization and de-duplication benefits of using many compact functions with simple jobs.
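A small sketch of that trade-off, assuming a hypothetical "clamp to [0, 1]" step that two functions both need. Base's `clamp(y, 0, 1)` already does this; the helper `clamp01` and the functions `scale`, `shift`, and their `_dup` twins are made up for illustration. The `_dup` versions copy-paste the logic, while the other two share one `@inline` helper.

```julia
# Copy-paste version: the clamping logic is duplicated in each function.
scale_dup(x, k) = (y = k * x; y < 0 ? zero(y) : y > 1 ? one(y) : y)
shift_dup(x, d) = (y = x + d; y < 0 ? zero(y) : y > 1 ? one(y) : y)

# De-duplicated version: one small helper, marked @inline so the compiled
# code should end up much like the copy-paste version above.
@inline clamp01(y) = y < 0 ? zero(y) : y > 1 ? one(y) : y
scale(x, k) = clamp01(k * x)
shift(x, d) = clamp01(x + d)

scale(0.5, 4.0)    # 1.0
shift(0.25, 0.5)   # 0.75
```

If you compare `@code_llvm scale(0.5, 2.0)` with `@code_llvm scale_dup(0.5, 2.0)` in the REPL, you will likely see nearly identical code, though the exact output depends on the Julia version.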

But the compiler is usually pretty good at choosing what to inline and what not to, so I strongly discourage you from actually using @inline outside of exceptional cases (search for old discussions or open a new one if you want to learn more). Recall that + is a function, too. The compiler decides whether to inline it based on which method is called (i.e., the types of its arguments) and in what context. For example, it will always choose to inline + when the arguments are Int, but probably won't (I'm guessing) when they are Matrix{Int}.
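You don't have to guess, though: Julia's reflection tools let you check. `which` reports the method selected for given argument types, and `code_typed` shows the code after type inference and optimization, including inlining. `add` is a hypothetical wrapper used only to have a caller to inspect. The printed output varies between Julia versions, so compare the Int and Matrix{Int} cases on your own setup.

```julia
# `+` is an ordinary generic function; prefix call syntax makes that explicit.
@assert +(2, 3) == 5
println(length(methods(+)), " methods of +")

# Which method gets picked depends on the argument types.
println(which(+, (Int, Int)))
println(which(+, (Matrix{Int}, Matrix{Int})))

# Hypothetical caller, to see whether `+` survives as a call after optimization.
add(a, b) = a + b

# code_typed returns one (CodeInfo => return type) pair per matching method.
ci_int, ret_int = only(code_typed(add, (Int, Int)))
println(ci_int)   # for Int, look for an intrinsic such as `add_int` instead of a call to `+`

ci_mat, ret_mat = only(code_typed(add, (Matrix{Int}, Matrix{Int})))
println(ci_mat)   # compare: is `+` still called here, or inlined?
```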


As a real, practical example, I have several Python projects going that are similar but slightly different. Part of this is “utility code” that I have been copy/pasting all over the place because it’s useful in a few situations. I really want to just make it into a standalone package, but that’s such a PITA in Python, especially considering that the code isn’t quite identical and would require multiple dispatch, which Python doesn’t have.

In an ideal world, I’d be coding it in Julia, have a utility module that I could just import, and refactor everything to avoid duplication. As it is now, I am continually improving my code, but since I have simultaneous projects, I always have to double check 4 or 5 different files to ensure I’m using the latest-and-greatest version. It sucks.
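As a rough sketch of what that utility module might look like in Julia: `MyUtils` and `describe` are hypothetical names standing in for the real utility code. The idea is that the slightly different variants become methods of one function, selected by argument type, and a project that needs different behaviour adds its own method instead of forking the file.

```julia
# Hypothetical shared utility module.
module MyUtils

export describe

# One generic function, several methods: the variant is chosen by argument
# type instead of by keeping a slightly different copy in each project.
describe(x::Number) = "number: $x"
describe(s::AbstractString) = "string of length $(length(s))"
describe(v::AbstractVector) = "vector with $(length(v)) elements"

end # module

using .MyUtils

describe(3)          # "number: 3"
describe("abc")      # "string of length 3"
describe([1, 2, 3])  # "vector with 3 elements"

# A project needing slightly different behaviour extends the shared function
# from outside rather than copy-pasting and editing it:
MyUtils.describe(d::AbstractDict) = "dict with $(length(d)) keys"
describe(Dict(:a => 1))  # "dict with 1 keys"
```

In a real setup the module would more likely live in its own package that each project adds as a dependency, rather than being defined inline as here.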
