Code style consultation: Should we strive to avoid code duplication?

Should we strive to avoid code duplication when defining a function with multiple methods if they have similar code?

Specifically, for the following function with two methods, should we combine them into a single method?

function myfun(arg::Type1)
    # Common code 1
    # Special code 1 for `arg::Type1`
    # Common code 2
    # Special code 2 for `arg::Type1`
    # ...
end

function myfun(arg::Type2)
    # Common code 1
    # Special code 1 for `arg::Type2`
    # Common code 2
    # Special code 2 for `arg::Type2`
    # ...
end

Should we keep them separate but with many duplicated snippets or combine them into the following single method:

function myfun(arg::Union{Type1, Type2})
    # Common code 1
    if arg isa Type1
        # Special code 1 for `arg::Type1`
    elseif arg isa Type2
        # Special code 1 for `arg::Type2`
    end
    # Common code 2
    if arg isa Type1
        # Special code 2 for `arg::Type1`
    elseif arg isa Type2
        # Special code 2 for `arg::Type2`
    end
    # ...
end

Thanks!

The usual way I tackle this is to use solution 1 both with functions having the common codes into them so that you don’t repeat the common parts (apart from a one-line function call)

1 Like

Yes, you should always strive to avoid code duplication. The solution I use is to define a new function that contains the common code that is then called by both of your functions. The principle I use is: if a piece of code can be used in at least two points in your code base, this piece of code should already be a function.

12 Likes

Although it sounds very comforting to do so, what do we ultimately gain from it, except for making the code more fragmented and expending more mental effort? Can it achieve better runtime efficiency?

As pointed out by Datseris, here is the possible way to go

function common1()
# common code 1
end

function common2()
# common code 2
end

function myfun(arg::Type1)
    common1()
    # Special code 1 for `arg::Type1`
    common2()
    # Special code 2 for `arg::Type1`
    # ...
end

function myfun(arg::Type2)
    common1()
    # Special code 1 for `arg::Type2`
    common2()
    # Special code 2 for `arg::Type2`
    # ...
end

Overall, I like multiple dispatch more than if-else structure.

7 Likes

It effectively creates a function barrier which is a common technique to avoid type instability.

2 Likes

No, runtime will be identical (EDIT: provided your code was optimal to begin with). You will gain mental effort though, because you have less code over all, a better organized code, and a code that is much easier to debug, because if your “common code” was bugged, in your approach you’d have to debug it two times. It also makes the code more readable and easier to teach to new contributors, provided you learn to name the functions effectively (advertisement: GitHub - JuliaDynamics/GoodScientificCodeWorkshop: A workshop on writing good scientific code. )

6 Likes

I actually prefer the inverse approach here:

function special1(arg::Type1) end
function special2(arg::Type1) end

function special1(arg::Type2) end
function special2(arg::Type2) end

function myfun(arg)
  # common code 1
  special1(arg)
  # common code 2
  special2(arg)
end

Not a major point but this way myfun is kept entirely generic.

ETA: this also means that you only need define special1/special2 for any new types.

ETA: to respond to the original question: the software engineering orthodoxy is that code duplication is generally only a good idea when it’s foreseeable that you’ll want to change the code in one place but not the other. And even then you really should only duplicate the code at the point you actually need the two copies to diverge.

9 Likes

Thank you for providing such good learning material! But before reading it, I still doubt whether introducing more functions will bring more bugs, and whether the effort spent to reduce code duplication is less than the effort spent “debugging it two times” (and, strictly speaking, it’s not “debugging it two times”, but rather fixing one bug in two places). :grinning:

Yes, in practice I do this. It is more extensible and I love multiple dispatch!

2 Likes

You claim that “it takes more effort to reduce this code duplication”. I am confused. Won’t you have to anyways write this code two times? If you use it in two functions you write it two times. Why is it suddenly “more effort” to write this code one time in a new function, but “less effort” to write it twice inside two different functions? Sure, if it is truly a single line of code, then you likely shouldn’t make a function out of it, but I am assumming here that we’re talking about something substantial.

This particular sentence also doesn’t make sense to me:

Isn’t the exact same code we are talking about? If it had a bug before, it will have the same bug whether you refactor it into a new function or not. How can it possibly be that moving the exact same code out into a function will bring in more bugs?

And I need to stress, that so far you have ignored the by far more important aspects of long-term maintenance or readability for new users or your future self. In my experience these count much more than then +/- 10 seconds you will need to write the function new_name(x, y , z) header.

3 Likes

Copy and Paste is much cheaper, I think.

I think that breaking down a large code block involving many variables into a new function is not always a trivial task, is it? At the very least, you need to carefully clarify what its inputs and outputs are, and come up with a suitable function name, etc. Is it really possible for humans to completely avoid introducing bugs during this process? Of course, it may not be very difficult, but for me, it’s not always as straightforward as +/- 10 seconds.

This point is indeed extremely important.

It is true that Copy/paste is cheap at the time it is done… but it may cost a lot after that. Imagine a new reader (which maybe yourself in a year) want to modify the code: it may take a while to check that the duplicate source code is actually exactly the same… After a tedious check, you may make your modification twice again…

2 Likes

In addition the common part can be factorize in a function that can be named with a meaningful name that can ease the reading of the calling code.

1 Like

Yes, I completely agree on this point. However, I find it difficult to agree that the following principle is necessary in any situation unless it is just an idealistic pursuit. I wonder if everyone can strictly adhere to this principle? Alternatively, what is the maximum level of violation that everyone can accept?

definitely not, in fact I wouldn’t even know how to objectively measure adherence. if you write dst .= 2 .* src in multiple places is that “duplication” that should be replaced with broadcast_equal_twice(dst, src) ?

imo “DRY” is more of a vibe-check than a rule — the recognition of and adherence to which can only come with experience

2 Likes

Very reasonable. :grinning:

Copy and paste appears cheaper, but only if it’s actually done. Consider the usual copy and paste workflow:

  1. Realize that this code might be useful elsewhere
  2. Copy and paste it
  3. Identify the places where it differs and change them

Now, you have almost the same code twice – maybe including some bugs – and refactoring into a function feels like work as you need to do it at several places, i.e., as many times as you copy and pasted already.

Instead, consider the following alternative workflow:

  1. Realize that this code might be useful elsewhere
  2. Identify the places that would differ and refactor it into a function immediately
  3. Use the new function in the two places, i.e., for the original as well as the new code.

Made it a habit to step back from the keyboard and think whenever reaching for copy and paste. In this way it’s just as cheap as copy and paste, but my code stays nicer.

1 Like

May I ask a serious question: Strictly speaking, will introducing more functions increase the compilation or run time?

will introducing more functions increase the compilation or run time?

depends how many times you call them lol

1 Like