A bot to detect code duplication

This might seem like a naive suggestion (it really is), but wouldn’t it be cool if there was a bot that could scan a list of packages and identify code duplication? The main aim of such a bot would be to detect whether multiple packages have defined practically the same function. It could even report different grades of similarity. This information would let developers identify sets of popular functions that seem to pop up “everywhere”. We could then consolidate such functions into toolbox packages. Apart from avoiding code duplication, the main benefit would be that the implementation of these common functions would end up well optimized (because we would all be improving the same single implementation). I suspect that other cool features might pop out if we relax the unit of duplication from function to “just code”.
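To make the idea a bit more concrete, here is a rough sketch of the simplest version I can imagine. It is in Python purely for illustration, and everything in it is an assumption of mine: the names `fingerprint`, `fingerprints`, and `find_duplicates`, the 0.9 threshold, and the choice to compare syntax trees with the standard library's `ast` and `difflib` modules. It renames arguments and local variables to placeholders, so two functions that differ only in naming look identical, and then grades every cross-package pair with a similarity score between 0 and 1.

```python
import ast
import difflib
from itertools import combinations


class _RenameLocals(ast.NodeTransformer):
    """Replace arguments and local variables with positional placeholders."""

    def __init__(self, local_names):
        self.local_names = local_names
        self.mapping = {}

    def _placeholder(self, name):
        # First name seen becomes v0, the next v1, and so on.
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        node.annotation = None  # ignore type hints on arguments
        return node

    def visit_Name(self, node):
        if node.id in self.local_names:
            node.id = self._placeholder(node.id)
        return node


def fingerprint(func):
    """Return a naming-independent text dump of one function's syntax tree."""
    local_names = {n.arg for n in ast.walk(func) if isinstance(n, ast.arg)}
    local_names |= {
        n.id for n in ast.walk(func)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
    }
    func.name = "_"  # the function's own name should not matter either
    return ast.dump(_RenameLocals(local_names).visit(func))


def fingerprints(source):
    """Map each top-level function name in `source` to its fingerprint."""
    result = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            name = node.name  # save it before fingerprint() overwrites it
            result[name] = fingerprint(node)
    return result


def find_duplicates(packages, threshold=0.9):
    """Grade every cross-package pair of functions; keep those above threshold.

    `packages` maps a (hypothetical) package name to its source text.
    """
    prints = [
        (pkg, fn, fp)
        for pkg, source in packages.items()
        for fn, fp in fingerprints(source).items()
    ]
    matches = []
    for (pkg_a, fn_a, fp_a), (pkg_b, fn_b, fp_b) in combinations(prints, 2):
        if pkg_a == pkg_b:
            continue  # only duplication across packages is interesting here
        matcher = difflib.SequenceMatcher(None, fp_a, fp_b, autojunk=False)
        score = matcher.ratio()
        if score >= threshold:
            matches.append((pkg_a, fn_a, pkg_b, fn_b, score))
    return sorted(matches, key=lambda m: m[-1], reverse=True)


if __name__ == "__main__":
    packages = {
        "PkgA": "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n",
        "PkgB": "def bound(value, low, high):\n    return max(low, min(value, high))\n",
    }
    for match in find_duplicates(packages):
        print(match)
```

In the toy example, `clamp` and `bound` differ only in their names, so they get a score of 1.0. Lowering the threshold is what would give the "grades of similarity". The pairwise comparison is quadratic in the number of functions, so a real bot would need something smarter (hashing the fingerprints first, for instance) before running on a whole package registry.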

Soon we will have a curated list of “standard” and “approved” packages. That would be a good chunk of code to run this on.

Anyways, just :gun:ing the :poop:!

I think it would be rather difficult to create a useful version of a tool like this. Fanatically trying to avoid code duplication is a quick route to dependency hell. In its simplest form, such a tool would probably detect mostly simple code that is perhaps a few lines long. In those cases, it’s usually rather dubious whether it would be worth adding a dependency just to eliminate that code.

Now, certainly there are many cases in which it serves you well to be aware of what packages exist in the ecosystem and to use them rather than implementing things on your own. However, it seems to me that those cases would be much harder to detect, for a couple of reasons. The first is just entropy: as your code gets longer, the probability that it closely mimics some existing implementation decreases. The second is that the resemblance may not even become apparent until you are a large part of the way through writing it.

Honestly, you’re making a lot of sense here…

Yeah, I guess it might be interesting if it turns out it’s not just one-liners that pop out. Or if the one-liners are so prevalent everywhere that we should add them all to some standard toolbox.

Few people are likely to be willing to take on a dependency on a toolbox of random stuff if they are only using a single thing from it. Also, these kinds of functions are typically used as implementation details, which means that you actually don’t want them in any public interface (which is what putting them in a toolbox would mean). In addition, a my_toolbox_function() call might be less descriptive than just seeing the composition of the Base functions used in it.

You could try collecting these kinds of toolbox functions, but I think it might be hard to get much traction.

Well, you’re making a lot of sense too… Thanks for the feedback!

The Tattletale project (https://tattletale.jboss.org/) does a similar task for Java EE applications.
