What tools do you use for plagiarism detection of code?

For those who are not familiar, the MOSS system is one of the most widely used tools for detecting plagiarism in source code submissions. Its output is a list of submission pairs in descending order of “similarity”, along with other information such as the number of matching lines. Each pair also has a link that shows the two submissions side by side, with the blocks of code that are similar across them highlighted in different colours.

A recurring issue I have is “grouping” or “clustering” potentially related submissions: I may have three or more submissions that are similar to one another, and grouping them helps with further investigation (e.g. whether there was collaboration, or whether the code was taken from some online source).
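To make the grouping idea concrete, here is a minimal sketch in Python that treats each sufficiently similar pair as an edge in a graph and reports the connected components as candidate groups. The tuple format `(name_a, name_b, percent, lines_matched)`, the function name `cluster_pairs`, and both thresholds are assumptions made for illustration; this does not parse real MOSS output and is not how any particular tool works.

```python
from collections import defaultdict


def cluster_pairs(pairs, min_percent=50, min_lines=10):
    """Group submissions into clusters via connected components.

    `pairs` is an iterable of (name_a, name_b, percent, lines_matched)
    tuples. This format is hypothetical, not MOSS's actual output.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, percent, lines in pairs:
        # Only pairs above both (assumed) thresholds count as edges.
        if percent >= min_percent and lines >= min_lines:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for name in list(parent):
        groups[find(name)].add(name)

    # Largest groups first, names sorted within each group.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Made-up example data, purely for illustration.
pairs = [
    ("alice", "bob", 90, 40),
    ("bob", "carol", 85, 35),
    ("dave", "erin", 70, 20),
    ("frank", "alice", 20, 5),   # below both thresholds, ignored
]
clusters = cluster_pairs(pairs)
# clusters == [["alice", "bob", "carol"], ["dave", "erin"]]
```

Note that alice and carol end up in the same group even though they were never reported as a pair, because both are linked to bob. That is the point of grouping, but it also means a low threshold can chain unrelated submissions together, so the cut-offs need some tuning per assignment.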

Are there any tools currently available that can help with this? Also, what other tools do people use? Are there any other tools that work like MOSS or supplement MOSS?


I use mossum (hjalti/mossum on GitHub), a Moss summarizer, to reveal plagiarism clusters. The image below is from the readme, but I saw a similar (albeit smaller) mess the first time I used it when grading. Subsequent classes seem to have gotten the heads-up that copy-paste is not an acceptable way to complete assignments - only a few pairs or triads still try to sneak under the radar.


That’s a great tool! It will definitely be useful for finding the potential groups. It’s also good that it has options for the minimum number of matching lines and the minimum percentage a pair needs to be considered.

Are there any other tools people use other than MOSS? It seems that in academia, most people are using MOSS, and I haven’t heard of any other tools that people use…