For those who are not familiar, MOSS is one of the most widely used tools for detecting plagiarism in source code submissions. Its output is a list of submission pairs in descending order of “similarity”, along with other information such as the number of matching lines. Each pair also links to a side-by-side view of the two submissions, with the blocks of code that are similar highlighted in different colours.
A recurring issue I have is “grouping” or “clustering” potential groups of submissions. MOSS reports pairs, but I may have three or more submissions that are all similar to one another, and grouping them would help with further investigation (e.g. whether there was collaboration, or whether code was taken from some online source).
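To make the kind of grouping I mean concrete, here is a rough sketch of what I currently imagine doing by hand. It assumes the MOSS results have already been extracted into `(submission_a, submission_b, similarity_percent)` tuples; that tuple format, the function name `group_submissions`, and the threshold value are all my own placeholders, not anything MOSS provides. It simply treats each pair above the threshold as an edge and reports the connected components.

```python
from collections import defaultdict


def group_submissions(pairs, threshold=50):
    """Group submissions linked by pairs with similarity >= threshold.

    `pairs` is an iterable of (name_a, name_b, similarity) tuples.
    This is a hypothetical format; parsing the MOSS report is not shown.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Only pairs at or above the threshold link two submissions.
    for name_a, name_b, similarity in pairs:
        if similarity >= threshold:
            union(name_a, name_b)

    # Collect the members of each connected component.
    groups = defaultdict(set)
    for name in list(parent):
        groups[find(name)].add(name)

    # Largest groups first, names sorted within each group.
    return sorted(
        (sorted(members) for members in groups.values()),
        key=lambda group: (-len(group), group),
    )


# Made-up example data, purely for illustration.
example_pairs = [
    ("alice.c", "bob.c", 92),
    ("bob.c", "carol.c", 85),
    ("dave.c", "erin.c", 70),
    ("alice.c", "frank.c", 12),
]
print(group_submissions(example_pairs, threshold=60))
```

One caveat with this approach is that the grouping is transitive: if A matches B and B matches C, all three land in one group even when A and C share little, so a group is a starting point for manual inspection rather than a finding.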
Are there any tools currently available that can help with this? Also, what other tools do people use? Are there any other tools that work like MOSS or supplement MOSS?