Copy Pasta and its effects on Code Quality (a.k.a. Term Paper on Code Clones)

I’ve been keeping very busy since I moved to Norway for my internship, but today was finally one of those rainy October Sundays, so I had time to sit down and finish this post, which I have been meaning to write for a little over a year. Well, here goes: everything below is based on the German paper I wrote for a seminar class, which is in turn based on the publications referenced in the paper and at the bottom of this page. Any gross oversimplifications and inaccuracies are entirely my fault.

Way back in the 6th semester of my Bachelor’s at TUM I had the opportunity to take a seminar class focusing on software quality. The general idea was to get a better understanding of what good code quality is, to go beyond the notion that “a good developer knows good code when they see it”, and to investigate whether there are metrics that might be interesting. My topic was code clones, and I worked under the supervision of a researcher at TUM who has published on exactly this topic.

What is a Code Clone?

First up, let’s define what we mean by Code Clone. One definition seems immediately obvious:

“Code Clones are segments of code that are similar according to some definition of similarity.” – Ira Baxter

However, that doesn’t really help us if we want a computer to analyze code to find clones. So let’s take a look at a definition used in the literature that we can throw at a computer:

  • Type I clones: two segments are clones if they are identical save for whitespace and comments.
  • Type II clones: additionally allow consistent renaming of identifiers and changed literal values.
  • Type III clones: additionally allow several statements to differ between the clones, for example statements that were changed, added, or removed.

I guess it is example time!

Different fragments of code (with hard-to-decipher German comments 😉 )
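
The three types can also be sketched in code. The snippet below is a minimal Python illustration, not the approach from the paper. It uses the standard-library tokenize module to drop comments and layout (Type I), renames identifiers consistently and blanks out literals (Type II), and falls back to a difflib similarity ratio for Type III. The function names, the sample fragments, and the 0.7 threshold are all assumptions made up for this example. It also ignores indentation changes, which a real clone detector would have to handle.

```python
import difflib
import io
import keyword
import tokenize

# Layout-only tokens; dropping INDENT/DEDENT is a simplification for this sketch.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def tokens(source):
    """Return (type, text) pairs with comments and layout tokens removed."""
    reader = io.StringIO(source).readline
    return [(tok.type, tok.string)
            for tok in tokenize.generate_tokens(reader)
            if tok.type not in SKIP]


def normalize(source):
    """Rename identifiers consistently and replace every literal by 'LIT'."""
    names = {}
    out = []
    for typ, text in tokens(source):
        if typ == tokenize.NAME and not keyword.iskeyword(text):
            # The first identifier seen becomes id0, the second id1, and so on.
            text = names.setdefault(text, f"id{len(names)}")
        elif typ in (tokenize.NUMBER, tokenize.STRING):
            text = "LIT"
        out.append(text)
    return out


def clone_type(a, b, threshold=0.7):
    """Classify two fragments; the threshold is an arbitrary assumption."""
    if [text for _, text in tokens(a)] == [text for _, text in tokens(b)]:
        return "Type I"
    norm_a, norm_b = normalize(a), normalize(b)
    if norm_a == norm_b:
        return "Type II"
    if difflib.SequenceMatcher(None, norm_a, norm_b).ratio() >= threshold:
        return "Type III"
    return "not a clone"


original = """
total = 0
for price in prices:
    total += price * 2  # double it
"""

# Same code, different whitespace, comment removed.
type1 = """
total = 0
for price in prices:

    total   +=   price * 2
"""

# Identifiers renamed consistently, literal changed.
type2 = """
summe = 0
for p in werte:
    summe += p * 3
"""

# Like type2, but with an extra statement inserted.
type3 = """
summe = 0
for p in werte:
    if p > 0:
        summe += p * 3
"""

for fragment in (type1, type2, type3):
    print(clone_type(original, fragment))
```

Pairwise comparison like this does not scale to a whole code base; it is only meant to make the three definitions tangible.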
