Every spring and fall students from leading technical universities across Europe head to another institution in the ATHENS network to take part in a one week intensive course on a topic of their choice. This March was the second time I had the opportunity to take part in this program. Two years ago I went to Télécom ParisTech to explore how information can be extracted from unstructured data sources such as the Wikipedia. I published a short writeup of that here.
This time around, I went to Technical University of Delft to look into how Data Science can be used to improve Software Engineering. The course was taught by Alberto Bacchelli who is SNFS Professor of Empirical Software Engineering at the University of Zurich and came back to TU Delft to teach two courses during ATHENS week. He is one of the leading figures in the field of Data Science in Software Engineering (also called Empirical Software Engineering) and has a ton of relevant publications.
A couple of weeks ago, I published a first article on code clones based on a paper I wrote a couple of terms back in my undergrad. If you haven’t read that one yet, give it a read first. In this part I focus on the different algorithmic approaches that can be used to detect code clones. And once more: All the stuff below is based on the German paper I wrote for a class on Software Quality, which is in turn based on the publications referenced in the paper and at the bottom of this page. Any gross oversimplifications and inaccuracies are entirely my fault.
So, what we have from the last post is an understanding of what code clones are, what different types of clones there are, and an understanding that we should try to avoid them if possible. Now, let’s take a look at how to detect such code clones in a code base (short of hiring someone to read code and just flag clones manually).
To do this we will consider four different approaches in increasing order of abstraction.
I’ve been keeping very busy since I moved to Norway for my internship, but today finally was one of those rainy October Sundays so I had the time to sit down and finally finish this post which I have been meaning to write for a little over a year but never got around to. Well, here it goes: All the stuff below is based on the German paper I wrote for this class, which is in turn based on the publications referenced in the paper and at the bottom of this page. Any gross oversimplifications and inaccuracies are entirely my fault.
Way back in the 6th semester of my Bachelor’s at TUM I had the opportunity to take a seminar class focusing on software quality. The general idea was to get a better understanding of what good code quality is and go beyond the notion of “a good developer know good code when they see it” and also to investigate if there are metrics that might be interesting. My topic was code clones and I was working under the supervision of a researcher at TUM who has published on exactly this topic.
What is a Code Clone?
First up, let’s define what we mean by Code Clone. One definition seems immediately obvious:
“Code Clones are segments of code that are similar according to some definition of similarity.” – Ira Baxter
However, that doesn’t really help us if we want a computer to analyze code to find clones. So let’s take a look at a definition used in literature that we can throw at a computer:
Type I clones: two segments are clones if they are identical save for whitespace and comments.
Type II clones: allow for consistent renaming of identifiers and changing the values of literals.
Type III clones: several statements may be different between clones.
Just a few weeks after coming back from my time abroad, I found myself travelling again for the weekend: In Regensburg (about 1h from Munich), Hackaburg was taking place for the second time. It is a student-run hackathon at OTH Regensburg. I had a blast there the last time, so I naturally decided to go this year as well.
Unlike most of the previous hackathons I attended, we already formed a team prior to the event – all of us Master’s students at TUM. Combined, our team has a little under 20 years of study time at TUM under the belt (which makes me feel kind of wise but also really old 😉 ).
The event kicked off with keynotes by sponsors and pitches by a couple of other participants looking for team members. Soon after, our team was sitting in a bar in downtown Regensburg, trying to think of what we wanted to build. While we had a bunch of decent ideas, none of them quite felt right. So, we decided to start over with a fresh mind the following morning.
Since I am at Waterloo this term I have the awesome opportunity to dive into North American Hackathon culture. The first event I took part in was Hack the North in the middle of September. It is Canada’s biggest hackathon and well known even in Europe.
After meeting a lot of awesome people at the team building and brainstorming session, I decided to form a team together with three full-time Waterloo students and we set out to pursue an education-related idea: Did your teachers in grade school ever reward you for doing well in class with stars and/or stickers? Some of our teachers went beyond this and rewarded their students with points that could be spent on rewards, such as pizza, movie days, and toys. This was the original idea for what later came to be called Points and Me; to create a elegant, approachable, and friendly system for rewarding students for excelling in academics.
So, a couple of other exchange students and me were asked to give a short presentation on our home universities and opportunities to go to Germany at The University of Waterloo. This post is an attempt to collect some of the relevant information so others can refer to it later.
This past Saturday, two friends and I took part in the German Collegiate Programming Contest. Since none of us actually trained much beforehand and it was the first ICPC-like programming contest for two thirds of the team (me included), we were not one of those teams that managed to solve almost all problems, but ended right in the middle of the ranking with a total of three solved problems (even though we were really, really close to solving four 😉 ).
Nevertheless, I’d like to share some insights into one of the problems the organizers of the GCPC came up with. I really like it because there is a beautiful algorithm that you can apply to it and (mostly because) it is somewhat StarTrek related 😉 . The problem I’m talking about is problem L of the GCPC 2016 which you can find in this PDF. The solution of course contains spoilers, so if the problem statement makes you curious I’d recommend giving it a try first and coming back later to see if you found a more elegant solution. If you are a TUM student, you can even run it against all the tests that were used in the contest on TUMJudge.
Earlier this month I attended Dragonhack in beautiful Ljubljana, Slovenia. It was a truly amazing hackathon and I got to know and work with a whole bunch of awesome people. Turns out the organizers made an aftermovie for the event which they have recently released and which I’m gonna share with you today. Try to see if you can spot me in the video, I haven’t found myself in it so far…
However, they also had a professional photographer at the venue who took over 200 photos, so there inevitably are also some photos of us. I’m gonna share the two best photos of us: One of our entire team deep in discussion and one of me on stage demoing our hack with my teammates in the background launching a last-minute effort to get our livedemo to work.
After finishing my thesis I thought I should do something with the rest of my semester break and so I decided to go to Brussels for a few days. I went with two old friends from high school. All three of us had already been to Brussels for a school trip over half a decade ago.
I’m just gonna share a few nice pictures we took (Mostly not mine though 😉 ). As always I’m the only one in the pictures, we Germans tend to value privacy very much 😉
Unrelated PS: As promised, I’m gonna share my term paper from last year on Code Cloning in OpenSource Software in a few days and also show some code cloning offenses that I committed myself over the years (mostly old code from Easy Feed Editor).
Despite really wanting to, I didn’t find the time to go to Hamburg for the 32C3 (bachelor’s thesis + TOEFL exam in January + work) but I caught up on some of the talks thanks to the amazing recordings the CCC always provides. I’m gonna share some of my favorite talks here with you. (Small disclaimer: I obviously didn’t watch every single talk ;)) Continue reading My favorite Talks at 32c3→