Abstract

Several types of clones exist in software systems due to copy-paste activity, developer limitations, language restrictions, and the software development lifecycle. This work studies cloning in server-side technologies for web applications. We studied 11 reasonably sized (average over 22K LOC) web development projects coded in C#, Java, Ruby-on-Rails (ROR), and PHP, all based on the same set of requirements. We identified and analyzed the simple and structural clones present in these systems in order to compare the technologies in terms of number of clones, clone size, clone coverage, reasons behind the creation of clones, and the ratio of refactorable to non-refactorable clones. Our study focused only on the base languages of these server-side technologies. Our analyses show that C# has the highest number of clones and ROR the lowest. C# also has the highest percentage of refactorable clones and ROR the lowest. PHP has the highest clone coverage and ROR the lowest. The average clone size across all projects ranges from 49.8 to 77.2 tokens. In terms of clone size, there are no significant differences across projects in the same technology. Project size, project architecture, and developer approach determine the percentage of clones present in a software project. The use of frameworks and design patterns helps control the generation of clones.