Vladimir Itsykson

Marat Akhin

   Excessive code duplication is a bane of modern software development. Several empirical studies show that, on average, about 15 percent of a software system’s source code may consist of code clones: fragments of similar code that are reused repeatedly. While code duplication may speed up initial development, it undoubtedly leads to problems during software maintenance and support. That is why many developers agree that software clones should be detected and dealt with at every stage of the software development life cycle.
   This paper briefly surveys the current state of the art in clone detection. First, we highlight the main sources of code cloning, such as copy-and-paste programming, mental code patterns and performance optimizations. We discuss the reasons behind these practices from the developer’s point of view, as well as possible alternatives to them.
   Second, we outline the major negative effects that clones have on software development. The most serious drawback of duplicated code for software maintenance is the increased cost of modifications: any change to cloned code must be propagated to every clone instance in the program. Clones may also introduce new bugs when a programmer makes mistakes while copying and modifying code. Finally, the growth in source code size caused by duplication makes the code harder to comprehend.
   Third, we review existing clone detection techniques and classify them by the source code representation model they use. We also describe and analyze several concrete clone detection techniques, highlighting their main distinctive features and the problems that arise in practical clone detection.
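   As an illustration of one representation model, a token-based detector can be sketched as follows. This is a minimal sketch, not a technique taken from this paper: it assumes a C-like input language, a hypothetical keyword set, a simple regular-expression lexer and a fixed window of 10 tokens. Identifiers and numeric literals are normalized, so fragments that differ only in names or constants produce the same token sequence; the function names normalize and find_clones are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical keyword set for a C-like language; a real tool would use a full lexer.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}
# Identifiers, numbers, a few two-character operators, then any other single character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|==|!=|<=|>=|\S")


def normalize(source):
    """Return (normalized_token, line_number) pairs for the source text.

    Keywords and punctuation are kept as-is; identifiers become 'ID' and
    numeric literals become 'LIT', so renamed copies map to the same sequence.
    """
    tokens = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for tok in TOKEN_RE.findall(line):
            if tok in KEYWORDS:
                tokens.append((tok, lineno))
            elif tok[0].isalpha() or tok[0] == "_":
                tokens.append(("ID", lineno))
            elif tok[0].isdigit():
                tokens.append(("LIT", lineno))
            else:
                tokens.append((tok, lineno))
    return tokens


def find_clones(source, window=10):
    """Report sorted pairs of starting lines whose next `window` tokens match."""
    tokens = normalize(source)
    seen = defaultdict(list)  # normalized token window -> list of start indices
    for i in range(len(tokens) - window + 1):
        key = tuple(t for t, _ in tokens[i:i + window])
        seen[key].append(i)
    pairs = set()
    for starts in seen.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                la, lb = tokens[starts[a]][1], tokens[starts[b]][1]
                if la != lb:  # ignore matches that start on the same line
                    pairs.add((la, lb))
    return sorted(pairs)
```

   For example, given a function sum(a, b) on lines 1 to 3 and a renamed copy add(x, y) on lines 4 to 6, find_clones reports the pair (1, 4). Real detectors extend matching windows into maximal clones and use suffix trees or similar structures instead of comparing every window pair.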
   Finally, we point out some open problems in the area of clone detection. Questions such as “What is a code clone?”, “Can we predict the impact clones have on software quality?” and “How can we increase both clone detection precision and recall at the same time?” remain open to further research. We list the most important questions in modern clone detection and explain why they remain unanswered despite all the progress in clone detection research.
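   For reference, the two quality measures named in the last question are commonly defined as follows, where R denotes the set of clone pairs reported by a tool and T the set of true clone pairs in the system (the symbols R and T are introduced here for illustration):

```latex
\mathrm{Precision} = \frac{|R \cap T|}{|R|}, \qquad
\mathrm{Recall} = \frac{|R \cap T|}{|T|}
```

   Loosening a detector’s similarity threshold typically enlarges R, which can raise recall while lowering precision; improving both at once is the difficulty the question refers to.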

Bio

Vladimir Itsykson graduated cum laude from Saint-Petersburg State Polytechnical University in 1996. He received his Ph.D. in Computer Science in 2000. As of late 2010, he is an associate professor in the Computer Systems and Software Engineering department at SPbSPU, where he leads a hardware/software R&D lab.
Since 2005 he has directed several R&D projects on software static analysis, carried out under government contracts or in collaboration with the world’s leading IT companies, including (but not limited to) Panasonic, General Motors and Intel.

His fields of interest include:

  • Software engineering
  • Formal methods of software analysis and synthesis
  • Source code defect detection
  • Automation of software testing
  • Software reengineering and reverse-engineering

He is the author or co-author of over 100 publications in various areas of computer science and software engineering.


Marat Akhin is a researcher in the field of computer science and software engineering. He began his scientific work during his senior undergraduate years and has since participated in several R&D projects, including:

  • a collaboration project with Panasonic on source code pattern analysis
  • an R&D project on automated C/C++ static analysis for defect detection

In 2009 he graduated summa cum laude from Saint-Petersburg State Polytechnical University with a Master’s degree in Computer Science and then entered the Ph.D. program there. As of late 2010, he is taking part in an R&D project on JavaScript static analysis and is actively pursuing his Ph.D. on incremental clone detection.
Marat is a four-time winner of the prestigious Vladimir Potanin scholarship contest and the author of over 20 publications in computer science. His fields of interest include:

  • Clone detection
  • Modern software development technologies
  • Software static analysis