Software Code Clone Detection Using AST

International Journal of P2P Network Trends and Technology (IJPTT)          
 
© 2014 by IJPTT Journal
Volume - 4 Issue - 3 
Year of Publication : 2014
Authors : G. Anil kumar, Dr. C.R.K. Reddy, Dr. A. Govardhan

Citation

G. Anil kumar, Dr. C.R.K. Reddy, Dr. A. Govardhan. "Software Code Clone Detection Using AST". International Journal of P2P Network Trends and Technology (IJPTT), V4(3):38-43, May - Jun 2014, ISSN: 2249-2615, www.ijpttjournal.org, Published by Seventh Sense Research Group.

Abstract

Existing research suggests that a considerable portion (10-15%) of the source code of large-scale computer programs is duplicated. Detecting and removing such clones promises to reduce software maintenance costs by possibly the same magnitude. Previous work was limited to detecting either near misses that differ only in single lexemes, or near misses between complete functions, and it did not suggest practical means for removing detected clones. This paper presents simple and practical methods for detecting exact and near-miss clones over arbitrary program fragments in source code by using abstract syntax trees. Because our methods operate on program structure, clones can be removed mechanically by producing in-lined procedures or standard preprocessor macros. A tool using these techniques is applied to a C production software system of some 500K source lines, and the results confirm the levels of duplication found by previous work. The tool produces the macro bodies needed for clone removal and the macro invocations that replace the clones. It uses a variation of the well-known compiler method for detecting common sub-expressions. This method determines exact tree matches; a number of adjustments are needed to detect equivalent statement sequences, commutative operands, and nearly exact matches. We additionally suggest that clone detection could be useful in producing more structured code, and in reverse engineering to discover domain concepts and their implementations.
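The core idea in the abstract, grouping structurally identical subtrees much as a compiler detects common sub-expressions, can be illustrated with a small sketch. This is not the authors' tool: it uses Python's standard `ast` module in place of a C parser, and the names `fingerprint`, `find_clones`, `min_size`, and `abstract_leaves` are hypothetical. Sorting the operands of commutative operators and blanking identifier and constant leaves are simplified stand-ins for the adjustments the abstract mentions (commutative operands, nearly exact matches). Equivalent statement sequences are not handled here.

```python
import ast
from collections import defaultdict

# Operators treated as commutative. This is an assumption for illustration:
# in Python, "+" on strings or lists is not actually commutative.
COMMUTATIVE = (ast.Add, ast.Mult, ast.BitAnd, ast.BitOr, ast.BitXor)


def fingerprint(node, abstract_leaves=False):
    """Return a string that is equal for structurally equal subtrees."""
    if isinstance(node, list):
        return "[" + ",".join(fingerprint(n, abstract_leaves) for n in node) + "]"
    if not isinstance(node, ast.AST):
        return repr(node)  # plain field value such as an identifier string
    if abstract_leaves and isinstance(node, ast.Name):
        return "Name(_)"  # ignore the identifier (crude near-miss matching)
    if abstract_leaves and isinstance(node, ast.Constant):
        return "Constant(_)"  # ignore the literal value
    if isinstance(node, ast.BinOp) and isinstance(node.op, COMMUTATIVE):
        # Order the operands canonically so that a*b and b*a match.
        left, right = sorted(
            [fingerprint(node.left, abstract_leaves),
             fingerprint(node.right, abstract_leaves)]
        )
        return f"BinOp({type(node.op).__name__},{left},{right})"
    # iter_fields skips position attributes (lineno, col_offset), so two
    # copies of the same code at different locations get the same key.
    parts = [
        f"{name}={fingerprint(value, abstract_leaves)}"
        for name, value in ast.iter_fields(node)
    ]
    return f"{type(node).__name__}({','.join(parts)})"


def find_clones(source, min_size=8, abstract_leaves=False):
    """Group statement and expression subtrees that share a fingerprint."""
    buckets = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.stmt, ast.expr)):
            continue
        # Skip tiny subtrees; otherwise every lone variable is a "clone".
        if sum(1 for _ in ast.walk(node)) < min_size:
            continue
        buckets[fingerprint(node, abstract_leaves)].append(node)
    return [nodes for nodes in buckets.values() if len(nodes) > 1]
```

Calling `find_clones(source)` returns lists of AST nodes, and each node's `lineno` attribute locates the cloned fragment in the source. A production tool would hash the fingerprints rather than compare full strings, and would score near misses by similarity instead of blanking every leaf.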

Keywords
Software maintenance, clone detection, software evaluation, Design Maintenance System.