Efficient token based clone detection with flexible tokenization

Please use this identifier to cite or link to this item: https://doi.org/10.1145/1287624.1287698

DC Field	Value
dc.title	Efficient token based clone detection with flexible tokenization
dc.contributor.author	Basit, H.A.
dc.contributor.author	Puglisi, S.J.
dc.contributor.author	Smyth, W.F.
dc.contributor.author	Turpin, A.
dc.contributor.author	Jarzabek, S.
dc.date.accessioned	2013-07-04T08:21:52Z
dc.date.available	2013-07-04T08:21:52Z
dc.date.issued	2007
dc.identifier.citation	Basit, H.A., Puglisi, S.J., Smyth, W.F., Turpin, A., Jarzabek, S. (2007). Efficient token based clone detection with flexible tokenization. Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering 2007, ESEC-FSE'07 : 513-516. ScholarBank@NUS Repository. https://doi.org/10.1145/1287624.1287698
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/41197
dc.description.abstract	Code clones are similar code fragments that occur at multiple locations in a software system. Detection of code clones provides useful information for maintenance, reengineering, program understanding and reuse. Several techniques have been proposed to detect code clones. These techniques differ in the code representation used for analysis of clones, ranging from plain text to parse trees and program dependence graphs. Clone detection based on lexical tokens involves minimal code transformation and gives good results, but is computationally expensive because of the large number of tokens that need to be compared. We explored string algorithms to find suitable data structures and algorithms for efficient token-based clone detection and implemented them in our tool, Repeated Tokens Finder (RTF). Instead of using a suffix tree for string matching, we use the more memory-efficient suffix array. RTF incorporates a suffix-array-based linear-time algorithm to detect string matches. It also provides a simple and customizable tokenization mechanism. Initial analysis and experiments show that our clone detection is simple, scalable, and performs better than previous well-known tools.
dc.source	Scopus
dc.subject	Clone detection
dc.subject	Reverse engineering
dc.subject	Software maintenance
dc.subject	Token-based clone detection
dc.type	Conference Paper
dc.contributor.department	COMPUTATIONAL SCIENCE
dc.description.doi	10.1145/1287624.1287698
dc.description.sourcetitle	Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering 2007, ESEC-FSE'07
dc.description.page	513-516
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications
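
The approach summarized in the abstract (tokenize the source, build a suffix array over the token sequence, then read repeated token runs off adjacent suffixes) can be illustrated with a minimal sketch. This is not RTF's implementation: the function names, the keyword set, and the identifier-normalization rule are all hypothetical, and the suffix array is built with a naive sort rather than the linear-time algorithm the abstract refers to.

```python
import re

# Hypothetical token pattern: identifiers/keywords, integer literals, or any
# other single non-space character. A real tokenizer would be language-aware.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"if", "else", "for", "while", "return", "int"}  # assumed, illustrative only


def tokenize(source, normalize_identifiers=True):
    """Split source into tokens; optionally map every identifier to 'ID'."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        is_identifier = (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS
        tokens.append("ID" if normalize_identifiers and is_identifier else tok)
    return tokens


def suffix_array(tokens):
    """Naive suffix array: sort suffix start positions by the suffix itself.

    This is far from linear time; it stands in for an efficient construction.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def lcp_array(tokens, sa):
    """Kasai's algorithm: lcp[k] is the common prefix length of the suffixes
    starting at sa[k-1] and sa[k]."""
    n = len(tokens)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and tokens[i + h] == tokens[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def repeated_runs(tokens, min_len):
    """Return (start_a, start_b, length) for adjacent suffixes that share at
    least min_len leading tokens."""
    sa = suffix_array(tokens)
    lcp = lcp_array(tokens, sa)
    return [(sa[k - 1], sa[k], lcp[k]) for k in range(1, len(sa)) if lcp[k] >= min_len]


if __name__ == "__main__":
    code = "x = y + 1; p = q + 1;"
    print(repeated_runs(tokenize(code), min_len=6))
```

With identifiers normalized, the two statements in the example become the same six-token sequence and are reported as one repeated run; with `normalize_identifiers=False`, only the shared tail `+ 1 ;` matches. Making the tokenization rule a parameter is one way to read the "flexible tokenization" in the title, though the paper's actual mechanism may differ.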

Files in This Item:

There are no files associated with this item.