Please use this identifier to cite or link to this item: https://doi.org/10.1145/1287624.1287698
DC Field: Value
dc.title: Efficient token based clone detection with flexible tokenization
dc.contributor.author: Basit, H.A.
dc.contributor.author: Puglisi, S.J.
dc.contributor.author: Smyth, W.F.
dc.contributor.author: Turpin, A.
dc.contributor.author: Jarzabek, S.
dc.date.accessioned: 2013-07-04T08:21:52Z
dc.date.available: 2013-07-04T08:21:52Z
dc.date.issued: 2007
dc.identifier.citation: Basit, H.A., Puglisi, S.J., Smyth, W.F., Turpin, A., Jarzabek, S. (2007). Efficient token based clone detection with flexible tokenization. Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering 2007, ESEC-FSE'07: 513-516. ScholarBank@NUS Repository. https://doi.org/10.1145/1287624.1287698
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/41197
dc.description.abstract: Code clones are similar code fragments that occur at multiple locations in a software system. Detection of code clones provides useful information for maintenance, reengineering, program understanding and reuse. Several techniques have been proposed to detect code clones. These techniques differ in the code representation used for analysis of clones, ranging from plain text to parse trees and program dependence graphs. Clone detection based on lexical tokens involves minimal code transformation and gives good results, but is computationally expensive because of the large number of tokens that need to be compared. We explored string algorithms to find suitable data structures and algorithms for efficient token-based clone detection and implemented them in our tool Repeated Tokens Finder (RTF). Instead of using a suffix tree for string matching, we use the more memory-efficient suffix array. RTF incorporates a suffix-array-based linear-time algorithm to detect string matches. It also provides a simple and customizable tokenization mechanism. Initial analysis and experiments show that our clone detection is simple, scalable, and performs better than the previous well-known tools.
dc.source: Scopus
dc.subject: Clone detection
dc.subject: Reverse engineering
dc.subject: Software maintenance
dc.subject: Token-based clone detection
dc.type: Conference Paper
dc.contributor.department: COMPUTATIONAL SCIENCE
dc.description.doi: 10.1145/1287624.1287698
dc.description.sourcetitle: Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering 2007, ESEC-FSE'07
dc.description.page: 513-516
dc.identifier.isiut: NOT_IN_WOS
Appears in Collections: Staff Publications
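
The abstract describes finding repeated token sequences with a suffix array and a customizable tokenizer. The Python sketch below illustrates that general idea only; it is not RTF's implementation. All function names, the regular expression, and the `normalize_ids` option are assumptions made for illustration. The suffix array is built by naive sorting, which is not the linear-time construction the abstract refers to, and only pairs of suffixes that are adjacent in the suffix array are reported.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def tokenize(source, normalize_ids=False, keywords=frozenset()):
    """Split source into tokens; optionally map every non-keyword
    identifier to "ID" (an assumed example of flexible tokenization)."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    if normalize_ids:
        tokens = ["ID" if IDENT.fullmatch(t) and t not in keywords else t
                  for t in tokens]
    return tokens

def suffix_array(tokens):
    """Naive construction by sorting suffixes (not linear time)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def lcp_array(tokens, sa):
    """Kasai's algorithm: lcp[i] is the length of the common prefix
    of the suffixes starting at sa[i-1] and sa[i]."""
    n = len(tokens)
    rank = [0] * n
    for i, s in enumerate(sa):
        rank[s] = i
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and tokens[i + h] == tokens[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp

def find_clones(tokens, min_len):
    """Return (start_a, start_b, length) for repeated token runs of at
    least min_len tokens, keeping only left-maximal, non-overlapping pairs."""
    sa = suffix_array(tokens)
    lcp = lcp_array(tokens, sa)
    clones = []
    for i in range(1, len(sa)):
        length = lcp[i]
        if length < min_len:
            continue
        p, q = sorted((sa[i - 1], sa[i]))
        if p > 0 and tokens[p - 1] == tokens[q - 1]:
            continue  # extendable to the left, so a longer match covers it
        if q - p < length:
            continue  # the two occurrences overlap each other
        clones.append((p, q, length))
    return clones

code = "x = a + b; y = a + b;"
exact = find_clones(tokenize(code), min_len=3)
renamed = find_clones(tokenize(code, normalize_ids=True), min_len=3)
```

With exact tokens, the repeated run is `= a + b ;`; with identifiers normalized, the two statements match in full, which shows how the choice of tokenization changes what counts as a clone.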

Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.