Review Sharing via Deep Semi-Supervised Code Clone Detection

Please use this identifier to cite or link to this item: https://doi.org/10.1109/ACCESS.2020.2966532

DC Field	Value
dc.title	Review Sharing via Deep Semi-Supervised Code Clone Detection
dc.contributor.author	Guo, C.
dc.contributor.author	Yang, H.
dc.contributor.author	Huang, D.
dc.contributor.author	Zhang, J.
dc.contributor.author	Dong, N.
dc.contributor.author	Xu, J.
dc.contributor.author	Zhu, J.
dc.date.accessioned	2021-08-19T02:16:58Z
dc.date.available	2021-08-19T02:16:58Z
dc.date.issued	2020
dc.identifier.citation	Guo, C., Yang, H., Huang, D., Zhang, J., Dong, N., Xu, J., Zhu, J. (2020). Review Sharing via Deep Semi-Supervised Code Clone Detection. IEEE Access 8: 24948-24965. ScholarBank@NUS Repository. https://doi.org/10.1109/ACCESS.2020.2966532
dc.identifier.issn	2169-3536
dc.identifier.uri	https://scholarbank.nus.edu.sg/handle/10635/197871
dc.description.abstract	Code review, as a typical type of user feedback, has recently drawn increasing attention for improving code quality. To carry out research on code review, sufficient review data is normally required. As a result, recent efforts commonly focus on analysis for projects with sufficient reviews (called 's-projects'), rather than projects with extremely few reviews (called 'f-projects'). In fact, statistics on public platforms show that the latter dominate open source software, and novel approaches should be explored to improve their review-based code improvement. In this paper, we try to address the problem by building a review sharing channel through which informative reviews can be reasonably delivered from s-projects to f-projects. To ensure the accuracy of shared reviews, we introduce a novel code clone detection model based on a Convolutional Neural Network (CNN), and build suitable 's-projects, f-projects' pairs through clone detection. In particular, to alleviate the dataset heterogeneity between training and testing, an autoencoder-based semi-supervised learning strategy is employed. Furthermore, to improve the sharing experience, heuristic filtering tactics are applied to reduce the time cost. Meanwhile, an LDA (Latent Dirichlet Allocation)-based ranking algorithm is used to present diverse review themes. We have implemented the sharing channel as a prototype system, RSharer+, which contains three representative modules: data preprocessing, code clone detection and review presentation. In the data preprocessing module, the collected datasets are first transformed into context-sensitive numerical vectors. Then, in the clone detection module, the data vectors are trained and tested on BigCloneBench and on real code-review pairs. Finally, the presentation module provides review classification and theme extraction for a better sharing experience. Extensive comparative experiments on hundreds of real labelled code fragments demonstrate the precision of clone detection and the effectiveness of review sharing. © 2013 IEEE.
dc.publisher	Institute of Electrical and Electronics Engineers Inc.
dc.source	Scopus OA2020
dc.subject	Code clone
dc.subject	deep learning
dc.subject	review sharing
dc.subject	semi-supervised CNN
dc.subject	software review
dc.type	Article
dc.contributor.department	DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi	10.1109/ACCESS.2020.2966532
dc.description.sourcetitle	IEEE Access
dc.description.volume	8
dc.description.page	24948-24965
Appears in Collections:	Elements Staff Publications

Files in This Item:

File	Description	Size	Format	Access Settings	Version
10_1109_ACCESS_2020_2966532.pdf		2.29 MB	Adobe PDF	OPEN	None