Please use this identifier to cite or link to this item:
https://doi.org/10.1109/ACCESS.2020.2967449
Title: CAG: Stylometric Authorship Attribution of Multi-Author Documents Using a Co-Authorship Graph
Authors: Sarwar, R.; Urailertprasert, N.; Vannaboot, N.; Yu, C.; Rakthanmanon, T.; Chuangsuwanich, E.; Nutanong, S.
Keywords: authorship identification; co-authorship graph; multi-author documents; scientometrics; set similarity search; stylometry
Issue Date: 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Citation: Sarwar, R., Urailertprasert, N., Vannaboot, N., Yu, C., Rakthanmanon, T., Chuangsuwanich, E., Nutanong, S. (2020). CAG: Stylometric Authorship Attribution of Multi-Author Documents Using a Co-Authorship Graph. IEEE Access 8: 18374-18393. ScholarBank@NUS Repository. https://doi.org/10.1109/ACCESS.2020.2967449
Abstract: Stylometry has been successfully applied to authorship identification of single-author documents (AISD). The AISD task is concerned with identifying the original author of an anonymous document from a group of candidate authors. However, AISD techniques are not applicable to authorship identification of multi-author documents (AIMD). Unlike AISD, where each document is written by a single author, AIMD deals with documents that have several authors. Because of the combinatorial nature of multi-author documents, AIMD lacks ground-truth information, that is, information on which listed authors did and did not write a given multi-author document, and this makes the problem more challenging to solve. Previous AIMD solutions have several limitations: (i) the best stylometry-based AIMD solution has low accuracy (less than 30%); (ii) increasing the number of co-authors per paper degrades the performance of AIMD solutions; and (iii) AIMD solutions were not designed to handle non-writing authors (NWAs). NWAs nevertheless occur in real-world cases: in some papers, not every listed co-author contributed as a writer. This paper proposes an AIMD framework called the Co-Authorship Graph (CAG) that can be used to (i) capture the stylistic information of each author in a corpus of multi-author documents and (ii) make a multi-label prediction for a multi-author query document. We conducted extensive experimental studies on one synthetic and three real-world corpora. Experimental results show that the proposed framework (i) significantly outperformed competitive techniques; (ii) can effectively handle a larger number of co-authors than competitive techniques; and (iii) can effectively handle NWAs in multi-author documents. © 2013 IEEE.
Source Title: IEEE Access
URI: https://scholarbank.nus.edu.sg/handle/10635/198973
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2967449
Appears in Collections: Staff Publications Elements
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
10_1109_ACCESS_2020_2967449.pdf | | 1.89 MB | Adobe PDF | OPEN | None
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.