CAG : Stylometric Authorship Attribution of Multi-Author Documents Using a Co-Authorship Graph

Please use this identifier to cite or link to this item: https://doi.org/10.1109/ACCESS.2020.2967449

DC Field	Value
dc.title	CAG : Stylometric Authorship Attribution of Multi-Author Documents Using a Co-Authorship Graph
dc.contributor.author	Sarwar, R.
dc.contributor.author	Urailertprasert, N.
dc.contributor.author	Vannaboot, N.
dc.contributor.author	Yu, C.
dc.contributor.author	Rakthanmanon, T.
dc.contributor.author	Chuangsuwanich, E.
dc.contributor.author	Nutanong, S.
dc.date.accessioned	2021-08-24T02:39:31Z
dc.date.available	2021-08-24T02:39:31Z
dc.date.issued	2020
dc.identifier.citation	Sarwar, R., Urailertprasert, N., Vannaboot, N., Yu, C., Rakthanmanon, T., Chuangsuwanich, E., Nutanong, S. (2020). CAG : Stylometric Authorship Attribution of Multi-Author Documents Using a Co-Authorship Graph. IEEE Access 8 : 18374-18393. ScholarBank@NUS Repository. https://doi.org/10.1109/ACCESS.2020.2967449
dc.identifier.issn	2169-3536
dc.identifier.uri	https://scholarbank.nus.edu.sg/handle/10635/198973
dc.description.abstract	Stylometry has been successfully applied to perform authorship identification of single-author documents (AISD). The AISD task is concerned with identifying the original author of an anonymous document from a group of candidate authors. However, AISD techniques are not applicable to the authorship identification of multi-author documents (AIMD). Unlike AISD, where each document is written by one single author, AIMD focuses on handling multi-author documents. Due to the combinatoric nature of documents, AIMD lacks the ground truth information - that is, information on writing and non-writing authors in a multi-author document - which makes this problem more challenging to solve. Previous AIMD solutions have a number of limitations: (i) the best stylometry-based AIMD solution has a low accuracy, less than 30%; (ii) increasing the number of co-authors of papers adversely affects the performance of AIMD solutions; and (iii) AIMD solutions were not designed to handle the non-writing authors (NWAs). However, NWAs exist in real-world cases - that is, there are papers for which not every co-author listed has contributed as a writer. This paper proposes an AIMD framework called the Co-Authorship Graph that can be used to (i) capture the stylistic information of each author in a corpus of multi-author documents and (ii) make a multi-label prediction for a multi-author query document. We conducted extensive experimental studies on one synthetic and three real-world corpora. Experimental results show that our proposed framework (i) significantly outperformed competitive techniques; (ii) can effectively handle a larger number of co-authors in comparison with competitive techniques; and (iii) can effectively handle NWAs in multi-author documents. © 2013 IEEE.
dc.publisher	Institute of Electrical and Electronics Engineers Inc.
dc.source	Scopus OA2020
dc.subject	authorship identification
dc.subject	co-authorship graph
dc.subject	multi-author documents
dc.subject	scientometrics
dc.subject	Set similarity search
dc.subject	stylometry
dc.type	Article
dc.contributor.department	DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi	10.1109/ACCESS.2020.2967449
dc.description.sourcetitle	IEEE Access
dc.description.volume	8
dc.description.page	18374-18393
Appears in Collections:	Staff Publications Elements

Files in This Item:

File	Description	Size	Format	Access Settings	Version
10_1109_ACCESS_2020_2967449.pdf		1.89 MB	Adobe PDF	OPEN	None
