Title: A multilabel text classification algorithm for labeling risk factors in SEC Form 10-K
Authors: Huang, K.-W. 
Li, Z.
Keywords: Annual reports
Multilabel classification
Risk factors
Text classification
Text mining
Issue Date: 2011
Citation: Huang, K.-W., Li, Z. (2011). A multilabel text classification algorithm for labeling risk factors in SEC Form 10-K. ACM Transactions on Management Information Systems 2 (3). ScholarBank@NUS Repository.
Abstract: This study develops, implements, and evaluates a multilabel text classification algorithm called the multilabel categorical K-nearest neighbor (ML-CKNN). The proposed algorithm is designed to automatically identify 25 types of risk factors with specific meanings reported in Section 1A of SEC Form 10-K. The idea of ML-CKNN is to compute a categorical similarity score for each label from the K nearest neighbors in that category. ML-CKNN is tailored to the goal of extracting risk factors from 10-Ks. The proposed algorithm can perfectly classify 74.94% of risk factors and 98.75% of labels. Moreover, ML-CKNN is empirically shown to outperform ML-KNN and other multilabel algorithms. The extracted risk factors could be valuable to empirical studies in accounting or finance. © 2011 ACM.
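Illustrative sketch (not from the paper): the abstract only states that ML-CKNN computes a per-label score from the K nearest neighbors within each category. The Python below is one possible reading of that sentence, under assumptions that the abstract does not confirm: bag-of-words cosine similarity, a per-label score equal to the mean similarity of the k most similar training texts carrying that label, and a fixed threshold for assigning labels. The function names, the `TRAIN` examples, and the label names are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorical_knn_scores(query, train, k=2):
    """Score each label by the mean similarity of the k most similar
    training texts that carry that label (an assumed aggregation)."""
    q = Counter(query.lower().split())
    sims_by_label = {}
    for text, labels in train:
        sim = cosine(q, Counter(text.lower().split()))
        for label in labels:
            sims_by_label.setdefault(label, []).append(sim)
    scores = {}
    for label, sims in sims_by_label.items():
        top = sorted(sims, reverse=True)[:k]  # neighbors within this category only
        scores[label] = sum(top) / len(top)
    return scores


def predict_labels(query, train, k=2, threshold=0.3):
    """Assign every label whose score reaches the (assumed) threshold."""
    scores = categorical_knn_scores(query, train, k)
    return {label for label, score in scores.items() if score >= threshold}


# Hypothetical toy training set: (risk-factor text, set of labels).
TRAIN = [
    ("we depend on key suppliers", {"supplier"}),
    ("loss of key suppliers could harm operations", {"supplier", "operations"}),
    ("currency exchange rates may fluctuate", {"currency"}),
    ("exchange rates and interest rates may change", {"currency", "interest"}),
]

if __name__ == "__main__":
    query = "currency exchange rates may fluctuate"
    print(categorical_knn_scores(query, TRAIN, k=1))
    print(predict_labels(query, TRAIN, k=1, threshold=0.9))
```

The difference from a plain KNN vote in this sketch is that neighbors are ranked separately inside each category, so a rare label is scored against its own closest examples instead of being outvoted by frequent labels.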
Source Title: ACM Transactions on Management Information Systems
ISSN: 2158-656X
DOI: 10.1145/2019618.2019624
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.