|Title:||Using feature generation and feature selection for accurate prediction of translation initiation sites.|
|Authors:||Zeng, F.; Yap, R.H.; Wong, L.|
|Issue Date:||2002|
|Citation:||Zeng, F., Yap, R.H., Wong, L. (2002). Using feature generation and feature selection for accurate prediction of translation initiation sites. Genome informatics series : proceedings of the . Workshop on Genome Informatics. Workshop on Genome Informatics 13 : 192-200. ScholarBank@NUS Repository.|
|Abstract:||Correct prediction of the translation initiation site (TIS) is an important issue in genomic research. We show that feature generation together with correlation based feature selection can be used with a variety of machine learning algorithms to give highly accurate translation initiation site prediction. Only very few features are needed and the results achieve comparable accuracy to the best existing approaches. Our approach has the advantage that it does not require one to devise a special prediction method; rather standard machine learning classifiers are shown to give very good performance on the selected features. The raw and generated features which we have found to be important are the following: positions -3 and -1 in the sequence; upstream k-grams for k=3, 4, and 5; stop-codon frequency; downstream in-frame 3-gram; and the distance of ATG to the beginning of the sequence. The best result, with an overall accuracy of 90%, is obtained by selecting only seven features from this set. The same features, retrained with the use of a scanning model, achieve an overall accuracy of 94% on this dataset.|
|Source Title:||Genome informatics series : proceedings of the . Workshop on Genome Informatics. Workshop on Genome Informatics|
|URI:||http://scholarbank.nus.edu.sg/handle/10635/39186|
|ISSN:||0919-9454|
|Appears in Collections:||Staff Publications|
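The feature types named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the feature naming scheme, the choice to count stop codons only in-frame downstream of the ATG, and the position convention (-1 is the base immediately before the A of ATG) are all assumptions made for illustration. Feature selection and classifier training are not shown.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kgram_counts(seq, k, step=1):
    """Count the k-grams in seq, advancing `step` bases at a time."""
    return Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, step))


def tis_features(seq, atg_pos, k_values=(3, 4, 5)):
    """Build a feature dict for the candidate ATG at 0-based index atg_pos.

    Hypothetical helper: the feature definitions below are assumptions
    based on the abstract's wording, not the paper's exact definitions.
    """
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("atg_pos does not point at an ATG")

    upstream = seq[:atg_pos]
    downstream = seq[atg_pos + 3:]

    features = {
        # Bases at positions -3 and -1 relative to the A of ATG (assumed convention).
        "pos_-3": seq[atg_pos - 3] if atg_pos >= 3 else None,
        "pos_-1": seq[atg_pos - 1] if atg_pos >= 1 else None,
        # Distance of the ATG from the beginning of the sequence.
        "dist_to_start": atg_pos,
    }

    # Upstream k-gram counts for k = 3, 4, 5.
    for k in k_values:
        for gram, n in kgram_counts(upstream, k).items():
            features[f"up_{k}gram_{gram}"] = n

    # Downstream in-frame 3-grams (codons read in the ATG's frame).
    in_frame = kgram_counts(downstream, 3, step=3)
    for gram, n in in_frame.items():
        features[f"down_inframe_{gram}"] = n

    # Stop-codon count, taken here over the in-frame downstream codons (assumption).
    features["stop_codon_count"] = sum(in_frame[c] for c in STOP_CODONS)
    return features


example = tis_features("GCCACCATGGCTTAAGGA", 6)
```

A feature dict like `example` would then be passed to a feature selection step and a standard classifier, which is the part of the pipeline the abstract reports results for.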
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.