Biochemical and Biophysical Research Communications, Vol.399, No.3, 402-405, 2010
Prediction of protein subcellular localization by weighted gene ontology terms
We develop a new approach to weighting gene ontology (GO) terms for predicting protein subcellular localization. The weight of each GO term, which reflects its contribution to the prediction algorithm, is determined by term-weighting methods used in text categorization. We evaluate several term-weighting methods based on inverse document frequency, information gain, gain ratio, odds ratio, and chi-square and its variants. Additionally, we propose a new term-weighting method based on the logarithmic transformation of chi-square. The proposed method performs better than the other term-weighting methods and also outperforms state-of-the-art subcellular localization prediction methods. It achieves overall accuracies of 98.1%, 99.3%, 98.1%, 98.1%, and 95.9% on the animal BaCelLo independent dataset (IDS), fungal BaCelLo IDS, animal Hoglund IDS, fungal Hoglund IDS, and PLOC dataset, respectively. Furthermore, the close correlation between highly weighted GO terms and subcellular localizations suggests that the proposed method weights GO terms appropriately according to their relevance to the localizations. © 2010 Elsevier Inc. All rights reserved.
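As an illustration of the kind of term weighting described above, the following is a minimal sketch rather than the authors' implementation. It computes the standard chi-square statistic from a 2x2 contingency table (GO term present or absent versus protein inside or outside a given localization) and then applies a logarithmic transformation. The abstract does not state the exact form of the transformation, so `log(1 + chi-square)` is an assumption made here; the function names and the toy proteins are likewise hypothetical.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a: proteins in the localization annotated with the GO term
    b: proteins outside the localization annotated with the GO term
    c: proteins in the localization lacking the GO term
    d: proteins outside the localization lacking the GO term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (a * d - c * b) ** 2 / denom


def log_chi_square(a, b, c, d):
    # ASSUMPTION: log(1 + chi2) is one plausible logarithmic transformation;
    # the abstract does not give the exact formula used in the paper.
    return math.log1p(chi_square(a, b, c, d))


def term_weights(proteins, location):
    """Weight every GO term with respect to one localization.

    proteins: list of (set of GO term IDs, localization label) pairs.
    """
    n_in = sum(1 for _, loc in proteins if loc == location)
    n_out = len(proteins) - n_in
    terms = set().union(*(go for go, _ in proteins))
    weights = {}
    for term in terms:
        a = sum(1 for go, loc in proteins if loc == location and term in go)
        b = sum(1 for go, loc in proteins if loc != location and term in go)
        weights[term] = log_chi_square(a, b, n_in - a, n_out - b)
    return weights


# Hypothetical toy data, for illustration only.
proteins = [
    ({"GO:0005634", "GO:0005515"}, "nucleus"),
    ({"GO:0005634"}, "nucleus"),
    ({"GO:0005739", "GO:0005515"}, "mitochondrion"),
    ({"GO:0005739"}, "mitochondrion"),
]

weights = term_weights(proteins, "nucleus")
for term, weight in sorted(weights.items()):
    print(term, round(weight, 3))
```

In this toy set, the term shared equally by both localizations receives a weight of zero, while the terms that occur in only one localization receive the highest weight. Chi-square is symmetric, so a term that is strongly associated with the absence of a localization scores as high as one associated with its presence.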