Process Biochemistry, Vol.44, No.6, 654-660, 2009
Discriminating acidic and alkaline enzymes using a random forest model with secondary structure amino acid composition
Understanding how enzymes adapt to pH extremes, and discriminating acidic from alkaline enzymes, is a challenging task that would help in designing stable enzymes. In this work, we systematically analyzed the secondary structure amino acid compositions of 105 acidic and 111 alkaline enzymes. We found that the propensity of individual residues to participate in different secondary structures might be a general stability mechanism for adaptation to pH extremes. Based on this finding, we present a secondary structure amino acid composition method for extracting useful features from sequence, and use a novel ensemble classifier, random forest, for classification. The overall prediction accuracy, evaluated by 10-fold cross-validation, reached 90.7%. Compared with other feature extraction methods, our method improved the overall prediction accuracy by 5.5% to 21.2%. The random forest algorithm also outperformed other machine learning techniques, with improvements ranging from 3.2% to 19.9%. Crown Copyright (C) 2009 Published by Elsevier Ltd. All rights reserved.
Keywords: Acidic enzyme; Alkaline enzyme; Adaptation mechanism; Secondary structure amino acid composition; Feature extraction; Random forests
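
The abstract does not give the exact definition of the secondary structure amino acid composition features, so the following is only a minimal sketch of one plausible reading: count each of the 20 standard residues separately within each secondary structure state and normalise by sequence length. The three-state alphabet (H, E, C), the normalisation, and the function name `ss_aa_composition` are assumptions made for illustration, not the authors' published procedure.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SS_STATES = "HEC"  # assumed 3-state alphabet: helix, strand, coil


def ss_aa_composition(sequence: str, structure: str) -> List[float]:
    """Return a 60-dimensional vector (3 states x 20 residues).

    Each entry is the fraction of positions in the sequence that hold a
    given residue in a given secondary structure state. Positions with a
    non-standard residue or an unknown state are ignored in the counts
    but still contribute to the sequence length used for normalisation.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    if not sequence:
        raise ValueError("sequence must not be empty")

    counts = {(ss, aa): 0 for ss in SS_STATES for aa in AMINO_ACIDS}
    for aa, ss in zip(sequence.upper(), structure.upper()):
        key = (ss, aa)
        if key in counts:
            counts[key] += 1

    n = len(sequence)
    # Fixed ordering: all helix entries, then strand, then coil.
    return [counts[(ss, aa)] / n for ss in SS_STATES for aa in AMINO_ACIDS]


# Toy input (hypothetical, not data from the paper).
features = ss_aa_composition("ACDA", "HHEC")
```

Vectors of this kind, one per enzyme, could then be passed to any random forest implementation and scored with 10-fold cross-validation; the secondary structure string would have to come from a predictor or from solved structures, which the abstract does not specify.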