화학공학소재연구정보센터
Biochemical and Biophysical Research Communications, Vol.339, No.3, 1015-1020, 2006
Prediction of protease types in a hybridization space
Regulating most physiological processes by controlling the activation, synthesis, and turnover of proteins, proteases play pivotal regulatory roles in conception, birth, digestion, growth, maturation, ageing, and death of all organisms. Different types of proteases have different functions and biological processes. Therefore, it is important for both basic research and drug discovery to consider the following two problems. (1) Given the sequence of a protein, can we identify whether it is a protease or non-protease? (2) If it is, what protease type does it belong to? Although the two problems can be solved by various experimental means, it is both time-consuming and costly to do so. The avalanche of protein sequences generated in the post-genetic era has challenged us to develop an automated method for making a fast and reliable identification. By hybridizing the functional domain composition and pseudo-amino acid composition, we have introduced a new method called "FunD-PseAA(1) predictor" that is operated in a hybridization space. To avoid redundancy and bias, demonstrations were performed on a dataset where none of the proteins has >= 25% sequence identity to any other. The overall success rate thus obtained by the jackknife cross-validation test in identifying protease and non-protease was 92.95%, and that in identifying the protease type was 94.75% among the following six types: (1) aspartic, (2) cysteine, (3) glutamic, (4) metallo, (5) serine, and (6) threonine. Demonstration was also made on an independent dataset, and the corresponding overall success rates were 98.36% and 97.11%, respectively, suggesting the FunD-PseAA predictor is very powerful and may become a useful tool in bioinformatics and proteomics. (c) 2005 Elsevier Inc. All rights reserved.