Polymer, Vol.45, No.2, 525-546, 2004
On the use of secondary structure in protein structure prediction: a bioinformatic analysis
The amount of structural information encoded in secondary structure can be measured by its ability to specify the correct peptide backbone conformation of protein chains. Using methodology derived from information theory, we generate optimized distributions of backbone phi-psi dihedral angle pairs given either correct or predicted three-state secondary structure. Entropy measurements on these distributions provide a means to determine the effect of secondary structure knowledge on identifying the actual 3D conformation of protein chains. We find that only a modest fraction of the total uncertainty in phi-psi conformation (from 14% to 38%, at resolutions of 20° to 90°, respectively) is resolved even with perfect knowledge of secondary structure. We further show that prediction of secondary structures, because of an accuracy ceiling below 80%, degrades structural information substantially. If prediction accuracy is below 50%, virtually no advantage is gained from using the prediction. Moreover, even state-of-the-art prediction accuracy of 75% retains less than one-third of the structural information encoded in secondary structure. We demonstrate that the level of structural description affects the amount of information extracted. The effort to provide as much structural detail as possible, while faced with a limited structural data set, results in an optimum resolution in the vicinity of a 20° partition of the (phi, psi) plane. We show that structural information increases exponentially with prediction accuracy, revealing that even marginal gains in the performance of secondary structure prediction algorithms are important for the retention of structural information.
We observe that different kinds of secondary structure prediction outputs (single-state prediction, single-state prediction with a confidence index, and three-state probability prediction) do not differ greatly in the amount of structural information they yield, so long as the methods formulated in this work to generate propensity distributions are applied appropriately. The optimal phi-psi probability distributions developed here may be useful in biasing searches in structure space. We discuss the sources of the degradation of information caused by errors in secondary structure prediction, and their consequences for the prediction of the 3D conformation of protein chains. (C) 2003 Elsevier Ltd. All rights reserved.
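The entropy measurement described in the abstract can be illustrated with a minimal sketch. It assumes a plain frequency-count (plug-in) entropy estimate on a uniform grid over the (phi, psi) plane, which is a simplification and not the optimized distributions the authors construct. The function names (`bin_index`, `entropy`, `fraction_resolved`) and the toy residues are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bin_index(phi, psi, resolution):
    """Map a (phi, psi) pair in degrees to a cell of a uniform grid.

    Angles are assumed to lie in [-180, 180]; +180 wraps onto the first cell.
    """
    n = int(360 // resolution)
    i = int((phi + 180.0) // resolution) % n
    j = int((psi + 180.0) // resolution) % n
    return i, j


def entropy(counts):
    """Shannon entropy in bits of the empirical distribution in a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def fraction_resolved(samples, resolution):
    """Return (H(cell), H(cell | state), fraction of uncertainty resolved).

    samples: iterable of (phi, psi, state) with state a three-state label.
    This is a plug-in estimate, assumed here for illustration only.
    """
    samples = list(samples)
    cells = [bin_index(phi, psi, resolution) for phi, psi, _ in samples]
    h_total = entropy(Counter(cells))

    # Group the occupied cells by secondary-structure state.
    by_state = {}
    for cell, (_, _, state) in zip(cells, samples):
        by_state.setdefault(state, Counter())[cell] += 1

    # Conditional entropy: state-weighted average of per-state entropies.
    n = len(samples)
    h_cond = sum(sum(c.values()) / n * entropy(c) for c in by_state.values())

    fraction = (h_total - h_cond) / h_total if h_total > 0 else 0.0
    return h_total, h_cond, fraction


# Hypothetical toy residues: (phi, psi, state) with H = helix, E = strand, C = coil.
toy = [
    (-60, -45, "H"), (-65, -40, "H"),
    (-120, 130, "E"), (-130, 140, "E"),
    (-60, 140, "C"), (60, 40, "C"),
]
h_total, h_cond, fraction = fraction_resolved(toy, resolution=90)
```

With real data, a finer grid raises both entropies, and sparse counts bias a plug-in estimate downward, which is one reason a limited data set imposes a practical resolution limit.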