Biochemical and Biophysical Research Communications, Vol.357, No.2, 453-460, 2007
Prediction of protein structural class for the twilight zone sequences
Structural class characterizes the overall folding type of a protein or its domain. This paper develops an accurate method for in silico prediction of structural classes from low-homology (twilight zone) protein sequences. The proposed LLSC-PRED method applies a linear logistic regression classifier and a custom-designed, feature-based sequence representation to provide predictions. The main advantages of LLSC-PRED are its comprehensive representation, which includes 58 features describing the composition and physicochemical properties of the sequences, and the transparency of the prediction model. The representation also includes predicted secondary structure content, thus for the first time exploring synergy between these two related predictions. Based on tests performed with a large set of 1673 twilight zone domains, LLSC-PRED's prediction accuracy, which exceeds 62%, is shown to be better than the accuracy of over a dozen recently published competing in silico methods and similar to the accuracy of other, non-transparent classifiers that use the proposed representation. (c) 2007 Elsevier Inc. All rights reserved.
Keywords: SCOP structural class; structural class prediction; secondary structure; twilight zone proteins; low sequence homology; sequence representation
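
The general idea of a feature-based sequence representation scored by a linear logistic model can be illustrated with a minimal Python sketch. It is not the LLSC-PRED implementation: it uses only the 20 amino-acid composition fractions rather than the 58 features described in the abstract, it assumes the four main SCOP classes as labels, and the function names, weights, and biases are hypothetical placeholders that would have to be fitted on training data.

```python
import math

# The 20 standard amino acids, in a fixed order that defines the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed label set: the four main SCOP structural classes.
CLASSES = ["all-alpha", "all-beta", "alpha/beta", "alpha+beta"]


def composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def predict_proba(features, weights, biases):
    """Multinomial logistic model: softmax over one linear score per class.

    weights holds one list of coefficients per class; biases holds one
    intercept per class. Both are hypothetical here, not fitted values.
    """
    scores = [
        bias + sum(w * x for w, x in zip(class_weights, features))
        for class_weights, bias in zip(weights, biases)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Placeholder parameters (all zeros), so every class gets equal probability.
zero_weights = [[0.0] * len(AMINO_ACIDS) for _ in CLASSES]
zero_biases = [0.0] * len(CLASSES)

features = composition("MKTAYIAKQR")
probabilities = predict_proba(features, zero_weights, zero_biases)
```

Because each class score is a plain weighted sum of the features, the fitted coefficients of such a model can be read directly, which is one way to understand the transparency that the abstract attributes to the prediction model.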