Journal of the American Chemical Society, Vol.130, No.1, 176-185, 2008
Selecting folded proteins from a library of secondary structural elements
A protein evolution strategy is described in which double-stranded DNA fragments encoding defined Escherichia coli protein secondary structural elements (α-helices, β-strands, and loops) are assembled semirandomly into sequences of as many as 800 amino acid residues. A library of novel polypeptides generated with this system was inserted into an enhanced green fluorescent protein (EGFP) fusion vector. Library members were screened by fluorescence-activated cell sorting (FACS) to identify polypeptides that fold into soluble, stable structures in vivo; these comprised a subset of shorter sequences (~60 to 100 residues) from the semirandom sequence library. Approximately 10⁸ clones were screened by FACS, a set of 1149 high-fluorescence colonies was characterized by dPCR, and four soluble clones with varying amounts of secondary structure were identified. One of these is highly homologous to a domain of aspartate racemase from a marine bacterium (Polaromonas sp.) but is not homologous to any E. coli protein sequence. Several other selected polypeptides have no global sequence homology to any known protein but show significant α-helical content, limited dispersion in 1D nuclear magnetic resonance spectra, pH-sensitive ANS binding, and reversible folding into soluble structures. These results demonstrate that this strategy can generate novel polypeptide sequences containing secondary structure.
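The semirandom assembly step can be illustrated with a toy in-silico sketch. This is not the authors' protocol: the fragment sequences in `FRAGMENTS`, the rule of placing a loop between consecutive structured elements, the `stop_prob` parameter, and the function name `assemble` are all hypothetical assumptions chosen for illustration. Only the 800-residue upper bound comes from the abstract.

```python
import random

# Hypothetical placeholder fragments (one-letter amino acid codes), grouped by
# the secondary-structure element they are meant to encode. These are NOT the
# E. coli-derived fragments used in the paper.
FRAGMENTS = {
    "helix": ["AEELLKKAEELLKR", "EELLRKAAELLKE"],
    "strand": ["KVTVEV", "TYRVEI"],
    "loop": ["GDG", "NGKP"],
}

MAX_RESIDUES = 800  # upper length reported for the assembled sequences


def assemble(rng, stop_prob=0.1, max_residues=MAX_RESIDUES):
    """Concatenate randomly chosen element fragments into one polypeptide.

    Assumed (illustrative) rules: each step picks a helix or strand fragment
    at random; a random loop fragment joins it to the previous element;
    assembly stops with probability stop_prob after each addition, or
    before the sequence would exceed max_residues.
    """
    parts = []
    length = 0
    while True:
        kind = rng.choice(["helix", "strand"])
        block = rng.choice(FRAGMENTS[kind])
        # No linker is needed before the very first element.
        linker = rng.choice(FRAGMENTS["loop"]) if parts else ""
        if length + len(linker) + len(block) > max_residues:
            break
        parts.append(linker + block)
        length += len(linker) + len(block)
        if rng.random() < stop_prob:
            break
    return "".join(parts)


if __name__ == "__main__":
    rng = random.Random(0)
    library = [assemble(rng) for _ in range(1000)]
    # Members in the ~60 to 100 residue range, the size class the abstract
    # reports among the selected folded clones.
    short = [s for s in library if 60 <= len(s) <= 100]
    print(len(library), len(short))
```

In a real experiment the selection step is physical (EGFP fluorescence measured by FACS), so the length filter at the end only mimics the observed size range of the hits; it is not a model of folding.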