Biochemical and Biophysical Research Communications, Vol.307, No.2, 382-388, 2003
ZCURVE_CoV: a new system to recognize protein coding genes in coronavirus genomes, and its applications in analyzing SARS-CoV genomes
A new system to recognize protein coding genes in the coronavirus genomes, specially suitable for the SARS-CoV genotiles, has been proposed in this paper. Compared with some existing systems, the new program package has the merits of simplicity, high accuracy, reliability, and quickness. The system ZCURVE_CoV has been run for each of the 11 newly sequenced SARS-CoV genomes. Consequently, six genomes not annotated previously have been annotated, and some problems of previous annotations in the remaining five genomes have been pointed out and discussed. In addition to the polyprotein chain ORFs 1a and 1b and the four genes coding for the major structural proteins, spike (S), small envelop (E), membrane (M), and nuleocaspid (N), respectively, ZCURVE_CoV also predicts 5-6 Putative proteins in length between 39 and 274 amino acids with unknown functions. Some single nucleotide mutations within these putative coding sequences have been detected and their biological implications are discussed. A web service is provided, by which a user can obtain the annotated result immediately by pasting the SARS-CoV genome sequences into the input window on the web site (http://tubic.tju.cdu.cn/sars/). The software ZCURVE-CoV can also be downloaded freely from the web address mentioned above and run in Computers under the platforms of Windows or Linux. (C) 2003 Elsevier Inc. All rights reserved.