
The files in this directory contain the sequence sets 
described in the paper:

Burge, C. & Karlin, S. (1997) Prediction of complete gene
structures in human genomic DNA.  J. Mol. Biol. 268, 78-94.

The file train-LIB.genbank contains, in GenBank flatfile 
format, the set of 380 genes designated with a script L in 
the paper. This set contains 238 multi-exon genes and 142 
single-exon genes.  The multi-exon genes in this set contain 
a total of 1492 exons and 1254 introns.
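
The counts above can be sanity-checked with a short script.
The following is a minimal sketch, not part of the original 
distribution: the function names are hypothetical, and it 
assumes only that each GenBank flatfile record begins with a 
LOCUS line.  A gene with k exons has k - 1 introns, so the 
intron total should equal the exon total minus the number of 
multi-exon genes.

```python
def count_genbank_records(text):
    """Count records in GenBank flatfile text.

    Assumption: every record starts with a line beginning 'LOCUS'.
    """
    return sum(1 for line in text.splitlines() if line.startswith("LOCUS"))


def expected_introns(n_exons, n_multi_exon_genes):
    """A gene with k exons has k - 1 introns, so introns = exons - genes."""
    return n_exons - n_multi_exon_genes


# Figures quoted in this README.
assert 238 + 142 == 380
assert expected_introns(1492, 238) == 1254
```

If the file matches the description above, calling 
count_genbank_records(open("train-LIB.genbank").read()) 
would be expected to return 380.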

The file train-coding-cDNA.genbank contains the set of 1619 
cDNA sequences which, together with the 380 genes described 
above, form the set designated with a script C in the paper.
Again, this file is in GenBank flatfile format.  Note that 
these sequences have been edited slightly from their original 
GenBank entries by trimming away the 5' and 3' UTR portions, 
leaving only the coding portion of the cDNA (ATG -> stop codon).
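
As an illustration of what "coding portion" means here, the 
sketch below checks whether a nucleotide string looks like a 
complete coding sequence.  It is a hypothetical helper, not 
the procedure used to prepare the file.  It assumes the 
standard genetic code (stop codons TAA, TAG, TGA) and that 
the stop codon is retained as the last three bases; it does 
not check for internal in-frame stop codons.

```python
# Stop codons of the standard genetic code (an assumption for this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def looks_like_complete_cds(seq):
    """Return True if seq starts with ATG, ends with a stop codon,
    and has a length that is a multiple of three."""
    seq = seq.upper()
    return (
        len(seq) >= 6
        and len(seq) % 3 == 0
        and seq.startswith("ATG")
        and seq[-3:] in STOP_CODONS
    )
```

For example, looks_like_complete_cds("ATGAAATAA") is True, 
while a sequence with leading UTR bases before the ATG, such 
as "CCCATGAAATAA", gives False.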

(The above comments were copied and slightly edited from 
the original README file by C. Burge.)

