Using current state-of-the-art bioinformatics tools/software, what is the preferred method of *identifying EST sequences* for the subtraction procedure of a cDNA library?

To reduce the abundant messages that dominate cDNA libraries, I hope to identify the longest, most abundant, and most annotatable (based on e.g. Swiss-Prot) ESTs. I would like to get expert opinions on how to go about this most effectively. I have several thousand ESTs and would, for at least this first round, like to identify the 96 clones that are the most abundant/longest/annotatable.

Approaches I have considered are:

1. Running the entire dataset through CAP3 to produce contigs, then taking the consensus sequence for each contig and running a translated search (blastx rather than blastp, since the consensus is nucleotide sequence) against Swiss-Prot to see if it is annotatable.

2. Running an all-against-all BLAST search using the ESTs as both the query and the database. Additionally, one could make the database a combination of both the ESTs and Swiss-Prot, thus indicating not only which sequences have similar/identical matches within the EST database, but also whether they have a homolog in Swiss-Prot.

Does anything exist in BioPerl which performs the necessary sequence analysis for subtraction of a cDNA library?

BTW, if this is not the correct listserv/bulletin board for such a query, please let me know the preferred location.

Thank you and Happy Holidays!

Tristan Fiedler

--
Tristan J. Fiedler, Ph.D.
Postdoctoral Research Fellow - Walsh Laboratory
NIEHS Marine & Freshwater Biomedical Sciences Center
Rosenstiel School of Marine & Atmospheric Sciences
University of Miami
tfiedler at rsmas.miami.edu
t.fiedler at umiami.edu (alias)
305-361-4626
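
P.S. To make approach 2 more concrete, here is a rough sketch of the selection step I have in mind, written in Python purely for illustration. It assumes the all-against-all hits have already been reduced to (query, subject, percent identity, alignment length) tuples, e.g. parsed from tabular BLAST output, and that each EST's length is known. The function names, the 95% identity and 100 bp cutoffs, and the toy data are all hypothetical placeholders, not settings from any existing tool. It does not run BLAST or CAP3, and the Swiss-Prot annotation check would still be a separate step.

```python
from collections import defaultdict


def cluster_ests(hits, lengths, min_identity=95.0, min_align_len=100):
    """Group ESTs linked by strong EST-vs-EST hits (single linkage, union-find).

    hits    : iterable of (query_id, subject_id, percent_identity, align_len)
    lengths : dict mapping every EST id to its sequence length
    The two cutoffs are illustrative assumptions, not recommended values.
    """
    parent = {name: name for name in lengths}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, align_len in hits:
        if query == subject:
            continue  # self-hits carry no clustering information
        if query not in parent or subject not in parent:
            continue  # ignore hits to ids without a known length
        if identity < min_identity or align_len < min_align_len:
            continue  # too weak to call the two ESTs the same message
        root_q, root_s = find(query), find(subject)
        if root_q != root_s:
            parent[root_s] = root_q

    clusters = defaultdict(list)
    for name in lengths:
        clusters[find(name)].append(name)
    return list(clusters.values())


def pick_candidates(clusters, lengths, n=96):
    """Rank clusters by size (abundance), then by their longest member,
    and return (longest_member_id, cluster_size) for the top n clusters."""
    ranked = sorted(
        clusters,
        key=lambda c: (-len(c), -max(lengths[m] for m in c)),
    )
    return [(max(c, key=lambda m: lengths[m]), len(c)) for c in ranked[:n]]


# Toy data (made up) to show the expected input shapes.
toy_lengths = {"est1": 600, "est2": 450, "est3": 700, "est4": 300, "est5": 520}
toy_hits = [
    ("est1", "est1", 100.0, 600),  # self-hit, ignored
    ("est1", "est2", 98.5, 400),
    ("est2", "est3", 97.0, 350),
    ("est4", "est5", 80.0, 200),   # below the identity cutoff, ignored
]
toy_clusters = cluster_ests(toy_hits, toy_lengths)
print(pick_candidates(toy_clusters, toy_lengths, n=2))
```

Cluster size stands in for message abundance here, and the longest member of each cluster is taken as the representative clone. Single-linkage grouping like this is cruder than a real assembly: chimeric clones or shared domains can merge unrelated messages, so CAP3 contigs (approach 1) may well give cleaner groups.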