[ssml] Re: ssml-general digest, Vol 1 #18 - 1 msg

Nancy Hansen nhansen at nhgri.nih.gov
Wed Dec 17 09:39:06 EST 2003

Hi Tristan,

	I don't have much experience with this stuff, but since no one
else has chimed in:

	Assuming these are human sequences, I'd take a non-redundant set
of human protein sequences and compare each EST sequence against the
protein sequences using blastx.  I think it would be preferable to do a
blastn against annotated CDS (nucleotide) sequences, but I'm not sure
where you can get a nice curated set of those.  The protein sequences
would probably do the trick.

	Bioperl has modules to parse the blast output (and to run blast,
in fact, but since it's just one format and one big query file, I'd just
run it manually).  With bioperl doing the parsing, it's easy to write a
perl script that tallies up the best hit for each EST by protein ID and
reports totals.  You can then examine the results to see whether the
proteins hit most often (i.e., the most redundant ESTs) are good choices.
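
	To make the tally concrete, here is a rough sketch in Python
rather than perl/bioperl.  It assumes the blast results were saved as
tab-separated text with twelve columns per hit (query ID first, subject
ID second, bit score last).  The function names and the sample rows are
made up for illustration, so treat it as a starting point, not a tested
pipeline.

```python
from collections import Counter


def best_hits(lines):
    """Map each query (EST) ID to its top-scoring (protein_id, bit_score)."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip rows that are not 12-column tabular hits
        query, subject, bits = fields[0], fields[1], float(fields[11])
        # Keep only the highest bit score seen for this EST.
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return best


def tally_by_protein(best):
    """Count how many ESTs have each protein as their best hit."""
    return Counter(subject for subject, _ in best.values())


# Hypothetical hits: est1 matches two proteins, est2 and est3 one each.
sample = [
    "est1\tP1\t98.0\t100\t2\t0\t1\t300\t1\t100\t1e-50\t200",
    "est1\tP2\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-30\t150",
    "est2\tP1\t97.0\t95\t3\t0\t4\t288\t5\t99\t1e-45\t180",
    "est3\tP3\t80.0\t60\t12\t0\t1\t180\t20\t79\t1e-10\t90",
]

if __name__ == "__main__":
    counts = tally_by_protein(best_hits(sample))
    # Most frequently hit proteins first.
    for protein, n_ests in counts.most_common():
        print(protein, n_ests)
```

	For real data you would pass an open file handle to best_hits()
instead of the sample list.  The proteins at the top of the list are the
ones hit by the most ESTs.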

	Am I missing something, though?  What's the advantage to
assembling them first?


Nancy F. Hansen, PhD	nhansen at nhgri.nih.gov
Bioinformatics Group
NIH Intramural Sequencing Center (NISC)
8717 Grovemont Circle, Rm. 152L
Gaithersburg, MD 20877
Phone: (301) 435-1560	Fax: (301) 435-6170

On Tue, 16 Dec 2003 ssml-general-request at bioinformatics.org wrote:

> When replying, PLEASE edit your Subject line so it is more specific
> than "Re: ssml-general digest, Vol..."
> Today's Topics:
>    1. [Fwd: Re: Request to mailing list ssml-general rejected] (Dan Bolser)
> --__--__--
> Message: 1
> Date: Tue, 16 Dec 2003 09:20:15 -0000 (GMT)
> From: "Dan Bolser" <dmb at mrc-dunn.cam.ac.uk>
> To: <ssml-general at bioinformatics.org>
> Subject: [ssml] [Fwd: Re: Request to mailing list ssml-general rejected]
> Using the current state of the art bioinformatics tools/software, what is the
> preferred method of *identifying EST sequences* for the subtraction procedure of a
> cDNA library ?
> In order to decrease the abundant messages which dominate cDNA libraries, I hope
> to identify the longest, most abundant, and annotatable (based on e.g. swissprot)
> ESTs.  I would like to get expert opinions on how to most effectively go about it.
>  I have several thousand ESTs and would, for at least this first round, like to
> identify 96 clones which are the most abundant/longest/annotatable.
> Approaches I have considered are :
> 1. Running the entire dataset through CAP3 to produce contigs.  Then take the
> consensus sequence for each contig and run a blastp against swissprot to see if it is
> annotatable.
> 2.  Running an all against all blast search using the ESTs as both the query and
> the database.  Additionally, one could make the database a combination of both the
> ESTs and swissprot, thus indicating not only which sequences have
> similar/identical matches within the EST database, but also whether they have a
> homolog in swissprot
> Does anything exist in bioperl which performs the necessary sequence analysis for
> subtraction of a cDNA library?
> BTW, if these are not the correct listserv/bulletin boards for such a query,
> please let me know the preferred location.
> Thank you and Happy Holidays!
> Tristan Fiedler
> --
> Tristan J. Fiedler, Ph.D.
> Postdoctoral Research Fellow - Walsh Laboratory
> NIEHS Marine & Freshwater Biomedical Sciences Center
> Rosenstiel School of Marine & Atmospheric Sciences
> University of Miami
> tfiedler at rsmas.miami.edu
> t.fiedler at umiami.edu (alias)
> 305-361-4626
> --__--__--
> _______________________________________________
> ssml-general mailing list
> ssml-general at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/ssml-general
> End of ssml-general Digest
