[BiO BB] Efficient way to retrieve full length cDNA sequences from GenBank?

dale richardson dalesan at gmail.com
Wed Apr 1 15:17:15 EDT 2009


Hello All,

Please forgive me if this post comes off as inexperienced, but if any  
of you have the time I would like to hear your suggestions on the  
following problem.

I've got a set of genomic DNA sequences for a number of species. What  
I want to do is to obtain only full-length cDNA matches to these  
genomic sequences from GenBank, excluding RefSeq sequences. What I've
been doing so far is BLASTing these genomic sequences against the nr
nucleotide database and manually deciding which hits to keep or
discard, depending on how much of each subject sequence aligns to the
query. While this method may be suitable for organisms with poorly
characterized expression data, when trying to do this for mouse or  
human the task becomes entirely daunting.
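
To make the manual step concrete, the keep-or-discard decision could be
sketched roughly as below. This is only an illustration, not a tested
pipeline. It assumes BLAST+ tabular output produced with
-outfmt "6 qseqid sacc slen sstart send", that RefSeq records can be
recognised by the two-letters-plus-underscore accession prefix (NM_, XM_,
NR_ and so on), and a 95% subject-coverage cutoff chosen arbitrarily. The
function name filter_hits is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: RefSeq accessions start with two capitals and an underscore.
REFSEQ = re.compile(r"^[A-Z]{2}_")

def filter_hits(lines, min_cov=0.95):
    """Return {(query, subject): coverage} for non-RefSeq subjects whose
    length is covered at least min_cov by the merged HSPs."""
    spans = defaultdict(list)
    slen = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        query, subject, length, start, end = line.split()[:5]
        if REFSEQ.match(subject):
            continue  # skip RefSeq records
        lo, hi = sorted((int(start), int(end)))  # minus-strand hits have start > end
        spans[(query, subject)].append((lo, hi))
        slen[(query, subject)] = int(length)

    kept = {}
    for key, hsps in spans.items():
        hsps.sort()
        covered = 0
        cur_lo, cur_hi = hsps[0]
        for lo, hi in hsps[1:]:
            if lo <= cur_hi + 1:          # overlapping or adjacent: extend
                cur_hi = max(cur_hi, hi)
            else:                         # gap: close the current interval
                covered += cur_hi - cur_lo + 1
                cur_lo, cur_hi = lo, hi
        covered += cur_hi - cur_lo + 1
        coverage = covered / slen[key]
        if coverage >= min_cov:
            kept[key] = coverage
    return kept
```

HSPs are merged per query/subject pair because a spliced cDNA aligned to
genomic DNA is normally reported as one HSP per exon. Note that high
subject coverage only shows that the whole cDNA record aligns to the
genomic sequence; it does not by itself show that the record is a true
full-length cDNA.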

So my question is this:

What is the most efficient way to obtain a set of cDNA sequences that  
match a set of genomic DNA sequences while excluding spurious
hits, RefSeq sequences and "pseudo" full-length cDNAs?

As you can imagine, I am interested in finding alternative splice
variants for a number of genes.

Any information or help that you could graciously muster would be very  
much appreciated.

with sincere regards,

dale richardson





  



