[Biodevelopers] Batch download of RefSeq or dbSNP?

Titus Brown titus at caltech.edu
Wed Jul 5 20:24:13 EDT 2006


On Wed, Jul 05, 2006 at 02:25:25PM -0400, Christopher Dwan wrote:
-> 
-> I'm writing some scripts to download data.  Specifically, I need  
-> FASTA versions of:
-> 
-> * All the "finished" mouse proteins in refseq
-> * All the "finished" human proteins in refseq
-> * All the sequences in dbSNP
-> 
-> Ideally, my script would produce updated versions of these datasets  
-> nightly or so.  I would prefer to do this without spamming the NCBI  
-> servers (or my bandwidth providers) too much.
-> 
-> I've messed around with the bioperl Bio::DB routines enough to get  
-> really confused by ENTREZ queries.  I've also looked at the FASTA  
-> source available through FTP from NCBI, and that confused me more.
-> 
-> How do smart people do this sort of thing these days?

I don't know if I'm smart, but I use the NCBI Web services interface
(EUtils) directly:

	http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

You can also use SOAP:

	http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html

The three tasks you mention above should be pretty easy with the basic
EUtils interface.
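
As a rough illustration of what "the basic EUtils interface" looks like in
practice, here is a minimal Python sketch of the usual two-step pattern: an
ESearch call that stores the result set on NCBI's history server, followed by
batched EFetch calls that pull FASTA text. Several things here are assumptions
and not part of the original message: the base URL is the current EUtils
endpoint, not the 2006 help page linked above; the helper names
(esearch_url, parse_esearch, efetch_url, download_fasta) are hypothetical; and
the example query term is only a guess at what "finished" RefSeq proteins
might map to, so check it against the Entrez query syntax before relying on
it. dbSNP is not covered, since its supported fetch formats may differ.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: current EUtils base URL (not taken from the original message).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term):
    """Build an ESearch URL that keeps the result set on the history server."""
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return EUTILS + "/esearch.fcgi?" + urllib.parse.urlencode(params)


def parse_esearch(xml_text):
    """Extract (count, WebEnv, QueryKey) from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return (
        int(root.findtext("Count")),
        root.findtext("WebEnv"),
        root.findtext("QueryKey"),
    )


def efetch_url(db, webenv, query_key, retstart, retmax):
    """Build an EFetch URL for one batch of FASTA records from a stored query."""
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EUTILS + "/efetch.fcgi?" + urllib.parse.urlencode(params)


def download_fasta(db, term, out_path, batch=500, pause=1.0):
    """Search once, then fetch the hits in batches, pausing between requests.

    The batch size and pause are illustrative guesses meant to keep the
    request rate low; they are not values recommended by NCBI.
    """
    with urllib.request.urlopen(esearch_url(db, term)) as resp:
        count, webenv, query_key = parse_esearch(resp.read())
    with open(out_path, "wb") as out:
        for start in range(0, count, batch):
            url = efetch_url(db, webenv, query_key, start, batch)
            with urllib.request.urlopen(url) as resp:
                out.write(resp.read())
            time.sleep(pause)
    return count
```

A nightly job could then call something like
download_fasta("protein", "Mus musculus[Organism] AND srcdb_refseq[PROP]",
"mouse_refseq.fasta"), where the query term is again an assumption. Keeping
the search on the history server means the ID list is never downloaded, and
the pause between batches is one way to avoid hammering the NCBI servers.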

cheers,
--titus


