[Biodevelopers] Batch download of RefSeq or dbSNP?
Titus Brown
titus at caltech.edu
Wed Jul 5 20:24:13 EDT 2006
On Wed, Jul 05, 2006 at 02:25:25PM -0400, Christopher Dwan wrote:
->
-> I'm writing some scripts to download data. Specifically, I need
-> FASTA versions of:
->
-> * All the "finished" mouse proteins in refseq
-> * All the "finished" human proteins in refseq
-> * All the sequences in dbSNP
->
-> Ideally, my script would produce updated versions of these datasets
-> nightly or so. I would prefer to do this without spamming the NCBI
-> servers (or my bandwidth providers) too much.
->
-> I've messed around with the bioperl Bio::DB routines enough to get
-> really confused by ENTREZ queries. I've also looked at the FASTA
-> source available through FTP from NCBI, and that confused me more.
->
-> How do smart people do this sort of thing these days?

I don't know if I'm smart, but I use the NCBI Web services interface
(EUtils) directly:

http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

You can also use SOAP:

http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html

The three tasks you mention above should be pretty easy with the basic
EUtils interface.

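
For what it's worth, here is a rough sketch of the two-step pattern
(ESearch with the history server, then EFetch in batches). It only builds
the request URLs and does not touch the network. The base URL, the
"protein" database name, and the example search term are my assumptions,
not something to take as gospel; in particular, the term is only a guess at
what "finished" should mean, so check it against the Entrez web interface
first. The helper names are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL for the E-utilities CGI endpoints.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Assumed example query: RefSeq-sourced mouse proteins. Whether this matches
# "finished" proteins is a guess; adjust the term to taste.
EXAMPLE_TERM = "Mus musculus[Organism] AND srcdb_refseq[Properties]"


def esearch_url(db, term):
    """Build an ESearch URL that keeps the hit list on the history server.

    retmax=0 asks for no IDs in the reply; we only want the hit count and
    the history handle (QueryKey and WebEnv) for the later EFetch calls.
    """
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return EUTILS_BASE + "/esearch.fcgi?" + urlencode(params)


def efetch_fasta_url(db, query_key, webenv, retstart, retmax):
    """Build an EFetch URL for one batch of FASTA records from a stored search."""
    params = {
        "db": db,
        "query_key": query_key,
        "WebEnv": webenv,
        "rettype": "fasta",
        "retmode": "text",
        "retstart": retstart,
        "retmax": retmax,
    }
    return EUTILS_BASE + "/efetch.fcgi?" + urlencode(params)


def batch_offsets(total, batch_size):
    """Return the retstart offsets needed to page through `total` records."""
    return list(range(0, total, batch_size))
```

The idea would be to fetch esearch_url("protein", EXAMPLE_TERM) once, pull
Count, QueryKey and WebEnv out of the XML reply, and then loop over
batch_offsets(count, 500), fetching efetch_fasta_url(...) for each offset
with a pause between requests so the nightly job stays polite to NCBI.
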
cheers,
--titus