[Biodevelopers] Batch download of RefSeq or dbSNP?

Christopher Dwan cdwan at bioteam.net
Wed Jul 5 14:25:25 EDT 2006


I'm writing some scripts to download data.  Specifically, I need  
FASTA versions of:

* All the "finished" mouse proteins in RefSeq
* All the "finished" human proteins in RefSeq
* All the sequences in dbSNP

Ideally, my script would produce updated versions of these datasets  
nightly or so.  I would prefer to do this without spamming the NCBI  
servers (or my bandwidth providers) too much.

I've messed around with the BioPerl Bio::DB routines enough to get
really confused by Entrez queries.  I've also looked at the FASTA
source available through FTP from NCBI, and that confused me more.

How do smart people do this sort of thing these days?

-Chris Dwan



