On Wed, Jul 05, 2006 at 02:25:25PM -0400, Christopher Dwan wrote:
->
-> I'm writing some scripts to download data.  Specifically, I need
-> FASTA versions of:
->
->   * All the "finished" mouse proteins in refseq
->   * All the "finished" human proteins in refseq
->   * All the sequences in dbSNP
->
-> Ideally, my script would produce updated versions of these datasets
-> nightly or so.  I would prefer to do this without spamming the NCBI
-> servers (or my bandwidth providers) too much.
->
-> I've messed around with the bioperl Bio::DB routines enough to get
-> really confused by ENTREZ queries.  I've also looked at the FASTA
-> source available through FTP from NCBI, and that confused me more.
->
-> How do smart people do this sort of thing these days?

I don't know if I'm smart, but I use the NCBI Web services interface
directly,

    http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

You can also use SOAP:

    http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html

The three tasks you mention above should be pretty easy with the basic
EUtils interface.

cheers,
--titus
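
For illustration, here is a minimal sketch of what "using the basic EUtils interface directly" might look like for the first task: an ESearch call that parks the result set on NCBI's history server, followed by batched EFetch calls that pull FASTA. Several things here are assumptions rather than anything stated in this thread: the HTTPS base URL, the helper names (build_esearch_url, build_efetch_url, parse_history, download_fasta), the batch size, and above all the search term, since the thread does not say how "finished" maps onto an Entrez query. The pause between requests reflects NCBI's guideline of at most three requests per second without an API key.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: current HTTPS base URL for the E-utilities CGI programs.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_esearch_url(db, term):
    """URL for an ESearch that stores its hits on the history server."""
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def build_efetch_url(db, web_env, query_key, retstart=0, retmax=500):
    """URL for an EFetch that pulls one batch of FASTA from a stored search."""
    params = {
        "db": db,
        "WebEnv": web_env,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


def parse_history(xml_text):
    """Return (count, WebEnv, QueryKey) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return (
        int(root.findtext("Count")),
        root.findtext("WebEnv"),
        root.findtext("QueryKey"),
    )


def download_fasta(db, term, out_path, batch=500, pause=0.4):
    """Search once, then fetch the hits in batches, pausing between requests."""
    with urllib.request.urlopen(build_esearch_url(db, term)) as reply:
        count, web_env, query_key = parse_history(reply.read().decode("utf-8"))
    with open(out_path, "w") as out:
        for start in range(0, count, batch):
            time.sleep(pause)  # stay under roughly 3 requests per second
            url = build_efetch_url(db, web_env, query_key, start, batch)
            with urllib.request.urlopen(url) as reply:
                out.write(reply.read().decode("utf-8"))
    return count


if __name__ == "__main__":
    # Hypothetical term: all RefSeq mouse proteins. Narrowing this to
    # "finished" records is left open because the thread does not define it.
    term = "Mus musculus[Organism] AND srcdb_refseq[PROP]"
    print(download_fasta("protein", term, "mouse_refseq.fasta"))
```

The same two calls should cover the human set by changing the organism in the term. For nightly updates, a date restriction on the search would keep the traffic down, but which date field suits these datasets is something to check against the EUtils help pages linked above.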