[Biodevelopers] Batch download of RefSeq or dbSNP?

Ethan Strauss ethan.strauss at promega.com
Thu Jul 6 10:39:36 EDT 2006


Hi Chris,
	Could you pull the sequences from ftp://ftp.ncbi.nih.gov/refseq/
once and then get daily updates from
ftp://ftp.ncbi.nih.gov/refseq/daily/? I have downloaded full releases
without too much trouble. I have never dealt with the daily updates, but
I would expect it to be fairly easy to fetch them and then sort the
sequences into the appropriate (mouse, human, whatever) bins.
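
	For the sorting step, something along these lines might work. This is
only a rough sketch, not something I have run against the real files: it
assumes each protein defline ends with the organism name in square
brackets (e.g. "[Homo sapiens]"), and the names read_fasta and
bin_by_organism are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: the organism is the last bracketed field on the defline,
# e.g. ">gi|...|ref|NP_000005.1| some protein [Homo sapiens]".
ORGANISM_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def read_fasta(lines):
    """Yield (defline, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def bin_by_organism(lines, wanted):
    """Group records by organism, keeping only organisms named in `wanted`."""
    bins = defaultdict(list)
    for header, seq in read_fasta(lines):
        match = ORGANISM_RE.search(header)
        if match and match.group(1) in wanted:
            bins[match.group(1)].append((header, seq))
    return dict(bins)
```

You would pass it an open file handle (or any iterable of lines) plus a set
such as {"Homo sapiens", "Mus musculus"}, then write each bin out to its
own FASTA file. Records whose deflines do not match the assumed pattern are
silently skipped, so it would be worth counting those on real data.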
	I have never looked at dbSNP, but
ftp://ftp.ncbi.nih.gov/snp/database/README.create_local_dbSNP.txt looks
helpful. 
Ethan 

-----Original Message-----
From: biodevelopers-bounces+ethan.strauss=promega.com at bioinformatics.org
[mailto:biodevelopers-bounces+ethan.strauss=promega.com at bioinformatics.org]
On Behalf Of Christopher Dwan
Sent: Wednesday, July 05, 2006 1:25 PM
To: biodevelopers at bioinformatics.org
Subject: [Biodevelopers] Batch download of RefSeq or dbSNP?


I'm writing some scripts to download data.  Specifically, I need FASTA
versions of:

* All the "finished" mouse proteins in refseq
* All the "finished" human proteins in refseq
* All the sequences in dbSNP

Ideally, my script would produce updated versions of these datasets
nightly or so.  I would prefer to do this without spamming the NCBI
servers (or my bandwidth providers) too much.

I've messed around with the BioPerl Bio::DB routines enough to get
really confused by Entrez queries.  I've also looked at the FASTA source
available through FTP from NCBI, and that confused me more.

How do smart people do this sort of thing these days?

-Chris Dwan