[Bioclusters] Local copy of NCBI

Tim Harsch bioclusters@bioinformatics.org
Thu, 18 Sep 2003 10:01:14 -0700


I'd like to clear up some of this and at the same time add to the confusion :-)

Please correct me if I'm wrong.  NCBI has a new preferred method of
downloading the databases: you start by downloading the entire (huge) set of
databases, and then run fmerge (see ftp://ftp.ncbi.nih.gov/blast/db/blastdb.txt).

What I don't get is how to develop the basic algorithm.  For instance, today's
BLAST databases are dated 9/16, but so are the "rolling month" files
referenced in BLAST announcement #024, although #033 seems to be a more
recent explanation.

NCBI did a reorg of the BLAST databases.  This is what I'm guessing an
algorithm would look like, and this is where I'd like some confirmation:

Pick a day of the month to download the entire preformatted databases.
Run fmerge on them in create mode
Nightly download the month.* files
    Run fmerge on them against their intended preformatted database.
    I don't understand the exact mapping...
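To make the question concrete, here is a minimal Python sketch of only the nightly decision in the steps above. It assumes a fixed, hypothetical full-download day (FULL_DOWNLOAD_DAY) that is clamped to the last day of shorter months. It returns descriptive steps rather than calling fmerge, because the fmerge command-line options and the month.*-to-database mapping are not spelled out here.

```python
import calendar
import datetime
from typing import List

# Hypothetical: the day of the month chosen for the full download.
FULL_DOWNLOAD_DAY = 31


def full_download_date(year: int, month: int, full_day: int) -> datetime.date:
    """Clamp the chosen day to the month's length (e.g. day 31 -> Feb 28, 2003)."""
    last_day = calendar.monthrange(year, month)[1]
    return datetime.date(year, month, min(full_day, last_day))


def nightly_plan(today: datetime.date, full_day: int = FULL_DOWNLOAD_DAY) -> List[str]:
    """Return tonight's steps as plain descriptions; nothing is executed."""
    if today == full_download_date(today.year, today.month, full_day):
        return [
            "download the entire set of preformatted databases",
            "run fmerge on them in create mode",
        ]
    return [
        "download the month.* files",
        "run fmerge on each against its intended preformatted database",
    ]
```

Clamping the day means a run scheduled for the 31st still does its full download in 30-day months and in February, instead of silently skipping a month.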

Oh, and apparently the rolling month files contain 30 days of new sequences;
what about 31-day months?
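The 31-day concern can be checked with a little date arithmetic. This sketch assumes, as stated above only as "apparently", that each month.* file holds exactly the most recent 30 days and that the full set is fetched once a month on the same day. It counts how many days since the last full download a 30-day window ending today would leave out.

```python
import datetime

# Assumption: each rolling month.* file covers exactly the last 30 days.
WINDOW_DAYS = 30


def uncovered_days(last_full: datetime.date, today: datetime.date,
                   window_days: int = WINDOW_DAYS) -> int:
    """Days elapsed since the last full download that fall outside the window."""
    elapsed = (today - last_full).days
    return max(0, elapsed - window_days)
```

For example, from 8/16 to 9/16 (August has 31 days) it reports one uncovered day, while from 2/16 to 3/16/2003 it reports none.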

And what are the *.msk files?  For instance, if you take a look at
est_others.tar.gz, it is very small and contains only *.nal and *.msk files.
Is the .msk file used by standalone BLAST as an instruction for how to add
new data into the database when blasting, or is it an instruction to fmerge
for putting them together, or what?

----- Original Message ----- 
From: "Osborne, John" <jko1@cdc.gov>
To: <bioclusters@bioinformatics.org>
Cc: "Tang, Kevin" <kht7@cdc.gov>
Sent: Thursday, September 18, 2003 6:46 AM
Subject: [Bioclusters] Local copy of NCBI


> Hi everyone,
> What are people out there doing to get a local copy of NCBI's databases? I
> mean RefSeq, dbSNP, taxonomy, etc...  We've been updating our copy ad hoc by
> FTP.  Are most people just putting this into a cron job?
>
> I've heard that the NCBI toolkit offers something like this (to get daily
> updates via web services or something), but I don't know where to look.
> getseq looks suspicious, but I need to configure it using entrez2, which
> needs X Windows, which needs vibrant, which means RH dependency hell...  Is
> there a simple command-line way to get a sequence from NCBI and keep a
> local copy of NCBI?
>
>  -John
> _______________________________________________
> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters