[BiO BB] Automatic blast database maintenance/updating

Joe Landman landman at scalableinformatics.com
Fri Aug 3 17:21:20 EDT 2007


I must have missed the original email; I only saw Marty's response.

> On 7/30/07, Rohan Sachdeva <rsachdev at usc.edu> wrote:
>> Hello, I've been charged with installing and maintaining a wwwblast server in
>> my lab. I've got everything set up, but I am looking for an easy way to keep
>> all the databases updated. I was hoping someone could point me toward a
>> script that uses update_blastdb.pl to update the databases and then
>> extract them too.

I haven't used update_blastdb.pl (it looks like it came out in 2005, some
years after our db_dlaf.pl, which is available at
http://downloads.scalableinformatics.com/downloads/db_dlaf.pl ).

With db_dlaf.pl (Database DownLoad And Format), it is pretty easy to set
up scripted/cron'ed updates, without mouse clicks...

[landman@crunch-r.scalableinformatics.com:~/q]
        6 >./db_dlaf.pl --list
alu.a.gz
alu.n.gz
drosoph.aa.gz
drosoph.nt.gz
ecoli.aa.gz
ecoli.nt.gz
env_nr.gz
env_nt.gz
est.nal
.
.
.

wgs.gz
yeast.aa.gz
yeast.nt.gz


Pulling and formatting, say, yeast.aa as a protein database looks like this:

[landman@crunch-r.scalableinformatics.com:~/q]
        10 >./db_dlaf.pl --db=yeast.aa.gz --fdb /usr/bin/formatdb \
	--formatdb "-p T -o T" --verbose
destination path = ./20070803
url = ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/
Using the following database(s)
yeast.aa.gz
starting transfer of ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/yeast.aa.gz
transfer for file=yeast.aa.gz is 10.1 seconds, rate=0.18 MB/s, size=1.86 MB
file=yeast.aa.gz size=1.86 MB, uncompressed file=yeast.aa, size=3.24 MB
formatdb against db= yeast.aa
preparing to run formatdb
command line = /usr/bin/formatdb -i yeast.aa -p T -o T

It deposits the formatted databases into a directory named for the run
date (YYYYMMDD), matching the "destination path" line above:

[landman@crunch-r.scalableinformatics.com:~/q]
        11 >ls -alF 20070803/
total 7560
drwxrwxr-x  2 landman landman    4096 Aug  3 17:12 ./
drwxrwxr-x  3 landman landman    4096 Aug  3 16:56 ../
-rw-rw-r--  1 landman landman     499 Aug  3 17:12 formatdb.log
-rw-rw-r--  1 landman landman 3399727 Aug  3 17:12 yeast.aa
-rw-rw-r--  1 landman landman  624891 Aug  3 17:12 yeast.aa.phr
-rw-rw-r--  1 landman landman   50456 Aug  3 17:12 yeast.aa.pin
-rw-rw-r--  1 landman landman   50384 Aug  3 17:12 yeast.aa.pnd
-rw-rw-r--  1 landman landman     244 Aug  3 17:12 yeast.aa.pni
-rw-rw-r--  1 landman landman  561270 Aug  3 17:12 yeast.aa.psd
-rw-rw-r--  1 landman landman   12773 Aug  3 17:12 yeast.aa.psi
-rw-rw-r--  1 landman landman 2980337 Aug  3 17:12 yeast.aa.psq

No mouse clicks required. A simple foreach loop in a script that you run
from a crontab automates the whole thing:

#!/bin/tcsh
#
# start with some aa DBs ...
# (formatdb: -p T = protein input, -o T = parse SeqIds and build indexes)
#
foreach d ("pataa.gz" "pdbaa.gz" "swissprot.gz")
    /usr/local/bin/db_dlaf.pl --db=$d --fdb /usr/bin/formatdb \
        --formatdb "-p T -o T" --verbose
end
#
# and move on to some nt DBs ...
# (-p F = nucleotide input, -v 1024 = split into volumes of at most
#  1024 million letters)
#
foreach d ("est_human.gz" "est_mouse.gz" "nt.gz")
    /usr/local/bin/db_dlaf.pl --db=$d --fdb /usr/bin/formatdb \
        --formatdb "-p F -v 1024 -o T" --verbose
end
#
# END OF SCRIPT
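
Because a cron job runs unattended, it can help to notice when one of the
downloads fails. Here is a rough sh sketch of the same loop that counts
failures. It uses only the db_dlaf.pl options shown above, but it ASSUMES
db_dlaf.pl exits with a non-zero status on failure, which I have not
verified here. The update_dbs function and the DB_DLAF/FORMATDB variables
are names invented for this example.

```shell
#!/bin/sh
# Paths match the tcsh script above; override via the environment if needed.
DB_DLAF=${DB_DLAF:-/usr/local/bin/db_dlaf.pl}
FORMATDB=${FORMATDB:-/usr/bin/formatdb}

# update_dbs "<formatdb options>" db1 [db2 ...]
# Runs db_dlaf.pl once per database and keeps going after a failure.
# ASSUMPTION: db_dlaf.pl returns non-zero when a download or format fails.
update_dbs() {
    opts=$1
    shift
    failed=0
    for d in "$@"; do
        if ! "$DB_DLAF" --db="$d" --fdb "$FORMATDB" --formatdb "$opts" --verbose; then
            echo "update failed for $d" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every database updated cleanly.
    [ "$failed" -eq 0 ]
}

# Example calls, mirroring the two foreach loops:
#   update_dbs "-p T -o T" pataa.gz pdbaa.gz swissprot.gz
#   update_dbs "-p F -v 1024 -o T" est_human.gz est_mouse.gz nt.gz
```

Failure messages go to stderr, and cron normally mails a job's output to
the job's owner, so they should show up there.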

Put that in a file somewhere, make it executable, and then run

	crontab -e

to add an entry that runs it every so often: once a week, month, quarter,
or year.
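
As a sketch, a weekly crontab entry could look like the one below. The
script path, working directory, and log file name are all hypothetical,
and Sunday at 02:00 is just an example schedule. The cd matters because,
in the run shown earlier, db_dlaf.pl wrote to ./20070803, i.e. relative
to the current directory.

```
# min hour day-of-month month day-of-week  command
0 2 * * 0  cd /data/blast && /usr/local/bin/update_blastdbs.csh >> update_blastdbs.log 2>&1
```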

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


