On Mon, 2002-12-16 at 19:36, Jeremy Mann wrote:
> I am implementing a nightly updated and formatdb script for the BLAST
> unformatted databases from NCBI. A researcher today asked a question that
> I could not answer. His question was, if he runs a long BLAST search
> during the time my script is running, what will happen to his returned
> search? Will he get false positive from both databases (the old one and
> the newly created one)? Will the database be locked out during his search
> and my script will fail?

I used to get this question quite a bit when I talked about the
previous scalable BLAST products I had developed. The short answer is
that you can design your process to fit the way you want to work.

> I was amazed that I didn't think of this sooner. What does everybody here
> use as a script and how do you prevent the database from being newly
> formatted if a current BLAST search is running?

Generally this is not so hard. You can even incorporate the update into
a queuing system, as long as you use an O(1) data distribution system,
such as the old ccp I had architected, or some newer stuff. Use a
priority based mechanism to schedule the update to occur between
computing runs. This requires some tuning/tweaking of the queuing
system, but it is generally not that hard to do.

If you are going to do this by hand, use the "lsof" command to see if
you have any processes using the particular file.
Right before I did a quick run, I looked at my database indices:

  [root@head run]# lsof db/nr*
  [root@head run]#

Then I started a quick run

  [root@head small]# /big/run/ncbi/build/blastall -i cherry_tomato.fsa -o x -e 0.0001 -d nr -p blastx

and back I went to look at my indices:

  [root@head run]# lsof db/nr*
  COMMAND    PID USER  FD   TYPE DEVICE      SIZE  NODE NAME
  blastall 11443 root mem    REG    9,0   9947488 98310 db/nr.pin
  blastall 11443 root mem    REG    9,0 396957149 98309 db/nr.psq
  blastall 11443 root mem    REG    9,0 272675746 98308 db/nr.phr

You can basically implement 2 lines of Perl to do a "reference count"
on the file:

  # Count lsof output rows after the header line (one row per open file).
  $reference_count = `lsof $filename | tail -n +2 | wc -l`;
  chomp($reference_count);

Do the update if $reference_count == 0.

There are other "tricks" you can play. The one I used to use was to
download the database, append the date/time to it, wait for the run to
finish (i.e. reference count goes to 0, system quiescent), and then
swap links as Chris indicated.

It is usually advisable to keep a few levels of previous libraries
available, so that you can check older calculations if need be. This is
especially useful for examining whether you are looking at a signal or
at noise. These aren't things you want to commit to CVS or other
versioning systems; all you really need is to maintain a few with
versioning meta-data attached. There are many ways to do this, and it
is somewhat beyond the scope of what I can cover in a short message.

>
> Thanks for any answers.

--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman@scalableinformatics.com
web: http://scalableinformatics.com
phone: +1 734 612 4615
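
The wait-then-swap trick described above could be sketched in shell
roughly as follows. This is only an illustration under assumptions that
are not in the original message: each formatted database lives in its
own date-stamped directory, BLAST jobs reach it through a symlink, and
the function names and the POLL_SECONDS variable are made up for the
example.

```shell
#!/bin/sh
# Sketch only. The directory layout, function names, and POLL_SECONDS are
# assumptions for illustration; they are not from the original post.

# Print the number of distinct processes that have any of the given files open.
# "lsof -t" prints bare PIDs, so there is no header line to strip;
# sort -u guards against a PID being listed more than once.
open_count() {
    lsof -t -- "$@" 2>/dev/null | sort -u | wc -l | tr -d ' '
}

# Block until no process holds any of the given files open.
# Fails up front if lsof is missing, so a missing tool is never read as "idle".
wait_until_idle() {
    command -v lsof >/dev/null 2>&1 || { echo "lsof not found" >&2; return 1; }
    while [ "$(open_count "$@")" -gt 0 ]; do
        sleep "${POLL_SECONDS:-60}"
    done
}

# Repoint the symlink $2 at $1. The -n flag keeps ln from descending into
# the directory the old link points to.
swap_link() {
    ln -sfn "$1" "$2"
}

# Wait for searches against the live database to finish, then switch over.
# $1: freshly formatted, date-stamped directory; $2: symlink that BLAST jobs use.
update_db() {
    wait_until_idle "$2"/* && swap_link "$1" "$2"
}
```

With that layout, something like "update_db db-20021216 db" would wait
for running searches and then repoint the link, leaving the old
directory in place as one of the retained previous versions. A search
that starts between the last check and the swap can still slip through,
which is the gap that scheduling the update through the queuing system
closes.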