[Bioclusters] Nightly updated BLAST databases

Joseph Landman bioclusters@bioinformatics.org
16 Dec 2002 21:23:42 -0500

On Mon, 2002-12-16 at 20:51, Jeremy Mann wrote:

> > (4) if all looks good and there are no blast jobs running then change
> > the symbolic link(s) so that your newly built databases in volume B are
> > the ones that people use when a search is fired off. As before the
> > methods for figuring out 'are there any searches running' can be as
> > complex or as simple as the production environment demands.
> If you don't mind me asking, how do you do this? How do I control when and
> if the BLAST jobs are running? 

Ahh...  Are you using a queuing system?  If not, then you can write a really
simple wrapper around blastall that only starts a job once a semaphore file
has been removed.  If the semaphore file exists, the wrapper sleeps for 1
second and then checks again.  It gives up after N seconds.  It would look
something like this (call it blastall, and place it in the users' path
ahead of the real blastall, which you can take out of their path):

        #!/usr/bin/perl -w
        # blastall wrapper: wait until the semaphore file is gone,
        # then hand off to the real blastall with the same arguments.
        use strict;

        my $semaphore = "/path/to/semaphore";      # present = hold new runs
        my $blastall  = "/path/to/real/blastall";
        my $max_wait  = 86400;                     # 86400 seconds in a day ...

        my $waited = 0;
        # While the semaphore file exists, new runs are on hold.
        while (-e $semaphore) {
            if ($waited >= $max_wait) {
                die "Run timed out.  Please look into why the semaphore file "
                  . "$semaphore has not been removed\n";
            }
            sleep(1);    # take a nap, then check again
            $waited++;
        }

        # Semaphore is clear: replace this process with the real blastall.
        # The block form of exec passes each argument through unchanged
        # (no shell re-parsing), and blastall's output and exit status go
        # straight back to the caller.
        exec { $blastall } $blastall, @ARGV
            or die "cannot run the command $blastall: $!\n";
Then when you want to stop the next batch of runs, simply 

	touch /path/to/semaphore

and wait for the machine to quiesce (runs already in progress finish, new
ones wait on the semaphore).  Once quiet, do the database monte (swap the
symlinks), and then remove the semaphore:

	rm -f /path/to/semaphore
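The hold, drain, swap and release sequence above can be scripted.  This is
a minimal sketch under several assumptions: it uses the same semaphore path
as the wrapper, the real blastall is always started by its full path (so
`pgrep -f` can tell running searches apart from wrappers still waiting on
the semaphore), `pgrep` is installed, users reach the databases through a
symbolic link, and GNU coreutils `mv -T` is available so the link is
replaced by a single rename.  The paths and the function names
(`hold_blast`, `wait_for_quiet`, `swap_db_link`, `release_blast`) are
hypothetical.

```shell
#!/bin/sh
# Sketch: hold new BLAST runs, wait for running ones to finish,
# swap the database symlink, then let the held runs proceed.
# Hypothetical placeholder paths; override them from the environment.
SEMAPHORE=${SEMAPHORE:-/path/to/semaphore}
REAL_BLASTALL=${REAL_BLASTALL:-/path/to/real/blastall}

# Create the semaphore so the wrapper holds any new searches.
hold_blast() {
    touch "$SEMAPHORE"
}

# Block until no process whose command line starts with the real
# blastall path is running (wrappers waiting on the semaphore do not
# match).  Assumes pgrep is installed.
wait_for_quiet() {
    while pgrep -f "^$REAL_BLASTALL" >/dev/null 2>&1; do
        sleep 10
    done
}

# Repoint LINK at TARGET: build the new link under a temporary name,
# then rename it over the old one.  GNU "mv -T" treats LINK as a plain
# file instead of a directory to move into, so the rename replaces it.
swap_db_link() {
    target=$1
    link=$2
    ln -s "$target" "$link.new.$$" && mv -T "$link.new.$$" "$link"
}

# Remove the semaphore so the waiting wrappers start the real blastall.
release_blast() {
    rm -f "$SEMAPHORE"
}
```

A nightly swap might then be (with `/blast/volB` and `/blast/db` as
hypothetical paths):

	hold_blast && wait_for_quiet && swap_db_link /blast/volB /blast/db && release_blast

Because of the `&&` chain, a failed swap leaves the semaphore in place, so
new searches keep waiting until someone looks into it instead of running
against a half-updated setup.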

If you are already running a job queuing system, you can stop the queue
from dispatching new jobs, let the current set of runs finish, do the db
monte, and then re-enable the queue.

> I would think there would have to be some
> sort of manual control. Here is what I think I need to do:
> 1. Run rsync from crontab (already done)
> 2. Custom script to see if rsync is still running. If so, stop; if not,
>    run the 2nd script, which after an hour checks if rsync is still running. I am
> confused as to how to pull this off. If I run it from crontab, I would
> need to add some sort of check to see if the 1st script is running, and if so,
> not run again until the next day.
> 3. 3rd script runs uncompress | formatdb into another directory. I got
> this one in place.
> 4. 4th script resymlinks db/ from blast/ directory. Need to add a few if
> statements to see if 3rd script is still running and check for existing
> blast jobs.

View the process as a pipeline.  You need to inspect for errors every
step of the way.  You can use rsync itself, or the Perl/Python modules that
provide rsync functionality.  Check the error return codes.  Don't go on to
the next pipeline step unless the previous one has finished successfully.
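Applied to the steps in the question, that pipeline might be driven from a
single cron job, as in this minimal sketch.  It assumes a `mkdir`-based lock
(mkdir fails if the directory already exists, so only one run can hold it),
which makes a new run exit if the previous night's run is still going and
so covers the "is the 1st script still running" check.  Each stage is
assumed to return a non-zero exit status on failure, and the first failure
stops the pipeline.  `LOCKDIR`, `acquire_lock` and `run_stage` are
hypothetical names, not part of rsync or formatdb.

```shell
#!/bin/sh
# Sketch of a nightly database update pipeline: each stage runs only if
# the previous one succeeded, and overlapping runs are refused via a lock.
LOCKDIR=${LOCKDIR:-/tmp/blastdb-update.lock}   # hypothetical lock location

# mkdir is atomic: it fails if the directory already exists, so a run
# started while another still holds the lock gives up instead of overlapping.
acquire_lock() {
    if mkdir "$LOCKDIR" 2>/dev/null; then
        trap 'rmdir "$LOCKDIR"' EXIT    # release the lock when the script exits
        return 0
    fi
    echo "previous update still running (lock $LOCKDIR exists); skipping" >&2
    return 1
}

# Run one named stage; if its command fails, report the exit status
# and stop the whole pipeline with that status.
run_stage() {
    name=$1
    shift
    echo "stage $name: $*" >&2
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "stage $name failed with exit status $status; stopping" >&2
        exit "$status"
    fi
}
```

The cron script body could then read as follows, where the rsync source,
the build script (your uncompress | formatdb step) and the swap script are
hypothetical placeholders:

	acquire_lock || exit 0
	run_stage fetch  rsync -a /hypothetical/mirror/ /blast/incoming/
	run_stage build  /hypothetical/build_db.sh /blast/incoming /blast/volB
	run_stage swap   /hypothetical/swap_db.sh /blast/volB

Exiting with the failing stage's own status keeps rsync's specific exit
code visible to cron's mail or whatever log you keep.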

Joseph Landman <landman@scalableinformatics.com>