[BioBrew Users] Re: [Rocks-Discuss]benchmarking Biobrew

Glen Otero gotero at linuxprophet.com
Mon Mar 28 18:05:02 EST 2005


Hi Rajiv-

We should take this discussion off the Rocks list and continue it on  
the BioBrew list. Please join that list if you haven't already. You can  
subscribe here: http://bioinformatics.org/lists/BioBrew-Users

As for your benchmark questions, I would do some research on the MPI  
implementations of fasta, wise, and phylip. I'm certain that HMMER  
doesn't have an MPI implementation; HMMER uses PVM.
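
For HMMER specifically, the usual PVM pattern (just a sketch, not something
I've timed on BioBrew; the hostfile and the HMM/sequence file names below
are placeholders) is to start the PVM daemons on your compute nodes and then
pass --pvm to the HMMER search programs:

[glen@glen]$ pvm pvm_hostfile
pvm> quit
[glen@glen]$ hmmsearch --pvm globins.hmm proteins.fa > hmmsearch_pvm.out

The 'pvm pvm_hostfile' step starts pvmd on every node listed in the
hostfile; 'quit' leaves the PVM console but keeps the daemons running, and
hmmsearch then distributes the search over PVM.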

Glen


On Mar 28, 2005, at 6:38 AM, Rajiv wrote:

>  
> ----- Original Message -----
>  From: Rajiv
> To: Glen Otero
> Sent: Friday, March 25, 2005 4:54 PM
> Subject: Re: [Rocks-Discuss]benchmarking Biobrew
>
> Dear Glen,
>     Thanks. Now that we are done with clustalw, gromacs, and mpiblast,
> I would like to test fasta, phylip, wise, and HMMER. I don't want to do
> a complete benchmark/test right now; I would just like to do a basic
> benchmark - something working in MPI for each of these applications. I
> would be glad if you could help me with this.
>  
> Regards,
> Rajiv
> ----- Original Message -----
>  From: Glen Otero
> To: Rajiv
> Cc: npaci-rocks-discussion at sdsc.edu
> Sent: Thursday, March 24, 2005 1:30 PM
> Subject: Re: [Rocks-Discuss]benchmarking Biobrew
>
> Rajiv-
>
> I've included a tutorial for getting started with mpiBLAST below.
> There is excellent documentation for mpiBLAST at
> http://mpiblast.lanl.gov. Also, there is an error with the HMMER RPM:
> standalone HMMER will work, but the PVM version won't. I've posted a
> new SRPM on the BioBrew website
> (http://ftp.bioinformatics.org/pub/biobrew/srpms/3.3/hmmer-2.3.2-4.src.rpm).
> After building and installing the hmmer RPM from the new SRPM, you'll
> find that the HMMER User's Guide includes a very good tutorial on
> getting started with HMMER.
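>
> Roughly, the rebuild and install look like this (a sketch only: the RPMS
> output directory and exact package file name depend on your build host,
> so treat those as placeholders):
>
> [glen@glen]$ wget http://ftp.bioinformatics.org/pub/biobrew/srpms/3.3/hmmer-2.3.2-4.src.rpm
> [glen@glen]$ rpmbuild --rebuild hmmer-2.3.2-4.src.rpm
> [glen@glen]$ su -c 'rpm -Uvh /usr/src/redhat/RPMS/i386/hmmer-*.rpm'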
>
> HTH!
>
> Glen
>
> ****
> mpiBLAST target sequence files must be indexed before use with the
> 'mpiformatdb' program, which is included in the mpiBLAST package.
> BLAST, and therefore mpiBLAST, requires a configuration file (.ncbirc)
> and will complain during certain types of searches if it isn't present.
> The following example demonstrates how to configure and run mpiBLAST as
> user 'glen'. We begin in glen's home directory:
>
> [glen@glen]$ pwd
> /home/glen
>
> Before running mpiBLAST, it's necessary to configure the shared and  
> local storage paths that each compute node will use to access the  
> database fragments, queries, and BLAST results. This is often the  
> user's home directory, which is typically mounted from the head node  
> onto each compute node via NFS. The local storage path designates the  
> location of a directory on a compute node's local hard drive (if  
> available) where that node's database fragments will be stored. The  
> local storage path is typically /tmp or a subdirectory of /tmp. When  
> compute nodes search a database, they will copy fragments to the local  
> storage directory. During subsequent searches of the same database,  
> the fragments will already reside in local storage, and so will not  
> need to be copied from the head node again. To configure mpiBLAST,  
> create a .ncbirc file in your home directory that looks like:
>
> [NCBI]
> Data=/path/to/shared/storage/Data
>
> [BLAST]
> BLASTDB=/path/to/shared/storage
> BLASTMAT=/path/to/shared/storage/Data
>
> [mpiBLAST]
> Shared=/path/to/shared/storage
> Local=/path/to/local/storage
>
> The 'Data' variable holds the location of the directory containing the  
> required BLOSUM and PAM scoring matrices. The BLASTMAT variable also  
> designates the path to the scoring matrices, and will usually be  
> identical to the 'Data' variable. The BLASTDB variable tells regular  
> BLAST (not mpiBLAST) where to find BLAST databases. As mentioned  
> above, the 'Shared' and 'Local' variables designate the shared and  
> local database paths, respectively. By setting BLASTDB to the same  
> path as 'Shared', it is possible for BLAST to share the same databases  
> that mpiBLAST uses. In this case, be sure to format all databases with  
> mpiformatdb rather than formatdb. An example .ncbirc file for 'glen'
> would look like this:
>
> [NCBI]
> Data=/opt/BioBrew/NCBI/6.1.0/Data
>
> [BLAST]
> BLASTDB=/home/glen
> BLASTMAT=/opt/BioBrew/NCBI/6.1.0/Data
>
> [mpiBLAST]
> Shared=/home/glen
> Local=/tmp
>
> Use a text editor to add the following line to your .bashrc file so
> that the path to the BLAST and mpiBLAST executables is captured in an
> environment variable:
>
> export BLASTPATH=/opt/BioBrew/NCBI/6.1.0/bin
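>
> If you'd rather not open an editor, appending the line directly to
> .bashrc does the same thing:
>
> [glen@glen]$ echo 'export BLASTPATH=/opt/BioBrew/NCBI/6.1.0/bin' >> ~/.bashrc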
>
> Log out and log back in as the regular user so this new environment  
> variable is sourced into your shell environment. Alternatively, just  
> source .bashrc like so:
>
> [glen@glen]$ source .bashrc
>
> The environment variable should now be present in your shell. Check to  
> make sure:
>
> [glen@glen]$ echo $BLASTPATH
> /opt/BioBrew/NCBI/6.1.0/bin
>
> Format the Database
> Download the IL2RA and Hs.seq.uniq.gz files from the BioBrew website  
> (http://ftp.bioinformatics.org/pub/biobrew/) using wget:
>
> [glen@glen]$ wget http://ftp.bioinformatics.org/pub/biobrew/IL2RA
> [glen@glen]$ wget http://ftp.bioinformatics.org/pub/biobrew/Hs.seq.uniq.gz
>
> Uncompress the Hs.seq.uniq.gz file:
> [glen@glen]$ gunzip Hs.seq.uniq.gz
>
> The IL2RA file contains the complete coding sequence of the human  
> interleukin-2 receptor alpha chain. The Hs.seq.uniq file is a  
> collection of unique human gene sequences. The goal of this example is  
> to compare the IL2RA sequence to the unique human sequences in  
> Hs.seq.uniq. Before you can compare the IL2RA sequence to the  
> Hs.seq.uniq database, you need to format the target sequences into an  
> indexed database with mpiformatdb. The mpiformatdb command accepts the  
> same command line options as NCBI's formatdb. To format a sequence  
> database with mpiformatdb, the command line syntax looks like this:
>
> [glen@glen]$ $BLASTPATH/mpiformatdb -N 11 -i Hs.seq.uniq
>
> The above command creates and formats the Hs.seq.uniq database into
> 11 fragments, one for each compute node. mpiformatdb reads the .ncbirc
> file and moves the created fragments and index files to /home/glen.
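>
> As a quick sanity check (just a sketch; the exact fragment and index
> file names depend on the mpiBLAST version), you can list the shared
> storage directory and confirm that the fragments are there:
>
> [glen@glen]$ ls /home/glen/Hs.seq.uniq*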
>
> Running a job
> mpiBLAST command line syntax is nearly identical to NCBI's BLAST. To  
> run an mpiBLAST job on 11 nodes, the command syntax would look like:
>
> [glen@glen]$ mpirun -np 12 $BLASTPATH/mpiblast -p blastn -d Hs.seq.uniq -i IL2RA -o blast_results
>
> The above command will query the sequences in IL2RA against the  
> Hs.seq.uniq database and write out results to the blast_results file.  
> For optimal performance, it's important to start at least one more  
> process than the number of compute nodes because one of the mpiBLAST  
> processes is dedicated to scheduling, which is not CPU-intensive.  
> Also, mpiBLAST needs at least 3 processes to perform any search. One  
> process performs file output, one schedules search tasks, and the  
> remaining processes perform search tasks.
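>
> So the smallest job that will still run uses three processes (one
> writer, one scheduler, one worker), along the same lines as above:
>
> [glen@glen]$ mpirun -np 3 $BLASTPATH/mpiblast -p blastn -d Hs.seq.uniq -i IL2RA -o blast_results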
> **********
>
> On Mar 23, 2005, at 8:42 PM, Rajiv wrote:
>
>
> Dear All,
> I have installed Rocks 3.3 with the BioBrew roll. How do I benchmark
> HMMER and mpiBLAST? Are there any tutorials on this?
>
> Regards,
> Rajiv
>
>
>  Glen Otero Ph.D.
> Linux Prophet
>
>
Glen Otero Ph.D.
Linux Prophet
