Hi,

Why the -F F? You are increasing the number of hits by including
low-complexity regions in the search. I bet that requires creating some
fat index along the way.

Eitan

--------------------
Eitan Rubin, PhD
Head of Bioinformatics
The Bauer Center for Genomics Research
Harvard University
Tel: 617-496-5649  Fax: 617-495-2196

-----Original Message-----
From: bioclusters-request at bioinformatics.org
Sent: Wednesday, March 09, 2005 6:46 PM
To: bioclusters at bioinformatics.org
Subject: Bioclusters Digest, Vol 5, Issue 9

Send Bioclusters mailing list submissions to
	bioclusters at bioinformatics.org

To subscribe or unsubscribe via the World Wide Web, visit
	https://bioinformatics.org/mailman/listinfo/bioclusters
or, via email, send a message with subject or body 'help' to
	bioclusters-request at bioinformatics.org

You can reach the person managing the list at
	bioclusters-owner at bioinformatics.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioclusters digest..."

Today's Topics:

   1. multiple inputs to MPIBLAST (Lik Mui)
   2. Memory Usage for Blast - question (Dinanath Sulakhe)
   3. Re: multiple inputs to MPIBLAST (Aaron Darling)
   4. Re: Memory Usage for Blast - question (Hrishikesh Deshmukh)
   5. Re: Memory Usage for Blast - question (Dinanath Sulakhe)
   6. Re: Memory Usage for Blast - question (Lucas Carey)
   7. Re: Memory Usage for Blast - question (Dinanath Sulakhe)
   8. Re: Memory Usage for Blast - question (Lucas Carey)

----------------------------------------------------------------------

Message: 1
Date: Wed, 9 Mar 2005 13:27:10 -0800
From: Lik Mui <lmui at stanford.edu>
Subject: [Bioclusters] multiple inputs to MPIBLAST
To: bioclusters at bioinformatics.org
Message-ID: <1110403630.422f6a2e8ddce at webmail.stanford.edu>
Content-Type: text/plain; charset=ISO-8859-1

Hello, I tried to feed multiple inputs to mpiblast (all in a single
FASTA file). I found that when the number of inputs is > 15, mpiblast's
performance GREATLY deteriorates. For example, using a single head node,
I get a blastall output in about 20 seconds. When I feed an input of 20
sequences to MPIBLAST on a 24-node cluster, the result takes 3 minutes
to come back. This is hardly super-linear speedup.

I am running on a 24-node Platform ROCKS cluster with MPICH 1.2.6 and
the latest MPIBLAST, 1.3.0.

Can anyone explain why this is, or how to get around MPIBLAST slowing
down with multiple inputs?

Thanks in advance.

     Lik Mui

p.s. Because my genome db is about 1 GB, it seems to make sense to
process a batch of inputs together with a single read of the db. Hence,
I am batching multiple input sequences into one file. If this is not
correct reasoning, please comment.

------------------------------

Message: 2
Date: Wed, 09 Mar 2005 16:09:41 -0600
From: Dinanath Sulakhe <sulakhe at mcs.anl.gov>
Subject: [Bioclusters] Memory Usage for Blast - question
To: bioclusters at bioinformatics.org
Message-ID: <6.0.0.22.2.20050309154748.04a271b0 at pop.mcs.anl.gov>
Content-Type: text/plain; charset="us-ascii"; format=flowed

Hi,
I am not sure if this is the right place to ask this question!
I am running BLAST (NCBI) in parallel on a cluster with 80 nodes (I am
running NCBI NR against itself). Each node is a dual processor.

I am using Condor to submit the jobs to this cluster. The problem I am
coming across is that whenever two blast jobs (each with 100 sequences)
are assigned to one node (one per processor), the node cannot handle the
amount of memory used by the two jobs. The PBS mom daemons on the node
cannot allocate the memory they need to monitor the jobs, so they fail,
killing the jobs.

Condor doesn't recognize this failure and assumes the job completed
successfully, but actually only a few sequences get processed before the
job is killed.

Now the admin of the site is asking me if it's possible to reduce the
amount of memory these blast jobs use. He says these jobs are requesting
about 600-700 MB of RAM, and he is asking me to reduce that to at most
500 MB.

Is it possible to reduce the amount of RAM requested by tweaking any of
the parameters in blast?

My blast options are:

blastall -i $input -o $output -d $db -p blastp -m 8 -F F

Please let me know.
Thank you,
Dina
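As a concrete sketch of the alternative Eitan's note at the top points
at (hedged: -F T is the stock blastall default filter setting; whether
it actually gets the jobs under the 500 MB cap would have to be
measured on a test node):

  # Same run as above, but with the default SEG low-complexity filter
  # left on instead of disabled with -F F. Fewer low-complexity hits
  # means smaller hit lists held in RAM while a query is processed.
  blastall -i $input -o $output -d $db -p blastp -m 8 -F T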
------------------------------

Message: 3
Date: Wed, 09 Mar 2005 16:11:25 -0600
From: Aaron Darling <darling at cs.wisc.edu>
Subject: Re: [Bioclusters] multiple inputs to MPIBLAST
To: "Clustering, compute farming & distributed computing in life
science informatics" <bioclusters at bioinformatics.org>
Message-ID: <422F748D.2090403 at cs.wisc.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Lik

The bad behavior could be due to any one of a number of factors (extra
fragment copies, startup overhead, etc.). To pin down what's going wrong
on your setup, it would be helpful to have a debug log, as generated by
adding the --debug command line option. Debug output goes to stderr;
redirect it as appropriate for whatever shell you use. As the mpiblast
debug log can get lengthy, you may want to send it directly to me or
post it on a web server somewhere...

-Aaron

Lik Mui wrote:
> Hello, I tried to feed multiple inputs to mpiblast (all in a single
> FASTA file). I found that when the number of inputs is > 15,
> mpiblast's performance GREATLY deteriorates.
> [...]
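One way the log capture Aaron asks for might look (a sketch: the
database and query file names are placeholders, and the exact mpirun
invocation depends on the local MPICH installation; --debug is the
option named above):

  # Re-run the slow 20-sequence batch with debug logging enabled;
  # mpiblast writes debug output to stderr, so redirect it to a file.
  mpirun -np 24 mpiblast -p blastn -d my_genome_db -i queries.fa \
      -o results.txt --debug 2> mpiblast_debug.log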
------------------------------

Message: 4
Date: Wed, 9 Mar 2005 17:50:37 -0500
From: Hrishikesh Deshmukh <hdeshmuk at gmail.com>
Subject: Re: [Bioclusters] Memory Usage for Blast - question
To: "Clustering, compute farming & distributed computing in life
science informatics" <bioclusters at bioinformatics.org>
Message-ID: <829d7fb6050309145051155606 at mail.gmail.com>
Content-Type: text/plain; charset=US-ASCII

Hi,

You haven't said how long each sequence is! You can tweak the word size
(-W) to make BLAST faster, but it then becomes less sensitive. I suggest
you take a look at the book "BLAST" by Ian Korf et al.

Thanks,
Hrishi

On Wed, 09 Mar 2005 16:09:41 -0600, Dinanath Sulakhe
<sulakhe at mcs.anl.gov> wrote:
> Hi,
> I am not sure if this is the right place to ask this question!
> I am running BLAST (NCBI) in parallel on a cluster with 80 nodes.
> [...]

------------------------------

Message: 5
Date: Wed, 09 Mar 2005 16:59:28 -0600
From: Dinanath Sulakhe <sulakhe at mcs.anl.gov>
Subject: Re: [Bioclusters] Memory Usage for Blast - question
To: Hrishikesh Deshmukh <hdeshmuk at gmail.com>, "Clustering, compute
farming & distributed computing in life science informatics"
<bioclusters at bioinformatics.org>
Message-ID: <6.0.0.22.2.20050309165745.03d35090 at pop.mcs.anl.gov>
Content-Type: text/plain; charset="us-ascii"; format=flowed

This is a blast run of NCBI NR against itself, so the sequence size
varies. Thanks for the reply; I will look into the word size.

Dina

At 04:50 PM 3/9/2005, Hrishikesh Deshmukh wrote:
> Hi,
>
> You haven't said how long each sequence is! You can tweak the word
> size (-W) to make BLAST faster, but it then becomes less sensitive.
> [...]
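A sketch of Hrishi's word-size suggestion applied to Dina's command
(hedged: -W is the standard blastall word-size flag and 3 is the blastp
default, but which values a particular build accepts for proteins
varies, some versions only allow 2 or 3, so check before relying on it;
it is also mainly a speed lever, and any effect on peak RAM would need
to be measured):

  # Raising -W above the blastp default of 3 makes the initial word
  # scan cheaper, trading sensitivity for speed.
  blastall -i $input -o $output -d $db -p blastp -m 8 -F F -W 4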
------------------------------

Message: 6
Date: Wed, 9 Mar 2005 18:06:44 -0500
From: Lucas Carey <lcarey at odd.bio.sunysb.edu>
Subject: Re: [Bioclusters] Memory Usage for Blast - question
To: "Clustering, compute farming & distributed computing in life
science informatics" <bioclusters at bioinformatics.org>
Message-ID: <20050309230644.GA27139 at odd.bio.sunysb.edu>
Content-Type: text/plain; charset=us-ascii

Hi Dina,
I don't know how many of the results you actually need. You may free up
some memory by limiting the e-value cutoff and the number of returned
and aligned results:

blastall -e 0.0001 -b 25 -v 25

Another option, if you can limit Condor to a single job per machine,
would be to run 'blastall -a 2' to use both CPUs with only one process.

-Lucas

On Wednesday, March 09, 2005 at 16:09 -0600, Dinanath Sulakhe wrote:
> Hi,
> I am not sure if this is the right place to ask this question!
> I am running BLAST (NCBI) in parallel on a cluster with 80 nodes.
> [...]
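Putting Lucas's flags together with Dina's original options (a sketch:
the cutoff values are Lucas's examples, not tuned numbers; with -m 8
tabular output the -v limit may not visibly change anything, and the
actual RAM saving would need to be measured):

  # -e caps the reported e-value, -b the number of database sequences
  # for which alignments are kept, -v the number of one-line
  # descriptions; all three shrink the result set blastall holds in
  # memory before writing output.
  blastall -i $input -o $output -d $db -p blastp -m 8 -F F \
           -e 0.0001 -b 25 -v 25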
------------------------------

Message: 7
Date: Wed, 09 Mar 2005 17:23:51 -0600
From: Dinanath Sulakhe <sulakhe at mcs.anl.gov>
Subject: Re: [Bioclusters] Memory Usage for Blast - question
To: "Clustering, compute farming & distributed computing in life
science informatics" <bioclusters at bioinformatics.org>
Message-ID: <6.0.0.22.2.20050309171325.04bcdbd0 at pop.mcs.anl.gov>
Content-Type: text/plain; charset="us-ascii"; format=flowed

At 05:06 PM 3/9/2005, Lucas Carey wrote:
> Hi Dina,
> I don't know how many of the results you actually need. You may free
> up some memory by limiting the e-value cutoff and the number of
> returned and aligned results:
> blastall -e 0.0001 -b 25 -v 25

Would limiting the e-value and the other parameters reduce the RAM
usage?

> Another option, if you can limit Condor to a single job per machine,
> would be to run 'blastall -a 2' to use both CPUs with only one
> process.

These jobs are assigned by the scheduler. Initially I had used the
'-a 2' option, but when such a job was running on a node, the scheduler
would assign another job from some other user to the same node,
assuming the other processor to be free, and blast would then starve
the other job. So we can't use the '-a n' option here.

Thanks,
Dina

> -Lucas
>
> On Wednesday, March 09, 2005 at 16:09 -0600, Dinanath Sulakhe wrote:
> [...]

===============================
Dinanath Sulakhe
Mathematics & Computer Science Division
Argonne National Laboratory
Ph: (630)-252-7856  Fax: (630)-252-5986
------------------------------

Message: 8
Date: Wed, 9 Mar 2005 18:33:38 -0500
From: Lucas Carey <lcarey at odd.bio.sunysb.edu>
Subject: Re: [Bioclusters] Memory Usage for Blast - question
To: "Clustering, compute farming & distributed computing in life
science informatics" <bioclusters at bioinformatics.org>
Message-ID: <20050309233338.GC27139 at odd.bio.sunysb.edu>
Content-Type: text/plain; charset=us-ascii

On Wednesday, March 09, 2005 at 17:23 -0600, Dinanath Sulakhe wrote:
> At 05:06 PM 3/9/2005, Lucas Carey wrote:
> > Hi Dina,
> > I don't know how many of the results you actually need. You may free
> > up some memory by limiting the e-value cutoff and the number of
> > returned and aligned results:
> > blastall -e 0.0001 -b 25 -v 25
>
> Would limiting the e-value and the other parameters reduce the RAM
> usage?

With mpiBLAST, -e and -b can limit the memory usage on the master node.
I don't have a free CPU right now to check blastall.

> > Another option, if you can limit Condor to a single job per machine,
> > would be to run 'blastall -a 2' to use both CPUs with only one
> > process.
>
> These jobs are assigned by the scheduler. Initially I had used the
> '-a 2' option, but the scheduler would assign another user's job to
> the same node, assuming the other processor to be free, and blast
> would then starve the other job. So we can't use the '-a n' option
> here.

I used to use an OpenPBS cluster that would do that, but it allowed me
to specify which nodes I wanted to run on. I would start up my compute
job on one processor, and a while(1){ sleep(1000); } loop on the
second.

-Lucas
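One way Lucas's placeholder trick could look as a batch script (a
sketch only: the #PBS resource line and whether a script may claim a
whole node depend entirely on the local scheduler configuration, and
the $input/$output/$db variables are placeholders from earlier in the
thread):

  #!/bin/sh
  #PBS -l nodes=1:ppn=2
  # Run blast on one CPU and park a do-nothing loop on the other, so
  # the scheduler cannot place another user's job on that slot.
  blastall -i $input -o $output -d $db -p blastp -m 8 -F F &
  blast_pid=$!
  while true; do sleep 1000; done &
  sleeper_pid=$!
  wait "$blast_pid"     # block until blast finishes
  kill "$sleeper_pid"   # then release the second CPU slot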
------------------------------

_______________________________________________
Bioclusters maillist - Bioclusters at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters

End of Bioclusters Digest, Vol 5, Issue 9
*****************************************