Re: [Bioclusters] split database with blast

Daniel Xavier de Sousa danielucg at yahoo.com.br
Mon Nov 27 16:41:59 EST 2006


Hi Lucas

Thank you so much for your opinion.

I will test dBlast, but from reading the paper I think its solution is the same as mpiBLAST's, that is, a patch to NCBI BLAST.

I am still studying this issue, so if you think of anything else, please tell me.
I will also wait for opinions from other users of the list.

Bye,

Daniel Xavier

----- Original Message ----
From: Lucas Carey <lcarey at odd.bio.sunysb.edu>
To: HPC in Bioinformatics <bioclusters at bioinformatics.org>
Sent: Monday, November 27, 2006 14:05:47
Subject: Re: [Bioclusters] split database with blast


Hi Daniel,
It is non-trivial to get correct effective database sizes with NCBI BLAST, as it involves processing both query sequences and database sequences. Your best bet is to use a package that can split the databases and return correct e-values. mpiBLAST is one, but dBlast is another if for some unfathomable reason you don't like mpiBLAST.
However, depending on what you're doing, e-value differences may not matter. In my personal opinion, there is no difference between e-36 and e-40, so the differences you are talking about are negligible.
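To put rough numbers on this, here is a minimal Python sketch of how an e-value scales with database size. It assumes the textbook relation E = m * n * 2^(-S') for a bit score S', and it deliberately ignores the effective-length corrections that make the exact values hard to reproduce. The function names and the query and database lengths are made up for illustration; only the two e-values in the last lines come from this thread.

```python
import math


def evalue(bit_score, query_len, db_len):
    """E = m * n * 2**(-S') for a bit score S' (no effective-length correction)."""
    return query_len * db_len * 2.0 ** (-bit_score)


def rescale_evalue(e_fragment, fragment_len, full_db_len):
    """Scale an e-value computed on a fragment up to the full database size."""
    return e_fragment * full_db_len / fragment_len


def orders_of_magnitude(e_a, e_b):
    """Absolute difference between two e-values, in powers of ten."""
    return abs(math.log10(e_a / e_b))


if __name__ == "__main__":
    # Hypothetical sizes: a 300-residue query, a 1e9-residue database, half of it.
    full, half = 1e9, 5e8
    e_half = evalue(250, 300, half)
    print("uncorrected half-database e-value:", e_half)
    print("rescaled to full database:       ", rescale_evalue(e_half, half, full))
    print("direct full-database e-value:    ", evalue(250, 300, full))
    # The two e-values reported in this thread for Seq1DB.
    print("orders of magnitude apart:", orders_of_magnitude(6e-66, 4e-66))
```

Under this simplified model the rescaled value matches the full-database value exactly. The small residual differences seen in practice would come from the effective-length corrections that the sketch leaves out.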
-Lucas

On Tuesday, November 21, 2006 at 15:41 -0800, Daniel Xavier de Sousa wrote:
> 
> 
> Hi all,
> 
> I need some help with parallel BLAST, and I would be happy if anyone could help me.
> I have been working with parallel BLAST using a split database.
> 
> I don't have any problem running on part of the database and getting correct statistics when
> I use WUBLAST, because I use the DBRECMAX and DBRECMIN parameters and
> run BLAST on a virtual split of the database, taking just one piece of the whole
> database, and the e-values come out right.
> 
> But I really want to make everything work in NCBI_BLAST. I know about the mpiBLAST solution
> and the GI number list file, but these solutions aren't so good:
> the first because the BLAST source has to be changed, and the second
> because it requires GI numbers in the FASTA identifiers.
> 
> So, my question is:
> 
> 1) Does anybody know of another solution for running BLAST on a split database
> without changing the e-values relative to a run against the whole database?
> 
> If not, is the difference between the e-value from the whole database and the e-value from part of the database (using the -z and -Y parameters of ncbi_blast) very important?
> 
> For example, I processed one sequence against the whole database and against just part of the database using the -z parameter. The result was:
> 
> Query      Subject   E-value (NR)   E-value (NR/2, using -z)
> SeqQuery   Seq1DB    4e-66          6e-66
> SeqQuery   Seq2DB    4e-38          5e-38
> 
> Is this difference relevant?
> Thanks,
> 
> Daniel Xavier - PUC - Rio de Janeiro - Brazil
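
The -z approach described in the quoted message could be scripted roughly as follows. This is only a sketch: the fragment names (nr.00, nr.01), the query file, and the full database length are hypothetical placeholders, and the helper function is not part of any BLAST package. It only builds blastall command lines that pass the full database length via -z to every fragment search. It does not run them, and it does not remove the small residual e-value differences shown in the table above.

```python
def blastall_fragment_commands(query, fragments, full_db_len, program="blastp"):
    """Build one blastall command per database fragment.

    Each command passes -z with the length of the WHOLE database, so that
    e-values are computed against the full size rather than the fragment size.
    """
    commands = []
    for fragment in fragments:
        commands.append([
            "blastall",
            "-p", program,            # BLAST program to run
            "-d", fragment,           # formatted database fragment
            "-i", query,              # query FASTA file
            "-o", fragment + ".out",  # per-fragment output file
            "-z", str(full_db_len),   # effective length of the whole database
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical fragment names and database length, for illustration only.
    for cmd in blastall_fragment_commands("query.fa", ["nr.00", "nr.01"], 1000000000):
        print(" ".join(cmd))
```

Each printed line could then be submitted to a different node, with the per-fragment outputs merged and re-sorted by e-value afterwards.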
_______________________________________________
Bioclusters maillist  -  Bioclusters at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters


		
 

