[Bioclusters] split database with blast

Lucas Carey lcarey at odd.bio.sunysb.edu
Mon Nov 27 12:05:47 EST 2006

Hi Daniel,
It is non-trivial to get correct effective database sizes with NCBI BLAST, as it involves processing both the query sequences and the database sequences. Your best bet is to use a package that can split the database and still return correct e-values. mpiBLAST is one; dBlast is another, if for some unfathomable reason you don't like mpiBLAST.
However, depending on what you're doing, e-value differences may not matter. In my personal opinion, there is no meaningful difference between e-36 and e-40, so the differences you are talking about are negligible.
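
To put rough numbers on that, here is a minimal Python sketch. It assumes the simplified Karlin-Altschul relation E = m * n * 2^(-S'), with m the query length, n the database length in residues, and S' the bit score. It ignores the edge-effect length corrections that real BLAST applies, which are exactly what makes the true effective sizes hard to reproduce. The function names and the example lengths are made up for illustration.

```python
import math


def evalue(bit_score, query_len, db_len):
    """Simplified E-value: E = m * n * 2^(-S'). No length correction (assumption)."""
    return query_len * db_len * 2.0 ** (-bit_score)


def rescale_evalue(e_part, part_db_len, full_db_len):
    """Scale an e-value computed on a database fragment up to the full database.

    Valid only under the simplified formula above, where E is linear in db_len.
    """
    return e_part * (full_db_len / part_db_len)


def log10_gap(e1, e2):
    """Distance between two e-values in orders of magnitude."""
    return abs(math.log10(e1) - math.log10(e2))


if __name__ == "__main__":
    # Hypothetical sizes: a 300-residue query, a 1e9-residue database, and half of it.
    full, half = 1e9, 5e8
    e_full = evalue(250, 300, full)
    e_half = evalue(250, 300, half)
    print(e_full / e_half)                     # searching half the database halves E
    print(rescale_evalue(e_half, half, full))  # rescaled fragment value
    print(log10_gap(4e-66, 6e-66))             # the gap from the example in the question
```

Under this simplification, an uncorrected e-value from half of the database differs from the whole-database value by a factor of two, and 4e-66 versus 6e-66 is less than a fifth of an order of magnitude.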

On Tuesday, November 21, 2006 at 15:41 -0800, Daniel Xavier de Sousa wrote:
> Hi all,
> I need some help with parallel BLAST. I would be happy if anyone could help me.
> I have been working with parallel BLAST using a split database.
> I don't have a problem running on part of the database and getting the
> statistics right when I use WUBLAST, because I use the DBRECMAX and DBRECMIN
> parameters and run BLAST on a virtual split database, taking just a piece of
> the whole database, and the e-values come out right.
> But I really want to make everything work in NCBI_BLAST. I know about the
> mpiBLAST solution and the GI number list file. But these solutions aren't so good:
> the first because the BLAST source has to be changed, and the second
> because it requires that you use GI numbers in the FASTA identifiers.
> So, my question is:
> 1) Does anybody know another solution for running BLAST on a split database
> without changing the e-values relative to a run on the whole database?
> If not, is the difference between the e-values from the whole database and from part of the database (using the -z and -Y parameters of ncbi_blast) very important?
> For example, I processed one sequence against the whole database and against just part of the database, using the -z parameter. The result was:
>                          E-value (NR)    E-value (NR/2, using -z)
> SeqQuery    Seq1DB       4e-66           6e-66
> SeqQuery    Seq2DB       4e-38           5e-38
> Is this difference relevant?
> Thanks,
> Daniel Xavier - PUC - Rio de Janeiro - Brazil
