[Bioclusters] split database with blast
Lucas Carey
lcarey at odd.bio.sunysb.edu
Mon Nov 27 12:05:47 EST 2006
Hi Daniel,
It is non-trivial to get correct effective database sizes with NCBI BLAST, as it involves processing both query sequences and database sequences. Your best bet is to use a package that can split the database and still return correct e-values. mpiBLAST is one; dBlast is another, if for some unfathomable reason you don't like mpiBLAST.
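For reference, the database-size dependence comes from the Karlin-Altschul relation E = K * m' * n' * exp(-lambda * S), where m' and n' are the effective query and database lengths. Below is a minimal Python sketch of that relation only. The lambda and K defaults, the raw score, and the search-space sizes are illustrative placeholders, not values taken from any real run, and the function name is hypothetical.

```python
import math

def evalue(raw_score, search_space, lam=0.267, k=0.041):
    """Karlin-Altschul expectation: E = K * (m' * n') * exp(-lambda * S).

    search_space is the product m' * n' of the effective query and
    database lengths. lam and k are placeholder statistical parameters.
    """
    return k * search_space * math.exp(-lam * raw_score)

# Hypothetical alignment score and search-space sizes.
full = evalue(raw_score=300, search_space=1e12)
half = evalue(raw_score=300, search_space=0.5e12)

# With everything else fixed, E scales linearly with the search space.
print(half / full)
```

This is why a fragment searched on its own reports smaller e-values than the whole database unless the effective sizes are overridden, and why getting those effective sizes exactly right is the hard part.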
However, depending on what you're doing, e-value differences may not matter. In my personal opinion, there is no practical difference between e-36 and e-40, so the differences you are talking about are negligible.
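One way to put a number on this is to convert the e-value ratio into a bit-score gap. Since E = m * n * 2^(-S') for a bit score S', two e-values computed over the same search space differ by log2 of their ratio in bits. The sketch below applies that to the e-values quoted in the original message further down; the function name is hypothetical, and holding the search space fixed is an assumption made for illustration.

```python
import math

def bit_score_gap(e_a, e_b):
    """Bit-score difference implied by two e-values over the same search space."""
    # Work in log space so very small e-values do not underflow.
    return abs(math.log2(e_a) - math.log2(e_b))

# E-values from the quoted table: (whole NR, NR/2 using -z).
pairs = {"Seq1DB": (4e-66, 6e-66), "Seq2DB": (4e-38, 5e-38)}

for subject, (whole, half) in pairs.items():
    print(subject, bit_score_gap(whole, half))
```

Both gaps come out at less than one bit, which is far below anything that would change how a hit is interpreted.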
-Lucas
On Tuesday, November 21, 2006 at 15:41 -0800, Daniel Xavier de Sousa wrote:
>
>
> Hi all,
>
> I need some help with parallel BLAST. I would be happy if anyone could help me.
> I have been working with parallel BLAST using a split database.
>
> I don't have a problem running on part of the database and getting correct
> statistics when I use WUBLAST, because I use the DBRECMAX and DBRECMIN
> parameters and run BLAST on a virtually split database, taking just a piece
> of the whole database, and the e-values come out right.
>
> But I really want to make everything work in NCBI BLAST. I know about the
> mpiBLAST solution and the GI number list file. But these solutions aren't so good:
> the first because the BLAST source has to be changed, and the second
> because it requires that you use GI numbers in the FASTA identifiers.
>
> So, my question is:
>
> 1) Does anybody know another solution for running BLAST on a split database
> without changing the e-values relative to a run against the whole database?
>
> If not, is the difference between the e-values for the whole database and for part of the database (using the -z and -Y parameters of NCBI BLAST) very important?
>
> For example, I processed one sequence against the whole database and against just part of the database, using the -z parameter. The result was:
>
> Query     Subject  E-value (NR)  E-value (NR/2, using -z)
> SeqQuery  Seq1DB   4e-66         6e-66
> SeqQuery  Seq2DB   4e-38         5e-38
>
> Is this difference relevant?
> Thanks,
>
> Daniel Xavier - PUC - Rio de Janeiro - Brazil