Res: [BiO BB] Can you explain theses results?

Daniel Xavier de Sousa danielucg at
Thu Apr 26 13:14:22 EDT 2007

Hi Mike,


Thanks for your reply. I know these steps of BLAST, and that is exactly
why these results seem strange to me. I expected that, of two sequences of
the same length, the one with more similarity to the database would take
longer to process. But when I run BLASTP, this does not happen.


I used the default parameters of NCBI BLAST on a serial machine. The
database was the complete NR.


What do you think?



*        Daniel Xavier de Sousa                    *
*        Master's student in Informatics - PUC-Rio *
*        E-MAIL :       *
*        Phone  : +55 21 35271500 - 4543           *

----- Original message ----
From: Michael Muratet US-Huntsville <Michael.Muratet at>
To: General Forum at Bioinformatics.Org <bio_bulletin_board at>
Sent: Thursday, April 26, 2007 13:05:09
Subject: RE: [BiO BB] Can you explain theses results?

> Hi all,
> Please take a look at these tests. They were run on a serial machine,
> but they are also relevant for a cluster.
> What do you think about them?
> Here are the tests, using BLASTP:
> Query length -- Hits -- Time
> 10000        -- 3    -- 8min51sec
> 10000        -- 500  -- 7min41sec
> ----------------------------------
> Query length -- Hits -- Time
> 9000         -- 2    -- 7min11sec
> 9000         -- 500  -- 6min49sec
> ----------------------------------
> Query length -- Hits -- Time
> 3000         -- 3    -- 2min54sec
> 3000         -- 500  -- 2min52sec
> These times are very strange. You can see that the sequence of 10000
> bases with 3 hits takes more time than another sequence of 10000 bases
> with 500 hits. So I conclude that BLASTP, unlike BLASTN, is not
> sensitive to similarity.
> But most importantly, why did this happen? Can anybody explain these
> results?
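
The timings quoted above can be normalized by query length to see how
much of the run time the length alone accounts for. The snippet below is
only a rough sketch of that arithmetic: the numbers are copied from the
quoted table, and the names `runs` and `seconds_per_1000` are made up for
illustration.

```python
# (query length, hits, run time in seconds), copied from the quoted table
runs = [
    (10000, 3, 8 * 60 + 51),
    (10000, 500, 7 * 60 + 41),
    (9000, 2, 7 * 60 + 11),
    (9000, 500, 6 * 60 + 49),
    (3000, 3, 2 * 60 + 54),
    (3000, 500, 2 * 60 + 52),
]


def seconds_per_1000(length, seconds):
    """Run time per 1000 query positions (hypothetical helper)."""
    return seconds / (length / 1000)


# One normalized rate per run, in the same order as the table
rates = [seconds_per_1000(length, seconds) for length, _, seconds in runs]

for (length, hits, _), rate in zip(runs, rates):
    print(f"{length:>6} {hits:>4} hits {rate:6.1f} s per 1000 positions")
```

All six runs fall between about 45 and 58 seconds per 1000 query
positions, whether the search returned 2 or 3 hits or 500.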


Without knowledge of the parameters and data you used, I can offer only a
couple of comments. The BLAST heuristic operates in several steps.
First, the query and the targets are broken up into tokens based on the
word size you select. The default size is 11 for nucleotides and 3 for
proteins (I think--check the man page). The targets are then searched
for tokens that match tokens in the query. The matches are used to seed
alignments performed with the Smith-Waterman algorithm. The bottom line
is that how long a given run takes is a complicated function of the
parameters and the data. I recall there was a paper in Bioinformatics
that calculated the computational complexity, but I can't lay my hands
on it at the moment. You'll notice that the time is dominated by query
length, i.e., the time it takes to search for matches. The difference in
the time for hits depends on how long the hits were. 
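
To make the token-matching step concrete, here is a minimal sketch. It is
not the NCBI code, and the function names `index_words` and `find_seeds`
and the example sequences are invented for illustration. It also assumes
exact word matches only, whereas the real protein search accepts similar
("neighborhood") words scored with a substitution matrix, and it stops
before the alignment step.

```python
from collections import defaultdict


def index_words(seq, w):
    """Map each length-w word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def find_seeds(query, target, w=3):
    """Return (query_pos, target_pos) pairs where a length-w word matches.

    Simplification: exact matches only, no scoring and no extension.
    """
    index = index_words(query, w)
    seeds = []
    for j in range(len(target) - w + 1):
        # .get avoids adding empty entries to the defaultdict on a miss
        for i in index.get(target[j:j + w], ()):
            seeds.append((i, j))
    return seeds


# Made-up example sequences
query = "MKVLAAGIVG"
target = "TTMKVLGGAAGIV"
print(find_seeds(query, target))
```

Every word of the query has to be indexed and every target position has
to be looked up no matter how many seeds come out at the end, which is
one way to see why the scan cost follows the sequence lengths rather
than the number of hits.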

There's a good description of BLAST in the book Bioinformatics by David
Mount. I would also recommend the book BLAST by Korf, Yandell, and
Bedell. It has a whole chapter on setting parameters for various types
of searches. (You should never just use the defaults unless, of course,
they are correct for the search you are doing ;-) ). Also, there are
several different implementations of BLAST (NCBI BLAST and WU-BLAST being
the two most popular, I think) that perform differently under different
circumstances. There is a parallelized version called mpiBLAST and at
least one hardware-accelerated version that I know of.

Good Luck

General Forum at Bioinformatics.Org - BiO_Bulletin_Board at
