[Bioclusters] SSE2 HMMer

Ian Korf bioclusters@bioinformatics.org
Thu, 26 Jun 2003 10:43:56 +0100


On Wednesday, June 25, 2003, at 11:07 PM, David Huen wrote:
>
> As for the Apple-Genentech Blast speedup - it is bunkum.  The improvements
> are solely from a superior algorithm and I did port their version over to
> x86 and it shows the same speedups too.  I am very disappointed that
> they have repeated these claims for the G5 as I had spoken to one of their
> bioinformatics support team about it early this year.  He adamantly denied
> that there was anything in their claim that remotely implied it was down
> to their hardware and that they had provided the algorithmic improvements
> back to the community.  I left it at that.  I regret it will force me to
> go back and repeat all those measurements and this time document it
> publicly with sources and all.
>
> Actually their changes are not all that useful in that there is no effect
> at the default wordlength and I think it is not sensible to use Blast at
> large wordlengths - what would you achieve that other algorithms won't do
> better (thinking SSAHA)?

I completely agree that using BLAST with a word size of 40 is not a 
good idea and that SSAHA or maybe BLAT is better for nearly identical 
sequences. Still, sometimes it's easier to use a program you know well 
rather than switch to something less familiar. So AG-BLAST does have 
some utility with large word sizes.

My big problem with the benchmark is that the greatest gain in speed is 
actually at short word lengths. For cross-species work, I use a word 
size of 9 or 10. Here, AG-BLAST is 5-8 times faster. Compared to a dual 
Xeon (without similar code optimizations), it may be even faster than 
that. This is really useful, and the idiots doing the benchmarks don't 
display this. You can see the speed difference in the graph they 
present, but they don't talk about relative speed at each word size. 
Here's an experiment mapping a C. elegans transcript against the C. 
briggsae genome. The data are from the BLAST book by myself, Mark Yandell, 
and Joey Bedell from O'Reilly & Associates (shameless plug for the book 
coming out in about a month).

  W   AG-BLAST speedup (x-fold)
 --   -------------------------
  8     1.5
  9     5.3
 10     8.5
 11     1.0
 15     1.0
 20     1.4
 30     2.3
 40     2.8
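For readers wondering why run time is so sensitive to word size W at all: in word-based seeding, every exact W-letter match between query and database is a seed that has to be extended, and shorter words produce many more seeds. The sketch below just counts such exact word hits. It is a generic illustration, not the NCBI-BLAST or AG-BLAST code; the function name count_word_hits is hypothetical, and random DNA stands in for real transcript and genome sequence.

```python
import random


def count_word_hits(query: str, subject: str, w: int) -> int:
    """Count exact W-letter word matches (seed hits) between query and subject.

    Hypothetical helper for illustration only: it indexes every W-mer of the
    subject in a dict, then scans the query and sums the matching positions.
    It is not how NCBI-BLAST or AG-BLAST is actually implemented.
    """
    # Map each W-mer in the subject to the number of times it occurs.
    index: dict[str, int] = {}
    for i in range(len(subject) - w + 1):
        word = subject[i:i + w]
        index[word] = index.get(word, 0) + 1

    # Every query W-mer that also occurs in the subject yields one hit per occurrence.
    hits = 0
    for j in range(len(query) - w + 1):
        hits += index.get(query[j:j + w], 0)
    return hits


if __name__ == "__main__":
    rng = random.Random(42)
    # Random DNA is an assumption standing in for real sequence data.
    query = "".join(rng.choice("ACGT") for _ in range(2_000))
    subject = "".join(rng.choice("ACGT") for _ in range(200_000))
    for w in (8, 9, 10, 11, 15, 20):
        print(w, count_word_hits(query, subject, w))
```

For uniformly random DNA the expected number of hits is roughly len(query) * len(subject) / 4^W, so each one-letter decrease in W about quadruples the seeds to extend. The sketch only shows why speed depends so strongly on W; it says nothing about what AG-BLAST changes.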

-Ian