[Biodevelopers] Four Questions Relating to HMMer

Tue Feb 15 21:20:33 EST 2005

Dear list,
I'm Daniel, a CS student at Stanford, and I've recently become quite 
interested in exploring HMMer and its related algorithms.  I have four 
questions:

I've taken a look at the AltiVec implementation, and heard that Erik Lindahl 
has worked on a second revision of it 
(http://lindahl.sbc.su.se/software/altivec/altivec-hmmer,-version-2.html). 
Does anyone know of actual performance numbers, or where I could test code 
snippets?

One of the big questions I have is why there is no fast SSE2 version (we 
have a cluster of Pentium 4s here we'd like to try running it on).  The 
article here 
http://bioinformatics.org/pipermail/biodevelopers/2003-January/000151.html 
claims that it's the lack of a vector max instruction, but that doesn't seem 
like the whole story, does it?  Can someone point me to the current best 
performance numbers for HMMer on x86?
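
To make the "vector max" point concrete: SSE2 provides a packed max only for 
signed 16-bit and unsigned 8-bit lanes, so a max over signed 32-bit lanes has 
to be built from a compare plus bitwise selects.  Below is a minimal sketch 
of that emulation, assuming the scores are 32-bit integers (an assumption on 
my part).  The function names max_epi32_sse2 and max4_i32 are hypothetical 
and are not taken from HMMer or from the linked post.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: lane-wise max of four signed 32-bit integers
 * using only SSE2 operations (compare, and, andnot, or). */
static __m128i max_epi32_sse2(__m128i a, __m128i b)
{
    /* mask lane is all ones where a > b, all zeros otherwise */
    __m128i mask = _mm_cmpgt_epi32(a, b);
    /* keep a where the mask is set, b where it is clear */
    __m128i from_a = _mm_and_si128(mask, a);
    __m128i from_b = _mm_andnot_si128(mask, b);  /* (~mask) & b */
    return _mm_or_si128(from_a, from_b);
}

/* Hypothetical wrapper so the helper can be tried on plain arrays.
 * Unaligned loads/stores are used so callers need no special alignment. */
static void max4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, max_epi32_sse2(va, vb));
}
```

This costs four instructions where AltiVec needs one, which is the overhead 
the article seems to be referring to; whether it accounts for the whole 
performance gap is exactly what I'm unsure about.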

I've been going through the documentation for HMMer and have found a few 
hints about the details of HMMer usage, but I'd be quite interested in 
knowing the steps in a real-world HMMer workflow.  Are hundreds of queries 
run at once, or is a single carefully aligned HMM generally searched against 
a rather large database to produce a result an hour later?  Would speeding 
this computation up on x86 help the average biologist in the lab, and what 
speedup would actually be noticeable?

I've found some Pfam databases online.  Is there a more comprehensive one 
that I can run benchmarks on, for instance to study the SSE version 
produced in the Jan 2003 post?

Anyhow, thanks in advance.  I'd appreciate answers to any of these questions 
at all :-)
Daniel