[Bioclusters] Nomenclature (was Re: Call for information.)

Joe Landman bioclusters@bioinformatics.org
17 Apr 2002 16:26:16 -0400


On Wed, 2002-04-17 at 10:58, Ivo Grosse wrote:

> Yes, thanks to make that clear.  Hence, my question about recent Blast 
> benchmarks on P3s and P4s and Athlon-MPs meant: does anyone have those 
> 3 numbers (or sets of numbers) for SINGLE nodes?

This is, of course, not a simple question to answer properly.

I am trying to create a set of benchmarks that would be helpful for this
characterization.  Unfortunately, you will not be able to get a single
number that characterizes performance with any significant predictive
power.  Execution times are as much a function of the inputs used, the
parameters supplied, and much else besides as they are of the machine
characteristics (L1, L2, and L3 cache sizes, memory bandwidth, etc.).
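One way to act on this is to record a table of timings across inputs and
parameter settings instead of a single figure.  The following is a minimal
Python sketch, not part of the benchmark suite described here: `sweep` and
the `workload` callable are hypothetical names, and `workload` is assumed to
wrap whatever actually runs the search (for example, a BLAST invocation).
Any parameter names passed to it are placeholders, not real BLAST options.

```python
import itertools
import statistics
import time
from typing import Callable, Dict, Iterable, Tuple


def sweep(workload: Callable[[str, dict], None],
          inputs: Iterable[str],
          param_grid: Dict[str, list],
          repeats: int = 3) -> Dict[Tuple, float]:
    """Time `workload` for every (input, parameter combination) pair.

    Returns a table mapping (input, sorted params tuple) -> median wall-clock
    seconds over `repeats` runs, rather than collapsing it all into one number.
    """
    names = sorted(param_grid)
    results = {}
    for inp in inputs:
        # Cartesian product over all parameter values, in a stable name order.
        for values in itertools.product(*(param_grid[n] for n in names)):
            params = dict(zip(names, values))
            times = []
            for _ in range(repeats):
                start = time.perf_counter()
                workload(inp, params)
                times.append(time.perf_counter() - start)
            # Median damps run-to-run noise from other activity on the node.
            results[(inp, tuple(sorted(params.items())))] = statistics.median(times)
    return results
```

The whole table, not any one entry, is the result; two machines can swap
places from one row to the next.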

With one particular set of tests I have been using on older P4s (BLAST
runs of a specific type against specific databases), I have generally
found the performance relationship to look something like this:

Performance of Athlon at frequency X > performance of Pentium III at
frequency X > performance of Pentium 4 at frequency X + delta (with delta
in the 400 to 800 MHz range).
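To compare processors at different clock rates on the same job, wall-clock
time can be converted into core cycles (runtime times frequency).  A minimal
sketch with hypothetical function names; the break-even estimate assumes the
cycle count per job does not change with clock rate, which is optimistic for
latency-bound code such as BLAST.

```python
# Hypothetical helpers: normalize wall-clock times by core clock so that
# machines running at different frequencies can be compared per cycle.


def cycles_per_job(runtime_s: float, clock_mhz: float) -> float:
    """Core clock cycles that elapse while one job runs: seconds x Hz."""
    return runtime_s * clock_mhz * 1e6


def per_clock_advantage(runtime_a: float, mhz_a: float,
                        runtime_b: float, mhz_b: float) -> float:
    """Work per clock of machine A relative to machine B on the same job.

    A value > 1 means A finished the job in fewer cycles than B.
    """
    return cycles_per_job(runtime_b, mhz_b) / cycles_per_job(runtime_a, mhz_a)


def break_even_mhz(runtime_a: float, runtime_b: float, mhz_b: float) -> float:
    """Clock (MHz) machine B would need to match machine A's runtime.

    Assumes B's cycle count for the job is fixed.  For memory-latency-bound
    codes this is optimistic: a stall of fixed nanoseconds costs more cycles
    as the clock rises, so the real break-even point is higher.
    """
    return cycles_per_job(runtime_b, mhz_b) / (runtime_a * 1e6)
```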

As designed, this test stresses the CPU itself more than the CPU-memory
interface.  In the case of BLAST, you spend a large fraction of your time
chasing pointers through memory and reading short sections of the indexed
"database".  BLAST tends to be more sensitive to memory latency than to
memory bandwidth.  Larger caches may help in particular cases, where
various data structures can be made to reside in those larger caches.
Higher-bandwidth memory doesn't seem to help; lower-latency memory does.
Compiler quality is critical to its performance on any architecture: good
compilers can make generally low-performance hardware look good, while
poor compilers can strangle high-performance hardware.
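The difference between latency-bound and streaming access can be
illustrated with a pointer-chasing loop, in which each load's address
depends on the previous load, so the hardware cannot overlap the cache
misses.  The sketch below (hypothetical helper names, not from the original
tests) builds a single-cycle random permutation with Sattolo's algorithm and
times chasing it against chasing a sequential successor array of the same
size.  In CPython the per-hop interpreter overhead narrows the gap
considerably; a C version would show the latency effect more sharply.

```python
import random
import time


def single_cycle_permutation(n, rng):
    """Sattolo's algorithm: a random successor array forming one n-cycle."""
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j in [0, i), never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps):
    """Follow nxt[] for `steps` hops; each load depends on the previous one."""
    i = 0
    for _ in range(steps):
        i = nxt[i]
    return i


def time_chase(nxt, steps):
    start = time.perf_counter()
    chase(nxt, steps)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1 << 20  # ~1M entries, well beyond a typical L2 cache
    sequential = [(i + 1) % n for i in range(n)]  # neighbor-to-neighbor hops
    scattered = single_cycle_permutation(n, random.Random(42))
    print("sequential chase (s):", time_chase(sequential, n))
    print("random chase (s):    ", time_chase(scattered, n))
```

Both arrays are visited exactly once per lap, so any difference in time
comes from the access pattern rather than the amount of work.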

All this means is that you need to understand the issues surrounding the
tests you are looking at.  No single sequence size, database, or algorithm
will be useful for a meaningful comparison on its own.  You can start to get
a feel for relative performance by comparing groups of measurements, though.
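Comparing groups of measurements can be as simple as computing a per-test
speedup for every test both machines ran, then reporting the spread along
with a summary.  The sketch below uses the geometric mean of the ratios (the
convention SPEC CPU uses for its composite scores); the function name and
result layout are illustrative assumptions.

```python
import statistics
from typing import Dict


def compare(times_a: Dict[str, float], times_b: Dict[str, float]) -> dict:
    """Summarize per-test speedups of machine A over machine B.

    Speedup for one test = time on B / time on A (> 1 means A was faster).
    Only tests that were run on both machines are compared.
    """
    common = sorted(times_a.keys() & times_b.keys())
    if not common:
        raise ValueError("no tests in common")
    ratios = {t: times_b[t] / times_a[t] for t in common}
    return {
        "per_test": ratios,
        "geomean": statistics.geometric_mean(ratios.values()),
        "min": min(ratios.values()),
        "max": max(ratios.values()),
    }
```

A geometric mean near 1.0 with ratios ranging from 0.5 to 2.0 says
something very different from a uniform 1.0 across all tests, which is why
the range is returned alongside the summary.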

As soon as I have time, I will try to package up my tests for people to
try on their own.  This will eventually be part of an informatics
performance metrics project to be hosted on the bioinformatics.org site.

Joe

-- 

Joseph Landman, Ph.D.
Senior Scientist,
MSC Software High Performance Computing
email		: joe.landman@mscsoftware.com
Main office	: +1 248 208 3312
Fax		: +1 714 784 3774