On 24 Jun 2005, at 2:20 am, Joe Landman wrote:

> One fundamental error made in the Spec numbers is reducing the
> multidimensional performance space to single numbers using a
> dubious practice of creating an average (over things with very
> different characteristics/dimensions).

That's true, but it's still a useful number, and people need it when they take their analysis to the boss to ask for the money. :-)

We use a geometric mean of relative performance for our evaluations - we run some fairly standard runs of the real applications, time each, and scale the times relative to a standard box we already have (traditionally this has been a 466 MHz DS10L, the standard machine in the cluster that our esteemed former leader Dr Cuff set up, all those years ago). So we have the vector of results available to us, but the final number is the geometric mean of that vector of relative performance figures.

The geometric mean handles the situation where one application out of many runs spectacularly well or badly rather more gracefully than the arithmetic mean does. This has been particularly noticeable in our benchmark: genewise is a code whose performance varies very widely between architectures - much more so than BLAST does, in my experience. Machines which run genewise spectacularly well look unfeasibly good under the arithmetic mean, but once you take the geometric mean of the relative performances, the result looks more reasonable. (There's a toy illustration of this further down.) It's worked well enough for us, anyway.

Ultimately, of course, that final reduction of the vector to a single number has to be a site-specific formula, with weightings for each element of the vector determined by the requirements of the organisation.

> I would argue for a vector, and the vector would be per
> application. That is have a blast vector, with blastx, psiblast,
> rpsblast,... . Have a HMMer vector with an hmmalign, pfam
> search,... . Have a data transfer vector: time to copy nt to all
> nodes in cluster/number of nodes. Have a web services vector.
> This way you don't lose information (some systems may be better
> designed for one subset of tasks than another).

Right.

> bonnie is a start, but I would question as to how well correlated
> against use cases it is. I would think that a more typical use
> case would involve remote queries of a large database, moving large
> databases, local queries of large databases, etc.

Bonnie is still important; lots of places create thousands of tiny files as part of their process. We all know what a bad thing this is to do, but bonnie's file creation benchmark is a pretty good measure of how well a system copes with it, as well as of how good the system is at streaming I/O.

I suspect it will also be a good test of cluster filesystem performance in general; the real performance killers are usually in metadata operations, so running bonnie's file creation tests on lots of client machines talking to the same filesystem at the same time will be a pretty good stress test of that. (A crude harness for this is sketched below.)

However, bonnie doesn't tell us anything about mmap() performance, which I suspect is an important thing to test, since many of our applications make heavy use of it (and mmap() is, in our experience, the first thing to break if a cluster filesystem isn't up to scratch).
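For what it's worth, this is roughly the kind of mmap() smoke test I have in mind - only a sketch; the file path and touch count are invented, and a real test would want several clients hammering the same filesystem at once:

    #!/usr/bin/env python
    # Rough mmap() smoke test: map a large file read-only and time
    # random byte touches through the mapping.  The path is made up;
    # point it at a big file on the filesystem under test.
    import mmap, os, random, time

    PATH = "/mnt/cluster/testfile"   # hypothetical test file
    TOUCHES = 100000

    size = os.path.getsize(PATH)
    f = open(PATH, "rb")
    m = mmap.mmap(f.fileno(), size, prot=mmap.PROT_READ)

    start = time.time()
    for i in range(TOUCHES):
        # read one byte at a random offset, faulting pages in as we go
        m[random.randrange(size)]
    elapsed = time.time() - start

    print("%d random touches in %.2fs (%.1f us each)" %
          (TOUCHES, elapsed, elapsed * 1e6 / TOUCHES))
    m.close()
    f.close()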
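And to illustrate the geometric versus arithmetic mean point from earlier - the figures here are invented, but the shape is exactly what we see with genewise: a machine that is a freak on one code looks much better under the arithmetic mean than under the geometric one.

    #!/usr/bin/env python
    # Toy illustration of arithmetic vs geometric mean of relative
    # performance.  Numbers are invented; in real life each figure is
    # (time on baseline box) / (time on machine under test), so bigger
    # is better and 1.0 means "same speed as the old 466 MHz DS10L".

    def amean(v):
        return sum(v) / float(len(v))

    def gmean(v):
        p = 1.0
        for x in v:
            p *= x
        return p ** (1.0 / len(v))

    #            blastn blastx genewise hmmer  exonerate
    machine_a = [ 3.1,   2.9,   3.0,    3.2,   3.0 ]   # evenly quick
    machine_b = [ 2.0,   2.1,  12.0,    1.9,   2.0 ]   # genewise freak

    for name, v in [("A", machine_a), ("B", machine_b)]:
        print("machine %s: arithmetic %.2f  geometric %.2f" %
              (name, amean(v), gmean(v)))

Under the arithmetic mean the genewise freak wins (4.0 against 3.0); under the geometric mean it drops back to about 2.9, which matches intuition rather better.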
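The many-clients bonnie idea might look something like this - again only a sketch, assuming passwordless ssh to the nodes, bonnie++ installed everywhere, and a shared mount at /mnt/cluster (all names invented):

    #!/usr/bin/env python
    # Crude harness: run bonnie++'s small-file tests on many clients
    # against the same shared filesystem at once.  Hostnames, mount
    # point and flags are assumptions, not a recommendation.
    import subprocess, time

    NODES = ["node%02d" % i for i in range(1, 17)]   # hypothetical hosts
    # -s 0 skips bonnie++'s large-file tests; -n 64 creates 64*1024
    # small files for the creation/stat/delete tests
    CMD = "mkdir -p /mnt/cluster/%s && bonnie++ -d /mnt/cluster/%s -s 0 -n 64"

    start = time.time()
    procs = [subprocess.Popen(["ssh", node, CMD % (node, node)])
             for node in NODES]
    for p in procs:
        p.wait()    # no per-node error handling; it's only a sketch
    print("all %d clients finished in %.1fs" %
          (len(procs), time.time() - start))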
> Please folks, suggest some tests which stress IO and the rest of
> the system. Specifically real workloads. They are the only
> benchmarks that matter.

Well, our benchmark runs a mixture of various BLAST flavours against various databases, plus genewise, exonerate and HMMER, all at the same time (single-threaded jobs, one per processor), and re-runs them repeatedly. This is pretty close to our cluster's normal workload.

The real difficulty in getting meaningful benchmarks is disk caching; we try to run the queries against large databases, and in a deliberately non-optimal order, to minimise the help the disk cache gives. As the memory in machines gets much larger, we're probably going to have to expand the benchmark significantly to compensate. Either that, or deliberately unmount and remount the filesystem to clear the disk cache between benchmark runs, which isn't always feasible.

Tim

--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233
FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233