[Bioclusters] topbiocluster.org
Tim Cutts
tjrc at sanger.ac.uk
Fri Jun 24 07:30:46 EDT 2005
On 24 Jun 2005, at 2:20 am, Joe Landman wrote:
> One fundamental error made in the SPEC numbers is reducing the
> multidimensional performance space to single numbers using a
> dubious practice of creating an average (over things with very
> different characteristics/dimensions).
That's true, but it's still a useful number, and people need it when
they take their analysis to the boss to ask for the money. :-) We
use a geometric mean of relative performance for our evaluations: we
run some fairly standard runs of the real applications, time each,
and scale the times relative to some standard box we already have
(traditionally a 466 MHz DS10L, the standard machine in the cluster
that our esteemed former leader Dr Cuff set up, all those years ago).
So, we have the vector of results available to us, but the final
number is the geometric mean of this vector of relative performance
figures. Geometric means handle the situation where one application
out of many runs spectacularly well or badly rather more gracefully
than the arithmetic mean does. This has been particularly noticeable
in our benchmark; genewise is a code whose performance varies very
widely between architectures - much more so than BLAST's does, in my
experience. So machines which run genewise spectacularly well look
unfeasibly good if you use the arithmetic mean, but once you use the
geometric mean of the relative performances they look more
reasonable. It's worked well enough for us, anyway.
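To make that concrete, here's a minimal sketch of the calculation in
Python, with invented timings for three applications; the genewise
outlier shows why the geometric mean behaves better:

import math

# Invented wall-clock times in seconds; "reference" is the standard
# box (the DS10L in our case), "candidate" is the machine under test.
reference = {"blastn": 900.0, "genewise": 1200.0, "hmmpfam": 600.0}
candidate = {"blastn": 300.0, "genewise": 60.0, "hmmpfam": 240.0}

# Relative performance: how many times faster than the reference box.
relative = {app: reference[app] / candidate[app] for app in reference}

arith = sum(relative.values()) / len(relative)
geo = math.prod(relative.values()) ** (1.0 / len(relative))

print(relative)                          # blastn 3.0, genewise 20.0, hmmpfam 2.5
print("arithmetic mean: %.2f" % arith)   # 8.50 - dominated by the outlier
print("geometric mean:  %.2f" % geo)     # 5.31 - rather more reasonable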
Ultimately, of course, that final reduction of the vector to a single
number has to be a site-specific formula with weightings for each
element of the vector determined by the requirements of the
organisation.
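One way to express that, continuing the sketch above, is a weighted
geometric mean; the weights here are invented, and would in reality
reflect each application's share of the site's workload:

import math

relative = {"blastn": 3.0, "genewise": 20.0, "hmmpfam": 2.5}
weights = {"blastn": 0.5, "genewise": 0.3, "hmmpfam": 0.2}  # sum to 1

score = math.prod(relative[app] ** weights[app] for app in relative)
print("weighted score: %.2f" % score)    # about 5.11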
> I would argue for a vector, and the vector would be per
> application. That is have a blast vector, with blastx, psiblast,
> rpsblast,... . Have a HMMer vector with an hmmalign, pfam
> search,... . Have a data transfer vector: time to copy nt to all
> nodes in cluster/number of nodes. Have a web services vector.
> This way you don't lose information (some systems may be better
> designed for one subset of tasks than another).
>
Right.
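For what it's worth, such a result set might simply be kept as nested
per-application vectors, something like the following (names and
numbers all invented), so no information is thrown away until a site
applies its own weighting:

# Each figure is relative performance against the reference box.
results = {
    "blast": {"blastx": 2.8, "psiblast": 3.1, "rpsblast": 2.6},
    "hmmer": {"hmmalign": 4.0, "pfam_search": 3.7},
    "data_transfer": {"copy_nt_per_node": 1.9},
    "web_services": {"request_rate": 2.2},
}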
> bonnie is a start, but I would question as to how well correlated
> against use cases it is. I would think that a more typical use
> case would involve remote queries of a large database, moving large
> databases, local queries of large databases, etc.
Bonnie is still important; lots of places create thousands of tiny
files as part of their process. We all know what a bad thing this is
to do, but bonnie's file creation benchmark is a pretty good measure
of how well a system copes with it, as well as of how good the system
is at streaming I/O.
I suspect it will also be a good one for testing cluster filesystem
performance in general, though; the real performance killers are
usually in metadata operations, so running bonnie's file creation
tests on lots of client machines talking to the same filesystem at
the same time will be a pretty good stress test of that.
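This isn't bonnie itself, but a rough Python sketch of the same
flavour of metadata load (the directory path is hypothetical);
launched simultaneously from many clients, each in its own directory
on the shared filesystem, it hammers exactly the operations I mean:

import os, time

def create_delete(dirpath, n=10000):
    """Create and then delete n empty files, timing each phase."""
    paths = [os.path.join(dirpath, "f%06d" % i) for i in range(n)]
    t0 = time.time()
    for p in paths:
        open(p, "w").close()
    t1 = time.time()
    for p in paths:
        os.unlink(p)
    t2 = time.time()
    print("create: %6.0f files/s   delete: %6.0f files/s"
          % (n / (t1 - t0), n / (t2 - t1)))

# e.g. one directory per client host on the shared filesystem:
# create_delete("/shared/scratch/%s" % os.uname()[1])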
However, it doesn't tell us anything about mmap() performance, which
I suspect is an important thing to test, since many of our
applications make heavy use of it (and mmap() is, in our experience,
the first thing to break if a cluster filesystem isn't up to
scratch).
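A trivial sketch of the kind of mmap() exerciser I mean (Python,
Unix-only; the one-byte-per-page access pattern is an assumption,
standing in for what index-heavy codes do):

import mmap, os, time

def mmap_touch(path):
    """Map a file read-only and touch one byte per page."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
        t0 = time.time()
        total = 0
        for off in range(0, size, 4096):
            total += mm[off]          # forces the page to be faulted in
        elapsed = time.time() - t0
        mm.close()
    print("%.1f MB of pages faulted in per second"
          % (size / elapsed / 1e6))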
> Please folks, suggest some tests which stress IO and the rest of
> the system. Specifically real workloads. They are the only
> benchmarks that matter.
Well, our benchmark runs a mixture of various BLAST flavours against
various databases, plus genewise, exonerate and HMMER, all at the
same time (single-threaded jobs, one per processor), and re-runs them
repeatedly. This is pretty close to our cluster's normal workload.
The real difficulty with getting meaningful benchmarks is disk
caching; we do try to run the queries against large databases, and in
a deliberately non-optimal order, to minimise the help disk caching
gives. As the memory in machines gets much larger, we're probably
going to have to expand the benchmark significantly to compensate.
Either that, or deliberately unmount and remount the filesystem to
clear the disk cache between benchmark runs, which isn't always
feasible.
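As a rough illustration (the commands, databases and file names below
are placeholders, not our real job list), the driver amounts to
something like:

import os, random, subprocess, time

jobs = [
    ["blastall", "-p", "blastn", "-d", "/data/nt", "-i", "query1.fa"],
    ["genewise", "protein.pep", "genomic.fa"],
    ["hmmpfam", "Pfam_ls", "proteins.fa"],
]

ncpu = os.cpu_count()
for rnd in range(3):                     # re-run the mixture repeatedly
    random.shuffle(jobs)                 # deliberately non-optimal order
    running, t0 = [], time.time()
    for job in jobs:
        while len(running) >= ncpu:      # one single-threaded job per CPU
            running = [p for p in running if p.poll() is None]
            time.sleep(0.5)
        running.append(subprocess.Popen(job))
    for p in running:
        p.wait()
    print("round %d: %.1f s" % (rnd, time.time() - t0))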
Tim
--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233