On 24 Jun 2005, at 2:20 am, Joe Landman wrote:

> One fundamental error made in the Spec numbers is reducing the
> multidimensional performance space to single numbers using a
> dubious practice of creating an average (over things with very
> different characteristics/dimensions).

That's true, but it's still a useful number, and people need it when they take their analysis to the boss to ask for the money. :-)

We use a geometric mean of relative performance for our evaluations - we run some fairly standard runs of the real applications, time each, and scale the times relative to a standard box we already have (traditionally this has been a 466 MHz DS10L, the standard machine in the cluster that our esteemed former leader Dr Cuff set up, all those years ago). So we have the vector of results available to us, but the final number is the geometric mean of that vector of relative performance figures.

The geometric mean handles the situation where one application out of many runs spectacularly well or badly rather more gracefully than the arithmetic mean does. This has been particularly noticeable in our benchmark: genewise is a code whose performance varies very widely between architectures - much more so than BLAST does, in my experience. Machines which run genewise spectacularly well look unfeasibly good under the arithmetic mean, but once you take the geometric mean of the relative performances, the result looks more reasonable. (There's a toy illustration of this further down.) It's worked well enough for us, anyway.

Ultimately, of course, that final reduction of the vector to a single number has to be a site-specific formula, with weightings for each element of the vector determined by the requirements of the organisation.

> I would argue for a vector, and the vector would be per
> application. That is have a blast vector, with blastx, psiblast,
> rpsblast,... . Have a HMMer vector with an hmmalign, pfam
> search,... . Have a data transfer vector: time to copy nt to all
> nodes in cluster/number of nodes. Have a web services vector.
> This way you don't lose information (some systems may be better
> designed for one subset of tasks than another).

Right.

> bonnie is a start, but I would question as to how well correlated
> against use cases it is. I would think that a more typical use
> case would involve remote queries of a large database, moving large
> databases, local queries of large databases, etc.

Bonnie is still important; lots of places create thousands of tiny files as part of their process. We all know what a bad thing this is to do, but bonnie's file creation benchmark is a pretty good measure of how well a system copes with it, as well as of how good the system is at streaming I/O.

I suspect it will also be a good test of cluster filesystem performance in general; the real performance killers are usually in metadata operations, so running bonnie's file creation tests on lots of client machines talking to the same filesystem at the same time will be a pretty good stress test of that. (A crude harness for this is sketched below.)

However, bonnie doesn't tell us anything about mmap() performance, which I suspect is an important thing to test, since many of our applications make heavy use of it (and mmap() is, in our experience, the first thing to break if a cluster filesystem isn't up to scratch).
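For what it's worth, this is roughly the kind of mmap() smoke test I have in mind - only a sketch; the file path and touch count are invented, and a real test would want several clients hammering the same filesystem at once:

    #!/usr/bin/env python
    # Rough mmap() smoke test: map a large file read-only and time
    # random byte touches through the mapping.  The path is made up;
    # point it at a big file on the filesystem under test.
    import mmap, os, random, time

    PATH = "/mnt/cluster/testfile"   # hypothetical test file
    TOUCHES = 100000

    size = os.path.getsize(PATH)
    f = open(PATH, "rb")
    m = mmap.mmap(f.fileno(), size, prot=mmap.PROT_READ)

    start = time.time()
    for i in range(TOUCHES):
        # read one byte at a random offset, faulting pages in as we go
        m[random.randrange(size)]
    elapsed = time.time() - start

    print("%d random touches in %.2fs (%.1f us each)" %
          (TOUCHES, elapsed, elapsed * 1e6 / TOUCHES))
    m.close()
    f.close()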
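And to illustrate the geometric versus arithmetic mean point from earlier - the figures here are invented, but the shape is exactly what we see with genewise: a machine that is a freak on one code looks much better under the arithmetic mean than under the geometric one.

    #!/usr/bin/env python
    # Toy illustration of arithmetic vs geometric mean of relative
    # performance.  Numbers are invented; in real life each figure is
    # (time on baseline box) / (time on machine under test), so bigger
    # is better and 1.0 means "same speed as the old 466 MHz DS10L".

    def amean(v):
        return sum(v) / float(len(v))

    def gmean(v):
        p = 1.0
        for x in v:
            p *= x
        return p ** (1.0 / len(v))

    #            blastn blastx genewise hmmer  exonerate
    machine_a = [ 3.1,   2.9,   3.0,    3.2,   3.0 ]   # evenly quick
    machine_b = [ 2.0,   2.1,  12.0,    1.9,   2.0 ]   # genewise freak

    for name, v in [("A", machine_a), ("B", machine_b)]:
        print("machine %s: arithmetic %.2f  geometric %.2f" %
              (name, amean(v), gmean(v)))

Under the arithmetic mean the genewise freak wins (4.0 against 3.0); under the geometric mean it drops back to about 2.9, which matches intuition rather better.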
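The many-clients bonnie idea might look something like this - again only a sketch, assuming passwordless ssh to the nodes, bonnie++ installed everywhere, and a shared mount at /mnt/cluster (all names invented):

    #!/usr/bin/env python
    # Crude harness: run bonnie++'s small-file tests on many clients
    # against the same shared filesystem at once.  Hostnames, mount
    # point and flags are assumptions, not a recommendation.
    import subprocess, time

    NODES = ["node%02d" % i for i in range(1, 17)]   # hypothetical hosts
    # -s 0 skips bonnie++'s large-file tests; -n 64 creates 64*1024
    # small files for the creation/stat/delete tests
    CMD = "mkdir -p /mnt/cluster/%s && bonnie++ -d /mnt/cluster/%s -s 0 -n 64"

    start = time.time()
    procs = [subprocess.Popen(["ssh", node, CMD % (node, node)])
             for node in NODES]
    for p in procs:
        p.wait()    # no per-node error handling; it's only a sketch
    print("all %d clients finished in %.1fs" %
          (len(procs), time.time() - start))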
> Please folks, suggest some tests which stress IO and the rest of
> the system. Specifically real workloads. They are the only
> benchmarks that matter.

Well, our benchmark runs a mixture of various BLAST flavours against various databases, plus genewise, exonerate and HMMER, all at the same time (single-threaded jobs, one per processor), and re-runs them repeatedly. This is pretty close to our cluster's normal workload.

The real difficulty in getting meaningful benchmarks is disk caching; we try to run the queries against large databases, and in a deliberately non-optimal order, to minimise the help the disk cache gives. As the memory in machines gets much larger, we're probably going to have to expand the benchmark significantly to compensate. Either that, or deliberately unmount and remount the filesystem to clear the disk cache between benchmark runs, which isn't always feasible.

Tim

--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233
FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233