[Bioclusters] non linear scale-up issues?

David Gayler bioclusters@bioinformatics.org
Sun, 9 May 2004 12:22:59 -0500

I am a newbie, and a computer science guy rather than a bio guy.
I am interested in why you are seeing these kinds of horrible non-linear
scale-out numbers. Please excuse my ignorance, but is the problem that there
are dependencies between jobs and this doesn't scale? What exactly is the
relationship of this problem to MPI? I just have to wonder if figuring out a
better way to divide and conquer has any merit. I am interested in y'all's
feedback as my company is working on a Windows .NET based Grid solution and
we want to focus on the bioinformatics community. It seems to me that a lot
of researchers spend time worrying about getting faster results
(understandably); however, it doesn't seem like there is much in the way of
cycle-stealing grid software solutions that are flexible, secure, and easy
to use. I want to know what is missing currently to get these faster results
reliably, despite hardware faults, etc.

I am aware of Condor (free), DataSynapse, Platform Computing, and others. I
am interested in knowing what is, if anything, lacking in these solutions.
Thanks in advance.

> example web servers and services where you need rapid response for
> single, or small numbers of jobs.

We (well, the Ensembl-ites) do run a small amount of mpi-clustalw. The
algorithm scales OK for small alignments (but they run quickly, so why
bother?) but is horrible for large alignments.

These are figures for an alignment of a set of 9658 sequences, running on
dual 2.8GHz PIV machines with gigabit Ethernet.

Ncpus   Runtime (h:m:s)   Efficiency (relative to 2 CPUs)
-----   ---------------   -------------------------------
    2   28:21:33          1
    4   19:49:05          0.72
    8   14:49:02          0.48
   10   14:09:41          0.4
   16   13:37:36          0.26
   24   13:00:30          0.18
   32   12:48:39          0.14
   48   12:48:39          0.09
   64   11:19:40          0.08
   96   11:30:09          0.05
  128   11:13:28          0.04

However, although the scaling is horrible, it does at least bring the
runtime down to something more manageable. MPI clustalw only gets run for
the alignments that the single CPU version chokes on. It may not be
pretty, but at least you do get an answer, eventually. Horses for courses
and all that.
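
For anyone wanting to check the numbers, here is a rough sketch of how the
efficiency column appears to be derived. It assumes the 2-CPU run is the
baseline (efficiency 1) and that efficiency is baseline CPU-time divided by
the CPU-time of each run. The function names are made up for illustration
and are not part of mpi-clustalw.

```python
# Runtimes copied from the table above: (number of CPUs, "hh:mm:ss").
RUNS = [
    (2, "28:21:33"), (4, "19:49:05"), (8, "14:49:02"), (10, "14:09:41"),
    (16, "13:37:36"), (24, "13:00:30"), (32, "12:48:39"), (48, "12:48:39"),
    (64, "11:19:40"), (96, "11:30:09"), (128, "11:13:28"),
]


def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def efficiency(ncpus, runtime, base_ncpus, base_runtime):
    """Parallel efficiency relative to a baseline run (baseline scores 1.0).

    Assumption: efficiency = (baseline CPUs * baseline runtime)
                             / (CPUs * runtime).
    """
    return (base_ncpus * base_runtime) / (ncpus * runtime)


def scaling_table(runs):
    """Return (ncpus, speedup vs. baseline, efficiency) for each run.

    The first entry in `runs` is treated as the baseline.
    """
    base_ncpus = runs[0][0]
    base_runtime = to_seconds(runs[0][1])
    rows = []
    for ncpus, hms in runs:
        runtime = to_seconds(hms)
        speedup = base_runtime / runtime
        rows.append(
            (ncpus, speedup, efficiency(ncpus, runtime, base_ncpus, base_runtime))
        )
    return rows


if __name__ == "__main__":
    for ncpus, speedup, eff in scaling_table(RUNS):
        print(f"{ncpus:4d}  speedup vs 2 CPUs: {speedup:5.2f}  efficiency: {eff:4.2f}")
```

Put this way, going from 2 to 128 CPUs (64 times the hardware) makes the
run only about 2.5 times faster, which is why the efficiency ends up at 0.04.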

> Guy/Tim - did you ever deploy that HMMer PVM cluster we talked about
> for the Pfam web site?

It's on the ever-expanding list of things to do. So, does anyone here have
any opinions on, or experience with, the PVM version of HMMer?

