[Bioclusters] RE: non linear scale-up issues?

Chris Dagdigian bioclusters@bioinformatics.org
Mon, 10 May 2004 21:02:47 -0400


{ my $.02 }

If you want to talk about specific bioinformatics pain points...

One of the constant problems is that the informatics application mix can
often be limited more by file and disk I/O bandwidth than by CPU power.

This is one of our pain points -- yeah, we have tons of problems that are
embarrassingly parallel in nature, which makes them easy to solve with
clusters/grids/etc., but in order to solve some of these problems we need
to sling gigabyte or even terabyte volumes of data around.

SETI@home and other desktop cycle-stealing approaches work great if you
have a serial computing problem that needs tons of CPU time but
relatively small amounts of data traffic. There are absolutely
bioinformatics problem areas that can work great in this setting.

But...

Slinging a terabyte or two of traffic over the same worm-rotten,
occasionally-managed corporate network that handles things like payroll,
HR, business apps etc. just to get some CPU cycles from a bunch of cheap
$900 desktop CPUs can be, um... problematic.  It gets even worse if you
factor in having to traverse the two firewalls sitting at either end of a
bandwidth-constrained point-to-point WAN link. Just imagine the
finger-pointing that will go on between the network folks, the IS helpdesk
and the people who think "you broke my PC!" and blame the grid client for
every future crash or problem.
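
To put some rough numbers on that (these are made-up, illustrative link
speeds and utilisation figures, not measurements from any real site), here is
a quick back-of-the-envelope sketch in Python of how long you spend just
moving the data before a single borrowed desktop cycle does anything useful:

    # All link speeds and "usable fraction" figures are illustrative
    # assumptions, not measurements of any real network.
    DATA_BYTES = 1e12  # assume roughly 1 TB of input data to distribute

    links = {
        "dedicated gigabit cluster interconnect": (1000, 0.8),
        "shared 100 Mbit corporate LAN":          (100,  0.3),
        "firewalled point-to-point WAN link":     (10,   0.5),
    }

    for name, (mbit, usable_fraction) in links.items():
        seconds = DATA_BYTES * 8 / (mbit * 1e6 * usable_fraction)
        print(f"{name:40s}: ~{seconds / 3600:7.1f} hours just moving data")

On those (invented) numbers, the gigabit interconnect ships a terabyte in a
few hours, the shared corporate LAN takes days, and the WAN link takes a
couple of weeks -- before anyone has computed anything at all.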

Let's face it, CPU power is dirt cheap and the ratio of "grid hype" to
"what I can actually do with a grid" is totally out of whack.

At this point in time, going out and actually *purchasing* the CPU power
you need (and building a predictable, reliable computing resource from the
bits you bought) may well cost less in terms of effort, time-to-production,
soul-killing stress and real, actual money.

Obviously this is a generic opinion. My overall message, I guess, is "if
you want to keep your job, make sure you do very careful research, cost
analysis and small-scale testing before jumping on the grid bandwagon."

-Chris



David Gayler wrote:

> Thanks for the feedback,
> 
> I am aware of United Devices. They seem to have a very good solution, and it
> is multi-platform too. I am interested in any pain points you have with
> their technology or with similar products. Are there features or functionality
> you would like to see that are missing, specifically geared towards
> bioinformatics research?
> 
> If I understand you, you are saying that this is a good example of where
> grid technology is delivering on the promise. It is pretty clear that when
> the problem is 'embarrassingly parallel', enterprise grid (cycle stealing)
> solutions (like UD's) can be an intelligent way to use one's current IT
> infrastructure investment. I guess what I am looking for is what could be
> made better.
> 
> Also, what do you mean by standards folks? Are you talking about Globus
> Toolkit and its ever evolving architecture (now mainly web-service based)?
> What do you see as the threat with this?
>  
> IMHO, MPI is just fine for clusters with good links where security is of
> little or no concern; however, it really isn't made for loosely coupled
> networks and cycle-stealing scenarios. It certainly was never designed with
> security in mind. These are the kinds of things that, if not baked into the
> technology, can turn your Grid nodes into a security risk and ultimately a
> bunch of zombies waiting to be used for a DDoS attack or worse. Once that
> happens, trying to trust or even keep your Grid could be a tough political
> battle.
> 
> 
> 
> 
> On Sun, 9 May 2004, Rayson Ho wrote:
> 
> 
>>UD: http://www.grid.org/stats/
>>
>>325,033 years of CPU time collected
> 
> 
> he he :-)  Rayson knows his stuff, as do United Devices.  You will not see
> MPI anywhere near this 300k+ CPU years.  Good point.
> 
> 
>>BOINC: http://boinc.berkeley.edu
> 
> 
> This looks great.  Classic 'grid hype' this certainly is not.  Good stuff.
> Thanks for sending on the link; I really hope that the standards folk keep
> the hell away from this.  If they do, it may have a real chance...
> 
> Best regards,
> 
> J.
> 
> --
> James Cuff, D. Phil.
> Group Leader, Applied Production Systems
> The Broad Institute. 320 Charles Street, Cambridge,
> MA. 02141-2023.  Tel: 617-252-1925  Fax: 617-258-0903
> 
> From: "David Gayler" <dag_project@sbcglobal.net>
> To: <bioclusters@bioinformatics.org>
> Date: Sun, 9 May 2004 12:22:59 -0500
> Subject: [Bioclusters] non linear scale-up issues?
> Reply-To: bioclusters@bioinformatics.org
> 
> Hi,
> I am a newbie and not a bio guy, but rather a computer science guy.
> I am interested in why you are seeing these kinds of horrible non-linear
> scale-out numbers. Please excuse my ignorance, but is the problem that there
> are dependencies between jobs and this doesn't scale? What exactly is the
> relationship of this problem to MPI? I just have to wonder if figuring out a
> better way to divide and conquer has any merit. I am interested in y'all's
> feedback, as my company is working on a Windows .NET-based Grid solution and
> we want to focus on the bioinformatics community. It seems to me that a lot
> of researchers spend time worrying about getting faster results
> (understandably); however, there doesn't seem to be much in the way of
> cycle-stealing grid software solutions that are flexible, secure, and easy
> to use. I want to know what is missing currently to get these faster results
> reliably, despite hardware faults, etc.
> 
> I am aware of Condor (free), DataSynapse, Platform Computing, and others. I
> am interested in knowing what, if anything, is lacking in these solutions.
>  
> Thanks in advance.
> 
> Date: Sun, 9 May 2004 11:17:16 +0100 (BST)
> From: Guy Coates <gmpc@sanger.ac.uk>
> To: bioclusters@bioinformatics.org
> Subject: [Bioclusters] Re: MPI clustalw
> Reply-To: bioclusters@bioinformatics.org
> 
> 
>>example web servers and services where you need rapid response for
>>single, or small numbers of jobs.
> 
> 
> We (well, the ensembl-ites) do run a small amount of mpi-clustalw. The
> algorithm scales OK for small alignments (but those run quickly, so why
> bother?) but is horrible for large alignments.
> 
> These are figures for an alignment of a set of 9,658 sequences, running on
> dual 2.8 GHz Pentium 4 machines with gigabit Ethernet.
> 
> Ncpus 	Runtime 	Efficiency
> ----  	------- 	-----------
> 2 	28:21:33	1
> 4   	19:49:05	0.72
> 8 	14:49:02	0.48
> 10  	14:09:41	0.4
> 16  	13:37:36	0.26
> 24  	13:00:30	0.18
> 32  	12:48:39	0.14
> 48  	12:48:39	0.09
> 64  	11:19:40	0.08
> 96  	11:30:09	0.05
> 128 	11:13:28	0.04
> 
> However, although the scaling is horrible, it does at least bring the
> runtime down to something more manageable. MPI clustalw only gets run for
> the alignments that the single CPU version chokes on. It may not be
> pretty, but at least you do get an answer, eventually. Horses for courses
> and all that.
> 
> 
> 
>>Guy/Tim - did you ever deploy that HMMer PVM cluster we talked about
>>for the Pfam web site?
>>
> 
> 
> It's on the ever-expanding list of things to do. So, does anyone here have
> any opinions/experience on the PVM version of HMMer?
> 
> 
> Guy
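
(Aside on Guy's table above: the Efficiency column looks like it is simply
speedup relative to the 2-CPU run, divided by the increase in CPU count. A
small Python sketch of that reconstruction -- my guess at the arithmetic, not
Guy's actual script:

    def hms_to_seconds(t):
        """Convert an 'HH:MM:SS' runtime string to seconds."""
        h, m, s = (int(x) for x in t.split(":"))
        return h * 3600 + m * 60 + s

    # (ncpus, runtime) pairs copied from Guy's table
    runs = [(2, "28:21:33"), (4, "19:49:05"), (8, "14:49:02"),
            (16, "13:37:36"), (32, "12:48:39"), (64, "11:19:40"),
            (128, "11:13:28")]

    base_cpus, base_seconds = runs[0][0], hms_to_seconds(runs[0][1])

    for ncpus, runtime in runs:
        speedup = base_seconds / hms_to_seconds(runtime)
        efficiency = speedup * base_cpus / ncpus
        print(f"{ncpus:4d} CPUs: efficiency ~ {efficiency:.2f}")

That reproduces the 1.00 / 0.72 / 0.48 / ... / 0.04 figures for those rows,
and it makes the point nicely: past a dozen or so CPUs you are mostly burning
cycles to shave the last hour or two off the runtime.)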

-- 
Chris Dagdigian, <dag@sonsorol.org>
Independent life science IT & informatics consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E Yahoo IM: craffi Web: http://bioteam.net