grid computing rant (was Re: [Bioclusters] Pure Java-based GreenTea P2P Distributed Network Computing platform for bioinfo)

chris dagdigian bioclusters@bioinformatics.org
Wed, 16 Oct 2002 16:00:40 -0400


John Sheah wrote:

>  
> Rather than everyone buys extra hardware to build his own linux 
> clusters, why not build an Internet-wide collaborative computing 
> network using the extremely light-weight, simple to use, very 
> intuitive GreenTea P2P Computing network?


Grid computing in the life sciences is a joke.

Nothing but empty press releases and vendors who have tortured the 
definition of "grid" into meaninglessness.
(Department Grid?...Back in the day we called those things compute farms.)

The people selling grids into the life sciences are promising the world 
("transparent, utility-style computing on demand, wheeee!!!") and 
actually delivering technology that was old news a decade ago: batch 
queuing systems that can move jobs w/ associated metadata between 
geographically separate systems and do a bit of resource 
brokering/allocation.

The core foundation for real grid computing has already been laid with 
things like gridengine, Platform Computing's software, Globus, resource 
brokers, etc.  The part that has not been done yet is all the 
related middleware and infrastructure components that will make the 
process of extending a local computing environment past the firewall a 
simple, sane and easy experience. All the marketspeak about "seamless 
resource pooling between virtual organizations assembled on the fly!" 
is total BS. There is a heck of a lot of work that needs to be done to 
make the PKI infrastructure and the whole resource allocation/control 
system a simple point-and-click process before we are really able to 
pool and broker distributed resources on the fly.

I've grown cynical about this space over the last year (can you tell?) 
and my thoughts boil down to basically this:

(1) What CPU cycle shortage? Raw compute power is so cheap, so easy to 
acquire and so well understood that 95% of the time it will be (a) 
easier, (b) faster, (c) more secure and (d) cheaper to purchase enough 
dedicated research systems to tackle your problem(s) and park 
them in a datacenter somewhere (Sun would call this a grid though :). 
Even providing fast VPN access into the datacenter for the remote users 
will be easier and far less of a management and administrative nightmare 
than setting up some enterprise WAN-based distributed computing grid.

(2) The current SETI-at-home and P2P distributed computing players have 
grabbed all of the low-hanging fruit, which is basically massively 
CPU-bound applications where there is lots of incentive to rewrite or 
port the algorithm to the distributed computing platform API du jour. 
What about the huge numbers of data and I/O bound applications we have? 
What companies are going to want to sling terabyte volumes of data 
around their corporate LAN or expansive WAN links just to grab CPU 
cycles from a $900 desktop box?
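To make point (2) concrete, here's a back-of-envelope sketch of the 
data-movement problem. The numbers (1 TB of input, a gigabit LAN vs. a 
10 Mbit/s WAN link, a couple hours of compute) are my own illustrative 
assumptions, not measurements from any real deployment:

```python
# Illustrative back-of-envelope calculation: is it worth shipping a big
# dataset over a network link just to borrow CPU cycles on a cheap box?

def transfer_hours(data_gb: float, link_mbps: float) -> float:
    """Hours needed to move data_gb gigabytes over a link_mbps link
    (1 GB = 8 * 1024 megabits; ignores protocol overhead and contention)."""
    return (data_gb * 8 * 1024) / link_mbps / 3600

data_gb = 1024.0      # assume a 1 TB input dataset
compute_hours = 2.0   # assume ~2 hours of CPU work on a commodity box

t_lan = transfer_hours(data_gb, 1000.0)  # gigabit LAN
t_wan = transfer_hours(data_gb, 10.0)    # 10 Mbit/s WAN link

print(f"LAN transfer: {t_lan:.1f} h, WAN transfer: {t_wan:.1f} h, "
      f"compute: {compute_hours:.1f} h")
```

Under these assumptions the LAN transfer is roughly comparable to the 
compute time, but the WAN transfer takes hundreds of hours -- two orders 
of magnitude more than the CPU work it buys you, which is exactly why 
I/O-bound jobs don't fit the cycle-scavenging model.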

I think real transparent utility-style grid computing will eventually 
exist in a few years. I just have a problem with people trying to push 
this vision now when it is either impossible or a total 
nightmare-with-rigid-use-cases deployment scenario.

My $.02 of course.

-Chris