[Bioclusters] metascheduler product survey report by GridwiseTech

Thu Apr 6 00:11:23 EDT 2006

This info was posted to a Grid Engine mailing list a few days ago;
I figured I'd pass along the relevant details.

GridwiseTech has prepared and published a vendor-independent report on
grid brokers (metaschedulers).  You can find it, free of charge,  
here: http://www.gridwisetech.com/metaschedulers-pr

GridwiseTech defines a metascheduler as:

  "Metascheduling also popularly known as brokering is an important  
piece in a typical computational Grid. A metascheduler is the vital  
part responsible for the load balancing work between sites and data  
centers. Metascheduling is a technology in the Grid that is  
responsible for managing jobs and application workflow, including  
submitting, scheduling, executing, monitoring, stopping, and  
retrieving results of computational jobs."
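
To make the "load balancing work between sites" part of that definition
concrete, here is a minimal Python sketch of a broker picking a site. It
is only an illustration: the Site class, the submit() function, the site
names and the slot counts are all hypothetical, and none of this is taken
from the report or from any of the products it surveys. Real
metaschedulers also have to deal with authentication, data staging,
monitoring and per-site policy.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical compute site with a fixed number of job slots."""
    name: str
    slots: int
    running: list = field(default_factory=list)

    def load(self) -> float:
        # Fraction of this site's slots that are currently occupied.
        return len(self.running) / self.slots


def submit(job_id: str, sites: list) -> str:
    """Place the job on the least-loaded site that still has a free slot."""
    candidates = [s for s in sites if len(s.running) < s.slots]
    if not candidates:
        raise RuntimeError("no free slots at any site")
    target = min(candidates, key=lambda s: s.load())
    target.running.append(job_id)
    return target.name


# Two made-up sites of different sizes, and three jobs to place.
sites = [Site("site-a", slots=4), Site("site-b", slots=2)]
placements = [submit(f"job-{i}", sites) for i in range(3)]
print(placements)
```

Even this toy version assumes the broker has an accurate, current view
of every site's load, which is exactly the sort of thing that gets hard
across administrative domains.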

The report contains a survey and overview of available metascheduling  
implementations including:

     * CSF & Platform CSF Plus
     * Grid Service Broker
     * GridWay
     * Moab Grid Scheduler (aka Silver)
     * EGEE Workload Manager Service (WMS)
     * Nimrod/G and Axceleon EnFuzion
     * MP Synergy
     * Condor-G

I like the report and I'm glad Gridwise took the time to research and  
write it. Metaschedulers are pretty much a mystery to me.

It makes for interesting reading, particularly the details on all the  
functions and features you lose or lack (or have to write yourself)  
when trying to move from a traditional cluster or compute farm to a  
WAN-based system where one has to manage workloads across many  
administrative domains and different grid implementations.  It is a  
very tough problem. Give me the fine-grained resource allocation and  
scheduling policies of Grid Engine or Platform LSF any day.

<rant mode on>
Looks like I'll continue to be a grid computing cynic in 2006.
Nothing has really changed -- I look around and see lots of hype and  
little one-off "grid" projects that will never live or grow beyond  
their initial rollout phases (or the lifespan of the academic  
grant).  I'm also seeing more academic and industry folk who demand
(or think they *really need*) WAN-scale grids for no clear
practical, scientific, IT or business reason. Odd.

Weighing the costs of dirt-cheap CPU power against the complication,
expense, social/political hassles and operational burden involved in
setting up and maintaining a WAN-scale grid, it still seems cheaper,
safer and faster (admittedly from my industry-biased perspective) to
centralize the compute resources and then bring in the *users* over
WAN or VPN links.

Compute power these days is pretty cheap and easy to source and
deploy; the real expense is storage, networking and keeping
everything running smoothly, especially in life science, where much
of the application mix is I/O bound rather than CPU bound. The only
justifications I can think of for spreading this stuff out all over
the place would be non-science related -- things like disaster
tolerance, 24/7 computing, business continuity requirements, etc.
</rant mode off>

Just my $.02

-Chris