[Bioclusters] Best ways to tackle migration to dedicated cluster/Farm

Chris Dwan (CCGB) bioclusters@bioinformatics.org
Thu, 25 Mar 2004 11:37:03 -0600 (CST)

> Excuse my ignorance, I've never used Condor but what's a pull based
> system? What would be the advantages over PBS/SGE (which I assume are push
> type systems)?

In a "push" system, a central scheduler assigns work to the compute nodes.
Nodes sit idle until work is assigned to them.

In a "pull" system, each node is responsible for deciding when to make
itself available for work, for requesting an appropriate job, and for
keeping track of the administrative details associated with the job.
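
To make the distinction concrete, here is a toy sketch of both models in Python.  None of this is taken from PBS, SGE, or Condor; the class and method names (PushScheduler, JobBoard, PullNode, poll) are made up for illustration, and jobs "run" instantly so that only the control flow is visible.

```python
from collections import deque


class Node:
    """A compute node that records the jobs it has run."""

    def __init__(self, name):
        self.name = name
        self.idle = True
        self.done = []

    def run(self, job):
        # Jobs complete instantly in this sketch; only the control flow matters.
        self.idle = False
        self.done.append(job)
        self.idle = True


class PushScheduler:
    """Push model: the central scheduler decides which node runs which job."""

    def __init__(self, nodes):
        self.nodes = nodes          # nodes the scheduler knows about
        self.queue = deque()

    def submit(self, job):
        self.queue.append(job)

    def dispatch(self):
        # One scheduling pass: hand at most one queued job to each known idle node.
        # A node missing from self.nodes never receives work.
        for node in self.nodes:
            if node.idle and self.queue:
                node.run(self.queue.popleft())


class JobBoard:
    """Pull model: a passive job store that only answers requests."""

    def __init__(self, jobs):
        self.jobs = deque(jobs)

    def request(self):
        return self.jobs.popleft() if self.jobs else None


class PullNode(Node):
    """Pull model: the node decides when it is available and asks for work."""

    def __init__(self, name, board):
        super().__init__(name)
        self.board = board
        self.available = True       # the node's own decision, not the scheduler's

    def poll(self):
        if not self.available:
            return
        job = self.board.request()
        if job is not None:
            self.run(job)
```

In the push half, all decisions live in dispatch(); in the pull half, the board does nothing until a node calls poll(), and a node that sets available to False is simply never heard from.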

In my experience, push systems (including all centrally scheduled
clusters using PBS, SGE, NQE, LSF, and the like) are easier to debug,
since you only have to deal with one scheduler.  They are sometimes less
efficient:  if the scheduler is too busy to assign work to a node (or
doesn't know that the node exists, or plans resource use poorly, or any
of a number of other cases), then the node in question may go unused.

Pull systems (including SETI@home and friends, as well as Platform's grid
offering) are best suited for cycle stealing and ad-hoc clustering.  They
can be really troublesome to debug, since what gets lost in the case of
errors is usually job state rather than cycles on nodes.  The cluster
doesn't just run at less than peak efficiency; it loses jobs.  This can
be frustrating for both users and admins.

Condor (at least when I used it a year or so ago) is really a hybrid
system.  There is a central "matchmaker" which is responsible for pairing
"classads" from compute resources with jobs needing to be done.  Condor's
biggest strength is the thinking that Miron Livny and his group put into
the social side of distributed computing.  Condor has great facilities for
managing the 'owner comfort' problem:  Grid computing is great, but only
if I get first crack at machines that I own, my competitors never get
to use them, and I get to have my machine back the instant I move my

-Chris Dwan
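
A footnote on the matchmaking idea above:  the toy Python sketch below pairs machine ads with job ads and lets each machine express its owner's preferences.  This is not Condor's ClassAd language or its actual matching algorithm; every field name here (owner_active, banned_users, min_memory_mb) and every function is an assumption made up for illustration.  It also only blocks new matches while the owner is active; it does not model evicting a job that is already running.

```python
def machine_accepts(machine, job):
    """Owner-side policy: refuse work while the owner is active or for banned users."""
    if machine["owner_active"]:
        return False
    if job["user"] in machine["banned_users"]:
        return False
    return True


def job_accepts(job, machine):
    """Job-side policy: the machine must have enough memory."""
    return machine["memory_mb"] >= job["min_memory_mb"]


def rank(machine, job):
    """Machines prefer jobs submitted by their own owner."""
    return 1 if job["user"] == machine["owner"] else 0


def matchmake(machines, jobs):
    """Give each machine at most one job that both sides accept, best rank first."""
    matches = {}
    pending = list(jobs)
    for machine in machines:
        candidates = [
            j for j in pending
            if machine_accepts(machine, j) and job_accepts(j, machine)
        ]
        if not candidates:
            continue
        best = max(candidates, key=lambda j: rank(machine, j))
        matches[best["id"]] = machine["name"]
        pending.remove(best)
    return matches


machines = [
    {"name": "alice-ws", "owner": "alice", "owner_active": False,
     "banned_users": {"mallory"}, "memory_mb": 2048},
    {"name": "bob-ws", "owner": "bob", "owner_active": True,
     "banned_users": set(), "memory_mb": 4096},
]
jobs = [
    {"id": "j1", "user": "mallory", "min_memory_mb": 512},
    {"id": "j2", "user": "carol", "min_memory_mb": 512},
    {"id": "j3", "user": "alice", "min_memory_mb": 1024},
]
result = matchmake(machines, jobs)
```

With these made-up ads, alice's workstation skips mallory's job, prefers alice's own job over carol's, and bob's workstation takes nothing because bob is using it.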