[Bioclusters] Clueless about Mosix

14 May 2002 12:30:51 -0400

(cc'ing the developers list, as these topics may be of interest to
others)

On Tue, 2002-05-14 at 11:00, Ivo Grosse wrote:
> Chris Dagdigian <dag@sonsorol.org> wrote on Mon, 13 May 2002:
> 
> > It would be counter productive if they were not aware of each other -- 
> > if they played nicely together then there is certainly a place for batch 
> > scheduling in a MOSIX-type environment.
> 
> I have a few naive (and loosely coupled) questions:
> 
> 1. even if Mosix and the scheduler were aware of each other, but would 
> operate on the same time scale, then that could (theoretically) lead to 
> terrible confusion (oscillations, resonance, even chaos ... in the 
> sense of nonlinear dynamics and chaos theory), correct?  Do those 
> phenomena occur in reality?  What are the time scales of Mosix and 
> typical schedulers, say, SGE?  I mean:

All of this is tunable, so you should be able to stay out of the
thrashing regime.  Hopefully Ron Chen can speak to how to tune SGE, and
let's see if we can get Moshe Bar to comment on tuning Mosix.
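
To make the time-scale worry concrete, here is a toy sketch in Python.
It is not a model of how Mosix or SGE actually behave: it assumes two
nodes, two independent balancers that each read the same load snapshot,
and that each one moves half of the imbalance it sees.  The names
(simulate, period_mosix, period_sched) are made up for illustration.

```python
def simulate(period_mosix, period_sched, steps=6, loads=(80, 20)):
    """Return the load difference (node A minus node B) after each step.

    Each balancer acts only on steps divisible by its period. When both
    act on the same step they use the same stale snapshot, so together
    they move twice the needed amount.
    """
    a, b = loads
    history = []
    for t in range(steps):
        seen_a, seen_b = a, b  # snapshot shared by both balancers
        for period in (period_mosix, period_sched):
            if t % period == 0:
                move = (seen_a - seen_b) // 2  # jobs moved from A to B
                a -= move
                b += move
        history.append(a - b)
    return history


if __name__ == "__main__":
    print("same time scale:     ", simulate(1, 1))
    print("different time scale:", simulate(1, 5))
```

With both periods equal to 1 the imbalance never shrinks, it just
changes sign every step.  With the second balancer acting only every
fifth step, the first one settles the load on its own after the initial
overshoot.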

> - how often does Mosix check the state of the nodes?  (Once every 
> second?)

This would need to be answered by a Mosix guru.

> - how often does SGE check the state of the nodes?

Not sure, but as I remember, SGE doesn't poll the nodes; the nodes
effectively "pull" from the server.

> - how long does it take a process to be migrated, or to be started?

Ugh... process start is fast (maybe a few percent slower than a normal
start, due to the extra state information).  Process migration is
basically snapshotting the CPU, memory, socket, and file handle state
and transporting it to the remote system.  It should not take long on a
lightly loaded server, though it can on a heavily loaded one.
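
For a rough feel, here is a back-of-the-envelope sketch in Python.  It
assumes the whole process image is copied in one pass over the network
and that the link is the bottleneck; the function name, the sizes, and
the link speed are all illustrative assumptions, not measurements of
Mosix or Scyld.

```python
def estimate_migration_seconds(state_bytes, link_bits_per_second,
                               fixed_overhead_seconds=0.0):
    """Crude lower bound: fixed overhead plus time to push the state.

    Ignores load on either node, protocol overhead, and any smarter
    page-transfer strategy the real system may use.
    """
    transfer = (state_bytes * 8) / link_bits_per_second
    return fixed_overhead_seconds + transfer


if __name__ == "__main__":
    # Hypothetical example: 125 MB of state over a 100 Mbit/s link.
    print(estimate_migration_seconds(125_000_000, 100e6))
```

On a heavily loaded node or network the effective bandwidth drops, which
is one way the migration time can grow well beyond this estimate.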

Under Scyld (Don, correct me if I am wrong), this process is done on a
"head" node and transported to the computing nodes.  Under Mosix, every
node is a head node... or something like that.  It is a somewhat
different model of computing.  Scyld is closer to true SSI, though I
don't think you can malloc more memory than a computing node has and
rely on distributed shared memory.  This would be cool though.  Mosix is
more of a process migrator with pipes back to the original running
machine.

I defer of course to the Mosix and Scyld experts on this.

> 2. can Mosix be used without a scheduler?  I mean: assume we start 
> 100,000 jobs on a 100-nodes cluster.  Is Mosix smart enough to start 
> only 100 jobs and keep the others in a queue?

You should be able to use the native "batch" command.  See the man
page.  This might not work very well though, as batch is a really
minimal scheduler: it wakes up and looks at the CPU load, and if the
load is less than 0.7, it launches the next job in the queue.
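
The behaviour described above can be sketched in a few lines of Python.
This is a minimal illustration, not the actual implementation of batch:
the function name, the polling interval, and the injectable get_load,
launch, and sleep hooks are assumptions made for the example, and the
0.7 default is just the figure quoted above.

```python
import os
import subprocess
import time
from collections import deque


def run_queue(commands, max_load=0.7, poll_seconds=60,
              get_load=None, launch=None, sleep=time.sleep):
    """Start queued shell commands one at a time while load is low."""
    if get_load is None:
        get_load = lambda: os.getloadavg()[0]  # 1-minute load average
    if launch is None:
        launch = lambda cmd: subprocess.Popen(cmd, shell=True)
    pending = deque(commands)
    started = []
    while pending:
        if get_load() < max_load:
            cmd = pending.popleft()
            launch(cmd)  # fire and forget; the job adds to the load
            started.append(cmd)
        if pending:
            sleep(poll_seconds)  # wake up again later and re-check
    return started
```

Note that it only looks at the load of the machine it runs on, which is
why it is a poor fit for a cluster where processes migrate away.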

> 3. if Mosix is not smart enough to do job queuing, then I think there 
> *must* be schedulers that work together with Mosix.  Hi Mosix users, 
> which schedulers do you use?  Did you encounter any confusion phenomena 
> (see question 1.) where the scheduler and Mosix trick each other?

I think Mosix can be told to migrate or not migrate a process.  This can
be set at runtime as I remember.  Scyld has a different concept, where
the migration needs to occur to free up the head for its other tasks. 
Here a scheduler is needed, but it would need to be instructed about how
to deal with Scyld (which Don and team have done with PBS).

> A Mosix-unrelated question:
> 
> 4. SGE is capable of rearranging jobs in the queue according to 
> priority.  But is SGE capable of stopping already running jobs if 
> higher-priority jobs are in the queue?  If not, which schedulers can do 
> that?

I know LSF can do this via checkpointing.  I don't know what SGE
can/cannot do in this area.  I presume something similar.

> 
> Best regards, Ivo
-- 
Joe Landman,
email: landman@scientificappliance.com
web  : http://scientificappliance.com