[Pipet Devel] batch processing

Jean-Marc Valin jean-marc.valin at hermes.usherb.ca
Tue May 23 17:57:55 EDT 2000


> For those not familiar with bioinformatics and our plans for Loci, we're
> talking about the difference between sequential (non-streaming) and
> non-sequential (streaming) data transfer.  A bioinformatics (or scientific)
> computation is typically performed "all at once", returning results when
> finished.  Many are often long-running (depending on the computer), taking
> minutes, hours, or days.
> 
> While designing Loci, my first thought was that these "jobs" could be handled
> as they are on a supercomputer: with batch processing.  This allows some
> information to be returned about the progression of the job, cpu time used,
> etc.  It also allows jobs to be scheduled to run at a certain time, with a set
> priority, etc.
> 
> You may think that batch processing has disappeared along with expensive
> computer time.  But I think it may be particularly important for Piper,
> because (1) the system needs to report back to the user when the job has been
> "gone" for a long period of time (or else the user may think the data has been
> lost) and (2) because Piper will use "other people's computers" (we want
> those people to decide when jobs will be run and to decide the resources to be
> used).
> 
> Thoughts?

I think at least part of the work can be done by "regular" Overflow (pl) nodes.
Just as Overflow defines Probes (which can be viewed as "breakpoints"), we could
have log nodes that log whatever data "passes through them" and send the log to
a central logging system that would correspond to the GMS part. Does that make
sense?
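
The log-node idea could be sketched as a simple pass-through wrapper: the node forwards data unchanged while reporting it to a central logger. A minimal illustration (the `LogNode` class and the logging setup here are assumptions for the sketch, not Overflow's actual API):

```python
import logging

# Stand-in for the central logging system (the GMS side).
logging.basicConfig(level=logging.INFO)
central_log = logging.getLogger("central")

class LogNode:
    """Pass-through node: logs every value it forwards downstream."""

    def __init__(self, name, downstream):
        self.name = name              # label used in log records
        self.downstream = downstream  # next processing step (any callable)

    def __call__(self, data):
        # Report what "passes through" to the central logger...
        central_log.info("%s saw: %r", self.name, data)
        # ...then forward the data unchanged.
        return self.downstream(data)

# Usage: wrap an existing processing step with a log node.
double = lambda x: x * 2
node = LogNode("stage1", double)
result = node(21)  # logs the value, then returns 42
```

The point is that the node is transparent to the dataflow itself, so it can be dropped between any two existing nodes, much like a Probe.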

	Jean-Marc




More information about the Pipet-Devel mailing list