[Bioclusters] OpenPBS problems

Tim Cutts bioclusters@bioinformatics.org
Wed, 3 Dec 2003 09:19:21 +0000


On 02-Dec-03, Donald Becker wrote:
> Back to the core point: to checkpoint a pipeline the in-pipe data has to
>    be throttled and drained, or
>    extracted and stored
> This goes beyond checkpointing a single process.  And a pipeline
> spanning machines is even more interesting.

Careful, there are two meanings of pipeline being used here.  One is the
traditional Unix pipeline, 'foo | bar | baz', which I suspect is what
Don is saying is hard to checkpoint, because of pipe buffers and so
on.  The other is what most genomics people would think of as a
pipeline: a set of analysis jobs which may depend on each other, and
which normally use some mechanism other than Unix pipes to pass data
from one part of the pipeline to another (in the case of Ensembl, that
state is held in the pipeline MySQL database).  It was this second sort
of pipeline that I was talking about.  These are not too difficult to
checkpoint, in theory, especially if you are willing to re-run an
individual blast job or whatever, because then you only need to
checkpoint between individual analysis phases.
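
As a rough illustration of checkpointing between analysis phases, here
is a minimal Python sketch.  It is not how Ensembl does it: SQLite
stands in for the pipeline MySQL database, and the job_status table,
run_pipeline and run_job are all hypothetical names made up for the
example.  The idea is only that each job is recorded as done when it
finishes, so a restart re-runs the interrupted job and carries on.

```python
import sqlite3


def run_pipeline(db, jobs, run_job):
    """Run jobs in order, skipping those already recorded as done.

    db      -- an open sqlite3 connection (stand-in for the pipeline database)
    jobs    -- list of (job_id, [dependency ids]), already in dependency order
    run_job -- callable taking a job_id (hypothetical; e.g. one blast run)

    Returns the list of job ids actually executed in this invocation.
    """
    # Hypothetical status table: one row per completed job.
    db.execute(
        "CREATE TABLE IF NOT EXISTS job_status "
        "(job_id TEXT PRIMARY KEY, status TEXT)"
    )
    done = {
        row[0]
        for row in db.execute(
            "SELECT job_id FROM job_status WHERE status = 'done'"
        )
    }

    executed = []
    for job_id, deps in jobs:
        if job_id in done:
            continue  # finished before the restart, nothing to redo
        missing = [d for d in deps if d not in done]
        if missing:
            raise RuntimeError(f"{job_id} is waiting on {missing}")

        run_job(job_id)  # if this dies, the job simply has no 'done' row

        # The checkpoint: commit the status only after the job completes.
        db.execute(
            "INSERT OR REPLACE INTO job_status VALUES (?, 'done')",
            (job_id,),
        )
        db.commit()
        done.add(job_id)
        executed.append(job_id)
    return executed
```

If a job dies part-way through, nothing is recorded for it, so calling
run_pipeline again with the same database repeats that one job and then
continues with whatever depends on it.  The cost is exactly the one
mentioned above: you lose the partial work of a single job.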

Tim

-- 
Dr Tim Cutts
Informatics Systems Group
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK