[Bioclusters] SGE and local output
Joe Landman
bioclusters@bioinformatics.org
15 May 2002 14:02:37 -0400
Ivo,
I will need to refer you to an SGE expert for this. These are
SGE-specific questions, and I don't know it well enough to comment.
Joe
On Wed, 2002-05-15 at 13:34, Ivo Grosse wrote:
> Hi Joe and others,
>
> in our case of running 30,000 Blast jobs on a 100-CPU cluster you
> recommended to not write the output directly to the central file
> server, but to write the output to the local node, and to collect the
> output in the end in a non-random manner, in order to avoid NFS server
> hiccups and the like.
>
> I love that idea, but people from Germany have the strange habit of
> always trying to think of the worst possible scenario before accepting
> a new idea, so here comes a set of German questions:
>
> Assume one slave node (A) dies. I suppose that SGE will restart the
> non-finished jobs X from node A on a new node B.
>
> Question 1: Is that correct?
>
> Assume the dead node (A) comes back to life at some point.
>
> Question 2: Is SGE smart enough to notice that jobs X that were started
> before node A went down have been restarted on node B, and is SGE smart
> enough to remove the old (and useless) output of jobs X on node A?
>
> Question 3: Alternatively, can SGE be told to try to restart jobs X on
> node A after that node is back to life? How?
>
> Question 4: If the answer to Q3 is yes, can SGE restart jobs X at the
> point where they stopped, or does SGE always restart jobs from the
> beginning? I mean: does SGE support checkpointing? How?
>
> Best regards, Ivo
>
> _______________________________________________
> Bioclusters maillist - Bioclusters@bioinformatics.org
> http://bioinformatics.org/mailman/listinfo/bioclusters
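
For reference, the local-output approach quoted above can be sketched
independently of SGE. What follows is only a minimal Python sketch under
stated assumptions, not a tested production script: the function names
run_job and collect, the ".partial" suffix, and the directory layout are
all hypothetical and chosen for illustration. It does not answer the SGE
restart questions. It only shows one way to keep half-finished output
from a dead node distinguishable from finished output, and to copy
results to the central server one file at a time in a fixed order.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def run_job(job_id, cmd, scratch):
    """Run cmd with stdout written to node-local scratch; return the final path."""
    scratch = Path(scratch)
    scratch.mkdir(parents=True, exist_ok=True)
    partial = scratch / f"{job_id}.out.partial"  # hypothetical naming scheme
    final = scratch / f"{job_id}.out"
    with open(partial, "w") as fh:
        subprocess.run(cmd, stdout=fh, check=True)
    # Rename only after the command succeeded, so a crashed job or node
    # leaves behind nothing but a ".partial" file.
    partial.replace(final)
    return final


def collect(scratch, central):
    """Copy finished outputs to the central directory, sequentially, in sorted order."""
    central = Path(central)
    central.mkdir(parents=True, exist_ok=True)
    copied = []
    # "*.out" does not match "*.out.partial", so incomplete files are skipped.
    for src in sorted(Path(scratch).glob("*.out")):
        shutil.copy2(src, central / src.name)
        copied.append(src.name)
    return copied
```

In this sketch each job would call run_job with its own job id, and a
single collection step would later call collect once per node, so the
file server sees one sequential stream of copies instead of many
simultaneous writes. Stale ".partial" files left on a node that died can
then be ignored or deleted without consulting the scheduler.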
--
Joe Landman,
email: landman@scientificappliance.com
web : http://scientificappliance.com