Tim Cutts wrote:
>
> On 6 Jan 2005, at 5:49 pm, Malay wrote:
>
>> Rayson Ho wrote:
>>
>>> Gridengine currently has the "AND" operator job dependency:
>>> A,B -> C
>>> i.e. we need to wait for jobs A and B to finish before we start job C.
>>> There are discussions on the SGE dev mailing list about adding the OR
>>> job dependency:
>>> A|B -> C
>>> So job C will start as soon as job A or job B finishes.
>>> I am wondering if this is useful in bioinformatics job flows??
>>
>>
>> As far as bioinformatics goes, I am afraid most bioinformatics
>> applications are embarrassingly independent :) Such dependency
>> resolution will have its niche applications, but I guess they are
>> very limited as far as bioinformatics goes.
>
>
> I don't think that's true - when you consider something like a gene
> annotation process, there are lots of dependencies. Consider what goes
> on with Ensembl; before any analyses are performed, the sequences have
> to be dusted and RepeatMasked. After that raw features such as BLAST
> hits, ab initio gene predictions and EST alignments can be calculated.
> Once the BLAST hits have been done, genewise alignments can be performed
> (using the BLAST results to narrow down the areas genewise needs to
> analyse). Only once the EST alignments, ab initio predictions and
> genewise are complete can the code be run to combine these into a
> coherent set of gene structures.

A pipeline of any kind, by its nature, depends on the previous process:

A -> B -> C

I don't understand what you mean by jobs here. These rules can't be hard-coded in the scheduler, can they? In bioinformatics each of these stages is actually not a job at all; they are what are called "steps". Each step, like A, is composed of 1,000,000 BLAST jobs that have no dependency on each other.

>
> Although each of these processes consists of thousands of independent
> jobs, each type of analysis is dependent on the completion of the
> previous ones.

As I said.
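To make the AND/OR distinction concrete, here is a minimal Python sketch of step-level dependency rules evaluated in user space, outside the scheduler. It is not Grid Engine's or Ensembl's actual logic: the `("AND" | "OR", prerequisites)` rule encoding, the helper names `is_ready`, `ready_steps` and `waves`, and the step names (`dust`, `repeatmask`, `blast`, `ab_initio`, `est_align`, `genewise`, `build_genes`) are all hypothetical, and each step stands for thousands of independent jobs.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rule encoding: (mode, prerequisites).
# "AND" = every prerequisite must have finished (the existing A,B -> C);
# "OR"  = any one prerequisite is enough (the proposed A|B -> C).
Rule = Tuple[str, List[str]]


def is_ready(rule: Rule, done: Set[str]) -> bool:
    """True if the rule's prerequisites are satisfied by the finished set."""
    mode, prereqs = rule
    if not prereqs:
        return True  # a step with no prerequisites can start immediately
    if mode == "AND":
        return all(p in done for p in prereqs)
    if mode == "OR":
        return any(p in done for p in prereqs)
    raise ValueError(f"unknown dependency mode: {mode!r}")


def ready_steps(pipeline: Dict[str, Rule], done: Set[str]) -> List[str]:
    """Unfinished steps whose rule is satisfied, in alphabetical order."""
    return sorted(
        step for step, rule in pipeline.items()
        if step not in done and is_ready(rule, done)
    )


def waves(pipeline: Dict[str, Rule]) -> List[List[str]]:
    """Group steps into batches, assuming each batch finishes before the next starts."""
    done: Set[str] = set()
    result: List[List[str]] = []
    while len(done) < len(pipeline):
        batch = ready_steps(pipeline, done)
        if not batch:
            raise RuntimeError("cycle or unsatisfiable dependency")
        result.append(batch)
        done.update(batch)
    return result


# Hypothetical steps loosely following the Ensembl flow described above.
ENSEMBL_LIKE: Dict[str, Rule] = {
    "dust": ("AND", []),
    "repeatmask": ("AND", []),
    "blast": ("AND", ["dust", "repeatmask"]),
    "ab_initio": ("AND", ["dust", "repeatmask"]),
    "est_align": ("AND", ["dust", "repeatmask"]),
    "genewise": ("AND", ["blast"]),
    "build_genes": ("AND", ["est_align", "ab_initio", "genewise"]),
}

if __name__ == "__main__":
    for i, batch in enumerate(waves(ENSEMBL_LIKE), start=1):
        print(i, batch)

    and_flow = {"A": ("AND", []), "B": ("AND", []), "C": ("AND", ["A", "B"])}
    or_flow = {"A": ("AND", []), "B": ("AND", []), "C": ("OR", ["A", "B"])}
    print(ready_steps(and_flow, {"A"}))  # ['B']
    print(ready_steps(or_flow, {"A"}))   # ['B', 'C']
```

Under the AND rule, finishing A alone leaves C waiting; under the OR rule, C can start as soon as A finishes. The `waves` helper assumes each batch completes before the next begins, which matches the coarse "step" view rather than per-job scheduling.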
But do you actually run a "job" pipeline rather than a "step" pipeline? That is, do you carry the analysis of a small region of genome sequence through to the end, or do you finish the BLAST searches for the whole genome at once?

> As it happens, all of these dependencies are handled in the Ensembl
> RuleManager rather than by the scheduling system.

That's what I meant! The whole dependency issue lives in user space and can be maintained very well by user software. In the software world, "unnecessary" means "something that can be managed in an easier way".

> They're all AND dependencies as far as I can tell, and I've never needed
> anything other than AND dependencies in my own pipelines, but I wouldn't
> like to claim that OR dependencies aren't useful to someone.
>

You are an expert, Tim. But the majority of cluster users are not, like you, running genome pipelines at all. While I can't speak for all of them, what I can say is that I have never used the dependency resolution system of any scheduler so far. I have never felt the need for it. All the rules I have made are in the software. But maybe I am stretching my own experience to cover others.

-Malay
mbasu(at)ncbi.nlm.nih.gov