[BiO BB] Re: Linear Bioinformatics workflow?

Amir Karger akarger at CGR.Harvard.edu
Thu Oct 6 15:46:19 EDT 2005

> Chris Dwan <cdwan at bioteam.net> said:
[liberally snipped and edited]

Hi, Chris.

> That said, if you narrow the problem enough that it doesn't have to  
> do everything in the world, things get a lot simpler.

Exactly! We *have* a narrow problem: scripting command-line calls to
bioinformatics or formatting tools. I'm narrowing it further by allowing
only 1-D problems. Best of all, biologists already know how to write

To edit/shorten your requirements list for a linear workflow app (which we
can call 1DScript until our marketing team plays with it for a while):
- choose from a limited set of tools
- build a protocol
- save a protocol

That doesn't sound too hard! In fact, the solution already exists. Use
iNquiry to run a set of jobs in a row. For each job, find the command-line
command that iNquiry (Pise) runs, and paste it into Notepad. Save as a shell
script. Voila!  If only we knew some iNquiry developers, we could ask them
to integrate a command-line history, and we'd be set. Is it that hard to pop
that into a GUI?
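
For concreteness, here is a minimal Python sketch of those three requirements (limited tool set, build a protocol, save a protocol). Everything in it is an assumption for illustration: the `Protocol` class, its method names, and the contents of `ALLOWED_TOOLS` are hypothetical, and nothing here reflects how iNquiry or Pise actually work.

```python
import shlex

# Hypothetical whitelist of tools a protocol may call.
ALLOWED_TOOLS = {"blastall", "fastacmd", "clustalw", "wget"}


class Protocol:
    """An ordered (1-D) list of command lines, each writing to one output file."""

    def __init__(self, allowed_tools=ALLOWED_TOOLS):
        self.allowed_tools = set(allowed_tools)
        self.steps = []  # list of (argv, output_path) tuples, in run order

    def add_step(self, command, output):
        """Append one command line; reject tools outside the allowed set."""
        argv = shlex.split(command)
        if not argv or argv[0] not in self.allowed_tools:
            raise ValueError(f"tool not in the allowed set: {command!r}")
        self.steps.append((argv, output))

    def to_shell(self):
        """Render the protocol as the text of a plain shell script."""
        lines = ["#!/bin/sh", "set -e  # stop at the first failing step"]
        for argv, output in self.steps:
            lines.append(f"{shlex.join(argv)} > {shlex.quote(output)}")
        return "\n".join(lines) + "\n"

    def save(self, path):
        """Write the rendered script to a file."""
        with open(path, "w") as handle:
            handle.write(self.to_shell())
```

A user would call `add_step` once per job, in order, and then `save("protocol.sh")`. The saved file is just the Notepad-and-paste shell script described above, produced by a program instead of by hand.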

> Most of the commercial and free workflow engines will do this, but it  
> sounds like the overhead of learning to use them is a bit much for  
> your users?

Yes. Some of our clients will be folks who use computers once a month. Do
they want to devote the time to learn how to use these workflow
applications, which - since they can do so much more than 1DScript - are
necessarily going to be more complex?

In addition, I'm getting the impression - correct me if I'm wrong - that the
usual model for Inforsense et al. is that in-house programmers create
workflows which users then use.  And we have very few in-house programmers.
(Approximately, hm, let's see, carry the 4... um, one.) 

> "I have a [protein sequence | genomic region | compound | mass spec  
> output] and I want to find out everything in the entire world about it.  

Finding websites is work, but a different kind of work than learning and
*remembering* (during that month in the lab) a new language/interface. In my
possibly totally wrong opinion, a biologist who uses computers only
occasionally is more willing to do the website-searching kind of work. So
this biologist can seek out the websites and paste a bunch of
(parameterized) wgets into a protocol.
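
As a rough sketch of what a "parameterized wget" could look like, assuming a made-up URL layout: `URL_TEMPLATE`, the `db` and `id` query parameters, and the `wget_command` helper are all hypothetical, and example.org stands in for whatever site the biologist actually found.

```python
import shlex
from string import Template
from urllib.parse import quote

# Hypothetical URL template; example.org is a placeholder, not a real database.
URL_TEMPLATE = Template("http://example.org/lookup?db=$db&id=$seq_id")


def wget_command(seq_id, db="protein", outfile=None):
    """Return a shell-safe wget command line for one (db, id) pair."""
    url = URL_TEMPLATE.substitute(
        db=quote(db, safe=""),          # percent-encode the parameter values
        seq_id=quote(seq_id, safe=""),
    )
    outfile = outfile or f"{seq_id}.out"
    # shlex.join quotes the URL so '?' and '&' survive the shell.
    return shlex.join(["wget", "-O", outfile, url])
```

Only the ID (and perhaps the database name) changes from run to run, so the user edits one value and the rest of the protocol stays put.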

> Ideally, all the result values would be converted to a common  
> vocabulary, format, and normalization.  

Just add a few Scriptome tools (<plug> http://cgr.harvard.edu/cbg/scriptome
</plug>) to your wgets.

> I would rather not go to  
> every website in the world, or even know about all those websites."   

Sorry, out of scope. Buy a programmer. Or hope that someday 1DScript
protocols are online and someone has written one that's close to what you
want, so you can download and tweak it.


> -Chris Dwan
> Amir Karger wrote:
> > Several people mentioned 2-D graphical workflow tools in a
> > "Bioinformatics workflow?" thread on bioclusters. (I'm redirecting my
> > non-cluster-y question here.) While still a newbie, I'm getting the
> > impression that many bioinformatics workflows are mostly linear, with
> > obvious important exceptions like conditions and loops. For example, I
> > had a client last week who wanted to script:
> >
> > 1 blast [sequence=..., program=...] > blast.out
> > 2 get hits from blast.out > blast.hits
> > 3 find hits with 50-70% sequence identity from blast.hits > blast.good_hits
> > 4 download/fastacmd sequences for IDs in blast.good_hits > hits.fasta
> > 5 clustalw hits.fasta > publishable_result (OK, not really)
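
The identity-filtering step in the quoted workflow is the kind of thing a small formatting tool handles. Here is a minimal Python sketch, assuming (the original post does not say) that blast.hits is in BLAST's tab-delimited tabular format, where the second column is the subject ID and the third is percent identity. The function name and its defaults are hypothetical.

```python
def filter_by_identity(lines, low=50.0, high=70.0):
    """Yield subject IDs of hits whose percent identity lies in [low, high].

    Assumes tab-delimited BLAST tabular output: column 2 is the subject ID,
    column 3 is the percent identity. Each ID is reported once, in the order
    it is first seen.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id, identity = fields[1], float(fields[2])
        if low <= identity <= high and subject_id not in seen:
            seen.add(subject_id)
            yield subject_id
```

Calling it on the open blast.hits file and writing one ID per line would give blast.good_hits, ready for the download step.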
