[BiO BB] Re: Linear Bioinformatics workflow?
Amir Karger
akarger at CGR.Harvard.edu
Thu Oct 6 15:46:19 EDT 2005
> Chris Dwan <cdwan at bioteam.net> said:
[liberally snipped and edited]
Hi, Chris.
> That said, if you narrow the problem enough that it doesn't have to
> do everything in the world, things get a lot simpler.
Exactly! We *have* a narrow problem: scripting command-line calls to
bioinformatics or formatting tools. I'm narrowing it further by allowing
only 1-D problems. Best of all, biologists already know how to write
protocols.
To edit/shorten your requirements list for a linear workflow app (which we
can call 1DScript until our marketing team plays with it for a while):
- choose from a limited set of tools
- build a protocol
- save a protocol
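
To show how little that list asks for, here is a rough sketch in Python. The language is picked only for illustration, and everything in it is an assumption: the ALLOWED_TOOLS set, the function names, and the JSON file layout are hypothetical, not anything 1DScript or iNquiry defines.

```python
import json

# Hypothetical whitelist: the "limited set of tools" a protocol may call.
ALLOWED_TOOLS = {"blastall", "fastacmd", "clustalw", "wget"}


def add_step(protocol, command):
    """Append one command line to the protocol if its tool is allowed."""
    parts = command.split()
    if not parts:
        raise ValueError("empty command")
    if parts[0] not in ALLOWED_TOOLS:
        raise ValueError(f"tool not in the allowed set: {parts[0]}")
    protocol.append(command)
    return protocol


def save_protocol(protocol, path):
    """Save the protocol as a JSON list of command strings."""
    with open(path, "w") as handle:
        json.dump(protocol, handle, indent=2)


def load_protocol(path):
    """Read a protocol back as a list of command strings."""
    with open(path) as handle:
        return json.load(handle)
```

A protocol here is nothing more than an ordered list of command lines, which is about as 1-D as it gets.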
That doesn't sound too hard! In fact, the solution already exists. Use
iNquiry to run a set of jobs in a row. For each job, find the command-line
command that iNquiry (Pise) runs, and paste it into Notepad. Save as a shell
script. Voila! If only we knew some iNquiry developers, we could ask them
to integrate a command-line history, and we'd be set. Is it that hard to pop
that into a GUI?
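
For what it's worth, the "run the saved commands in a row" part might look something like the sketch below. This is my guess at a minimal runner, not how iNquiry or Pise actually works; run_protocol is a hypothetical name, and stopping at the first failing step is a design choice I'm assuming.

```python
import subprocess


def run_protocol(commands):
    """Run each command line in order and stop at the first failure.

    Returns the number of steps that completed successfully.
    """
    completed = 0
    for command in commands:
        # shell=True so that redirections such as "> blast.out" behave
        # the way they would in a pasted-together shell script.
        result = subprocess.run(command, shell=True)
        if result.returncode != 0:
            print(f"step {completed + 1} failed: {command}")
            break
        completed += 1
    return completed
```

Since the commands go through a shell, this only makes sense for protocols the user pasted together personally, which is the situation described above.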
> Most of the commercial and free workflow engines will do this, but it
> sounds like the overhead of learning to use them is a bit much for
> your users?
Yes. Some of our clients will be folks who use computers once a month. Do
they want to devote the time to learn how to use these workflow
applications, which - since they can do so much more than 1DScript - are
necessarily going to be more complex?
In addition, I'm getting the impression - correct me if I'm wrong - that the
usual model for Inforsense et al. is that in-house programmers create
workflows which users then use. And we have very few in-house programmers.
(Approximately, hm, let's see, carry the 4... um, one.)
> "I have a [protein sequence | genomic region | compound | mass spec
> output] and I want to find out everything in the entire world about it.
Finding websites is work, but a different kind of work than learning and
*remembering* (during that month in the lab) a new language/interface. In my
possibly totally wrong opinion, a biologist who uses computers only
occasionally is more willing to do the website-searching kind of work. So
this biologist can seek out the websites, and paste a bunch of
(parameterized) wgets into a protocol.
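
By "parameterized" I mean something like the sketch below: a command template with named blanks that get filled in per run. The template text, the fill_step helper, and the example.org URL are all made up for illustration; the only real thing is wget's -O option for naming the output file.

```python
import shlex
from string import Template

# Hypothetical protocol step: $outfile and $url are the parameters.
FETCH_STEP = Template("wget -O $outfile $url")


def fill_step(template, **params):
    """Substitute shell-quoted parameter values into a command template."""
    quoted = {name: shlex.quote(str(value)) for name, value in params.items()}
    # substitute() raises KeyError if a parameter was left unfilled.
    return template.substitute(quoted)


command = fill_step(
    FETCH_STEP,
    outfile="hit.fasta",
    url="http://example.org/seq?id=P12345",  # placeholder URL
)
```

Quoting the values matters because URLs often contain characters such as "?" and "&" that a shell would otherwise interpret.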
> Ideally, all the result values would be converted to a common
> vocabulary, format, and normalization.
Just add a few Scriptome tools (<plug> http://cgr.harvard.edu/cbg/scriptome
</plug>) to your wgets.
> I would rather not go to
> every website in the world, or even know about all those websites."
Sorry, out of scope. Buy a programmer. Or hope that someday 1DScript
protocols are online and someone wrote one that's close to what you want so
you can download and tweak it.
-Amir
>
> -Chris Dwan
>
> Amir Karger wrote:
>
> > Several people mentioned 2-D graphical workflow tools in a
> > "Bioinformatics workflow?" thread on bioclusters. (I'm redirecting my
> > non-cluster-y question here.) While still a newbie, I'm getting the
> > impression that many bioinformatics workflows are mostly linear, with
> > obvious important exceptions like conditions and loops. For example, I
> > had a client last week who wanted to script:
> >
> > 1 blast [sequence=..., program=...] > blast.out
> > 2 get hits from blast.out > blast.hits
> > 3 find hits with 50-70% sequence identity from blast.hits > blast.good_hits
> > 4 download/fastacmd sequences for IDs in blast.good_hits > hits.fasta
> > 5 clustalw hits.fasta > publishable_result (OK, not really)
>
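
The 50-70% identity filter in the quoted protocol is the kind of small formatting step I have in mind. A sketch, assuming BLAST was run with tab-delimited tabular output, where the second column is the subject ID and the third is percent identity; the filter_hits name and its defaults are hypothetical:

```python
def filter_hits(lines, low=50.0, high=70.0):
    """Return subject IDs of hits with percent identity in [low, high].

    Expects tab-delimited BLAST tabular lines; blank lines and
    "#" comment lines are skipped, and duplicate IDs are dropped.
    """
    good = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        subject_id, identity = fields[1], float(fields[2])
        if low <= identity <= high and subject_id not in good:
            good.append(subject_id)
    return good
```

The resulting IDs could then be handed to whatever download step comes next in the protocol.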