[BiO BB] Linear Bioinformatics workflow?
cdwan at bioteam.net
Tue Oct 4 10:40:48 EDT 2005
There is another major category of workflow that I've seen:
"I have a [protein sequence | genomic region | compound | mass spec
output] and I want to find out everything in the entire world about
it. Ideally, all the result values would be converted to a common
vocabulary, format, and normalization. I would rather not go to
every website in the world, or even know about all those websites.
Can you help me?"
This is a "wide" rather than a "deep" process.
As to your original question - my personal opinion is that interface
design is really, really hard, and that if someone were going to come
up with a good, generic way to put that sort of power in the hands of
non-programmer types, it would have happened by now.
That said, if you narrow the problem enough that it doesn't have to
do everything in the world, things get a lot simpler. Each of the
tools that people have mentioned has its strengths and weaknesses.
None will solve every problem.
I'm not aware of a really killer solution for your specific use case:
- let users explore a limited set of tools, and dynamically build
up a protocol
- save that protocol in a personal workspace for future (personal)
re-use and possible sharing
- but keep it totally limited and simple so as not to intimidate
- yet stay flexible enough to handle large-ish batches of data
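
To make that wish list concrete, here is a minimal sketch of what "build up a protocol, save it, re-use it" could look like, assuming a protocol is nothing more than an ordered list of tool names with parameters. The function names (save_protocol, load_protocol, run_protocol), the JSON layout, and the tool registry are all hypothetical, made up for illustration and not taken from any existing workflow engine.

```python
import json


def save_protocol(path, steps):
    """Write an ordered list of steps to a JSON file for later re-use."""
    with open(path, "w") as handle:
        json.dump({"steps": steps}, handle, indent=2)


def load_protocol(path):
    """Read back the ordered list of steps written by save_protocol."""
    with open(path) as handle:
        return json.load(handle)["steps"]


def run_protocol(steps, tools, data):
    """Feed data through each step in order (a strictly linear workflow).

    `tools` maps a tool name to a callable taking the current data plus
    keyword parameters; it stands in for the "limited set of tools".
    """
    for step in steps:
        tool = tools[step["tool"]]
        data = tool(data, **step.get("params", {}))
    return data
```

Because the saved file is just a list of names and parameters, the same protocol can be re-run on a larger batch or handed to a colleague, as long as both sides agree on what the tool names mean.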
Most of the commercial and free workflow engines will do this, but it
sounds like the overhead of learning to use them is a bit much for
Amir Karger wrote:
> Several people mentioned 2-D graphical workflow tool in a
> workflow?" thread on bioclusters. (I'm redirecting my non-cluster-y
> here.) While still a newbie, I'm getting the impression that many
> bioinformatics workflows are mostly linear, with obvious important
> exceptions like conditions and loops. For example, I had a client
> last week who wanted to script:
> 1 blast [sequence=..., program=...] > blast.out
> 2 get hits from blast.out > blast.hits
> 3 find hits with 50-70% sequence identity from blast.hits > blast.good_hits
> 4 download/fastacmd sequences for IDs in blast.good_hits > hits.fasta
> 5 clustalw hits.fasta > publishable_result (OK, not really)
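
As a rough sketch of the filtering step in the quoted pipeline (keeping hits with 50-70% sequence identity), the function below assumes the BLAST run wrote tab-separated tabular output (blastall -m 8), where the subject ID is the second column and percent identity is the third. The function name and the default thresholds are illustrative only; the other steps would still need to call blast, fastacmd, and clustalw themselves.

```python
def hits_in_identity_range(tabular_lines, low=50.0, high=70.0):
    """Return subject IDs whose percent identity lies within [low, high].

    Expects lines of BLAST tabular output: query ID, subject ID,
    percent identity, then the remaining columns (ignored here).
    IDs are returned once each, in order of first appearance.
    """
    good_hits = []
    for line in tabular_lines:
        line = line.strip()
        # Skip blank lines and the comment lines some tabular modes add.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject_id = fields[1]
        identity = float(fields[2])
        if low <= identity <= high and subject_id not in good_hits:
            good_hits.append(subject_id)
    return good_hits
```

The returned list plays the role of blast.good_hits: it can be written out one ID per line and handed to whatever fetches the sequences.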