[Pipet Devel] WHAX and Loci storage ideas

Tue Dec 7 10:44:55 EST 1999

Brad Chapman wrote:
> 
> >This is my understanding as well, although the WFD will be constructed
> >via a graphical shell, which has a 'thin interface' to the middleware.
> >When you say 'constructing the command-lines', do you mean 'generating
> >the interface to the middleware'?
> 
> What I think this refers to is generating a command-line for a program by
> using a GUI to input all of the switches. For instance, if I were using
> program foo that used a -l switch to specify a log file, I would use the
> Loci interface to generate the equivalent of 'foo -l /var/mylogfile'.

That's exactly right, and applies pretty much to generating commands for
command-line applications.  I would like as much as possible for other
interface constructions to work in a similar fashion.  The idea is this:

    LOCI IS ITS OWN SOFTWARE DEVELOPMENT KIT (SDK).

If you think about it, most programming and GUI building paradigms use tree
and workflow models.  If we can carry these over to Loci, we get a capable
and flexible development environment TOO!

> My
> thinking was that 'the interface to the middleware' would be worked out
> during the programming of the plug-in to work with Loci. For instance, to
> get Loci to use my sequence viewer program, I would have to tell it by
> writing the plug-in:
> 
> 1. What kind of file the program needs (ie. PDB, FASTA, etc)
> 2. How to work the program (ie. the command line stuff: the switches it
> takes, etc)
> 
> Loci would then take this info and have a GUI for 'constructing the command
> line' (getting the switches set up) and do error checking to make sure the
> user supplies the right file for the program.
> At least, this is my current understanding of how stuff would work.

It sounds about right to me.  Later, we'll need some people thinking about how
to add these features to the Loci 'SDK'.
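
A minimal sketch of what such a plug-in description could look like, assuming
a plain Python dictionary as the descriptor.  The field names, the
build_command helper, and the input file 'seqs.fasta' are all hypothetical;
nothing here is an actual Loci format.  It only shows the two pieces Brad
lists: the accepted file types and the switches.

```python
import shlex

# Hypothetical plug-in description for the imaginary program 'foo'.
# The field names are assumptions for illustration, not a Loci format.
FOO_PLUGIN = {
    "program": "foo",
    "accepts": {"fasta", "pdb"},   # file types the program can read
    "switches": {"log": "-l"},     # GUI field name -> command-line switch
}

def build_command(plugin, input_path, input_format, **settings):
    """Return the argument list for one run, after basic error checking."""
    if input_format not in plugin["accepts"]:
        raise ValueError(
            f"{plugin['program']} cannot read {input_format!r} files")
    args = [plugin["program"]]
    for name, value in settings.items():
        if name not in plugin["switches"]:
            raise ValueError(f"unknown setting {name!r}")
        args += [plugin["switches"][name], str(value)]
    args.append(input_path)
    return args

cmd = build_command(FOO_PLUGIN, "seqs.fasta", "fasta", log="/var/mylogfile")
print(shlex.join(cmd))   # foo -l /var/mylogfile seqs.fasta
```

A GUI would fill in the keyword arguments from its widgets; the descriptor
is what lets it reject a wrong file type before anything is run.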

> I really like the idea of piping! You (and Jeff) are right, there is no
> reason to stick stuff in a database if you could just pipe it around.
> However, I have a couple of practical questions for using a piping approach
> like this:
> 
> 1. If you have data from a number of sources in a bunch of different
> formats, how would you get them together to pipe them into a program that
> would require them all in one text document in, say, FASTA format? Would
> you have to run each of them through a converter to get them in a common
> format, then pipe them all into a processor that would stick them into a
> single file?

I think you hit the nail on the head.
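
As a rough sketch of that convert-then-merge step: each source goes through
its own converter into a common (name, sequence) form, and one processor
writes them all out as a single FASTA document.  The 'tabbed' format and all
function names below are made up for the example.

```python
def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def parse_tabbed(text):
    """Yield (name, sequence) pairs from 'name<TAB>sequence' lines.
    This format is invented purely for the illustration."""
    for line in text.splitlines():
        if line.strip():
            name, seq = line.split("\t", 1)
            yield name.strip(), seq.strip()

# One converter per input format; every converter emits the same form.
CONVERTERS = {"fasta": parse_fasta, "tabbed": parse_tabbed}

def merge_to_fasta(sources, width=60):
    """sources: iterable of (format, text).  Return one FASTA document."""
    out = []
    for fmt, text in sources:
        for name, seq in CONVERTERS[fmt](text):
            out.append(">" + name)
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"

merged = merge_to_fasta([
    ("fasta", ">seq1\nACGT\nACGT\n"),
    ("tabbed", "seq2\tTTGACA\n"),
])
print(merged, end="")
```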

> 2. Conversely, what if you had a huge document and wanted to break it up
> into smaller documents? For example, what if you had a swiss-prot file and
> wanted to get just the protein sequences for all Zea mays (corn)
> accessions--how would this be done?

You'd need a processor (or database query) to do this.  It'd be better to have
a more general-purpose processor (can handle extracting all sorts of data)
than a special purpose one.  And (if we make our own) the processor should
work from one 'good' data format, leaving translation from swiss-prot to a
converter locus.  Let's say the 'good' data format is 'HumbertoXML' ;-)

    swiss-prot         swiss-prot          breaker-   ---->  Zea mays
      document  ---->  to HXML      ---->    upper    ---->  sequences
                       converter                      ---->  in HXML
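
The diagram above can be sketched as two chained generators.  This is only an
illustration: plain dictionaries stand in for the (so far imaginary) HXML
records, the parser reads just the ID, OS and SQ lines of a Swiss-Prot flat
file, and the two sample entries are invented.

```python
# Invented sample entries in Swiss-Prot flat-file layout (not real data).
SAMPLE = """\
ID   FAKE1_MAIZE    STANDARD;      PRT;    10 AA.
OS   Zea mays (Maize).
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
ID   FAKE2_HUMAN    STANDARD;      PRT;     8 AA.
OS   Homo sapiens (Human).
SQ   SEQUENCE   8 AA;
     MSDNELLK
//
"""

def new_record():
    return {"id": None, "organism": "", "sequence": ""}

def swissprot_to_records(lines):
    """Converter locus: yield one dict per entry (stand-in for HXML)."""
    record, in_seq = new_record(), False
    for line in lines:
        code = line[:2]
        if code == "//":                      # end of entry
            yield record
            record, in_seq = new_record(), False
        elif code == "ID":
            record["id"] = line[5:].split()[0]
        elif code == "OS":                    # may span several lines
            record["organism"] = (
                record["organism"] + " " + line[5:].strip()).strip()
        elif code == "SQ":
            in_seq = True
        elif in_seq and code == "  ":         # sequence data lines
            record["sequence"] += line.replace(" ", "")

def select_organism(records, organism):
    """Breaker-upper locus: pass through entries for one organism."""
    for record in records:
        if record["organism"].startswith(organism):
            yield record

maize = list(select_organism(swissprot_to_records(SAMPLE.splitlines()),
                             "Zea mays"))
for record in maize:
    print(record["id"], record["sequence"])
```

Because the breaker-upper only sees the common record form, the same filter
would work behind any other converter.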

> 3. How could individual parts of the data be queried or reordered? For
> instance, if I wanted to separate all sequences with a particular motif out
> of a file and then reorder them by organism.

If this stuff was databased first, you could use a more sophisticated query
system than above.  So, you may want to pipe your data into a database to
start.
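
For illustration only, here is what "pipe it into a database, then query"
might look like, using Python's built-in sqlite3 module as a stand-in for
whatever database is eventually chosen.  The table layout and sample rows are
invented, and the motif is treated as a literal substring; real motif
searches would need pattern matching.

```python
import sqlite3

# Invented sample data: (name, organism, residues).
records = [
    ("seqA", "Zea mays", "MKTAYIAKQRNGT"),
    ("seqB", "Homo sapiens", "MSDNGTELLK"),
    ("seqC", "Arabidopsis thaliana", "MKKLLPTAAA"),
]

# "Pipe" the records into an in-memory database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sequences (name TEXT, organism TEXT, residues TEXT)")
db.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)

# Select sequences containing the motif, reordered by organism.
motif = "NGT"
rows = db.execute(
    "SELECT name, organism FROM sequences "
    "WHERE instr(residues, ?) > 0 ORDER BY organism, name",
    (motif,),
).fetchall()
print(rows)   # [('seqB', 'Homo sapiens'), ('seqA', 'Zea mays')]
```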

> 4. What about doing things like generating GUIs on the fly, as Jeff talked
> about in the 'constructing the command line' mail? He mentioned getting a
> pyGTK GUI directly from a Glade output XML document in this case, but
> similarly, what if we wanted to put the output into a web browser? Would we
> convert the file to XML, then process it into HTML/GladeXML and then output
> it?

Web output of Loci interfaces is a tricky problem, and the whole Web interface
project is the biggest sub-project to Loci.  I can think of some ways to make
simple and limited Web interfaces, but just as you cannot get MS Word to run
via an HTML browser, many Loci interfaces cannot be run this way.  This is
why people made Java applets, etc.

What I am hoping to be able to do is convert diagrams or illustrations (for
example, protein motifs) made by Loci into JPGs for Web display.  I'm trying
to be realistic about this part of Loci.

> These are just a few concerns I thought up for discussion regarding the
> piping system you described. I really like the idea, and think it would be
> more straightforward to do, but my only concern is how well it would
> scale as operations got more complicated. I guess I have been thinking of
> Loci more as a graphical scripting language, which I imagine having a lot
> more options than just a redirection shell.

Alright, as far as scripting languages are concerned, Loci is very limited. 
But I'd like to think of it as being 'high-level' or a '4GL' (fourth
generation language).  And I think that keeping Loci agnostic of data type
does not diminish its capabilities.  How can one turn a redirection shell into
a scripting language? As long as we're looking at bash as an analogy, we can
consider SHELL SCRIPTING, which is really just a more structured command-line.

> 2. Middleware--2 storage options:
> a. Provide option for XML storage of an "internal XML format." If a user
> has a need for more complicated data-handling (as I described in my
> questions above), they can utilize this option to place things in an
> internal XML database and then use the XML warehouse kind of stuff I
> described in point 3 in my last e-mail.
> b. Provide an option for permanent storage with relational databases (ie.
> MySQL, PostgreSQL, Sybase ...), so that the data can be available after
> Loci has quit.
> 
> The middleware would handle the connections between the Loci front-end,
> which asks for a database or internal format, and the back-end, which
> provides it.

I think you're suggesting a generic XML database as an 'internal database',
which can handle any processable data that are marked up.  I like the idea,
providing it is very generic.

But included in this list of middleware should be the mechanism (database?)
for knowing what locus is connected to what...basically handling all of the
workflow data...and a parser/interpreter.  You mentioned this before.
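
One possible shape for that mechanism, sketched under the assumption that
the workflow is stored as a simple table of which loci feed into which.  The
locus names are hypothetical, and graphlib (Python 3.9+) is used here only to
work out a valid run order; an interpreter would then run each locus in turn.

```python
from graphlib import TopologicalSorter

# Each key is a locus; its value is the set of loci that feed into it.
# The names are made up for the example.
connections = {
    "converter": {"fasta_input"},
    "viewer": {"converter"},
}

# Upstream loci come first, so data exists before anything consumes it.
order = list(TopologicalSorter(connections).static_order())
print(order)   # ['fasta_input', 'converter', 'viewer']
```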

> 3. Back-end: All of the databases themselves.

All the programs, data, converters...everything called a 'locus'.

> If this sounds like a plan, then I would like to humbly propose an
> immediate development focus: Get the piping stuff working with the Loci
> front-end so that we can do something like the following: 1. Input a
> sequence in FASTA format 2. Convert it to a new format 3. View it in a
> sequence viewer. This type of activity would not require any storage
> options, so this would simplify things. In addition, Jeff has the GUI
> set-up to make the connections, so we are currently able to construct this
> kind of workflow diagram. I think reaching this kind of short term goal
> would be extremely exciting as Loci would actually "do" something and would
> provide us with a base for further development. How does this sound? Anyone
> for this? Hip-hip-hooray? Booooo? Whatta you think?

As a focus or goal, this sounds good.  It doesn't say how we'll get there.
But I never mentioned what the simplest scenario for running Loci should be.

> Also, I hope I don't step on any toes by making
> a development direction suggestion. I just want to get an idea of the short
> and long term goals of Loci and kind of find my place somewhere in there so
> I can have Loci working for my thesis project needs.

No problem.  I still owe you a TODO list.  I worked on one, and I will pass it
to Gary for some comments before making it official.  If anyone else wants to
see the unofficial version so that they can comment on it, mail me directly.

Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+