[Pipet Devel] Databases/Languages (was New guy speaks up)

J.W. Bizzaro bizzaro at geoserve.net
Wed Dec 1 02:26:02 EST 1999

Brad Chapman wrote:
> Gary Van Domselaar wrote:
> > Loci as a Graphical Shell/ Graphical
> >Scripting language with a database 'locus', but the actual database, and
> >data model used to store the sequence data (and annotations) would be an
> >'option' depending on what the developers have provided for loci.  so
> >there may be a relational database, an object database etc.,
>         Based on this thinking and the 'plug-in' idea mentioned so often,
> why not implement a database as a locus? (maybe this could be some kind of
> derivative of the container?). This way, the user can use or not use a
> database depending on their work with Loci.

I think you just said what Gary did: a container is a locus that is a type of
database.

>         For instance, if I am using Loci to pull single files from the PDB
> and pipe them into RasMol for viewing, it would be really stupid to have an
> intermediate step where I stick a single object into a database. By
> contrast, if I am parsing the current UniGene text file (100MB), it is
> crazy to not have some structured way to store this. I mean, I would be
> none too happy a user if I watched my computer spend an hour parsing a huge
> document and another several BLASTing the results and then had the computer
> crash losing all of my data. A 'plug-in' database locus could serve as the
> storage for huge data files--allowing data backup and easy access to the
> important parts of the data (sequences in this case). Is this the kind of
> plan everyone was thinking of?

Yep, you got it!  The plan is, you can have any sort of intermediate between
data and processor, depending on the needs of the processor (and your needs,
if you're developing new extensions for Loci).

> I agree completely. A database is still a data intermediate, but I think it
> at least has the advantages of 1) being readily storable, 2) allowing
> specific parts of the data to be individually queried, and 3) being flexible
> enough to allow a wide variety of data types to be stored without data loss.

Those are some good points we can use for our documentation.
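
To make those three points concrete, here is a rough sketch in Python. It is
only an illustration under my own assumptions: the class name SequenceStore,
its methods, and the table layout are made up for this example and are not
part of Loci. It uses the SQLite module from the Python standard library so
that each parsed record is committed as it arrives and can later be pulled
out individually.

```python
import sqlite3


class SequenceStore:
    """Hypothetical 'database locus' that keeps parsed records in SQLite."""

    def __init__(self, path=":memory:"):
        # Pass a real file path instead of ":memory:" to keep data on disk.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "id TEXT PRIMARY KEY, sequence TEXT, annotation TEXT)"
        )

    def add(self, record_id, sequence, annotation=""):
        # Commit after every record, so a crash loses at most the record
        # that was being written at that moment.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO records VALUES (?, ?, ?)",
                (record_id, sequence, annotation),
            )

    def sequence(self, record_id):
        # Query one specific part of the data: a single sequence by id.
        row = self.db.execute(
            "SELECT sequence FROM records WHERE id = ?", (record_id,)
        ).fetchone()
        return row[0] if row else None
```

A parser would call add() once per record and a later step would call
sequence() for just the entries it needs.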

> Gary Van Domselaar wrote:
> >Loci's own database requirements may not be so
> >much for sequence storage as much as it is for things like the container
> >locus, which is a queryable locus that contains other loci.
> One thing I'm not clear about--how does all this relate to the idea of the
> container and storing loci? I guess I'm not clear about exactly what it
> means to store a converter locus, for instance? Even more confusing to me,
> how do you query a container locus? How does this relate with storing
> actual data? Confusion, confusion over here!

You simply have to define the word 'database' rather broadly.  Literally
ANYTHING that can store information in a queryable fashion is a 'database',
for our purposes.  But we'll just use the word 'container' to keep the
language lawyers at bay.

Does a filesystem 'store information in a queryable fashion'?  Yes.  Now this
is where things get interesting: In Loci, you can open a container that
represents a filesystem directory.  Since loci (data, programs, etc.) are
individual files in a directory, they can be thought of as being stored in a
container.  Subdirectories are then container loci within container loci.  BUT
KEEP IN MIND: This is ONE type of container.  Not all container loci represent
filesystem directories.
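
As a rough sketch of the directory flavor of container, assuming nothing
about how Loci will really do it: the class DirectoryContainer and its
methods loci(), containers() and query() are hypothetical names chosen for
this example. Files in the directory are treated as loci, subdirectories as
nested containers, and a "query" is just a shell-style match on names.

```python
import fnmatch
import os
import tempfile


class DirectoryContainer:
    """Hypothetical container locus backed by a filesystem directory."""

    def __init__(self, path):
        self.path = path

    def loci(self):
        # Plain files (data, programs, etc.) are the loci in this container.
        return sorted(
            name for name in os.listdir(self.path)
            if os.path.isfile(os.path.join(self.path, name))
        )

    def containers(self):
        # Subdirectories are container loci within this container locus.
        return [
            DirectoryContainer(os.path.join(self.path, name))
            for name in sorted(os.listdir(self.path))
            if os.path.isdir(os.path.join(self.path, name))
        ]

    def query(self, pattern):
        # Query the container: names of loci matching a shell-style pattern.
        return fnmatch.filter(self.loci(), pattern)


def demo():
    # Build a throwaway directory tree and query it as a container.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sequences"))
        for name in ("1abc.pdb", "2xyz.pdb", "notes.txt"):
            open(os.path.join(root, name), "w").close()
        top = DirectoryContainer(root)
        nested = [os.path.basename(c.path) for c in top.containers()]
        return top.query("*.pdb"), nested
```

A container that fronts a relational or object database would offer the same
kind of query() call but answer it from the database instead of the disk.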

> I will agree that I would rather not mess around with two languages/two
> interpreters and all of that jazz. No fun! However there are a number of
> things that are implemented in perl currently that are not available in
> python. For instance, the bioperl modules. Although the biopython project
> is dealing with building the same functionalities in python they currently
> have no code (and the list has been relatively silent!). In addition, a
> number of excellent programmers are coding in perl and so there are a lot
> of good scripts/code available. Should all perl scripts either: a) be run
> through CORBA to be used with Loci? or b) have to be reimplemented in
> python to be used with Loci?

We can't expect ANY program ported to Loci to be modified, whether by making
it use CORBA or by translating it to Python.  The only things that should
have to comply with a Loci specification for interoperability are (1) GUI
widgets, (2) programs that need direct access to or control of Loci
internals, and (3) wrappers.  THESE will use CORBA (especially if not written
in Python) and/or Python.

> For example, how are we planning on connecting
> to AceDB servers--rewriting AcePerl or running it through CORBA? I think
> our disadvantage if we try and rewrite everything is that we can't take
> full advantage of work done in other languages and have to work really hard
> to keep the python implementation "up to date" with the perl. In addition,
> there is an unhealthy competition between perl and python for programmer
> time, regardless of which is a "better" language.

As I mentioned above, all this will be done via a large variety of wrappers.
But for the most part, if it runs on the command line, the Workspace will
allow the user to make his/her own wrappers.  See an earlier post to the list
about 'constructing the command line' or something like that.  It should be
part of the documentation.  Let me know if you can't find it in the archives.
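
To give a feel for what such a wrapper could look like, here is a minimal
sketch in Python.  It is not the design from the earlier post: the function
names build_command() and run_wrapped() are hypothetical, and the only real
API used is the standard subprocess module.  The wrapper assembles an
argument list and runs the program as-is, so the program itself needs no
changes.

```python
import subprocess
import sys


def build_command(program, options=None, inputs=()):
    """Assemble an argument list from a program, its options, and input files.

    options maps a flag to its value; a value of None means a bare flag.
    """
    argv = [program]
    for flag, value in (options or {}).items():
        argv.append(flag)
        if value is not None:
            argv.append(str(value))
    argv.extend(inputs)
    return argv


def run_wrapped(argv):
    """Run the unmodified program and return its standard output as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a real back-end program: the Python interpreter itself.
    argv = build_command(sys.executable, {"-c": "print('hello from the back end')"})
    print(run_wrapped(argv), end="")
```

In the Workspace, the option/value pairs would come from whatever the user
fills in rather than from a hard-coded dictionary.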

> I mention above the kind of things I had in mind. Specifically, where is
> the point where it takes more effort to connect things with CORBA than it
> does to rewrite them in python? What should be rewritten and what should be
> connected? From what I've read, GNOME development (which you all seem to be
> wisely following closely!) seems to do a good job of making CORBA tie a lot
> together, but I'm not completely positive how this all can translate to
> Loci. Once again, confusion is overwhelming me!

Just remember: CORBA and/or Python for

    (1) Widgets
    (2) Low-level customization of Loci
    (3) Wrappers

Gary described this as 'middleware', which is a good way to think of it.

    Front-endware: Loci's Workspace/GUI
    Middleware: CORBA and/or Python for points mentioned above
    Back-endware: Bioinformatics apps, unmodified

                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
