[Pipet Devel] Databases/Languages (was New guy speaks up)

Tue Nov 30 22:39:10 EST 1999

Hello all! Thanks for the responses. Exciting to hear I'm not *completely*
out in left field on all of this. My thoughts:

J.W. Bizzaro wrote:
Gary Van Domselaar wrote:

1) Databases/Converters.

J.W. Bizzaro wrote:
>If Loci worked with _one_ particular database to do this, it would be akin to
>having an internal format.  We should have plug-in databases, which are what
>the 'Containers' are meant to be.  I agree wholeheartedly with the idea of
>using databases as alternatives to data formats, but you have to consider
>(it's the same problem with an internal format) that some programmers won't
>agree on how good 'our preference' is, if we choose _one_ database.  With
>every technology, you have both ardent supporters and staunch detractors; we
>just can't please everyone with only one option.

I agree completely. This is kind of what I was thinking of when I mentioned
how the Perl DBI/DBD stuff works. I took some time to look a little at the
Python drivers, and it looks like the same type of thing is possible with
Python, as Gary pointed out ("Python has crazy-wicked bindings to MySQL
(and mSQL)..."). Based on the work I did migrating a Sybase/Oracle DB under
Perl to a MySQL DB under Perl, the only real changes were to the driver and
to fix SQL that was not supported under MySQL (which is a leaner DB and
doesn't support things like VIEWS and UNION). So I think it would be quite
possible to write some simple, widely supported SQL and have a system which
could plug into multiple databases. I could test PostgreSQL, MySQL, and
mSQL on my system, and the Linux users out there could provide testing for
Oracle/Sybase to help ensure a database-independent system. Since the
database would be independent, it wouldn't have to be distributed with
Loci, and so we wouldn't even have to worry about LGPL stuff. Yay, no
lawyers coming round!
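
To make the driver-swapping idea concrete, here is a minimal sketch in
Python, assuming drivers that follow the Python DB-API. The standard-library
sqlite3 module is only a stand-in for a MySQL or PostgreSQL driver, and the
table layout and function names (open_store, add_sequence, get_sequence) are
hypothetical, not anything in Loci. One wrinkle worth noting: drivers differ
in their parameter placeholder style (each advertises it as `paramstyle`), so
the sketch looks it up rather than hard-coding it.

```python
import sqlite3

# Placeholder text per DB-API paramstyle; only the two positional
# styles are handled in this sketch.
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def open_store(driver, *connect_args):
    """Connect through any DB-API driver module and create the table."""
    conn = driver.connect(*connect_args)
    cur = conn.cursor()
    # Plain CREATE TABLE with basic types: no views, no unions.
    cur.execute("CREATE TABLE sequences (accession VARCHAR(32), residues TEXT)")
    conn.commit()
    return conn, PLACEHOLDERS[driver.paramstyle]


def add_sequence(conn, mark, accession, residues):
    """Insert one row using the driver's own placeholder style."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO sequences (accession, residues) VALUES (%s, %s)" % (mark, mark),
        (accession, residues),
    )
    conn.commit()


def get_sequence(conn, mark, accession):
    """Return the residues stored for an accession, or None if absent."""
    cur = conn.cursor()
    cur.execute(
        "SELECT residues FROM sequences WHERE accession = %s" % mark,
        (accession,),
    )
    row = cur.fetchone()
    return row[0] if row else None


# sqlite3 is only a stand-in; another DB-API module could be passed instead.
conn, mark = open_store(sqlite3, ":memory:")
add_sequence(conn, mark, "X00001", "ATGGCC")
print(get_sequence(conn, mark, "X00001"))
```

Only open_store touches the driver module, so switching databases would
mostly mean passing a different module and connection arguments, plus
fixing any SQL that a given database does not support.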

Gary Van Domselaar wrote:
> Loci as a Graphical Shell/ Graphical
>Scripting language with a database 'locus', but the actual database, and
>data model used to store the sequence data (and annotations) would be an
>'option' depending on what the developers have provided for loci.  so
>there may be a relational database, an object database etc.,

	Based on this thinking and the 'plug-in' idea mentioned so often,
why not implement a database as a locus? (Maybe this could be some kind of
derivative of the container?) This way, the user can use or not use a
database depending on their work with Loci.
	For instance, if I am using Loci to pull single files from the PDB
and pipe them into RasMol for viewing, it would be really stupid to have an
intermediate step where I stick a single object into a database. By
contrast, if I am parsing the current UniGene text file (100MB), it is
crazy to not have some structured way to store this. I mean, I would be
none too happy a user if I watched my computer spend an hour parsing a huge
document and another several BLASTing the results, and then had the computer
crash, losing all of my data. A 'plug-in' database locus could serve as the
storage for huge data files, allowing data backup and easy access to the
important parts of the data (sequences in this case). Is this the kind of
plan everyone was thinking of?
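
The crash-protection point could be sketched roughly as follows, assuming a
database locus that commits parsed records in batches. Everything here is
hypothetical: the function names, the `parsed` table, and the FASTA-like
input, which only stands in for a large file such as UniGene (whose real
format is different). sqlite3 is again just a convenient stand-in database.

```python
import sqlite3


def parse_records(lines):
    """Yield (identifier, sequence) pairs from FASTA-like text lines."""
    ident, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(chunks)
            ident, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if ident is not None:
        yield ident, "".join(chunks)


def load_into_store(conn, lines, batch_size=1000):
    """Store parsed records, committing every batch_size rows so that a
    crash loses at most the current uncommitted batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS parsed (ident TEXT PRIMARY KEY, residues TEXT)"
    )
    count = 0
    for ident, residues in parse_records(lines):
        conn.execute("INSERT OR REPLACE INTO parsed VALUES (?, ?)", (ident, residues))
        count += 1
        if count % batch_size == 0:
            conn.commit()
    conn.commit()  # flush the final partial batch
    return count


sample = [">seq1 demo", "ATGC", "GGTA", ">seq2", "TTAA"]
# An in-memory database keeps the demo self-contained; a file path
# would be needed for the data to actually survive a crash.
demo_conn = sqlite3.connect(":memory:")
print(load_into_store(demo_conn, sample, batch_size=1))
```

A later step (BLASTing, say) could then read sequences back out of the
`parsed` table instead of re-parsing the original file.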

Gary Van Domselaar wrote:
>To put a sequence into a relational database, wouldn't you have to
>design a data model to place the sequence information into? at least
>tables like: Sequence, Accession, Features, Bibliography
>
>Then you would have a sequence data format based on a relational data
>model, no?  I'm no database expert either, so I'm asking out of sheer
>naivety here :-)

I agree completely. A database is still a data intermediate, but I think it
at least has the advantages of 1) being readily storable, 2) allowing
specific parts of the data to be individually queried, and 3) being
flexible enough to allow a wide variety of data types to be stored without
data loss.
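
As a rough illustration of the "query specific parts" advantage, here is a
sketch of a relational model built from the four tables Gary lists. The
column names, the demo row values, and the helper functions are all
hypothetical; a real model would need far more thought. sqlite3 is used
only so the example is self-contained.

```python
import sqlite3

# One table per item in Gary's list, linked by a shared seq_id.
SCHEMA = """
CREATE TABLE sequence (seq_id INTEGER PRIMARY KEY, residues TEXT);
CREATE TABLE accession (seq_id INTEGER, accession TEXT);
CREATE TABLE feature (seq_id INTEGER, kind TEXT, start_pos INTEGER, end_pos INTEGER);
CREATE TABLE bibliography (seq_id INTEGER, citation TEXT);
"""


def build_demo_db():
    """Create an in-memory database holding one made-up entry."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO sequence VALUES (1, 'ATGGCCTAA')")
    conn.execute("INSERT INTO accession VALUES (1, 'DEMO0001')")
    conn.execute("INSERT INTO feature VALUES (1, 'CDS', 1, 9)")
    conn.execute("INSERT INTO bibliography VALUES (1, 'Placeholder citation')")
    conn.commit()
    return conn


def features_for(conn, accession):
    """Fetch only the features of one entry, leaving the rest untouched."""
    rows = conn.execute(
        "SELECT f.kind, f.start_pos, f.end_pos "
        "FROM feature f, accession a "
        "WHERE f.seq_id = a.seq_id AND a.accession = ?",
        (accession,),
    )
    return rows.fetchall()


demo_db = build_demo_db()
print(features_for(demo_db, "DEMO0001"))
```

The join is written in the old comma style on purpose, in keeping with the
goal of sticking to simple SQL that many databases accept.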

Gary Van Domselaar wrote:
>Loci's own database requirements may not be so
>much for sequence storage as much as it is for things like the container
>locus, which is a queriable locus that contains other loci.

One thing I'm not clear about--how does all this relate to the idea of the
container and storing loci? I guess I'm not clear about exactly what it
means to store a converter locus, for instance. Even more confusing to me,
how do you query a container locus? How does this relate to storing
actual data? Confusion, confusion over here!

2) Other Languages.

Gary Van Domselaar wrote:
>That would mean two interpreters running together. I'm not sure if Jeff
>would be smiling at that idea.  Python can be extended with C libraries,
>and C can 'embed' Python.  Python can do just about anything Perl can
>do, and it can access precompiled binaries, so, is there a need for
>Perl?

I will agree that I would rather not mess around with two languages/two
interpreters and all of that jazz. No fun! However, there are a number of
things that are implemented in Perl currently that are not available in
Python. For instance, the bioperl modules. Although the biopython project
is dealing with building the same functionality in Python, they currently
have no code (and the list has been relatively silent!). In addition, a
number of excellent programmers are coding in Perl, and so there are a lot
of good scripts/code available. Should all Perl scripts either a) be run
through CORBA to be used with Loci, or b) be reimplemented in Python to be
used with Loci? For example, how are we planning on connecting to AceDB
servers--rewriting AcePerl or running it through CORBA? I think our
disadvantage if we try to rewrite everything is that we can't take full
advantage of work done in other languages and have to work really hard to
keep the Python implementation "up to date" with the Perl. In addition,
there is an unhealthy competition between Perl and Python for programmer
time, regardless of which is a "better" language.

J.W. Bizzaro wrote:
>Since the Loci _core_ is a rather thin graphical shell with some networking
>capabilities, it probably isn't a sin to want the core in one language, even
>if it is Python (not a bad choice, I think).  But for the extended Loci
>_system_, it is very important to have connectivity with other languages:
>particularly Perl and Java (favorites of bioinformaticists - for now).  Most
>of the connectivity should be handled by CORBA.  I really don't know of any
>other way to mix Python and Perl...unless we work through a C core, but why?
>What did you have in mind?

I mention above the kind of things I had in mind. Specifically, where is
the point where it takes more effort to connect things with CORBA than it
does to rewrite them in Python? What should be rewritten and what should be
connected? From what I've read, GNOME development (which you all seem to be
wisely following closely!) seems to do a good job of making CORBA tie a lot
together, but I'm not completely positive how this all can translate to
Loci. Once again, confusion is overwhelming me!

Okay, I think I've confused myself enough for this e-mail! Again, thanks
for your responses and for making things clearer for at least a little
while. Sorry to have muddied everything up again!

Brad