[Pipet Devel] New guy speaks up

Brad Chapman chapmanb at arches.uga.edu
Tue Nov 30 05:29:51 EST 1999

Hello brave Locians!
I know that no one has ever seen me here before, but Jeff has been kind
enough to provide me with a CVS account and I've been doing a bit o' learning
from the code and docs there. In addition, I've been sorting through the
old archives, reading up on all of the discussion I've missed, and
trying to become more familiar with CORBA/GTK/Bonobo/Python. I meant to
hang back silently a while longer and try to learn more, but I've been
getting too excited about Loci, so I just felt the uncontrollable need
to speak up with what I've been thinking about. I apologize if I am still
too out of touch!
	First let me give you a little background on myself so you can have
an idea where I'm coming from. I'm a grad student just starting off in
botany at the U of Georgia, and although my background is in molecular
biology/botany stuff, I've been directing my graduate thesis work towards
including computational work, both out of necessity and interest (well,
mostly interest!), and have been trying to springboard off of my limited
background in programming (C++) into the wide world of open source
programming. Whew, so that's me in one sentence! My interest in Loci comes
from the fact that my work will include integrating a number of previously
existing programs and hopefully some distributed computing (so I don't have
to run all of my sequence alignments on my pieced-together home computer
and wait 12 days for them to finish!). In addition, I love the ideas behind
Loci and want to see it succeed.
	So, because of all my interest I've been trying to get up to speed
and so I have a few questions/discussion items from the depths of the
mailing list archives:

1) First off, I really like the new core. I can connect dots together. Very
nice--excellent work, Jeff! I do get some seriously weird behavior, though.
Weird behavior 1: if I (a) add a document, (b) add a converter, and (c) link
them together (a 0 from the document to a 0 from the converter), I get some
crazy stuff: a green 0/0 coming from the document that isn't connected to
anything, and a green 0/0 from the converter that moves together with a red
0 from the document. Weird behavior 2: if I am really stupid and place dots
from the same locus onto one point (i.e., put a bunch of points from a
processor onto one connection), I lose the connection and get a couple of
huge green circles from the processor. Trippy! These aren't complaints or
anything (it's brand new--who could complain!?!), just what I've noticed!

2) Converters. This is based on the Oct 15 discussions about how to convert
data between formats. My random thought was: why have a specific internal
format (i.e., XML)? Instead, how about when a document comes in (in some
particular format), it is parsed and pushed into a relational database (see
point 3 for more on the storage component)? This eliminates the need for an
internal data format (because, man, we do not need any more formats to put
sequence data in!) and also allows direct querying of specific parts of a
format (e.g., you can search only sequence data, or only bibliographic
data, or only the GenBank ID, or whatever). In this way, to read any
particular file type into Loci, you would need to write a plug-in that
inserts its data into the database. To me, this seems analogous to the
DBI/DBD mechanisms that Perl uses for database connectivity.
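To make the plug-in idea a little more concrete, here is a minimal Python sketch. It assumes a hypothetical plug-in registry (`PARSERS`, `register_parser`), a hypothetical `sequences` table, and FASTA as the example input format; none of this is existing Loci code, and Python's built-in sqlite3 module just stands in for MySQL or PostgreSQL.

```python
import sqlite3

# Hypothetical registry mapping a file format name to its parser plug-in.
PARSERS = {}


def register_parser(fmt):
    """Decorator that registers a parser plug-in for one file format."""
    def wrap(func):
        PARSERS[fmt] = func
        return func
    return wrap


@register_parser("fasta")
def parse_fasta(text):
    """Yield (identifier, description, sequence) records from FASTA text."""
    ident, desc, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if ident is not None:
                yield ident, desc, "".join(chunks)
            header = line[1:].split(None, 1) or [""]
            ident = header[0]
            desc = header[1] if len(header) > 1 else ""
            chunks = []
        else:
            chunks.append(line)
    if ident is not None:
        yield ident, desc, "".join(chunks)


def load_document(conn, fmt, text):
    """Parse `text` with the plug-in for `fmt` and store its records.

    Returns the number of records inserted. Raises KeyError if no
    plug-in is registered for `fmt`.
    """
    parser = PARSERS[fmt]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(ident TEXT PRIMARY KEY, description TEXT, residues TEXT)"
    )
    rows = list(parser(text))
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    count = load_document(conn, "fasta", ">seq1 test gene\nATGC\nGGTA\n>seq2\nMKV\n")
    # Query only the sequence field, whatever format the file came from.
    row = conn.execute("SELECT residues FROM sequences WHERE ident = ?", ("seq1",)).fetchone()
    print(count, row)  # prints: 2 ('ATGCGGTA',)
```

Supporting GenBank or any other format would then mean registering one more parser function; the storage and query side would not change.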

3) Data storage. This goes back to discussions from Sept 22 (and a few
other places). I've been thinking about this from a very molecular biology
viewpoint (i.e., DNA/protein data) and, as I mentioned in point 2, I think
that a good way to store the data would be in a relational database. This
would 1) eliminate the need to write our own database, 2) allow us to take
advantage of functionality already implemented in existing databases
(querying/sorting/working with the data through SQL), and 3) keep the
database used "independent" of the rest of Loci. SQL is "fairly standard"
across databases (MySQL, PostgreSQL, Oracle, Sybase), so this would allow
users of Loci to run it on their personal favorite database. Here again I
am thinking about how the DBI/DBD database connectivity works for Perl.
	I think for an LGPLed project, there are two choices of databases:
1) MySQL (http://www.mysql.org) GPLs old versions (as Jeff mentioned
earlier), while the new version costs money for Microsoft users. 2)
PostgreSQL (http://www.postgresql.org) has a "do whatever you want with it"
copyright. Both are very good from what I hear and have decent
documentation to make learning easier. I currently mess around with MySQL
(I converted ArrayDB, an NIH microarray storage/query program, to run with
MySQL instead of Sybase), but I am by no means an expert. From what I hear,
the basic difference between MySQL and PostgreSQL is that MySQL is lighter
and faster, while PostgreSQL supports more functions and has better data
integrity. But like I said, I'm no expert!
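On the Python side, the rough counterpart to Perl's DBI is the DB-API 2.0 specification (PEP 249): code written against its connection/cursor calls can, in principle, move between databases by swapping the driver module. The sketch below assumes a hypothetical `sequences` table with an `organism` column and uses sqlite3 as the driver. One caveat: drivers differ in placeholder style (each module's `paramstyle` attribute), so the `?` markers may need to change for another database.

```python
import sqlite3


def count_by_organism(conn, min_length):
    """Count stored sequences per organism using only DB-API 2.0 calls.

    `conn` may be any DB-API 2.0 connection. Only the "?" placeholder
    (sqlite3's "qmark" paramstyle) is driver-specific here.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT organism, COUNT(*) FROM sequences "
        "WHERE LENGTH(residues) >= ? GROUP BY organism ORDER BY organism",
        (min_length,),
    )
    return cur.fetchall()


def make_demo_db():
    """Build a small in-memory database with a hypothetical table layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sequences (ident TEXT, organism TEXT, residues TEXT)")
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?)",
        [
            ("a1", "Arabidopsis thaliana", "ATGCGT"),
            ("a2", "Arabidopsis thaliana", "ATG"),
            ("z1", "Zea mays", "GGCCTTAA"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    # Only sequences of length >= 4 are counted.
    print(count_by_organism(conn, 4))  # [('Arabidopsis thaliana', 1), ('Zea mays', 1)]
```

The query function never mentions sqlite3 itself, which is the same kind of separation DBI/DBD gives Perl code.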

4) Other languages besides Python. This isn't from the discussion, but from
my own personal interest: how easy is it to integrate other languages with
Python and the planned Loci implementation? Does everything have to go
through CORBA before it can interoperate, or is there a clean way to, say,
use Python and Perl together?

5) Me. (Sorry, I'm definitely not egocentric!) I would really like to see
Loci succeed and would like to get involved in some coding. I will
admit I'm not ready to code in Python yet since I know next to nothing (I
am working through the documentation, though!), but I'm really
interested/excited about the project, so I would like to jump in
(and hopefully swim!) and help out. With that in mind, I would like to
put the question forward to everyone: what could/should I work on to
start? Like I've mentioned, I'm definitely no expert on anything, but I
can try to help as much as possible and do my best. So what do you think?
What should I hack?

6) Congrats to Jeff on his expert networking. I just wanted to say how
incredibly good Jeff is at getting the word out about Loci. I originally
heard about it and the TOL through his post on Biopython. Now he is
going for the big time--Loci and GNOME! Good luck and excellent work!

Well that is all my rambling for now. Hopefully I haven't been too far
behind everyone's thinking and if I have, please flame me gently...

