Brad Chapman wrote:
>
> 2) Converters. This is based on Oct 15 discussions about how to convert
> data between formats. My random thought was--why have a specific internal
> format (i.e. XML)? Instead, how about when a document comes in (in some
> particular format) it is parsed and pushed into a relational database (see
> point 3 for more on the storage component). This eliminates a need for an
> internal data format (because, man, we do not need any more formats to put
> sequence data in!) and also allows direct querying of specific parts of a
> format (i.e. you can search only sequence data, or search only bib data, or
> search only genbank id, or whatever). In this way, to read any particular
> file type into Loci, you would need to write a plug-in that would insert
> data into the database. To me, this seems analogous to the DBD/DBI
> mechanisms that perl uses for database connectivity.

If Loci worked with _one_ particular database to do this, it would be
akin to having an internal format. We should have plug-in databases,
which are what the 'Containers' are meant to be.

I agree wholeheartedly with the idea of using databases as alternatives
to data formats. But, as with an internal format, if we choose _one_
database, some programmers won't agree that 'our preference' is a good
one. With every technology, you have both ardent supporters and staunch
detractors; we just can't please everyone with only one option.

> 3) Data storage. This goes back to discussions from Sept 22 (and a few
> other places). I've been thinking about this from a very molecular biology
> viewpoint (i.e. DNA/protein data) and as I mentioned in point 2, I think
> that a good way to store the data would be in a relational database.
> This 1) eliminates the need to write a database 2) allows you to take
> advantage of functionality already implemented in existing databases
> (querying/sorting/dealing with the database (SQL)) 3) would keep the
> database used "independent" of the rest of loci. SQL is "fairly standard"
> across databases (MySQL, PostgreSQL, Oracle, Sybase) so this would allow
> users of loci to run it on their personal favorite database. Here again I
> am thinking about how the DBI/DBD database connectivity works for perl.

I agree.

> I think for a LGPLed project, there are two choices of databases:
> 1) MySQL (http://www.mysql.org) GPLs old versions (as Jeff mentioned
> earlier). The new version costs for Microsoft users. 2) PostgreSQL
> (http://www.postgresql.org) has a "do whatever you want with it" copyright.
> Both are very good from what I hear and have decent documentation to make
> learning easier. I currently mess around with MySQL (I converted ArrayDB,
> an NIH microarray storage/query program, to run with MySQL instead of
> Sybase), but am by no means an expert, but from what I hear the basic
> difference between MySQL and PostgreSQL is that MySQL is lighter and
> faster, while PostgreSQL supports more functions and has better data
> integrity. But like I said, I'm no expert!

Loci is LGPL'd, so we can run proprietary databases too. There is some
confusion in the license about what 'linking' means, and whether Loci
would be considered the library or the program. I think we can interpret
it pretty liberally, provided the basic points of the (L)GPL are
preserved:

* Users can run Loci at no cost
* Users can modify Loci's code
* Users can redistribute Loci under the same license
* Users can redistribute modified versions of Loci under the same
  license (an important distinction of the GPL)
* The source code must remain accessible at no cost

> 4) Other languages besides python.
> This isn't from the discussion, but from my own personal interest--how
> easy is it to integrate other languages with python and the planned Loci
> implementation? Does everything have to run through CORBA before it can
> interoperate or is there a clean way to, say, use python and perl
> together?

Since the Loci _core_ is a rather thin graphical shell with some
networking capabilities, it probably isn't a sin to want the core in one
language, even if it is Python (not a bad choice, I think). But for the
extended Loci _system_, it is very important to have connectivity with
other languages: particularly Perl and Java (favorites of
bioinformaticists - for now). Most of the connectivity should be handled
by CORBA. I really don't know of any other way to mix Python and
Perl...unless we work through a C core, but why? What did you have in
mind?

> 5) Me. (Sorry, I'm definitely not egocentric!) I would really like to see
> Loci succeed and would like to try to get involved in some coding. I will
> admit I'm not ready to code in Python yet since I know next to nothing (I
> am working through the documentation, though!) but I'm really
> interested/excited about the project and so I would like to try to jump in
> (and hopefully swim!) and try to do some helping. In regards to this, I
> would therefore like to put the question forward to everyone about what I
> could/should work on to start. Like I've mentioned, I'm definitely no
> expert on anything, but I can try to help as much as possible and do my
> best. So what do you think? What should I hack?

Hey, we've got tons of stuff to do. The only problem is I've been
neglecting the compilation of a TODO or task list. I'm going to get
right on it now, and I'll let people pick what they think is most
interesting to them, and appropriate for their skills.

> 6) Congrats to Jeff on his expert networking. I just wanted to say how
> incredibly good Jeff is at getting the word out about Loci.
> I originally heard about it and the TOL through his post on biopython.
> However, now he is going for the big time--Loci and gnome! Good luck and
> excellent work!

*blush* I gather everyone around for the big party but neglect to bring
the food, drinks and music!

> Well that is all my rambling for now. Hopefully I haven't been too far
> behind everyone's thinking and if I have, please flame me gently...

No open flames allowed in the Laboratory!

Cheers.
Jeff
--
+----------------------------------+
|           J.W. Bizzaro           |
|                                  |
| http://bioinformatics.org/~jeff/ |
|                                  |
|           THE OPEN LAB           |
|    Open Source Bioinformatics    |
|                                  |
|    http://bioinformatics.org/    |
+----------------------------------+
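
P.S. For anyone who wants something concrete to poke at, here is a
minimal sketch of the converter-as-plug-in idea from points 2 and 3: each
file format gets a small parser plug-in, and a loader pushes whatever the
parser yields into a relational table that can then be queried with SQL.
Nothing here is Loci code. The names PARSERS, register, parse_fasta and
load, and the one-table schema, are all hypothetical, and Python's
standard sqlite3 module is only a stand-in for whichever database a
'Container' would wrap.

```python
import sqlite3

# Hypothetical registry: format name -> parser plug-in (not a Loci API).
PARSERS = {}


def register(fmt):
    """Decorator that files a parser function under a format name."""
    def wrap(func):
        PARSERS[fmt] = func
        return func
    return wrap


@register("fasta")
def parse_fasta(text):
    """Yield (identifier, residues) pairs from FASTA-formatted text."""
    ident, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if ident is not None:
                yield ident, "".join(chunks)
            fields = line[1:].split()
            ident, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if ident is not None:
        yield ident, "".join(chunks)


def load(conn, fmt, text):
    """Parse text with the plug-in for fmt and insert the rows.

    The schema is an assumption made for this sketch. The "?" placeholder
    is sqlite3's parameter style; other DB-API drivers may use another.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequence "
        "(id TEXT PRIMARY KEY, residues TEXT)"
    )
    rows = list(PARSERS[fmt](text))
    conn.executemany(
        "INSERT INTO sequence (id, residues) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


# Usage: load two records, then search only the sequence data.
conn = sqlite3.connect(":memory:")
n = load(conn, "fasta", ">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n")
found = conn.execute(
    "SELECT id FROM sequence WHERE residues LIKE ?", ("%TTGA%",)
).fetchall()
```

Supporting another file type would only mean registering another parser,
and moving to another database would in principle only mean handing load()
a connection from a different driver, which is roughly the DBI/DBD split
Brad describes.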