Hello all!

Thanks for the responses. Exciting to hear I'm not *completely* out in
left field on all of this. My thoughts:

1) Databases/Converters.

J.W. Bizzaro wrote:
>If Loci worked with _one_ particular database to do this, it would be akin to
>having an internal format. We should have plug-in databases, which are what
>the 'Containers' are meant to be. I agree wholeheartedly with the idea of
>using databases as alternatives to data formats, but you have to consider
>(it's the same problem with an internal format) that some programmers won't
>agree on how good 'our preference' is, if we choose _one_ database. With
>every technology, you have both ardent supporters and staunch detractors; we
>just can't please everyone with only one option.

I agree completely. This is what I had in mind when I mentioned how the
Perl DBI/DBD stuff works. I took some time to look at the Python drivers,
and it looks like the same kind of thing is possible with Python, as Gary
pointed out:

Gary Van Domselaar wrote:
>Python has crazy-wicked bindings to MySQL (and mSQL)...

When I migrated a Sybase/Oracle DB under Perl to a MySQL DB under Perl,
the only real changes were swapping the driver and fixing SQL that MySQL
does not support (it is a leaner DB and lacks things like VIEWs and
UNION). So I think it would be quite possible to write some simple,
widely supported SQL and have a system which could plug into multiple
databases. I could test PostgreSQL, MySQL, and mSQL on my system, and the
Linux users out there could test Oracle/Sybase, to help ensure the system
really is database independent. Since the database would be independent,
it wouldn't have to be distributed with Loci, and so we wouldn't even
have to worry about LGPL stuff. Yay, no lawyers coming round!
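To make the "swap the driver, keep the SQL" idea a bit more concrete,
here is a minimal sketch in Python. It assumes the driver follows the
Python DB-API 2.0 convention (a module with connect() and a paramstyle
attribute). The names SequenceStore and placeholder, and the one-table
layout, are made up for illustration, and sqlite3 is only a stand-in
driver so the example runs anywhere; it is not one of the databases
discussed above.

```python
import sqlite3  # stand-in driver; any DB-API 2.0 module could be passed instead


def placeholder(driver):
    """Return the positional parameter marker this driver expects."""
    # DB-API 2.0 modules advertise their marker style in `paramstyle`.
    if driver.paramstyle == "qmark":
        return "?"
    if driver.paramstyle == "format":
        return "%s"
    raise ValueError("unsupported paramstyle: " + driver.paramstyle)


class SequenceStore:
    """Hypothetical storage plug-in that sticks to simple, common SQL."""

    def __init__(self, driver, *connect_args, **connect_kwargs):
        self.mark = placeholder(driver)
        self.conn = driver.connect(*connect_args, **connect_kwargs)
        cur = self.conn.cursor()
        # Plain CREATE TABLE only: no views, no unions.
        cur.execute(
            "CREATE TABLE sequence ("
            "accession VARCHAR(20) PRIMARY KEY, residues TEXT)"
        )
        self.conn.commit()

    def add(self, accession, residues):
        # Build "(?, ?)" or "(%s, %s)" depending on the driver.
        marks = ", ".join([self.mark] * 2)
        cur = self.conn.cursor()
        cur.execute(
            "INSERT INTO sequence (accession, residues) VALUES (" + marks + ")",
            (accession, residues),
        )
        self.conn.commit()

    def get(self, accession):
        cur = self.conn.cursor()
        cur.execute(
            "SELECT residues FROM sequence WHERE accession = " + self.mark,
            (accession,),
        )
        row = cur.fetchone()
        return row[0] if row else None


# Usage: only the driver module and its connect arguments change per database.
store = SequenceStore(sqlite3, ":memory:")
store.add("X00001", "ATGGCC")
print(store.get("X00001"))
```

The only database-specific pieces are the driver module and its connect
arguments; everything else is SQL that most engines should accept,
though column types would still need checking against each one.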

Gary Van Domselaar wrote:
> Loci as a Graphical Shell/ Graphical
>Scripting language with a database 'locus', but the actual database, and
>data model used to store the sequence data (and annotations) would be an
>'option' depending on what the developers have provided for loci. so
>there may be a relational database, an object database etc.,

Based on this thinking and the 'plug-in' idea mentioned so often, why not
implement a database as a locus? (Maybe this could be some kind of
derivative of the container?) This way, the user can use or not use a
database depending on their work with Loci. For instance, if I am using
Loci to pull single files from the PDB and pipe them into RasMol for
viewing, it would be really stupid to have an intermediate step where I
stick a single object into a database. By contrast, if I am parsing the
current UniGene text file (100MB), it is crazy not to have some
structured way to store the results. I would be none too happy a user if
I watched my computer spend an hour parsing a huge document and several
more BLASTing the results, and then had the computer crash and lose all
of my data. A 'plug-in' database locus could serve as the storage for
huge data files, allowing data backup and easy access to the important
parts of the data (sequences in this case). Is this the kind of plan
everyone was thinking of?

Gary Van Domselaar wrote:
>To put a sequence into a relational database, wouldn't you have to
>design a data model to place the sequence information into? at least
>tables like: Sequence, Accession, Features, Bibliography
>
>Then you would have a sequence data format based on a relational data
>model, no? I'm no database expert either, so I'm asking out of sheer
>naivety here :-)

I agree completely. A database is still a data intermediate, but I think
it at least has the advantages of:

1) being readily storable
2) allowing specific parts of the data to be individually queried.
3) being flexible enough to allow a wide variety of data types to be
   stored without data loss.

Gary Van Domselaar wrote:
>Loci's own database requirements may not be so
>much for sequence storage as much as it is for things like the container
>locus, which is a queriable locus that contains other loci.

One thing I'm not clear about: how does all this relate to the idea of
the container and storing loci? What exactly does it mean to store a
converter locus, for instance? Even more confusing to me, how do you
query a container locus? And how does this relate to storing actual
data? Confusion, confusion over here!

2) Other Languages.

Gary Van Domselaar wrote:
>That would mean two interpreters running together. I'm not sure if Jeff
>would be smiling at that idea. Python can be extended with C libraries,
>and C can 'embed' Python. Python can do just about anything Perl can
>do, and it can access precompiled binaries, so, is there a need for
>Perl?

I agree that I would rather not mess around with two languages/two
interpreters and all of that jazz. No fun! However, there are a number of
things currently implemented in Perl that are not available in Python,
for instance the bioperl modules. Although the biopython project is
working on building the same functionality in Python, they currently have
no code (and the list has been relatively silent!). In addition, a number
of excellent programmers are coding in Perl, so there are a lot of good
scripts and code available. Should all Perl scripts either:

a) be run through CORBA to be used with Loci? or
b) have to be reimplemented in Python to be used with Loci?

For example, how are we planning on connecting to AceDB servers:
rewriting AcePerl, or running it through CORBA? I think the disadvantage
of trying to rewrite everything is that we can't take full advantage of
work done in other languages, and we would have to work really hard to
keep the Python implementation "up to date" with the Perl one.
In addition, there is an unhealthy competition between Perl and Python
for programmer time, regardless of which is the "better" language.

J.W. Bizzaro wrote:
>Since the Loci _core_ is a rather thin graphical shell with some networking
>capabilities, it probably isn't a sin to want the core in one language, even
>if it is Python (not a bad choice, I think). But for the extended Loci
>_system_, it is very important to have connectivity with other languages:
>particularly Perl and Java (favorites of bioinformaticists - for now). Most
>of the connectivity should be handled by CORBA. I really don't know of any
>other way to mix Python and Perl...unless we work through a C core, but why?
>What did you have in mind?

I mention above the kind of things I had in mind. Specifically, at what
point does it take more effort to connect things with CORBA than it does
to rewrite them in Python? What should be rewritten and what should be
connected? From what I've read, GNOME development (which you all seem to
be wisely following closely!) does a good job of using CORBA to tie a lot
together, but I'm not sure how all of this translates to Loci. Once
again, confusion is overwhelming me!

Okay, I think I've confused myself enough for this e-mail! Again, thanks
for your responses and for making things clearer for at least a little
while. Sorry to have muddied everything up again!

Brad