Gary Van Domselaar wrote:
>
> I don't think this aspect of the Loci core has been very thoroughly
> addressed. Does anyone have any ideas on how we might implement data
> storage for Loci?

In early conversations, we realized the need to split the data into two
basic types according to how they would be managed:

(1) Data kept (as XML?) on the filesystem: mostly for storage; these
    data are not being passed via the Loci system
(2) Data kept as (CORBA) objects: data that are being passed via Loci

Alan then proposed the concept of pluggable/modular 'data managers'. A
DM would manage data of a specific type and is pretty much synonymous
with what I have been calling a 'translator', which converts data from
one format to another, plus the underlying infrastructure (what will
actually be handled by CORBA).

Here is an excerpt from Alan's e-mail of June 1 (this is in the archive;
GST refers to General Sequence Toolkit):

---------------
[snip]

I am loosely defining the "Data Manager" or DM as the interface and
backend or server-side code for managing bioinformatics data of various
types. Some of the design goals for the DM:

 1) Allow for extensibility (i.e., DM plug-ins)
 2) Simple, general, but sufficient interface
 3) Minimize transfer of unneeded data
 4) Allow for relocation of data
 5) Allow for read-only access
 6) Enable wrappers for common non-xml, non-gst data sets (i.e., genbank)
 7) Allow multi-user access
 8) The interface should not assume anything regarding the data except
    that it is in an XML format.
 9) Enable network transparency
10) Simple and robust xreferencing
11) ???

In the most basic sense, GST would not have a DM but rather a DM
interface to DM plug-ins. Some examples:

1) Genbank/Entrez DM would consist of a local plug-in that provides the
   program with read-only access to NCBI's databases over the internet.
   The non-xml genbank entries would automatically be wrapped/converted
   into xml by the DM.
2) Intranet Oracle or AceDB database DM would consist of a local plug-in
   (and possibly a server for the plug-in as well). The plug-in would
   handle the network transparency as well as wrapping/converting the
   data to xml.

3) A file system DM would consist of a plug-in for GST as well as a
   server for the plug-in. Transfer of data from the file system to GST
   would be handled by a socket between the plug-in and the server. When
   a user starts up GST, if a file system DM for his/her personal GST
   directory is not running, one is automatically started. If the user
   is on another computer, they can still access their personal GST
   directory as long as the file system DM is running on the same
   computer as the personal GST directory tree.

[snip]
---------------

There is very little difference between what Alan describes here and
what we are planning for Loci (except that we are using CORBA and
leaning away from making our own bio-XML). In fact, much of the most
recent design for Loci comes from Alan's description of the GST.

> I'm no database expert, so I'm a little hesitant to
> suggest how we should go about it, but it does seem important to me that
> Loci should be able to store analysis results in a relational database
> (Informax's VectorNTI uses a relational database to store its data).

Hmmm. That is provided that (1) there are numerous analysis results to
be stored and 'related', and (2) the user needs to store the results
this way. Are you suggesting this as an option or as the standard way of
storing everything?

> This
> would facilitate the construction of customized, sharable databases. In
> keeping with Loci's philosophy of not adopting specific data formats, it
> seems to me that Loci should probably not adopt a single database, but
> rather have the capability to interface with any of the popular
> databases, such as Oracle, SQL, MySQL, etc. PHP has this capability, and
> it is one of the biggest reasons why it is so successful.

Right.
I think that is exactly what Alan was suggesting with the data manager
proposal. But again, I'm not sure everything has to be put in some
database.

What do you guys think?

Cheers.
Jeff
--
+----------------------------+
| J.W. Bizzaro               |
| jeff at bioinformatics.org |
|                            |
| THE OPEN LAB               |
| Open Source Bioinformatics |
|                            |
| http://bioinformatics.org/ |
+----------------------------+
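
To make the data manager idea above a little more concrete, here is a
rough Python sketch of a DM interface with two plug-ins behind it. It is
only an illustration under stated assumptions: none of the names
(DataManager, FlatTextDM, MemoryDM, PLUGINS, register) come from Loci or
GST, the "goal" numbers in the comments refer to the design goals in the
excerpt, and a dictionary stands in for any real network or database
backend.

```python
# Hypothetical sketch of a "data manager" (DM) plug-in interface.
# None of these names come from Loci or GST; they are illustrative only.
from abc import ABC, abstractmethod
from typing import Dict
import xml.etree.ElementTree as ET


class DataManager(ABC):
    """Interface that every DM plug-in implements (assumed design)."""

    read_only = False  # goal 5: a plug-in may offer read-only access

    @abstractmethod
    def fetch(self, record_id: str) -> ET.Element:
        """Return one record as XML (goal 8: callers see only XML)."""

    def store(self, record_id: str, record: ET.Element) -> None:
        """Save a record; read-only plug-ins refuse."""
        if self.read_only:
            raise PermissionError("this data manager is read-only")
        self._write(record_id, record)

    def _write(self, record_id: str, record: ET.Element) -> None:
        raise NotImplementedError


class FlatTextDM(DataManager):
    """Read-only plug-in that wraps non-XML text records in XML (goal 6)."""

    read_only = True

    def __init__(self, records: Dict[str, str]) -> None:
        self._records = records  # stands in for a remote flat-file source

    def fetch(self, record_id: str) -> ET.Element:
        root = ET.Element("record", id=record_id)
        ET.SubElement(root, "sequence").text = self._records[record_id]
        return root


class MemoryDM(DataManager):
    """Writable plug-in that keeps XML records in a dictionary."""

    def __init__(self) -> None:
        self._records: Dict[str, ET.Element] = {}

    def fetch(self, record_id: str) -> ET.Element:
        return self._records[record_id]

    def _write(self, record_id: str, record: ET.Element) -> None:
        self._records[record_id] = record


# Registry: the core looks plug-ins up by name (goal 1: extensibility).
PLUGINS: Dict[str, DataManager] = {}


def register(name: str, dm: DataManager) -> None:
    PLUGINS[name] = dm


register("flat", FlatTextDM({"X1": "ACGT"}))
register("memory", MemoryDM())

# Copy a record from the read-only source into the writable store.
record = PLUGINS["flat"].fetch("X1")
PLUGINS["memory"].store("X1", record)
```

The point of the sketch is that the core only ever talks to the
DataManager interface and the registry; whether a plug-in sits on top of
a relational database, a flat file or a remote service is hidden behind
fetch() and store(), which is one way the "interface to any popular
database" suggestion could fit the same design.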