It's about time I got back to this :-)

Gary Van Domselaar wrote:
>
> Like Humberto mentioned recently, pbmtools automatically converts
> everything to an internal format, then reparses it out to the desired
> format. NCBI does the same with their all-encompassing ASN.1 format.
> So, in the data-independent Loci model, how would Loci's internal format
> be implemented as a plug-in?

Yes. Although an 'internal plug-in' is an oxymoron :-) I want (and I
think everyone else here wants) as many aspects of Loci as possible to
be plugged-in or modular. So instead of an 'internal format', I would
call it a 'neutral conversion format'. And like any other data format,
it is only present in Loci as long as the locuses are there to work
with it.

> Would the plug-in developer be responsible
> to create a locus that converts all incoming data to an internal format,
> like so:
>
> ______
> |Data|<----idl/api<----Request document
> |Base|
> |    |
> ______----->idl/api---->document--->conversion--------->processing-->result---storage(database,
>                                     to "loci" internal                        (file, etc).
>                                     format-then parse
>                                     to required format

If the plug-in/locus developer is dealing with a data format new to
Loci, they should provide SOME sort of converter. At a minimum, that
would be a converter that outputs data in the neutral conversion
format.

> If this is the model, it seems to me that a great deal of work would
> fall on the plug-in developers, and the Loci framework itself would be
> quite minimal (which is not a bad thing).

I think the less that Loci comes standard with, the more malleable it
will be, which is a Good Thing.

> This begs the question, how will the loci plug-in to the Loci
> architecture? What would Loci be, at this data-independent core?
Locuses/loci should be able to

(1) Communicate with the Workspace to send/receive GUI information
(2) Communicate with a directory service or 'hub' to establish the
    CORBA connections

> I'm not an expert on network-object models, or data object models, or
> databases for that matter, so these issues frighten and confuse me.

I too am frightened and confused.

> I'm
> beginning to write up the Loci white-pages, so some enlightenment on
> these issues would go a long way to help me write intelligible stuff! I
> have read a bit about AppLab (a Java-based command-line application
> wrapper that runs through CORBA). AppLab is very similar to our design,
> although it is bioinfo-centric, as is NetGenics SYNERGY. How do we
> decouple the nature of the data from the data-framework itself?

As an example of how separate the data is from the data-framework, the
data is kept as XML, and the data-framework communicates via CORBA.
These are very different models for data management and do not mix
very well. All the CORBA system needs to know is that there is some
text (XML) that needs to go somewhere. At least that's the way I see
it.

> > genbank - genbank (not needed)
> > pdb - pdb (not needed)
> > fasta - fasta (not needed)
> > bsml - bsml (not needed)
>
> Interesting. What scenario do you envision for this data 'passthru'
> scheme?

It's simple. We're just connecting output to input in every case. Data
conversion means making 2 extra connections (adding a converter). If
data conversion is not needed, which is true in the case I mentioned
above, you just leave the converter out. If everything must be
converted to an 'internal format', which is something I'm arguing
against, then data conversion can never be left out.

> A genbank doc could be connected to a genbank-readable
> processor/widget/whatever without the need of passing thru a convertor,
> therefore, no wasted conversion time or resources.

Right.
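
To make the pass-through idea a bit more concrete, here is a rough
Python sketch. Nothing in it is actual Loci code: the registry
CONVERTERS, the functions build_chain and run_chain, and the toy XML
layout produced by fasta_to_neutral are all hypothetical names I made
up for illustration. The point is only that a converter gets wired in
when the two formats differ, and is left out entirely when they match.

```python
from typing import Callable, Dict, List, Tuple
from xml.sax.saxutils import escape, quoteattr

Converter = Callable[[str], str]


def fasta_to_neutral(text: str) -> str:
    """Toy converter: wrap one FASTA record in a made-up XML envelope.

    The <record>/<sequence> layout is an assumption, not a real format.
    """
    header, *seq_lines = text.strip().splitlines()
    name = header.lstrip(">").strip()
    sequence = "".join(line.strip() for line in seq_lines)
    return f"<record id={quoteattr(name)}><sequence>{escape(sequence)}</sequence></record>"


# Hypothetical registry: (format produced, format accepted) -> converter.
CONVERTERS: Dict[Tuple[str, str], Converter] = {
    ("fasta", "neutral"): fasta_to_neutral,
}


def build_chain(produces: str, accepts: str,
                converters: Dict[Tuple[str, str], Converter]) -> List[Converter]:
    """Return the converters to place between a producer and a consumer."""
    if produces == accepts:
        # Pass-through: output connects straight to input, no converter.
        return []
    try:
        return [converters[(produces, accepts)]]
    except KeyError:
        # No known converter: refuse rather than relay data blindly.
        raise LookupError(f"no converter from {produces!r} to {accepts!r}")


def run_chain(data: str, chain: List[Converter]) -> str:
    """Push the data through each converter in order (possibly none)."""
    for step in chain:
        data = step(data)
    return data


if __name__ == "__main__":
    # genbank -> genbank: the chain is empty, the text is untouched.
    print(run_chain("LOCUS X", build_chain("genbank", "genbank", CONVERTERS)))
    # fasta -> neutral: one converter sits between the two ends.
    print(run_chain(">seq1\nACGT\n", build_chain("fasta", "neutral", CONVERTERS)))
```

With a mandatory internal format, build_chain could never return an
empty list, which is exactly the cost I'm arguing against.
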
> Similarly, a
> convertor could be constructed to realize that internal format
> conversion is not necessary and simply relay the data (ie in dynamic
> situations where the format of the incoming data, or the requirements of
> the receiving locus are not known in advance). Comments anyone?

IMO, you do not want to relay data when the format is unknown. I don't
know, maybe it should just be saved.

> What are you thinking? I recall from the bioobjects project (from the
> bioinformatics journal .pdf that got circulated a while back)

Shhh!

> that the
> biosequence data is abstracted into its basic types: raw sequence data,
> internal id, Locus or Accession number, references (including
> bibliographic, organism, etc), x-refs to other databases, and feature
> information. These data structures are then assembled into an object and
> stored in an object database for access via CORBA....

It's good to see someone reads the references I send them :-) The
BioObjects project is certainly not data format independent. We could
come up with a nice XML-based, cross-referenced data format for our
neutral conversions.

> > We have to ask ourselves this when thinking about the conversion process:
> >
> > How is Loci going to handle data from the Genome Projects, where an
> > annotated file may be gigabytes to terabytes in size???
>
> I'm stymied on this one... ;-)

Well, we have to consider this. The trend in bioinformatics has been
toward a great increase in the size of documents.

Cheers.
Jeff
--
+----------------------------+
|        J.W. Bizzaro        |
| jeff at bioinformatics.org |
|                            |
|        THE OPEN LAB        |
| Open Source Bioinformatics |
|                            |
| http://bioinformatics.org/ |
+----------------------------+