[Pipet Devel] Data Storage Interfaces

J.W. Bizzaro bizzaro at bc.edu
Wed Jun 23 01:28:38 EDT 1999

Ah, I finally get to this message.

"Alan J. Williams" wrote:
> Jeff wrote:
> > (2) GST requires Oracle DB; Loci uses PAOS and a simple DB
> I guess I didn't present this clearly enough.  My thought for GST was to
> define an interface for data retrieval and storage and then provide
> several standard data manager plug-ins that implement this interface.

That sounds like a good idea to me.

> (Initially it would probably be a local filesystem plug-in and a db
> plug-in [either mysql or acedb].)

Yeah, I'd go with a free database.  There are also MSQL and PostgreSQL (and
BerkeleyDB?).  I'm not sure about AceDB, but PostgreSQL has a BSD license; MSQL
and MySQL are not free for commercial use.  I was just thinking that Oracle
would be a huge prerequisite for a program.

> In many ways, it sounds similar to what
> is being talked about for Loci.  I guess the only difference is that the
> GST approach would be limited to local plug-in based access (However, a
> plug-in that utilized CORBA could be implemented). The advantage from what
> I can tell of using a local plug-in approach as opposed to a PAOS approach
> is that you are not tied to a specific transfer technology, you are only
> limited in that the endpoint/startpoint from the client side must be a
> plug-in that implements the DM interface.

Hmmm.  I think I get it now.  So, DM is almost an abstraction layer for
plug-ins, an "interface" or API?

> > And the GST DM is something like Loci's server/daemon, "Locid".
> Sort of (I am probably not being careful about consistently using the term
> DM). So the approach would be:
>     1) A well-defined DM interface (which operates on the client side)
>     2) DM Plug-in(s) which implement the DM Interface (These also operate
>        on the client side)
>     3) Depending on the DM Plug-in, various backend or middle end
>        "servers" may need to be implemented.
> So with Loci, it sounds like the middle end and backend "server"
> technologies are fixed, whereas with a plug-in approach only the plug-in
> interface is fixed.
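
The three pieces listed above might be sketched roughly like this.  This is
only an illustration, not GST or Loci code: the names DataManager,
FilesystemDM, MemoryDM and round_trip are all made up here, and the
in-memory plug-in merely stands in for a real db plug-in.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile


class DataManager(ABC):
    """Hypothetical DM interface: the only thing client code depends on."""

    @abstractmethod
    def store(self, key: str, data: bytes) -> None:
        """Save data under key."""

    @abstractmethod
    def retrieve(self, key: str) -> bytes:
        """Return the data previously saved under key."""


class FilesystemDM(DataManager):
    """Plug-in that keeps each item as one file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def store(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

    def retrieve(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class MemoryDM(DataManager):
    """Stand-in for a db plug-in: a dict plays the role of the database."""

    def __init__(self) -> None:
        self._rows = {}

    def store(self, key: str, data: bytes) -> None:
        self._rows[key] = data

    def retrieve(self, key: str) -> bytes:
        return self._rows[key]


def round_trip(dm: DataManager, key: str, data: bytes) -> bytes:
    """Client code: written against the interface, not against a plug-in."""
    dm.store(key, data)
    return dm.retrieve(key)


# The same client call works with either plug-in.
with tempfile.TemporaryDirectory() as tmp:
    from_files = round_trip(FilesystemDM(tmp), "seq1", b"ACGT")
from_memory = round_trip(MemoryDM(), "seq1", b"ACGT")
```

A plug-in that talks to a remote server (CORBA or otherwise) would be one
more subclass; round_trip would not change.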

Well, I'm thinking of something that any locus can talk to at any time and that
can provide a seamless interface across networks.  I'm not sure how this would
work if the daemon is really several daemons, and some might be there and some
might not be.  It seems like you would still need some central manager to keep
track of what is what, which is really what Locid would do.

> > In Loci, a single XML document "travels" a workpath, so everything is
> > done serially (within one path). The document will collect various XML's
> So do you mean that the original data, and analysis results, etc will all
> accumulate in one xml document? This doesn't sound like a good idea, but I
> am probably misunderstanding what you are saying. Why might this be a bad
> idea (I am thinking as I write, so don't flame me too hard ;o):

Flame flame flame.  I guess the document "conceptually" travels the workpath,
but not really.  I've been working with concepts mostly.  In reality, there may
be better approaches.

>    1) Complicates data locking in a multi-user model
>    2) Increases server load by forcing parsing of un-needed data (See recent
>       post from bioperl with comment on server side XML parsing). If
>       parsing isn't done on the server side, then you have the issue of
>       having to transfer all that combined data.
> It sounds better to me to just implement a robust cross referencing
> mechanism, assume each data object to be just one "item" (ie blast
> results, or a sequence, or a restriction map). Then let the backend server
> store the data as it sees fit (ie as one huge flat file, as individual
> database entries, as individual files, ...)

I agree.  This is probably what we'll do.
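
To make the cross-referencing idea concrete, here is a minimal sketch.  It
assumes each item carries an id and a list of ids it refers to; DataItem,
ItemStore, follow_refs and the "seq:1" style ids are invented for the
example, and the dict is just one of many ways a backend could store items.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """One self-contained item (a sequence, BLAST results, a map, ...)."""
    item_id: str
    kind: str
    payload: str
    refs: list = field(default_factory=list)  # ids of related items


class ItemStore:
    """Hypothetical backend; how it stores items is its own business."""

    def __init__(self) -> None:
        self._items = {}

    def put(self, item: DataItem) -> None:
        self._items[item.item_id] = item

    def get(self, item_id: str) -> DataItem:
        return self._items[item_id]

    def follow_refs(self, item_id: str) -> list:
        """Return the items that the given item points to (one hop)."""
        return [self._items[ref] for ref in self.get(item_id).refs]


store = ItemStore()
store.put(DataItem("seq:1", "sequence", "ACGTACGT"))
# The results point back at the query sequence instead of embedding it.
store.put(DataItem("blast:1", "blast_results", "hit table", refs=["seq:1"]))

query = store.follow_refs("blast:1")[0]
```

A locus that only wants the BLAST results fetches "blast:1" alone; the
sequence is parsed or transferred only if someone follows the reference.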

> > But as far as Loci is concerned, can we make it so that the XML types
> > (DTD's) are not hard-coded into Loci?  What if each locus were
> > responsible for finding its own XML parser/translator?  That would
> > pretty much make Loci a general purpose command-line wrapper.
> Definitely! To take it to a further extreme, Loci should provide the xml
> parsing and a SAX-like interface to the data.  Then each locus doesn't have
> to implement an XML parser, only the SAX callbacks to handle the XML data.
> Then Loci could handle any XML data for which a locus is provided and it
> could even handle data in the absence of a DTD. (The advantage of the SAX
> interface over passing the whole tree, is that memory isn't wasted on
> building parts of the tree that aren't needed.) I am not really familiar
> with the SAX interface, but hints could be provided so that only elements
> the locus wants are passed through the SAX callbacks.

We agree again :-)
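
As a rough sketch of the "hints" idea, using Python's standard xml.sax
module: the framework owns the parser, and a locus only supplies a callback
plus the element names it cares about.  FilteringDispatcher, the callback
signature and the sample document are assumptions made up for this example,
and it assumes the wanted elements are not nested inside each other.

```python
import xml.sax


class FilteringDispatcher(xml.sax.ContentHandler):
    """Forwards only the elements a locus asked for to its callback."""

    def __init__(self, wanted, on_element):
        super().__init__()
        self.wanted = set(wanted)     # the "hints": element names to pass on
        self.on_element = on_element  # locus callback: (name, attrs, text)
        self._name = None             # wanted element currently open, if any
        self._attrs = {}
        self._text = []

    def startElement(self, name, attrs):
        if self._name is None and name in self.wanted:
            self._name = name
            self._attrs = dict(attrs.items())
            self._text = []

    def characters(self, content):
        # Text is collected only while inside a wanted element.
        if self._name is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == self._name:
            self.on_element(name, self._attrs, "".join(self._text))
            self._name = None


DOC = b"<entry><id>X1</id><seq len='4'>ACGT</seq><note>skipped</note></entry>"

seen = []
handler = FilteringDispatcher({"seq"}, lambda n, a, t: seen.append((n, a, t)))
xml.sax.parseString(DOC, handler)
```

No DTD is involved here, and nothing is built in memory for the id and note
elements; the locus never sees them.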

> Justin's comments are basically what I was thinking.  Rather than passing
> a program all the data, just pass it a reference (maybe an XLink/XPointer)
> to the input data. Under the plug-in model, plug-ins for networked data
> could implement a cache mechanism to speed up access.
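
Passing a reference plus a caching plug-in could look something like the
following.  This is a sketch only: CachingResolver, fake_remote_fetch, the
"loci://" reference strings and gc_count are all hypothetical, and a plain
string stands in for a real XLink/XPointer.

```python
class CachingResolver:
    """Resolves references to data, fetching each reference at most once."""

    def __init__(self, fetch):
        self._fetch = fetch    # function doing the real (possibly slow) fetch
        self._cache = {}
        self.fetch_count = 0   # counts real fetches, for illustration only

    def resolve(self, ref):
        if ref not in self._cache:
            self.fetch_count += 1
            self._cache[ref] = self._fetch(ref)
        return self._cache[ref]


def fake_remote_fetch(ref):
    """Stand-in for a networked lookup; the table is made-up sample data."""
    remote = {"loci://example/seq/1": "ACGT"}
    return remote[ref]


def gc_count(resolver, ref):
    """A 'program' that is handed a reference rather than the data itself."""
    sequence = resolver.resolve(ref)
    return sequence.count("G") + sequence.count("C")


resolver = CachingResolver(fake_remote_fetch)
first = gc_count(resolver, "loci://example/seq/1")
second = gc_count(resolver, "loci://example/seq/1")  # served from the cache
```

The program never needs to know whether the data came over the network or
out of the cache; that stays inside the plug-in.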

It sounds like we could be working on the same project.  Enough of Loci's data
management model is still unimplemented and up in the air that we could come up
with something by working together on this.  So, we'll use DMs, if everyone
likes the idea.  I'm flexible, and this isn't The Bizzaro Project :-)  If we
work together, it would just be more likely that we'll have something successful.

We spoke before about licensing issues, and the problem was that if Loci uses
the GPL, it could not be extended with something that has your own license (APL
- Alan's Public License :-).  But if Loci uses the LGPL (Lesser GPL), which we
could if we don't include straight GPL parts, you could extend Loci with your
own stuff and use whatever license you want.  The LGPL is the same license that
the Gtk+ toolkit uses, and you're using that already.

Personally, I'd love to see Loci extended and find its way into all sorts of
places and uses.

J.W. Bizzaro                  mailto:bizzaro at bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/
