[Pipet Devel] Data Storage Interfaces

Wed Jun 23 18:35:29 EDT 1999

> > Definitely! To take it to a further extreme, Loci should provide the XML
> > parsing and a SAX-like interface to the data.  Then each locus doesn't have
> > to implement an XML parser, only the SAX callbacks to handle the XML data.
> > Then loci could handle any XML data for which a locus is provided and it
> > could even handle data in the absence of a DTD. (The advantage of the SAX
> > interface over passing the whole tree, is that memory isn't wasted on
> > building parts of the tree that aren't needed.) I am not really familiar
> > with the SAX interface, but hints could be provided so that only elements
> > the locus wants are passed through the SAX callbacks.
>
> It sounds like we could be working on the same project.  Enough of Loci's data
> management model is still unimplemented and up in the air, that we could come up
> with something by working together on this.  So, we'll use DMs, if everyone
> likes the idea.  I'm flexible, and this isn't The Bizarro Project :-) If we
> work together, it would just be more likely we'll have something successful.

I think we both had the same plans, anyway.

However, rather than just a SAX interface, I want DOM, too, and possibly
others. I think the DOM level 2 spec is shaping up to handle what I want.

With SAX, callback functions are registered, and the parser then calls
them for each piece of XML it finds (start tag, end tag, character data).
The callbacks do what they like with the data. Instead, we could pass a
certain filter, and only the matching nodes would be returned, but then
we're getting into DOM2. Plus, DOM2 is supposed to eventually describe a
query method, which will be even better. DOM2 also describes ranges, which
could be used to mark off chunks of the data (data from a specific
analysis, for instance). DOM2 even describes notification events.
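
To make the "SAX plus a filter" idea concrete, here is a rough sketch in
Python using the standard xml.sax module. The FilteringHandler class, the
parse_wanted helper and the sample element names are made up for
illustration; none of this is Loci code. It only handles wanted elements
that are not nested inside each other.

```python
import xml.sax


class FilteringHandler(xml.sax.ContentHandler):
    """Hypothetical helper: forwards only the elements a locus asked for."""

    def __init__(self, wanted, callback):
        super().__init__()
        self.wanted = set(wanted)    # element names the locus cares about
        self.callback = callback     # called as callback(name, attrs, text)
        self._name = None            # wanted element currently open, if any
        self._attrs = None
        self._text = []

    def startElement(self, name, attrs):
        if name in self.wanted:
            self._name = name
            self._attrs = dict(attrs.items())
            self._text = []

    def characters(self, content):
        # Collect text only while inside a wanted element.
        if self._name is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == self._name:
            self.callback(name, self._attrs, "".join(self._text))
            self._name = None


def parse_wanted(xml_text, wanted):
    """Parse xml_text and return (name, attrs, text) for wanted elements."""
    found = []
    handler = FilteringHandler(wanted, lambda n, a, t: found.append((n, a, t)))
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return found


SAMPLE = ('<analysis><seq id="s1">ACGT</seq><note>skip me</note>'
          '<seq id="s2">TTGA</seq></analysis>')
hits = parse_wanted(SAMPLE, ["seq"])
```

The locus never sees the <note> element, and no tree is built for it.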

Now, even if it's a DOM object, the whole thing does not have to be passed
over the network. Remember DOM is really just an interface description.
What's actually behind the interface is arbitrary.

Now what if we do this --

We describe a protocol to provide access to a DOM object over a network.
GNOME is doing this (or may already have done it) over CORBA. An IDL is provided
which describes the DOM interface. It doesn't matter what the object is, 
how big, or how it is stored on the other side of the CORBA interface,
and most importantly, the client does not have to transfer the whole
object across the ORB to use it. It just calls the functions (via the
ORB). We can do the same thing over TCP/IP, if necessary, too.
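
A toy version of such a protocol might look like the following. This is
only a sketch under my own assumptions: the message format (JSON with "op"
and "node" fields), the DomServer and RemoteNode names, and the three
operations are all invented here, and the "network" is just a function
call standing in for an ORB or a TCP socket. The point is that the client
holds node handles and asks for small pieces, never the whole document.

```python
import json
import xml.dom.minidom


class DomServer:
    """Hypothetical server side: owns the document, hands out node ids."""

    def __init__(self, xml_text):
        self._doc = xml.dom.minidom.parseString(xml_text)
        self._nodes = {0: self._doc.documentElement}  # id 0 is the root
        self._next_id = 1

    def _register(self, node):
        node_id = self._next_id
        self._next_id += 1
        self._nodes[node_id] = node
        return node_id

    def handle(self, request_json):
        """Answer one request; only this small message crosses the wire."""
        req = json.loads(request_json)
        node = self._nodes[req["node"]]
        if req["op"] == "tag":
            result = node.tagName
        elif req["op"] == "children":
            result = [self._register(c) for c in node.childNodes
                      if c.nodeType == c.ELEMENT_NODE]
        elif req["op"] == "text":
            result = "".join(c.data for c in node.childNodes
                             if c.nodeType == c.TEXT_NODE)
        else:
            raise ValueError("unknown op: %r" % req["op"])
        return json.dumps({"result": result})


class RemoteNode:
    """Hypothetical client side: a handle that forwards calls."""

    def __init__(self, transport, node_id=0):
        self._transport = transport  # callable: request string -> reply string
        self._id = node_id

    def _call(self, op):
        reply = self._transport(json.dumps({"op": op, "node": self._id}))
        return json.loads(reply)["result"]

    def tag(self):
        return self._call("tag")

    def children(self):
        return [RemoteNode(self._transport, i) for i in self._call("children")]

    def text(self):
        return self._call("text")


server = DomServer("<run><hit>alpha</hit><hit>beta</hit></run>")
root = RemoteNode(server.handle)   # server.handle stands in for the network
hit_texts = [child.text() for child in root.children()]
```

Swapping server.handle for something that writes to a socket would not
change RemoteNode at all.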

So, we have some kind of object storage, which has our _virtual_ DOM
interface over some kind of back-end (be it a big tree in memory,
that Lore XML database, AceDB, Oracle, or some kind of caching, 
seekable XML parser). These are the central repositories of objects.
Other things connect to them and take/put little pieces that
they want. A remote object could create its own object representation
of the data, or just rely on the object store to hold it over the 
network.
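
Roughly, the "one interface, any back-end" part could be sketched like
this. The ObjectStore interface with get/put on a path string is my own
simplification (it is much smaller than DOM), and DictStore and XmlStore
are stand-ins for the real back-ends mentioned above.

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET


class ObjectStore(ABC):
    """Hypothetical virtual interface; clients only ever see this."""

    @abstractmethod
    def get(self, path):
        """Return the value stored at path."""

    @abstractmethod
    def put(self, path, value):
        """Store value at path."""


class DictStore(ObjectStore):
    """Back-end 1: a plain in-memory mapping."""

    def __init__(self):
        self._data = {}

    def get(self, path):
        return self._data[path]

    def put(self, path, value):
        self._data[path] = value


class XmlStore(ObjectStore):
    """Back-end 2: an XML document, addressed by ElementTree paths."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def _find(self, path):
        node = self._root.find(path)
        if node is None:
            raise KeyError(path)
        return node

    def get(self, path):
        return self._find(path).text

    def put(self, path, value):
        self._find(path).text = value


def annotate(store, path, suffix):
    """A client that takes a little piece and puts a little piece back."""
    store.put(path, store.get(path) + suffix)
    return store.get(path)


mem = DictStore()
mem.put("hit/name", "alpha")
doc = XmlStore("<run><hit><name>alpha</name></hit></run>")
results = [annotate(s, "hit/name", "-checked") for s in (mem, doc)]
```

The annotate client neither knows nor cares which back-end it is talking
to.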

A really smart one of these things would obviously be integrated
with the Workshop components. Whenever some operation was "committed" 
beyond the specific tool in use, it would update this object. There 
might have been a notification signal sitting on that specific node, 
too, which would then cause some other tool to get a DOM notification 
(the details here are still sketchy, as the DOM2 spec is still an 
early draft). This would be the central hub of Loci communication.
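
Since the DOM2 event details are not settled, the following is only my
guess at the shape of it, not the DOM2 API. NotifyingStore, watch and
commit are names I made up: a tool registers interest in a node, and a
commit on that node calls it back.

```python
class NotifyingStore:
    """Hypothetical store that notifies watchers when a node is committed."""

    def __init__(self):
        self._data = {}
        self._watchers = {}   # path -> list of callbacks

    def watch(self, path, callback):
        """Leave a notification signal sitting on one specific node."""
        self._watchers.setdefault(path, []).append(callback)

    def commit(self, path, value):
        """Update the node, then tell every tool watching it."""
        old = self._data.get(path)
        self._data[path] = value
        for callback in self._watchers.get(path, ()):
            callback(path, old, value)

    def get(self, path):
        return self._data[path]


events = []
store = NotifyingStore()
store.watch("analysis/alignment",
            lambda path, old, new: events.append((path, old, new)))

store.commit("analysis/alignment", "draft")   # watched: notifies
store.commit("analysis/notes", "unrelated")   # not watched: silent
store.commit("analysis/alignment", "final")   # watched: notifies again
```

Here the "other tool" is just a lambda appending to a list, but it could
equally be a callback that forwards the event over the network.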

Now, is this what you mean by DM (data manager, right)?

This is kind of like the workflow server (or was it system) I 
described back in the early days, but after reading the DOM2 ideas, 
it's starting to take shape. Also note, the DOM2 stuff is just an 
interface. Consider the concept for now. We could just as easily 
do the same thing in a different interface we made up (but why?) 
and really, there's no reason why the virtual interface on each 
end has to be the same (as long as they speak the low-level
network protocol).

This could potentially allow for a little more distributed, roaming
path for the object. A client to a store could really be another
store itself (where a store is simply a program somewhere which 
presents a virtual interface to some data it has), taking bits of 
data from other places and presenting a new interface.
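
As a sketch of a store that is itself a client of other stores (again
with invented names and a made-up path scheme, where the first path
component picks the upstream store):

```python
class DictStore:
    """Hypothetical leaf store holding its own data."""

    def __init__(self, data):
        self._data = dict(data)

    def get(self, path):
        return self._data[path]


class CompositeStore:
    """Hypothetical store that owns no data; it fronts other stores."""

    def __init__(self, mounts):
        self._mounts = dict(mounts)   # prefix -> upstream store

    def get(self, path):
        # Route on the first component, pass the rest upstream.
        prefix, _, rest = path.partition("/")
        return self._mounts[prefix].get(rest)


sequences = DictStore({"s1": "ACGT"})
searches = DictStore({"s1/top": "hit-1"})

# One store presenting a new interface over two others...
lab = CompositeStore({"seq": sequences, "search": searches})
# ...and another store that is in turn a client of that one.
site = CompositeStore({"lab": lab})

value = site.get("lab/seq/s1")
```

A client of site has no idea how many hops its request takes.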

Justin