[Pipet Devel] DOM speed/memory use

Thu Jun 17 14:21:40 EDT 1999

> I've used boulder to do some work on genetic mapping, and found it to be a 
> little slow, but I'm not real good at OO perl, and may have been doing 
> something dumb. The recent discussion on DOM makes me think it's a problem 
> with the document-as-object model, however.

I think the problem with existing DOM implementations in Perl and Python
is that they use a full object for each node. DOM only describes an
interface, and the underlying structure can be implemented in simpler, and
far more lightweight, ways than a full object per node.
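
To make the "lighter than a full object" idea concrete, here is a rough
sketch, not a proposal for the actual implementation. It keeps the whole
tree in a few parallel lists, so a node is nothing more than an integer
index. The names FlatTree and parse are made up for illustration; only the
expat calls come from Python's standard library.

```python
from xml.parsers import expat


class FlatTree:
    """Hypothetical store: one list slot per node instead of one object."""

    def __init__(self):
        self.tag = []     # element name of each node
        self.parent = []  # index of the parent node, -1 for the root
        self.text = []    # character data directly inside each node

    def add(self, tag, parent):
        self.tag.append(tag)
        self.parent.append(parent)
        self.text.append("")
        return len(self.tag) - 1

    def children(self, node):
        # Linear scan; a real version would keep child/sibling links.
        return [i for i, p in enumerate(self.parent) if p == node]


def parse(xml_bytes):
    """Build a FlatTree from a complete XML document held in memory."""
    tree = FlatTree()
    stack = [-1]  # indices of the currently open elements
    parser = expat.ParserCreate()

    def on_start(name, attrs):
        stack.append(tree.add(name, stack[-1]))

    def on_end(name):
        stack.pop()

    def on_chars(data):
        if stack[-1] >= 0:
            tree.text[stack[-1]] += data

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_chars
    parser.Parse(xml_bytes, True)
    return tree
```

A DOM-style interface could then be layered on top by handing out small
wrappers around (tree, index) pairs only when a client asks for a node.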

If we implement a DOM level 2-ish interface (or something with similar
functionality), a client can request just certain subsections of the whole
document. We can probably generate only a skeleton of the XML document in
memory, with indices into the XML file, and then load just the parts the
client wants. We could probably do a caching implementation, too. All of
this would still be transparent to the client, however.
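
A minimal sketch of the skeleton-plus-indices idea, under some loud
assumptions: the class name SkeletonDoc and its load method are invented
here, the indexing pass reads the whole file once (a real version would
stream it), and empty-element tags are assumed not to contain '>' inside
attribute values. It records the byte offsets of each element, then reads
only the requested spans from disk and caches them.

```python
from xml.parsers import expat


class SkeletonDoc:
    """Hypothetical skeleton: element name -> byte spans in the file."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # element name -> list of (start, end) offsets
        self.cache = {}  # (start, end) -> raw bytes already loaded
        with open(path, "rb") as f:
            data = f.read()  # simplification: one full read to index
        parser = expat.ParserCreate()
        starts = []  # start offsets of the currently open elements

        def on_start(name, attrs):
            starts.append(parser.CurrentByteIndex)

        def on_end(name):
            begin = starts.pop()
            # The end of the element is the '>' that closes its end tag.
            end = data.index(b">", parser.CurrentByteIndex) + 1
            self.index.setdefault(name, []).append((begin, end))

        parser.StartElementHandler = on_start
        parser.EndElementHandler = on_end
        parser.Parse(data, True)

    def load(self, name):
        """Return the raw bytes of every element called `name`."""
        chunks = []
        with open(self.path, "rb") as f:
            for span in self.index.get(name, []):
                if span not in self.cache:
                    f.seek(span[0])
                    self.cache[span] = f.read(span[1] - span[0])
                chunks.append(self.cache[span])
        return chunks
```

The client only ever calls load(); whether the bytes came from disk or
from the cache is invisible to it, which is the transparency meant above.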

Also, the whole concept of a client wanting all of the "foo" in "bar", or
whatever, is simplified this way, too. The client doesn't have to know
anything about the "encapsulating" XML structure. It just has to know the
name of the substructure it wants (we'll probably have to use namespaces
to prevent conflicts).
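
As an illustration of "all of the foo in bar", something like the
following would do. This is only a sketch: find_in is a made-up helper,
the namespace URI in the usage note is a placeholder, and it leans on
Python's xml.etree.ElementTree, which loads the whole document and so
shows the query, not the lightweight storage.

```python
import xml.etree.ElementTree as ET


def find_in(xml_text, outer, inner):
    """Return every `inner` element found anywhere inside an `outer`.

    The caller names only the two substructures; nothing about the
    enclosing document layout is needed. Namespaced names are written
    in ElementTree's "{uri}local" form to keep them from colliding.
    """
    root = ET.fromstring(xml_text)
    return [hit for scope in root.iter(outer) for hit in scope.iter(inner)]
```

For example, with a placeholder namespace "urn:example:map", a client
would call find_in(doc, "{urn:example:map}bar", "{urn:example:map}foo")
and get back the foo elements at any depth below each bar, however many
wrapper elements sit around them.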

XQL or some other XML query language would make it even more robust.

Fundamentally, I think it might be a good idea to put a lot of power and
functionality into this "Loci Object", because it makes things simpler for
all of the other clients/tools. I think it would also reduce the amount of
code, since without it, every tool will have to duplicate the ability to
parse XML and somehow identify, extract, and internally represent the
specific data it wants.

This could be a rather vital component of the whole system, and it might
be necessary to write this part in C and provide Python bindings.
Definitely not right away, but if speed and memory use become problematic,
it can be moved to C code, with no effect on the interface.

Justin