[Pipet Devel] Tools for parsing XML with Python

Mon Sep 27 08:47:08 EDT 1999

> When I wrote to the BioXML mailing list about an XML database, Guy Hulbert
> gave this reply:
>
> > Because this fills your database with blobs, so why use a database at all?
> > You'd be better off, performance-wise, storing the XML docs in the file
> > system and just use the database to manage the file store (I worked as a
> > sysadmin for a product that did just this for scanned images).

This confuses me a bit. Are you saying that you don't need a DB, or that you
don't want to store the contents of the DB in a DB-specific manner?

(I believe that you want to be able to send queries to the DB such as "give
me all interfaces/objects/applications that can perform this operation upon
this kind of data"? Isn't that a DB?)
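
To make the question concrete, here is a minimal sketch of the arrangement Guy
describes: the documents stay as plain files, and a small database holds only
metadata plus the path. Everything in it is my assumption for illustration,
not anything from the Loci code: the table name, the columns, and the
`operation` / `data_kind` fields are hypothetical, and sqlite3 is used only
because it ships with Python.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the index. It stores file paths, never file contents."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locus_index ("
        " path TEXT NOT NULL,"
        " operation TEXT NOT NULL,"
        " data_kind TEXT NOT NULL,"
        " PRIMARY KEY (path, operation, data_kind))"
    )
    return conn


def register(conn, path, operation, data_kind):
    """Record that the file at `path` can perform `operation` on `data_kind`."""
    conn.execute(
        "INSERT OR REPLACE INTO locus_index VALUES (?, ?, ?)",
        (path, operation, data_kind),
    )
    conn.commit()


def find(conn, operation, data_kind):
    """Return the paths of all files registered for this operation and data kind."""
    rows = conn.execute(
        "SELECT path FROM locus_index"
        " WHERE operation = ? AND data_kind = ? ORDER BY path",
        (operation, data_kind),
    )
    return [row[0] for row in rows]
```

So the "give me everything that can perform this operation upon this kind of
data" query is answered by the index, while the files themselves are never
loaded into the database.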

> This causes me think about the operation of 'container loci'.  Recall that
> they can contain other loci and act as a database.  Well, I'm not a database
> expert, but I think we want a system that will manage loci as files on the
> filesystem.  This is why: Loci will not be of any particular data format (as
> I've tried to stress recently).  This will avoid any substantial 'import and
> translation' function that will require the Loci system to (1) spend time and
> space on large datasets and (2) lock Loci into a one-of-a-kind data format.
> But it will also give us a neat way of 'opening' loci: The 'container loci'
> can merely be set to read from/write to a certain directory on the
> filesystem, and the directories will serve to separate locus categories.  So,
> for example, the user can put all GenBank docs for Dictyostelium under the
> directory

Hmm.
There is a GNOME project called Gnome-DB (you didn't expect that name, did
you? :)
http://www.chez.com/rmoya/gnome-db/index.html

It has a DB engine that uses raw files. I have not examined it yet (beyond
reading their homepage), so I don't know much about it, but it might be worth
a look.

<snip>
> And will serve as 'dead storage' for loci.  But if we really want to solve
> the '2 terabyte document problem' (for genome analyses, as an example) that
> Jim Freeman brought up to me a few weeks ago, we can't duplicate everything
> that goes from dead storage to active use.  Therefore, loci (treated as
> files) will have to either remain in place or be moved to another directory,
> and NOT duplicated.
>
> I'd like to get some feedback about this from you guys.

If you pass around CORBA objects that "represent" the actual file, you will
not need to duplicate the file.
(GNOME::Stream is a very good candidate! :)

In either case, passing around a file "reference" instead of reading
directly from the file seems to be the obvious solution.
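
A rough Python illustration of the "reference" idea, leaving CORBA out
entirely. The class name `FileRef` and its methods are made up for this
sketch and are not the GNOME::Stream interface. The point is only that what
gets passed around is a small handle, and the bytes are read from the
original file in place, a chunk at a time, when somebody actually asks for
them.

```python
import os


class FileRef:
    """Lightweight handle to a file: it holds the path, never the contents."""

    def __init__(self, path):
        self.path = os.path.abspath(path)

    def size(self):
        """Size of the underlying file in bytes, without reading it."""
        return os.path.getsize(self.path)

    def read_chunks(self, chunk_size=64 * 1024):
        """Yield the file's bytes in chunks so it is never fully in memory."""
        with open(self.path, "rb") as stream:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                yield chunk


def count_bytes(ref):
    """Example consumer: works from the reference alone, copying nothing to disk."""
    return sum(len(chunk) for chunk in ref.read_chunks())
```

Any number of consumers can hold a `FileRef` to the same large document; none
of them needs its own copy of the file.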

// Liss
PS. GNOME::Storage and GNOME::Stream live inside the bonobo package, in case
you didn't find them.
There are a lot of goodies in that package that might be of interest to
Loci.