[Pipet Devel] Tools for parsing XML with Python

J.W. Bizzaro bizzaro at bc.edu
Mon Sep 27 08:08:58 EDT 1999


When I wrote to the BioXML mailing list about an XML database, Guy Hulbert gave
this reply:

> Because this fills your database with blobs, so why use a database at all ?
> You'd be better off, performance-wise, storing the XML docs in the file system
> and just use the database to manage the file store (I worked as a sysadmin for
> a product that did just this for scanned images).

This got me thinking about how 'container loci' should operate.  Recall that
they can contain other loci and act as a database.  Well, I'm not a database
expert, but I think we want a system that will manage loci as files on the
filesystem.  Here is why: Loci will not be tied to any particular data format
(as I've tried to stress recently).  That avoids a substantial 'import and
translation' step, which would (1) cost the Loci system time and space on large
datasets and (2) lock Loci into a one-of-a-kind data format.  It will also
give us a neat way of 'opening' loci: a 'container locus' can simply be set to
read from/write to a certain directory on the filesystem, and the directories
will serve to separate locus categories.  So, for example, the user can put all
GenBank docs for Dictyostelium under the directory

    ~/loci/containers/Dictyostelium/

and then set one container locus to point to that directory.
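
To make the idea concrete, here is a rough sketch in Python.  It is only an
illustration under my own assumptions, not existing Loci code: the class name
ContainerLocus and its methods loci() and categories() are made up for this
example.  Each file in the directory is treated as one locus, and each
subdirectory as a category.

```python
from pathlib import Path


class ContainerLocus:
    """Hypothetical container locus that points at one directory."""

    def __init__(self, directory):
        # expanduser() resolves a leading "~" to the user's home directory.
        self.directory = Path(directory).expanduser()

    def loci(self):
        """Return the files in the directory (one file = one locus), sorted."""
        if not self.directory.is_dir():
            return []
        return sorted(p for p in self.directory.iterdir() if p.is_file())

    def categories(self):
        """Return the subdirectories, which act as locus categories, sorted."""
        if not self.directory.is_dir():
            return []
        return sorted(p for p in self.directory.iterdir() if p.is_dir())


# Point one container locus at the example directory from this message.
container = ContainerLocus("~/loci/containers/Dictyostelium/")
for locus in container.loci():
    print(locus.name)
```

Note that nothing is imported or translated here: the container only lists
what is already on disk, in whatever format the files happen to be.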

So, the container locus will be a number of programs in one:

    (1) Standard locus
    (2) Database
    (3) File manager

It will also serve as 'dead storage' for loci.  But if we really want to solve
the '2 terabyte document problem' (for genome analyses, as an example) that Jim
Freeman brought up to me a few weeks ago, we can't duplicate everything that
goes from dead storage to active use.  Therefore, loci (treated as files) will
have to either remain in place or be moved to another directory, and NOT be
duplicated.
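
One possible way to get 'move, don't duplicate' behavior, sketched in Python.
The function name activate_locus and the idea of an 'active' directory are
assumptions for illustration only.  The sketch uses os.rename, which changes
the directory entry rather than copying the file's data, and which raises
OSError if the source and target are on different filesystems, so a large
locus is never silently copied.

```python
import os
from pathlib import Path


def activate_locus(locus_path, active_dir):
    """Move one locus file into active_dir without copying its contents.

    Hypothetical helper: returns the new path of the locus.
    """
    source = Path(locus_path)
    target_dir = Path(active_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name

    # Refuse to overwrite a locus that is already in active use.
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    # os.rename does not copy data. If the two directories are on
    # different filesystems it raises OSError instead of copying.
    os.rename(source, target)
    return target
```

By contrast, shutil.move falls back to copy-then-delete when the destination
is on another filesystem, which is exactly the duplication we want to avoid
for very large documents.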

I'd like to get some feedback about this from you guys.


Cheers.
Jeff


Gary Van Domselaar wrote:
> 
> Hey kids,
> 
> I found this while looking for a GPL XML database manager:
> 
> Tools for parsing XML with Python:
> 
> http://www.stud.ifi.uio.no/~lmariusg/download/python/xml/index.html
> 
> It may be worth a look.  Still no sign of a GPL XML database manager...
> 
> Regards,
> 
> --gary


-- 
                         +----------------------------+
                         |        J.W. Bizzaro        |
                         |  jeff at bioinformatics.org   |
                         |                            |
                         |        THE OPEN LAB        |
                         | Open Source Bioinformatics |
                         |                            |
                         | http://bioinformatics.org/ |
                         +----------------------------+



