[Pipet Devel] workflow diagram data model and databases

Sun Jan 9 19:25:37 EST 2000

> 1. When workspace.py starts up (i.e. when a new workspace is created):
> 	a. make a directory: loci-file/workxml/workspace#.
> 	The number will refer to the number of the workspace being opened.
> 	Every time a composite locus is opened, a new directory should be
> 	created.
> 	b. copy the file baselocus.xml to the new directory. This will be the
> 	overall script for the whole container.
> 2. When a locus is added to the workspace:
> 	a. copy the file locustype.xml to loci-file/workxml/workspace# and
> 	rename it locustype#.xml
> 	b. make an xml:link to the locus in baselocus.xml
> 3. When loci are connected:
> 	a. modify the two loci to indicate the connection:
> 	change the input xml:link of the input locus to point to the output locus's xml file
> 	change the output xml:link of the output locus to point to the input locus's xml file

This sounds about right with respect to what Jeff and I were discussing.
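Steps 1 to 3 could be sketched in Python roughly as below. This is a minimal illustration under assumptions, not what workspace.py actually does: the template directory argument, the `locusref` element in baselocus.xml, the `<input>`/`<output>` elements in each locus file and the `href` attribute are all made up here, and `xml:link="simple"` follows the early XLink drafts.

```python
import itertools
import os
import shutil
import xml.dom.minidom


def _first_free(directory, pattern):
    """Return the first name from pattern % 1, pattern % 2, ... not yet present."""
    for n in itertools.count(1):
        name = pattern % n
        if not os.path.exists(os.path.join(directory, name)):
            return name


def _save(doc, path):
    with open(path, "w", encoding="utf-8") as f:
        doc.writexml(f, encoding="utf-8")


def create_workspace(root, template_dir):
    """Step 1: make loci-file/workxml/workspace<N> and copy baselocus.xml into it."""
    workxml = os.path.join(root, "loci-file", "workxml")
    os.makedirs(workxml, exist_ok=True)
    ws_dir = os.path.join(workxml, _first_free(workxml, "workspace%d"))
    os.makedirs(ws_dir)
    shutil.copy(os.path.join(template_dir, "baselocus.xml"), ws_dir)
    return ws_dir


def add_locus(ws_dir, template_dir, locustype):
    """Step 2: copy <locustype>.xml in as <locustype><N>.xml and link it from baselocus.xml."""
    name = _first_free(ws_dir, locustype + "%d.xml")
    shutil.copy(os.path.join(template_dir, locustype + ".xml"),
                os.path.join(ws_dir, name))
    base_path = os.path.join(ws_dir, "baselocus.xml")
    doc = xml.dom.minidom.parse(base_path)
    ref = doc.createElement("locusref")          # hypothetical element name
    ref.setAttribute("xml:link", "simple")
    ref.setAttribute("href", name)
    doc.documentElement.appendChild(ref)
    _save(doc, base_path)
    return name


def _set_link(path, tag, target):
    """Point the first <tag> element of a locus file at target, creating it if needed."""
    doc = xml.dom.minidom.parse(path)
    nodes = doc.getElementsByTagName(tag)
    node = nodes[0] if nodes else doc.documentElement.appendChild(doc.createElement(tag))
    node.setAttribute("xml:link", "simple")
    node.setAttribute("href", target)
    _save(doc, path)


def connect(ws_dir, output_locus, input_locus):
    """Step 3: the output locus links to the input locus's file, and vice versa."""
    _set_link(os.path.join(ws_dir, output_locus), "output", input_locus)
    _set_link(os.path.join(ws_dir, input_locus), "input", output_locus)
```

Each step rewrites whole files through a DOM, which keeps the sketch close to the plan above but is also where the slow-downs discussed below would come from.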

> 4. When info is added about a locus:
> (Note: this has only been sort of implemented for containers)
> 	a. load the DOM tree of the locus from its xml file
> 	b. make the modifications
> 	c. save the resulting xml file

I think this is the best 'open standard' way of doing it.
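Point 4's load/modify/save cycle might look like this with Python's standard `xml.dom.minidom`. A sketch only: `set_info` and the `<info name=...>` element it writes are hypothetical stand-ins for whatever a locus actually records.

```python
import xml.dom.minidom


def edit_locus(path, modify):
    """Point 4: load the locus DOM from its file, modify it, save it back."""
    doc = xml.dom.minidom.parse(path)          # a. load the DOM tree
    try:
        modify(doc)                             # b. make the modifications
        with open(path, "w", encoding="utf-8") as f:
            doc.writexml(f, encoding="utf-8")   # c. save the resulting xml file
    finally:
        doc.unlink()                            # break minidom's parent/child cycles


def set_info(name, value):
    """Hypothetical modifier: record name=value as an <info> child of the root."""
    def modify(doc):
        info = doc.createElement("info")
        info.setAttribute("name", name)
        info.appendChild(doc.createTextNode(value))
        doc.documentElement.appendChild(info)
    return modify
```

Passing the modification in as a function keeps the open/parse/save boilerplate in one place, whatever kind of information is being added.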
> 
> So basically, the overall plan is that every workspace gets a unique
> directory created containing baselocus.xml, an xml file with links to each
> of the loci in the workspace, and xml files for each locus in the workspace.

Or, as I would put it, each workspace gets a unique database containing a baselocus.xml and an xml file for each locus in the
workspace.  Whether we keep the linking information in a separate xml file, as opposed to storing it in each locus, is a matter
of debate.  I have no personal preference.  In fact, a separate xml file holding the locus _references_ and the connections
(including connection type) between them, generated dynamically during the construction of a WFD, may be an interesting
approach.
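A minimal sketch of that separate, dynamically generated file, assuming a made-up vocabulary (`wfd`, `locusref`, and `connection` with `from`/`to`/`type` attributes) and connection types given as plain strings:

```python
import xml.dom.minidom


def build_wfd_index(loci, connections):
    """Build an XML document listing locus references and the connections between them.

    loci: iterable of (locus_id, href) pairs
    connections: iterable of (from_id, to_id, connection_type) triples
    """
    doc = xml.dom.minidom.Document()
    root = doc.appendChild(doc.createElement("wfd"))
    known = set()
    for locus_id, href in loci:
        ref = root.appendChild(doc.createElement("locusref"))
        ref.setAttribute("id", locus_id)
        ref.setAttribute("href", href)
        known.add(locus_id)
    for src, dst, ctype in connections:
        # Refuse connections that point at loci not referenced in this file.
        if src not in known or dst not in known:
            raise ValueError("connection %s -> %s names an unknown locus" % (src, dst))
        conn = root.appendChild(doc.createElement("connection"))
        conn.setAttribute("from", src)
        conn.setAttribute("to", dst)
        conn.setAttribute("type", ctype)
    return doc.toprettyxml(indent="  ")
```

In this layout the locus files never mention each other, so connecting or disconnecting two loci only touches this one file.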

> 	So far, everything seems to work okay from my 4 points, except that
> point 4 has only been semi-implemented for containers. You can right-click
> (button 3) on a container, and you will get an X window with options
> for the container. So far, the "set container" and "show contents" buttons
> work. When you click on "set container" you get a file-chooser dialog where
> you can select a directory for the container to hold. Then the program will
> load the xml file for this container, convert it into a DOM tree, add the
> container contents to it, and then save the xml file. The "show contents"
> button can then be used to retrieve these contents and display them as a
> tree.
> 	This is much like the ugly loci-file window did before, except that
> now things are done in DOM trees. Unfortunately, dealing with DOM trees
> also has led to a big slow-down in the time it takes to walk through a
> directory tree and write it as xml. To sort-of counteract this, the
> directory structure will only be parsed to a certain depth (currently it is
> set to something like 3). I'll try to think up speed-ups, but dealing with
> DOM trees slows things down. Sorry!
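For reference, a depth-limited walk of the kind described could look like the sketch below. The element and attribute names are assumptions, and the default `max_depth=3` mirrors the "something like 3" above. Directories past the limit are emitted with `expanded="no"` instead of being read.

```python
import os
import xml.dom.minidom


def dir_to_dom(top, max_depth=3):
    """Convert a directory tree to a DOM, reading at most max_depth levels starting at top."""
    doc = xml.dom.minidom.Document()
    doc.appendChild(_directory(doc, top, max_depth))
    return doc


def _directory(doc, path, depth):
    elem = doc.createElement("directory")
    elem.setAttribute("name", os.path.basename(os.path.abspath(path)))
    if depth <= 0:
        elem.setAttribute("expanded", "no")     # past the limit: contents not read
        return elem
    elem.setAttribute("expanded", "yes")
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            elem.appendChild(_directory(doc, full, depth - 1))
        else:
            leaf = doc.createElement("file")
            leaf.setAttribute("name", entry)
            elem.appendChild(leaf)
    return elem
```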

What if the parsing depth were set to one, and subdirectories were retrieved dynamically (by the user action folder.open) to reveal
the contents of that subdirectory?  The 'filesystem' would then get parsed together piecemeal as the user manually traverses it.  This
aspect may become more important when the 'filesystem' resides on a network with a slow pipe.  I submit that only what is
necessary be undertaken.  Just my 2 cents (Cdn) ;-)
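The depth-one, expand-on-demand idea could be sketched like this. `on_folder_open` is a hypothetical hook standing in for the folder.open user action, and the element names are again assumptions. Only a directory the user actually opens is ever listed.

```python
import os
import xml.dom.minidom


def _list_one_level(doc, elem):
    """Fill a <directory> element with its immediate children only."""
    path = elem.getAttribute("path")
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        is_dir = os.path.isdir(full)
        child = doc.createElement("directory" if is_dir else "file")
        child.setAttribute("name", entry)
        child.setAttribute("path", full)
        if is_dir:
            child.setAttribute("expanded", "no")    # listed later, on demand
        elem.appendChild(child)
    elem.setAttribute("expanded", "yes")


def open_filesystem(top):
    """Parse only the top level (a parsing depth of one)."""
    doc = xml.dom.minidom.Document()
    root = doc.appendChild(doc.createElement("directory"))
    root.setAttribute("name", os.path.basename(os.path.abspath(top)))
    root.setAttribute("path", top)
    _list_one_level(doc, root)
    return doc


def on_folder_open(doc, elem):
    """Hypothetical folder.open handler: list a directory the first time it is opened."""
    if elem.getAttribute("expanded") != "yes":
        _list_one_level(doc, elem)
```

Reopening a folder is a no-op, so each directory costs one listing at most, which matters most when every listing crosses a slow link.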

> 	Okay, whew, I think that's it. Let me get into the message!
> 
> >today Jeff and I agonized over different methods of storing descriptions
> >of the workspace in a database.  This led us to try and develop a data
> >model for the workflow diagram, which is no easy task.  The workspace has
> >elements of a tree-based model,
> 
> Right, excellent point! When a workspace/composite locus is created within
> another workspace, the newly created workspace directory should be inside
> the previous workspace directory. I'll try to make my xml model do this.

Jeff and I are thinking about data models as well.  This is a critical part of Loci's design, so it is in everyone's best
interests to participate on this subject.  Thanks for your input, Brad.  It is much appreciated.

> 
> >0.  The XML description of the WFD should be modular, but easily portable.
> 
> I think making it xml makes it intrinsically portable. Once you create an
> xml workspace, you can zip it up (or use an xml compression tool) and send
> it around to your heart's content.
> 
> >1. The WFD should be constructed from a number of smaller XML documents,
> >essentially one per locus in the WFD. If the WFD contains a composite
> >Locus, then that locus is itself a pointer to the xml documents contained
> >within it.
> 
> Right-o. I think I've done this with my baselocus.xml thing. Let me know
> your thoughts on whether this satisfies this condition.
>
sounds good to me.

> >A WFD then should be represented as a single database (collection of files).
> >The DBMS should be able to manage multiple independent databases.
> 
> I think the directory structure that I currently have could be shoved into
> a database in the following way:
> 
> directories		-> main databases
> xml files 		-> sub-databases within the main database
> info in xml files	-> the column/row info within the sub-database
> 
> >Connectivity between Loci must be preserved:  If you want to extract a
> >subset of loci from a WFD you must first disconnect those loci from any
> >'external' loci, or you must extract the entire superset of connected
> >loci along with the selected subset. I hope that makes sense.
> 
> Okay. I've connected loci using the xml:link linking language. How does
> this sound? Once we get the ability to disconnect links working, I think it
> shouldn't be too hard to disconnect the xml:links.

That could be handled easily enough by an exception-handling message from the Loci database interface, so again it sounds good to me.
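The extraction rule quoted above, together with the exception-style message, can be sketched over plain (output, input) pairs. `ExternalConnectionError` and `extract` are hypothetical names, not part of any existing Loci interface.

```python
class ExternalConnectionError(Exception):
    """Raised when a selection of loci is still connected to loci outside it."""

    def __init__(self, crossing):
        self.crossing = crossing
        super().__init__("selection has external connections: %r" % (crossing,))


def _crossing(selected, connections):
    """Connections with exactly one end inside the selection."""
    return [(a, b) for a, b in connections if (a in selected) != (b in selected)]


def extract(selected, connections, include_connected=False):
    """Return (loci, internal_connections) for a subset of a WFD.

    Without include_connected, any connection to an external locus is an error
    (the user must disconnect first). With it, the selection is grown to the
    whole superset of loci reachable through connections.
    """
    chosen = set(selected)
    if include_connected:
        crossing = _crossing(chosen, connections)
        while crossing:
            for a, b in crossing:
                chosen.update((a, b))
            crossing = _crossing(chosen, connections)
    else:
        crossing = _crossing(chosen, connections)
        if crossing:
            raise ExternalConnectionError(crossing)
    internal = [(a, b) for a, b in connections if a in chosen and b in chosen]
    return chosen, internal
```

The exception carries the offending connections, so a front-end could show the user exactly which links to break before extracting.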

> 
> >The DBMS should operate as client/server processes in order to
> >accommodate distributed processing requirements.
> 
> Do we want to have a DBMS as a client/server process separate from the Loci
> client/server stuff, or as a part of it?

For modularity, I would propose a separate client/server. I'm sure the XDBM database is a client/server system, no?
> 
> >The DBMS should be able to quickly provide an XML description of
> >information stored inside the database.
> 
> Okay, so we need xml to database and database to xml converters, right?
> 
Only if we decide on MySQL or PostgreSQL (which, I have learned, is better than MySQL for storing our data model).
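To give a feel for what such a converter pair involves, here is a sketch using Python's built-in `sqlite3` purely as a stand-in for MySQL or PostgreSQL. The two-table schema and the `wfd`/`locusref`/`connection` XML vocabulary are assumptions; a real relational model of the Loci WFD would need far more than this.

```python
import sqlite3
import xml.dom.minidom

SCHEMA = """
CREATE TABLE IF NOT EXISTS locus (id TEXT PRIMARY KEY, href TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS connection (src TEXT REFERENCES locus(id),
                                       dst TEXT REFERENCES locus(id),
                                       type TEXT);
"""


def xml_to_sql(xml_text, db):
    """Load <locusref> and <connection> elements into relational tables."""
    doc = xml.dom.minidom.parseString(xml_text)
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO locus VALUES (?, ?)",
                   [(e.getAttribute("id"), e.getAttribute("href"))
                    for e in doc.getElementsByTagName("locusref")])
    db.executemany("INSERT INTO connection VALUES (?, ?, ?)",
                   [(e.getAttribute("from"), e.getAttribute("to"), e.getAttribute("type"))
                    for e in doc.getElementsByTagName("connection")])
    db.commit()


def sql_to_xml(db):
    """Rebuild the XML description from the tables."""
    doc = xml.dom.minidom.Document()
    root = doc.appendChild(doc.createElement("wfd"))
    for locus_id, href in db.execute("SELECT id, href FROM locus ORDER BY id"):
        e = root.appendChild(doc.createElement("locusref"))
        e.setAttribute("id", locus_id)
        e.setAttribute("href", href)
    for src, dst, ctype in db.execute("SELECT src, dst, type FROM connection ORDER BY rowid"):
        e = root.appendChild(doc.createElement("connection"))
        e.setAttribute("from", src)
        e.setAttribute("to", dst)
        e.setAttribute("type", ctype)
    return doc.toxml()
```

Even this toy version needs a hand-written mapping in each direction, which is the extra work an XML database would avoid.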
> 
> >Essentially our options, as far as we can see are:
> >
> >1. Make our own custom database to store our workflow diagram.  This may
> >be easier than it sounds because the nature of our data storage needs is
> >so unique and specific that trying to write an interface to an existing
> >DBMS might be just as hard as, or harder than, writing our own custom loci-db.
> >
> >2. Use the MySQL database with an XML->SQL->XML interface.  This would
> >require some thinking in order to derive a relational data model that can
> >accommodate the possibly quite complex Loci WFD.
> >
> >3.  Use the PostgreSQL Object-Relational database with an XML-SQL-XML
> >interface.  I'm not 'up' on how postgres differs from MySQL, but if it
> >can more naturally handle objects (loci) and the relationships between
> >them (connections), then this may be a better choice than MySQL.  The
> >same considerations exist for creating an intelligent data model for the
> >WFD as in option 2.
> >
> >4. Use an XML database.
> 
> Okay, I'll take an early stand on this issue and go straight for point
> number 4, specifically using XDBM as our XML database (side note: did we
> come to any conclusions about whether we can safely use this?). My
> arguments for this:
> 
> 1. I think it will be *a lot* of work to write a database, or map xml into
> a relational database like MySQL/PostgreSQL.

I agree.

> 2. I have been looking at XDBM and I really think it does a lot of what we
> need (these points are taken from the xdbm documentation)
> 	a. Provides xdbm2xml and xml2xdbm converters.
> 	b. Stores the XML in a pre-parsed format so we don't need to go
> through entire XML files to find stuff.

Big Plus.

> 	c. You can load only parts of the XML file at a time.
> 	d. Allows you to store linked lists (the xml:links, I assume)
> 	e. Will support DOM-compliant interfaces.

Good!

> 
> The disadvantages are that XDBM is brand new and probably still has a lot
> of bugs to work out. In addition, the "FreeDOM" interfaces, which will
> supply the DOM-compliant interface, are still under design/development, and
> will require a set of python bindings once they are available.

Well, we're no strangers to writing python bindings ;-)

I'm also leaning towards Brad's idea of using the XML database.  I'd like to see an 'all XML solution' for communication
between our front-end and middleware, and possibly even backend-ware stuff.

BTW Brad, Jeff and I looked at your latest work on representing filesystems as containers in Loci.  Very nice work.  We looked at
the code, and it looks like you are using the Python xmllib.  You may want to see if you can get the GNOME xml-parser to work
for you instead.  It has a SAX interface (http://www.megginson.com/SAX/index.html).  To accommodate the retrieval of XML from a
distributed environment, the XML parser needs to be CORBA compliant (it can then pass the DOM over CORBA).  The GNOME
xml-parser can do this: check out http://xmlsoft.org/xml.html.
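To illustrate the SAX style, which streams callbacks instead of building a tree, the sketch below uses Python's standard `xml.sax` module purely as a stand-in. It is not the GNOME parser, and no particular Python binding for that parser is assumed. It pulls every `href` out of a locus file without holding a DOM in memory.

```python
import xml.sax


class LinkCollector(xml.sax.ContentHandler):
    """Collect (element name, href) pairs for every element that has an href attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def startElement(self, name, attrs):
        # Called once per start tag as the parser streams through the document.
        href = attrs.get("href")
        if href is not None:
            self.links.append((name, href))


def collect_links(xml_bytes):
    handler = LinkCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.links
```

Memory use stays flat regardless of file size, which is the main draw over building a full DOM for each locus file.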

Regards

g.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Gary Van Domselaar		gvd at redpoll.pharmacy.ualberta.ca
Faculty of Pharmacy 		Phone: (780) 492-4493
University of Alberta		FAX:   (780) 492-5305
Edmonton, Alberta, Canada