[Pipet Devel] Getting rid of modification of xml definition files

Sat Jul 29 18:36:10 EDT 2000

Hello all;
    Jean-Marc and I have been having a lot of debate back and forth 
over the way Piper should represent the description of the workflow 
diagram. The way that Piper has worked up until this point is that it 
stores the description of each locus/node added to a network as a 
separate XML file in the filesystem (a *.def file). These files are 
stored in a hierarchical system, with each composite locus (network) 
having a directory with all of the XML files describing the children 
loci/nodes inside of it. Every time something is modified in the user 
interface, the XML file representing the modified object gets 
changed to reflect this. Similarly, when we need to retrieve 
information about an object, we get this from the XML file.
    Recently in addition to this "permanent" storage layer, I added an 
"in-memory" storage layer on top of this. This was due to the fact 
that all of the file accesses were really slowing things down, and it 
was a *huge* speed improvement to add this "in-memory" layer on top. 
Even more recently (well, I'm still working on it right now :-) I've 
been merging this in-memory layer with the Overflow UI* library, and 
using this library as the in-memory layer.
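
    To make the layering concrete, here is a rough Python sketch of an 
in-memory layer sitting on top of per-locus files. This is not the 
actual Piper or Overflow code: FileStore, CachedStore, demo and the 
"reads" counter are all made-up names, and I'm assuming writes still 
go through to the file on every change.

```python
import os
import tempfile


class FileStore:
    """Hypothetical permanent layer: one file per locus, like the *.def files."""

    def __init__(self, root):
        self.root = root
        self.reads = 0  # counts how often we actually read from the filesystem

    def _path(self, locus_id):
        return os.path.join(self.root, locus_id + ".def")

    def load(self, locus_id):
        self.reads += 1
        with open(self._path(locus_id), encoding="utf-8") as handle:
            return handle.read()

    def save(self, locus_id, text):
        with open(self._path(locus_id), "w", encoding="utf-8") as handle:
            handle.write(text)


class CachedStore:
    """Hypothetical in-memory layer on top of the permanent layer."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def get(self, locus_id):
        # Only touch the filesystem the first time a locus is requested.
        if locus_id not in self.cache:
            self.cache[locus_id] = self.backend.load(locus_id)
        return self.cache[locus_id]

    def set(self, locus_id, text):
        # Assumption: update memory and write through to the file.
        self.cache[locus_id] = text
        self.backend.save(locus_id, text)


def demo():
    """Return the number of file reads needed for 100 lookups of one locus."""
    with tempfile.TemporaryDirectory() as root:
        backend = FileStore(root)
        store = CachedStore(backend)
        store.set("node1", "<locus name='node1'/>")
        for _ in range(100):
            store.get("node1")
        return backend.reads
```

In this sketch the repeated lookups are served from the dictionary, 
so the file is only read when a locus is not yet in memory.
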
    So anyways (you knew I would have to get to a point somewhere in 
here, didn't you :-), Jean-Marc has been suggesting that we get rid of 
the "permanent" storage layer and manipulation of the XML files, and 
instead use the Overflow '*.n' file format as the permanent storage 
format for Piper. When we hashed this back and forth we came up with a 
number of reasons to do this:

1. Speed improvement by not having to access the filesystem so much.
2. The "individual XML files for each locus" format will not scale 
well to large networks (Jean-Marc has some with *tons* of nodes).
3. The *.n file format is more compact and thus easier to transport 
between computers.
4. The naming of the *.def files is ugly and hard to handle, 
especially when there are tons of nodes.
5. Storing these *.def files could get to be a beast, especially if 
you are working with tons of different networks at once.

On the other side, the best reason to retain the '*.def' file 
manipulation strategy is that if we start having tons and tons of 
options that could be set in *.def files (i.e. having to do with XML 
descriptions of user interface components) this could make the *.n 
files get huge, bloated and hard to comprehend.
    After a lot of discussion, I guess I'm on the side of Jean-Marc 
(which is why I'm writing this mail, after all) and think it might be 
best to do away with the copying and manipulation of *.def files. 
These *.def files will still be used to define a node (as they are 
used right now in Overflow and Piper), but will just be like 
C header files -- descriptions. Internally, the dl will manipulate 
the information about the work flow diagram in the "in-memory" layer 
and use the *.n format for permanent storage. Since the dl runs as a 
separate process from the UI, crash recovery will still be possible 
since we can just save a *.n file as a crash file, and then re-load 
from that.
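
    The crash file idea could look roughly like the Python sketch 
below. Big caveat: I'm using JSON purely as a stand-in for the 
serialization, since this is only meant to show the save/re-load 
cycle and says nothing about what a real *.n file looks like. The 
function names, the dictionary layout and the crash file name are all 
invented for the example.

```python
import json
import os
import tempfile


def save_crash_file(network, path):
    """Write a snapshot of the in-memory network to the crash file.

    The snapshot goes to a temporary file first and is then renamed over
    the target, so a crash in the middle of saving does not leave a
    half-written crash file behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(network, handle)  # JSON is a stand-in, not the *.n format
    os.replace(tmp_path, path)


def recover(path):
    """Re-load the last snapshot, or return None if there is no crash file."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def demo():
    """Save a tiny hypothetical network and load it back."""
    with tempfile.TemporaryDirectory() as root:
        crash_path = os.path.join(root, "crash.json")
        network = {"name": "main", "nodes": ["reader", "filter"]}
        save_crash_file(network, crash_path)
        return recover(crash_path)
```

Since the UI is a separate process, it could ask for something like 
recover() after restarting the dl and pick up from the last snapshot.
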
    So at any rate, this is a proposal to do away with the *.def 
file manipulation in the dl. What do people think about this? Are 
there any proponents of leaving it the old way? Other things that 
should be considered? Comments are very welcome!

Brad