[Pipet Devel] IRC - Overflow/Piper merging and XML files

Brad Chapman chapmanb at arches.uga.edu
Tue May 30 02:49:05 EDT 2000


Hello all;
	Jean-Marc and I were talking on IRC last night trying to get things
set up for merging Piper and Overflow, and also just generally discussing
other stuff about the definition layer and XML representing nodes/loci. I
thought some people might be interested, so here's the log.
Questions/comments are more than welcome, as always. Happy reading!

Brad

        jm: I was wondering, do you think the UI* classes (after some
added features) could work for the whole GUI (both Overflow and GMS)?
      brad: Yeah, that is exactly what I've been trying to figure out :-).
What I'm thinking about doing is adjusting the XML stuff I have now in the
dl to make it look more like the UI stuff, and then go from there. As I'm
redoing this communication I'm stealing a lot from Overflow, like using
parameters to represent everything "internal" to a node.
        jm: What do you mean by adjusting the XML? BTW, just to make sure
I'm not dead wrong, for me the UI* classes would be the equivalent of the
DL and the GUI* would be the GUI.
      brad: Well, the dl currently has its own xml format that it stores
everything about the ui in. It is a lot different than Overflow's because
each node stores its info in a separate xml file, and the files are updated
constantly as the ui changes.
      brad: UI* and DL, GUI* and GUI <- Right, they are approximately
equivalent :-). I think they each do some different things which I'd like
to merge together. For instance, one big thing that the dl does right now
is deal with the filesystem and with connecting to databases, which I need
for the stuff I want to do. Of course, Overflow
      brad: ... does a lot of things that the dl doesn't...
        jm: We need to be careful about the meaning of node. Is it node as
a class or node as an instance of the class. For instance, the "ls.n"
wrapper would be a node class, while the place you use it in a program
would be a node instance.
        jm: I don't see a reason to put each node instance in a separate
XML file, while putting each node class definition in a separate file makes
sense (except for the builtin node classes which are in .so files).
      brad: Right, good point. What I mean is that each "node instance" in
a program has an individual XML file describing it.
      brad: The reason for this is that I see the node instance and class
definition as one and the same. The instance can just be modified by
changing parameters when the user is building a work-flow diagram.
      brad: So I guess I sort of see node instances and class definitions
as more similar than you do. But I'm not sure if this is the right way to
think about it...
        jm: I'm not sure we're talking about the same thing. Using the "C
function" analogy, the node class is the code (and prototype) for the
function, while the node instance is simply a function call.
        jm: Maybe one thing that (I think) is different from Loci is the
recursivity introduced by the "a Network is a Node" idea.
      brad: Okay, but I see the XML file as being part of the code. So,
where you put the info about parameters, inputs and outputs into the code
describing a node, I would rather have this info in the XML file describing
a node. Then the code could get this information from the XML file. Does
that make any sense?
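
A rough sketch in Python (using the standard xml.dom.minidom module) of how
"the code could get this information from the XML file" might look; the file
name and the <parameter>/<input>/<output> tag names are only assumptions,
nothing here is a settled format:

    # sketch only: file and tag names are made up for illustration
    from xml.dom.minidom import parse

    def describe_node(definition_file):
        """Read a node definition and report what a call to it would need."""
        doc = parse(definition_file)
        params  = [p.getAttribute("name") for p in doc.getElementsByTagName("parameter")]
        inputs  = [i.getAttribute("name") for i in doc.getElementsByTagName("input")]
        outputs = [o.getAttribute("name") for o in doc.getElementsByTagName("output")]
        return params, inputs, outputs

    print(describe_node("ls_node.xml"))
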
      brad: "a Network is a Node" <- We do have the same sort of thing in
Loci actually. The 'composites' (in Loci, the equivalent of Networks) are
also nodes. So they also have XML definitions. Is this what you mean by
recursivity?
        jm: Let's take the 'ls.n' example, this file is a "node class
definition" (I just made up the expression). The parameters it accepts are
defined from the "subnet_param"s you use, and the inputs and outputs are
defined from the net terminals.
        jm: Now, when you want to use the ls node, you bring up an instance
of it (with the new node menu). The system then identifies what parameters,
inputs and outputs are required. You can then connect it to other node
instances.
        jm: Are we talking about the same thing?
      brad: Okay, I'm with you.
      brad: I think we are talking about the same thing :-)
        jm: (networks being nodes is what I meant by recursivity... I
didn't know Loci was doing it too)
      brad: Loci doing it <- Well, sort of but not really yet. We are not
really at that stage yet because we don't have plugged in programs to do
that with, so we never really got past the "base" types.
        jm: Basically, what goes between the <Node ...> and the </Node> is
just the equivalent of a function call: you specify the parameters you pass
to the function, but this function is defined elsewhere.
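
To make the class/instance distinction concrete, here is an entirely made-up
XML sketch (not the actual .n syntax): a definition declares what the node
accepts, and an instance between <Node ...> and </Node> only fills in values
and gets connected:

    <!-- node class definition: the "function prototype" (hypothetical tags) -->
    <NodeClass name="ls">
      <parameter name="directory" type="string"/>
      <output name="listing"/>
    </NodeClass>

    <!-- node instance inside a program: the "function call" -->
    <Node name="ls_1" class="ls">
      <parameter name="directory" value="/some/path"/>
    </Node>
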
      brad: So the idea was discussed, but not yet implemented. Hopefully
for Piper we can swipe this from Overflow :-).
        jm: Is the "each node is defined in a XML file" the equivalent of
defining 'ls.n' in its own file?
      brad: I see what you are saying. One issue to consider in Piper is
that the dl doesn't have direct access to the "function" to get the
information about it like Overflow does (because of the added layers of
abstraction). This is why I want to put the info about what needs to be
passed to the function call into an xml file. Does that make sense?
      brad: each node like ls.n <- Yes, exactly. Except I am just extending
the concept to also include "base" (non-derived) nodes.
        jm: Just to make sure, that would be (for Overflow) to have one (or
many) XML file that keeps the data about builtin nodes so you don't need to
load the libs to get it?
      brad: Right, so you get the data without loading the libs, exactly :-).
        jm: I was thinking about doing that... that's one of the things I
wanted to discuss with you.
      brad: Cool :-).
      brad: IMO, we should just stick with doing it in Piper and kind of
focus on that, if you don't mind too much. Or do you want to add it to
Overflow "stand-alone" as well?
        jm: However, I'm not sure about the details: One big file, one file
per lib, one file per node? What do we do to make sure everything is
consistent? ...
        jm: I don't want to keep two separate ways of doing the same job,
let's try to find something that will fit for both Piper and stand-alone
Overflow
      brad: I guess it should start with one xml file per "function call"
where a "function call" is the smallest possible node that could be useful
to do something. Then other xml files can build from these. How does that
sound?
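
In the same made-up notation, "other XML files building from these" could
simply be a composite definition that instantiates and connects the smaller
ones (again, all tag and node names here are illustrative only):

    <NodeClass name="sorted_ls">
      <Node name="ls_1" class="ls"/>
      <Node name="sort_1" class="sort"/>
      <Link from="ls_1.listing" to="sort_1.INPUT"/>
      <output name="listing" from="sort_1.OUTPUT"/>
    </NodeClass>
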
      brad: separate ways <- Good! I just didn't want to have to try and
remember two different ways :-).
        jm: You mean one "function definition"?
      brad: Yeah function definition :-) Sorry.
        jm: Then a builtin node XML file would contain fields about "which
lib to load" and "what C++ class name to use"...
      brad: I guess it would probably need to.
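
As a sketch of that suggestion (all names invented here, including the
library), a builtin node's XML file might carry just enough metadata to
locate the real implementation without loading it:

    <NodeClass name="SomeBuiltin" builtin="yes">
      <library>libexample.so</library>     <!-- which lib to load -->
      <cppclass>SomeBuiltin</cppclass>     <!-- which C++ class name to use -->
      <parameter name="VALUE" type="int"/>
      <input name="INPUT"/>
      <output name="OUTPUT"/>
    </NodeClass>
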
      brad: Maybe I can write up a revised XML format based on what I sent
before and send it to you so we can think about this more. I can also write
up a couple of "example" XML files for different overflow nodes so we have
something concrete to think about. How does this sound?
        jm: I'm having a crazy idea... what if we tried to set up a quick
"overflow-only" version of Piper, just as a base and proof of concept. It
would differ from the current Overflow by a modified DL (what you've been
doing with UI*) and a new Python GUI. Is it feasible?
        jm: Show me the XML!
      brad: crazy idea <- That sounds great! That is sort of like what I
was proposing on the list, but a little more ambitious :-). I'm very much
for doing this. The only thing I would add is that we need to include the
bl layer, which for now is just an "extra" layer, but will eventually be
doing the scheduling of nodes, etc.
      brad: XML <- Okay :-). I'll work something up and send it on to you.
Then we can work that through more and get the exact syntax and everything
down.
        jm: My idea of the BL is very vague... but if you see how to do it,
sure. But the idea would be to have some working code ASAP, so we can add
features. I've found that as soon as there is something barely working, the
work goes a lot faster. That's been the case for Overflow.
      brad: I agree totally and that is definitely what I want. Things are
very frustrating to do now because there is nothing actually "working."
Plus this way we can get rid of stand-alone Overflow and just have
Piper-based Overflow :-).
        jm: That's exactly what I was thinking about! The main reason I
want to keep "stand-alone" overflow alive is that I need it to do real work
(i.e. it is my only development environment for my master's). The sooner
Piper reaches feature parity, the sooner I can put more time into it.
        jm: ...just thinking of that. If we set up a node definition XML
file for builtin nodes, it should support nodes that have a variable number
of inputs. This is supported by the overflow libraries, but it can't be
used in Overflow now, because the UI* classes don't support it.
      brad: That sounds super, and I completely understand your reasons for
wanting stand-alone right now. Having more people working directly with
Piper would be great!
      brad: variable inputs <- Agreed. This way should have that since I'll
just have an <input> tag or something for inputs, so new ones can be added
or old ones removed.
        jm: Just an idea... Do you think it would be feasible to take
Overflow as it is now, and replace one part after the other (and add a BL
layer) until it becomes Piper?
      brad: bl <- You should probably talk some with Jarl and see if you
guys can work on calling Overflow functions from the bl. I only know
vaguely how the bl works and he might have a better idea how to do this
than I.
      brad: idea <- I think that is very possible. Let's start with the
basic nodes to make 'ls' work and print out, and then build from there.
How's that sound?
        jm: When I say variable inputs, I don't mean old and new inputs...
For example, the Mux node (class) can be instantiated with any number of
inputs; it can have 10000 inputs if you like. If you try to connect a link
to input "Foo", this input becomes valid.
        jm: Well, instead of starting with ls, I'd start with the hundreds
of already defined (and working) builtin nodes.
      brad: inputs <- Okay, that makes sense. I think the GUI should have
some kind of option to "add input" or "add output" and a new link choice
will appear in the UI and a new <input> tag will be added to the XML file.
Right now the DL uses XML in temporary storage to represent the GUI, so it
would be no big deal to add a new tag to the XML file.
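
A sketch of how that could look in the instance's XML (hypothetical names
again): the class definition only marks the node as variable-input, and each
"add input" in the GUI appends another <input> entry to the instance:

    <Node name="mux_1" class="Mux">
      <input name="INPUT1" from="ls_1.listing"/>
      <input name="INPUT2" from="sort_1.OUTPUT"/>
      <!-- "add input" in the GUI would append another <input> line here -->
    </Node>
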
        jm: What do you mean by "the DL uses XML in temporary storage to
represent the GUI"?
      brad: ls and built-in nodes <- Okay. I just thought 'ls' would be a
bit more of a challenge since it requires both built in nodes, and the
ability to construct these nodes to make a bigger node :-).
        jm: Sure, we can do ls *too*.
      brad: temp storage <- What I mean is that when a new node of a
particular "type" is created in the GUI, the dl copies the XML file
describing that type to a temporary storage location. Then as the GUI
modifies the node (by adding parameters, etc.) this XML copy gets modified,
and not the original. So the XML copy being modified has the same format as
the "permanent" XML file describing a type, only with GUI added info. Does
that make more sense?
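
A minimal Python sketch of that temporary-storage scheme, under the
assumption that each instance starts as a plain file copy of its class
definition (the helper names are made up):

    import os, shutil, tempfile

    def instantiate_node(definition_file):
        """Copy the class definition into temporary storage; the GUI edits only the copy."""
        fd, temp_file = tempfile.mkstemp(suffix=".xml")
        os.close(fd)
        shutil.copyfile(definition_file, temp_file)
        return temp_file

    def delete_node(temp_file):
        """On normal deletion or shutdown, the temporary copy is removed."""
        os.remove(temp_file)
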
        jm: This is "something I always wanted to know but was afraid to ask":
when you want to distribute your job, do you want to run the same job (and
split the data) on multiple machines, or do you want to send different jobs
to different machines?
      brad: ls <- I just don't want to lose my lovely 'ls' example :-).
Seriously, you are right, this is definitely in addition to the basic
types, which we should do first.
        jm: I'm lost with your temporary storage... when you say a new node
is created, do you refer to a new definition (class) or a new instance?
      brad: afraid to ask <- Different jobs. I don't know *anything* about
real parallel computing, so this is kind of a cheap and easy way to do it,
I hope.
      brad: temp storage <- I mean a new instance (thanks for giving me the
right words :-). So then the new instance is modified in temporary storage
while the definition/class remains unmodified.
        jm: My idea would be more to specify in the definition that a node
can handle variable inputs/outputs, but I don't see the need for the
temporary XML. The fact that we added an input to that node can be included
in the "program" XML file.
      brad: Well, another idea behind the temporary XML is that it gives
you immediate back-up in case of a crash, since it is a permanent
representation of the GUI at any point.
        jm: The thing I dislike is that you end up with lots and lots of
temporary files... plus the fact that I like the programs to be
"self-contained".
        jm: ...or am I completely off the track?
      brad: I agree about the temporary files, but I'm not exactly sure
what you mean by self-contained. The dl classes are very similar to the
Overflow UI* classes, with the main difference being that they grab their
info from XML files instead of from in-memory class variables, or whatever.
      brad: temp files <- What I would like to do to deal with this is to
store the XML in some kind of database, but there aren't any very good XML
databases that do this kind of thing. So for now I've just stuck with
writing stuff to the filesystem...
      brad: temp files <- Also, these files are cleaned up when a program
shuts down normally, so you don't have to worry about a million temp files
building up to eat all your hard drive space.
        jm: What I mean by self-contained is that an "Overflow program"
can be completely defined by a single .n file that you can export.
        jm: Then it means that all the info in the temp files has to be
somewhere else... where is it?
      brad: self-contained <- This can be done with the multiple files as
well. I use xml:link to link files with other ones, and they could all be
inlined into a single file using this. I haven't gotten into this yet, but
this was the idea.
      brad: info in temp files <- What do you mean? I just add the info
directly into the XML file (using DOM to manipulate the XML).
        jm: Sorry, can you simplify your last remark?
        jm: About the differences between the DL and the UI* classes,
that's minor, since I already load node info from XML for "External"
(i.e. non-builtin) nodes.
      brad: simplify <- Do you mean the xml:link stuff or the dom stuff?
        jm: Both! (I don't know anything about either)
      brad: xml:link <- This isn't really anything more than just a
"standard" way for linking files in xml, like href in html or something. So
inside an xml file I have something like:
      brad: <my_link xml:link = "simple" href = "another_file.xml" />. Then
I can go through (with a dom iterator that moves through the xml file tag
by tag) and every time I see an xml:link, I substitute it with the
"another_file.xml" that I reference. So then everything gets substituted
into the original main file.
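
A small Python sketch of that substitution step, assuming minidom and
handling only one level of links (the function name and the behaviour on
nested links are simplifications):

    from xml.dom.minidom import parse

    def inline_links(main_file):
        """Replace every xml:link element with the contents of the file it points to."""
        doc = parse(main_file)
        for elem in doc.getElementsByTagName("*"):
            if elem.getAttribute("xml:link") == "simple":
                linked = parse(elem.getAttribute("href")).documentElement
                elem.parentNode.replaceChild(doc.importNode(linked, True), elem)
        return doc
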
        jm: But how does that allow removing the temp files while changing
the program?
      brad: dom <- This is the W3C's standard way to manipulate XML.
Basically, what you do is use a parser to parse XML into a tree-like
structure, where each tag or attribute or whatever is a node in the tree.
Then, since the XML is now in memory in the form of this dom tree, you can
add new nodes (tags) or move nodes around, and thus change the format of
the tree. Then you can save it back out as XML again. This is what I use to
manipulate the XML files in temporary storage.
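
The same idea in a few lines of Python with minidom (the file and tag names
are arbitrary): parse into a tree, add a node to the tree, then write it
back out as XML:

    from xml.dom.minidom import parse

    doc = parse("node_instance.xml")           # XML -> in-memory DOM tree
    extra = doc.createElement("input")         # a new tree node (an <input> tag)
    extra.setAttribute("name", "INPUT3")
    doc.documentElement.appendChild(extra)     # change the shape of the tree
    with open("node_instance.xml", "w") as f:  # save the tree back out as XML
        f.write(doc.toxml())
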
      brad: remove temp files <- They aren't really removed (unless a node
is deleted), but are just modified, using DOM (document object model,
BTW...). The temporary files are only deleted when the nodes are deleted
normally. Am I making any sense? :-).
        jm: I think my IQ is going back to the carrot state... I guess
it'll be easier to see when you send me some sample XML... but still I
don't like the idea of temporary files when we could store the added inputs
in the main file (that's the only data that differs from the node
definition).
        jm: Didn't you say when you exit Piper, the temp files get removed,
but that no data is lost?
      brad: The big reason for using lots of files is that reading an xml
file into memory using dom is pretty slow, so it is better to use smaller
files to make each manipulation quicker. If you had one big file, it would
really be a mess.
        jm: Well, that's the whole point of using "sub-networks".
      brad: no data is lost <- Well, the idea is that if you crash or
something, the temporary files won't get deleted. Then when the program
starts up, it will find these files which aren't supposed to be there, and
then can possibly resurrect the crashed system in the GUI. I haven't worked
on this enough to actually realize it, but it is basically there...
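
A sketch of that recovery check at startup (the directory name is
hypothetical, and a real version would rebuild the nodes in the GUI instead
of just reporting them):

    import glob, os

    def find_leftovers(temp_dir):
        """Temporary node files still present at startup belong to a crashed session."""
        return glob.glob(os.path.join(temp_dir, "*.xml"))

    leftovers = find_leftovers("/tmp/piper-nodes")
    if leftovers:
        print("Found node files from a previous (crashed) session:", leftovers)
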
      brad: sub-networks <- Agreed. I guess this is just a way to divide
the node storage to an even finer degree.
      brad: But maybe we should call it for now and I'll send you the XML
I'm thinking of and we can talk more later. Whadda you think?
      brad: It'll give me a chance to digest this all :-).
        jm: That's what I was about to suggest... I've assimilated a lot
of info and now the processing pipe is full.
      brad: Agreed. I'll send this log to the list too so other people can
read it over if they want. Thanks for talking stuff through with me. I'm
really looking forward to having Overflow and Piper come together :-).





