[Pipet Devel] infrastructure things

Sun Feb 28 03:44:27 EST 1999

Justin et al,

I'll get to what you wrote about infrastructure things in the next e-mail, but
first I'd like to make a few points.

You wrote an e-mail a couple months ago about how you think the workflow system
would function, from the point of view of an XML file created by the Benchtop,
monitored by the Benchtop/CGL, traveling to the Gatekeeper, and back.  But I
want to bring up some questions about the true mobility of the XML file.

Just how confusing would everything get if each locus got possession of either
(1) the one-and-only XML file or (2) just a copy?

Problem with case (1): What if the XML needs to be split for forked analyses?

E.g., the user has a sequence, gets an aligned sequence from a database, and now
wants to do something else with the new sequence.

What happens to the XML file?  Do we make a copy of the entire file (case 2!) to
be used with the new sequence, or do we cut the XML file in half...so to speak?

Problem with case (2): Will the information ever have to be sewn back together?

E.g., there is a fork in an analysis, as described for case (1).

Will we ever have to consider the whole analysis a single XML file, bringing all
pieces back together?  Or do we consider each fork/child to be a new analysis,
never to be rejoined with its parent?

Another confusing point is the idea that the XML file actually moves.  I
referred to it once as a basketball that is passed between players, but everyone
should be comfortable with the fact that each file will remain where it was
created...AND THIS IS TRUE EVEN FOR SERVER-SIDE ANALYSES!

The way I see it, we have a Python program on the client machine that handles
all of the interactions with the Gatekeeper.  So, EACH LOCUS WON'T HAVE TO DEAL
DIRECTLY WITH THE GATEKEEPER!  They deal with "Porta Internet", which makes the
Gatekeeper transparent, so everything seems to be on the client machine.  (The
same is true for Porta CORBA.)

Maybe instead of basketball players tossing a basketball around, the basketball
tosses the players around :-)
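
As a rough illustration of that transparency (a sketch only, not actual Loci
code): if a local locus and Porta Internet expose the same method, the caller
cannot tell whether the work happened on the client or behind the Gatekeeper.
The names LocalLocus, PortaInternet, process, run_step, and fake_gatekeeper are
all hypothetical, and the Gatekeeper is faked with a plain function so the
example runs without a network.

```python
class LocalLocus:
    """A locus that does its work on the client machine."""

    def process(self, xml_text):
        return xml_text.replace("pending", "done-locally")


class PortaInternet:
    """Client-side stand-in that forwards work to the Gatekeeper.

    gatekeeper_call is any callable taking and returning XML text; a real
    one would talk over the network (assumption, not shown here).
    """

    def __init__(self, gatekeeper_call):
        self._gatekeeper_call = gatekeeper_call

    def process(self, xml_text):
        return self._gatekeeper_call(xml_text)


def fake_gatekeeper(xml_text):
    # Placeholder for the server side; it only marks the XML as handled.
    return xml_text.replace("pending", "done-on-server")


def run_step(locus, xml_text):
    # The caller uses one interface and never sees the Gatekeeper.
    return locus.process(xml_text)


xml_text = '<analysis status="pending"/>'
local_result = run_step(LocalLocus(), xml_text)
remote_result = run_step(PortaInternet(fake_gatekeeper), xml_text)
print(local_result)
print(remote_result)
```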

You wrote about how Benchtop/GCL "updates a local copy" of the XML.  I
personally think each locus should update the XML it is working with (the
"Locus-In-Charge" or LIC), by itself, so as not to overwhelm Benchtop.  (Realize
that there should be no limit on the number of loci/processes spawned for forked
analyses, so Benchtop would have to handle in some cases a lot of
communication...maybe hundreds of XML files...in a word, it would be a
"bottleneck".)  In the case of server-side analyses, going thru Porta Internet
and Gatekeeper, Gatekeeper should not use Benchtop to update the XML and take
the next step; rather, I think Porta Internet, as the LIC, should do that.

Now what about those spawned loci/processes?  If Benchtop were the only LIC, all
spawned processes would be the first generation children of Benchtop.  But if
each locus were capable of spawning its own child, and that child capable of
spawning its own, the workload would just be much more distributed--each locus
would be an LIC.  One thing leads to another, if you recall that song by The
Fixx.

At this point, we need to answer the questions I posed above.  I think if the
analysis needs to fork, the LIC should copy the XML, put relevant instructions
in each copy, spawn two loci for the task, handing the copies over.  (And maybe
at this point the parent can be closed.)  But the copies won't be automatically
sewn back together at the end (we could have an option to combine XML's, as an
afterthought here).
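
The forking step just described could look roughly like this in Python.  This
is only a sketch under assumptions: the element and attribute names (analysis,
sequence, instruction, id, parent) and the function fork_analysis are made up
for illustration, since no XML format has been settled on.  Each copy is
independent of the parent and of its sibling, and nothing here merges them
back together.

```python
import copy
import xml.etree.ElementTree as ET


def fork_analysis(xml_text, instructions):
    """Return one independent XML copy per instruction string."""
    parent = ET.fromstring(xml_text)
    parent_id = parent.get("id", "analysis")
    copies = []
    for n, text in enumerate(instructions, start=1):
        child = copy.deepcopy(parent)        # full copy, as in case (2)
        child.set("id", f"{parent_id}.{n}")  # hypothetical naming scheme
        child.set("parent", parent_id)       # remember where it came from
        ET.SubElement(child, "instruction").text = text
        copies.append(ET.tostring(child, encoding="unicode"))
    return copies


parent_xml = '<analysis id="a1"><sequence>ACGT</sequence></analysis>'
forks = fork_analysis(parent_xml, ["align against database", "translate"])
for xml_copy in forks:
    print(xml_copy)  # each copy would be handed to a newly spawned locus
```

Keeping the parent's id in each copy is what would make an optional "combine"
step possible later, without requiring it.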

But, in the way I think things should work, would those little drawings on the
Benchtop give the user an indication of what is going on, or what the progress
is?  You thought that this is how the Benchtop would operate, which is a very
good idea.  And we do need _some_ sort of communication for this.  So if we let
the LIC's handle everything, can each LIC just send a simple "hello" back to the
Benchtop?  If a new child is spawned, maybe the first thing that child does is
tell the Benchtop what it is and where it came from...Maybe this function could
also be used to build a database of loci available to the users...? 
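
A minimal sketch of that "hello" idea, again with hypothetical names
(Benchtop.hello, Locus.spawn, the registry dict, the locus ids): every locus
reports what it is and who its parent is when it is created, and Benchtop only
records the report.  The same records could back both the progress drawings and
a list of available loci.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Benchtop:
    """Only records what loci report; it does not drive the analysis."""
    registry: dict = field(default_factory=dict)

    def hello(self, locus_id, kind, parent_id):
        self.registry[locus_id] = {"kind": kind, "parent": parent_id}

    def children_of(self, parent_id):
        return sorted(
            i for i, info in self.registry.items() if info["parent"] == parent_id
        )


@dataclass
class Locus:
    locus_id: str
    kind: str
    benchtop: Benchtop
    parent_id: Optional[str] = None

    def __post_init__(self):
        # First thing a new locus does: tell Benchtop what it is and
        # where it came from.
        self.benchtop.hello(self.locus_id, self.kind, self.parent_id)

    def spawn(self, locus_id, kind):
        # Any locus can spawn a child; Benchtop is not involved in the
        # decision, it just hears about the result.
        return Locus(locus_id, kind, self.benchtop, parent_id=self.locus_id)


benchtop = Benchtop()
root = Locus("locus-1", "sequence", benchtop)
aligner = root.spawn("locus-2", "alignment")
aligner.spawn("locus-3", "tree")
print(benchtop.children_of("locus-1"))
```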

In short, we are thinking of a highly distributed set of intelligent agents
existing all over.  Benchtop should be the user's eyes to the whole world of
Loci, not the brain of Loci.

Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro at bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--