[Pipet Devel] and still more infrastructure things

Sun Feb 28 05:52:17 EST 1999

> Okay, so what you once considered the responsibility of the Benchtop/GCL, you
> now consider that of the wfs.

It's a separate process, but it acts like an extension of the Benchtop/GCL:
it just handles all of the little details behind the scenes.

> So, I'll try to look at the XML as an object rather than a file this time.
> And wfs launches the apps, not individual loci/clients.

I think I need a clarification on the meaning of a locus. My understanding
was that a locus is a term covering an instance of Porta/Gatekeeper/analysis
tool(s) on a computer somewhere. It's just a place where analysis is done,
and that's it. The wfs system worries about direction of the whole object.

> Of course we should make a sharp division at the start between data that is
> biological and data that is for the workflow system.  I even imagine the very
> top of the file/object to be all workflow stuff.

I agree. I had intended to make a generic C/BS/BioML2 format first. Then
this would be what's under the data sections, so LociML would just
encapsulate that portion of it.

As for the algorithm and statistics stuff, I was thinking of that as
something potentially useful to keep in with sequence/structure/relation
data. For instance, it could be useful to know a structure was derived
using some particular X-ray crystallography technique. That stuff is
related to Loci.

> > Control has to describe the analysis pathway.
> ...description of the whole pathway

Yeah. Just an XML version of the GCL view.

> > Status is information concerning the data returned at each analysis step.
> ...what was collected along the way

More specifically, how the collection went. Actual data would get stuck
back in a block under <data>.

> Nice.  But how will Paos handle this?  Are we looking at some major changes to
> Paos itself?

I don't think so. My intention was to have the wfs only send what that
specific analysis needed. Input, output, and status each have an attribute
on the object. The wfs sends input once, reads output once (and merges the
new data with the full object), and gets constant updates on the status
attribute. So whenever the analysis tool changes status, the wfs knows,
and the benchtops can be updated (assuming any are paying attention at the
moment).
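
Here is a rough Python sketch of that exchange. It is only an illustration
under my own assumptions: AnalysisObject, run_step, and toy_analysis are
made-up names, not anything from Paos or the wfs, and a plain dict stands in
for the full XML object.

```python
# Hypothetical sketch: none of these names come from Paos or the wfs.

class AnalysisObject:
    """Stand-in for a shared object with input, output, and status attributes."""

    def __init__(self):
        self.input = None
        self.output = None
        self._status = None
        self._watchers = []

    def watch_status(self, callback):
        # Register a function to call whenever the status attribute changes.
        self._watchers.append(callback)

    @property
    def status(self):
        return self._status

    @status.setter
    def status(self, value):
        self._status = value
        for callback in self._watchers:
            callback(value)


def run_step(full_object, needed_keys, analysis):
    """Send only the needed input, follow status changes, merge the output."""
    shared = AnalysisObject()
    seen = []
    shared.watch_status(seen.append)    # the wfs hears every status change
    shared.input = {k: full_object[k] for k in needed_keys}  # sent once
    analysis(shared)                    # the tool sets status and output
    full_object.update(shared.output)   # read once, merged into the full object
    return seen


def toy_analysis(shared):
    # A pretend analysis tool: reports its status and returns one result.
    shared.status = "running"
    shared.output = {"length": len(shared.input["sequence"])}
    shared.status = "done"


full = {"sequence": "ACGT", "notes": "not needed by this step"}
history = run_step(full, ["sequence"], toy_analysis)
```

The tool never sees "notes", and the list of status changes is what a
benchtop would be shown if one were watching.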

> > Also, there's no particular reason there couldn't be multiple entries for
> > a stage.
> 
> stage == step?  Or I guess a step can contain different stages...

The stage, step, and order terminology I used in the example XML is all
bad and needs to be changed, but the idea was just that multiple things
could be happening at once.

> Right.  That'd save time, but be difficult to manage.  Now we're talking about
> concurrency.
> 
> Hmmm.  Now are we dealing with the whole forking/sewing issue here?
> Once an XML object is split up, will it have to be put back together again?

Concerning the dependency scheduling, it wouldn't be difficult to manage
this from a central server, as I was envisioning the wfs. If an object
roamed independently, it would be difficult to manage, unless we had
all of the threads regroup when data needed to be rejoined.
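
To show why this is manageable from a central server, here is a minimal
Python sketch of dependency scheduling. The step names and the dict layout
are invented for illustration; the real pathway description would come from
the XML.

```python
# Hypothetical sketch of central dependency scheduling; the step names and
# the dict layout are invented for illustration.

def schedule(steps):
    """Group steps into waves; steps in one wave could run at the same time.

    steps maps a step name to the set of step names it depends on.
    """
    done = set()
    waves = []
    remaining = dict(steps)
    while remaining:
        # A step is ready once everything it depends on has finished.
        ready = sorted(name for name, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("circular or unsatisfiable dependencies")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


pathway = {
    "fetch": set(),
    "align": {"fetch"},
    "fold": {"fetch"},
    "report": {"align", "fold"},   # the "sewing" point: needs both branches
}
waves = schedule(pathway)
```

Here "align" and "fold" land in the same wave, so they could run
concurrently, and "report" waits until both have been merged back in.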

> I was thinking about keeping workflow data together.
> 
> Also, ID numbers could be longer and randomly generated.

Yes, it needs to be restructured. Many of the ID numbers would be assigned
by the GCL to XML query translator.

> > The wfs identifies queries it can currently run
> 
> How?  By the database of available loci/clients?

However GCL defines it. I imagine explicitly naming a server as one
option, or just specifying a type of analysis, where the wfs will use a
list of some kind to find one available.
But before it contacts the server, it has to make sure it has all of the
data available for its query (check dependencies).
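
Something like the following Python sketch, where the query layout, the
registry dict, and the host names are all assumptions of mine rather than
anything GCL defines yet:

```python
# Hypothetical sketch: the query fields and the registry are invented.

def pick_server(query, available_data, registry):
    """Return a server name for the query, or None if it cannot run yet."""
    # Dependency check first: every required data section must be present.
    if not set(query["needs"]) <= set(available_data):
        return None
    # Option 1: the query names a server explicitly.
    if query.get("server"):
        return query["server"]
    # Option 2: the query only names a type of analysis; look one up.
    candidates = registry.get(query["type"], [])
    return candidates[0] if candidates else None


registry = {"alignment": ["align1.example.org", "align2.example.org"]}
by_type = {"type": "alignment", "needs": ["seq1"]}
by_name = {"type": "alignment", "needs": ["seq1"], "server": "lab.example.org"}

chosen_by_type = pick_server(by_type, {"seq1"}, registry)
chosen_by_name = pick_server(by_name, {"seq1"}, registry)
not_ready = pick_server(by_type, set(), registry)
```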

> > giving it only the portions
> > of the xml file necessary for it to run (query and relevant data
> > sections).
> 
> Yeah, this is where I see Porta Internet or Gatekeeper filtering out stuff
> the server-side algorithms/databases don't need.

I had imagined the wfs server doing that, but I imagine our difference is
in semantics. Basically, the analysis tool just gets what it needs.
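
Whoever does the filtering, it could look roughly like this Python sketch.
The element and attribute names (loci, control, query, needs, block) are
placeholders I made up, since the real markup isn't settled.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical document layout; the tag and attribute names are invented.
DOC = """
<loci>
  <control>
    <query id="q1" needs="d1"/>
    <query id="q2" needs="d2"/>
  </control>
  <data>
    <block id="d1">ACGT</block>
    <block id="d2">GGCC</block>
  </data>
</loci>
"""


def extract_for_query(root, query_id):
    """Build a reduced tree holding one query and the data blocks it needs."""
    query = next(q for q in root.iter("query") if q.get("id") == query_id)
    wanted = set(query.get("needs", "").split())
    subset = ET.Element(root.tag)
    subset.append(copy.deepcopy(query))
    data = ET.SubElement(subset, "data")
    for block in root.findall("./data/block"):
        if block.get("id") in wanted:
            data.append(copy.deepcopy(block))
    return subset


subset = extract_for_query(ET.fromstring(DOC), "q1")
block_ids = [b.get("id") for b in subset.iter("block")]
```

The analysis tool handling q1 would receive only that query and block d1,
and the full object stays untouched on the wfs side.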

> Just work with Konrad on the markup of structure.

Ok Konrad, I'm interested in hearing your ideas on describing structures.

> I'm glad you think this will go quickly.  Are you able to work with Paos
> as it is, or will Carlos need to make changes?

At the very least, I can pass blocks of XML through attributes on the paos
object. It would be interesting to see if the Paos object could be a
mirror of the XML, however.
So:
<status>
 <message>Ok</message>
</status>
Becomes:
paos_object.status.message = 'Ok'

But I can work without that.
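
For what it's worth, the mirror idea can be sketched in plain Python as a
read-only wrapper over a parsed XML tree. XmlMirror is a hypothetical name,
and this says nothing about how Paos itself would do it.

```python
import xml.etree.ElementTree as ET


class XmlMirror:
    """Read-only attribute-style view of an XML element (hypothetical)."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        child = self._element.find(name)
        if child is None:
            raise AttributeError(name)
        if len(child) == 0:
            return child.text        # leaf element: hand back its text
        return XmlMirror(child)      # nested element: wrap it and keep going


root = ET.fromstring(
    "<paos_object><status><message>Ok</message></status></paos_object>"
)
paos_object = XmlMirror(root)
message = paos_object.status.message
```

This only covers reading, and it picks the first child when a tag repeats;
writing back and repeated elements would need more thought.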

> How comfortable are you with the
> Python?

I miss enclosed blocks, but otherwise I'm doing ok.
{
   whitespace   usage should 
      be random  . you can just  parse  around

 it.
}

What odd things amuse me at 5AM.

Justin Bradford
justin at ukans.edu