[Pipet Devel] BioML vs BSML

J.W. Bizzaro bizzaro at bc.edu
Tue Jan 26 20:04:11 EST 1999


Justin Bradford wrote:

> Does BSML not fulfill all of the requirements Loci needs?

The Bioinformatic "Sequence" ML is pretty much for just that, although they
claim you can embed a PDB (Protein Data Bank) file inside of BSML.  But Konrad
is not a fan of PDB either.

> I'm guessing so, since CML was also planned.
> If so, what's missing?

BSML is missing any decent description of structure, and CML is missing an
acceptable description of structure for molecules larger than what organic
chemists deal with.

We actually can ignore the small chemical descriptions for Loci.  What we need
is something that is as good with sequences as BSML and as good with large
molecule structure as CML is with small molecule structure.

> A visualization program is going to have to know the format of the data it
> gets back from the analysis program (obviously), so the XML translation
> wrappers will have to be consistent. Now, we could use two different
> languages, but a viewer may want data from two different tools, each with
> a different ML (markup language).

How about making our own XML?  I think having four XMLs has already diluted the
field so that we can't complain about our XML being a proprietary format.  I
think Justin and Konrad could coordinate this effort, and the others can offer
input on sequence representations.  Really, we can get much of the sequence part
from what we like about BSML and BioML.

This may actually be necessary if we are to embed queries and commands into the
documents.

Konrad, you thought we might want to do this back when we had only three people
involved.  Maybe we can call it "LocusML" or "Bio-Object ML" (BOML) or
"Bio-Macromolecule ML" (BMML).

Give me some feedback.

> Also, we'll be wanting to chain several tools together, which is going to
> require tools taking input data from a ML, right?

Yep.

> But we also want control information tagging along with the object? And
> that would also be XML data?

Yepper.

> Furthermore, I'd like it if this thing could query/update databases, too
> (ie, a glyph for submitting my new protein structure to Brookhaven, or get
> the sequence for some gene out of the GDB, etc.)

You mean have a Loci _tool_ for this?  You're not talking about XML here.

> Now let me see if I understand the system so far.
> Paos is the network transport layer. But which end does the server run on?
> Jeff made a comment earlier implying the Paos server runs on the user's
> machine.

I believe we can have multiple Paos servers.  Exactly where they go, I'm not
sure.

BTW, Carlos wrote in some detail about Paos and Loci in his e-mail messages from
Monday.

> One client is the GCL/viewer/monitor and one is on the actual
> machine running the analysis tool. But how would a connection be made
> between the server and the analysis client? Doesn't the Paos server have
> to be on the analysis end?

(I'm sorry about using the word "client" to describe the user's machine.  Of
course it also describes a program that communicates with a server.  When I say
client, I mean local machine.)

Yes, I think Paos can reside on both the server and client.  Carlos will have
some documentation for us that can clear things up, and I think there is a
README at the Paos Web site.

@@@
> Also, a workflow/batch control system is in charge of directing the
> movements of the object (via Paos). In case of failure, the Paos object is
> updated with some exception, and the workflow system is notified and deals
> with it appropriately.

Yes sir!

> Throughout this process, the workflow system is also updating the Paos
> object with current status

The XML object can be changed, yes.

> and the analysis programs update the object
> (or create new ones?), which the monitor client is displaying for the
> user.

Yes, the GCL glyph, which can open a window to show current status.

> When complete, the visualization/viewer program is notified, takes
> the Paos object and renders it for the user.

Right!

> Am I close?

Oh ya!

> If so, it makes sense to use the Paos object to store control, exception,
> and status info. Data for analysis and analyzed data are stored in
> separate attributes.

Yes.  These are complications that may require us to write our own XML.
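
As a rough illustration of the layout described above, here is a minimal
sketch of one XML object that keeps control, status, exception, input and
output data in separate elements.  Every element name here ("loci-object",
"control", "status", and so on) is an assumption made up for the example;
nothing has been decided about the actual vocabulary.

```python
import xml.etree.ElementTree as ET


def make_object(input_data):
    """Build a hypothetical Loci/Paos XML object with separate sections."""
    root = ET.Element("loci-object")             # hypothetical root name
    ET.SubElement(root, "control")               # where to go next, what to run
    ET.SubElement(root, "status").text = "new"   # updated by the workflow system
    ET.SubElement(root, "exception")             # filled in only on failure
    ET.SubElement(root, "input").text = input_data
    ET.SubElement(root, "output")                # filled in by the analysis tool
    return root


obj = make_object("ATGGCC")
obj.find("status").text = "running"              # a status update in place
print(ET.tostring(obj, encoding="unicode"))
```

Keeping the sections apart means a status update never has to touch the
sequence or structure data itself.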

> The gatekeeper takes the data from the appropriate
> attribute (as told by relevant control information), modifies it as
> necessary for the analysis tool, and runs that tool.

Now we are back to analyzing the XML data (Paos object), back up to where I
typed @@@.  These are not two types of analyses.  The gatekeeper will work with
the workflow system, etc.

> Output is then committed to the Paos object (after conversion to the
> appropriate XML dialect by the gatekeeper), and the workflow system
> decides what to do next (depending on control info), until eventually, it
> is handed back to the user's client.

Yes!  I think you know just what I've been thinking.
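
The gatekeeper step described above might be sketched like this.  It is only
an assumption of how it could work: the "source" attribute on the control
element, the element names, and the use of a plain Python function in place
of a real analysis tool are all hypothetical.

```python
import xml.etree.ElementTree as ET


def gatekeeper_step(obj, tool):
    """Run one analysis tool on the element named by the control info."""
    source = obj.find("control").get("source")   # which element holds the data
    raw = obj.find(source).text or ""
    try:
        # "Modify as necessary for the tool": here, just trim and upper-case.
        result = tool(raw.strip().upper())
    except Exception as err:
        # On failure, record the exception in the object for the workflow system.
        obj.find("exception").text = str(err)
        obj.find("status").text = "failed"
        return obj
    obj.find("output").text = result             # commit the output to the object
    obj.find("status").text = "done"
    return obj


def reverse_tool(sequence):
    """Stand-in for a real analysis program."""
    return sequence[::-1]


obj = ET.fromstring(
    '<loci-object><control source="input"/><status>new</status>'
    '<exception/><input> atggcc </input><output/></loci-object>'
)
gatekeeper_step(obj, reverse_tool)
print(obj.find("status").text, obj.find("output").text)
```

The workflow system would then look at the status and control elements to
decide where the object goes next.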

> In this model, the workflow system is a Paos server/client combo. It
> would get the original object from the user, hand that to an analysis
> server, but keep a local copy updated, which the user (status monitor)
> would access for updates.

I'm not sure about keeping a local copy of the data.  You say that the data
would be updated, which would require the whole XML object to be transferred many
times.  I was thinking only once at the end, but the analysis locus could just
keep reporting what is being done...like writing a log file.
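
To make the log file idea concrete, here is a small sketch in which the
analysis locus appends short status messages and the monitor asks only for
what is new since it last looked, so the full XML object is never resent.
The class and method names are hypothetical.

```python
class StatusLog:
    """Append-only list of status messages from an analysis locus."""

    def __init__(self):
        self._entries = []

    def report(self, message):
        """Called by the analysis locus whenever something happens."""
        self._entries.append(message)

    def since(self, cursor):
        """Return the messages added after `cursor`, plus the new cursor."""
        return self._entries[cursor:], len(self._entries)


log = StatusLog()
log.report("started")
log.report("aligning sequences")

first_batch, cursor = log.since(0)       # monitor's first poll
log.report("finished")
second_batch, cursor = log.since(cursor) # later poll sees only the new entry
```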

> ...and then repeat the whole process (ie.
> give the object to the next analysis server, ...)

Yes, when GCL is used to automate some analyses.

> All the user client stuff access the workflow system directly, which deals
> with the individual analysis servers. This runs as a separate process, so
> you might have a server running this. The client starts up his Loci
> GCL program on a networked computer anywhere, builds the analysis batch,
> starts it, gets an ID number, and can close the program and walk away.

I never thought of that, but it's a great idea!

> Then from any other computer with Loci (or via the web when that interface
> is done), enters the batch ID, and can see everything that has happened to
> it so far along with its current status.

Hmmm.  Turning the client off and getting the data from another client means
the server needs to know the original client is off and that the information
should be held until the ID is provided.  I think it'll work.  The server may
keep a copy on file for a time specified by the user.  That way, the server
doesn't have to probe for the client loci that sent the data.
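
Here is one way the server side of that could look, as a sketch only: the
server hands back a batch ID, holds the object for a retention time chosen by
the user, and gives it to whichever client later presents the ID.  The class
name, the in-memory dictionary, and the use of seconds are assumptions for
the example.

```python
import time
import uuid


class BatchStore:
    """Holds submitted objects by batch ID until their retention time ends."""

    def __init__(self):
        self._batches = {}  # batch ID -> (object, expiry timestamp)

    def submit(self, obj, keep_seconds, now=None):
        """Store an object and return the ID the user walks away with."""
        now = time.time() if now is None else now
        batch_id = uuid.uuid4().hex
        self._batches[batch_id] = (obj, now + keep_seconds)
        return batch_id

    def fetch(self, batch_id, now=None):
        """Return the object for this ID, or None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self._batches.get(batch_id)
        if entry is None:
            return None
        obj, expires = entry
        if now >= expires:
            del self._batches[batch_id]  # retention time is over
            return None
        return obj


store = BatchStore()
batch_id = store.submit("<loci-object/>", keep_seconds=3600)
print(batch_id, store.fetch(batch_id))
```

Because the client has to ask with its ID, the server never needs to know
whether the original client is still running.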

> When it's done, the user can
> save the object locally for future reference (or maybe it's moved to a
> networked Loci archive system [just a Paos server]).

Yes.  The object will appear to the user as a Loci object in the file open
dialog, and it will appear as a larger glyph on the benchtop.  It won't have to
go through any translation again.

> Of course, the workflow process could be run locally as well, along with
> all of the analysis tools.

Yes yes yes yes!!!

> Also, the workflow system could implement more
> than just Paos network connection to the analysis programs, such as CORBA,
> COM, IRC (biobots!), etc. all of which would be transparent to the client
> tools.

Yes!  Each connection is filtered by a porta locus, just like the Porta Internet
& Gatekeeper combination.

> So is that what everyone was already thinking?

The rain in Spain falls mainly on the plain...Yes! By George I think he's got
it!

> Also, whenever I said "analysis tool/server", that could be replaced with
> "database query/update".

It sure can.

> 
> Now what does the query language look like, and how do we embed info from
> analysis and db access early in the batch into later queries? Especially
> if we have multiple XML dialects that the tools speak in. Ugh. Well I have
> 2.5 hours of day-dreaming/class tomorrow to come up with something.

Well, again, if we make up our own system it will be less complicated...but
we'll have more work.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro at bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--


