[Pipet Devel] Another XML proposal.

Humberto Ortiz Zuazaga hortiz at neurobio.upr.clu.edu
Tue Mar 30 09:22:35 EST 1999

> > I'd like to propose that Loci use many small XML DTDs instead of
> > trying for a kitchen sink DTD.

> LocusML should also be able to
> describe relationships between those pieces of data, if necessary,
> however. We might need specific DTDs for relationships (ie. a restriction
> map, which contains a number of short sequence components), as a lot of
> relationships will be very hard to express generically.

Yes, I propose a DTD for each kind of relationship.  For example, a 
structural alignment could have its own structural alignment DTD, and that 
DTD would allow embedding a multi-sequence alignment entity that in turn 
contains several protein sequence entities and structure entities.  Each 
protein sequence could contain a set of reference entities, and each entity 
could be defined in a different DTD.
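
A minimal sketch of what such a nested document might look like, and how a 
tool could list the DTDs it draws on.  The element names and the "dtd" 
attribute are hypothetical, invented only for illustration; none of them 
come from an existing Loci DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every element names the small DTD that defines it.
DOC = """\
<structural_alignment dtd="structural-alignment">
  <sequence_alignment dtd="sequence-alignment">
    <protein_sequence dtd="protein-sequence" id="p1">
      <reference dtd="reference">ref-1</reference>
    </protein_sequence>
    <protein_sequence dtd="protein-sequence" id="p2"/>
  </sequence_alignment>
  <structure dtd="structure" id="s1"/>
</structural_alignment>
"""


def dtds_used(xml_text):
    """Return the distinct DTD names in document order."""
    root = ET.fromstring(xml_text)
    seen = []
    for element in root.iter():
        name = element.get("dtd")
        if name is not None and name not in seen:
            seen.append(name)
    return seen


print(dtds_used(DOC))
```

A locus that only understands protein sequences could pull out the 
protein_sequence elements and ignore the rest.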

We also need a DTD for page or canvas composition of multiple display loci, 
for embedding a figure in a figure, for example.

> I don't like how BSML is structured, but I do like the detail it allows.

I didn't mean BSML specifically, just that a sequence DTD should stick to 
describing only sequence information.

> > a different XML dialect could be used for structure information
> > (including structural annotations in sequences).
> Yes. I'm not sure where to begin on structure. Someone here had ideas
> on this, but I'm not sure who or what became of them.

With my proposal, we can defer defining a structure DTD until we actually 
have more of a clue.

> > The workspace contacts the restriction map locus, which returns an XML
> > object describing the parameters and options this restriction map
> > locus requires or supports. 
> _That_ is an interesting idea. I had just been assuming a generic
> interface for types of loci (for example, a restriction map locus has
> three arguments and it doesn't vary), but rather than having a bunch of
> hardcoded loci types, we can query the locus for it's interface (of course
> we'll want to cache interfaces).

The gatekeeper can also handle finding appropriate loci:

The workspace says: I have a BICML nucleotide sequence v4.1 object, I want to 
perform a restriction map with these enzymes and see the sizes of the digested 

A tacg locus on server.example.com can reply: I can do the analysis.  Please 
send me the sequence and the enzymes you want from this list.  To view the 
output you need a locus that can display v3.5 digest files; here is a URL for 
a gnome-python locus with a compatible viewer.
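
The exchange above could be sketched roughly as follows.  This is only an 
illustration under assumptions: the Offer fields, the find_loci helper, the 
second "aligner" offer and both viewer URLs are hypothetical, and a real 
gatekeeper would exchange XML messages rather than Python objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Offer:
    """What a locus advertises to the gatekeeper (all fields hypothetical)."""
    locus: str
    host: str
    accepts: str      # input format the locus can read
    analysis: str     # analysis it performs
    produces: str     # output format it returns
    viewer_url: str   # where a compatible viewer locus can be fetched


def find_loci(have: str, want: str, offers: List[Offer]) -> List[Offer]:
    """Return offers that accept the data we have and do the analysis we want."""
    return [o for o in offers if o.accepts == have and o.analysis == want]


OFFERS = [
    Offer("tacg", "server.example.com", "BICML nucleotide sequence v4.1",
          "restriction map", "digest v3.5",
          "http://server.example.com/viewers/digest"),
    Offer("aligner", "server.example.com", "BICML protein sequence v4.1",
          "alignment", "alignment v1.0",
          "http://server.example.com/viewers/alignment"),
]

for offer in find_loci("BICML nucleotide sequence v4.1",
                       "restriction map", OFFERS):
    print(offer.locus, "on", offer.host, "produces", offer.produces)
    print("viewer:", offer.viewer_url)
```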

> > An option handling locus can then prompt
> > me for the enzymes I want to cut with, the output format I prefer,
> > etc.
> Going back to Jeff's idea about embedding python in XML, a locus could
> return an interface description with UI code to handle the query
> configuration (probably optional for exotic cases; most of the time it
> would be generic fields with default UI handlers).

Again, we don't have to pass back the UI code, just a URL to it; the 
workspace may well already have a copy locally.  I think it's a bad idea to 
embed the Python code in the XML.  It violates the principle that the DTDs 
should stick to the point, and it really gets ugly when you consider the 
security implications.  Locus will ship with loci for displaying many kinds 
of DTDs, and a site manager may well not allow the workspace to download 
untrusted code.  With my proposal, the workspace just has to locate any locus 
that can display the result DTD; you may well have several sequence viewers 
on your machine already.

> > The restriction map locus can now return the results as several xml
> > objects:  a bibliographic reference object describing the algorithm
> > used to perform the analysis; a result object containing the requested
> > results; a locus object containing the gnome-python source code for
> > a gui-locus that can display the results.
> Before we go overboard with passing interface code around though, I'd like
> to strongly encourage the presence of powerful, high-level widgets in the
> workspace app. We don't want to be passing around a generic sequence
> viewer all the time.

That's what I mean.  An analysis locus can just say: my output is in BICML 
v3.2 format, here is the URL for a viewer if you don't have one.  The 
workspace then chooses whether or not to retrieve the UI code.
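
One way the workspace might make that choice, as a sketch only.  The 
choose_viewer function, the local viewer table and the allow_download site 
policy flag are assumptions made for illustration, as are the viewer name 
and URL.

```python
def choose_viewer(output_format, local_viewers, viewer_url, allow_download):
    """Pick a viewer for output_format.

    Prefer a viewer already installed locally; fall back to the URL supplied
    by the analysis locus only if site policy allows downloading code.
    """
    if output_format in local_viewers:
        return ("local", local_viewers[output_format])
    if allow_download:
        return ("download", viewer_url)
    return ("unavailable", None)


# Hypothetical table of viewers already installed on this machine.
LOCAL_VIEWERS = {"BICML v3.2": "sequence-viewer"}
URL = "http://server.example.com/viewers/digest"  # hypothetical URL

print(choose_viewer("BICML v3.2", LOCAL_VIEWERS, URL, allow_download=False))
print(choose_viewer("digest v3.5", LOCAL_VIEWERS, URL, allow_download=True))
print(choose_viewer("digest v3.5", LOCAL_VIEWERS, URL, allow_download=False))
```

The point is that the analysis locus never pushes code; the workspace and 
the site manager's policy decide whether anything is fetched.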

> Although, perhaps we don't even need to bother trying to express the
> internal Loci data stuff as XML. Will we ever need to write it out to XML?
> Possibly only the actual biological data needs XML expression, just to
> facilitate interaction between Loci derived data and non-Loci tools.

I argue that all our data structures should be representable as XML.  This 
would let people write loci in any language, export individual components for 
other tools, and facilitate exchange of data.

Storing data in Python-specific or binary formats restricts your options.
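
As a rough illustration, here is an internal request structure written out 
as XML and read back.  The element names (restriction_map_request, sequence, 
enzyme) and the sequence id are hypothetical; the example happens to be in 
Python, but the XML text it produces could be read by a locus written in any 
language.

```python
import xml.etree.ElementTree as ET


def request_to_xml(sequence_id, enzymes):
    """Serialize a restriction map request to an XML string."""
    root = ET.Element("restriction_map_request")
    ET.SubElement(root, "sequence", {"ref": sequence_id})
    for name in enzymes:
        ET.SubElement(root, "enzyme").text = name
    return ET.tostring(root, encoding="unicode")


def request_from_xml(text):
    """Parse the XML string back into (sequence_id, enzymes)."""
    root = ET.fromstring(text)
    sequence_id = root.find("sequence").get("ref")
    enzymes = [e.text for e in root.findall("enzyme")]
    return sequence_id, enzymes


xml_text = request_to_xml("seq-001", ["EcoRI", "BamHI"])
print(xml_text)
print(request_from_xml(xml_text))
```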

Hopefully, we'll soon be able to embed Loci figures in our gnome word 
processor papers!

Humberto Ortiz Zuazaga
Bioinformatics Specialist
Institute of Neurobiology
hortiz at neurobio.upr.clu.edu
