From david.lapointe at umassmed.edu  Sun Jan 24 15:30:58 1999
From: david.lapointe at umassmed.edu (david.lapointe@umassmed.edu)
Date: Fri Feb 10 19:18:05 2006
Subject: [Pipet Devel] BlastXML
Message-ID: <93307F07DE63D211B2F30000F808E9E525D662@edunivexch02.umassmed.edu>

Ok, I am one of those weasels too. I am not suggesting moving to Java but
here's a piece that came down the BioWidget pipeline this week.

%%%%%%%%%%
January 19, 1999

PharmTools SDK Suite(TM) (Early Access 1)

Announcing the availability of PharmTools SDK Suite(TM) (Early Access 1)
from WorkingObjects.com for evaluation. PharmTools SDK is a collection
of reusable Java frameworks and toolkits for use in the development of
bioinformatics applications. This early access release includes: the
Blast Parsing Framework and BlastXML-SDK. The Blast Parsing Framework is
a set of design patterns and Java classes for processing native (i.e.,
local) and NCBI website generated Blast2 reports. BlastXML-SDK extends
the Blast Parsing Framework functionality to support the creation and
processing of BlastXML documents.

The PharmTools SDK Suite may be downloaded from the WorkingObjects.com
website at
http://www.workingobjects.com.

%%%%%%%%%%%%

It's pretty large, about 100 kb of *.jar for the application, and 500 kb for
the sdk.jar with a blastxml.dtd.  Its functionality relates to some
discussion we had earlier this week on parsing the output of programs,
which incidentally is made easier by perl (regex).  Many authors of *new and
improved* programs, e.g. FASTA, have built parseable output into their
programs.  This makes it easier to connect different analyses. IMHO a good
thing.  Most MolBio packages that I have seen are just a bag of unrelated
pieces, meaning you can't run the output of BLAST or FASTA into CLUSTAL
without a bit of work.
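
As a rough illustration of why parseable output helps, here is a minimal
Python sketch.  The tab-separated hit layout, the column order, and the
function names are all assumptions made up for this example; they are not
the real output format of BLAST or FASTA.

```python
import re

# Hypothetical machine-readable hit line: id <TAB> score <TAB> sequence.
# This layout is an assumption for illustration, not real BLAST/FASTA output.
HIT_LINE = re.compile(r"^(?P<id>\S+)\t(?P<score>\d+(?:\.\d+)?)\t(?P<seq>[A-Za-z]+)$")


def parse_hits(text):
    """Return a list of (id, score, sequence) tuples found in the report text."""
    hits = []
    for line in text.splitlines():
        match = HIT_LINE.match(line)
        if match:  # comments, blank lines, and anything unparseable are skipped
            hits.append((match.group("id"),
                         float(match.group("score")),
                         match.group("seq")))
    return hits


def to_fasta(hits, min_score=0.0):
    """Format hits scoring at least min_score as FASTA text for a downstream tool."""
    return "".join(">%s\n%s\n" % (hit_id, seq)
                   for hit_id, score, seq in hits
                   if score >= min_score)


report = "# search results\nseqA\t98.5\tMKTAYIAK\nseqB\t12\tMKT\n"
fasta = to_fasta(parse_hits(report), min_score=50)
```

With output this regular, the glue between a search program and an
alignment program is a few lines; without it, each pairing needs its own
hand-written scraper.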

David


From bizzaro at bc.edu  Sun Jan 24 19:14:00 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:05 2006
Subject: [Pipet Devel] BlastXML
References: <93307F07DE63D211B2F30000F808E9E525D662@edunivexch02.umassmed.edu>
Message-ID: <36ABB745.218CD79E@bc.edu>

I'm not anti-Java either (well...maybe I am :-).  You know, if any of these
non-Python tools are open source, they will be very useful for us to see how
it was done.  If we are interested in a pure Python core, we may be able to
translate some things to Python.

BTW, what translators are out there that translate ???-to-Python?

BlastXML should be good to look at because we want Blast searches to be
included in Loci.  Not that Loci will come with Blast, but we want it to be
one of the first things added to the core.  The same goes for FASTA.

> Many authors of *new and
> improved* programs, e.g. FASTA, have built parseable output into their
> programs.  This makes it easier to connect different analyses.

Something we must take a look at.  Have you looked into that Harry?

> IMHO a good
> thing.  Most MolBio packages that I have seen are just a bag of unrelated
> pieces, meaning you can't run the output of BLAST or FASTA into CLUSTAL
> without a bit of work.

I couldn't have said it better myself! :-)  This is what Loci must _not_
become!  Whatever we use, it must pass this test:  Will it cause a break in
the Loci continuum?  Will Loci become "a bag of unrelated pieces"?  All Loci
data (in XML) should be able to be tossed around between the core parts of
Loci like a basketball at a Harlem Globetrotters game!


Jeff
bizzaro@bc.edu


-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hjm at cx408397-a.irvn1.occa.home.com  Sun Jan 24 20:54:26 1999
From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam)
Date: Fri Feb 10 19:18:05 2006
Subject: [Pipet Devel] BlastXML
In-Reply-To: <36ABB745.218CD79E@bc.edu>
Message-ID: <Pine.LNX.3.96.990124171038.11723A-100000@cx408397-a.irvn1.occa.home.com>

On Sun, 24 Jan 1999, J.W. Bizzaro wrote:

    [most deleted]

> > Many authors of *new and
> > improved* programs, e.g. FASTA, have built parseable output into their
> > programs.  This makes it easier to connect different analyses.
> 
> Something we must take a look at.  Have you looked into that Harry?

I've been involved in a porting project involving FASTA, but I admit I
missed the bit about making the output more parsable - I'll go back and
check it more carefully - thanks!!  Do others have any other pointers to
packages that have made efforts to make parseable output?

In relation to this is an approach that Lincoln Stein discussed in an article
about using perl for the human genome project which I'll also throw out for
general misinformation: the use of an i/o language called boulderio which
had its beginnings in development of the Whitehead's 'Primer'.  He described
it as a way to pass data thru pipes, with each added analysis being able to
tag it with additional info. I'm not suggesting using it as is, but the idea
of being able to add analytical value to a pipes/streams-based dataflow is
very attractive, especially to a large effort such as a genome initiative or
even pharma.  The article and links to boulderio are at:
http://bio.perl.org/GetStarted/tpj_ls_bio.html
http://stein.cshl.org/software/boulder/

This is a lightweight approach to marking up data so that it can be passed
from app to app.  It is not a very formal approach, but it has been used to
coordinate some very large sequencing efforts.
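
A minimal Python sketch of the idea, assuming a simplified tag=value stream
in which a line containing only "=" ends a record.  This is modeled loosely
on boulderio and is not its full syntax; the tag names (NAME, SEQUENCE,
GC_CONTENT) and the function names are invented for the example.

```python
import io


def read_records(stream):
    """Yield each record as a list of (tag, value) pairs."""
    record = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "=":              # end-of-record marker (assumed convention)
            yield record
            record = []
        elif "=" in line:
            tag, value = line.split("=", 1)
            record.append((tag, value))
    if record:                       # tolerate a missing final marker
        yield record


def write_record(record, stream):
    """Write one record back out in the same tag=value form."""
    for tag, value in record:
        stream.write("%s=%s\n" % (tag, value))
    stream.write("=\n")


def add_gc_content(record):
    """Example analysis stage: append a GC_CONTENT tag, keeping existing tags."""
    seq = dict(record).get("SEQUENCE", "")
    if seq:
        gc = sum(seq.upper().count(base) for base in "GC") / float(len(seq))
        record = record + [("GC_CONTENT", "%.2f" % gc)]
    return record


def run_stage(stage, instream, outstream):
    """Apply one analysis stage to every record passing through."""
    for record in read_records(instream):
        write_record(stage(record), outstream)


source = io.StringIO("NAME=clone1\nSEQUENCE=GATTACA\n=\nNAME=clone2\nSEQUENCE=GGCC\n=\n")
sink = io.StringIO()
run_stage(add_gc_content, source, sink)
```

In a real pipeline each stage would read sys.stdin and write sys.stdout, so
stages can be chained with ordinary shell pipes and every stage only adds
tags, never removes the ones it does not understand.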

> 
> > IMHO a good
> > thing.  Most MolBio packages that I have seen are just a bag of unrelated
> > pieces, meaning you can't run the output of BLAST or FASTA into CLUSTAL
> > without a bit of work.
> 
> I couldn't have said it better myself! :-)  This is what Loci must _not_
> become!  Whatever we use, it must pass this test:  Will it cause a break in
> the Loci continuum?  Will Loci become "a bag of unrelated pieces"?  All Loci
> data (in XML) should be able to be tossed around between the core parts of
> Loci like a basketball at a Harlem Globetrotters game!

I like that analogy.  

Cheers
Harry

From bizzaro at bc.edu  Sun Jan 24 21:25:40 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:05 2006
Subject: [Pipet Devel] embedding queries in XML
References: <Pine.LNX.3.96.990124171038.11723A-100000@cx408397-a.irvn1.occa.home.com>
Message-ID: <36ABD616.30DAC33E@bc.edu>

Harry Mangalam wrote:

> the use of an i/o language called boulderio which
> had its beginnings in development of the Whitehead's 'Primer'.  He described
> it as a way to pass data thru pipes, with each added analysis being able to
> tag it with additional info. I'm not suggesting using it as is, but the idea
> of being able to add analytical value to a pipes/streams-based dataflow is
> very attractive, especially to a large effort such as a genome initiative or
> even pharma.
> 
> This is a lightweight approach to marking up data so that it can be passed
> from app to app.  It is not a very formal approach, but it has been used to
> coordinate some very large sequencing efforts.

I like what was suggested by someone on the team earlier, that the XML file
can contain a list of queries to be performed and already performed.  In that
sense, the XML files say "This is where I'm going.  And this is where I've
been.  Can you help me?"  So if the player who catches the basketball doesn't
know where to throw it to, he can read the name of the recipient and sender
right off of the ball.

Why would a locus get an XML file it wasn't intended to get?  Maybe this idea
is best suited for a router system.  Can each locus be a router?  Each one
_should_ be.  Even if a locus was intended to get the data and do something
with it, if the next step is to send the data somewhere else, it should know
where to send it.

Maybe the list of queries/commands can be put into the XML from the GCL
Benchtop, at the start of the analysis.  This way, the GCL won't have to
control every step.  Each locus won't have to ask the GCL where it should go
next.  The XML data will know the path.
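
A minimal Python sketch of XML that "knows the path", assuming a made-up
vocabulary: the element names (locus-data, workflow, step), the tool and
status attributes, and the helper functions are all hypothetical and are
not an agreed Loci format.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the workflow travels together with the data.
DOC = """<locus-data>
  <workflow>
    <step tool="blast" status="done"/>
    <step tool="clustal" status="pending"/>
    <step tool="tree-viewer" status="pending"/>
  </workflow>
  <sequence id="seq1">GATTACA</sequence>
</locus-data>"""


def next_step(root):
    """Return the first step not yet performed, or None if all are done."""
    for step in root.iter("step"):   # iter() walks in document order
        if step.get("status") == "pending":
            return step
    return None


def mark_done(root, tool):
    """Record that `tool` has run; return the name of the tool to forward to."""
    step = next_step(root)
    if step is None or step.get("tool") != tool:
        raise ValueError("%s is not the next step" % tool)
    step.set("status", "done")
    following = next_step(root)
    return following.get("tool") if following is not None else None


root = ET.fromstring(DOC)
first = next_step(root).get("tool")      # where the data is going
forward_to = mark_done(root, "clustal")  # where it goes after that
```

Each locus only needs these two operations to act as a router: read the
next pending step, and after doing its own work, mark it done and pass the
document along.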

Once again, I'm not sure how this would work with Paos.  Can you enlighten us
Carlos?  Can an XML object be treated as a mobile object with a mission?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From carlosm at mroe.cs.colorado.edu  Mon Jan 25 02:44:28 1999
From: carlosm at mroe.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] embedding queries in XML
Message-ID: <Pine.OSF.4.03.9901250143390.10228-100000@busboy.sped.ukans.edu>

On Sun, 24 Jan 1999, J.W. Bizzaro wrote:

> > This is a lightweight approach to marking up data so that it can be passed
> > from app to app.  It is not a very formal approach, but it has been used to
> > coordinate some very large sequencing efforts.
> 
> I like what was suggested by someone on the team earlier, that the XML file
> can contain a list of queries to be performed and already performed.  In that
> sense, the XML files say "This is where I'm going.  And this is where I've
> been.  Can you help me?"  So if the player who catches the basketball doesn't
> know where to throw it to, he can read the name of the recipient and sender
> right off of the ball.

This sounds like a workflow system to me. Except the agents are now tools
instead of office workers.

> Why would a locus get an XML file it wasn't intended to get?  Maybe this idea
> is best suited for a router system.  Can each locus be a router?  Each one
> _should_ be.  Even if a locus was intended to get the data and do something
> with it, if the next step is to send the data somewhere else, it should know
> where to send it.

Again, this sounds like nodes in a distributed workflow system. Exceptions
can occur in each node, and depending on the presence of a matching exception
handler the node knows where to send the data next - or just raises a flag
and says "I don't know what to do with this, help me!" 
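
A minimal Python sketch of such a node, under stated assumptions: the Node
class, the destination names, and the "HELP" flag are invented for
illustration and do not describe Paos or any existing workflow system.

```python
class Node:
    """A workflow node that routes data onward, even when its tool fails."""

    def __init__(self, name, process, default_next=None, handlers=None):
        self.name = name
        self.process = process            # the wrapped analysis tool
        self.default_next = default_next  # destination when all goes well
        self.handlers = handlers or []    # list of (exception type, destination)

    def handle(self, data):
        """Return (destination, payload) for the data handed to this node."""
        try:
            result = self.process(data)
        except Exception as exc:
            for exc_type, destination in self.handlers:
                if isinstance(exc, exc_type):
                    return destination, data   # matching handler: reroute input
            return "HELP", data                # no handler: raise a flag
        return self.default_next, result


# Hypothetical node: parse a number, send bad text to a cleanup node.
parser = Node("parser", int, default_next="report",
              handlers=[(ValueError, "cleanup")])
```

Here parser.handle("42") routes the parsed value to "report", unparseable
text goes to "cleanup", and anything the node has no handler for is flagged
for help instead of being dropped.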

> Maybe the list of queries/commands can be put into the XML from the GCL
> Benchtop, at the start of the analysis.  This way, the GCL won't have to
> control every step.  Each locus won't have to ask the GCL where it should go
> next.  The XML data will know the path.
>
> Once again, I'm not sure how this would work with Paos.  Can you enlighten us
> Carlos?  Can an XML object be treated as a mobile object with a mission?

I assume that an XML object is an intermediate result in the execution of
some composition of analysis tools, correct? If you want to add XML
objects as agents who can actively seek their next tool, you need to
provide the environment to do that (there is a Python module that offers a
safe execution environment for such mobile objects). Those execution
environments would be like special shells that receive XML objects and
execute them. From the view point of Paos these shells would be Paos
clients. They submit status information to one or more Paos servers and
receive notifications that either control the execution of tools or modify
the content of XML objects (such as routing information). Other clients of
Paos are GCL editors and monitors that visualize the whereabouts and
status of these mobile XML objects.

So I see Paos as control and monitoring infrastructure for shells which
receive and send XML objects from/to other shells and start and feed tools
according to these XML objects. I think you call these shells analysis
loci (correct?). But you could use the same shells as visualization loci. 
For me a visualization locus is just another glyph in a GCL construct. To
the user the only difference from an analysis locus is the fact that it
usually runs fairly local to the user and calls a tool that shows up as
a Gnome application that visualizes data.

Let me know whether this sounds right. 

Carlos


From bizzaro at bc.edu  Tue Jan 26 18:45:39 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] embedding queries in XML
References: <Pine.OSF.4.03.9901250143390.10228-100000@busboy.sped.ukans.edu>
Message-ID: <36AE53A2.B6DCEF36@bc.edu>

Carlos Maltzahn wrote:

> This sounds like a workflow system to me. Except the agents are now tools
> instead of office workers.

Yes!  I ignored the concept of the workflow system because it related to office
workers, but I guess we can treat the loci as workers.  Great!  Now where did I
see a workflow system recently?  Zope does this, right?

> I assume that an XML object is an intermediate result in the execution of
> some composition of analysis tools, correct? 

The XML is generated from whatever biological data the user starts with.  They
will get some piece of info (a sequence or a structure) and will want to do
something with it.  As soon as Loci knows what the data is, it is put into XML,
and it stays in XML indefinitely.

> So I see Paos as control and monitoring infrastructure for shells which
> receive and send XML objects from/to other shells and start and feed tools
> according to these XML objects. I think you call these shells analysis
> loci (correct?).

If by shells you mean analysis tools, in whatever language, wrapped in Python so
that they become transparent, then yes.

> But you could use the same shells as visualization loci.

Well...They're shells in the sense that GTK/GNOME GUI are wrapped in Python. 
But they will be pure Python.

> For me a visualization locus is just another glyph in a GCL construct. To
> the user the only difference from an analysis locus is the fact that it
> usually runs fairly local to the user and calls a tool that shows up as
> a Gnome application that visualizes data.
> 
> Let me know whether this sounds right.

Yes!  I think you see it the way I do.  Only visualization loci are always
local/client, while analysis loci can be remote/Internet-server (as I have been
describing them) _or_ local/client.  The mechanism for local or remote analysis
loci (gatekeeper and porta) should work nearly the same.  My reasoning for
having both is that local analysis will give better control and faster feedback,
while remote analysis will expand the Loci installation to the extent of what is
on the Internet.

BTW, Carlos, don't worry about rushing to get the documentation done.  We
understand that your thesis is your personal life and much more important ;-)


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Wed Jan 27 19:16:58 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] our own XML
References: <Pine.OSF.4.03.9901252356030.6613-100000@busboy.sped.ukans.edu> <36AE660B.4D9707B7@bc.edu> <199901271742.SAA17578@dirac.cnrs-orleans.fr>
Message-ID: <36AFAC7A.860982B3@bc.edu>

Justin et al,

Then let's try to design our own XML, emphasizing (1) biomacromolecule structure
according to Konrad's specifications, (2) biopolymer sequence, (3) commands and
queries used by Loci, (4) object orientation, and (5) workflow...as these things
work best with Paos.  And let's see if we can make use of the best of existing
XML's/DTD's.

Carlos, can Paos be extended to offer _native_ support for XML objects that
embed queries and other information needed to make our workflow system, as we've
been discussing lately?  Not to make Paos work only with our biological XML, but
to work with any XML that supports the embedding of workflow information.


Jeff


Konrad Hinsen wrote:
> 
>    Konrad, you thought we might want to do this back when we had only three people
>    involved.  Maybe we can call it "LocusML" or "Bio-Object ML" (BOML) or
>    "Bio-Macromolecule ML" (BMML).
> 
> Fine with me, and I'd certainly use it for other applications as well.
> 
> On the other hand, it is possible to design DTDs by extending existing
> ones. Perhaps this is a good idea to save effort and keep compatibility
> to some extent.

-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hjm at cx408397-a.irvn1.occa.home.com  Sat Jan 23 14:03:48 1999
From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
In-Reply-To: <36A95B6D.BF276B19@bc.edu>
Message-ID: <Pine.LNX.3.96.990123105527.7480A-100000@cx408397-a.irvn1.occa.home.com>

Hi All,

   Having only recently come to this arena, what's the group's evaluation of
the relative merits of BSML:
http://www.visualgenomics.com/sbir/rfc.htm

vs BioML:
http://www.proteometrics.com/BIOML/

I'm still going thru the text of the specs but if any of you have strong
arguments regarding either approach, I'd very much appreciate it.  The
bioperl people seem to like the BioML approach.

Also, if you have come to an approach that can rationalize the competing
stds, please let me know.

Cheers
Harry

From bizzaro at bc.edu  Sat Jan 23 18:00:05 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
References: <Pine.LNX.3.96.990123105527.7480A-100000@cx408397-a.irvn1.occa.home.com>
Message-ID: <36AA5475.7BAE2552@bc.edu>

Harry,

We had some rather lengthy exchanges about the use of BSML and CML in Loci.  We
were looking for an XML that would work well for both sequence and structure
information.  But it seemed that neither was good at both.

In short, we stepped back from the issue, and decided we would try to support
BOTH, until something better came along.  We even considered making our own XML
(we would call it BMML, "BioMolecule Markup Language").  What we really wanted
was just one language that would give us an excellent description of large bio
macromolecules.  Well, of course that would have been too much to take on right
now.

Konrad actually has a lot to say about descriptive languages for macromolecule
structure.  He had corresponded with Peter Murray Rust, the author of CML and
someone rather influential in the development of XML.  Konrad, as he can tell
you, is not satisfied with any current format.

BioML comes as a bit of a surprise to me.  It seems to be brand new.  Looking it
over a bit, it does seem to do a better job at handling structure than BSML, and
I like the inclusion of all sorts of biological information (it can describe
organisms as well, it seems)...although some may argue this is "bloat".

I would like Konrad to give us his impression of BioML.  It would be nice to use
one XML rather than two.  My big question, however, is the licensing.  The Web
page says to contact David Fenyo about the "commercial" use of BioML.  I wonder
if this is one of those "well, as long as you don't make money on it" licenses. 
If so, we have problems: It won't fit with GNU GPL.

I wrote a message to David Fenyo about this, and to see if he can give us a
contrast between BioML and BSML.  A cc will be sent to the Tulip list.


Cheers!
Jeff
bizzaro@bc.edu


-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Sat Jan 23 18:00:16 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML license
Message-ID: <36AA5480.CF43C1A8@bc.edu>

David,

I was just told about the BioML language.  I am the coordinator of a rather
large project to develop a free and open source (GNU GPL) bioinformatics
package.  It's called "The Loci Project," and here is the Web site:

    http://www.uml.edu/Dept/Chem/BICGroup/Apps/TULIP/

XML will be the backbone of communication between tools.  We were looking
closely at BSML and CML for descriptions of both sequence and structure (neither
does both well).

Anyway, my question to you, since you are the person to contact for "commercial"
uses of BioML, is what sort of restrictions do you have on the use of BioML?  I
was not able to find this information on the Web site.  Although Loci is not
commercial, our licensing (GPL) is not compatible with other licenses that
restrict commercial use.

Also, could you give us a brief contrast between BioML and BSML?  What was the
motivation behind making "another" XML?  Thank you!


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Sat Jan 23 18:11:38 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
References: <Pine.LNX.3.96.990123105527.7480A-100000@cx408397-a.irvn1.occa.home.com> <36AA5475.7BAE2552@bc.edu>
Message-ID: <36AA572A.45B1237A@bc.edu>

"J.W. Bizzaro" wrote:
> I would like Konrad to give us his impression of BioML.

And of course Justin, who is our resident XML X-pert ;-)


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hjm at cx408397-a.irvn1.occa.home.com  Sun Jan 24 00:38:21 1999
From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
In-Reply-To: <36AA5475.7BAE2552@bc.edu>
Message-ID: <Pine.LNX.3.96.990123202858.7900A-100000@cx408397-a.irvn1.occa.home.com>

Hi Jeff et al,

As you probably have found out now, BioML is being used by the bio.perl
group and the perl masters at perl.org already have a pretty large archive of
useful scripts for manipulating XML.

And it looks like, in a VERY fast spin thru the docs, that using their XML
parser tools, it may be possible to use both of these XMLs, using these perl
modules to handle parsing and interconversion.

ie:
http://www.perl.com/pace/pub/perldocs/1998/11/xml.html
http://www.perl.com/pace/pub/perldocs/1998/12/cooper-01.html

perl also has gtk bindings and in general, I've heard about and done more
things using perl in the bio world than with python.  Not to say python
doesn't make some or even most things easier - just that perl has a proven
track record in the bio area.

It seems not to be at cross purposes to the objectives of LOCI to implement
chunks of it in perl, no?  As long as it remains easily implementable, and
usable, and freely re-distributable, perl is as much of an option as
python, isn't it?

Also, how is the LOCI project planning on handling the display of the
results of this effort?  Both the BioML and the BSML browsers that are
available are MS-centric and certainly do not use gtk.  The only browsers
that I'm aware of that do are gzilla (www.gzilla.com) and mnemonic
(http://www.mnemonic.org/).  Are you planning on using either of these for
your display dev platform?  Or are you not implementing any display
technology?

VisualGenomics is planning a Java-based BSML browser, but I'm sure that it
won't use gtk, unless heavily funded.  It'll probably be written with the
swing classes.  What are the redist controls on swing?  I'm not a Java
follower - sorry.

OK - enough mind rot from me for the present...

Cheers
harry

Cheers,
Harry

Harry J Mangalam, Developmental + Cell Biology
Rm 4201, Biological Sciences II, UC Irvine, Irvine, CA, 92697
(949) 824 4824[vox], (949) 824 8551[fax], mangalam@uci.edu
http://hornet.bio.uci.edu/~hjm/

From bizzaro at bc.edu  Sun Jan 24 02:05:04 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
References: <Pine.LNX.3.96.990123202858.7900A-100000@cx408397-a.irvn1.occa.home.com>
Message-ID: <36AAC620.56ECC21C@bc.edu>

Harry Mangalam wrote:
> It seems not to be at cross purposes to the objectives of LOCI to implement
> chunks of it in perl, no?  As long as it remains easily implementable, and
> usable, and freely re-distributable, perl is as much of an option as
> python, isn't it?

Errrr mmmmmmmmmm argggggghh!  I'm having convulsions here ;-)  It seems at every
turn, there's something supposedly better to use, and I'm left having to defend
what I have chosen.  If I don't give in, I'm too stubborn.  But if I do, Loci
more closely resembles a smorgasbord of technologies.

I am trying to keep Loci homogeneous in terms of technology.  I think Python is a
knock-out language, beating even Perl in most respects.  I know Perl is very
prominent in sequence analyses...It's prominent in just about everything.  But I
think Python is not far behind in acceptance, and is gaining momentum.

What can we do?  If there is absolutely no other choice, we can go with
something in Perl, *provided* we consider it a temporary option.  If we can
find something later in Python or can convert it to Python, then we will.  But
don't give in too easily.

> 
> Also, how is the LOCI project planning on handling the display of the
> results of this effort?  Both the BioML and the BSML browsers that are
> available are MS-centric and certainly do not use gtk.  The only browsers
> that I'm aware of that do are gzilla (www.gzilla.com) and mnemonic
> (http://www.mnemonic.org/).  Are you planning on using either of these for
> your display dev platform?  Or are you not implementing any display
> technology?

You're talking about gtk-based XML browsers?  The Gnome libraries have a canvas
that is rather powerful.  I think we can make our own XML-to-display modules
using Python-Gnome bindings.  Of course what we are doing is unique.  The only
other bioinformatics XML browsers out there are the two you mentioned for BioML
and BSML.  So there are no standard libraries for handling this sort of thing.

Thomas is working on a sequence editor, I think with the Gnome canvas.  How have
things been developing, Thomas?

Harry, each Loci GUI tool will be a rather small XML browser, designed to
specifically handle *one* type of display.  For example, we will have separate
tools for the display of short DNA sequences, circular genomes, chromosomes,
protein sequences with secondary structure, 3D DNA structures, 3D protein
structures, phylogenetic trees, DNA sequence editing, protein sequence editing,
data plots, and others we haven't thought of yet.  The idea is to keep things
small and modular.  There really won't be any very large apps in Loci.
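
A minimal Python sketch of the "one small tool per display type" idea,
assuming the root element of the XML names the kind of data.  The tag names
(dna-sequence, protein-sequence) and the registry are hypothetical, and the
viewers return a string where a real tool would open a Gnome canvas window.

```python
import xml.etree.ElementTree as ET

VIEWERS = {}  # maps an XML tag to the small tool that displays it


def viewer(tag):
    """Decorator that registers a display tool for one kind of XML element."""
    def register(func):
        VIEWERS[tag] = func
        return func
    return register


@viewer("dna-sequence")
def show_dna(element):
    # Stand-in for a real GUI tool: just describe what would be shown.
    return "DNA viewer: %d bases" % len(element.text.strip())


@viewer("protein-sequence")
def show_protein(element):
    return "Protein viewer: %d residues" % len(element.text.strip())


def display(xml_text):
    """Pick the viewer that matches the document's root element and run it."""
    root = ET.fromstring(xml_text)
    handler = VIEWERS.get(root.tag)
    if handler is None:
        raise LookupError("no viewer registered for <%s>" % root.tag)
    return handler(root)
```

Adding a new display type then means writing one more small function and
registering it; none of the existing tools has to change.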

> 
> VisualGenomics is planning a Java-based BSML browser, but I'm sure that it
> won't use gtk, unless heavily funded.  It'll probably be written with the
> swing classes.  What are the redist controls on swing?  I'm not a Java
> follower - sorry.
> 

Yeah, if it uses Java, it will almost certainly use Swing.  I think Swing is now
a part of the standard distribution of Java, which is supposed to be fairly easy
to obtain.  The license is not nearly GNU GPL.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hjm at cx408397-a.irvn1.occa.home.com  Sun Jan 24 12:46:19 1999
From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
In-Reply-To: <36AAC620.56ECC21C@bc.edu>
Message-ID: <Pine.LNX.3.96.990124085211.10905A-100000@cx408397-a.irvn1.occa.home.com>

Hi All (new content inline below)

On Sun, 24 Jan 1999, J.W. Bizzaro wrote:

> Harry Mangalam wrote:
> > It seems not to be at cross purposes to the objectives of LOCI to implement
> > chunks of it in perl, no?  As long as it remains easily implementable, and
> > usable, and freely re-distributable, perl is as much of an option as
> > python, isn't it?
> 
> Errrr mmmmmmmmmm argggggghh!  I'm having convulsions here ;-)  It seems at every
> turn, there's something supposedly better to use, and I'm left having to defend
> what I have chosen.  If I don't give in, I'm too stubborn.  But if I do, Loci
> more closely resembles a smorgasbord of technologies.

:) I'm sorry for having caused the early morning convulsions, Jeff.  I'm
just trying to get a handle on the big picture.  I wasn't actually suggesting
replacing Python with perl for the central technology, but rather as one of
the toolsets that support your central technology.  The problem I see with
being too doctrinaire about the languages used is that you run the risk of
alienating people who support your model but see other ways of accomplishing
a task (ways that may already have been worked out with great effort).  As
long as using another approach doesn't conflict with your central goal, I
don't see the problem.

 
> I am trying to keep Loci homogenous in terms of technology.  I think Python is a
> knock-out language, beating even Perl in most respects.  I know Perl is very
> prominent in sequence analyses...It's prominent in just about everything.  But I
> think Python is not far behind in acceptance, and is gaining momentum.
> 
> What can we do?  If there is absolutely no other choice, we can go with
> something in Perl, ***providing we consider it a temporary option.  If we can
> find something later in Python or can convert it to Python, then we will.  But
> don't give in too easily.

Hear, hear.  That's all I'm suggesting.  However, IMHO excluding biosequence
artists on the basis of what language they choose is certain to make for bad
blood in the community and that's not what we want.  As long as there's a
standard way for the components to interact, anything that contributes to
the effort should be considered.  Speaking of which, one of the things that
attracted me to this approach was its close coding and thematic relation to
the GNOME project.  One way to rationalize the different-language issue
would be to build the components using GNOME's ORBit definition, which is a
lightweight, low-memory, GPLed, CORBA-compliant ORB (albeit written in C, but
I'm not sure that would affect much). Info is at:

http://www.labs.redhat.com/orbit/

Has that approach been evaluated?  I'm certainly not trying to throw grit in
the gastank - it's something that I'm currently investigating and so far it
seems to be quite promising.  If the list can give me reasons why it's a bad
idea, I'd very much appreciate it.

And finally, I appreciate the difficulties of trying to herd a bunch of wild,
perl/C/scheme/lisp/etc-crazed code weasels.

From one of the weasels,
Cheers
Harry

From david.lapointe at umassmed.edu  Sun Jan 24 13:01:56 1999
From: david.lapointe at umassmed.edu (david.lapointe@umassmed.edu)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
In-Reply-To: <Pine.LNX.3.96.990123202858.7900A-100000@cx408397-a.irvn1.occa.home.com>
Message-ID: <93307F07DE63D211B2F30000F808E9E525D661@edunivexch02.umassmed.edu>

There is a Perl-XML FAQ at http://www.perlxml.com/faq/perl-xml-faq.html

Also, Activestate.com has a bunch of perl mail-lists running, one of which
is the PERL-XML list (http://www.activestate.com/lyris/lyris.pl). You can
browse the archives as a guest. CPAN has several XML related modules.

http://www.perl.com/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/XML/


David
> -----Original Message-----
> From: Harry Mangalam [mailto:hjm@cx408397-a.irvn1.occa.home.com]
> Sent: Sunday, January 24, 1999 12:38 AM
> To: tulip-list@busboy.sped.ukans.edu
> Subject: Re: [Pipet Devel] BioML vs BSML
>
>
> Hi Jeff et al,
>
> As you probably have found out now, BioML is being used by
> the bio.perl
> group and the perl masters at perl.org already have a pretty
> large archive of
> useful scripts for manipulating XML.
>
> And it looks like, in a VERY fast spin thru the docs, that
> using their XML
> parser tools, it may be possible to use both of these XMLs,
> using these perl
> modules to handle parsing and interconversion.
>
> ie:
> http://www.perl.com/pace/pub/perldocs/1998/11/xml.html
> http://www.perl.com/pace/pub/perldocs/1998/12/cooper-01.html
>
> perl also has gtk bindings and in general, I've heard about
> and done more
> things using perl in the bio world than with python.  Not to
> say python
> doesn't make some or even most things easier - just that perl
> has a proven
> track record in the bio area.
>
> It seems not to be at cross purposes to the objectives of
> LOCI to implement
> chunks of it in perl, no?  As long as it remains easily
> implementable, and
> usable, and freely re-distributable, perl is as much of an option as
> python, isn't it?
>
> Also, how is the LOCI project planning on handling the display of the
> results of this effort?  Both the BioML and the BSML browsers that are
> available are MS-centric and certainly do not use gtk.  The
> only browser
> that I'm aware of that does is gzilla (www.gzilla.com) and mnemonic (
> http://www.mnemonic.org/).  Are you planning on using either
> of these for
> your display dev platform?  Or are you not implementing any display
> technology?
>
> VisualGenomics is planning a Java-based BSML browser, but I'm
> sure that it
> won't use gtk, unless heavily funded.  It'll probably be
> written with the
> swing classes - what's the redist controls on swing - I'm not a Java
> follower - sorry.
>
> OK - enough mind rot from me for the present...
>
> Cheers
> harry
>
> On Sat, 23 Jan 1999, J.W. Bizzaro wrote:
>
> > Harry,
> >
> > We had some rather lengthy exchanges about the use of BSML
> and CML in Loci.  We
> > were looking for an XML that would work well for both
> sequence and structure
> > information.  But it seemed that neither was good at both.
> >
> > In short, we stepped back from the issue, and decided we
> would try to support
> > BOTH, until something better came along.  We even
> considered making our own XML
> > (we would call it BMML, "BioMolecule Markup Language").
> What we really wanted
> > was just one language that would give us an excellent
> description of large bio
> > macromolecules.  Well, of course that would have been too
> much to take on right
> > now.
> >
> > Konrad actually has a lot to say about descriptive
> languages for macromolecule
> > structure.  He had corresponded with Peter Murray Rust, the
> author of CML and
> > someone rather influential in the development of XML.
> Konrad, as he can tell
> > you, is not satisfied with any current format.
> >
> > BioML comes as a bit of a surprise to me.  It seems to be
> brand new.  Looking it
> > over a bit, it does seem to do a better job at handling
> structure than BSML, and
> > I like the inclusion of all sorts of biological information
> (it can describe
> > organisms as well, it seems)...although some may argue this
> is "bloat".
> >
> > I would like Konrad to give us his impression of BioML.  It
> would be nice to use
> > one XML rather than two.  My big question, however, is the
> licensing.  The Web
> > page says to contact David Fenyo about the "commercial" use
> of BioML.  I wonder
> > if this is one of those "well, as long as you don't make
> money on it" licenses.
> > If so, we have problems: It won't fit with GNU GPL.
> >
> > I wrote a message to David Fenyo about this, and to see if
> he can give us a
> > contrast between BioML and BSML.  A cc will be sent to the
> Tulip list.
> >
> >
> > Cheers!
> > Jeff
> > bizzaro@bc.edu
> >
> >
> > Harry Mangalam wrote:
> > >
> > > Hi All,
> > >
> > >    Having only recently come to this arena, what's the
> group's evaluation of
> > > the relative merits of BSML:
> > > http://www.visualgenomics.com/sbir/rfc.htm
> > >
> > > vs BioML:
> > > http://www.proteometrics.com/BIOML/
> > >
> > > I'm still going thru the text of the specs but if any of
> you have strong
> > > arguments regarding either approaches, I'd very much
> appreciate it.  The
> > > bioperl people seem to like the BioML approach.
> > >
> > > Also, if you have come to an approach that can
> rationalize the competing
> > > stds, please let me know.
> > >
> > > Cheers
> > > Harry
> >
> > --
> > J.W. Bizzaro                  Phone: 617-552-3905
> > Boston College                mailto:bizzaro@bc.edu
> > Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
> > --
> >
>
> Cheers,
> Harry
>
> Harry J Mangalam, Developmental + Cell Biology
> Rm 4201, Biological Sciences II, UC Irvine, Irvine, CA, 92697
> (949) 824 4824[vox], (949) 824 8551[fax], mangalam@uci.edu
> http://hornet.bio.uci.edu/~hjm/
>

From bizzaro at bc.edu  Sun Jan 24 16:40:10 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
References: <Pine.LNX.3.96.990124085211.10905A-100000@cx408397-a.irvn1.occa.home.com>
Message-ID: <36AB933A.5A270BC2@bc.edu>

Harry Mangalam wrote:
> :) I'm sorry for having caused the early morning convulsions, Jeff.  I'm
> just trying to get a handle on the big picture.  I wasn't actually suggesting
> replacing Python with perl for the central technology, but rather as one of
> the toolsets that supports your central technology.  The problem I see with
> being too doctrinaire as to the languages used is that you run the risk of
> alienating some that support your model and see alternative ways of
> accomplishing a task (that may have already been solved with great effort)
> using another approach and as long as using it doesn't conflict with your
> central goal, I don't see the problem.
> 

I don't want to be misunderstood regarding my position on various languages. 
One of the main goals of Loci is to provide a framework to unify many bio tools
of different languages.  ***But can we keep just the core of Loci virgin
Python?  It will be hard enough allowing all sorts of languages to be attached
to the core.


> Hear, hear.  That's all I'm suggesting.  However, IMHO excluding biosequence
> artists on the basis of what language they choose is certain to make for bad
> blood in the community and that's not what we want.  As long as there's a
> standard way for the components to interact, anything that contributes to
> the effort should be considered.  Speaking of which, one of the things that
> attracted me to this approach was its close coding and thematic relation to
> the GNOME project.  One way to rationalize the different-language issue
> would be to build the components using GNOME's ORBit definition, which is a
> lightweight, low-memory, GPLed, CORBA-compliant ORB (albeit written in C, but
> I'm not sure that would affect much). Info is at:
> 

Yes, we have considered CORBA and ORBit.  Right now there is no decent free
implementation of CORBA for Python.  But there is an effort underway to make
Python bindings to ORBit, which we will consider.

It is not a bad idea.  CORBA may be the best way for tools of various languages
to connect to the Loci core without going thru the Gatekeeper.  So, we may very
well see a variety of GUIs attached to Loci.  It's just that we won't do
anything like that for the _core_.

***Remember, the Loci core won't contain even one analysis tool!  We are
considering the core to be very small, consisting of Python/C GTK/GNOME modules.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Sun Jan 24 18:58:16 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
References: <93307F07DE63D211B2F30000F808E9E525D661@edunivexch02.umassmed.edu>
Message-ID: <36ABB397.734DAB46@bc.edu>

Thanks for the info, David.

In case we start to believe there is nothing for XML under Python, here is a
link to the XML-SIG (Special Interest Group):

    http://www.python.org/sigs/xml-sig/

You will find links there to much of the work being done with Python-XML, and
there is a lot.

Also, here is a link to the Python-CORBA SIG:

    http://www.python.org/sigs/do-sig/


Jeff
bizzaro@bc.edu


david.lapointe@umassmed.edu wrote:
> 
> There is a Perl-XML FAQ at http://www.perlxml.com/faq/perl-xml-faq.html
> 
> Also, Activestate.com has a bunch of perl mail-lists running, one of which
> is the PERL-XML list (http://www.activestate.com/lyris/lyris.pl). You can
> browse the archives as a guest. CPAN has several XML related modules.
> 
> http://www.perl.com/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI
> /XML/
> 
> David


-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hinsen at cnrs-orleans.fr  Mon Jan 25 05:11:54 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
In-Reply-To: <36AA5475.7BAE2552@bc.edu> (bizzaro@bc.edu)
References: <Pine.LNX.3.96.990123105527.7480A-100000@cx408397-a.irvn1.occa.home.com> <36AA5475.7BAE2552@bc.edu>
Message-ID: <199901251011.LAA21088@dirac.cnrs-orleans.fr>

> I would like Konrad to give us his impression of BioML. It would be

I don't think my opinion is so relevant; my field of work is rather
different from the Loci project. I work on structures, and BioML
does not seem to have any provision for structures at all. Which is
fine, of course, not everything has to be designed for my needs ;-)
My complaint with CML is that it claims to handle biomolecular
structures and does it badly.

> the licensing. The Web page says to contact David Fenyo about the
> "commercial" use of BioML. I wonder if this is one of those "well,

I am not even sure that a data format is copyrightable. If it is, the
current downloadable DTD does not contain any copyright statement or
usage restrictions, so I don't see why it shouldn't be used for
commercial applications.

That aside, I did notice a couple of strange features in and about
BioML that make me wonder whether it is the format of choice. First,
and most importantly, I have the impression that the inventors have
not quite understood the point of XML - separating content from
layout. BioML contains some purely graphical entity definitions, for
example &paragraph; defined as &newline;&tab;. In my opinion such
things should never appear in XML files. Paragraphs should be marked
up with a paragraph tag, whose visual interpretation is left to a
stylesheet definition.
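
To illustrate the point, here is a minimal Python sketch in which the document
only says "this is a paragraph" and two interchangeable style functions decide
what that looks like.  The element names (note, paragraph) and the function
names are made up for this example; they are not taken from the BioML DTD.

```python
import xml.etree.ElementTree as ET

# Content only: the markup says what each piece is, not how it looks.
DOC = (
    "<note>"
    "<paragraph>First point.</paragraph>"
    "<paragraph>Second point.</paragraph>"
    "</note>"
)


def render(xml_text, style):
    """Apply a style function to the text of every <paragraph> element."""
    root = ET.fromstring(xml_text)
    return "".join(style(p.text) for p in root.iter("paragraph"))


def indented(text):
    # One possible layout: newline plus tab before each paragraph.
    return "\n\t" + text


def blank_line(text):
    # Another layout for the same content: blank line after each paragraph.
    return text + "\n\n"
```

The same file renders either way, which is exactly what is lost when
newline-plus-tab is baked into the data as an entity.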

Second, the BioML inventors seem to be more Windows-centric than
Microsoft itself. Who would have the crazy idea of offering
documentation in portable HTML format only as a self-extracting
archive for Windows? Of course this doesn't affect the language,
but I'd hate to see the next release contain tags for defining
COM objects...

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From justin at ukans.edu  Tue Jan 26 02:21:33 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
In-Reply-To: <199901251011.LAA21088@dirac.cnrs-orleans.fr>
Message-ID: <Pine.OSF.4.03.9901252356030.6613-100000@busboy.sped.ukans.edu>

> I don't think my opinion is so relevant; my field of work is rather
> different from the Loci project. I work on structures, and BioML
> does not seem to have any provision for structures at all. Which is
> fine, of course, not everything has to be designed for my needs ;-)
> My complaint with CML is that it claims to handle biomolecular
> structures and does it badly.

Does BSML not fulfill all of the requirements Loci needs?
I'm guessing so, since CML was also planned.
If so, what's missing?

A visualization program is going to have to know the format of the data it
gets back from the analysis program (obviously), so the XML translation
wrappers will have to be consistent. Now, we could use two different
languages, but a viewer may want data from two different tools, each with
a different ML (markup language).

Also, we'll be wanting to chain several tools together, which is going to
require tools taking input data from a ML, right?

But we also want control information tagging along with the object? And
that would also be XML data? 

Furthermore, I'd like it if this thing could query/update databases, too 
(ie, a glyph for submitting my new protein structure to Brookhaven, or get
the sequence for some gene out of the GDB, etc.)

Now let me see if I understand the system so far.
Paos is the network transport layer. But which end does the server run on?
Jeff made a comment earlier implying the Paos server runs on the user's
machine. One client is the GCL/viewer/monitor and one is on the actual
machine running the analysis tool. But how would a connection be made
between the server and the analysis client? Doesn't the Paos server have
to be on the analysis end?

Also, a workflow/batch control system is in charge of directing the
movements of the object (via Paos). In case of failure, the Paos object is
updated with some exception, and the workflow system is notified and deals
with it appropriately.

Throughout this process, the workflow system is also updating the Paos
object with current status and the analysis programs update the object
(or create new ones?), which the monitor client is displaying for the
user. When complete, the visualization/viewer program is notified, takes
the Paos object and renders it for the user.

Am I close?
If so, it makes sense to use the Paos object to store control, exception,
and status info. Data for analysis and analyzed data are stored in
separate attributes. The gatekeeper takes the data from the appropriate
attribute (as told by relevant control information), modifies it as
necessary for the analysis tool, and runs that tool.
Output is then committed to the Paos object (after conversion to the
appropriate XML dialect by the gatekeeper), and the workflow system
decides what to do next (depending on control info), until eventually, it
is handed back to the user's client.
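
A minimal Python sketch of that flow, just to pin the idea down.  This is not
the real Paos API: BatchObject, gatekeeper_run and workflow are hypothetical
names, and the "tools" are plain Python functions standing in for wrapped
analysis programs.

```python
class BatchObject:
    """Hypothetical stand-in for a Paos object (not the real Paos API)."""

    def __init__(self, input_data, steps):
        self.control = list(steps)    # names of the tools still to run
        self.status = "queued"
        self.exception = None
        self.input_data = input_data  # data waiting to be analyzed
        self.output_data = None       # data produced by the last tool


def gatekeeper_run(obj, tools):
    """Run the next tool named in the control info; record the outcome."""
    name = obj.control.pop(0)
    obj.status = "running " + name
    try:
        obj.output_data = tools[name](obj.input_data)
    except Exception as err:
        obj.exception = "%s failed: %s" % (name, err)
        obj.status = "failed"
        return False
    obj.status = "done " + name
    return True


def workflow(obj, tools):
    """Chain the tools: each step's output becomes the next step's input."""
    while obj.control:
        if not gatekeeper_run(obj, tools):
            break  # the exception is already stored on the object
        obj.input_data = obj.output_data
    if obj.exception is None:
        obj.status = "complete"
    return obj
```

The monitor would only ever need to read status and exception, while the
viewer reads output_data once the status is "complete".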

In this model, the workflow system is a Paos server/client combo. It
would get the original object from the user, hand that to an analysis
server, but keep a local copy updated, which the user (status monitor)
would access for updates. When one analysis step is done (and it had
resynced its copy of the remote object) it would delete the object on the
analysis server (remote object), and then repeat the whole process (ie.
give the object to the next analysis server, ...)

All the user client stuff accesses the workflow system directly, which deals
with the individual analysis servers. This runs as a separate process, so
you might have a server running this. The client starts up his Loci
GCL program on a networked computer anywhere, builds the analysis batch,
starts it, gets an ID number, and can close the program and walk away.
Then from any other computer with Loci (or via the web when that interface
is done), enters the batch ID, and can see everything that has happened to
it so far along with its current status. When it's done, the user can
save the object locally for future reference (or maybe it's moved to a
networked Loci archive system [just a Paos server]).

Of course, the workflow process could be run locally as well, along with
all of the analysis tools. Also, the workflow system could implement more
than just Paos network connection to the analysis programs, such as CORBA,
COM, IRC (biobots!), etc. all of which would be transparent to the client
tools.

So is that what everyone was already thinking?

Also, whenever I said "analysis tool/server", that could be replaced with
"database query/update". 

Now what does the query language look like, and how do we embed info from
analysis and db access early in the batch into later queries, especially
if we have multiple XML dialects that the tools speak in?  Ugh. Well I have
2.5 hours of day-dreaming/class tomorrow to come up with something.

Sorry for the ramblingness...

Justin Bradford
justin@ukans.edu


From bizzaro at bc.edu  Tue Jan 26 18:58:44 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
References: <Pine.LNX.3.96.990123105527.7480A-100000@cx408397-a.irvn1.occa.home.com> <36AA5475.7BAE2552@bc.edu> <199901251011.LAA21088@dirac.cnrs-orleans.fr>
Message-ID: <36AE56B4.3D1429F0@bc.edu>

Konrad Hinsen wrote:

> I don't think my opinion is so relevant; my field of work is rather
> different from the Loci project. I work on structures

Are you kidding?  Half of Loci will be for structural analyses!  People think
bioinformatics is just about sequence analyses, and I believe wrongly so. 
Because many people exclude structural analyses, we were careful to name The BIC
Group, "The Biomolecular Informatics and Computation Group".  So we are involved
in informatics plus anything else that involves computers and biology.

I want to make certain that we include structural analysis tools in our list of
analysis tools to be used.

> and BioML
> does not seem to have any provision for structures at all. Which is
> fine, of course, not everything has to be designed for my needs ;-)

I recall some BioML examples with structural data.  Unless you're talking about
BSML.  But you'll say that including structural data and making a good provision
for it are completely different ;-)

> I am not even sure that a data format is copyrightable. If it is, the
> current downloadable DTD does not contain any copyright statement or
> usage restrictions, so I don't see why it shouldn't be used for
> commercial applications.

Hmmm.  And we didn't hear back from them.

> That aside, I did notice a couple of strange features in and about
> BioML that make me wonder whether it is the format of choice. First,
> and most importantly, I have the impression that the inventors have
> not quite understood the point of XML - separating content from
> layout. BioML contains some purely graphical entity definitions, for
> example &paragraph; defined as &newline;&tab;. In my opinion such
> things should never appear in XML files. Paragraphs should be marked
> up with a paragraph tag, whose visual interpretation is left to a
> stylesheet definition.

That is strange and maybe a good reason to not use it.

> Second, the BioML inventors seem to be more Windows-centric than
> Microsoft itself. Who would have the crazy idea of offering
> documentation in portable HTML format only as a self-extracting
> archive for Windows? Of course this doesn't affect the language,
> but I'd hate to see the next release contain tags for defining
> COM objects...

Windows is the world...not.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Tue Jan 26 20:04:11 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:08 2006
Subject: [Pipet Devel] BioML vs BSML
References: <Pine.OSF.4.03.9901252356030.6613-100000@busboy.sped.ukans.edu>
Message-ID: <36AE660B.4D9707B7@bc.edu>

Justin Bradford wrote:

> Does BSML not fulfill all of the requirements Loci needs?

The Bioinformatic "Sequence" ML is pretty much for just that.  Although they
claim you can embed a PDB (Protein Data Bank) file inside of BSML.  But Konrad is
not a fan of PDB either.

> I'm guessing so, since CML was also planned.
> If so, what's missing?

BSML is missing any decent description of structure, and CML is missing an
acceptable description of structure for molecules larger than what organic
chemists deal with.

We actually can ignore the small chemical descriptions for Loci.  If only we had
something that was as good with sequences as BSML and as good with large
molecule structure as CML is with small molecule structure.

> A visualization program is going to have to know the format of the data it
> gets back from the analysis program (obviously), so the XML translation
> wrappers will have to be consistent. Now, we could use two different
> languages, but a viewer may want data from two different tools, each with
> a different ML (markup language).

How about making our own XML?  I think having four XML's has already diluted the
field so that we can't complain about our XML being a proprietary format.  I
think Justin and Konrad could coordinate this effort, and the others can offer
input on sequence representations.  Really, we can get much of the sequence part
from what we like about BSML and BioML.

This may actually be necessary if we are to embed queries and commands into the
documents.

Konrad, you thought we might want to do this back when we had only three people
involved.  Maybe we can call it "LocusML" or "Bio-Object ML" (BOML) or
"Bio-Macromolecule ML" (BMML).

Give me some feedback.

> Also, we'll be wanting to chain several tools together, which is going to
> require tools taking input data from a ML, right?

Yep.

> But we also want control information tagging along with the object? And
> that would also be XML data?

Yepper.

> Furthermore, I'd like it if this thing could query/update databases, too
> (ie, a glyph for submitting my new protein structure to Brookhaven, or get
> the sequence for some gene out of the GDB, etc.)

You mean have a Loci _tool_ for this?  You're not talking about XML here.

> Now let me see if I understand the system so far.
> Paos is the network transport layer. But which end does the server run on?
> Jeff made a comment earlier implying the Paos server runs on the user's
> machine.

I believe we can have multiple Paos servers.  Exactly where they go, I'm not
sure.

BTW, Carlos wrote in some detail about Paos and Loci in his e-mail messages from
Monday.

> One client is the GCL/viewer/monitor and one is on the actual
> machine running the analysis tool. But how would a connection be made
> between the server and the analysis client? Doesn't the Paos server have
> to be on the analysis end?

(I'm sorry about using the word "client" to describe the user's machine.  Of
course it also describes a program that communicates with a server.  When I say
client, I mean local machine.)

Yes, I think Paos can reside on both the server and client.  Carlos will have
some documentation for us that can clear things up, and I think there is a
README at the Paos Web site.

@@@
> Also, a workflow/batch control system is in charge of directing the
> movements of the object (via Paos). In case of failure, the Paos object is
> updated with some exception, and the workflow system is notified and deals
> with it appropriately.

Yes sir!

> Throughout this process, the workflow system is also updating the Paos
> object with current status

The XML object can be changed, yes.

> and the analysis programs update the object
> (or create new ones?), which the monitor client is displaying for the
> user.

Yes, the GCL glyph, which can open a window to show current status.

> When complete, the visualization/viewer program is notified, takes
> the Paos object and renders it for the user.

Right!

> Am I close?

Oh ya!

> If so, it makes sense to use the Paos object to store control, exception,
> and status info. Data for analysis and analyzed data are stored in
> separate attributes.

Yes.  These are complications that may require us to write our own XML.

> The gatekeeper takes the data from the appropriate
> attribute (as told by relevant control information), modifies it as
> necessary for the analysis tool, and runs that tool.

Now we are back to analyzing the XML data (Paos object), back up to where I
typed @@@.  These are not two types of analyses.  The gatekeeper will work with
the workflow system, etc.

> Output is then committed to the Paos object (after conversion to the
> appropriate XML dialect by the gatekeeper), and the workflow system
> decides what to do next (depending on control info), until eventually, it
> is handed back to the user's client.

Yes!  I think you know just what I've been thinking.

> In this model, the workflow system is a Paos server/client combo. It
> would get the original object from the user, hand that to an analysis
> server, but keep a local copy updated, which the user (status monitor)
> would access for updates.

I'm not sure about keeping a local copy of the data.  You say that the data
would be updated, which would require the whole XML object to be transferred many
times.  I was thinking only once at the end, but the analysis locus could just
keep reporting what is being done...like writing a log file.
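
Here is a minimal Python sketch of what I mean by reporting like a log file.
StatusLog and its methods are made-up names for illustration, not part of Paos
or Loci.  The monitor asks only for the entries it has not seen yet, so the big
XML object never has to travel until the end.

```python
class StatusLog:
    """Hypothetical append-only log kept by the analysis locus."""

    def __init__(self):
        self.entries = []

    def report(self, message):
        # The analysis side appends short messages as it works.
        self.entries.append(message)

    def since(self, position):
        """Return the entries after `position` and the new position."""
        return self.entries[position:], len(self.entries)
```

The monitor remembers the position it got back and passes it in on the next
poll, so each poll transfers only a few short lines.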

> ...and then repeat the whole process (ie.
> give the object to the next analysis server, ...)

Yes, when GCL is used to automate some analyses.

> All the user client stuff accesses the workflow system directly, which deals
> with the individual analysis servers. This runs as a separate process, so
> you might have a server running this. The client starts up his Loci
> GCL program on a networked computer anywhere, builds the analysis batch,
> starts it, gets an ID number, and can close the program and walk away.

I never thought of that, but it's a great idea!

> Then from any other computer with Loci (or via the web when that interface
> is done), enters the batch ID, and can see everything that has happened to
> it so far along with its current status.

Hmmm.  Turning the client off and getting the data from another client means
the server needs to know the original client is off and that the information
should be held until the ID is provided.  I think it'll work.  The server may
keep a copy on file for a time specified by the user.  That way, the server
doesn't have to probe for the client loci that sent the data.
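
A minimal Python sketch of the hold-until-asked idea, under the assumption
that the server hands out a batch ID and keeps the result for a user-chosen
time.  BatchStore, submit and fetch are hypothetical names, and the clock is
passed in as a plain number so the behavior is easy to follow.

```python
import itertools


class BatchStore:
    """Hypothetical server-side store for finished batches."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._held = {}  # batch_id -> (result, expiry time)

    def submit(self, result, keep_seconds, now):
        """Hold a result and return the ID the user will quote later."""
        batch_id = next(self._ids)
        self._held[batch_id] = (result, now + keep_seconds)
        return batch_id

    def fetch(self, batch_id, now):
        """Return the held result, or None if unknown or expired."""
        entry = self._held.get(batch_id)
        if entry is None:
            return None
        result, expires = entry
        if now >= expires:
            del self._held[batch_id]  # drop results past their hold time
            return None
        return result
```

Any client that knows the ID can call fetch, so the server never has to go
looking for the machine that submitted the batch.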

> When it's done, the user can
> save the object locally for future reference (or maybe it's moved to a
> networked Loci archive system [just a Paos server]).

Yes.  The object will appear to the user as a Loci object in the file open
dialog, and it will appear as a larger glyph on the benchtop.  It won't have to
go through any translation again.

> Of course, the workflow process could be run locally as well, along with
> all of the analysis tools.

Yes yes yes yes!!!

> Also, the workflow system could implement more
> than just Paos network connection to the analysis programs, such as CORBA,
> COM, IRC (biobots!), etc. all of which would be transparent to the client
> tools.

Yes!  Each connection is filtered by a porta locus, just like the Porta Internet
& Gatekeeper combination.

> So is that what everyone was already thinking?

The rain in Spain stays mainly in the plain...Yes! By George I think he's got
it!

> Also, whenever I said "analysis tool/server", that could be replaced with
> "database query/update".

It sure can.

> 
> Now what does the query language look like, and how do we embed info from
> analysis and db access early in the batch into later queries. Especially
> if we have multiple XML dialects that the tools speak in. Ugh. Well I have
> 2.5 hours of day-dreaming/class tomorrow to come up with something.

Well, again, if we make up our own system it will be less complicated...but
we'll have more work.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From justin at ukans.edu  Tue Jan 26 22:52:19 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:09 2006
Subject: [Pipet Devel] BioML vs BSML
In-Reply-To: <36AE660B.4D9707B7@bc.edu>
Message-ID: <Pine.OSF.4.03.9901262055460.23430-100000@busboy.sped.ukans.edu>

> How about making our own XML?  I think having four XML's has already diluted the
> field so that we can't complain about our XML being a proprietary format.  I
> think Justin and Konrad could coordinate this effort, and the others can offer
> input on sequence representations.  Really, we can get much of the sequence part
> from what we like about BSML and BioML.

I was thinking the same thing too. Nothing seems to do exactly what we
want, and it will be simpler for querying purposes if we only deal with
one XML. Conversion to other formats could also be done, for exporting the
data outside of the Loci system.

> Give me some feedback.

It would help if the people with lots of experience using existing formats
could comment on how they'd like it to work.

> > But we also want control information tagging along with the object? And
> > that would also be XML data?

What about storing control information in the Paos object, rather than in
the XML? Or could we make the Paos object a mirror of the XML format?
The purpose of this should become clearer as I explain other things.

> > Furthermore, I'd like it if this thing could query/update databases, too
> > (ie, a glyph for submitting my new protein structure to Brookhaven, or get
> > the sequence for some gene out of the GDB, etc.)
> 
> You mean have a Loci _tool_ for this?  You're not talking about XML here.

Well, whatever we use to describe queries should be capable of querying
and updating databases, ideally. That way, a database dependent step could
be as simple as an analysis step. This would require a gatekeeper
interface, of course. I just want to make sure we can fit it in
seamlessly.
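
A minimal sketch of what "a database step as simple as an analysis step"
could look like, assuming every step sits behind one common gatekeeper
interface. The class names and the run() method are hypothetical, a dict
stands in for the database, and none of this is Paos or Loci code.

```python
from abc import ABC, abstractmethod


class Gatekeeper(ABC):
    """Hypothetical common interface for any step in a batch."""

    @abstractmethod
    def run(self, data: str) -> str:
        """Take input data (e.g. XML text) and return result data."""


class AnalysisGatekeeper(Gatekeeper):
    """Wraps an analysis tool; here the 'tool' is just a Python callable."""

    def __init__(self, tool):
        self.tool = tool

    def run(self, data):
        return self.tool(data)


class DatabaseGatekeeper(Gatekeeper):
    """Wraps a database lookup; a dict stands in for the real database."""

    def __init__(self, records):
        self.records = records

    def run(self, data):
        # The input is treated as a lookup key; unknown keys raise KeyError.
        return self.records[data]


# A database step followed by an analysis step, driven the same way.
steps = [
    DatabaseGatekeeper({"geneX": "ACGT"}),
    AnalysisGatekeeper(lambda seq: seq[::-1]),
]
result = "geneX"
for step in steps:
    result = step.run(result)
```

The loop at the end never needs to know which kind of step it is running,
which is the property asked for above.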

> Yes, I think Paos can reside on both the server and client.  Carlos will have
> some documentation for us that can clear things up, and I think there is a
> README at the Paos Web site.

A Paos client has to make a connection to a Paos server. Therefore, there
must be a Paos server answering requests wherever an analysis tool is
located.

> > Also, a workflow/batch control system is in charge of directing the
> > movements of the object (via Paos). In case of failure, the Paos object is
> > updated with some exception, and the workflow system is notified and deals
> > with it appropriately.
> 
> Yes sir!

But there has to be something around constantly to monitor these Paos
objects throughout their lifetime. This would be the workflow system
(wfs). It would be responsible for directing objects, keeping track of
their status, and providing an interface for the user to check up on it.

> > Throughout this process, the workflow system is also updating the Paos
> > object with current status
> 
> The XML object can be changed, yes.

Now is the XML object in the Paos object, or are they the same thing?
Since Paos can deliver updates on only specific attributes, I wanted to
take advantage of that. Like I mentioned earlier, the Paos object could be
a representation of the XML format we create, or it could contain XML data
from analysis steps. In the latter case, other attributes of the Paos
object would contain status and control information. That way it could be
updated "atomically", regardless of the other XML data it contains.

> > If so, it makes sense to use the Paos object to store control, exception,
> > and status info. Data for analysis and analyzed data are stored in
> > separate attributes.
> 
> Yes.  These are complications that may require us to write our own XML.

Again, would it make sense for this to be in the Paos object, the XML it
contains, or is there a difference?

> > The gatekeeper takes the data from the appropriate
> > attribute (as told by relevant control information), modifies it as
> > necessary for the analysis tool, and runs that tool.
> 
> Now we are back to analyzing the XML data (Paos object), back up to where I
> typed @@@.  These are not two types of analyses.  The gatekeeper will work with
> the workflow system, etc.

Maybe. I was thinking that the Paos object contained the XML data in an
attribute, which was extracted and presented to the gatekeeper depending
on what it was supposed to do with it.
But if the whole Paos object is an XML representation, then the gatekeeper
takes what it needs.

> > In this model, the workflow system is a Paos server/client combo. It
> > would get the original object from the user, hand that to an analysis
> > server, but keep a local copy updated, which the user (status monitor)
> > would access for updates.
> 
> I'm not sure about keeping a local copy of the data.  You say that the data
> would be updated, which would require the whole XML object to be transferred many
> times.  I was thinking only once at the end, but the analysis locus could just
> keep reporting what is being done...like writing a log file.

Ok. I think you had envisioned the gatekeeper just dealing with the
whole XML file, which contained control and status info, and was stored
in the Paos object. I want to take advantage of the object nature of Paos,
and use multiple attributes on the object. One for control, one for
status, one for data storage (the XML returned by the analyses).
That way status could be updated individually of the rest of the XML data.
Of course, it would be even better if the Paos object was simply a
representation of the XML data. Then analyses could be updated atomically,
too. Also, this way, the user client wouldn't have to parse XML. It would
be provided with an object-oriented view of it right away. Sort of like
DOM, for which we could even provide an interface, too.
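
A rough sketch of the multiple-attribute layout, assuming a plain Python
class as a stand-in for a Paos object. The class and attribute names are
hypothetical and this is not the Paos API; it only shows status being
changed without touching the stored XML.

```python
class BatchObject:
    """Hypothetical stand-in for a Paos object; not the real Paos API."""

    def __init__(self, control, data):
        self.control = control      # list of steps still to run
        self.status = "queued"      # small, frequently updated
        self.data = data            # input handed to the first step
        self.results = []           # XML text returned by each analysis

    def set_status(self, status):
        # Touches only one attribute; the (possibly large) XML in
        # self.results is neither read nor re-sent.
        self.status = status

    def add_result(self, xml_text):
        self.results.append(xml_text)


obj = BatchObject(control=["blast", "clustal"], data="<seq>ACGT</seq>")
obj.set_status("running blast")
obj.add_result("<hits/>")
```

Whether Paos can deliver an update to one attribute like this is exactly
the open question for Carlos; the sketch only illustrates the intent.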

> > ...and then repeat the whole process (ie.
> > give the object to the next analysis server, ...)
> 
> Yes, when GCL is used to automate some analyses.

GCL is used to build the control data. The workflow system does the work,
according to the control information.
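
As a sketch of that split, assume the control data is just an ordered list
of step names and the tools are Python callables looked up by name. The
dict keys and the run_batch name are made up for illustration; a failure
is recorded on the object, as described earlier in the thread.

```python
def run_batch(obj, tools):
    """Run each control step in order, recording status on obj (a dict)."""
    for step in obj["control"]:
        obj["status"] = "running " + step
        try:
            obj["data"] = tools[step](obj["data"])
        except Exception as exc:
            # Record the failure on the object and stop the batch.
            obj["status"] = "failed at " + step
            obj["exception"] = repr(exc)
            return obj
    obj["status"] = "done"
    return obj


tools = {"upper": str.upper, "reverse": lambda s: s[::-1]}

# A batch whose steps all exist.
batch = {"control": ["upper", "reverse"], "data": "acgt", "status": "queued"}
run_batch(batch, tools)

# A batch that names a tool nobody provides.
broken = {"control": ["upper", "missing"], "data": "acgt", "status": "queued"}
run_batch(broken, tools)
```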

> > All the user client stuff accesses the workflow system directly, which deals
> > with the individual analysis servers. This runs as a separate process, so
> > you might have a server running this. The client starts up his Loci
> > GCL program on a networked computer anywhere, builds the analysis batch,
> > starts it, gets an ID number, and can close the program and walk away.
> 
> I never thought of that, but it's a great idea!

I had considered the possibility of our objects (which, for clarification,
refers to a batch of controls, data to be analyzed, data already analyzed,
and various status information) roaming independently of a "central"
server. They could be passed from gatekeeper to gatekeeper directly.
However, that would make it impossible to monitor them, unless the object
"called home" every now and then. But I don't like that. It makes more
sense for the user to query the object when it wants information.
For that to be possible, there has to be some constant, central server
which is watching the object. This would be the workflow system.
It's in charge of directing the object, and it constantly keeps tabs on
its status. This is why I want atomic updates on status info.

The workflow system (wfs) is really a Paos server, but it only talks to
user clients. However, it pretends to be a Paos client to communicate with
the Paos server associated with an analysis tool. When sending an object
to be analyzed, the wfs commits the object to the remote (analysis)
server. It also requests notification on all updates to its status
attributes. The copy of the object local to the wfs is updated with the
remote status info. When analysis is complete, the wfs syncs its copy
with the remote copy, and then removes the remote copy.

Now, at any time, a user client can access the wfs, and get the status
information from the copy on the wfs. The user will always know where the
wfs is, since it's running locally (either just for that one user, or
maybe a department or university-wide instance).
When an object completes, it gets moved to an archive section of the wfs.
The user client accesses this object via a unique ID.

Since the wfs is networked, the object can be accessed from any Loci user
client. The user just has to know the wfs location and the object ID.
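
The life cycle above could be sketched roughly as follows, assuming an
in-memory stand-in with no networking at all. WorkflowSystem and its
method names are hypothetical, and uuid is used only as one convenient way
to produce a unique ID.

```python
import uuid


class WorkflowSystem:
    """Hypothetical in-memory stand-in for the wfs; no networking."""

    def __init__(self):
        self.active = {}    # object ID -> batch still being processed
        self.archive = {}   # object ID -> completed batch

    def submit(self, batch):
        """Accept a batch and hand back the ID the user keeps."""
        object_id = uuid.uuid4().hex
        batch["status"] = "queued"
        self.active[object_id] = batch
        return object_id

    def update_status(self, object_id, status):
        # Called when the remote analysis server reports a change.
        self.active[object_id]["status"] = status

    def complete(self, object_id, result):
        # Sync the final result, then move the object to the archive.
        batch = self.active.pop(object_id)
        batch["status"] = "done"
        batch["result"] = result
        self.archive[object_id] = batch

    def lookup(self, object_id):
        """Any client that knows the ID can ask; the wfs never calls out."""
        if object_id in self.active:
            return self.active[object_id]
        return self.archive[object_id]


wfs = WorkflowSystem()
ticket = wfs.submit({"control": ["blast"], "data": "ACGT"})
wfs.update_status(ticket, "running blast")
wfs.complete(ticket, "<hits/>")
```

Note that lookup() needs nothing from the client that submitted the batch,
only the ID, so the original client can be long gone.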

> Hmmm.  Turning the client off and getting the data from another client, means
> the server needs to know the original client is off and that the information
> should be held until the ID is provided.  I think it'll work.  The server may
> keep a copy on file for a time specified by the user.  That way, the server
> doesn't have to probe for the client loci that sent the data.

See above; the wfs doesn't care if the original client is still around. It
just holds onto the object until someone comes along to retrieve it.
The wfs shouldn't seek out the user client. The user client comes to it.
Also, the object ID is provided when the object is first started.
Click "Start Analysis", and the wfs responds with "OK, here's your
ID." The user client should have an option to keep track of those for you,
but the user should also be able to access the object from any other Loci
user client, just using that info.

> Well, again, if we make up our own system it will be less complicated...but
> we'll have more work.

That's probably the next step. Although, I'm curious about your thoughts
on how to use the Paos object. I'm becoming rather fond of the object
representation of the XML format.

Although, I should make sure Paos is capable of this.
Carlos, can it handle complex Python container classes?
And can we update elements in it atomically?

If not, just using separate attributes on the Paos object for the various
components would work (the status, control, query, and data attributes).

Justin Bradford
justin@ukans.edu


From hinsen at cnrs-orleans.fr  Wed Jan 27 11:04:58 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:09 2006
Subject: [Pipet Devel] BioML vs BSML
In-Reply-To: <36AE56B4.3D1429F0@bc.edu> (bizzaro@bc.edu)
References: <Pine.LNX.3.96.990123105527.7480A-100000@cx408397-a.irvn1.occa.home.com> <36AA5475.7BAE2552@bc.edu> <199901251011.LAA21088@dirac.cnrs-orleans.fr> <36AE56B4.3D1429F0@bc.edu>
Message-ID: <199901271604.RAA18094@dirac.cnrs-orleans.fr>

   Are you kidding?  Half of loci will be for structural analyses!  People think
   bioinformatics is just about sequence analyses, and I believe wrongly so. 

Fine, then I can perhaps contribute more than just Python expertise...

   I recall some BioML examples with structural data.  Unless you're talking about

Such as? I didn't find any (but after the second example my network connection
broke down...), and neither did I find any way to specify structure
in the BioML documentation.

Konrad
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From hinsen at cnrs-orleans.fr  Wed Jan 27 12:42:59 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:09 2006
Subject: [Pipet Devel] BioML vs BSML
In-Reply-To: <36AE660B.4D9707B7@bc.edu> (bizzaro@bc.edu)
References: <Pine.OSF.4.03.9901252356030.6613-100000@busboy.sped.ukans.edu> <36AE660B.4D9707B7@bc.edu>
Message-ID: <199901271742.SAA17578@dirac.cnrs-orleans.fr>

   Konrad, you thought we might want to do this back when we had only three people
   involved.  Maybe we can call it "LocusML" or "Bio-Object ML" (BOML) or
   "Bio-Macromolecule ML" (BMML).

Fine with me, and I'd certainly use it for other applications as well.

On the other hand, it is possible to design DTDs by extending existing
ones. Perhaps this is a good idea to save effort and keep compatibility
to some extent.

Konrad

-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From bizzaro at bc.edu  Mon Jan  4 12:26:12 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:11 2006
Subject: [Pipet Devel] [Fwd: Development]
Message-ID: <3690F9B4.70E55F46@bc.edu>

From Thomas...

-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--
-------------- next part --------------
An embedded message was scrubbed...
From: Thomas Sicheritz <thomas@evolution.bmc.uu.se>
Subject: Development
Date: Mon, 4 Jan 1999 09:49:37 +0100 (MET)
Size: 2305
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990104/9ebecdf5/attachment.mht
From bizzaro at bc.edu  Mon Jan  4 14:59:13 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:11 2006
Subject: [Pipet Devel] to do list
Message-ID: <36911D91.37748194@bc.edu>

Justin,

Here is a copy of the to do list I recently sent to the others:


The next step is to build a list, not of analysis tools, but of dynamic
client-side interfaces.  EMBOSS will take care of the big analysis tools for
us.  For now, here is a brief list of some dynamic "loci" I've been thinking
of.  Please feel free to add to this:

  (1)  Benchtop/workspace.  GUI representation of all data objects
       (files, documents, graphs) and possibly various loci.  Also
       may be used for automation of analyses (recall GCL?).

  (2)  File translation interface:  to read in various DNA/protein
       document formats and convert them to XML.  Also may be used
       to query databases and sort/compile documents.

  (3)  Sequence visualization/editing tool:  to manipulate DNA/protein
       sequences

  (4)  Sequence comparison tool:  to show multiple sequences aligned
       or translated.  May also perform some functions of (6)

  (5)  3D visualization tool:  to display molecules as 3D structures, with
       emphasis on a schematic/cartoon representation.

  (6)  Graphing tool:  to display plots against sequences and to make simple
       graphs.  Some may argue this isn't needed, but I need it for my
       programs, so others may too ;-)

  (7)  HTML browser implementation:  separate from the other tools, this
       would be a way for anyone with a browser to access analysis loci.

The best approach may be for each of us to pick a single tool to concentrate
on.  And remember, most of these tools will be XML browsers of a sort.
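
To make item (2) a little more concrete, here is a small sketch of a file
translation step that reads FASTA text and emits XML using Python's
standard xml.etree.ElementTree module. The element and attribute names
(sequences, sequence, name) are invented for the example; no XML format
has been agreed on yet.

```python
import xml.etree.ElementTree as ET


def fasta_to_xml(text):
    """Convert FASTA text to an XML string with made-up element names."""
    root = ET.Element("sequences")
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            current = ET.SubElement(root, "sequence", name=line[1:])
            current.text = ""
        elif current is not None:
            # Sequence lines are joined into one string per record.
            current.text += line
    return ET.tostring(root, encoding="unicode")


example = ">seq1 test\nACGT\nTTGA\n>seq2\nGGCC\n"
xml_text = fasta_to_xml(example)
```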

We should also make a list of "trivial" analysis/conversion tools for the
client-side.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Mon Jan  4 15:45:45 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:12 2006
Subject: [Pipet Devel] [Fwd: Development]
References: <3690F9B4.70E55F46@bc.edu>
Message-ID: <36912879.DBCE52D@bc.edu>

> Hmm - right now I feel I have time to get familiar with python ... so maybe I
> am going to try to build a sequence editor ... any suggestions ?

Great!

The sequence vis/edit locus should be closely tied to the sequence comparison
locus: I think users may open up various sequences (using the file/document
translation locus) into the sequence comparison locus and then double-click on a
sequence to see it in the vis/edit tool.

Users may also load a sequence directly into the vis/edit locus, bypassing the
comparison locus.

Once in the vis/edit locus, sequences should be treated like an image in the
GIMP or Photoshop:  Users should be able to click-drag-select segments of a
sequence just like an area in an image.  I imagine the background of the
selected segment would transition from white to black...or maybe a dashed-line
box will surround the segment...I'd prefer the former.

Once selected, the segments can be cut (^X) copied (^C) pasted (^V) or deleted
(^D).  The user should also be able to zoom in and out on the sequence...zooming
in to the resolution of one residue.  The mouse pointer can point out where
selections or insertions occur.  I'd also like to see a box on the side that
shows the start and stop positions of selections, in numerical values.  The menu
bar should contain a file menu with open, close and exit...and an edit menu with
copy, cut, paste and delete...maybe even undo...these are obvious standards.
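
Leaving the GUI aside, the cut/copy/paste/delete behaviour could be
modelled along these lines. This is only a sketch: the SequenceEditor
class is hypothetical, the sequence is a plain Python string, and
positions are 0-based with an exclusive end.

```python
class SequenceEditor:
    """Hypothetical editing model; positions are 0-based, end-exclusive."""

    def __init__(self, residues):
        self.residues = residues
        self.clipboard = ""

    def copy(self, start, stop):
        self.clipboard = self.residues[start:stop]

    def delete(self, start, stop):
        self.residues = self.residues[:start] + self.residues[stop:]

    def cut(self, start, stop):
        self.copy(start, stop)
        self.delete(start, stop)

    def paste(self, position):
        self.residues = (
            self.residues[:position] + self.clipboard + self.residues[position:]
        )


editor = SequenceEditor("ATGGCCTAA")
editor.cut(3, 6)      # removes "GCC", leaving "ATGTAA"
editor.paste(0)       # inserts the clipboard contents at the front
```

The start and stop values are the same numbers the side box would show
for the current selection.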

I can't recall exactly how your Tcl/Tk editor works.  I may have described much
of it already.  I think this is a fun tool to be working on.

Also, take a look at the graphics on this Web site:

  http://www.latrobe.edu.au/www/genetics/compmap.96.01.html

It is a chromosome map comparison tool (which may be a part of what you're going
to do...or another tool?), but I like the graphics.  With the gnome-canvas
widget (see below) we will be able to make anti-aliased shapes like this.


> 
> I don't know anything about Python's way to handle classes - is there any
> reason for me to code the sequence classes in C++ ? - or would it be enough
> to let  python handle the basic sequence object and code the heavy number
> crunching part in C++/C ?

From my experience, Python may be better at handling this sort of thing than
even C++, but Konrad is the best person to answer this question right now.  By
the way, C should be used for number crunching rather than C++.  We discussed
the "problems" with C++ before you came aboard, and we feel that C is more
portable and more directly linked with Python and GTK.  So the Tulip core
distribution should be all Python and ANSI-C.  Third party add-ons can be
whatever...we just want the core to be consistent.


> 
> I think I already know from my Tcl/Tk sequence editor what
> solutions/ways I definitely should avoid :-)
> - anybody else with tips/hints/critics ?
> If not, I am going to bugger my printer with ... some ... pages of python
> and GTK manuals/references.
> 

I'm sure you got my message about PyG Tools, but again, you may want to start at
my page:

  http://www.uml.edu/Dept/Chem/BICGroup/PyGTools/

The first widget binding I think you'll want to get familiar with is the
gnome-canvas (part of PyGNOME).  It is supposed to be similar to the Tk canvas.


-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Mon Jan  4 16:12:09 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:12 2006
Subject: [Pipet Devel] [Fwd: Development]
References: <3690F9B4.70E55F46@bc.edu> <36912879.DBCE52D@bc.edu>
Message-ID: <36912EA9.437D88C4@bc.edu>

> Once in the vis/edit locus, sequences should be treated like an image in the
> GIMP or Photoshop:  Users should be able to click-drag-select segments of a
> sequence just like an area in an image.  I imagine the background of the
> selected segment would transition from white to black...or maybe a dashed-line
> box will surround the segment...I'd prefer the former.
> 

Hmmm.  Now that I think about it, it should also be like a word processor.  The
user can position the mouse pointer between two residues and click on the left
mouse button to move the "cursor" to that spot.  He/she can even start typing
out letters from the keyboard, inserting residues as they type.  Hold down the
shift key and press left or right arrow keys and the residues are selected one
at a time.  The backspace key will delete residues on the left...you get the
idea :-)

Also, when a sequence is edited, we should keep a note in the XML of what was
added or removed and where.  So, maybe a red vertical line will appear in a
sequence where something was...spliced.
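
A sketch of the word-processor behaviour with an edit record, assuming the
record is kept as a simple list of (operation, position, residues) tuples
that could later be written into the XML. The LoggedSequence class and the
tuple layout are hypothetical.

```python
class LoggedSequence:
    """Hypothetical buffer that records every insertion and deletion."""

    def __init__(self, residues):
        self.residues = residues
        self.cursor = len(residues)   # cursor sits between residues
        self.log = []                 # (operation, position, residues)

    def move_cursor(self, position):
        # Clamp so the cursor always stays inside the sequence.
        self.cursor = max(0, min(position, len(self.residues)))

    def type_residues(self, text):
        pos = self.cursor
        self.residues = self.residues[:pos] + text + self.residues[pos:]
        self.log.append(("insert", pos, text))
        self.cursor = pos + len(text)

    def backspace(self):
        if self.cursor == 0:
            return
        pos = self.cursor - 1
        removed = self.residues[pos]
        self.residues = self.residues[:pos] + self.residues[pos + 1:]
        self.log.append(("delete", pos, removed))
        self.cursor = pos


seq = LoggedSequence("ACGT")
seq.move_cursor(2)
seq.type_residues("TT")   # sequence becomes "ACTTGT", cursor at 4
seq.backspace()           # removes the residue left of the cursor
```

The positions stored in the log are where a marker such as the red
vertical line could be drawn.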


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From Thomas.Sicheritz at molbio.uu.se  Tue Jan  5 04:22:07 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:12 2006
Subject: [Pipet Devel] [Fwd: Development]
In-Reply-To: <36912879.DBCE52D@bc.edu>
References: <3690F9B4.70E55F46@bc.edu>
	<36912879.DBCE52D@bc.edu>
Message-ID: <13969.55581.685813.418675@beagle.bmc.uu.se>

 > I can't recall exactly how your Tcl/Tk editor works.  I may have
 > described much of it already.  I think this is a fun tool to be working
 > on.

You can look at a VERY stripped tclet ( Tcl/Tk Applet) version at
http://evolution.bmc.uu.se/~thomas/loci/xbblet.tcl

-thomas
-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From bizzaro at bc.edu  Tue Jan  5 15:20:14 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:12 2006
Subject: [Pipet Devel] [Fwd: Development]
References: <3690F9B4.70E55F46@bc.edu>
		<36912879.DBCE52D@bc.edu> <13969.55581.685813.418675@beagle.bmc.uu.se>
Message-ID: <369273FE.388311F4@bc.edu>

Thomas.Sicheritz@molbio.uu.se wrote:
> 
>  > I can't recall exactly how your Tcl/Tk editor works.  I may have
>  > described much of it already.  I think this is a fun tool to be working
>  > on.
> 
> You can look at a VERY stripped tclet ( Tcl/Tk Applet) version at
> http://evolution.bmc.uu.se/~thomas/loci/xbblet.tcl
> 

Yes it is very much how I described :-)  But the editing locus I was thinking of
would make use of XML to display the context of the sequence...if available. 
That is, a sequence that comes from a GenBank document would _always_ retain
information on where the UTR, CDS, exon and intron, etc. regions are.  So I see
a sequence in the editor displayed as a "genetic map".  You know how molecular
biologists draw out horizontal bars of different colors representing different
regions, binding sites, etc.  I think the editor can show a color-coded map with
ACGT bases transposed over the map or maybe sitting below it.  Looking at
molecular bio journals might help to find an informative and attractive
solution.
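
One way to picture the data behind such a map, as a sketch only: each
region becomes a feature with a kind and a start/stop, and the editor
derives one colour per base from the features. The Feature class, the
colour table and the coordinates are all made up for the example; real
GenBank feature tables are richer than this.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    kind: str    # e.g. "CDS", "exon", "intron", "UTR"
    start: int   # 0-based, inclusive
    stop: int    # 0-based, exclusive


# Hypothetical colour scheme; any mapping would do.
COLOURS = {"UTR": "grey", "CDS": "blue", "exon": "green", "intron": "white"}


def colour_track(length, features, default="none"):
    """Return one colour name per base; later features overwrite earlier."""
    track = [default] * length
    for feat in features:
        for i in range(max(0, feat.start), min(length, feat.stop)):
            track[i] = COLOURS.get(feat.kind, default)
    return track


sequence = "ACGTACGTAC"
features = [Feature("UTR", 0, 3), Feature("CDS", 3, 8)]
track = colour_track(len(sequence), features)
```

A drawing routine could then paint each base's background from track and
print the ACGT letters over or under it.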

I know we don't have the facility right now to go from GenBank to XML to
representation.  This facility might be a good starting point for Justin and
others who are well versed in XML/DOM...hint hint ;-)


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Tue Jan  5 16:33:39 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:12 2006
Subject: [Pipet Devel] example GUIs
Message-ID: <36928533.BAA16209@bc.edu>

DNAstar and MacVector are two "competing" application suites for sequence
analysis.   They have some very nicely made GUIs.  Take a look:

    http://www.dnastar.com/products/products.html
    http://www.oxmol.com/prods/macvector/work/

I wonder if DNA sequence analysis tools should be different programs from
protein (or polypeptide) sequence analysis tools, or maybe a single program such
as the sequence editor can switch between the two?  Of course they present some
very different problems...but then again...?  What do you guys think?

We should also consider different types of genetic maps, according to the
system: chromosome vs. bacterial circular genome vs. plasmid vs. viral genome. 
Even proteins can be represented in several different ways:  primary/sequence
vs. secondary vs. tertiary vs. quaternary.  I'm just thinking about whether
we'll need one big tool to show these or many smaller tools.  I tend to favor
many small loci here.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Tue Jan  5 16:43:06 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:12 2006
Subject: [Fwd: [Pipet Devel] [Fwd: Development]]
Message-ID: <3692876A.FB4AD56B@bc.edu>

...
-------------- next part --------------
An embedded message was scrubbed...
From: "J.W. Bizzaro" <bizzaro@bc.edu>
Subject: Re: [Pipet Devel] [Fwd: Development]
Date: Tue, 05 Jan 1999 20:20:14 +0000
Size: 1995
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990105/8d927766/attachment.mht
From bizzaro at bc.edu  Wed Jan  6 12:59:28 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:12 2006
Subject: [Pipet Devel] [Fwd: BioWidgets]
Message-ID: <3693A480.91B49F47@bc.edu>

Attached is from David...

I have these sites bookmarked, and they are good examples.  I recall David
Searls's BioTk from a Gene-COMBIS article...very nice.  Thomas, you're probably
familiar with it.  BTW, what happened to the "BioWidgets Consortium"?  It hasn't
been updated in 1.5 years!  Many of the links are broken.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--
-------------- next part --------------
An embedded message was scrubbed...
From: david.lapointe@umassmed.edu
Subject: BioWidgets
Date: Wed, 6 Jan 1999 11:18:51 -0500
Size: 1552
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990106/af6d37c2/attachment.mht
From bizzaro at bc.edu  Fri Jan  8 02:54:53 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:12 2006
Subject: [Pipet Devel] more on gnome-canvas
Message-ID: <3695B9CD.FA8B3670@bc.edu>

FYI, from the GNOME home page:

New GNOME Canvas Information

The Canvas is a very exciting feature of GNOME. It allows very high level
manipulation of objects. The programmer need not worry about handling the
redrawing of the canvas during expose events, the Canvas does all this for you.
The end result is a wonderful API that allows extremely rapid application
development. 

The latest version of the Canvas in gnome-libs has the new antialiased rendering
engine incorporated in it. See the GNOME Canvas Development page for screenshots
and more information. (Warning: graphics intensive)

    http://www.gnome.org/devel/canvas/


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From Thomas.Sicheritz at molbio.uu.se  Mon Jan 11 07:24:40 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:12 2006
Subject: [Pipet Devel] [Fwd: BioWidgets]
In-Reply-To: <3693A480.91B49F47@bc.edu>
References: <3693A480.91B49F47@bc.edu>
Message-ID: <13977.58855.91987.738400@beagle.bmc.uu.se>

J.W. Bizzaro writes:

 > I have these sites bookmarked, and they are good examples.  I recall
 > David Searls's BioTk from a Gene-COMBIS article...very nice.  Thomas,
 > you're probably familiar with it. 

BioTk was a nice start, but unfortunately they stopped the development
before it became really useful - that's one of the reasons I hacked BioWish.

 > BTW, what happened to the "BioWidgets
 > Consortium"?  It hasn't been updated in 1.5 years!  Many of the links
 > are broken.

I think the Perl/Tk-based project moved to Java, and some other biowidgets
projects just stopped.

Survivors:
http://www.cbil.upenn.edu/bioWidgets/
http://www.ii.uib.no/~oleart/biowidgets.html

-thomas
-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From Thomas.Sicheritz at molbio.uu.se  Mon Jan 11 08:19:00 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:12 2006
Subject: [Pipet Devel] python data structure
In-Reply-To: <3695B9CD.FA8B3670@bc.edu>
References: <3695B9CD.FA8B3670@bc.edu>
Message-ID: <13977.60819.945966.806093@beagle.bmc.uu.se>

Hej all python gurus ...

What python data type would you recommend for the class representation
of a nucleotide sequence ? 
- string, list or array (module) ?
I am not (yet) familiar with the performance questions of python types, but 
I got the impression that lists are very slow - and I have no idea how the
array module is implemented. (btw I used strings in Tcl)

J.W. Bizzaro writes:
> I wonder if DNA sequence analysis tools should be different programs from
> protein (or polypeptide) sequence analysis tools, or maybe a single
> program such as the sequence editor can switch between the two?  Of
> course they present some very different problems...but then again...?
> What do you guys think?

In case of an editor/viewer - I vote for different programs/implementations.
- Sequence analysis tools are just connected modules - e.g. the blast
module/parser/filter is only slightly different for DNA or protein
sequences.

my 2 cents ...
-thomas
-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From bizzaro at bc.edu  Tue Jan 12 03:01:36 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:13 2006
Subject: [Fwd: [Pipet Devel] [Fwd: BioWidgets]]
Message-ID: <369B0160.9EC51307@bc.edu>

>From Thomas...
-------------- next part --------------
An embedded message was scrubbed...
From: Thomas.Sicheritz@molbio.uu.se
Subject: Re: [Pipet Devel] [Fwd: BioWidgets]
Date: Mon, 11 Jan 1999 13:24:40 +0100 (MET)
Size: 2438
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990112/6f23d485/attachment.mht
From bizzaro at bc.edu  Tue Jan 12 03:02:49 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:13 2006
Subject: [Pipet Devel] python data structure
Message-ID: <369B01A9.AD8F6B36@bc.edu>

>From Thomas....

[Konrad, can you take a shot at this question?]
-------------- next part --------------
An embedded message was scrubbed...
From: Thomas.Sicheritz@molbio.uu.se
Subject: [Pipet Devel] python data structure
Date: Mon, 11 Jan 1999 14:19:00 +0100 (MET)
Size: 2708
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990112/33cf084d/attachment.mht
From hinsen at cnrs-orleans.fr  Tue Jan 12 04:04:35 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:13 2006
Subject: [Pipet Devel] python data structure
In-Reply-To: <369B01A9.AD8F6B36@bc.edu> (bizzaro@bc.edu)
References: <369B01A9.AD8F6B36@bc.edu>
Message-ID: <199901120904.KAA14700@dirac.cnrs-orleans.fr>

> >From Thomas....
> 
> [Konrad, can you take a shot at this question?]

I'll try...

> What python data type would you recommend for the class representation
> of a nucleotide sequence ? 
> - string, list or array (module) ?
> I am not (yet) familiar with the performance questions of python types, but 
> I got the impression that lists are very slow - and I have no idea how the
> array module is implemented. (btw I used strings in Tcl)

The main question is what operations you want to perform on nucleotide
sequences. Here are some considerations:

- Strings are compact and benefit from a large range of string operations
  (in module "string"). However, elements can only be characters,
  and strings are immutable, i.e. cannot be changed once created.
  So any modification requires constructing a new string. But being
  immutable can be an advantage as well, e.g. you can use strings as
  keys in dictionaries.

- Lists can store any data type, and can be modified in a very general
  way (including insertion of lists etc.), but there are fewer
  operations available on them.

- Tuples are just immutable lists.

- Arrays don't seem to be very useful for non-numerical data, with two
  exceptions: they can most easily be accessed from C modules, and
  they facilitate certain structural operations.
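
A minimal sketch of the trade-offs listed above. It assumes present-day
Python 3 rather than the Python of this thread, and the example sequence
and variable names are made up for illustration:

```python
from array import array

seq = "ATGCGTAC"

# str: immutable, so an "edit" builds a new string; usable as a dict key.
mutated = seq[:3] + "T" + seq[4:]
annotations = {seq: "example fragment"}

# list: mutable in place, including insertion of whole sub-lists.
bases = list(seq)
bases[3] = "T"
bases[4:4] = ["N", "N"]

# tuple: an immutable list, so it is hashable like a string.
frozen = tuple(bases)

# array: compact typed storage; here one unsigned byte per base.
codes = array("B", seq.encode("ascii"))
codes[3] = ord("T")
```

The string keeps the whole range of string methods, while the list and
the array trade those away for in-place modification.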

In terms of performance, there is not so much difference for basic
operations (creation, indexing, etc.). The main concern should be to
use as many built-in operations as possible for typical manipulations;
any piece of Python code is much slower than a simple call to a
built-in function implemented in C! So the first thing to do is to
find out which operations are to be performed on nucleotide sequences,
and which of them occur most frequently.
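
To illustrate the point about built-in operations, here is a small sketch
in Python 3 (an assumption; the thread predates it). The function names
are hypothetical, and the explicit loop is included only for comparison:

```python
# Complement table built once; str.translate then does the work in C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse-complement using built-in translate and slicing."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_fraction(seq):
    """GC content via str.count instead of a per-character Python loop."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_fraction_loop(seq):
    """Same result with an explicit loop over every character."""
    hits = 0
    for base in seq:
        if base in "GC":
            hits += 1
    return hits / len(seq)
```

Both GC functions return the same value; the first simply leaves the
per-character work to the interpreter's built-in string methods.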

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From bizzaro at bc.edu  Tue Jan 12 07:02:02 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:13 2006
Subject: [Pipet Devel] python data structure
References: <369B01A9.AD8F6B36@bc.edu> <199901120904.KAA14700@dirac.cnrs-orleans.fr>
Message-ID: <369B39BA.C71F482D@bc.edu>

Konrad Hinsen wrote:

> - Strings are compact and benefit from a large range of string operations
>   (in module "string"). However, elements can only be characters,
>   and strings are immutable, i.e. cannot be changed once created.
>   So any modification requires constructing a new string. But being
>   immutable can be an advantage as well, e.g. you can use strings as
>   keys in dictionaries.

What are the limits on string sizes in Python (too lazy to look it up right
now)?  If it is 256, as with some languages, I imagine this presents a little
problem.  String immutability does also make sequence manipulation a bit awkward.


> - Arrays don't seem to be very useful for non-numerical data, with two
>   exceptions: they can most easily be accessed from C modules, and
>   they facilitate certain structural operations.

I have used arrays of characters in the past.  Using parallel arrays can be a
convenient way to index or "markup" sequences, i.e. the second array can be used
to indicate where features start and stop.
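
The parallel-array idea could be sketched roughly as follows, assuming
Python 3 and its array module. The flag values, the helper names, and the
example "ORF" are all hypothetical:

```python
from array import array

seq = array("B", b"ATGGCCTAAGT")
# Parallel array of the same length: 0 = unannotated, 1 = inside a feature.
marks = array("B", [0] * len(seq))

def mark_feature(marks, start, stop):
    """Flag positions start..stop-1 (0-based, half-open) as a feature."""
    for i in range(start, stop):
        marks[i] = 1

def feature_spans(marks):
    """Return (start, stop) pairs for each run of flagged positions."""
    spans, start = [], None
    for i, flag in enumerate(marks):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(marks)))
    return spans

mark_feature(marks, 0, 9)   # hypothetical ORF: ATG GCC TAA
```

Because both arrays share indices, seq[start:stop] recovers the bases of
any span that feature_spans reports.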

Another thought: Many analysis programs are limited by having to put everything
into RAM, all in one shot.  I tend to prefer keeping the sequence file open and
reading in chunks at a time.  BTW, some simple database features of Python allow
you to keep and work from a data structure stored as a file, correct?
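
Reading a sequence file a piece at a time might look like the sketch
below. It assumes Python 3 and a plain-text file of bases; read_chunks,
count_bases, and the 64 KB default are illustrative choices, not part of
Loci:

```python
def read_chunks(path, chunk_size=64 * 1024):
    """Yield successive pieces of a text file without loading it whole."""
    with open(path, "r") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk

def count_bases(path, chunk_size=64 * 1024):
    """Tally A/C/G/T across a sequence file, one chunk at a time."""
    totals = {"A": 0, "C": 0, "G": 0, "T": 0}
    for chunk in read_chunks(path, chunk_size):
        for base in totals:
            totals[base] += chunk.count(base)
    return totals
```

Single-base counts are safe across chunk boundaries; a search for a
multi-base motif would also need to handle matches that straddle two
chunks.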

On the same note, system resources are growing enough that they can handle large
sequences in RAM.  But on the other hand, the sequencing projects are turning
out larger sequence files.  The human genome will be one of the largest
sequences (how big? 100 Gb?), and I think the frog genome is several times
larger (go figure).  Imagine, seriously because this will be hot stuff in a few
years, that someone using Loci/Tulip will want to manipulate parts of the human
genome like they can now with BioWish and E. coli.

> 
> In terms of performance, there is not so much difference for basic
> operations (creation, indexing, etc.). The main concern should be to
> use as many built-in operations as possible for typical manipulations;
> any piece of Python code is much slower than a simple call to a
> built-in function implemented in C! So the first thing to do is to
> find out which operations are to be performed on nucleotide sequences,
> and which of them occur most frequently.
> 

Right, and just because I keep harping on Python doesn't mean we can't turn to
compiled C when we really need it...and we may with sequences ranging in the
millions and billions (I sound like Carl Sagan).


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Tue Jan 12 09:58:26 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:13 2006
Subject: [Pipet Devel] analysis management
Message-ID: <369B6312.7E85D461@bc.edu>

I thought I'd share some ideas I had about how the Loci user might manage
multiple analyses.  By management I mean keeping track of what has to be done
and what was already done.

The first idea I thought of some time ago but haven't mentioned yet.  It is an
expansion upon the concept of a log file.  Normally log files are generated one
for each run of a program.  But I think we can change that a bit to suit the
need of any scientist.

You know a "good" scientist will write all experiment data in a physical
journal.  I think though that it is most inconvenient--a real headache in fact--to
take everything that comes off of the screen and write it down.  Even cutting
out printouts and gluing them in can be a big hassle when you consider how much
data a computer normally generates. (I in fact convinced my advisor to let me
keep a computer-based journal--in HTML.)

Well, to get to the point, I think Loci should keep a running log of all
actions.  That is, record everything to a single file--in HTML--with links to
data files, images, whatever.  Even keeping track of times down to the
second...better than anyone can do with a notebook.  Imagine having a
Web-browsable cataloged journal of all Loci analyses!
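
A running HTML log of this kind could be as simple as the following
sketch. It assumes Python 3; log_action, its parameters, and the layout
of each entry are hypothetical, not an existing Loci interface:

```python
import html
import time

def log_action(log_path, description, link=None):
    """Append one timestamped entry to a running HTML log file."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")   # resolution: one second
    entry = "<p><b>%s</b> %s" % (stamp, html.escape(description))
    if link is not None:
        # Optional link to a data file, image, or other result.
        target = html.escape(link, quote=True)
        entry += ' [<a href="%s">%s</a>]' % (target, target)
    entry += "</p>\n"
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(entry)
    return entry
```

Opening the file in append mode on every call keeps all runs in a single
journal, in the order the actions happened.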

The second idea I mentioned already and have on the Loci Web page.  It has to do
with using icons and arrows to represent documents and analyses being performed
on them.  I've seen this before, although it is not very common.  What I wanted
to point out to you guys was the data mining program called "Clementine".  Has
anyone used it?  Ken Marx (Lowell professor that is entertaining the Loci
Project) told me the user interface works much like I am describing for Loci. 
So, here is the Web site for Clementine:

    http://www.isldsi.com/clementine.htm

And here is the screenshot.  If nothing else, just glance at it to see what I am
talking about.

    http://www.isldsi.com/_borders/Image53.gif

In other news, Prof. Marx says he will purchase a new Linux box to dedicate to
the Loci Project and act as a Web server.  We can each have accounts, etc.  I
also convinced him that when the Project takes off we may need more servers to
host the first server-side analysis loci.  Linux boxes are pretty cheap, so I'm
sure we'll get just about whatever we want ;-)


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hinsen at cnrs-orleans.fr  Tue Jan 12 10:07:19 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:13 2006
Subject: [Pipet Devel] python data structure
In-Reply-To: <369B39BA.C71F482D@bc.edu> (bizzaro@bc.edu)
References: <369B01A9.AD8F6B36@bc.edu> <199901120904.KAA14700@dirac.cnrs-orleans.fr> <369B39BA.C71F482D@bc.edu>
Message-ID: <199901121507.QAA18454@dirac.cnrs-orleans.fr>

> What are the limits on string sizes in Python (too lazy to look it up right
> now)?  If it is 256, as with some languages, I imagine this presents a little
> problem.  String immutability does also make sequence manipulation a bit awkward.

The length of a string must fit into an int variable. So in practice
you shouldn't rely on having strings larger than 2**31 characters if
you want your program to be portable. In other words, there are no
serious limitations.

> I have used arrays of characters in the past.  Using parallel arrays can be a
> convenient way to index or "markup" sequences, i.e. the second array can be used
> to indicate where features start and stop.

But you could also use two lists for that, or lists of lists,
depending on requirements. Of course there is nothing wrong with
character arrays, except that you give up many useful string
operations.

> Another thought: Many analysis programs are limited by having to put
> everything into RAM, all in one shot. I tend to prefer keeping the
> sequence file open and reading in chunks at a time. BTW, some simple
> database features of Python allow you to keep and work from a data
> structure stored as a file, correct?

I don't see what you refer to. Python's file handling works much like
C's stdio library; you can read arbitrary parts out of a file. There
are also database interfaces (dbm and variants), which make it easy to
store data in large files, but these are special-format files that are
hardly useable with general programs like editors.
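
Both approaches can be sketched as follows, assuming Python 3 and its
standard dbm module; the function names and the name-to-sequence layout
are hypothetical:

```python
import dbm

def read_region(path, start, length):
    """Read `length` bytes beginning at byte offset `start` of a file."""
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(length)

def store_sequences(db_path, records):
    """Save name -> sequence pairs in a dbm file (not a plain-text format)."""
    with dbm.open(db_path, "c") as db:
        for name, seq in records.items():
            db[name] = seq

def fetch_sequence(db_path, name):
    """Look up one sequence by name; dbm hands back bytes."""
    with dbm.open(db_path, "r") as db:
        return db[name].decode("ascii")
```

The first function works on any ordinary file; the other two produce a
file that only a dbm library can read back.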

Assuming a modern OS, you can also use memory mapping for large files,
but I am not sure that we can already afford to ignore operating systems
without memory mapping support.
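
Where memory mapping is available, a sketch using Python 3's mmap module
(an assumption; find_motif is a hypothetical helper) could look like:

```python
import mmap

def find_motif(path, motif):
    """Return the byte offset of the first `motif` in a file, or -1.

    The file is mapped read-only, so the OS pages data in on demand
    instead of the program reading it all into memory. An empty file
    cannot be mapped and raises ValueError.
    """
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view.find(motif)
```

The motif must be passed as bytes, for example b"GAATTC".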

> Right, and just because I keep harping Python, doesn't mean we can't turn to
> compiled C when we really need it...and we may with sequences ranging in the
> millions and billions (I sound like Carl Sagan).

Of course. But even in that case, all that has to be implemented in C
is one rather small module.

Example: suppose we use strings for nucleotide sequences now, and then
find out next year that we must be able to treat sequences that are
longer than available memory. Then we'll just write a small C module
that implements a special "nucleotide sequence" type. This can look
like a drop-in replacement for strings to Python, and all that will
have to be changed in the Python code is the place where nucleotide
sequence types are created. There are some advantages to a language
without static type checking!
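
The drop-in idea can be sketched in Python itself. This is a stand-in
written in Python 3, not the C module described above; FileSequence and
gc_count are hypothetical names, and the file is assumed to hold one
ASCII byte per base with nothing else in it:

```python
class FileSequence:
    """Read-only sequence that fetches bases from a file on demand.

    Supports only len(), indexing, and contiguous slices.
    """

    def __init__(self, path):
        self._handle = open(path, "rb")
        self._handle.seek(0, 2)          # 2 = seek relative to end of file
        self._length = self._handle.tell()

    def __len__(self):
        return self._length

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(self._length)
            if step != 1:
                raise ValueError("only contiguous slices are supported")
            self._handle.seek(start)
            return self._handle.read(max(0, stop - start)).decode("ascii")
        if key < 0:
            key += self._length
        if not 0 <= key < self._length:
            raise IndexError("sequence index out of range")
        self._handle.seek(key)
        return self._handle.read(1).decode("ascii")

    def close(self):
        self._handle.close()


def gc_count(seq):
    """Works on a str or a FileSequence: it needs only len() and slicing."""
    window, total = 4096, 0
    for start in range(0, len(seq), window):
        piece = seq[start:start + window]
        total += piece.count("G") + piece.count("C")
    return total
```

Code such as gc_count never checks the type of its argument, so only the
place where the sequence object is created has to change.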

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From bizzaro at bc.edu  Tue Jan 12 12:30:27 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:13 2006
Subject: [Pipet Devel] python data structure
References: <369B01A9.AD8F6B36@bc.edu> <199901120904.KAA14700@dirac.cnrs-orleans.fr> <369B39BA.C71F482D@bc.edu> <199901121507.QAA18454@dirac.cnrs-orleans.fr>
Message-ID: <369B86B3.AE784E74@bc.edu>

Konrad Hinsen wrote:
> I don't see what you refer to. Python's file handling works much like
> C's stdio library; you can read arbitrary parts out of a file. There
> are also database interfaces (dbm and variants), which make it easy to
> store data in large files, but these are special-format files that are
> hardly useable with general programs like editors.
> 

You'll have to pardon my ignorance.  I am too used to manipulating text files in
Pascal.  (Don't laugh.)


> Example: suppose we use strings for nucleotide sequences now, and then
> find out next year that we must be able to treat sequences that are
> longer than available memory. Then we'll just write a small C module
> that implements a special "nucleotide sequence" type. This can look
> like a drop-in replacement for strings to Python, and all that will
> have to be changed in the Python code is the place where nucleotide
> sequence types are created. There are some advantages to a language
> without static type checking!
> 

...and this "nucleotide sequence" type will work straight from a file rather
than memory.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Tue Jan 12 12:31:54 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:13 2006
Subject: [Pipet Devel] python data structure
Message-ID: <369B870A.13CCFDFF@bc.edu>

Konrad Hinsen wrote:
> I don't see what you refer to. Python's file handling works much like
> C's stdio library; you can read arbitrary parts out of a file. There
> are also database interfaces (dbm and variants), which make it easy to
> store data in large files, but these are special-format files that are
> hardly useable with general programs like editors.
> 

You'll have to pardon my ignorance.  I am too used to manipulating text files in
Pascal.  (Don't laugh.)


> Example: suppose we use strings for nucleotide sequences now, and then
> find out next year that we must be able to treat sequences that are
> longer than available memory. Then we'll just write a small C module
> that implements a special "nucleotide sequence" type. This can look
> like a drop-in replacement for strings to Python, and all that will
> have to be changed in the Python code is the place where nucleotide
> sequence types are created. There are some advantages to a language
> without static type checking!
> 

...and this "nucleotide sequence" type will work straight from a file rather
than memory.


Jeff
--
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Tue Jan 12 12:52:32 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:13 2006
Subject: [Pipet Devel] Re: Casbah Project
References: <93307F07DE63D211B2F30000F808E9E525D643@edunivexch02.umassmed.edu>
Message-ID: <369B8BE0.ED4EBF13@bc.edu>

david.lapointe@umassmed.edu wrote:
> 
> Some of the recent email here reminded me of this project (NTLUG -North
> Texas Linux Users Group). Basically the notion of content management.
> 
> http://www.ntlug.org/casbah/index.shtml

Hmmmm.  Indeed.  This is something I think we could pick up a few ideas from.

Casbah does intend to be the framework for an application such as Loci, but I
think we should avoid the Java (we want Python to be the real backbone for
Loci), and some of the other components of Casbah may be a bit of bloat for us. 
But real interesting stuff...thanks.  Anyone else want to comment on it?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hinsen at cnrs-orleans.fr  Tue Jan 12 13:35:59 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] python data structure
In-Reply-To: <369B870A.13CCFDFF@bc.edu> (bizzaro@bc.edu)
References: <369B870A.13CCFDFF@bc.edu>
Message-ID: <199901121835.TAA18878@dirac.cnrs-orleans.fr>

> You'll have to pardon my ignorance. I am too used to manipulating
> text files in Pascal. (Don't laugh.)

I am just surprised: Standard Pascal didn't even have any facility
to work with text files. OK, nobody used Standard Pascal, but still...

> ...and this "nucleotide sequence" type will work straight from a file rather
> than memory.

Or even from a network connection. It doesn't matter!

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From bizzaro at bc.edu  Sun Jan 17 09:49:50 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] [Fwd: Paos project]
Message-ID: <36A1F88E.F3B7F660@bc.edu>

Fellow Tulipians,

I contacted Carlos Maltzahn, author of the Paos Project, which is a "Python
Active Object Server".  I was considering using such a system rather than CGI. 
The main benefit of this would be tighter and more active communication between
client and server loci.  Paos is similar to Bobo in some respects, but is
smaller and not primarily for HTML.  This is the Paos Web site:

    http://www.cs.colorado.edu/~carlosm/software.html

I asked Carlos if he would be interested in joining the Loci Project, being
responsible for integrating Paos and establishing the communication framework. 
His reply is attached.

-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--
-------------- next part --------------
An embedded message was scrubbed...
From: Carlos Maltzahn <carlosm@moet.cs.colorado.edu>
Subject: Re: Paos project
Date: Sat, 16 Jan 1999 16:35:42 -0700 (MST)
Size: 3348
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990117/29412d37/attachment.mht
From bizzaro at bc.edu  Sun Jan 17 10:42:30 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Re: Paos project
References: <Pine.GSU.4.05.9901161548100.3417-100000@moet.cs.colorado.edu>
Message-ID: <36A204E6.69657E02@bc.edu>

Carlos Maltzahn wrote:
> The LOCI project looks very interesting. I'd love to spend some time on
> this. Unfortunately, I'm currently in a very busy phase of graduating so
> my contributions might be pretty small until May. But I would like to
> join.

That's no problem.  Several of the Loci developers are very busy with their real
lives right now.  I for one am trying to prepare for my second year Ph.D.
exams.  Instead of having a couple people try to do everything, we are inviting
many people to contribute, with the philosophy that many hands make light work.

> Paos has been dormant for years and I would like to revive it and make it
> more usable. Bobo is probably more sophisticated and efficient for
> retrieval but I don't know whether it supports a notification service as
> Paos does. The last time I looked at Bobo (which is also years ago), it
> was entirely web based, i.e. its front end is a web server. Paos provides
> a Client module that makes it very easy to write clients that have
> persistent connections to the Paos server, using a Paos specific protocol.

What I do not like about Bobo is that it is very much HTML-centric, as you
mentioned, and I think much of what Bobo has to offer, we don't need.  We will
be using XML and some special protocols, which we haven't nailed down yet.  I
think Paos will fit better with what we are trying to do.
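
For those who have not seen a notification service before, the idea Carlos
describes can be sketched roughly as follows. This is NOT the Paos API or
protocol (I have not seen either documented here); `NotificationServer`,
`subscribe`, and `update` are hypothetical names, and everything runs in one
process instead of over a persistent network connection.

```python
class NotificationServer:
    """Hypothetical in-process stand-in for a server that notifies clients."""

    def __init__(self):
        self._objects = {}      # object name -> current value
        self._subscribers = {}  # object name -> list of callbacks

    def subscribe(self, name, callback):
        """Register a client callback for changes to one named object."""
        self._subscribers.setdefault(name, []).append(callback)

    def update(self, name, value):
        """Store a new value and push it to every subscribed client."""
        self._objects[name] = value
        for callback in self._subscribers.get(name, []):
            callback(name, value)


events = []
server = NotificationServer()
server.subscribe("blast_job", lambda name, value: events.append((name, value)))
server.update("blast_job", "running")
server.update("blast_job", "done")
server.update("other_job", "running")  # nobody subscribed, so no callback
print(events)
```

The difference from CGI is that the client does not have to poll: the server
pushes each change to whoever asked to be told about it.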

By the way, the Paos Web page says it works with Python 1.4.  Has it been tested
with Python 1.5?  Will it need many modifications?

> I really like the idea of the Glyphic Command Language. For the
> Chautauqua system I wrote a graphical editor that lets you edit a
> bi-partite control graph (similar to Petri Nets). Because the editor is a
> Paos client, it also lets you observe its execution. So one idea would be
> to build a similar editor (Python/Tkinter) to construct GCL structures and
> then watch their execution.
> 

I have a few more links to systems similar to GCL:

  Clementine data miner
    http://www.isldsi.com/_borders/Image53.gif

  Lego Mindstorms RCX language
    http://www.legomindstorms.com/program/tips_tricks/tips_orgprog.html

  Lego Mindstorms Robolab
    http://www.lego.com/dacta/robolab/rcxprograms.htm

  Cricket Logoblocks
    http://lcs.www.media.mit.edu/people/fredm/projects/cricket/logoblocks/index.html

I plan to have GCL be a part of the Loci workspace, which will consist of a
laboratory benchtop (this is where we will have objects represented as glyphs)
and a laboratory notebook (this will be a simple HTML browser for viewing
persistent analysis logs).  Oh, and we are using Python/GTK rather than
Tkinter.  Here is a page I have describing "PyG Tools":

    http://www.uml.edu/Dept/Chem/BICGroup/PyGTools/

> Chautauqua was explicitly designed to support exception handling. One
> could imagine using similar mechanisms to support exception handling in
> GCL executions so that expensive intermediate results don't get lost if
> part of the execution fails.

That sounds nice.  I think you will appreciate how well the object distribution
model will work for biological analyses.  And I think you will enjoy working
with a type of data that very few computer scientists have worked
with...Bioinformatics deals with some very unique and I think exciting problems!
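
The point quoted above, that expensive intermediate results should survive a
failure later in the run, can be sketched like this. It is only an
illustration of the general idea, not Chautauqua's mechanism and not a GCL
design; `run_pipeline` and the step names are hypothetical.

```python
def run_pipeline(steps, data, cache):
    """Run (name, func) steps in order, keeping finished results in cache.

    If a step raises, results of the earlier steps stay in cache, so a
    later run can skip them instead of recomputing.
    """
    for name, func in steps:
        if name in cache:
            data = cache[name]  # reuse the result from an earlier run
            continue
        data = func(data)
        cache[name] = data
    return data


calls = []


def align(seq):
    calls.append("align")  # record each real execution of this step
    return seq.upper()


def broken(aligned):
    raise RuntimeError("tool crashed")


def summarize(aligned):
    return len(aligned)


cache = {}
try:
    run_pipeline([("align", align), ("summarize", broken)], "atgc", cache)
except RuntimeError:
    pass  # the first run fails, but the "align" result is still cached

# Second run with a working step: "align" is not executed again.
result = run_pipeline([("align", align), ("summarize", summarize)], "atgc", cache)
print(result, calls)
```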

> 
> Hmm, I wish I wouldn't have to write a thesis right now ...

I wish I wouldn't have to take the second year exams right now ;-)

> 
> If you are still interested in using Paos, my first contribution to Loci
> could be to write some documentation for it. Let me know.

Sure.  I think the other developers may need to get the gist of Paos first.  We
are also developing a list of loci tools that need to be developed.  I'll put
you on the mailing list (nothing automated yet--I just redistribute what I get),
and we'll see things developing (still) over the spring.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Sun Jan 17 11:20:20 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Petri Nets
Message-ID: <36A20DC4.7E0B0B0D@bc.edu>

Carlos referred to "Petri Nets" in his e-mail.  For those not familiar with
Petri Nets (I was not), I found a Web site that gives an overview of Coloured
Petri Nets (CPN), an extension:

    http://www.daimi.au.dk/CPnets/intro/

Petri Nets date back to the 60's.  They aren't directly applicable (as far as I
can tell) to our bioinformatics-specific communication model.  But they share
many similarities, particularly with the Glyphic Command Language we will use on
the benchtop to represent connected loci, documents, analyses, etc.
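
For a feel of how a basic (uncoloured) Petri Net behaves, here is a minimal
sketch. Places hold tokens, and a transition may fire only when every one of
its input places has a token; firing moves tokens from inputs to outputs. The
class and the place names ("sequence", "database", "report") are made up for
illustration and say nothing about how GCL will actually work.

```python
from collections import Counter


class PetriNet:
    """Minimal place/transition net with unit arc weights."""

    def __init__(self, marking):
        self.marking = dict(marking)  # place name -> number of tokens
        self.transitions = {}         # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (list(inputs), list(outputs))

    def enabled(self, name):
        """A transition is enabled if each input place has enough tokens."""
        inputs, _ = self.transitions[name]
        needed = Counter(inputs)
        return all(self.marking.get(p, 0) >= n for p, n in needed.items())

    def fire(self, name):
        """Consume one token per input arc, produce one per output arc."""
        if not self.enabled(name):
            raise ValueError("transition %r is not enabled" % name)
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1


net = PetriNet({"sequence": 1, "database": 1})
net.add_transition("blast", ["sequence", "database"], ["report"])
net.fire("blast")
print(net.marking)
```

After firing, the "blast" transition is no longer enabled because both of its
input places are empty, which is the kind of data-driven execution that
connected glyphs on the benchtop would resemble.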


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Tue Jan 19 12:28:52 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] [Fwd: What kind of people are you looking at ?]
Message-ID: <36A4C0D1.E51E8510@bc.edu>

From Raynald...
-------------- next part --------------
An embedded message was scrubbed...
From: Raynald de Lahondès <lahondes@pasteur.fr>
Subject: What kind of people are you looking at ?
Date: Tue, 19 Jan 1999 17:56:56 +0000
Size: 2004
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990119/f50a8fed/attachment.mht
From bizzaro at bc.edu  Tue Jan 19 13:25:13 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Re: What kind of people are you looking at ?
Message-ID: <36A4CE03.C6529824@bc.edu>

Raynald de Lahondès wrote:
> 
> I can program a little (I have coded a few lines in Tcl).
> It seems that Python looks quite interesting. I'd like to help with
> development, provided I get a little help.

Raynald, we can use people who are not programmers.  If you are new to
programming, you may want to learn Python and follow the development of Loci
with us.  What you could do to help, while you are learning Python, is (1)
test the programs and (2) write some instructions (documentation) in French
for users of Loci.  Is this something you would like to do?

Your help would be very valuable, and we would appreciate it!


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Tue Jan 19 14:21:53 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Biosoft software
Message-ID: <36A4DB47.4A6A8A4@bc.edu>

In the message from Raynald, he mentioned "GeneJockey".  This program is from
Biosoft.  Below are some links to Biosoft programs that we might look to for
ideas/inspiration:

GeneJockey II (some ideas for viewers and editors):
  http://www.biosoft.com/genejock.htm
Screenshot (but I think we can make something twice as good looking):
  http://www.biosoft.com/genescr.htm

Below are two programs for enzyme analyses, which we haven't really addressed,
but enzyme analysis is something done in most biochem labs.  It may be better
as an add-on to Loci???:

AssayZap:
  http://www.biosoft.com/assaywin.htm
Screenshot:
  http://www.biosoft.com/asywscr.htm

WinZyme:
  http://www.biosoft.com/winzyme.htm


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From lahondes at pasteur.fr  Wed Jan 20 09:19:58 1999
From: lahondes at pasteur.fr (Raynald de Lahondès)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Re: What kind of people are you looking at ?
References: <36A4CE03.C6529824@bc.edu>
Message-ID: <36A5E60E.D236EA43@pasteur.fr>

"J.W. Bizzaro" wrote:
> Raynald, we can use people who are not programmers.  If you are new to
> programming, you may want to learn Python and follow the development of Loci
> with us.  What you could do to help, while you are learning Python, is (1)
> test the programs and (2) write some instructions (documentation) in French
> for users of Loci.  Is this something you would like to do?

Yes, I will be glad to do that.

> 
> Your help would be very valuable, and we would appreciate it!

I think this project is very interesting.

-- 
Raynald de Lahondes
Unite des Virus Oncogenes - Departement de Biotechnologie
Institut Pasteur - 25, rue du Docteur Roux
75724 Paris Cedex 15 - FRANCE
tel: 01.45.68.84.54 - fax: 01.40.61.30.33 - cellular: 06.15.65.85.08
email: lahondes@pasteur.fr

From rahul at photino.sid.rice.edu  Wed Jan 20 22:06:34 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Web interface
In-Reply-To: <36A4DB47.4A6A8A4@bc.edu>
Message-ID: <Pine.LNX.4.05.9901202018520.20226-100000@photino.sid.rice.edu>

I was thinking about the Web interface we are planning to have to TULIP.

We'll need to plan out our design of the other tools very carefully so
that we don't have to create messy kludges to get this web interface to
work.

Many of the GTK widgets are difficult to incorporate into pure HTML pages.
Using JavaScript may help, but there may still be some difficulties. It
may even turn out that we'll need to create a Java interface instead.

We really ought to limit the widgets we use and the way we modify them.
- Looking at the glade widget palette, the first 3 rows shouldn't be a
  problem. Lists are fine as long as they are kept simple.
- Trees... not too good.
- Columned lists shouldn't be too bad (either a table or a bunch of lists
  next to each other w/ Javascript to keep their scrolling in sync... Is
  that possible?)
- Columned tree... ugh.
- Rulers, shouldn't really need those.
- The H and V rules aren't a problem.
- The scales and scrollbars, on the other hand, present a problem w/out
  JavaScript.
- Menu bars are fine, but we may need/want to use layers to implement
  them.
- Status bar is easy, so is toolbar.
- Progress bars may need some JavaScript.
- Arrows are trivial.
- Image and pixmap should be simple (as long as my assumptions of what
  they do are right.)
- Drawing area, probably kind of tough if my assumption of what it does is
  right.
- Font selection, not needed.
- Most of the Containers can be implemented with tables and/or frames.
- For scrolled window, viewport, and handle box, I think we'll need
  layers.

Maybe we can have wrapper classes around the GTK widgets and then have
them also able to create HTML code.
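
A rough sketch of what such wrapper classes might look like, showing only the
HTML side. In a real version each class would presumably also build the
matching GTK widget; that part is left out here because it depends on the
GTK bindings. All class names (`Widget`, `Label`, `SimpleList`, `VBox`) are
hypothetical.

```python
from html import escape


class Widget:
    """Hypothetical base class: every wrapper can describe itself as HTML."""

    def to_html(self):
        raise NotImplementedError


class Label(Widget):
    def __init__(self, text):
        self.text = text

    def to_html(self):
        return "<span>%s</span>" % escape(self.text)


class SimpleList(Widget):
    """A plain list, rendered as a multi-row HTML select box."""

    def __init__(self, name, items):
        self.name = name
        self.items = list(items)

    def to_html(self):
        options = "".join("<option>%s</option>" % escape(i) for i in self.items)
        return '<select name="%s" size="%d">%s</select>' % (
            escape(self.name), len(self.items), options)


class VBox(Widget):
    """A vertical container, rendered as a one-column table."""

    def __init__(self, *children):
        self.children = children

    def to_html(self):
        rows = "".join("<tr><td>%s</td></tr>" % c.to_html() for c in self.children)
        return "<table>%s</table>" % rows


page = VBox(Label("Pick a sequence & run"), SimpleList("seq", ["seq1", "seq2"]))
print(page.to_html())
```

Containers map onto tables, as suggested in the list above; the widgets that
have no reasonable HTML equivalent would simply not get a wrapper.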

What are everyone's thoughts on this? We really need to make sure our UI
possibilities are strictly specified and translatable to HTML/JavaScript.
It may be a pain to do that, but I think we want to keep away from
requiring Java, which will be a real pain.

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
      Version 10.423.999.211011001.23.20110101.042
      (c)1996-1998, All rights reserved.
      Disclaimer available upon request.

From bizzaro at bc.edu  Thu Jan 21 12:08:10 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] [Fwd: loci]
Message-ID: <36A75EFA.4CCF3D68@bc.edu>

I just got this e-mail...
-------------- next part --------------
An embedded message was scrubbed...
From: Harry Mangalam <mangalam@uci.edu>
Subject: loci
Date: Thu, 21 Jan 1999 08:46:05 -0800
Size: 3512
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990121/5eaed44e/attachment.mht
From bizzaro at bc.edu  Thu Jan 21 13:11:49 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Web interface
Message-ID: <36A76DE0.5A346E1E@bc.edu>

From Rahul...
-------------- next part --------------
An embedded message was scrubbed...
From: Rahul Jain <rahul@photino.sid.rice.edu>
Subject: [Pipet Devel] Web interface
Date: Wed, 20 Jan 1999 21:06:34 -0600 (CST)
Size: 3499
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990121/b1e7c97d/attachment.mht
From hinsen at cnrs-orleans.fr  Thu Jan 21 13:33:24 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Web interface
In-Reply-To: <36A76DE0.5A346E1E@bc.edu> (bizzaro@bc.edu)
References: <36A76DE0.5A346E1E@bc.edu>
Message-ID: <199901211833.TAA18268@dirac.cnrs-orleans.fr>

> What are everyone's thoughts on this? We really need to make sure our UI
> possibilities are strictly specified and translatable to HTML/JavaScript.

Do we? I am not even sure that a Web interface is realistic for
everything. The Web was designed for distributing information, not for
interactive manipulation. Yes, I know about Java applets etc., but I
have disabled Java for good reasons, and I am not the only one. Maybe
it will get better over time, but I won't bet on it.

I'd say the most important feature is a really good GTK interface,
without restrictions imposed by compatibility with rather simple
technology.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From bizzaro at bc.edu  Thu Jan 21 13:45:12 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Re: Web interface
References: <Pine.LNX.4.05.9901202018520.20226-100000@photino.sid.rice.edu>
Message-ID: <36A775B1.5D5379F1@bc.edu>

Rahul Jain wrote:
> 
> I was thinking about the Web interface we are planning to have to TULIP.
> 
> We'll need to plan out our design of the other tools very carefully so
> that we don't have to create messy kludges to get this web interface to
> work.

My initial thought regarding the Web interface was not that we could or should
replicate the GUI loci in a Web browser.  My thought was that Loci can provide
an HTML interface to _some_ of the _analysis_ loci, considering they are
command-line, short-lived, and output ASCII text, which will be formatted into XML anyway.

Konrad and I have communicated quite a bit about the limitations of HTML.  In
fact, when I first mentioned XML, Konrad thought I was speaking of putting
everything into a Web browser.  Neither he nor I like the static interface of
HTML browsers, so I will make a point that GUI loci will not look or act like
one but will be very dynamic.

In short, I think the Web interface can be a quick-and-dirty way for people
without the Loci package to access the wealth of analysis loci we may someday amass.

I appreciate that you looked into this.  But I think this plan would be an
enormous task, and probably best left to some heavy duty Java...which we don't
want to use for reasons I've expressed before.

Do you want to take on this Web interface project, as the simpler project I
envisioned?  I think the biggest part of this would be some sort of XML to
HTML conversion, which is actually what XSL is all about.  The problem with
XSL is that browsers don't support it right now, and I don't think there is a
specification for CML or BSML.  And I think another big thing (providing no
XSL is used) would be converting diagrams specified by XML into GIF format. 
You would have to write a program that would generate custom GIF's.  Check out
NCBI's "graphical view" of nucleotide data:

    http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/framik?gi=3647232&db=Nucleotide
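
As a rough idea of the XML to HTML step without XSL, here is a minimal sketch
using Python's standard `xml.etree.ElementTree` module. The `<report>` and
`<hit>` elements are a made-up format for illustration only; they are not
CML, BSML, or any format Loci has settled on.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical analysis output, already formatted as XML.
SAMPLE = """<report program="blast">
  <hit id="gi|1" score="120"/>
  <hit id="gi|2" score="87"/>
</report>"""


def report_to_html(xml_text):
    """Turn the hypothetical <report> document into a simple HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for hit in root.findall("hit"):
        rows.append("<tr><td>%s</td><td>%s</td></tr>" % (
            escape(hit.get("id", "")), escape(hit.get("score", ""))))
    title = escape(root.get("program", "report"))
    return "<h1>%s</h1><table><tr><th>Hit</th><th>Score</th></tr>%s</table>" % (
        title, "".join(rows))


print(report_to_html(SAMPLE))
```

Diagrams are the harder part, since they need an image generator rather than
a text transformation.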

Also, the Web interface implementation will work from one server that I will
have set up.  It will act as a limited version of Loci, and will contact
certain analysis loci around the Internet.

This is no small project either.  It will almost parallel Loci itself and use
many of the same components.  So, we must limit the types of views and
analysis tools that the Web interface can handle.


Jeff
bizzaro@bc.edu


-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Thu Jan 21 14:30:18 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Raynald joins
Message-ID: <36A78040.136F6770@bc.edu>

Fellow Tulipians,

Raynald has agreed to join the project in the area of testing and
documentation (French and English).  He would also like to learn Python and
may do some development later on.

Raynald, I added you to the mailing list.  Also, when you are writing
documentation in French, could you write it in English too, so that it won't
have to be rewritten or translated later?  Don't worry about how well the
English comes out, because I can edit that.

Raynald and Konrad, do you think we should have French and German translations
of the Loci Web pages as well?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From carlosm at moet.cs.colorado.edu  Thu Jan 21 14:30:46 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Web interface
In-Reply-To: <36A76DE0.5A346E1E@bc.edu>
Message-ID: <Pine.GSU.4.05.9901211219120.22767-100000@moet.cs.colorado.edu>


I don't know GTK/Gnome very well but how portable is it? Is that an issue?
I personally don't mind using GTK/Gnome but a powerful web interface that
runs everywhere (using Mozilla) sounds also very interesting.

Also, don't underestimate the things you can do with JavaScript and newer
versions of HTML. I found
http://developer.netscape.com/viewsource/index_frame.html?content=archive/archivelist.html
useful to get an impression. 

See also
http://developer.netscape.com/viewsource/goodman_drag/goodman_drag.html
for (at least to me) surprising applications.

Carlos 

On Thu, 21 Jan 1999, J.W. Bizzaro wrote:

    >From Rahul...

From rahul at photino.sid.rice.edu  Thu Jan 21 14:41:01 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Re: Web interface
In-Reply-To: <36A775B1.5D5379F1@bc.edu>
Message-ID: <Pine.LNX.4.05.9901211314490.1600-100000@photino.sid.rice.edu>

On Thu, 21 Jan 1999, J.W. Bizzaro wrote:

> Rahul Jain wrote:
> > 
> > I was thinking about the Web interface we are planning to have to TULIP.
> > 
> > We'll need to plan out our design of the other tools very carefully so
> > that we don't have to create messy kludges to get this web interface to
> > work.
> 
> My initial thought regarding the Web interface was not that we could or should
> replicate the GUI loci in a Web browser.  My thought was that Loci can provide
> an HTML interface to _some_ of the _analysis_ loci, considering they are
> command-line, short-lived, and output ASCII text, which will be formatted into XML anyway.
> 
> Konrad and I have communicated quite a bit about the limitations of HTML.  In
> fact, when I first mentioned XML, Konrad thought I was speaking of putting
> everything into a Web browser.  Neither he nor I like the static interface of
> HTML browsers, so I will make a point that GUI loci will not look or act like
> one but will be very dynamic.
> 
> In short, I think the Web interface can be a quick-and-dirty way for people
> without the Loci package to access the wealth of analysis loci we may someday amass.
> 

Since we are limiting the interface to many loci to GTK/GNOME, we are
limiting the people able to use Loci to those with Linux. GTK may
compile on other Unices, but I don't think GNOME does, and I'm sure that
it'll probably take quite a bit of tweaking to get it to compile on any
other system. Our main concern should be getting it to work under Windows,
since those who use any other Unix won't have any trouble with Linux if
they need to run it. Windows users, on the other hand, are often either
unable to understand Linux or not allowed by superiors to use Linux.
That's where we really need to target the Web interface.

> I appreciate that you looked into this.  But I think this plan would be an
> enormous task, and probably best left to some heavy duty Java...which we don't
> want to use for reasons I've expressed before.
> 
> Do you want to take on this Web interface project, as the simpler project I
> envisioned?  I think the biggest part of this would be some sort of XML to
> HTML conversion, which is actually what XSL is all about.  The problem with
> XSL is that browsers don't support it right now, and I don't think there is a
> specification for CML or BSML.  And I think another big thing (providing no
> XSL is used) would be converting diagrams specified by XML into GIF format. 
> You would have to write a program that would generate custom GIF's.  Check out
> NCBI's "graphical view" of nucleotide data:
> 
>     http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/framik?gi=3647232&db=Nucleotide

I think this is where Perl really can be useful. It's designed to process
and manipulate text, and there are modules for XML and HTML. Also, there's
GD for creating GIFs. OTOH, Perl and Python running at the same time would
probably wear out all but the best servers, so I think we may have to rely
on Python alone. Then again, mod_perl would make the system much more
responsive. If there's a GD module for Python, then it shouldn't be too
tough to do the whole thing in Python and have it integrate much more
cleanly with the other parts of Loci.

> Also, the Web interface implementation will work from one server that I will
> have set up.  It will act as a limited version of Loci, and will contact
> certain analysis loci around the Internet.

Oh, the way I envisioned it, the Web interface would be a package that
could be installed on any Loci server as a Loci client and it would use
CGI to handle the requests from other computers.
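
A minimal sketch of what the CGI side of such a package could look like. The
parameter names `locus` and `seq` are hypothetical, and the part that would
actually hand the request to a Loci server is left out. The function takes a
CGI-style environment mapping, so a real CGI script would pass `os.environ`.

```python
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a complete CGI response from a CGI-style environment mapping."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    locus = params.get("locus", ["(none)"])[0]  # which analysis was requested
    sequence = params.get("seq", [""])[0]       # the data to analyse
    # A real version would forward the request to the Loci server here.
    body = ("<html><body><p>Locus: %s</p>"
            "<p>Sequence length: %d</p></body></html>"
            % (escape(locus), len(sequence)))
    # A CGI response is a header block, a blank line, then the body.
    return "Content-Type: text/html\r\n\r\n" + body


print(handle_request({"QUERY_STRING": "locus=blast&seq=ATGC"}))
```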

> This is no small project either.  It will almost parallel Loci itself and use
> many of the same components.  So, we must limit the types of views and
> analysis tools that the Web interface can handle.

I think I'll do this project, as I haven't taken any molbio/genetics
courses yet. Considering the situation of the people who would use the Web
interface, they probably have a JavaScript capable browser, so I can
implement most of the widgets. (Does IE have layers support?)

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
      Version 10.423.999.211011001.23.20110101.042
      (c)1996-1998, All rights reserved.
      Disclaimer available upon request.

From bizzaro at bc.edu  Thu Jan 21 14:53:27 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Web interface
References: <Pine.GSU.4.05.9901211219120.22767-100000@moet.cs.colorado.edu>
Message-ID: <36A785AC.26C24D14@bc.edu>

Carlos Maltzahn wrote:
> 
> I don't know GTK/Gnome very well but how portable is it? Is that an issue?

It was made for Linux/UNIX.  It has been ported to Windows, and it was
designed to be portable to most architectures.  But the best support right now
is on Linux/UNIX.  You may disagree, but I think Linux/UNIX is the best
platform for developing Loci, and I consider Windows and Mac ports to be
important but not our primary consideration.

I think many compromises are made using a truly portable GUI widget set, such
as Tkinter or Java.  And this is underscored by many of the complaints people
have about these being bloated and slow.  I want to make an excellent
Linux/UNIX package first and then consider the other platforms later.  We may
even switch widget sets for other platforms.  That's the nice thing about
Python...we can do it.  But we'll see.


> I personally don't mind using GTK/Gnome but a powerful web interface that
> runs everywhere (using Mozilla) sounds also very interesting.

Yes, I think so ;-)  I recognize the limitations of a static HTML browser, so
we can't have just that.  Besides, NCBI and others already use HTML and do it
pretty well.  An HTML interface will be important for non-Linux/UNIX users.  I
think it helps solve our portability problem for now.  Will HTML someday be
good enough for Loci?  Well, if and when Word and Excel are ported to HTML, we
will port Loci.  I want Loci to be that dynamic, which HTML just isn't right now.

> 
> Also, don't underestimate the things you can do with JavaScript and newer
> versions of HTML. I found
> http://developer.netscape.com/viewsource/index_frame.html?content=archive/archivelist.html
> useful to get an impression.
> 
> See also
> http://developer.netscape.com/viewsource/goodman_drag/goodman_drag.html
> for (at least to me) surprising applications.

These are examples of both JavaScript and DHTML (Dynamic HTML).  The first is
Netscape's proprietary scripting language, and the other is Microsoft's
proprietary extension of HTML.  I am aware of how well they both work, and yes
they can do some amazing things.  The HTML interface can make use of these,
but I would like some feedback on using these proprietary languages in a GNU
project.  I had rejected doing that before, which is one of the reasons why we
aren't using Java or Tcl/Tk.  Maybe we should stick with open source everywhere...?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Thu Jan 21 15:30:04 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Re: Web interface
References: <Pine.LNX.4.05.9901211314490.1600-100000@photino.sid.rice.edu>
Message-ID: <36A78E3E.4806760A@bc.edu>

Rahul Jain wrote:
> Since we are limiting the interface to many loci to GTK/GNOME, we are
> limiting the people able to use Loci to those with Linux. GTK may
> compile on other Unices, but I don't think GNOME does,

Both GTK and the entire GNOME desktop have been ported to just about every
platform that can run X-Windows.  So, we really have all Unices covered.

> and I'm sure that
> it'll probably take quite a bit of tweaking to get it to compile on any
> other system.

Probably.  Especially if we use UNIX utilities.  Portability is a huge issue
that we aren't about to conquer just yet.  You can see Sun tearing their hair
out over this issue and Java.  *But* by using Python, we are in a much better
position to port Loci than EMBOSS would be.

***It's a tradeoff guys!  We are sticking with one platform for development so
that we don't limit ourselves to the intersection of all UNIX, Windows, and
Mac GUI, which is relatively small.  *And* we get to use native UNIX
implementations, not everything running through a virtual machine.  Besides, I
have confidence that GTK/GNOME will find its way to those other platforms
without any effort of our own.  If not, we'll see about porting to native
Windows and Mac API...It's been done.

> Our main concern should be getting it to work under Windows,
> since those who use any other Unix won't have any trouble with Linux if
> they need to run it. Windows users, on the other hand are often either
> unable to understand Linux or not allowed by superiors to use Linux.
> That's where we really need to target the Web interface.

Yes, that is exactly what the Web interface is targeting :-)  But can we put
all of the bells and whistles in the Web interface that we have available with
GTK?  There is just no way right now.  As I said in my last e-mail to Carlos,
we want Loci to be as dynamic as Word and Excel.  If Microsoft can't put those
in a Web browser, I certainly doubt we could put Loci.  The Web interface will
simply have to work with the limitations imposed on that type of interface.

> I think this is where Perl really can be useful. It's designed to process
> and manipulate text, and there are modules for XML and HTML. Also, there's
> GD for creating GIFs. OTOH, Perl and Python running at the same time would
> probably wear out all but the best servers, so I think we may have to rely
> on Python alone. Then again, mod_perl would make the system much more
> responsive. If there's a GD module for Python, then it shouldn't be too
> tough to do the whole thing in Python and have it integrate much more
> cleanly with the other parts of Loci.
> 

I am not really anti-Perl, but Python can handle much of what Perl can, and I
think we should not try to mix Perl and Python.  I don't know if there is a GD
module for Python.  We'll have to look (Konrad, do you know of one?).  There
is an XML parser being developed by the Python developers.  Of course, Python
can handle text as well as Perl can.

> Oh, the way I envisioned it, the Web interface would be a package that
> could be installed on any Loci server as a Loci client and it would use
> CGI to handle the requests from other computers.

Yes it would be a client in the way it would substitute for all (nearly) of
the client side loci.  But it would act as a Web server (really, work with a
Web server) that can be tapped into by anyone using a Web browser.  Where
should it go?  I think we may just need one at the main Loci URL (which
doesn't exist yet).  Why do you think it should be portable?  It can be, but I
don't think it has to be, as long as one Web server can handle the requests.

> I think I'll do this project, as I haven't taken any molbio/genetics
> courses yet. Considering the situation of the people who would use the Web
> interface, they probably have a JavaScript capable browser, so I can
> implement most of the widgets.

Great!  But let's see what we can do using standard CGI over JavaScript (see
my last message).  If we have to use it, then we have to.

> (Does IE have layers support?)

You've got me there.  I haven't used IE much at all.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From carlosm at moet.cs.colorado.edu  Thu Jan 21 16:07:02 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:14 2006
Subject: [Pipet Devel] Web interface
In-Reply-To: <36A785AC.26C24D14@bc.edu>
Message-ID: <Pine.GSU.4.05.9901211314380.22767-100000@moet.cs.colorado.edu>

    
    > Also, don't underestimate the things you can do with JavaScript and newer
    > versions of HTML. I found
    > http://developer.netscape.com/viewsource/index_frame.html?content=archive/archivelist.html
    > useful to get an impression.
    > 
    > See also
    > http://developer.netscape.com/viewsource/goodman_drag/goodman_drag.html
    > for (at least to me) surprising applications.
    
    These are examples of both JavaScript and DHTML (Dynamic HTML).  
    The first is Netscape's proprietary scripting language, and the
    other is Microsoft's proprietary extension of HTML.  I am aware of
    how well they both work, and yes they can do some amazing things.  
    The HTML interface can make use of these, but I would like some
    feedback on using these proprietary languages in a GNU project.  
    I had rejected doing that before, which is one of the reasons why
    we aren't using Java or Tcl/Tk.  Maybe we should stick with open
    source everywhere...?

JavaScript is open source isn't it? It's part of Mozilla. Supposedly the
Raptor/Gecko layout engine is going to support "HTML 4.0, CSS 1/2, XML
1.0, and the Document Object Model" (first stable version of Gecko is due
sometime during first half of 1999). For example for dragging and dropping
stuff around you need layers (HTML4), event handling (JavaScript/DOM), and
absolute positioning of elements (CSS2/DOM). I'm not sure how soon Mozilla
is going to support all this, but I suspect within this year.

Because there is no open source version that supports a sufficiently
dynamic interface yet - but it might arrive within this year -, it is
probably a good idea to implement the UI in GTK first. Once the design
stabilizes and the open source web interface language implementation
becomes available, some of us can then see how far one can push a dynamic
web interface.

Carlos

From justin at ukans.edu  Thu Jan 21 17:21:54 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] toolkit and data access/storage
In-Reply-To: <36A78E3E.4806760A@bc.edu>
Message-ID: <Pine.OSF.4.03.9901211537480.31501-100000@busboy.sped.ukans.edu>

On Thu, 21 Jan 1999, J.W. Bizzaro wrote:
> 
> Yes, that is exactly what the Web interface is targeting :-)  But can we put
> all of the bells and whistles in the Web interface that we have available with
> GTK?  There is just no way right now.  As I said in my last e-mail to Carlos,
> we want Loci to be as dynamic as Word and Excel.  If Microsoft can't put those
> in a Web browser, I certainly doubt we could put Loci.  The Web interface will
> simply have to work with the limitations imposed on that type of interface.
> 

A Windows port using gtk should not be a significant problem. A
cross-compile with Cygnus' tools and Win gdk should be all that's
necessary. Theoretically Mac OS X should be even less work (once someone
ports gdk).
I agree, though, that for development purposes, it's best not to even
worry about it. Linux is where we'll find the most contributors in the
early phase of development.

> Yes it would be a client in the way it would substitute for all (nearly) of
> the client side loci.  But it would act as a Web server (really, work with a
> Web server) that can be tapped into by anyone using a Web browser.  Where
> should it go?  I think we may just need one at the main Loci URL (which
> doesn't exist yet).  Why do you think it should be portable?  It can be, but I
> don't think it has to be, as long as one Web server can handle the requests.

I was under the impression that the CGI program would return an XML file
which the client used for its display. Using a browser connected directly
to this would require additional information (javascript, style sheets,
etc) for display, right? Or would the dhtml interface be conditional (only
send it if it's not our special app client)?

Also, I think a typical, url-encoded CGI request syntax is poor for this
system. There's no particular reason we couldn't feed the server program
our own query syntax. I think using one of the proposed XML query
languages would be more useful. I like AT&T's; it's a mix of XML and
semi-structured query language (halfway between SQL and OQL).
It lets you specify the format of the XML returned.
This could be tied to a database or analysis program, or a mixture of
both. A client can take whatever pieces of information from any source it
likes and combine them for its display.

And for now, it could still be implemented over a typical HTTP
connection, via an intermediate "dispatch" agent.

The basic idea is like the Casbah project, just not so complicated, and no
Java. I'm working on the basic idea for a database/application
infrastructure for education related ideas.

XML-QL (AT&T's proposal):
http://www.w3.org/TR/NOTE-xml-ql/

XQL (Microsoft's proposal):
http://www.w3.org/TandS/QL/QL98/pp/xql.html

Supposedly, there was a conference on query languages a while back, but I
haven't seen any working drafts on w3 yet. It's hard to tell which will become
the recommendation (probably a mixture of the two).

I'm not quite sure how the interface layer between the query parsing and
database retrieval and/or application will work yet. Ideas are welcome.


Also, do we want a listserv? If so, I can have something set up here.

Justin Bradford
justin@ukans.edu


From bizzaro at bc.edu  Thu Jan 21 18:16:57 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] Re: loci
References: <Pine.LNX.3.96.990121131601.32465A-100000@cx408397-a.irvn1.occa.home.com>
Message-ID: <36A7B54F.6E59DBC0@bc.edu>

Harry Mangalam wrote:
> I'm considering releasing the next version under std GPL, but I'm old enough
> to want to take a good look at it and try to consider the possibilities that
> GPL requires/allows.

Okay.  I hope you do!  I started out worrying about what greedy corporate
types might do with my programs, but I don't care as much now.  I think the
GPL gives pretty good protection to the intellectual property of the
developers.  And I think an important part of an unestablished project is
_not_ restricting one's work to anyone.

> 
> Right - it's a vanilla ANSI-C command-line app - one of the reasons for
> dev'ing it was to supply something like a DNA Strider for the command-line.
> I develop on Linux and port to IRIX/SunOS/Solaris/DEC Unix/HPUX with no
> problems.. yet.

That's fine.  I think ANSI-C is more portable than C++ and is more suited to a
UNIX environment.  We use ANSI-C to supplement Python.  But Python is
preferred here because of some very powerful features.  It is also, from my
observation, the scripting language most preferred by physical scientists (I
sound like an advertisement)...considered by a few to be a likely replacement
for FORTRAN.

> 
> I was considering starting with the former (a GUI-wrapped command-line app)
> and moving to the latter (fully GUI), but I'm still feeling my way in terms
> of how to present it.  As I understand LOCI, the underlying apps can be
> distributed but communicate at the XML layer.  I'm starting to add this to
> tacg for reasons related to interoperability but until yesterday, unrelated
> to LOCI :).

Yes.  Tools can/will communicate locally via (1) direct Python implementation
or (2) indirect use of XML.  The other way to communicate is (3) remotely via
XML with a CGI-like interface.

> Are you planning to make a pseudo-visual programming language out of this -
> is this what the GCL is?  If I understand XML correctly, this should be
> possible but would probably require a large expenditure of energy...

Yes.  I was just talking to Carlos Maltzahn in our group about that (Carlos
developed the Paos Project for distributing Python objects over a network, and
he will help incorporate that into Loci).  GCL is about the highest level
programming language you may find, and it is specifically to manage multi-step
analyses in a graphical way. (I don't consider biologists to be very
computer-savvy.)  The XML is mostly used to format data and not to
issue/manage commands.  The job of GCL is to issue/manage commands.  But
putting commands in with the XML is an interesting thought.

> 
> I like the idea of being able to use whatever underlying language you want -
> lots of goodies are written in perl and there's no real reason to exclude
> those apps/libs and those authors.

Yes!  The distributed model with a CGI-like mode of operation will make this
connection between Loci and any other command-line language!


> I also am working on contract for NCGR (National Center for Genome Resources,
> Santa Fe), which is also interested in developing freely available tools for
> sequence analysis/bioinformatics, and I'll try to get them to pay attention
> to what you're trying to do - there may be room for some effort on their part.

What sort of "effort" are you speaking of?  Loci is unfunded, but that may change.

> I tend to agree with your feeling that both can go forward - with the state
> of funding these days, there will be times when one or the other will be
> moving faster, but both groups should stay in contact with each other.  It
> would be good if representatives of both could meet and get drunk together
> at some birds-of-a-feather meeting soon...

:-)  Which continent?

> OK - I am VERY much interested in hearing what the rest of you think about
> how this should be done - I'm interested in getting biosequence analysis
> made much easier and cheaper and have tried to walk my talk by taking the
> time to put something together towards that end.  If I can contribute to
> other projects by this, so much the better.

IF you go GPL, I would hope that tacg would be a model for taking a
command-line bioinformatics program and adding a GUI to it.  In other words, a
good point of collaboration for us would be to use the energy that would be
put into making a GTK interface for tacg, and put it into developing the part
of Loci that tacg would need.  We would otherwise be duplicating our efforts. 
We are just starting to make these tools, such as sequence visualization and
editing tools, so why not make them to work for tacg (as well as some EMBOSS
programs)?  Thomas Sicheritz (author of BioWish) is working on an editor right
now, and I think Justin Bradford may help with XML implementation.

> 
> So by all means, please keep me in the loop.  If I can help out in any way,
> let me know...

Okay.  You are hereby on the mailing list.  And I'll consider you an observer
who may want to help (please help :-) with incorporating his not-yet-GPL
analysis algorithms.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Thu Jan 21 19:11:34 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] toolkit and data access/storage
References: <Pine.OSF.4.03.9901211537480.31501-100000@busboy.sped.ukans.edu>
Message-ID: <36A7C218.1FD827BE@bc.edu>

Justin Bradford wrote:
> I was under the impression that the CGI program would return an XML file
> which the client used for its display.

Well...actually the model is changing some now that Carlos will help with Paos
(see the messages we sent).  I'm using the term CGI-like, because at some
point, on the server side, the "query" or command will be passed from a Python
script (let's call it "Gatekeeper") to a stand-alone analysis algorithm (runs
from command line and returns XML).  The XML will then be returned to
Gatekeeper and sent back to the client.  This is a lot like a Web browser
(client) communicating with a CGI program (on a server).
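
As a rough illustration of that round trip (not existing Loci code), here is
a minimal sketch in present-day Python.  The function name `gatekeeper`, the
<result>/<line> element names, and the stand-in command are all made up for
the example; a real analysis program and a real XML template would replace
them.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


def gatekeeper(command, tool_name):
    """Run a command-line analysis program once and wrap its text output in XML."""
    # Run the stand-alone program and capture whatever it prints.
    # check=True raises CalledProcessError if the program exits with an error.
    completed = subprocess.run(command, capture_output=True, text=True, check=True)

    # Hypothetical envelope: <result tool="..."><line>...</line>...</result>
    root = ET.Element("result", tool=tool_name)
    for text in completed.stdout.splitlines():
        ET.SubElement(root, "line").text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Stand-in for an analysis program: a Python one-liner that prints two lines.
    demo = [sys.executable, "-c", "print('HELIX 1 12'); print('SHEET 20 31')"]
    print(gatekeeper(demo, "demo"))
```

The point of the sketch is only the shape of the exchange: one command in,
plain text out, XML back to the client.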

> Using a browser connected directly
> to this would require additional information (javascript, style sheets,
> etc) for display, right? Or would the dhtml interface be conditional (only
> send it if it's not our special app client)?

Okay.  You're asking how the XML will accommodate the fact that the client can
be either a GUI loci or a Web browser?  Good question.  I want the XML to be
formatted to best accommodate the GUI loci (you know, the tools made with PyGTK
and all that).  The job of the person developing the Web/HTML interface
(Rahul) is to write a translator that will turn the XML into HTML + GIF.
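
A translator of that kind could start out as small as the sketch below.  It
assumes a flat, hypothetical XML layout (one root element whose children each
hold a text value) and only produces the HTML part; generating GIFs is left
out.  The function name `xml_to_html` is invented for the example.

```python
import html
import xml.etree.ElementTree as ET


def xml_to_html(xml_text):
    """Render a flat XML result as an HTML table, one row per child element."""
    root = ET.fromstring(xml_text)
    rows = []
    for child in root:
        # Escape both tag and text so "<" or "&" in the data cannot break the page.
        name = html.escape(child.tag)
        value = html.escape(child.text or "")
        rows.append("<tr><th>%s</th><td>%s</td></tr>" % (name, value))
    title = html.escape(root.tag)
    return "<h1>%s</h1>\n<table>\n%s\n</table>" % (title, "\n".join(rows))


if __name__ == "__main__":
    sample = "<sequence><name>demo</name><residues>ACGT</residues></sequence>"
    print(xml_to_html(sample))
```

Keeping the translator separate from the XML format means the GUI loci never
have to care that a browser client exists.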

> 
> Also, I think a typical, url-encoded CGI request syntax is poor for this
> system. There's no particular reason we couldn't feed the server program
> our own query syntax. I think using one of the proposed XML query
> languages would be more useful. I like AT&T's; it's a mix of XML and
> semi-structured query language (halfway between SQL and OQL).
> It lets you specify the format of the XML returned.
> This could be tied to a database or analysis program, or a mixture of
> both. A client can take whatever pieces of information from any source it
> likes and combine them for its display.

Again, we are going with distributed Python objects via Paos.  But the "query"
or command language used is something I haven't thought much about.  We do
want the XML that is returned to be something the client can handle.  For
example, the client asks (queries) the Gatekeeper for a Chou-Fasman prediction
of protein secondary structure.  It should get something back that can be
displayed by that client, or the client may have to pass the info along to
another client.  In any case, the data has to be of the type that has a client
in existence for it.

Maybe the client doesn't need to say what it is expecting.  Maybe we just need
a filter for things not expected.  If we do get something unexpected, it may
be the fault of analysis algorithm author who tried to implement something not
supported.  (BTW, we do need a specification and a template system for
converting analysis loci output into XML that the clients can handle--falls
between the Gatekeeper and the analysis loci).

> The basic idea is like the Casbah project, just not so complicated, and no
> Java. I'm working on the basic idea for a database/application
> infrastructure for education related ideas.
> 
> XML-QL (AT&T's proposal):
> http://www.w3.org/TR/NOTE-xml-ql/
> 
> XQL (Microsoft's proposal):
> http://www.w3.org/TandS/QL/QL98/pp/xql.html

I would like you to consider what query language we need, that would best suit
XML transfer.  But since this is a GNU-only project, we can't go with a
proprietary language. ***Besides, those are more complex because they need to
be general purpose.  Can you invent something along those lines that is small
and special purpose for Loci?

> 
> I'm not quite sure how the interface layer between the query parsing and
> database retrieval and/or application will work yet. Ideas are welcome.

You mean the program that takes the query, converts the query to a
command-line, issues the command to the analysis loci, accepts the ASCII text,
converts it to XML, and sends it back to the client?  Once again, the
Gatekeeper!  That's a BIG component to Loci that needs to be set up before
most anything else will work.  Of course it will be in Python and be closely
connected to Paos.  Who wants that project???  It is actually a part of
defining a query language and the protocol for handling XML.  Do you want to
take a shot, Justin?  The other XML projects are (1) converting standard docs
like PDB and GENBANK to XML, and (2) parsing XML to display images within the
client loci.
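
To give the query-to-command-line step something concrete to argue about,
here is a small Python sketch.  Everything in it is an assumption made for
illustration: the <query>/<param> message format, the `TOOLS` table, and the
program name `predict_secstruct` do not exist anywhere yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: tool name used in queries -> program run on the server.
TOOLS = {
    "secstruct": ["predict_secstruct"],  # made-up program name
}


def query_to_argv(query_xml):
    """Turn <query tool="..."><param name="k">v</param></query> into an argv list."""
    query = ET.fromstring(query_xml)
    tool = query.get("tool")
    if tool not in TOOLS:
        # Refuse anything that is not registered instead of guessing.
        raise ValueError("unknown tool: %r" % tool)
    argv = list(TOOLS[tool])
    for param in query.findall("param"):
        # Each parameter becomes "--name value".  Building a list (not a shell
        # string) keeps client-supplied values from being interpreted by a shell.
        argv.append("--" + param.get("name"))
        argv.append(param.text or "")
    return argv


if __name__ == "__main__":
    message = '<query tool="secstruct"><param name="window">7</param></query>'
    print(query_to_argv(message))
```

A table-driven mapping like this keeps the query language small and special
purpose: adding a tool means adding one entry, not extending a grammar.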

> 
> Also, do we want a listserv? If so, I can have something set up here.

Heh.  I know it's getting too much for me to handle.  Can you set something up
until I get some servers going at UMass Lowell for this project?  Thanks!  The
list of people that got this message (including you and me) is the whole
mailing list.  BTW, Jay Painter is off the list...he's working too hard on
getting GNOME ready for RedHat 6.0.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From carlosm at moet.cs.colorado.edu  Thu Jan 21 20:12:32 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] toolkit and data access/storage
In-Reply-To: <36A7C218.1FD827BE@bc.edu>
Message-ID: <Pine.GSU.4.05.9901211738210.22767-100000@moet.cs.colorado.edu>


Jeff Bizzaro wrote:
    
    Again, we are going with distributed Python objects via Paos.  
    But the "query" or command language used is something I haven't
    thought much about.  We do want the XML that is returned to be
    something the client can handle.  For example, the client asks
    (queries) the Gatekeeper for a Chou-Fasman prediction of protein
    secondary structure.  It should get something back that can be
    displayed by that client, or the client may have to pass the info
    along to another client.  In any case, the data has to be of the
    type that has a client in existence for it.

Paos does have a (pretty ad-hoc) query language. The README in the
distribution contains a terse description/definition.
(ftp://ftp.cs.colorado.edu/users/carlosm/README.paos). Results are Python
objects. The client module and the base classes for schema definitions are
optimized for reducing object serialization overhead. The result of a
query are objects that match the query and all "primitive objects" such as
strings and numbers that are values of instance variables. Pointers to
other objects are internally represented as object IDs in the form of
strings. They are transparently loaded from the server as the user
references them. The client maintains a local cache so that re-references
of attributes that point to other objects don't cause any client/server
traffic. Cache consistency depends on the use of notification services.
Objects that are received via notifications are written into the same
cache. Thus, one can maintain very tight cache consistency by
appropriately defining notification requests. But notification requests
also allow you to limit the scope of cache consistency to a few relevant
objects.
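
The caching behaviour described above can be sketched roughly as follows.
This is NOT the Paos API; the classes `Ref`, `FakeServer`, `Client`, and
`Proxy` are invented here purely to show the idea of ID-string pointers that
are loaded on first reference and then served from a local cache that
notifications also write into.

```python
class Ref:
    """Marks an attribute value as a pointer to another object, held as an ID string."""

    def __init__(self, object_id):
        self.object_id = object_id


class FakeServer:
    """Stand-in for the remote store; counts fetches so traffic is visible."""

    def __init__(self, records):
        self.records = records
        self.fetches = 0

    def fetch(self, object_id):
        self.fetches += 1
        return dict(self.records[object_id])


class Client:
    """Keeps a local cache keyed by object ID."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def get(self, object_id):
        # Only go to the server the first time an ID is referenced.
        if object_id not in self.cache:
            self.cache[object_id] = Proxy(self, self.server.fetch(object_id))
        return self.cache[object_id]

    def notify(self, object_id, fields):
        """A notification from the server overwrites the cached copy."""
        self.cache[object_id] = Proxy(self, fields)


class Proxy:
    """Exposes fetched fields as attributes and follows Ref pointers lazily."""

    def __init__(self, client, fields):
        self._client = client
        self._fields = fields

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for the stored fields.
        if name.startswith("_") or name not in self._fields:
            raise AttributeError(name)
        value = self._fields[name]
        if isinstance(value, Ref):
            return self._client.get(value.object_id)
        return value


if __name__ == "__main__":
    server = FakeServer({
        "seq1": {"name": "demo", "source": Ref("org1")},
        "org1": {"species": "E. coli"},
    })
    client = Client(server)
    seq = client.get("seq1")
    print(seq.name, seq.source.species, server.fetches)
```

Because pointers are resolved through the cache on every access, a
notification that replaces a cached object is picked up the next time the
attribute is referenced, with no extra client/server traffic.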

I'm not sure how Paos should communicate with the actual tools (in the
case people agree to use it as "gatekeeper"). I personally don't like CGI
because of its inflexible fork-request-once-response-once-terminate
assumption. I suspect these tools are running for a longer time period and
we would like to be able to find out about their state. On the other
hand it's probably not a good idea to run them natively in Paos
because of their size, no? What are people's thoughts about this?

Carlos

From bizzaro at bc.edu  Thu Jan 21 20:44:48 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] toolkit and data access/storage
References: <Pine.GSU.4.05.9901211738210.22767-100000@moet.cs.colorado.edu>
Message-ID: <36A7D7EB.4D52C752@bc.edu>

Carlos Maltzahn wrote:
> 
> I'm not sure how Paos should communicate with the actual tools (in the
> case people agree to use it as "gatekeeper").

Actually, I was not calling Paos "Gatekeeper".  Let's see if I have the Paos
model right:  Some clients are local to the user, and the server is remote and
acts as hub for remote clients to communicate with local clients.  Correct?

If I am correct, the Gatekeeper would be a remote client to Paos.  It converts
queries into command-lines (via Ajax by EMBOSS?) for execution by the analysis
algorithms, waits for response, gets ASCII text response, formats it into XML
(according to a template), and sends it back to the client.

On the server side, we are catering to
fork-request-once-response-once-terminate programs made by who-knows and
whenever with whatever language.  In other words, we still need a CGI-like
system.  But this is only one type of Loci client.  Other clients can make
better use of Paos.

> I personally don't like CGI
> because of its inflexible fork-request-once-response-once-terminate
> assumption. I suspect these tools are running for a longer time period and
> we would like to be able to find out about their state.
                               ^^^^^^^^^^^^^^^^^^^^^^^^
Yes!  This is something I realized would not work with standard CGI.  These
analysis algorithms (server side only) will be longer lived than standard CGI
scripts.  Some may take hours or days to complete.  I would like to return the
state or maybe the time elapsed to the client so that the user knows something
is happening and the whole thing didn't just die.
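
One way to get that kind of status report, sketched in present-day Python: the
program is started without waiting for it, and the client can ask at any time
for a small status message.  The class name `TrackedRun` and the <status>
element with its `state` and `elapsed` attributes are assumptions made up for
this example.

```python
import subprocess
import sys
import time
import xml.etree.ElementTree as ET


class TrackedRun:
    """Start a command and report its state without waiting for it to finish."""

    def __init__(self, argv):
        self.started = time.monotonic()
        # Output is discarded in this sketch; a real server would store it.
        self.proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL)

    def status_xml(self):
        code = self.proc.poll()  # None while the program is still running
        if code is None:
            state = "running"
        elif code == 0:
            state = "finished"
        else:
            state = "failed"
        # Seconds since the run was started (keeps growing after it ends).
        elapsed = time.monotonic() - self.started
        report = ET.Element("status", state=state, elapsed="%.1f" % elapsed)
        return ET.tostring(report, encoding="unicode")


if __name__ == "__main__":
    run = TrackedRun([sys.executable, "-c", "import time; time.sleep(1)"])
    print(run.status_xml())
    run.proc.wait()
    print(run.status_xml())
```

The client can poll this as often as it likes, which is exactly what plain
request-once-response-once CGI does not allow.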

> On the other
> hand it's probably not a good idea to run them natively in Paos
> because of their size, no? What are people's thoughts about this?

Because biological analysis algorithms tend to be command-line,
run-once-and-terminate, I think we will just have to treat them like rather
big CGI programs.  That's not to say that other tools, libraries, etc.,
written for Loci, will not communicate directly using Paos and XML...They can
and will.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From carlosm at moet.cs.colorado.edu  Thu Jan 21 21:30:05 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] toolkit and data access/storage
In-Reply-To: <36A7D7EB.4D52C752@bc.edu>
Message-ID: <Pine.GSU.4.05.9901211901560.22767-100000@moet.cs.colorado.edu>


Correct me if I'm wrong but I think what we really are trying to design
here is some kind of batch processing management system. Our department
runs a commercial product called Load Sharing Facility (LSF) sold by
Platform Computing (www.platform.com). See
http://www.cs.colorado.edu/csops/FAQ/lsf.html for our installation and
http://www.cs.colorado.edu/csops/FAQ/lsf-webpages/quick-admin.html for
some documentation on it.

Is there an open source equivalent for this? 

If we want to define and monitor computations with GCL we need to have
some way to manage batch processing on different machines, query their
state, and have access to intermediate results or check points. That means
each execution needs to be submitted to some kind of management system
that then schedules and runs the tools in some kind of shell that it can
remotely query and control. Once we have established such a management
system it should be fairly easy to write a GCL user interface for it. I
can see Paos sitting on top of such a management system, with the GCL editors
and monitors as Paos clients.
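
To make the idea less abstract, here is a toy, single-machine sketch of such a
management layer in present-day Python.  It is not LSF and not Paos; the class
`JobManager` and its method names are hypothetical.  It only shows the three
operations mentioned above: submit a run, let the manager schedule it, and
query or control it afterwards.

```python
import subprocess
import sys
from collections import deque


class JobManager:
    """Toy batch manager: queues commands, runs at most max_running at once."""

    def __init__(self, max_running=1):
        self.max_running = max_running
        self.pending = deque()  # job ids waiting for a free slot
        self.commands = {}      # job id -> argv list
        self.procs = {}         # job id -> subprocess.Popen, once started
        self.cancelled = set()  # job ids cancelled before or while running
        self.next_id = 1

    def submit(self, argv):
        """Queue a command and return its job id."""
        job_id = self.next_id
        self.next_id += 1
        self.commands[job_id] = list(argv)
        self.pending.append(job_id)
        self.schedule()
        return job_id

    def schedule(self):
        """Start pending jobs while fewer than max_running are active."""
        active = sum(1 for p in self.procs.values() if p.poll() is None)
        while self.pending and active < self.max_running:
            job_id = self.pending.popleft()
            self.procs[job_id] = subprocess.Popen(self.commands[job_id])
            active += 1

    def state(self, job_id):
        """Return 'pending', 'running', 'done', 'failed', or 'cancelled'."""
        if job_id in self.cancelled:
            return "cancelled"
        if job_id in self.pending:
            return "pending"
        code = self.procs[job_id].poll()
        if code is None:
            return "running"
        return "done" if code == 0 else "failed"

    def cancel(self, job_id):
        """Drop a pending job or terminate a running one, then reschedule."""
        if job_id in self.pending:
            self.pending.remove(job_id)
            self.cancelled.add(job_id)
        elif self.procs[job_id].poll() is None:
            self.procs[job_id].terminate()
            self.procs[job_id].wait()  # reap it so its slot is free again
            self.cancelled.add(job_id)
        self.schedule()


if __name__ == "__main__":
    manager = JobManager(max_running=1)
    first = manager.submit([sys.executable, "-c", "print('job one')"])
    second = manager.submit([sys.executable, "-c", "print('job two')"])
    print(manager.state(first), manager.state(second))
```

In this sketch someone has to call schedule() again when a job ends; a real
system would do that itself and would also need multiple machines, check
points, and access to intermediate results.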

LSF is designed primarily for workload management. Our focus would be more
on composing tools, then scheduling and controlling them.

Carlos

On Thu, 21 Jan 1999, J.W. Bizzaro wrote:

    On the server side, we are catering to
    fork-request-once-response-once-terminate programs made by
    who-knows and whenever with whatever language.  In other words, we
    still need a CGI-like system.  But this is only one type of Loci
    client.  Other clients can make better use of Paos.
    
    > I personally don't like CGI 
    > because of its inflexible fork-request-once-response-once-terminate
    > assumption. I suspect these tools are running for a longer time period and
    > we would like to be able to find out about their state.

    Yes!  This is something I realized would not work with standard
    CGI.  These analysis algorithms (server side only) will be longer
    lived than standard CGI scripts.  Some may take hours or days to
    complete.  I would like to return the state or maybe the time
    elapsed to the client so that the user knows something is
    happening and the whole thing didn't just die.
    
    > On the other 
    > hand it's probably not a good idea to run them natively in Paos 
    > because of their size, no? What are people's thoughts about this?
    
    Because biological analysis algorithms tend to be command-line,
    run-once-and-terminate, I think we will just have to treat them
    like rather big CGI programs.  That's not to say that other tools,
    libraries, etc., written for Loci, will not communicate directly
    using Paos and XML...They can and will.
    
    
    Jeff
    -- 
    J.W. Bizzaro                  Phone: 617-552-3905
    Boston College                mailto:bizzaro@bc.edu
    Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
    --
    

From bizzaro at bc.edu  Thu Jan 21 21:35:11 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] Web interface
References: <Pine.GSU.4.05.9901211314380.22767-100000@moet.cs.colorado.edu>
Message-ID: <36A7E3B6.D6FDB87F@bc.edu>

Carlos Maltzahn wrote:
> JavaScript is open source isn't it? It's part of Mozilla. Supposedly the
> Raptor/Gecko layout engine is going to support "HTML 4.0, CSS 1/2, XML
> 1.0, and the Document Object Model" (first stable version of Gecko is due
> sometime during first half of 1999). For example for dragging and dropping
> stuff around you need layers (HTML4), event handling (JavaScript/DOM), and
> absolute positioning of elements (CSS2/DOM). I'm not sure how soon Mozilla
> is going to support all this, but I suspect within this year.

Okay.  I didn't realize JavaScript would go open source with Mozilla.  If
that's the case, then I have no quarrels about using it.  *But* the Mozilla
license is more restrictive than GPL...Hmmm.

> Because there is no open source version that supports a sufficiently
> dynamic interface yet - but it might arrive within this year -, it is
> probably a good idea to implement the UI in GTK first. Once the design
> stabilizes and the open source web interface language implementation
> becomes available, some of us can then see how far one can push a dynamic
> web interface.

I agree.  That's just the way I see it :-)  Who knows, if the Web becomes THAT
dynamic, the Web interface and the rest of the Loci clients may merge...But
are we to expect that Netscape will provide an all-purpose, cross-platform GUI
widget set?  Hmmm.  It does sound hard to believe...We'll wait and take the
conservative route here.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hinsen at cnrs-orleans.fr  Fri Jan 22 04:15:28 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] Raynald joins
In-Reply-To: <36A77C94.F730D99C@bc.edu> (bizzaro@bc.edu)
References: <36A77C94.F730D99C@bc.edu>
Message-ID: <199901220915.KAA19818@dirac.cnrs-orleans.fr>

> Raynald and Konrad, do you think we should have French and German translations
> of the Loci Web pages as well?

I'd say yes, but not now. As soon as there is code that people can
actually use, it makes sense. I think we can safely expect everyone
interested in *development* to be able to deal with a website in
English.

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From hinsen at cnrs-orleans.fr  Fri Jan 22 04:25:23 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] Re: Web interface
In-Reply-To: <36A78E3E.4806760A@bc.edu> (bizzaro@bc.edu)
References: <Pine.LNX.4.05.9901211314490.1600-100000@photino.sid.rice.edu> <36A78E3E.4806760A@bc.edu>
Message-ID: <199901220925.KAA21138@dirac.cnrs-orleans.fr>

> I am not really anti-Perl, but Python can handle much of what Perl can, and I
> think we should not try to mix Perl and Python.  I don't know if there is a GD
> module for Python.  We'll have to look (Konrad, do you know of one?).  There

There is, but it's no longer maintained
(http://alumni.dgs.monash.edu.au/~richard/gdmodule/). The module of
choice for creating graphics in Python is the Python Imaging Library
(http://www.python.org/sigs/image-sig/Imaging.html). I have used it
for some small tasks and it works as advertised.

> Great!  But let's see what we can do using standard CGI over JavaScript (see
> my last message).  If we have to use it, then we have to.

Anyone interested in Web interfaces should have a look at Zope
(http://www.zope.org). All I have used myself is the module
ZPublisher, which is essentially an object-oriented CGI library (but I
promise that once you have used it you don't want to deal with plain
CGI any more).

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From hinsen at cnrs-orleans.fr  Fri Jan 22 04:29:34 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] Web interface
In-Reply-To: <Pine.GSU.4.05.9901211314380.22767-100000@moet.cs.colorado.edu>
	(message from Carlos Maltzahn on Thu, 21 Jan 1999 14:07:02 -0700
	(MST))
References: <Pine.GSU.4.05.9901211314380.22767-100000@moet.cs.colorado.edu>
Message-ID: <199901220929.KAA21140@dirac.cnrs-orleans.fr>

> JavaScript is open source isn't it? It's part of Mozilla. Supposedly the

The problem with JavaScript is not licensing, but compatibility.
I haven't tried myself, but those who did try to write non-trivial
JavaScript code supposed to work in all popular browsers tell me
that it's not a pleasant experience.

> absolute positioning of elements (CSS2/DOM). I'm not sure how soon Mozilla
> is going to support all this, but I suspect within this year.

And how long until it works reliably? It seems that Web browsers
are the only software category whose quality standards are even below
scientific code.

> Because there is no open source version that supports a sufficiently
> dynamic interface yet - but it might arrive within this year -, it is
> probably a good idea to implement the UI in GTK first. Once the design
> stabilizes and the open source web interface language implementation
> becomes available, some of us can then see how far one can push a dynamic
> web interface.

That sounds like a good approach to me.

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From hinsen at cnrs-orleans.fr  Fri Jan 22 04:42:35 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] toolkit and data access/storage
In-Reply-To: <36A7D7EB.4D52C752@bc.edu> (bizzaro@bc.edu)
References: <Pine.GSU.4.05.9901211738210.22767-100000@moet.cs.colorado.edu> <36A7D7EB.4D52C752@bc.edu>
Message-ID: <199901220942.KAA23302@dirac.cnrs-orleans.fr>

> > I personally don't like CGI
> > because of its inflexible fork-request-once-response-once-terminate
> > assumption. I suspect these tools are running for a longer time period and
> > we would like to be able to find out about their state.
>                                ^^^^^^^^^^^^^^^^^^^^^^^^
> Yes!  This is something I realized would not work with standard CGI.  These
> analysis algorithms (server side only) will be longer lived than standard CGI
> scripts.  Some may take hours or days to complete.  I would like to return

There are solutions to this. Something I have considered for
monitoring long-running MD simulations is a two-threaded program
(remember that Python has very nice threading support) with one thread
running the simulation and the other one running the Zope HTTP server
(which is a specialized Web server for ZPublisher). Since threads
share global data, the Web server could always access the state of the
simulation and provide any information the user wants.
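
A minimal sketch of the two-thread idea described above, not the Zope
setup itself: it assumes Python's standard-library http.server in place
of the Zope HTTP server, and a dummy loop in place of a real simulation.
The names run_simulation, StatusHandler, and start_status_server are
made up for illustration.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Shared state: written by the simulation thread, read by the HTTP thread.
state = {"step": 0, "done": False}
state_lock = threading.Lock()


def run_simulation(n_steps, delay=0.0):
    """Stand-in for a long-running job; records its progress in `state`."""
    for step in range(1, n_steps + 1):
        time.sleep(delay)  # placeholder for the real work of one step
        with state_lock:
            state["step"] = step
    with state_lock:
        state["done"] = True


class StatusHandler(BaseHTTPRequestHandler):
    """Answers every GET with the current simulation state as JSON."""

    def do_GET(self):
        with state_lock:
            body = json.dumps(state).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in this sketch


def start_status_server(port=0):
    """Serve status requests from a daemon thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A caller would start the server with start_status_server(), then call
run_simulation() in the main thread; a browser pointed at the server's
port sees the current step at any time. The lock is there because both
threads touch the same dictionary.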

Also have a look at the PCGI (persistent CGI)
http://starship.skyport.net/crew/jbauer/persistcgi/ system, which is
more generic.

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From lahondes at pasteur.fr  Fri Jan 22 06:42:24 1999
From: lahondes at pasteur.fr (Raynald de Lahondès)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] Raynald joins
References: <36A78040.136F6770@bc.edu>
Message-ID: <36A86420.1157AFF9@pasteur.fr>

"J.W. Bizzaro" wrote:
> Raynald and Konrad, do you think we should have French and German translations
> of the Loci Web pages as well?

I think this is the kind of thing you hope to find on the Internet, don't
you?

-- 
Raynald de Lahondes
Unite des Virus Oncogenes - Departement de Biotechnologie
Institut Pasteur - 25, rue du Docteur Roux
75724 Paris Cedex 15 - FRANCE
tel: 01.45.68.84.54 - fax: 01.40.61.30.33 - cellular: 06.15.65.85.08
email: lahondes@pasteur.fr

From justin at ukans.edu  Fri Jan 22 15:31:08 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] tulip mailing list
Message-ID: <Pine.OSF.4.03.9901221423420.10426-100000@busboy.sped.ukans.edu>

I've set up a tulip mailing list at:
tulip-list@busboy.sped.ukans.edu

It's running majordomo, so for new people to subscribe, they send mail to
majordomo@busboy.sped.ukans.edu with "subscribe tulip-list" in the message
body.

To unsubscribe, just do the same as above, substituting "subscribe" with
"unsubscribe" in the message body.

Also, I have it automatically insert the [Pipet Devel] in the subject, if it's
not already there, and the reply-to header is set to the list.

For those of you using procmail, I recommend keying it off the sender
header. Here's a sample recipe:
:0:
* ^Sender: owner-tulip-list@busboy.sped.ukans.edu
mail/tulip

The current recipients are:
justin@ukans.edu
bizzaro@bc.edu
hinsen@cnrs-orleans.fr
jabbo@mindless.com
Thomas.Sicheritz@molbio.uu.se
david.lapointe@umassmed.edu
rahul@photino.sid.rice.edu
carlosm@moet.cs.colorado.edu
lahondes@pasteur.fr
hjm@cx408397-a.irvn1.occa.home.com


Mail me if you have any questions or problems.

Justin Bradford
justin@ukans.edu


From bizzaro at bc.edu  Fri Jan 22 16:11:14 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] tulip mailing list
References: <Pine.OSF.4.03.9901221423420.10426-100000@busboy.sped.ukans.edu>
Message-ID: <36A8E971.E7514D87@bc.edu>

Great!  Thanks Justin!

I'll post this info on the Loci Web page ASAP.

BTW, do you know how to create an HTML archive of the mailing list, for people
to access later?  And can I get a list from time to time of everyone on the
list?


Jeff
bizzaro@bc.edu


-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Fri Jan 22 21:15:00 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] batch processing
Message-ID: <36A930A4.3CE49B7A@bc.edu>

Carlos,

I did a search on freshmeat.net for batch processing systems, and I found the
two below.  They are both GNU GPL, but Queue looks like it will do more than we
need.  Funny thing is, Queue was the top item on the page when I first
connected to Freshmeat :-)

GNU Queue:

    http://bioinfo.mbb.yale.edu/~wkrebs/queue.html

Generic NQS:

    http://www.gnqs.org/home.htm

Let me know what you think of these and what we might be able to do.  Thanks.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Fri Jan 22 21:35:41 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] more batch processing
Message-ID: <36A9357D.C758271A@bc.edu>

One more, CERN NQS.  It is "freely available".  I don't know the license or the
Web site, but here's the FTP site:

    ftp://shift.cern.ch/pub/NQS/

Looking at Generic NQS (last e-mail), I think it may be the way to go.  GNU
Queue is for "homogeneous clusters of workstations".


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hjm at cx408397-a.irvn1.occa.home.com  Fri Jan 22 22:41:42 1999
From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] more batch processing
In-Reply-To: <36A9357D.C758271A@bc.edu>
Message-ID: <Pine.LNX.3.96.990122193137.4940A-100000@cx408397-a.irvn1.occa.home.com>

Hi All,

    Re: the question of queueing, load-sharing/leveling, and being able to
track a pid persistently, isn't this something that should be addressed at
the system level?  ie - isn't this something that should be taken up in
concert with the kernel folks or maybe the gnome folks so that the tulip
plan doesn't go off in a direction ideal for us but turns out to be the one
NOT chosen by others?  

I HATE committees but I hate rewriting large chunks of code more.

In the interim, if we need something to get this off the ground, a little
hack could be writ to take the pid of the process and track it thru a cgi
call to look at the appropriate /proc entry.  This approach would, of
course, require a different shim for every OS (Irix is different than linux
is different than Solaris, etc), but it would allow progress without
committing to a possibly nonsensical path.
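
A rough sketch of such a shim for Linux only, assuming the Linux /proc
layout where /proc/<pid>/status carries a "State:" line. The function
name process_status is hypothetical, and other systems would each need
their own version.

```python
import os


def process_status(pid):
    """Return the kernel's state string for pid, or "not running".

    Linux-specific: reads the "State:" line of /proc/<pid>/status.
    """
    status_path = "/proc/%d/status" % pid
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith("State:"):
                    # The line looks like "State:<tab>S (sleeping)".
                    return line.split(":", 1)[1].strip()
    except OSError:
        # No such entry (or it vanished while reading): the process is gone.
        return "not running"
    return "unknown"
```

A CGI script could call process_status() with a pid saved when the job
was started. Note that pids are reused by the system, so a pid alone is
not a persistent identifier over long periods.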

Or we just ignore it for the present and write a dummy call
HereBePersistantIds() that allows us to sidestep it.  If it's gonna be done,
it should be done right, but waiting for it to be done right doesn't have to
lock other efforts.

Or...I'm completely offbase and forgive me...

Cheers
harry


Cheers,
Harry

Harry J Mangalam, Developmental + Cell Biology
Rm 4201, Biological Sciences II, UC Irvine, Irvine, CA, 92697
(949) 824 4824[vox], (949) 824 8551[fax], mangalam@uci.edu
http://hornet.bio.uci.edu/~hjm/

From bizzaro at bc.edu  Fri Jan 22 23:48:08 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:15 2006
Subject: [Pipet Devel] more batch processing
References: <Pine.LNX.3.96.990122193137.4940A-100000@cx408397-a.irvn1.occa.home.com>
Message-ID: <36A95488.68192128@bc.edu>

Harry Mangalam wrote:
>     Re: the question of queueing, load-sharing/leveling, and being able to
> track a pid persistently, isn't this something that should be addressed at
> the system level?  ie - isn't this something that should be taken up in
> concert with the kernel folks or maybe the gnome folks so that the tulip
> plan doesn't go off in a direction ideal for us but turns out to be the one
> NOT chosen by others?
> 
As you mentioned below, if we expect the kernel (Linux?) or GNOME developers to
solve the problem, we (1) have to wait for these guys to do it, if they even
want to, and (2) we end up with something that is platform (in this case Linux)
dependent.

> 
> In the interim, if we need something to get this off the ground, a little
> hack could be writ to take the pid of the process and track it thru a cgi
> call to look at the appropriate /proc entry.  This approach would, of
> course, require a different shim for every OS (Irix is different than linux
> is different than Solaris, etc), but it would allow progress without
> committing to a possibly nonsensical path.

If there is some other way to do it, that won't require a different version of
Loci for each flavor of UNIX (some on the team think it is bad enough we are
ignoring Windows), that's fine with me.

I think "all we need" is a binding from Python to GNQS (Generic NQS).  We can
get the source code, but I don't think writing a binding will require that we
recompile it.  It shouldn't be all that bad.

Regarding compatibility, GNQS has been ported to nearly all flavors of
UNIX...just like Python and GTK and GNOME.  I don't know if we can call it
nonsensical.  From what I read, it was one of the first of all UNIX batch
systems, derived from the very first one used by NASA.  Is it out of date?  I
don't know.

> 
> Or we just ignore it for the present and write a dummy call
> HereBePersistantIds() that allows us to sidestep it.  If it's gonna be done,
> it should be done right, but waiting for it to be done right doesn't have to
> lock other efforts.

We absolutely need to have the Paos and XML framework set up before we can even
test something like a batch system.  So I agree with you.  Let's just pretend
for now that we will have some system set up for doing this.  Exactly what it is
I think Justin and Carlos need to think about very carefully...I'll put the
burden on someone else ;-)


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Sat Jan 23 00:17:33 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:16 2006
Subject: [Pipet Devel] template system
Message-ID: <36A95B6D.BF276B19@bc.edu>

Harry,

Thinking about the whole Gatekeeper/batch system thing, we do need some sort of
a template system to convert formatted ASCII output to XML.

You know that command-line programs like yours will need to put the output into
XML so that we can have the GUI clients show pretty pictures (very pretty--I'm
looking for publication quality).  But I don't want to require the authors of
the command-line programs to change anything in their programs.

What I envisioned is someone who wants to plug a new command-line program into
the server-side of Loci, will write a text file that is a template for the
Gatekeeper to convert the text into XML.  I'm not sure just how it will work,
but I think the template essentially needs to say "this much of the output is
such and such, and that much is such and such".  You know that XML is linear,
and the ASCII output from the command-line program can be read one character at
a time.  Somehow we need to give the Gatekeeper instructions on making a
conversion between two linear formats.
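
One way such a template could look, as a sketch only: it assumes the
program's output is line-oriented and that a template is a list of
(XML element name, regular expression) rules. The report format, the
TEMPLATE rules, and the function text_to_xml are all invented for
illustration and do not describe tacg's real output.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical template for an imaginary restriction-site report.
# Each rule pairs an XML element name with a regex for one kind of line;
# the named groups become the attributes of that element.
TEMPLATE = [
    ("sequence",
     re.compile(r"^Sequence:\s+(?P<name>\S+)\s+Length:\s+(?P<length>\d+)$")),
    ("site",
     re.compile(r"^(?P<enzyme>\w+)\s+(?P<position>\d+)$")),
]


def text_to_xml(text, template, root_tag="report"):
    """Convert line-oriented text to an XML tree using the given rules."""
    root = ET.Element(root_tag)
    for line in text.splitlines():
        line = line.strip()
        for tag, pattern in template:
            match = pattern.match(line)
            if match:
                ET.SubElement(root, tag, match.groupdict())
                break  # first matching rule wins; unmatched lines are skipped
    return root
```

ET.tostring(root, encoding="unicode") then gives the XML text. The
author of the command-line program changes nothing; whoever plugs the
program in writes only the rule list.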

Do you have any thoughts on this?  Can you think of how you might make templates
for tacg?  Is this something you'd like to work on as a project?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hjm at cx408397-a.irvn1.occa.home.com  Sat Jan 23 00:40:21 1999
From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam)
Date: Fri Feb 10 19:18:16 2006
Subject: [Pipet Devel] template system
In-Reply-To: <36A95B6D.BF276B19@bc.edu>
Message-ID: <Pine.LNX.3.96.990122213128.5263A-100000@cx408397-a.irvn1.occa.home.com>

I was planning to re-write the output specifically to generate XML (as a
commandline flag request), but this is an interesting approach.  In one sense
it would be an XML preprocessor - some sort of tag or format that would be
easy for the author of a cli app to insert in his output to hint to the XML
preprocessor to 'treat this grid of x,y numbers as a flibber' or 'treat this
column of numbers as a trippet'. 

The tag hint is breaking with your idea about keeping the output untouched,
but sometimes a little hint is a big break - If a little fudging saves a lot
of work, I'll go for the fudge. 

So yes, I'll think (and do) something about this.  It will probably be
necessary for me to actually re-write my output to get a handle on the issues
that need to be addressed for other generic cli apps, but yes, I'll give it a
shot.

Cheers
Harry


Cheers,
Harry

Harry J Mangalam, Developmental + Cell Biology
Rm 4201, Biological Sciences II, UC Irvine, Irvine, CA, 92697
(949) 824 4824[vox], (949) 824 8551[fax], mangalam@uci.edu
http://hornet.bio.uci.edu/~hjm/

From bizzaro at bc.edu  Sun Jan 24 19:39:07 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:16 2006
Subject: [Pipet Devel] and another thing...
Message-ID: <36ABBD27.A146A6CE@bc.edu>

To expand upon the Globetrotter analogy, each locus must be aware of the other
loci in the installation, without having anything added to the code.  In other
words, the bball players must be able to handle the addition and subtraction
of other players.

If someone has Loci installed with say 10 tools, and they download an 11th
tool from some third-party developer, the original 10 must know the 11th is
there and what it can do.  And the 11th must know that there are 10 others and
what each of them can do.

This is similar to what I had planned for the Gatekeeper.  The Gatekeeper will
know what tools are installed locally (on the server) and what each can do. 
This information is reported to the connecting client, so that the client loci
will then know what analysis loci are there.  I actually have that expanded a
bit to include a hub server at Lowell (or wherever) that will have all the
Gatekeepers on the Internet registered, so that someone with Loci can see all
of the analysis loci available in the world.

For both the client and server sides, we need databases to keep track of what
loci are present and what they can do.  In the case of the client (GUI) loci,
all of the public objects need to be recorded.  Carlos, does this make sense
regarding Paos?  Paos is an active object server.  Does it have a way to
catalog objects that may change as the configuration changes?  This is VERY important!
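
The bookkeeping described above could be sketched as a small in-memory
registry. This is only an illustration under assumed names
(LociRegistry, register, who_can); it is not Paos and says nothing about
how Paos stores objects.

```python
class LociRegistry:
    """Tracks which loci are installed and what each one can do."""

    def __init__(self):
        self._capabilities = {}  # locus name -> set of capability names

    def register(self, name, capabilities):
        """Add a locus, or replace its capability list if already present."""
        self._capabilities[name] = set(capabilities)

    def unregister(self, name):
        """Remove a locus; unknown names are ignored."""
        self._capabilities.pop(name, None)

    def names(self):
        """All installed loci, sorted by name."""
        return sorted(self._capabilities)

    def who_can(self, capability):
        """Names of the loci that offer the given capability."""
        return sorted(name for name, caps in self._capabilities.items()
                      if capability in caps)
```

With this, an 11th tool only has to call register() once; the other ten
find it through names() or who_can() without any change to their code.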

Can you guys see now, with this model in mind, just how difficult it would be
to get Loci to operate harmoniously with more than one language at the core?


Sweet Georgia Brown...


Jeff
bizzaro@bc.edu


-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Sun Jan 24 22:54:35 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:16 2006
Subject: [Pipet Devel] porta
Message-ID: <36ABEAE4.3146619B@bc.edu>

I wrote:
> This is similar to what I had planned for the Gatekeeper.  The Gatekeeper will
> know what tools are installed locally (on the server) and what each can do. 
> This information is reported to the connecting client, so that the client loci
> will then know what analysis loci are there.

In case this isn't clear to anyone, the function of the Gatekeeper (and
possibly a client side locus that handles all calls to the Gatekeeper--I'm
calling it the "Porta Internet", Latin for Internet portal) is to make the
analysis algorithms transparent to the client loci.  In other words, the
information will come from a Python module (Gatekeeper via Porta Internet) and
be nicely packaged as an XML object.  The clients must have no idea they are
communicating with non-Python programs.  They act as though Porta Internet is
just another client.

Oh, and that goes for CORBA as well.  We should have a Porta CORBA that turns
the CORBA objects into Python/Paos/Loci objects, making Perl, etc. transparent
to the clients.  (Maybe we should use CORBA to connect Perl to Loci, since
we've been talking about using Perl, unless anyone knows a better way.)

Think Globetrotters, not Washington Generals ;-)


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From carlosm at mroe.cs.colorado.edu  Mon Jan 25 02:43:34 1999
From: carlosm at mroe.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:16 2006
Subject: [Pipet Devel] and another thing...
Message-ID: <Pine.OSF.4.03.9901250143010.10228-100000@busboy.sped.ukans.edu>

[Jeff]
> For both the client and server sides, we need databases to keep track of
> what loci are present and what they can do.  In the case of the client
> (GUI) loci, all of the public objects need to be recorded.  Carlos, does
> this make sense regarding Paos?  Paos is an active object server.  Does
> it have a way to catalog objects that may change as the configuration
> changes?  This is VERY important! 

I believe so. Depending on how you design the object schema you can
register notification requests with a Paos server to notify you of any
changes as well as any additions/removals. Notification requests have the
same power as regular queries. So all you need to do is to define the
catalog of objects that may change as a query and register it with Paos. 
Additions/removals are handled by formulating queries for changes in sets
of objects. 
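
To make the idea of "a notification request is a registered query"
concrete, here is a toy sketch. It is NOT the Paos API: the class
ObjectStore and its methods watch, put, and remove are hypothetical, and
a query is reduced to a plain Python predicate.

```python
class ObjectStore:
    """Toy object store that notifies watchers whose query matches."""

    def __init__(self):
        self._objects = {}
        self._watchers = []  # (predicate, callback) pairs

    def watch(self, predicate, callback):
        """Register a standing query; callback(event, key, obj) on a match."""
        self._watchers.append((predicate, callback))

    def _notify(self, event, key, obj):
        for predicate, callback in self._watchers:
            if predicate(obj):
                callback(event, key, obj)

    def put(self, key, obj):
        """Store obj under key and report it as "added" or "changed"."""
        event = "changed" if key in self._objects else "added"
        self._objects[key] = obj
        self._notify(event, key, obj)

    def remove(self, key):
        """Delete the object under key and report it as "removed"."""
        obj = self._objects.pop(key)
        self._notify("removed", key, obj)
```

A client that wants the catalog of loci would call watch() with a
predicate selecting locus objects; additions, changes, and removals then
arrive through the callback.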
 
* * * 

I'm currently in a paper deadline crunch - sorry - but I'm working on the
Paos documentation/tutorial "real soon now". :) 

Carlos

From hjm at cx408397-a.irvn1.occa.home.com  Tue Jan 26 21:18:48 1999
From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam)
Date: Fri Feb 10 19:18:16 2006
Subject: [Pipet Devel] What is Paos?
In-Reply-To: <36AE660B.4D9707B7@bc.edu>
Message-ID: <Pine.LNX.3.96.990126181703.19991A-100000@cx408397-a.irvn1.occa.home.com>

Are there any docs or a description of Paos?  I'm not familiar with it,
although it sounds like it might be some kind of object database with a
little brokerage mixed in...?


Cheers,
Harry

Harry J Mangalam, Developmental + Cell Biology
Rm 4201, Biological Sciences II, UC Irvine, Irvine, CA, 92697
(949) 824 4824[vox], (949) 824 8551[fax], mangalam@uci.edu
http://hornet.bio.uci.edu/~hjm/

From carlosm at mroe.cs.colorado.edu  Tue Jan 26 21:54:30 1999
From: carlosm at mroe.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:16 2006
Subject: [Pipet Devel] What is Paos?
In-Reply-To: <Pine.LNX.3.96.990126181703.19991A-100000@cx408397-a.irvn1.occa.home.com>
Message-ID: <Pine.LNX.3.96.990126192823.1211B-100000@go>


Unfortunately, the documentation of Paos is currently very poor. I'm
working on improving that (but don't expect anything before next week). If
you look at http://www.cs.colorado.edu/~carlosm/software.html you will
find a paper on Paos in German - use babelfish for (poor) translation.
Another bit of documentation is in
ftp://www.cs.colorado.edu/users/carlosm/README.paos. 

Sorry,
Carlos


From bizzaro at bc.edu  Tue Jan 26 23:47:12 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:16 2006
Subject: [Pipet Devel] [Fwd: Express]
Message-ID: <36AE9A50.107E6D32@bc.edu>

Fellow Locians,

This is a reply to a message I sent to Conrad Parker a few months ago.  I wrote
to him about his GTK/GNOME Web browser, "Express".  I thought it might serve as
the core for XML display in Loci.

Look at his comment near the end, about Aube and GCL.  Interesting.

I'll send him a message back right now.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--
-------------- next part --------------
An embedded message was scrubbed...
From: Conrad Parker <conradp@cse.unsw.edu.au>
Subject: Re: Express
Date: Wed, 27 Jan 1999 14:10:03 +1100
Size: 5716
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990127/984fef1d/attachment.mht

From bizzaro at bc.edu  Wed Jan 27 00:30:51 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:16 2006
Subject: [Pipet Devel] Re: Express
References: <364B8DC1.9B17FEEB@bc.edu> <19990127141003.H4759@cse.unsw.edu.au>
Message-ID: <36AEA48B.CC29FAA5@bc.edu>

Hi Conrad!

Wow.  I thought maybe you died or something ;-)

Conrad Parker wrote:

> I'm planning on making Express handle XML applications nicely, though I haven't
> looked into it much yet. Insofar as my contribution involves writing a
> browser which can support various XML applications, yes I'd like to be involved
> with TULIP :) Beyond that I don't think I can help much - my knowledge of
> biochemistry doesn't extend too much beyond high school and brief encounters in
> studying information theory and genetic algorithms :)

We have 9 bioscientists on our team, so you need not worry about that.  Time has
passed, and at this point we are looking for code to a generic XML browser, but
something we can build upon.  Each GUI tool in Loci (the name Tulip is being
phased out) will be a special-purpose XML browser and will support one (probably
one) XML definition.

Also, we are working with Python/C with bindings to GTK/GNOME.  So, we need
something we can wrap some Python code around.  We do have bindings to all of
the GTK/GNOME widgets, so we may be able to make the whole thing in Python.  But
of course native C will be faster.  Which would you recommend?  I haven't looked
into the speed requirements for a browser, but what we need will be graphics
intensive.

> cool :) looking at your developer's page, if Jay Painter is working on the
> BSML implementation then it should probably be ok for me to just do the web
> browser support (which will of course give networking etc).

Unfortunately, Jay is tied up with GNOME development for RedHat and may not come
back to Loci.  He was our only GTK/GNOME expert.  So, I guess we'll have to
start with page one of the tutorial :-P

> The reason I mention it is because its architecture is similar to your ideas
> for TULIP. In particular, looking at your ideas for GCL (do you have an
> implementation yet?) it looks like the way you want to be able to connect up
> components (tools) is similar to the way aube works - however aube's system is
> currently entirely graphical (ie. you can connect up various components, but
> not load/save the state of connections). I am looking at using XML to handle
> this information, as it can save parameters of each component more cleanly than
> a scripting language could.

We are just now trying to implement an active object server for Python (Paos, by
Carlos Maltzahn).  And we're talking about making a workflow system so that XML
objects can be juggled and tracked.  I agree that XML is a nice way to handle
this sort of data.

> So, if you'd like to save yourself some coding you can use the system I've got
> going with aube, including some widgets for selecting inputs and connecting up
> components. I'll soon be adding an overview widget for editing the whole graph
> of connections

Aube looks very nice!  You know, we _really_ need a GTK guru to put the graphics
together for GCL (all the glyphs and arrows).  This may be even more helpful to
us than an XML browser, and it wouldn't require any knowledge of biology.  I
imagine you might be able to make use of GCL in Aube or other programs (in fact,
I think GCL is the way to go for user-friendly UNIX).  Is this something you'd
like to take on?  Have you tried to make dnd icons and user-manipulated graphics
with gnome-canvas?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--