From bizzaro at bc.edu  Thu Feb  4 03:22:56 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] XML-RPC
Message-ID: <36B958E0.2ED2AF61@bc.edu>

Fellow Locians,

Not to say that we should use this protocol (I guess it was developed by
Microsoft), but XML-RPC seems to embed object requests (or "procedure calls") in
an XML document.  Here is a link for those who are not familiar with it:

    http://www.scripting.com/davenet/98/07/xmlRpcForNewbies.html


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hinsen at cnrs-orleans.fr  Thu Feb  4 03:44:48 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] XML-RPC
In-Reply-To: <36B958E0.2ED2AF61@bc.edu> (bizzaro@bc.edu)
References: <36B958E0.2ED2AF61@bc.edu>
Message-ID: <199902040844.JAA17952@dirac.cnrs-orleans.fr>

> Not to say that we should use this protocol (I guess it was developed by
> Microsoft), but XML-RPC seems to embed object requests (or "procedure calls") in
> an XML.  Here is a link for those who are not familiar with it:
> 
>     http://www.scripting.com/davenet/98/07/xmlRpcForNewbies.html

And here's the Python implementation:

http://www.pythonware.com/products/xmlrpc/
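
As a rough illustration of how XML-RPC embeds a procedure call in XML, here is
a minimal sketch. It uses the xmlrpc.client module from the present-day Python
standard library rather than the 1999 package linked above, and the method name
"sequence.reverse" and its argument are made up for the example; they do not
belong to Loci or to any real server.

```python
import xmlrpc.client


def encode_call(method, *args):
    """Serialize a procedure call (method name plus arguments) as XML-RPC."""
    return xmlrpc.client.dumps(tuple(args), methodname=method)


def decode_call(xml_text):
    """Recover the method name and arguments from an XML-RPC request."""
    params, method = xmlrpc.client.loads(xml_text)
    return method, params


# Hypothetical call: the method name is an assumption, not a real Loci service.
request_xml = encode_call("sequence.reverse", "GATTACA")
print(request_xml)               # the <methodCall> document sent over HTTP
print(decode_call(request_xml))  # what the receiving side would unpack
```

No network connection is involved here; the sketch only shows the XML that
would travel between client and server.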

-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From bizzaro at bc.edu  Thu Feb  4 04:19:02 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] better link to Python-XML
Message-ID: <36B96606.27415BA7@bc.edu>

Here is a link to the Python-XML "home page", which I guess is difficult to find
from the XML-SIG page I wrote about earlier:

    http://www.python.org/topics/xml/index.html

See, lots of XML stuff for Python :-)


Jeff

From rahul at photino.sid.rice.edu  Thu Feb  4 11:29:02 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] better link to Python-XML
In-Reply-To: <36B96606.27415BA7@bc.edu>
Message-ID: <Pine.LNX.4.05.9902041022310.17332-100000@photino.sid.rice.edu>

On Thu, 4 Feb 1999, J.W. Bizzaro wrote:

> Here is a link to the Python-XML "home page", which I guess is difficult to find
> from the XML-SIG page I wrote about earlier:
> 
>     http://www.python.org/topics/xml/index.html
> 
> See, lots of XML stuff for Python :-)

My reasons for wanting to use Perl in specific parts of the software are
not a lack of support in Python for specific libraries, but the purpose
of each language. Perl is *designed* for processing text; Python is more
like a normal programming language, designed for calculation.
If we have all of our communication between the various tools in pure XML,
we can use any language we want for the tools. As for C, that should
really only be used in the processor-intensive routines, most of which
would be called from the Python scripts (as they are handling the
computation).
Perl would only be used when we want to input text and manipulate it into
another text format, e.g. my project, the web interface. That is something
where the sloppiness of Perl becomes useful and almost essential. In
Python, the project is doable, but not nearly as easy because Python
wasn't meant to do all of this stuff, or at least not as much as Perl was
meant to do it.
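
To make the point about pure-XML communication concrete, here is a minimal
sketch of one tool writing a message and another reading it. The element and
attribute names (request, tool, sequence) are invented for illustration and are
not a Loci format; any language with an XML parser, Perl included, could sit on
either side of the exchange.

```python
import xml.etree.ElementTree as ET


def build_request(tool, sequence):
    """Produce an XML message asking some tool to process a sequence."""
    root = ET.Element("request", {"tool": tool})
    ET.SubElement(root, "sequence").text = sequence
    return ET.tostring(root, encoding="unicode")


def read_request(xml_text):
    """Extract the tool name and the sequence from an XML message."""
    root = ET.fromstring(xml_text)
    return root.get("tool"), root.findtext("sequence")


# Hypothetical message: the tool name "translate" is an assumption.
message = build_request("translate", "ATGGCC")
print(message)
print(read_request(message))
```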

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
      Version 10.423.999.211011001.23.20110101.042
      (c)1996-1998, All rights reserved.
      Disclaimer available upon request.

From bizzaro at bc.edu  Thu Feb  4 17:25:21 1999
From: bizzaro at bc.edu (bizzaro@bc.edu)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] better link to Python-XML
In-Reply-To: <Pine.LNX.4.05.9902041022310.17332-100000@photino.sid.rice.edu>
Message-ID: <MailDrop1.2d7j-PPC.990204182521@x1-6-00-05-02-58-7f-3c.bc.edu>

On Thu, 4 Feb 1999 10:29:02 -0600 (CST) rahul@photino.sid.rice.edu
(Rahul Jain) wrote:

>My reason for wanting to use Perl in specific parts of the software are
>not because of a lack of support in Python for specific libraries, but
>because of the purpose of the language. Perl is *designed* for processing
>text, Python is like a normal programming language, designed for
>calculation.

I don't see such a big difference between Perl and Python regarding
their text handling capabilities.  Python, like any good UNIX
scripting language, uses ASCII as a standard means of
communication...so it has to be designed for it.

However, since your project is dealing with even more text than the
others, by having to manipulate HTML, you have a good point about
needing the best tool for the job.

>If we have all of our communication between the various tools in pure
>XML, we can use any language we want for the tools.

...providing they can communicate with the Paos object server.

>As for C, that should
>really only be used in the processor-intensive routines, most of which
>would be called from the Python scripts (as they are handling the
>computation).

Right.

>Perl would only be use when we want to input text and manipulate it into
>another text format, e.g. my project, the web interface. That is something
>where the sloppiness of Perl becomes useful and almost essential. In
>Python, the project is doable, but not nearly as easy because Python
>wasn't meant to do all of this stuff, or at least not as much as Perl was
>meant to do it.

Actually, your project, the Web interface, is not a part of the "core
distribution" of Loci, as I've been defining it.  It is the core that
I am really fighting to keep all Python.  The Web
interface is the way for non-UNIX users (and UNIX users who don't have
Loci installed) to access all/most of the analysis algorithms that
will be tied into Loci via Internet servers.  So, I think you can use
whatever programming language you want to.

But I think you'll find that we may come up with Python alternatives
to the Perl tools you'll be using.  Also, you want to be sure you can
tie into the Paos server, so some Python may be necessary.


Jeff
bizzaro@bc.edu

From bizzaro at bc.edu  Wed Feb 10 11:02:48 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:16 2006
Subject: [Pipet Devel] GTK port to BeOS et al
Message-ID: <36C1ADA8.6E258A8F@bc.edu>

Locians,

Here is an article about the porting of GTK to the BeOS.  This is of interest to
us because the plan for Loci is to develop under Python-GTK and wait for GTK to
migrate to non-UNIX systems.  I have been confident that the great interest in
GTK will solve much of the portability issue for us.

    http://www.benews.com/story/?ID=623


Jeff

From bizzaro at bc.edu  Wed Feb 10 14:44:29 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:16 2006
Subject: [Pipet Devel] Hmmm
Message-ID: <36C1E19D.7B8D1EFE@bc.edu>

Locians,

An interesting footnote:

You may have seen the link on the Loci overview page that sends you to the page for
GCG Wisconsin Package pricing.  The very first thing I say against GCG is how
terribly expensive it is...$10,000 for this and that.

    http://www.gcg.com/ordering/price_schedule.html

Well, that page _used_to_ give prices.  They now have a list of phone numbers. 
It makes me wonder if we've been spotted ;-)


Jeff

From bizzaro at bc.edu  Mon Feb 15 20:16:09 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:16 2006
Subject: [Pipet Devel] goings on
Message-ID: <36C8C6D9.2D2B2F0@bc.edu>

Hello Locians!

I guess we've all been kind of quiet.  I'm working on a better description of
the project for the Web site.  This is in preparation for some announcements I
hope to post at some relevant places on the Internet.  Once posted, I hope we
can pick up some more developers, particularly people with experience in GTK. 
We have many more projects than people at this time.  I also hope to get a new
projects/TODO list to you guys soon.

Also, I still haven't heard back from Peter Rice of the EMBOSS project. 
Everything is really quiet there.

For your reading pleasure, I found an interesting editorial by Ajay Shah that
seems to hit on some of the key features of our project.  Here is an excerpt:

(http://ny.us.mirrors.freshmeat.net/news/1998/11/15/911138358.html)

Strategies for building applications software

  * Model 1: Clean core, with third party extensions

The development model which fits open source the best, of course, is something
like GIMP or Emacs, where a technically solid core is extensible by third
parties. This is the most parallelisable development style which obtains the
maximum human inputs from across the globe with minimal problems of
coordination.

If such a design can be applied to build a product, then I believe that `open
source' always wins because of the range of extensions, and the code quality
therein. The entry barrier of knowledge required to obtain the thrills of
producing useful code is very low with the scripting languages used in such
situations - as compared with starting from scratch writing in C. Hence it's
easier for the project to recruit developers. I suspect this design will work
for a spreadsheet and (to some extent) for a presentation program, but not
really for a word processor.

  * Model 2: Moving the application onto the network

The second way in which open source can make inroads is by making an established
product category obsolete. If personal finance programs turn into Internet sites
then the personal finance category ceases to exist. I have seen applications
which are painful attempts at putting databases (on CD or on hard disk) for
local querying under Microsoft Windows. This is ultimately obsolete because it's
so much more sensible to simply query this same data over the Internet.

Open source developers are in a unique position to apply this principle. Open
source developers are innovative, and highly knowledgeable about the Internet.
Open source developers have no qualms about cannibalising existing product
lines, a hurdle which limits innovation with many shareholder-owned companies.
To take a standalone application and convert it into an Internet service scores
high marks on the coolness scale; it'd attract development talent.

To the extent that innovative open source developers migrate existing
application categories into Internet versions, the problem of replicating
existing software is sidestepped. Of course, if Microsoft is able to own basic
protocols of the Internet or of Internet commerce, then Internet applications
could be even more closed than traditional MS Windows applications. Microsoft
has thus far had a near-zero impact upon protocol or technology development in
the context of the Internet, so this is not going to be easy for them.

  * Model 3: Applications which implement 20% of the features which account for
90% of the use

Every software product manager knows the misery of seeing 90% of users use only
20% of the features. I believe this is the direction from which new projects can
rapidly come up against well-established incumbents.

I feel there is something misplaced about the debates about whether `open
source' applications software match the features of mainstream commercial
products. A product which contains 20% of the features of a mainstream word
processor is adequate at the low-end market, since the bulk of the low-end
market never uses the complex features anyway. New projects should work to
carefully isolate the features which the `open source' applications should
match.

It can't be very difficult for the wizards to hack up a filter which logs the
features used by existing word processor users. Such a program, running at
workplaces all over the world, would yield data about the features that are
useful versus the features that aren't. This is reminiscent of the discovery, in
the days that preceded RISC, that compilers were only utilising a small core of
the instruction set.

I suspect that a program which implements one-fifth the complexity accounts for
90% of the usage. A clean reimplementation of these one-fifth of the features
would be lean and bug-free when compared with the bloated implementations that
are presently found with commercial user applications.

If this conjecture is on track, it implies that Microsoft's marketing department
is confused in what they're trying on applications software complexity. I
believe that 90% of humans will enjoy a lean word processor (with one-fifth the
features of existing GUI word processors) and the remaining 10% would be better
off with TeX.

Free, as in zero dollars

rms has talked at length about the issue of freedom, not price. I agree with him
on the way his argument applies to the development process. However, when we
discuss large-scale adoption by computer users worldwide, I wonder if we're
losing sight of the power of `free', as in zero dollars. If there was one thing
I was surprised to not see in the `Halloween memo', it was the discomfort that
Microsoft must feel when competing against a price tag of 0. It is common to ask
whether linux, apache etc. are beating Microsoft technically, and generally the
answers are in the affirmative both on product and on development process. The
debate is incomplete unless we also factor in the price at which users access
the alternative products.

This is where the "20% of the features" product becomes compelling. The
competition is not between a 100% product and a 20% product at the same price.
The competition is between a 100% product at commercial prices versus a 20%
product at zero cost. Users would have to really want the remaining 80% of the
features to put up the money for commercial software. My suspicion is that the
fraction of users who use those remaining 80% of the features is around 10%.


Jeff

From bizzaro at bc.edu  Thu Feb 18 10:09:37 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:17 2006
Subject: [Pipet Devel] [Fwd: EMBOSS]
Message-ID: <36CC2D31.5E9025A2@bc.edu>

Peter Rice from EMBOSS finally sent me a message.  It seems they're making
progress.

Jeff
bizzaro@bc.edu
-------------- next part --------------
An embedded message was scrubbed...
From: Peter Rice <pmr@sanger.ac.uk>
Subject: EMBOSS
Date: Thu, 18 Feb 1999 09:21:13 GMT
Size: 2018
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990218/0e0fc19b/attachment.mht

From bizzaro at bc.edu  Thu Feb 18 10:49:28 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:17 2006
Subject: [Pipet Devel] Re: EMBOSS
References: <199902180921.JAA09093@scarp.sanger.ac.uk>
Message-ID: <36CC3688.89D0DA6@bc.edu>

Peter Rice wrote:

> Richard Durbin tells me you thought I had disappeared :-)

Yes, I was beginning to wonder ;-)

> We are busy working on an EMBOSS release. We now have 5 more
> folk working on applications in Hinxton, with more to join soon.

Great.  TULIP/Loci is slowly progressing.  We have 10 people on the list now,
plus we are collaborating with Harry Mangalam of the tacg project:

    http://hornet.bio.uci.edu/~hjm/projects/tacg/tacg2.main.html

Much of our work lately has been innovating a new object management or workflow
system, which is the real meat of the project.  We have drafted the developer of
the Paos project, Carlos Maltzahn:

    http://www.cs.colorado.edu/~carlosm/software.html

We hope TULIP/Loci will be a framework for connecting bioinformatics and
structural biology programs of any type to a central GUI.  And we still think
Loci and EMBOSS can collaborate on this, since our projects are complementary,
not competing.

I invite you to subscribe to our mailing list.  Send an e-mail to
majordomo@busboy.sped.ukans.edu with "subscribe tulip-list" in the message body.

I'd like to know if I can subscribe to your "closed" emboss-dev mailing list...?

> 
> I have been very busy with documentation and support for them.
> There is a release 0.0.4 on our FTP server which is a nightly dump
> of the current sources. Watch for changes in the file size to catch
> new versions as it does not change every day.
> 
> ftp://ftp.sanger.ac.uk/pub/pmr/emboss/EMBOSS_0.0.4.tar.gz
> 
> We will be reviewing the documentation next week before releasing it
> to the rest of the world. It may take a few extra days to patch it up.

We look forward to it.  Have you made any changes to the design since we last
communicated, in December?


Jeff

From bizzaro at bc.edu  Thu Feb 18 15:54:19 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:17 2006
Subject: [Pipet Devel] reply from Peter
Message-ID: <36CC7DFB.206B303B@bc.edu>

Locians,

Attached is the reply from Peter.

Peter, tacg may "overlap" EMBOSS, but Loci will not.  Loci is only concerned
with developing a framework for communication between tools, plus a set of small
sequence/structure visualization/manipulation tools.  Larger analysis programs
will come from elsewhere (such as tacg and EMBOSS).  We will not be creating
anything new in that respect.

Possibly the first thing we would like to implement from the EMBOSS project is
Ajax/ACD.  We have a "locus" being developed by Justin Bradford called
"Gatekeeper", which will act as a gateway between loci and command-line analysis
tools.  Gatekeeper needs to convert queries/requests from Loci into command-line
standard-in (much like Ajax) plus convert standard-out into XML.
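
The conversion described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not Gatekeeper's actual design:
the function name run_tool, the <result> and <line> element names, and the
stand-in "tool" (a Python one-liner that reverses its input) are all invented
for the example.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


def run_tool(name, command, stdin_text):
    """Send stdin_text to a command-line tool and wrap its stdout in XML."""
    completed = subprocess.run(
        command,
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise if the tool exits with a non-zero status
    )
    root = ET.Element("result", {"tool": name})
    for line in completed.stdout.splitlines():
        ET.SubElement(root, "line").text = line
    return ET.tostring(root, encoding="unicode")


# Stand-in for a real analysis program: reverses whatever arrives on stdin.
demo_command = [
    sys.executable,
    "-c",
    "import sys; print(sys.stdin.read().strip()[::-1])",
]

xml_out = run_tool("reverse-demo", demo_command, "GATTACA\n")
print(xml_out)
```

A real gateway would also have to translate each request into the tool's
command-line options, which is the part a description format such as ACD is
meant to cover; that step is left out here.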


Jeff
bizzaro@bc.edu
-------------- next part --------------
An embedded message was scrubbed...
From: Peter Rice <pmr@sanger.ac.uk>
Subject: Re: EMBOSS
Date: Thu, 18 Feb 1999 16:03:37 GMT
Size: 2407
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990218/79e9f418/attachment.mht

From bizzaro at bc.edu  Thu Feb 18 16:09:48 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:17 2006
Subject: [Pipet Devel] correction
Message-ID: <36CC819C.439247CA@bc.edu>

I guess it is both Justin and Harry that will be developing Gatekeeper.  I need
to get a new project list out :-)


Jeff

From bizzaro at bc.edu  Tue Feb 23 19:18:30 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:17 2006
Subject: [Pipet Devel] more nice interfaces
Message-ID: <36D34556.673C4A2B@bc.edu>

Thomas,

Attached are screenshots of some more interfaces you might want to look at.  One
is of a sequence editor, and the other is of a sequence aligner.  Both are for
Windows, but they are very much along the lines of what I was thinking of:
publication-quality WYSIWYG tools.

These pics are from a commercial package called Vector NTI Suite, by InforMax. 
I don't know the exact price, but it is in the $1,000s, since they sent me an
e-mail about a $700 discount.  Here is the Web site:

    http://www.informaxinc.com/vntisuite/index.html

They have a downloadable demo, if anyone is interested.

BTW, how's the sequence editor coming along?


Jeff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seqedit.gif
Type: image/gif
Size: 30399 bytes
Desc: not available
Url : http://bioinformatics.org/pipermail/pipet-devel/attachments/19990224/4028d437/seqedit.gif
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seqalign.gif
Type: image/gif
Size: 31233 bytes
Desc: not available
Url : http://bioinformatics.org/pipermail/pipet-devel/attachments/19990224/4028d437/seqalign.gif

From bizzaro at bc.edu  Thu Feb 25 23:27:27 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:17 2006
Subject: [Pipet Devel] stuff
Message-ID: <36D622AF.AC5A67EC@bc.edu>

Locians,

Tidbits:

We're getting an Internet line put in at UMass Lowell for the project.  And a
friend of mine is donating a Linux server to use until Ken Marx or I purchase a
new one.  We may see it up and running in a couple weeks.

Plus I thought you'd like to read an interesting comparison between Python and
Perl at the LinuxWorld Web site:

    http://linuxworld.com/linuxworld/expo/lw-python.html?0225


Jeff

From Thomas.Sicheritz at molbio.uu.se  Fri Feb 26 09:20:43 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:17 2006
Subject: [Pipet Devel] more nice interfaces
In-Reply-To: <36D34556.673C4A2B@bc.edu>
References: <36D34556.673C4A2B@bc.edu>
Message-ID: <14038.43467.520971.536971@beagle.bmc.uu.se>

Hej again,

First, I don't remember if I have already replied to this ... :-)

 > Attached are screenshots of some more interfaces you might want to look at.  One
 > is of a sequence editor, and the other is of a sequence aligner.  Both are for
 > Windows, but they are very much along the line of what I was thinking of:
 > publication-quality WYSIWYG tools.

If you strip the Windows look and feel ... ok. - but I thought we were
going to make something more gimpish ...

Another commercial application:
A former colleague sent me this screendump (lousy quality)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: xwd.gif
Type: image/gif
Size: 128305 bytes
Desc: not available
Url : http://bioinformatics.org/pipermail/pipet-devel/attachments/19990226/518e40a0/xwd.gif
-------------- next part --------------


 > BTW, how's the sequence editor coming along?
I just returned to work. I have just started to look into Python and
have converted my biowish C module to a Python extension.
If anyone is interested: http://evolution.bmc.uu.se/~thomas/tulip/

Questions:
 * How can I combine a Python module with a Python class definition?
   I want to add Python code to the C module ...
 * How can I implement this Tcl code in Python?
   foreach i "reverse complement antiparallel" {
       puts [eval bb_sequence.$i $seq]
   }
 * What minimum set do I need for compiling the GNOME canvas?
   I really don't want to compile all possible (sound, game ...) modules on my
   Solaris box ...

c ya
-thomas


-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From hinsen at cnrs-orleans.fr  Fri Feb 26 09:51:50 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:17 2006
Subject: [Pipet Devel] more nice interfaces
In-Reply-To: <14038.43467.520971.536971@beagle.bmc.uu.se>
	(Thomas.Sicheritz@molbio.uu.se)
References: <36D34556.673C4A2B@bc.edu> <14038.43467.520971.536971@beagle.bmc.uu.se>
Message-ID: <199902261451.PAA15154@dirac.cnrs-orleans.fr>

> Questions:
>  * how can I combine a python module with a python class definition
>    I want to add python code to the c-module ...

Sorry, I don't understand what you are trying to do. Something
with Python and C and modules... Could you give a more detailed
description?

>  * how can I implement this tcl code in python ?
>    foreach i  "reverse coplement antiparallel" {
>        puts [eval bb_sequence.$i $seq]
>    }

I'd have to know what the Tcl code means! I suppose it's a loop
over three strings, which in Python is

for i in ["reverse", "complement", "antiparallel"]:
    ....

But I don't understand the stuff with "puts" etc.

>  * what minimum set do I need for compiling gnome canvas ?
>    I really dont want to compile all possible (sound,game ..) modules on my 
>    solarisbox ...

You mean Python modules? Certainly no more than what is activated
by default in the Python distribution. About Gnome, I don't know...

Konrad.

From hjm at cx408397-a.irvn1.occa.home.com  Fri Feb 26 12:18:12 1999
From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam)
Date: Fri Feb 10 19:18:17 2006
Subject: [Pipet Devel] more nice interfaces
In-Reply-To: <14038.43467.520971.536971@beagle.bmc.uu.se>
Message-ID: <Pine.LNX.3.96.990226091628.12115A-100000@cx408397-a.irvn1.occa.home.com>

You're right - that is a lousy screen shot :), but brightened up, it becomes
readable, and it actually looks pretty nice - what is the application?  It
would be nice to know what functionality underlies the pretty face.

hjm

On Fri, 26 Feb 1999 Thomas.Sicheritz@molbio.uu.se wrote:

> Hej again,
> 
> First, I don't remeber if I ahve allready replied to this ... :-)
> 
>  > Attached are screenshots of some more interfaces you might want to look at.  One
>  > is of a sequence editor, and the other is of a sequence aligner.  Both are for
>  > Windows, but they are very much along the line of what I was thinking of:
>  > publication-quality WYSIWYG tools.
> 
> If you strip the windows feel and look ... ok. - but I thought we were
> going  to make something more gimpish ...
> 
> Another commerc. application:
> A former colleague send me this screendump (lousy quality) 
> 
> 

Cheers,
Harry

Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com 


From david.lapointe at umassmed.edu  Fri Feb 26 13:33:30 1999
From: david.lapointe at umassmed.edu (david.lapointe@umassmed.edu)
Date: Fri Feb 10 19:18:17 2006
Subject: [Pipet Devel] more nice interfaces
In-Reply-To: <Pine.LNX.3.96.990226091628.12115A-100000@cx408397-a.irvn1.occa.home.com>
Message-ID: <93307F07DE63D211B2F30000F808E9E525D6CF@edunivexch02.umassmed.edu>

Yes, I am curious also. It seems to be Java applets. What is BioWeb?


Next:

foreach i "reverse complement antiparallel" {
    puts [eval bb_sequence.$i $seq]
}
I would imagine that bb_sequence.reverse, bb_sequence.complement, and
bb_sequence.antiparallel each take $seq and return it as a reversed,
complemented, or reverse-complemented string, which puts then writes out.
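
If that reading is right, a Python equivalent might look like the sketch below.
It is an assumption-laden stand-in: the BBSequence class and its three methods
are written here only so the example runs, and they are not the interface of
Thomas's biowish extension. The part that answers the question is the loop,
where getattr looks up each function by name in the same way the Tcl code
builds the command name from $i.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class BBSequence:
    """Hypothetical stand-in for the bb_sequence commands in the Tcl snippet."""

    @staticmethod
    def reverse(seq):
        return seq[::-1]

    @staticmethod
    def complement(seq):
        return seq.translate(_COMPLEMENT)

    @staticmethod
    def antiparallel(seq):
        # Reverse complement: complement every base, then reverse the string.
        return seq.translate(_COMPLEMENT)[::-1]


bb_sequence = BBSequence()
seq = "ATGC"

# Look up each operation by name and print its result, like the Tcl foreach.
for name in ["reverse", "complement", "antiparallel"]:
    print(getattr(bb_sequence, name)(seq))
```

The same getattr call works on an imported module, so if the operations live
in a C extension module the loop body would not need to change.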

David

 -----Original Message-----
> From: Harry Mangalam [mailto:hjm@cx408397-a.irvn1.occa.home.com]
> Sent: Friday, February 26, 1999 12:18 PM
> To: tulip-list@busboy.sped.ukans.edu
> Subject: Re: [Pipet Devel] more nice interfaces
>
>
> You're right - that is a lousy screen shot :), but brightened
> up, it becomes
> readable, and it actually looks pretty nice - what is the
> application?  It
> would be nice to know what functionality underlies the pretty face.
>
> hjm
>
> On Fri, 26 Feb 1999 Thomas.Sicheritz@molbio.uu.se wrote:
>
> > Hej again,
> >
> > First, I don't remeber if I ahve allready replied to this ... :-)
> >
> >  > Attached are screenshots of some more interfaces you
> might want to look at.  One
> >  > is of a sequence editor, and the other is of a sequence
> aligner.  Both are for
> >  > Windows, but they are very much along the line of what I
> was thinking of:
> >  > publication-quality WYSIWYG tools.
> >
> > If you strip the windows feel and look ... ok. - but I
> thought we were
> > going  to make something more gimpish ...
> >
> > Another commerc. application:
> > A former colleague send me this screendump (lousy quality)
> >
> >
>
> Cheers,
> Harry
>
> Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com
>
>

From bizzaro at bc.edu  Fri Feb 26 15:49:03 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] more nice interfaces
References: <36D34556.673C4A2B@bc.edu> <14038.43467.520971.536971@beagle.bmc.uu.se>
Message-ID: <36D708BF.44C4AEA8@bc.edu>

Thomas.Sicheritz@molbio.uu.se wrote:

> If you strip the windows feel and look ... ok. - but I thought we were
> going  to make something more gimpish ...

Gimpish, meaning everything gets its own little window?  Yes, unless 2+ things
are much better being in the same window.  A file list on the side, as I keep
seeing, seems to be a convenient feature.

In any case, the most important thing (and this is where comparisons to GIMP
come in) is that the data appears just as it would be printed in a publication. 
So, in a sense, what the users are doing is manipulating a picture, image,
photo, whatever.

> Another commerc. application:
> A former colleague send me this screendump (lousy quality)

I am also interested in just what that is a picture of.  It seems to be a rather
comprehensive little package written in Java.

>  > BTW, how's the sequence editor coming along?
> I just returned to work. I have just started to take a look into python and
> converted my biowish C module to a python extension.
> if anyone is interested: http://evolution.bmc.uu.se/~thomas/tulip/

Grrreat! ;-)

> Questions:
>  * what minimum set do I need for compiling gnome canvas ?
>    I really dont want to compile all possible (sound,game ..) modules on my
>    solarisbox ...

I think I can answer this one!  You need to get just the gnome-libs
distribution.  For Python bindings, you need just gnome-python, which is at
0.100.0 right now I think.

BTW, GTK+ 1.2, and PyGTK 0.5.11 just came out.  gnome-python 0.100.0 comes with
PyGTK 0.5.11.

But following the PyGTK developments closely, I have to warn everyone that there
are some major revisions occurring now, so that something made in PyGTK 0.5.6
will probably need major revisions to work with PyGTK 1.0, when it comes out. 
This should not be a great concern to us since we have almost nothing written. 
But I am still confident that Python-GNOME/GTK is the best path for us.

Along this line, I was reading about Corel's decision to support the WINE
project, which lets Windows programs run on UNIX.  They consider Windows to be
the development/deployment environment, which is then made "portable and
transparent" by WINE.  I think our use of UNIX works in the reverse.  We can
develop for Python/GTK/GNOME/UNIX, for which there are efforts to port to
Windows, etc.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Fri Feb 26 16:17:53 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] licensing
Message-ID: <36D70F81.7F3E7103@bc.edu>

Locians,

I'm sure you know that Loci/TULIP is supposed to be licensed under the GNU
General Public License (GPL).  But there is also the LGPL or Library GPL.

What is the major difference between these two?  Why does the LGPL exist?  It
turns out that the wording of the GPL prevents programs licensed as such from
being incorporated into non-free or proprietary programs (the GPL says that any
project that extends a work covered by the GPL must also be GPL).  This covers
linking against a library as well.  So, legally, one cannot link a proprietary
program to a GPL program.  If you guys have been following the debate over KDE
and GNOME, this is at the heart of the issue:  KDE is GPL, but it links to Qt,
a library owned by Troll Tech under its own license, which makes the
combination "illegal".

So, what about Loci?  If we use GPL, can just anyone link their apps into it, as
we intended?  No.  But this is where the LGPL comes in.  Knowing how restrictive
it would be to license libraries under the GPL, GNU/FSF made the LGPL.  It simply
drops the GPL requirement that every program linking to the library/program be
free too.  All other aspects of the GPL remain.

GTK and GNOME, by the way, are LGPL.  But using LGPL doesn't mean your program
is a library.  GNU/FSF is actually going to change the name of LGPL to "Lesser
GPL".

Therefore, I think we should license Loci under LGPL.  This is an important
issue to settle now, even though Loci is vaporware, because the source code will
be available as soon as it is written.  For example, Thomas's sequence editor is
somewhat non-vapor.

The good news is, Harry, tacg won't have to be GPL to be "a part of" Loci.  We
wrote before about tacg's license, how it restricts commercial use/distribution.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From david.lapointe at umassmed.edu  Fri Feb 26 16:23:36 1999
From: david.lapointe at umassmed.edu (david.lapointe@umassmed.edu)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] Check out this URL
Message-ID: <93307F07DE63D211B2F30000F808E9E525D6D1@edunivexch02.umassmed.edu>

I think this is where the Green GIF came from. Pictures at 10:00!

http://www.informaxinc.com/ssbm/ssbm.html

David Lapointe
Manager - Research Computing Services
UMass Medical School
Worcester, MA 01655
508/856-5141


From bizzaro at bc.edu  Fri Feb 26 16:36:53 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] and still more licensing
Message-ID: <36D713F5.BC37BEBF@bc.edu>

By the way, every source file that we generate must include this copyright
statement from GNU.

Of course you can use your own name as the name of author, but please include
The BIC Group, which is the rest of us.  Example:

Copyright (C) 1999 by Konrad Hinsen and The BIC Group


-------------------------cut---------------------------------


Copyright (C) <year> by <name of author>

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Library General Public License for more details.

You should have received a copy of the GNU Library General Public
License along with this library; if not, write to the
Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA  02111-1307, USA.


-------------------------cut---------------------------------
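
For a Python source file, the notice above would presumably go in a comment
block at the top of the file.  The following is only a rough sketch of how such
a header could be generated; `lgpl_header` and its `group` parameter are
hypothetical names made up for illustration, not part of Loci.

```python
# Sketch only: lgpl_header() is a hypothetical helper, not part of Loci.
# The notice text is the GNU Library GPL notice quoted between the cut lines.

NOTICE = """\
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Library General Public License for more details.

You should have received a copy of the GNU Library General Public
License along with this library; if not, write to the
Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA  02111-1307, USA.
"""


def lgpl_header(year, author, group="The BIC Group"):
    """Return the copyright line plus notice as '#' comment lines."""
    lines = ["Copyright (C) %d by %s and %s" % (year, author, group), ""]
    lines.extend(NOTICE.splitlines())
    # Prefix every line with '# '; blank lines become a bare '#'.
    return "\n".join(("# " + line).rstrip() for line in lines) + "\n"


if __name__ == "__main__":
    print(lgpl_header(1999, "Konrad Hinsen"), end="")
```

Pasting the printed block at the top of each new module would keep the wording
identical across files.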


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hjm at cx408397-a.irvn1.occa.home.com  Fri Feb 26 16:38:54 1999
From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] Check out this URL
In-Reply-To: <93307F07DE63D211B2F30000F808E9E525D6D1@edunivexch02.umassmed.edu>
Message-ID: <Pine.LNX.4.10.9902261337390.13745-100000@cx408397-a.irvn1.occa.home.com>

Ahh yes, Informax's new Oracle-based infosystem - starting at $2M.  It
better be good...  Still, nice of them to do interface prototyping for
us.. ;)

hjm


On Fri, 26 Feb 1999 david.lapointe@umassmed.edu wrote:

/I think this is where the Green GIF came from. Pictures at 10:00!
/
/http://www.informaxinc.com/ssbm/ssbm.html
/
/David Lapointe
/Manager - Research Computing Services
/UMass Medical School
/Worcester, MA 01655
/508/856-5141
/
/

Cheers,
Harry

Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com 


From bizzaro at bc.edu  Fri Feb 26 17:06:12 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] Check out this URL
References: <93307F07DE63D211B2F30000F808E9E525D6D1@edunivexch02.umassmed.edu>
Message-ID: <36D71AD4.99D145A9@bc.edu>

Jeeeez!  Does the concept seem a little familiar?  BTW, this is the same company
from which I got the first pics.

You know, I have been thinking seriously about taking Loci one step further and
making it a system for Internet-wide research collaboratives, between loosely
affiliated people.  It's something I still have to clear with Ken Marx, but I
was thinking that we, The BIC Group, could use Loci to collaborate on some
"open" research projects, making an "open laboratory" that treats scientific
research like a GNU software project.  Any thoughts?


Jeff
bizzaro@bc.edu


david.lapointe@umassmed.edu wrote:
> 
> I think this is where the Green GIF came from. Pictures at 10:00!
> 
> http://www.informaxinc.com/ssbm/ssbm.html
> 
> David Lapointe
> Manager - Research Computing Services
> UMass Medical School
> Worcester, MA 01655
> 508/856-5141

-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Fri Feb 26 17:09:19 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] Check out this URL
References: <Pine.LNX.4.10.9902261337390.13745-100000@cx408397-a.irvn1.occa.home.com>
Message-ID: <36D71B8E.68A02D59@bc.edu>

Harry Mangalam wrote:
> 
> Ahh yes, Informax's new Oracle-based infosystem - starting at $2M.  It
> better be good...

$2,000,000 or $2,000???  For how many users?  I can hardly believe it's 2
million.

> Still, nice of them to do interface prototyping for
> us.. ;)

Yes, hehehe ;-)


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hjm at cx408397-a.irvn1.occa.home.com  Fri Feb 26 17:13:39 1999
From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] Check out this URL
In-Reply-To: <36D71B8E.68A02D59@bc.edu>
Message-ID: <Pine.LNX.4.10.9902261410470.13860-100000@cx408397-a.irvn1.occa.home.com>

I'm not sure it still stands, but their previous promo listed this as part
of a $2M (that's MILLION) system for bioinformatics.  You had to buy the
Oracle db from them as well as pay substantial support costs.

cf Incyte's system which uses SGI's MineSet for $1M-$2M/year and it
doesn't sound so bizar .. oops .. strange.

hjm


On Fri, 26 Feb 1999, J.W. Bizzaro wrote:

/Harry Mangalam wrote:
/> 
/> Ahh yes, Informax's new Oracle-based infosystem - starting at $2M.  It
/> better be good...
/
/$2,000,000 or $2,000???  For how many users?  I can hardly believe it's 2
/million.
/
/> Still, nice of them to do interface prototyping for
/> us.. ;)
/
/Yes, hehehe ;-)
/
/
/Jeff
/-- 
/J.W. Bizzaro                  Phone: 617-552-3905
/Boston College                mailto:bizzaro@bc.edu
/Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
/--
/

Cheers,
Harry

Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com 


From david.lapointe at umassmed.edu  Fri Feb 26 17:14:03 1999
From: david.lapointe at umassmed.edu (david.lapointe@umassmed.edu)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] Check out this URL
In-Reply-To: <36D71AD4.99D145A9@bc.edu>
Message-ID: <93307F07DE63D211B2F30000F808E9E525D6D4@edunivexch02.umassmed.edu>

Yeah, I realized that just after I sent that message.

$2M? Seems like a lot, but if you've invested $20 million (or more)
in sequencing hardware, what's $2M to make it work?

Are you talking about Collaboratories? That is an interesting concept.


David

David Lapointe
Manager - Research Computing Services
UMass Medical School
Worcester, MA 01655
508/856-5141


> -----Original Message-----
> From: J.W. Bizzaro [mailto:bizzaro@bc.edu]
> Sent: Friday, February 26, 1999 5:06 PM
> To: tulip-list@busboy.sped.ukans.edu
> Subject: Re: [Pipet Devel] Check out this URL
>
>
> Jeeeez!  Does the concept seem a little familiar?  BTW, this
> is the same company
> from which I got the first pics.
>
> You know, I have been thinking seriously about taking Loci
> one step further and making it a system for Internet-wide research
> collaboratives, between loosely affiliated people.  It's something I
> still have to clear with Ken Marx, but I was thinking that we, The BIC
> Group, could use Loci to collaborate on some "open" research projects,
> making an "open laboratory" that treats scientific research like a GNU
> software project.  Any thoughts?
>
>
> Jeff
> bizzaro@bc.edu
>

From hjm at cx408397-a.irvn1.occa.home.com  Fri Feb 26 17:58:08 1999
From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] Check out this URL
In-Reply-To: <93307F07DE63D211B2F30000F808E9E525D6D4@edunivexch02.umassmed.edu>
Message-ID: <Pine.LNX.4.10.9902261428300.13918-100000@cx408397-a.irvn1.occa.home.com>

Hi Again,

   Well, this wasn't part of my original interest in this group and may be
well-suited for it, but let me describe one of the things I'm
working on (partially supported by the National Center for Genomic Resources
(NCGR), out of Santa Fe, NM) in support of a yeast genomics project at UC
Irvine.

The UCI group has gotten an Affymetrix GeneChip machine and is busy
subjecting yeast to various stresses, generating whole-genome datasets for
time points along this stress.  I'm building a relational database with a
web interface that will suck up those datasets (and be amenable to accepting
data from other such gene expression studies) and allow them to be queried on
various params, as well as subjecting the returned values to various
statistical analyses with the stats language 'R' (a clone of S/S-PLUS), using
gnuplot for the simple outputs, VRML for complex viz's.

Because the datasets are so large (6k ORFs x 4 timepoints, plus
associated pointers, descriptors, images, etc) and the number of them is
going to be pretty big, I'm using MySQL as a prototyping system, with Perl
glue, talking thru Apache/FastCGI, replacing the Perl with C as I
identify bottlenecks. There will be a generic interface to commandline apps
(other clustering routines, tacg, clustalw, blast, etc) so that it can
become pretty extensible.  NCGR may rewrite it at commercial
quality to support their plant genomics project, but I get to do the fun
part...
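
As a rough illustration of what a generic interface to commandline apps could
look like, here is a minimal sketch in Python rather than the Perl/C the
project actually uses.  `run_tool` is a made-up name, and the demo runs the
Python interpreter itself as a stand-in program, since the real command lines
for tacg, clustalw or blast are not given here.

```python
import subprocess
import sys


def run_tool(argv, stdin_text=""):
    """Run one command-line program, feed it text on stdin, return its stdout.

    argv is the full command line as a list, e.g. ["sometool", "-x"].
    A non-zero exit status raises subprocess.CalledProcessError.
    """
    result = subprocess.run(
        argv,
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in "analysis tool": upper-cases whatever arrives on stdin.
    stand_in = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    print(run_tool(stand_in, "gattaca\n"), end="")
```

Because every tool is reduced to "argument list in, text out", adding another
program would only mean supplying a different argument list.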

I hadn't considered it, but you bring up the possibility of using such a
system as a collaboratory by making the analyses persistent in some way,
either as paths thru an analysis or the analysis itself (altho that would
get very large very fast) so that they might be re-used or extended by
others interested in the topic.  Or maybe just the paths thru an analysis
would be an important resource - if I could somehow record the 'analysis
track' that users took, I could identify, then automate them so that the
whole pathway could be boiled down to a button.
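
The 'analysis track' idea can be sketched in a few lines of Python.  This is
only an illustration under assumptions: `AnalysisTrack`, `keep_above` and
`top_n` are hypothetical names, and the numbers are made up rather than real
expression data.

```python
class AnalysisTrack:
    """Record each analysis step a user applies so the path can be replayed."""

    def __init__(self):
        self.steps = []  # list of (name, function, parameters)

    def apply(self, name, func, data, **params):
        """Run one step on data and remember how it was run."""
        self.steps.append((name, func, params))
        return func(data, **params)

    def replay(self, data):
        """Re-run the whole recorded path on new data (the 'button')."""
        for _name, func, params in self.steps:
            data = func(data, **params)
        return data

    def names(self):
        return [name for name, _func, _params in self.steps]


# Two toy analysis steps standing in for real queries or statistics.
def keep_above(values, cutoff):
    return [v for v in values if v > cutoff]


def top_n(values, n):
    return sorted(values, reverse=True)[:n]


if __name__ == "__main__":
    track = AnalysisTrack()
    kept = track.apply("filter", keep_above, [0.2, 3.1, 1.5, 4.0], cutoff=1.0)
    best = track.apply("top", top_n, kept, n=2)
    print(track.names(), best)
    # Someone else re-uses the same path on a different dataset.
    print(track.replay([5.0, 0.1, 2.0]))
```

Storing only the step names and parameters, not the results, keeps the record
small, which matches the concern that saving the analyses themselves would get
very large very fast.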

This is WELL off the LOCI topic, but perhaps the 2 could be designed to
communicate at some level.
As I said, it was never the intent for the above-described project to use
LOCI, but if they can be made to better co-exist so much the better.

Cheers
Harry


Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com 


From bizzaro at bc.edu  Fri Feb 26 19:16:26 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] collaboratories - was Check out this URL
References: <Pine.LNX.4.10.9902261428300.13918-100000@cx408397-a.irvn1.occa.home.com>
Message-ID: <36D7395A.96CD0CCE@bc.edu>

Hello Harry.

It does sound as if the project you described is a great example of how Loci
could be used as a collaboratory.  Even if the base installation of Loci does
not include every tool needed to do this, the license would allow NCGR to extend
it and repackage it...providing the original Loci code remains LGPL.

I don't know if they'd like to sell it (you said they'd like to rewrite it at
commercial quality), but contrary to popular belief, GNU programs can be sold,
just like Linux.  The key is that you don't make it proprietary; you're just
selling packaged media and support.

***We're hitting on an important strategy here for Loci.  What is most important
is that Loci becomes ubiquitous and highly accepted.  By not restricting
commercial use or redistribution, we're going a long way toward that goal.

Personally, I don't care about getting rich off of anything.  But anyone can
make money from Loci, by distributing CD-ROMs, manuals, etc.  I think even
selling server time for those server-side analyses is an option.

Yeah, any kind of collaboratory may be implemented once we set Loci up to do
that sort of thing.  I don't see it being a big step beyond the whole concept of
a distributed workflow system.  Public or private, open or closed, we can do
it.  Harry, were you referring to an open collaboratory or a closed one?

Can you guys imagine the impact this could have on the field if Loci were to be
successful?


Jeff
bizzaro@bc.edu


-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Fri Feb 26 20:28:12 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] language lawyers
Message-ID: <36D74A2C.5B2D9432@bc.edu>

For you language lawyers out there, more of what I just wrote about GPL vs. LGPL
can be found at the following sites:

Richard Stallman argues libraries should be GPL:
    http://www.gnu.org/philosophy/why-not-lgpl.html

Eric Kidd rebuts, says use LGPL:
    http://www.randomhacks.com/~emk/why-lgpl-good.html


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hjm at cx408397-a.irvn1.occa.home.com  Sat Feb 27 10:40:39 1999
From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] collaboratories - was Check out this URL
In-Reply-To: <36D7395A.96CD0CCE@bc.edu>
Message-ID: <Pine.LNX.4.10.9902270717140.16778-100000@cx408397-a.irvn1.occa.home.com>

On Sat, 27 Feb 1999, J.W. Bizzaro wrote:

/It does sound as if the project you described is a great example of how Loci
/could be used as a collaboratory.  Even if the base installation of Loci does
/not include every tool needed to do this, the license would allow NCGR to extend
/it and repackage it...providing the original Loci code remains LGPL.

I have to make a distinction here - NCGR is paying me to develop code
that does a particular job in support of a project they're very interested
in.  While I have my own agenda in terms of freedom and redistribution of
code (to which they're surprisingly open), they have their own agenda.  They
are a non-profit org, but they were set up as a hot-house agency to spin off
for-profits if possible.  So while they plan to make all the services they
develop freely available to the public, they want to reserve the right to
spin off a com to exploit that code for companies that want to replicate the
system behind their firewall.  Therefore, there are some additional problems
in basing their code on LOCI.

I know that they're aware of LOCI because I told them about it (one of my
functions is to find out about ideas out there that seem to be worth paying
attention to, like GNOME and other CORBA services, BSML, Bioperl, etc), but
it's up to their executive to decide which to support.

So while I support the idea of LOCI and will spend time trying to
integrate aspects of the genex db with LOCI, it doesn't mean that NCGR will
officially support it.  The problem with who owns intellectual property is
HUGE in SW (I just resigned from UCI because of it to work on NCGR's
project), so don't go looking for large developers to leap onto the free
software bandwagon - there is huge resistance, especially from their legal
depts.  The success of Red Hat and GNU/Linux is changing that, but slowly.
I'm counting on it b/c I'm starting a company to try to do (sort of) the
same thing with my software - the core software is free, but I'll sell
support, customization, and interface components to those who want/need
them...

That said, for what NCGR wants to do, it seems to me that the software is
almost incidental; what they're really selling is the integration technology
and support (not unlike Red Hat itself).  They COULD give the software away,
charge only for support, and that would in fact make more $ for
them, as they would then benefit from other free software developers
contributing to the code base.

/
/I don't know if they'd like to sell it (you said they'd like to rewrite it at
/commercial quality), but contrary to popular belief, GNU programs can be sold,
/just like Linux.  The key is that you don't make it proprietary; you're just
/selling packaged media and support.

EXACTLY.  You put the words right in my mouth ;).

/***We're hitting on an important strategy here for Loci.  What is most important
/is that Loci becomes ubiquitous and highly accepted.  By not restricting
/commercial use or redistribution, we're going a long way toward that goal.
/
/Personally, I don't care about getting rich off of anything.  But anyone can
/make money from Loci, by distributing CD-ROM's, manuals, etc.  I think even
/selling server time for those server-side analyses is an option.
/
/Yeah, any kind of collaboratory may be implemented once we set Loci up to do
/that sort of thing.  I don't see it being a big step beyond the whole concept of
/a distributed workflow system.  Public or private, open or closed, we can do
/it.  Harry, were you referring to an open collaboratory or a closed one?

/Can you guys imagine the impact this could have on the field if Loci were to be
/successful?

Yup - it would have a big impact, but there are lots of similar projects
going on in 'coopetition', so it's important to actually produce something.
Bio-perl has already started regular dists of their package, and EMPRESS
will start soon.

It's demo or die.  (I'm one to speak - I really haven't done anything yet
except flap my lips (they move when I type), but as soon as I finish the
commandline version of tacg V3 (in final packaging for beta release and
documentation now), I'll put some time on trying to LOCI-lize it.)

Cheers,
Harry

Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com 


From bizzaro at bc.edu  Sat Feb 27 17:47:06 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] collaboratories - was Check out this URL
References: <Pine.LNX.4.10.9902270717140.16778-100000@cx408397-a.irvn1.occa.home.com>
Message-ID: <36D875EA.30CBB70B@bc.edu>

Harry Mangalam wrote:
> 
> So while I support the idea of LOCI and will spend time trying to
> integrate aspects of the genex db with LOCI, it doesn't mean that NCGR will
> officially support it.  The problem with who owns intellectual property is
> HUGE in SW (I just resigned from UCI because of it to work on NCGR's
> project), so don't go looking for large developers to leap onto the free
> software bandwagon - there is huge resistance, especially from their legal
> depts.  The success of Red Hat and GNU/Linux is changing that, but slowly.
> I'm counting on it b/c I'm starting a company to try to do (sort of) the
> same thing with my software - the core software is free, but I'll sell
> support, customization, and interface components to those who want/need
> them...

Oh yeah, intellectual property is a very big deal everywhere, with companies and
schools getting all of the rights and employees and students getting none...or
so it seems.  One thing I have to take care of regarding Loci is getting a
disclaimer from UMass Lowell.  The University is much better about these things
than some of the really big schools, like UCI or even MIT.

FSF actually says something about this...

    You should also get your employer (if you work as a programmer)
    or your school, if any, to sign a "copyright disclaimer" for
    the library, if necessary. Here is a sample; alter the names: 

    Yoyodyne, Inc., hereby disclaims all copyright interest in
    the library `Frob' (a library for tweaking knobs) written
    by James Random Hacker.

    signature of Ty Coon, 1 April 1990
    Ty Coon, President of Vice

Whether or not it is that simple, I'll have to see.  I guess I'll be visiting
the Chancellor soon.

Something that is unclear to me, however, and maybe you guys can give me your
opinion, is whether a copyright holder can change the GNU license.  In other
words, just because UMass Lowell may be one of the copyright holders on Loci,
does that mean they can decide to make it proprietary?  The GNU license appears
to be immutable, and if so, should it matter if the institution shares the
copyright?

Have you thought about this with respect to NCGR, Harry?

> Yup - it would have a big impact, but there are lots of similar projects
> going on in 'coopetition', so it's important to actually produce something.
> Bio-perl has already started regular dists of their package, and EMPRESS
> will start soon.

I don't think I am aware of EMPRESS.  Do you have a URL?  Unless you mean
EMBOSS?

> It's demo or die.  (I'm one to speak - I really haven't done anything yet
> except flap my lips (they move when I type), but as soon as I finish the
> commandline version of tacg V3 (in final packaging for beta release and
> documentation now), I'll put some time on trying to LOCI-lize it.)

Yeah, me too.  I'd like to start pumping some code out, but where do I begin? 
One big issue now is that most of the GUI tools will share a common core.  We
should be sure not to reinvent that for each tool.  So, in a sense, Thomas will
be breaking ground for most others.  Also, I want to get a GTK hacker to work on
the Benchtop (GCL or "Work Flow Diagram").


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From justin at ukans.edu  Sat Feb 27 18:24:27 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] Loci markup language and infrastructure things
Message-ID: <Pine.OSF.4.03.9902271715010.59-100000@busboy.sped.ukans.edu>

I've been busy with school lately (in fact, I really should be studying   
right now for an exam Monday), so I haven't gotten much of anything done. 
           
However, I've been reading over BioML, BSML, and the bioperl site, and I  
have some ideas about the markup language.

First, reading BSML files makes a lot of things seem overly complex.
Second, BioML looks cleaner, but I hate the organism tag enclosing 
everything. While that information could be useful for a structure or 
sequence, it would be better to reference it, rather than enclosing it.

Also, BSML doesn't seem to cover protein sequences, while BioML does.
However, BSML does seem to allow for more thorough definition of features
in the sequence.

Aesthetically, I prefer BioML over BSML, and I think that's just because
BioML uses different tag names for various features of the sequences,
while BSML just has a general feature tag with lots of options.

Also, BSML, and even BioML to a degree, try to define display information
as well. Do we want that in our ML? I can't see why we would need it,
since we have an intelligent client. BSML seems to be intended for direct
display in a generic BSML browser, in addition to defining data. BSML has
a second DTD with that layout stuff removed, however. BioML has tags for
forms, which seem totally unnecessary.

I would like to effectively merge BioML and BSML, incorporating protein
sequence information and feature specification, and use more descriptive
tag names (like BioML) for defining the sequences and features. I wouldn't
put any layout information in. Does anyone think we need it?

Also, for structure, there don't appear to be any MLs even attempting to
do this, with the exception of CML. So, my idea is to take the PDB file
format and XMLize it. If any of you know any glaring holes in PDB let
me know, and we can work around those.
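
To make "XMLize PDB" a bit more concrete, here is a minimal sketch in Python that turns one fixed-width ATOM record into an XML element. The column positions follow the published PDB ATOM record layout; the <atom> tag and the attribute names are placeholders of mine, not a proposed schema, and the sample record is built by formatting so the columns line up.

```python
import xml.etree.ElementTree as ET

# Column slices (0-based, end-exclusive) of the fixed-width PDB ATOM record.
ATOM_FIELDS = {
    "serial": (6, 11),
    "name": (12, 16),
    "resName": (17, 20),
    "chainID": (21, 22),
    "resSeq": (22, 26),
    "x": (30, 38),
    "y": (38, 46),
    "z": (46, 54),
    "element": (76, 78),
}

def atom_record_to_xml(line):
    """Convert one ATOM/HETATM line into an <atom> element (hypothetical tag)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    atom = ET.Element("atom")
    for field, (start, end) in ATOM_FIELDS.items():
        value = line[start:end].strip()
        if value:  # skip blank columns instead of emitting empty attributes
            atom.set(field, value)
    return atom

# Illustrative record, assembled field by field to guarantee the column widths.
sample = (
    "ATOM  " + "1".rjust(5) + " " + " N  " + " " + "MET" + " " + "A"
    + "1".rjust(4) + "    " + "%8.3f%8.3f%8.3f" % (27.340, 24.430, 2.614)
    + "%6.2f%6.2f" % (1.00, 9.67) + " " * 10 + " N"
)

atom = atom_record_to_xml(sample)
print(ET.tostring(atom, encoding="unicode"))
```

Keeping the coordinates as strings avoids any rounding on the way in; a real format would also have to decide what to do with the other record types.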

Also, these sections will need some tags to allow for defining
relationships between multiple objects. It might describe homology,
alignment, etc. between two or more sequences, or for structures, it
might relate 3D similarities, regions of high interaction (binding
probabilities through free energy calculations), and other similar
concepts.

Generated data should also return information about the analysis process,
like the algorithm used, statistical probabilities, etc.

Now that is just the "data" section. A LociML file will have a variety of
additional information as well. We'll probably need control, status, and 
query sections, too.
Control has to describe the analysis pathway.
Status is information concerning the data returned at each analysis step.
Query has to hold the actual query at each step.

Now, the control section is fairly straightforward, as is the status
section, although both will need to be fairly flexible. They may also
carry incidental information concerning an analysis that might be useful
to the client. I don't really have any good examples, but I imagine some
will come up.

The query section is more complex, but here's my idea:
When the user creates the analysis pathway, all of the query commands are
generated at that time as well, but they can make use of variables
referencing data from queries in earlier stages. The workflow system will
fill in the variables for a query before sending it off for that analysis.

Here is a crude example:

<control current="1">
 <step stage="1" server="paosp://some.host/whatever" id="q1"/>
 <step stage="2" server="paosp://foo/" id="q2"/>
</control>
<status>
 <step id="q1" state="processing">
  <message>Analyzing sequence...</message>
 </step>
</status>

<data>
 <step id="q1">
  <protein>lots of other stuff here</protein>
  <!-- note: this wouldn't show up until after the q1 step is done -->
 </step>
</data>

<query>
 <step id="q1">
  <data>
   <dna id="aaa">....</dna>
  </data>
  <operation type="translate">
   <input id="aaa"/>
   <!-- some other data for translation -->
  </operation>
 </step>
 <step id="q2">
  <data>
   <protein id="bbb">
    <variable>data.step[q1].protein.*</variable>
   </protein>
   <protein id="ccc">...</protein>
  </data>
  <operation type="homology">
   <input id="bbb"/>
   <input id="ccc"/>
   <!-- other stuff for query -->
  </operation>
 </step>
</query>


There obviously needs to be a lot of detail filled in here, but I think
this gets my basic idea across.
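
As a rough illustration of the variable filling, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It assumes the sections are wrapped in a single root element (called <lociml> here, a name I made up), that variables always look like data.step[ID].TAG.*, and that "MKV" stands in for whatever q1 actually returned.

```python
import re
import xml.etree.ElementTree as ET

# Cut-down version of the example; <lociml> and the "MKV" result are placeholders.
LOCIML = """
<lociml>
 <data>
  <step id="q1">
   <protein>MKV</protein>
  </step>
 </data>
 <query>
  <step id="q2">
   <data>
    <protein id="bbb">
     <variable>data.step[q1].protein.*</variable>
    </protein>
   </data>
  </step>
 </query>
</lociml>
"""

# Assumed variable syntax: data.step[<step id>].<tag>.*
VAR_PATTERN = re.compile(r"^data\.step\[(\w+)\]\.(\w+)\.\*$")

def fill_variables(root):
    """Replace each <variable> in the query section with the referenced result."""
    results = root.find("data")
    for parent in list(root.find("query").iter()):
        for var in parent.findall("variable"):
            match = VAR_PATTERN.match((var.text or "").strip())
            if match is None:
                continue
            step_id, tag = match.groups()
            source = results.find(f"step[@id='{step_id}']/{tag}")
            if source is None:
                continue  # the earlier step has not produced data yet
            parent.remove(var)
            parent.text = source.text
    return root

root = fill_variables(ET.fromstring(LOCIML))
print(root.find("query/step/data/protein").text)
```

A variable whose source step has not finished is simply left in place, which is also a cheap way to tell that the query is not ready to be sent.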

Also, there's no particular reason there couldn't be multiple entries for 
a stage. That's why I defined every component of a query by an id, rather 
than by its stage. Since the first few steps of an analysis pathway 
might not depend on previous data, we could have multiple steps occurring
simultaneously. There's no reason for all of the steps to be sequential.
This would be especially true of a pathway which had a number of database
queries. Actually, we could probably get rid of the whole ordering thing
completely, since the wfs could just figure out dependencies by the 
variable references in the queries. Of course, the interface for this
could be more complicated...
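
Working out the order from the variable references could look something like the sketch below. It is only an illustration in Python (graphlib needs Python 3.9 or later); the dictionary layout and the extra step q3 are assumptions of mine, and only the data.step[ID] reference syntax comes from the example above.

```python
import re
from graphlib import TopologicalSorter

# Matches the step id inside a reference such as data.step[q1].protein.*
STEP_REF = re.compile(r"data\.step\[(\w+)\]")

def dependencies(queries):
    """Map each step id to the set of step ids its variables refer to."""
    return {
        step: set(STEP_REF.findall(" ".join(variables)))
        for step, variables in queries.items()
    }

def parallel_batches(queries):
    """Group steps into batches; every step in a batch can run at the same time."""
    sorter = TopologicalSorter(dependencies(queries))
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches

queries = {
    "q1": [],                            # translation, needs no earlier data
    "q2": ["data.step[q1].protein.*"],   # homology, needs the q1 protein
    "q3": [],                            # hypothetical independent database query
}
print(parallel_batches(queries))
```

With no explicit stage numbers at all, q1 and q3 land in the first batch and q2 in the second, which is the "get rid of the whole ordering thing" idea.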

Also, it probably makes more sense to move all of the input data into the
data section, and have the query reference it there. Also, the format
of specifying variables and input in general will probably need to be
improved.

In terms of implementation, I imagine it would work like this:
The wfs identifies queries it can currently run, and creates a 
Paos object on the specified server, giving it only the portions
of the xml file necessary for it to run (query and relevant data
sections). The input data goes into one attribute of the Paos object.
The remote analysis system creates a second attribute for
status tags, and when it's complete, it creates an output section
with its new data. The wfs can frequently grab the status attribute
on the object, since it's small, and update its local copy for any
clients who want to know what is going on. When the analysis is
complete, the wfs grabs the output attribute off of the remote
object and updates its copy, and moves on. The remote analysis
system just drops its object once it has been acknowledged by the
wfs.
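
The input/status/output cycle can be sketched as below. This is not the Paos API: AnalysisObject is a purely local stand-in that I invented to show the three attributes and the polling loop, and advance() fakes the progress a real remote analysis would make on its own.

```python
import time

class AnalysisObject:
    """Local stand-in for a remote object with input, status and output attributes."""

    def __init__(self, input_xml):
        self.input = input_xml    # written once by the wfs
        self.status = "queued"    # small, polled often
        self.output = None        # read once, when the analysis is complete
        self._ticks = 0

    def advance(self):
        """Simulate progress on the remote side (illustration only)."""
        self._ticks += 1
        if self._ticks < 3:
            self.status = "processing"
        else:
            self.status = "complete"
            self.output = "<protein>...</protein>"

def run_step(obj, poll_interval=0.0):
    """Poll the status attribute until complete, then fetch the output once."""
    seen = []
    while obj.status != "complete":
        obj.advance()             # a real remote system would do this by itself
        seen.append(obj.status)   # here the wfs would update its local copy
        time.sleep(poll_interval)
    return seen, obj.output

history, output = run_step(AnalysisObject("<dna id='aaa'>....</dna>"))
print(history, output)
```

The point of keeping status separate is that the wfs can poll it cheaply without ever touching the (possibly large) input or output.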

Any thoughts on the markup language, the query syntax, variable
references, asynchronous analyses, or the workflow system (wfs, if
you were wondering what I was referring to)? I'll start my BioML, 
BSML, PDB merger/implementation/cleanup. Once we agree on how Loci
works underneath, a rough wfs/paos/gatekeeper system can be set up
fairly quickly. Then just a quick python wrapper around some analysis 
tool and a simple viewer program will give us a functioning system 
(not a particularly easy to use system, but functioning 
nonetheless).

Justin Bradford
justin@ukans.edu


From bizzaro at bc.edu  Sat Feb 27 23:11:04 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:18 2006
Subject: [Pipet Devel] Loci markup language
References: <Pine.OSF.4.03.9902271715010.59-100000@busboy.sped.ukans.edu>
Message-ID: <36D8C1D8.DF9B7F3@bc.edu>

I'll reply to a few LocusML (or LociML?) points here and then infrastructure
points in another e-mail.

Justin Bradford wrote:
> 
> I've been busy with school lately (in fact, I really should be studying
> right now for an exam Monday), so I haven't gotten much of anything done.

That's fine.  We all appreciate the work you've done, especially this
e-mail/book ;-)

> First, reading BSML files makes a lot of things seem overly complex.
> Second, BioML looks cleaner, but I hate the organism tag enclosing
> everything. While that information could be useful for a structure or
> sequence, it would be better to reference it, rather than enclosing it.

I think the importance of an organism tag depends on the audience.  Most
biochemists couldn't care less about the organism.  But to microbiologists,
geneticists and the like, this information is very important.  What matters to
you, it seems, is that the organism information _has_to_be_ present.  But I
think as long as it _can_ be inserted at some level, we'll do fine.

> Also, BSML doesn't seem to cover protein sequences, while BioML does.
> However, BSML does seem to allow for more thorough definition of features
> in the sequence.

Of course we'll take the best of both worlds :-)

> Also, BSML, and even BioML to a degree, try to define display information
> as well. Do we want that in our ML? I can't see why we would need it,
> since we have an intelligent client.

No, we don't need display information.  You're absolutely correct that each
locus should be intelligent enough to know how to interpret the data that are
targeted for it (and what locus to pass other data types to if they are
encountered).

> I would like to effectively merge BioML and BSML, incorporating protein
> sequence information and feature specification, and use more descriptive
> tag names (like BioML) for defining the sequences and features. I wouldn't
> put any layout information in. Does anyone think we need it?

By layout, you mean display information?  I don't think we need it.

> Also, for structure, there don't appear to be any MLs even attempting to
> do this, with the exception of CML. So, my idea is to take the PDB file
> format and XMLize it. If any of you know any glaring holes in PDB let
> me know, and we can work around those.

Now Konrad's ears should have perked up here.  He'll have the final word on a
format for structural information, but I recall he does not like any of the
well-accepted formats for structure, especially not PDB.  This is Konrad's
chance to show the world what the perfect description of structure looks like
;-)

What I do want, with respect to PDB's however, is an easy way to translate from
PDB to LocusML, because PDB is the major format for 3D structure right now.

So, Konrad, can you help us make LocusML the perfect structural (among other
things) ML?  Is there a way we can change CML to describe biomacromolecules the
way you want it to?

> Also, these sections will need some tags to allow for defining
> relationships between multiple objects. It might describe homology,
> alignment, etc. between two or more sequences, or for structures, it
> might relate 3D similarities, regions of high interaction (binding
> probabilities through free energy calculations), and other similar
> concepts.

Yes.  That's something I haven't thought much about.

> Generated data should also return information about the analysis process,
> like the algorithm used, statistical probabilities, etc.

Yes, great!


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Sun Feb 28 03:44:27 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:19 2006
Subject: [Pipet Devel] infrastructure things
References: <Pine.OSF.4.03.9902271715010.59-100000@busboy.sped.ukans.edu>
Message-ID: <36D901EB.BAAA97B0@bc.edu>

Justin et al,

I'll get to what you wrote about infrastructure things in the next e-mail, but
first I'd like to make a few points.

You wrote an e-mail a couple months ago about how you think the workflow system
would function, from the point of view of an XML file created by the Benchtop,
monitored by the Benchtop/GCL, traveling to the Gatekeeper, and back.  But I
want to bring up some questions about the true mobility of the XML file.

Just how confusing would everything get if each locus got possession of either
(1) the one-and-only XML file or (2) just a copy?

Problem with case (1): What if the XML needs to be split for forked analyses?

E.g., the user has a sequence, gets an aligned sequence from a database, and now
wants to do something else with the new sequence.

What happens to the XML file?  Do we make a copy of the entire file (case 2!) to
be used with the new sequence, or do we cut the XML file in half...so to speak?

Problem with case (2): Will the information ever have to be sewn back together?

E.g., there is a fork in an analysis, as described for case (1).

Will we ever have to consider the whole analysis a single XML file, bringing all
pieces back together?  Or do we consider each fork/child to be a new analysis,
never to be rejoined with its parent?

Another confusing point is the idea that the XML file actually moves.  I
referred to it once as a basketball that is passed between players, but everyone
should be comfortable with the fact that each file will remain where it was
created...AND THIS IS TRUE EVEN FOR SERVER-SIDE ANALYSES!

The way I see it, we have a Python program on the client machine that handles
all of the interactions with the Gatekeeper.  So, EACH LOCUS WON'T HAVE TO DEAL
DIRECTLY WITH THE GATEKEEPER!   They deal with "Porta Internet", which makes
everything transparent or seem like it is all on the client machine.  (The same
is true for Porta CORBA.)

Maybe instead of basketball players tossing a basketball around, the basketball
tosses the players around :-)

You wrote about how Benchtop/GCL "updates a local copy" of the XML.  I
personally think each locus should update the XML it is working with (the
"Locus-In-Charge" or LIC), by itself, so as not to overwhelm Benchtop.  (Realize
that there should be no limit on the number of loci/processes spawned for forked
analyses, so Benchtop would have to handle in some cases a lot of
communication...maybe hundreds of XML files...in a word, it would be a
"bottleneck".)  In the case of server-side analyses, going thru Porta Internet
and Gatekeeper, Gatekeeper should not use Benchtop to update the XML and take
the next step, rather I think it should be Porta Internet, the LIC.

Now what about those spawned loci/processes?  If Benchtop were the only LIC, all
spawned processes would be the first generation children of Benchtop.  But if
each locus were capable of spawning its own child, and that child capable of
spawning its own, the workload would just be much more distributed--each locus
would be an LIC.  One thing leads to another, if you recall that song by The
Fixx.

At this point, we need to answer the questions I proposed above.  I think if the
analysis needs to fork, the LIC should copy the XML, put relevant instructions
in each copy, spawn two loci for the task, handing the copies over.  (And maybe
at this point the parent can be closed.)  But the copies won't be automatically
sewn back together at the end (we could have an option to combine XML's, as an
afterthought here).
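
A minimal sketch of the copy-on-fork idea, in Python with the standard xml.etree.ElementTree and copy modules. It assumes the query/step layout from Justin's example and a made-up <lociml> root element; "relevant instructions" is reduced here to keeping only one query step per copy.

```python
import copy
import xml.etree.ElementTree as ET

def fork(root, step_ids):
    """Return one independent copy of the document per step id.

    Each copy keeps only its own <step> in the <query> section.
    """
    copies = []
    for keep in step_ids:
        clone = copy.deepcopy(root)   # the parent document is left untouched
        query = clone.find("query")
        for step in query.findall("step"):
            if step.get("id") != keep:
                query.remove(step)
        copies.append(clone)
    return copies

# Hypothetical parent document with two pending steps.
parent = ET.fromstring(
    "<lociml><query>"
    "<step id='q1'/><step id='q2'/>"
    "</query></lociml>"
)
left, right = fork(parent, ["q1", "q2"])
print([s.get("id") for s in left.iter("step")])
print([s.get("id") for s in right.iter("step")])
```

Because the copies are deep copies, nothing ties them back to the parent, which matches the idea that they are not automatically sewn back together.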

But, in the way I think things should work, would those little drawings on the
Benchtop give the user an indication of what is going on, or what the progress
is?  You thought that this is how the Benchtop would operate, which is a very
good idea.  And we do need _some_ sort of communication for this.  So if we let
the LIC's handle everything, can each LIC just send a simple "hello" back to the
Benchtop?  If a new child is spawned, maybe the first thing that child does is
tell the Benchtop what it is and where it came from...Maybe this function could
also be used to build a database of loci available to the users...? 

In short, we are thinking of a highly distributed set of intelligent agents
existing all over.  Benchtop should be the user's eyes to the whole world of
Loci, not the brain of Loci.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Sun Feb 28 04:07:46 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:19 2006
Subject: [Pipet Devel] infrastructure things
References: <Pine.OSF.4.03.9902271715010.59-100000@busboy.sped.ukans.edu> <36D901EB.BAAA97B0@bc.edu>
Message-ID: <36D90762.7DD44C37@bc.edu>

You know, what I just wrote says nothing about how Paos and the workflow system
fit into the whole scheme.  And, in fact, some of my points may be point-less in
light of the wfs.

For example, Justin describes the wfs as being responsible for launching loci; I
said each locus launches its own child.

We'll just have to look more closely at Paos and wfs as they develop and see how
these issues should be resolved.  In any case, Benchtop is a no-brainer ;-)


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Sun Feb 28 05:13:19 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:19 2006
Subject: [Pipet Devel] and still more infrastructure things
References: <Pine.OSF.4.03.9902271715010.59-100000@busboy.sped.ukans.edu>
Message-ID: <36D916BF.7852DB4B@bc.edu>

Okay, so what you once considered the responsibility of the Benchtop/GCL, you
now consider that of the wfs.

So, I'll try to look at the XML as an object rather than a file this time.  And
wfs launches the apps, not individual loci/clients.


Justin Bradford wrote:

> Generated data should also return information about the analysis process,
> like the algorithm used, statistical probabilities, etc.

Of course we should make a sharp division at the start between data that is
biological and data that is for the workflow system.  I even imagine the very
top of the file/object to be all workflow stuff.

> Now that is just the "data" section. A LociML file will have a variety of
> additional information as well. We'll probably need control, status, and
> query sections, too.
> Control has to describe the analysis pathway.

...description of the whole pathway

> Status is information concerning the data returned at each analysis step.

...what was collected along the way

> Query has to hold the actual query at each step.

...what still needs to be collected

> Now, the control section is fairly straightforward, as is the status
> section, although both will need to be fairly flexible. Incidental
> information concerning an analysis that might be useful to the client. I
> don't really have any good examples, but I imagine some will come up.

Status should contain the "log" of the analyses.  Status will say what control
says, among other things, when the final destination is reached.  So, at the
final destination, control is irrelevant.

> The query section is more complex, but here's my idea:
> When the user creates the analysis pathway, all of the query commands are
> generated at that time as well, but they can make use of variables
> referencing data from queries in earlier stages. The workflow system will
> fill in the variables for a query before sending it off for that analysis.

Sure.  IOW, the query section is dynamic.

> Here is a crude example:
> 
> <control current="1">
>  <step stage="1" server="paosp://some.host/whatever" id="q1"/>
>  <step stage="2" server="paosp://foo/" id="q2"/>
> </control>
> <status>
>  <step id="q1" state="processing">
>   <message>Analyzing sequence...</message>
>  </step>
> </status>

So status is reported back, via wfs, to the Benchtop, a la my previous e-mail. 
Good.

> <data>
>  <step id="q1">
>   <protein>lots of other stuff here</protein>
>   <!-- note: this wouldn't show up until after the q1 step is done -->
>  </step>
> </data>
> 
> <query>
>  <step id="q1">
>   <data>
>    <dna id="aaa">....</dna>
>   </data>
>   <operation type="translate">
>    <input id="aaa"/>
>    <!-- some other data for translation -->
>   </operation>
>  </step>
>  <step id="q2">
>   <data>
>    <protein id="bbb">
>     <variable>data.step[q1].protein.*</variable>
>    </protein>
>    <protein id="ccc">...</protein>
>   </data>
>   <operation type="homology">
>    <input id="bbb"/>
>    <input id="ccc"/>
>    <!-- other stuff for query -->
>   </operation>
>  </step>
> </query>

Nice.  But how will Paos handle this?  Are we looking at some major changes to
Paos itself?

> Also, there's no particular reason there couldn't be multiple entries for
> a stage.

stage == step?  Or I guess a step can contain different stages...

> That's why I defined every component of a query by an id, rather
> than by its stage. Since the first few steps of an analysis pathway
> might not depend on previous data, we could have multiple steps occurring
> simultaneously. There's no reason for all of the steps to be sequential.

Right.  That'd save time, but be difficult to manage.  Now we're talking about
concurrency.

> This would be especially true of a pathway which had a number of database
> queries. Actually, we could probably get rid of the whole ordering thing
> completely, since the wfs could just figure out dependencies by the
> variable references in the queries. Of course, the interface for this
> could be more complicated...

Hmmm.  Now are we dealing with the whole forking/sewing issue here?  Once an XML
object is split up, will it have to be put back together again?

> Also, it probably makes more sense to move all of the input data into the
> data section, and have the query reference it there. Also, the format
> of specifying variables and input in general will probably need to be
> improved.

I was thinking about keeping workflow data together.

Also, ID numbers could be longer and randomly generated.

> In terms of implementation, I imagine it would work like this:
> The wfs identifies queries it can currently run

How?  By the database of available loci/clients?

> and creates a
> Paos object on the specified server

...via Porta Internet or whatever, as long as it appears transparent.

> giving it only the portions
> of the xml file necessary for it to run (query and relevant data
> sections).

Yeah, this is where I see Porta Internet or Gatekeeper filtering out stuff the
server-side algorithms/databases don't need.

> The input data goes into one attribute of the Paos object.
> The remote analysis system creates a second attribute for
> status tags, and when it's complete, it creates an output section
> with its new data.

Okay.

> The wfs can frequently grab the status attribute
> on the object, since it's small, and update its local copy for any
> clients who want to know what is going on.

Yes.  Wonderful!

> When the analysis is
> complete, the wfs grabs the output attribute off of the remote
> object and updates its copy, and moves on. The remote analysis
> system just drops its object once it has been acknowledged by the
> wfs.

Okay.

> Any thoughts on the markup language, the query syntax, variable
> references, asynchronous analyses, or the workflow system (wfs, if
> you were wondering what I was referring to)?

Just work with Konrad on the markup of structure.

> I'll start my BioML,
> BSML, PDB merger/implementation/cleanup. Once we agree on how Loci
> works underneath, a rough wfs/paos/gatekeeper system can be set up
> fairly quickly. Then just a quick python wrapper around some analysis
> tool and a simple viewer program will give us a functioning system
> (not a particularly easy to use system, but functioning
> nonetheless).

I'm glad you think this will go quickly.  Are you able to work with Paos as it
is, or will Carlos need to make changes?  How comfortable are you with the
Python?


Buh-bye!
Jeff
bizzaro@bc.edu

From justin at ukans.edu  Sun Feb 28 05:52:17 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:19 2006
Subject: [Pipet Devel] and still more infrastructure things
In-Reply-To: <36D916BF.7852DB4B@bc.edu>
Message-ID: <Pine.OSF.4.03.9902280410120.11892-100000@busboy.sped.ukans.edu>

> Okay, so what you once considered the responsibility of the Benchtop/GCL, you
> now consider that of the wfs.

It's a separate process, but like an extension of the Benchtop/GCL, it
just handles all of the little details behind the scenes.

> So, I'll try to look at the XML as an object rather than a file this
> time.  And
> wfs launches the apps, not individual loci/clients.

I think I need a clarification on the meaning of a locus. My understanding
was a locus is a term covering an instance of Porta/Gatekeeper/analysis
tool(s) on a computer somewhere. It's just a place where analysis is done,
and that's it. The wfs system worries about direction of the whole object.

> Of course we should make a sharp division at the start between data that is
> biological and data that is for the workflow system.  I even imagine the very
> top of the file/object to be all workflow stuff.

I agree. I had intended to make a generic C/BS/BioML2 format first. Then
this would be what's under the data sections, so LociML would just
encapsulate that portion of it.

As for the algorithm and statistics stuff, I was thinking of that as
something potentially useful to keep in with sequence/structure/relation
data. For instance, it could be useful to know a structure was derived
using some particular X-ray crystallography technique. That stuff is
related to Loci.

> > Control has to describe the analysis pathway.
> ...description of the whole pathway

Yeah. Just an XML version of the GCL view.

> > Status is information concerning the data returned at each analysis step.
> ...what was collected along the way

More specifically, how the collection went. Actual data would get stuck
back in a block under <data>.

> Nice.  But how will Paos handle this?  Are we looking at some major changes to
> Paos itself?

I don't think so. My intention was to have the wfs only send what that
specific analysis needed. Input, output, and status each have an attribute
on the object. The wfs sends input once, reads output once (and merges the
new data with the full object), and gets constant updates on the status
attribute. So whenever the analysis tool changes status, the wfs knows,
and the benchtop can be updated (assuming any are paying attention at the
moment).

> > Also, there's no particular reason there couldn't be multiple entries for
> > a stage.
> 
> stage == step?  Or I guess a step can contain different stages...

The stage, step, and order terminology I used in the example XML are all
bad and need to be changed, but the idea was just that multiple things
could be happening at once.

> Right.  That'd save time, but be difficult to manage.  Now we're talking about
> concurrency.
> 
> Hmmm.  Now are we dealing with the whole forking/sewing issue here?
> Once an XML
> object is split up, will it have to be put back together again?

Concerning the dependency scheduling, it wouldn't be difficult to manage
this from a central server, as I was envisioning the wfs. If an object
roamed independently, it would be difficult to manage, unless we had
all of the threads regroup when data needed to be rejoined.

> I was thinking about keeping workflow data together.
> 
> Also, ID numbers could be longer and randomly generated.

Yes, it needs to be restructured. Many of the ID numbers would be assigned
by the GCL to XML query translator.

> > The wfs identifies queries it can currently run
> 
> How?  By the database of available loci/clients?

However GCL defines it. I imagine explicitly naming a server as one
option, or just specifying a type of analysis, where the wfs will use a
list of some kind to find one available.
But before it contacts the server, it has to make sure it has all of the
data available for its query (check dependencies).
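
Roughly, in Python, that check-then-pick logic might look like the sketch below. The dictionary keys ("needs", "type", "server") and the registry layout are assumptions made up for the illustration; only the two paosp:// addresses come from the earlier example.

```python
def choose_server(step, registry, available_data):
    """Return a server for the step, or None if it cannot run yet.

    step: dict with 'needs' (ids of required data), 'type' (kind of
          analysis) and an optional explicit 'server'.
    registry: dict mapping an analysis type to a list of known servers.
    available_data: set of step ids whose results are already in hand.
    """
    missing = [dep for dep in step["needs"] if dep not in available_data]
    if missing:
        return None                   # dependencies not satisfied yet
    if step.get("server"):
        return step["server"]         # the user named a server explicitly
    candidates = registry.get(step["type"], [])
    return candidates[0] if candidates else None

registry = {"homology": ["paosp://foo/"]}
named = {"needs": [], "type": "translate", "server": "paosp://some.host/whatever"}
by_type = {"needs": ["q1"], "type": "homology"}

print(choose_server(named, registry, set()))
print(choose_server(by_type, registry, set()))
print(choose_server(by_type, registry, {"q1"}))
```

Taking the first candidate is the simplest possible policy; anything smarter (load, locality) would slot in at that last line.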

> > giving it only the portions
> > of the xml file necessary for it to run (query and relevant data
> > sections).
> 
> Yeah, this is where I see Porta Internet or Gatekeeper filtering out
> stuff the
> server-side algorithms/databases don't need.

I had imagined the wfs server doing that, but I imagine our difference is
in semantics. Basically, the analysis tool just gets what it needs.

> Just work with Konrad on the markup of structure.

Ok Konrad, I'm interested in hearing your ideas on describing structures.

> I'm glad you think this will go quickly.  Are you able to work with
> Paos as it
> is, or will Carlos need to make changes? 

At the very least, I can pass blocks of XML through attributes on the paos
object. It would be interesting to see if the Paos object could be a
mirror of the XML, however.
So:
<status>
 <message>Ok</message>
</status>
Becomes:
paos_object.status.message = 'Ok'

But I can work without that.
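
For what it's worth, the mirroring itself is easy to sketch in plain Python, independent of Paos. The snippet below uses the standard xml.etree.ElementTree and types.SimpleNamespace; the <object> wrapper element is a placeholder, and the sketch ignores attributes and repeated tags (a later sibling with the same tag would overwrite an earlier one).

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace

def mirror(element):
    """Turn an element tree into nested objects with attribute access.

    Leaf elements become their stripped text; other elements become a
    namespace with one attribute per child tag.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    return SimpleNamespace(**{child.tag: mirror(child) for child in children})

root = ET.fromstring(
    "<object><status><message>Ok</message></status></object>"
)
paos_object = mirror(root)
print(paos_object.status.message)
```

Whether a real Paos object can expose nested attributes like this is a separate question; this only shows the XML-to-attribute mapping.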

> How comfortable are you with the
> Python?

I miss enclosed blocks, but otherwise I'm doing ok.
{
   whitespace   usage should 
      be random  . you can just  parse  around

 it.
}

What odd things amuse me at 5AM.

Justin Bradford
justin@ukans.edu


From bizzaro at bc.edu  Sun Feb 28 06:44:03 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:19 2006
Subject: [Pipet Devel] and still more infrastructure things
References: <Pine.OSF.4.03.9902280410120.11892-100000@busboy.sped.ukans.edu>
Message-ID: <36D92C03.1AF788EF@bc.edu>

We're both late night or early morning people, huh? :-)

Justin Bradford wrote:

> I think I need a clarification on the meaning of a locus. My understanding
> was a locus is a term covering an instance of Porta/Gatekeeper/analysis
> tool(s) on a computer somewhere. It's just a place where analysis is done,
> and that's it. The wfs system worries about direction of the whole object.

"Locus" just means any program or object, and I mean _any_.  The name "Loci"
then emphasizes that this is a distributed system to the extreme.  But usually I
mean a client or server process.

> I agree. I had intended to make a generic C/BS/BioML2 format first. Then
> this would be what's under the data sections, so LociML would just
> encapsulate that portion of it.

Fine with me.

> As for the algorithm and statistics stuff, I was thinking of that as
> something potentially useful to keep in with sequence/structure/relation
> data. For instance, it could be useful to know a structure was derived
> using some particular X-ray crystallography technique. That stuff is
> related to Loci.

Hmmm.  It almost lies between biological and workflow data.  I suppose it could
go either place, but the workflow stuff is just temporary really.  When the data
is to be archived, we don't need to keep old status and query data around.

> > > Status is information concerning the data returned at each analysis step.
> > ...what was collected along the way
> 
> More specifically, how the collection went. Actual data would get stuck
> back in a block under <data>.

Okay.

> > Nice.  But how will Paos handle this?  Are we looking at some major changes to
> > Paos itself?
> 
> I don't think so. My intention was to have the wfs only send what that
> specific analysis needed. Input, output, and status each have an attribute
> on the object. The wfs sends input once, reads output once (and merges the
> new data with the full object), and gets constant updates on the status
> attribute. So whenever the analysis tool changes status, the wfs knows,
> and the benchtop can be updated (assuming any are paying attention at the
> moment).

This brings up a question I wrote at the end of this e-mail.

> > Right.  That'd save time, but be difficult to manage.  Now we're talking about
> > concurrency.
> >
> > Hmmm.  Now are we dealing with the whole forking/sewing issue here?  Once an
> > XML object is split up, will it have to be put back together again?
> 
> Concerning the dependency scheduling, it wouldn't be difficult to manage
> this from a central server, as I was envisioning the wfs. If an object
> roamed independently, it would be difficult to manage, unless we had
> all of the threads regroup when data needed to be rejoined.

Of course we can deal with this after we are comfortable with the basic wfs.

> Yes, it needs to be restructured. Many of the ID numbers would be assigned
> by the GCL to XML query translator.

Okay.

> > > The wfs identifies queries it can currently run
> >
> > How?  By the database of available loci/clients?
> 
> However GCL defines it. I imagine explicitly naming a server as one
> option, or just specifying a type of analysis, where the wfs will use a
> list of some kind to find one available.
> But before it contacts the server, it has to make sure it has all of the
> data available for its query (check dependencies).

Yes.  We define dependencies as data, servers, and clients (loci).

> > > giving it only the portions
> > > of the xml file necessary for it to run (query and relevant data
> > > sections).
> >
> > Yeah, this is where I see Porta Internet or Gatekeeper filtering out stuff
> > the server-side algorithms/databases don't need.
> 
> I had imagined the wfs server doing that, but I imagine our difference is
> in semantics. Basically, the analysis tool just gets what it needs.

Right, we agree on the end but not the means... We'll sort that out.

> At the very least, I can pass blocks of XML through attributes on the paos
> object. It would be interesting to see if the Paos object could be a
> mirror of the XML, however.
> So:
> <status>
>  <message>Ok</message>
> </status>
> Becomes:
> paos_object.status.message = 'Ok'
> 
> But I can work without that.

That brings up a big question I had, and where I've been getting confused...

Is there really any such thing as an "XML object"?  I mean, XML is a way to save
structured data as a _file_.  Python objects, on the other hand, are data
structures in memory.  We would just be going back and forth between file and
object using XML.

So, where do we really need XML?  Could the data just be a Python object?  If we
need to save the object, I think it can just be "pickled"?

Konrad?


Guten Morgen!
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From justin at ukans.edu  Sun Feb 28 07:09:18 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:19 2006
Subject: [Pipet Devel] and still more infrastructure things
In-Reply-To: <36D92C03.1AF788EF@bc.edu>
Message-ID: <Pine.OSF.4.03.9902280544450.11892-100000@busboy.sped.ukans.edu>

> We're both late night or early morning people, huh? :-)

Yes, I think so. The damned Internet never lets me sleep anymore ;)

> > Concerning the dependency scheduling, it wouldn't be difficult to manage
> > this from a central server, as I was envisioning the wfs. If an object
> > roamed independently, it would be difficult to manage, unless we had
> > all of the threads regroup when data needed to be rejoined.
> 
> Of course we can deal with this after we are comfortable with the basic wfs.

Well, I'm not sure this has been entirely answered. Will the wfs handle
all the analyses from a single centralized process? Or do you still want
a decentralized analysis object (pathway), where each node sends the
object to the next node, rather than going back to the wfs each time?

> Yes.  We define dependencies as data, servers, and clients (loci).

What is the distinction between a client and server?
Is the wfs a client and the analysis tool (gatekeeper) a server?

> Is there really any such thing as an "XML object"?  I mean, XML is a
> way to save structured data as a _file_.  Python objects, on the other
> hand, are data structures in memory.  We would just be going back and
> forth between file and object using XML.

Yes. I just tend to think of the Paos object structured like the XML file.
I guess I was basically asking for a DOM interface.
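
Roughly what I have in mind, as a sketch in plain Python rather than anything
Paos provides.  The mirror() helper and the <loci> wrapper element are made up
for illustration, and repeated child tags would overwrite each other here:

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def mirror(element):
    """Turn an XML element into an object whose attributes are its child tags."""
    children = list(element)
    if not children:
        # Leaf element: keep just the text content.
        return (element.text or "").strip()
    node = SimpleNamespace()
    for child in children:
        setattr(node, child.tag, mirror(child))
    return node


root = ET.fromstring("<loci><status><message>Ok</message></status></loci>")
obj = mirror(root)
print(obj.status.message)
```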

> So, where do we really need XML?  Could the data just be a Python
> object?  If we need to save the object, I think it can just be "pickled"?

Well for internal network stuff, it makes sense to just use the Python
object. Like I said, I imagine it structured something like the XML file I
described earlier. Also, for saving a Loci analysis locally, I would
prefer to see it written out to an XML format. The conversion would be
fairly simple, anyway. Rather than an obscure, semi-binary format, why not
use an easy to read text format? It'll make it easier for non-Loci tools
to get information from our files, too.
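
To show how small the difference is, here is a sketch that saves the same
nested data both ways.  The dictionary layout, the to_xml() helper, and the
<loci> root tag are assumptions for the example, not a proposed format:

```python
import pickle
import xml.etree.ElementTree as ET

analysis = {"status": {"message": "Ok"}, "data": {"sequence": "ATGC"}}

# Pickle: an easy round trip, but only Python can read it back.
blob = pickle.dumps(analysis)
restored = pickle.loads(blob)


def to_xml(tag, value):
    """Build an element from nested dicts; non-dict values become text."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(to_xml(key, child))
    else:
        element.text = str(value)
    return element


# XML: plain text that any non-Loci tool could parse.
text = ET.tostring(to_xml("loci", analysis), encoding="unicode")
print(text)
```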

But you're right, there's no reason to use XML for anything but files and
maybe drag and drop (but that's not important for now). I wasn't thinking
about that earlier. 

Can Paos support complex objects, though? Actually, can Python for that
matter? Can I have things like status.analysis[5].message in both?

Justin Bradford
justin@ukans.edu


From bizzaro at bc.edu  Sun Feb 28 07:56:32 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:19 2006
Subject: [Pipet Devel] and still more infrastructure things
References: <Pine.OSF.4.03.9902280544450.11892-100000@busboy.sped.ukans.edu>
Message-ID: <36D93D00.7123859B@bc.edu>

Justin Bradford wrote:

> Well, I'm not sure this has been entirely answered. Will the wfs handle
> all the analyses from a single centralized process? Or do you still want
> a decentralized analysis object (pathway), where each node sends the
> object to the next node, rather than going back to the wfs each time?

My intuition and experience tell me a decentralized pathway will be less
complex, work more efficiently, and be considerably faster.

> What is the distinction between a client and server?
> Is the wfs a client and the analysis tool (gatekeeper) a server?

(sigh) I'm now trying to use your terminology, I think...

    client - process performing analysis or visualization
    server - process controlling workflow and clients

What I was referring to in past e-mails is this...

    client - local machine
    server - remote machine

You can see, mixing these up can be confusing ;-)

> > Is there really any such thing as an "XML object"?  I mean, XML is a
> > way to save structured data as a _file_.  Python objects, on the other
> > hand, are data structures in memory.  We would just be going back and
> > forth between file and object using XML.
> 
> Yes. I just tend to think of the Paos object structured like the XML file.
> I guess I was basically asking for a DOM interface.

_Structured_, but we are parsing then writing.

> > So, where do we really need XML?  Could the data just be a Python
> > object?  If we need to save the object, I think it can just be "pickled"?
> 
> Well for internal network stuff, it makes sense to just use the Python
> object. Like I said, I imagine it structured something like the XML file I
> described earlier. Also, for saving a Loci analysis locally, I would
> prefer to see it written out to an XML format. The conversion would be
> fairly simple, anyway. Rather than an obscure, semi-binary format, why not
> use an easy to read text format? It'll make it easier for non-Loci tools
> to get information from our files, too.
> 
> But you're right, there's no reason to use XML for anything but files and
> maybe drag and drop (but that's not important for now). I wasn't thinking
> about that earlier.

Thinking about this a bit more, we do need to work from a file on disk because
our data can be so large.  If a user has 15-20 GUI loci opened at once, and they
are all from DNA Polymerase PDB's and 100 kb GenBank files, and all of this is
in RAM, we'll hear Scotty in the background saying, "She can't take any more of
this Captain.  She's falling apart at the seams!"
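
One way to work from the file instead of RAM is to stream it.  A minimal
sketch, assuming the data sits in <sequence> elements (a made-up tag) and
using an in-memory sample where a real file would go:

```python
import io
import xml.etree.ElementTree as ET


def count_residues(source):
    """Stream <sequence> elements from `source` without holding all their text."""
    total = 0
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == "sequence":
            total += len((element.text or "").strip())
            element.clear()  # drop the text we have already counted
    return total


sample = io.BytesIO(
    b"<data><sequence>ATGC</sequence><sequence>GGA</sequence></data>"
)
total = count_residues(sample)
```

Passing a filename or an open file to count_residues() works the same way,
since iterparse() accepts either.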

But let's reverse the question.  If we need XML files for (1) working with large
data, (2) passing data across the Internet and to CORBA systems, and (3)
archiving data, then what do we need Paos for?

I know it was my idea to choose Paos, but I'm asking if everyone thinks it fits,
and where it fits, considering the model I've been describing.

> Can Paos support complex objects, though? Actually, can Python for that
> matter? Can I have things like status.analysis[5].message in both?

I'm not sure about that structure, but from what I know of Python, it'll handle
anything any modern language can.
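
On the Python side, at least, that kind of nesting is ordinary attribute and
list access.  A small sketch (the field names are only the ones from your
example; whether Paos can carry such an object is a separate question I can't
answer here):

```python
from types import SimpleNamespace

# A status object holding a list of six per-analysis records.
status = SimpleNamespace(
    analysis=[SimpleNamespace(message="pending") for _ in range(6)]
)
status.analysis[5].message = "Ok"
```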


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Sun Feb 28 08:11:46 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:19 2006
Subject: [Pipet Devel] and still more infrastructure things
References: <Pine.OSF.4.03.9902280544450.11892-100000@busboy.sped.ukans.edu> <36D93D00.7123859B@bc.edu>
Message-ID: <36D94092.E3E60859@bc.edu>

Replying to my own e-mail...

"J.W. Bizzaro" wrote:

> But let's reverse the question.  If we need XML files for (1) working with large
> data, (2) passing data across the Internet and to CORBA systems, and (3)
> archiving data, then what do we need Paos for?

> I know it was my idea to choose Paos, but I'm asking if everyone thinks it fits,
> and where it fits, considering the model I've been describing.

How about having Paos handle the workflow data while XML handles the biological,
instead of defining an XML that mixes the two?

I guess it depends on just how Paos passes the objects.  It would be best if
Paos could be called just once, each time the workflow data needed to be passed,
passing it not only to the next process but to the Benchtop too...hmmm.  I
wouldn't want Loci to be any more centralized than that.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From carlosm at moet.cs.colorado.edu  Sun Feb 28 19:54:22 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:19 2006
Subject: [Pipet Devel] and still more infrastructure things
In-Reply-To: <36D94092.E3E60859@bc.edu>
Message-ID: <Pine.GSU.4.05.9902281636000.5677-100001@moet.cs.colorado.edu>


    [J.W. Bizzaro]
    But let's reverse the question.  If we need XML files for (1) working with large
    data, (2) passing data across the Internet and to CORBA systems, and (3)
    archiving data, then what do we need Paos for?
    
    I know it was my idea to choose Paos, but I'm asking if everyone
    thinks it fits, and where it fits, considering the model I've
    been describing.
    
    [Later]
    How about having Paos handle the workflow data while XML handles
    the biological, instead of defining an XML that mixes the two?
    
    I guess it depends on just how Paos passes the objects.  It would
    be best if Paos could be called just once, each time the workflow
    data needed to be passed, passing it not only to the next process
    but to the Benchtop too...hmmm.  I wouldn't want Loci to be
    any more centralized than that.
    
I totally agree that Paos shouldn't shuffle around real data. I see the
role of Paos as a coordination tool but not as a database management
system. I attached a GIF picture to this mail. This picture contains Gnome
clients, Paos server, and Tool Manager (excuse me if I introduce yet
another set of terms). Gnome clients and Tool Manager are Paos clients. A
Gnome client consists of a GCL editor and progress monitor, among other
things. A Tool Manager:

- parses XML data and forwards it to the actual tool,
- turns the result of a tool into XML data and sends it to another Tool
  Manager,
- sends status information to a Paos server (e.g. processing started or
  completed, or processing ran out of memory),
- receives notifications from a Paos server (e.g. "suspend", "abort",
  or status query),
- queries a Paos server about where to send results.
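
To make the list concrete, here is a rough sketch of a Tool Manager in plain
Python.  It does not use the Paos API: the class, its callbacks, and the
<sequence> and <result> tags are all hypothetical stand-ins, and handling of
"suspend" and "abort" notifications is left out:

```python
import xml.etree.ElementTree as ET


class ToolManager:
    """Hypothetical wrapper around one analysis tool."""

    def __init__(self, tool, report_status, next_destination):
        self.tool = tool                          # callable: str -> str
        self.report_status = report_status        # stand-in for a status update to Paos
        self.next_destination = next_destination  # stand-in for asking Paos where to send

    def handle(self, xml_text):
        # Parse incoming XML and hand the payload to the actual tool.
        payload = ET.fromstring(xml_text).findtext("sequence", default="")
        self.report_status("started")
        result = self.tool(payload)
        self.report_status("completed")
        # Wrap the tool's result in XML and forward it to the next destination.
        out = ET.Element("data")
        ET.SubElement(out, "result").text = result
        self.next_destination()(ET.tostring(out, encoding="unicode"))


statuses, delivered = [], []
manager = ToolManager(
    tool=lambda seq: seq[::-1],                  # toy "analysis": reverse the sequence
    report_status=statuses.append,
    next_destination=lambda: delivered.append,
)
manager.handle("<data><sequence>ATGC</sequence></data>")
```

Here the two lists stand in for the Paos server and the next Tool Manager, so
the status traffic and the XML traffic stay on separate paths.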

The thin lines are communicating Python objects, the thick
lines communicate XML structures. Note that the destination of Tool
Manager can also be a Gnome client which is used to visualize results.

Another question in the discussion was whether to use Python objects for
communication or XML. XML is safer because it is an accepted and
extensible standard. However, transferring serialized objects was the
performance bottleneck in the Chautauqua workflow system (which uses
Paos) and I introduced a bit of trickery to reduce this overhead. So I
would recommend sticking with Python objects for Paos communications but
use XML for everything else.

Carlos
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tulip-architecture.gif
Type: image/gif
Size: 3548 bytes
Desc: architecture
Url : http://bioinformatics.org/pipermail/pipet-devel/attachments/19990228/060d6d63/tulip-architecture.gif