From bizzaro at bc.edu  Thu Mar 25 19:36:34 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] interesting XML article
Message-ID: <36FAD692.D4AD2758@bc.edu>

Some good points are made in this article about XML and how it is more than
HTML++:

http://www.linuxworld.com/linuxworld/lw-1999-03/lw-03-xml.html?03-25


Jeff
bizzaro@bc.edu

From hortiz at neurobio.upr.clu.edu  Mon Mar 29 23:17:35 1999
From: hortiz at neurobio.upr.clu.edu (Humberto Ortiz Zuazaga)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] Another XML proposal.
Message-ID: <199903300417.AAA03711@chimbo.neurobio.upr.clu.edu>

I've been reading the mail archive for the list, and have seen
BSML-vs-bioML-vs-cml threads in the biowidgets and bioperl lists.

I'd like to propose that Loci use many small XML DTDs instead of
trying for a kitchen sink DTD.

This is in keeping with the philosophy of having many small loci for
analysis and display.  The requirement is that any of our DTDs must be
able to contain objects in any of the others.

Specifically:

we could use a simplified BSML for sequence information (and just sequences).

a different XML dialect could be used for structure information
(including structural annotations in sequences).

separate XML DTDs could be defined for references, options to be passed to a
program, and work paths.

So, here's one way this proposal could work.

I sit down at my computer and start up my Workspace.

I retrieve a nucleotide sequence from GenBank, in GenBank format.  It
is parsed into several XML objects: a nucleotide sequence object, several
bibliographic reference objects, and a protein sequence object for the
"/translation=" feature found in the original GenBank file.

Each XML object is displayed on the benchtop by the appropriate locus.

Now I click on the button to perform a restriction map of my sequence.

The workspace contacts the restriction map locus, which returns an XML
object describing the parameters and options this restriction map
locus requires or supports.  An option handling locus can then prompt
me for the enzymes I want to cut with, the output format I prefer,
etc.  The sequence object and the options are then passed back to the
restriction map locus for the analysis.
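
As a rough illustration of this exchange, here is a minimal Python sketch.
It assumes a hypothetical <options> description format; the element and
attribute names, and the helper names parse_options and validate, are
invented for the example and are not part of any DTD agreed on in this
thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical description that a restriction map locus might return.
OPTIONS_XML = """
<options locus="restriction-map">
  <option name="enzyme" multiple="yes" required="yes">
    <choice>EcoRI</choice>
    <choice>BamHI</choice>
    <choice>HindIII</choice>
  </option>
  <option name="output-format" default="text">
    <choice>text</choice>
    <choice>xml</choice>
  </option>
</options>
"""

def parse_options(xml_text):
    """Return a dict mapping each option name to its description."""
    root = ET.fromstring(xml_text)
    options = {}
    for opt in root.findall("option"):
        options[opt.get("name")] = {
            "required": opt.get("required", "no") == "yes",
            "multiple": opt.get("multiple", "no") == "yes",
            "default": opt.get("default"),
            "choices": [c.text for c in opt.findall("choice")],
        }
    return options

def validate(options, chosen):
    """Check the user's choices (name -> list of values); return problems."""
    problems = []
    for name, spec in options.items():
        values = chosen.get(name)
        if values is None:
            if spec["required"]:
                problems.append("missing required option: " + name)
            continue
        if len(values) > 1 and not spec["multiple"]:
            problems.append("only one value allowed for " + name)
        for value in values:
            if value not in spec["choices"]:
                problems.append("invalid value for %s: %s" % (name, value))
    return problems
```

An option handling locus could build its prompts from the parsed
description and call validate() before the sequence and options are sent
back for analysis.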

The restriction map locus can now return the results as several xml
objects:  a bibliographic reference object describing the algorithm
used to perform the analysis; a result object containing the requested
results; a locus object containing the gnome-python source code for
a gui-locus that can display the results.

The workspace can check whether it already has a gui-locus that can
display the results, and either passes the results to it or downloads
the code and generates the gui-locus.

As loci are loaded into the workspace, they can register the ability
to handle a particular DTD or set of DTDs.
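
A minimal sketch of such a registry, in Python.  The class and method
names (LocusRegistry, register, handlers_for) and the DTD names used as
keys are assumptions made for illustration only.

```python
from collections import defaultdict

class LocusRegistry:
    """Maps a DTD name to the loci that have registered to handle it."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, locus_name, dtds):
        """Record that locus_name can handle each DTD in dtds."""
        for dtd in dtds:
            if locus_name not in self._handlers[dtd]:
                self._handlers[dtd].append(locus_name)

    def handlers_for(self, dtd):
        """Return the loci registered for dtd, in registration order."""
        return list(self._handlers.get(dtd, []))
```

With this, the workspace's check for a suitable gui-locus reduces to a
lookup such as registry.handlers_for("restriction-map"); an empty list
would mean a viewer has to be fetched from elsewhere.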

We also need not pass around the entire XML object each time.  For
example, only a URL for a reference need be included in the results
from an analysis, not the entire paper.

--
Humberto Ortiz Zuazaga
Bioinformatics Specialist
Institute of Neurobiology
hortiz@neurobio.upr.clu.edu

From justin at ukans.edu  Tue Mar 30 03:19:20 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] Another XML proposal.
In-Reply-To: <199903300417.AAA03711@chimbo.neurobio.upr.clu.edu>
Message-ID: <Pine.OSF.4.03.9903300049290.10608-100000@busboy.sped.ukans.edu>

> I'd like to propose that Loci use many small XML DTDs instead of
> trying for a kitchen sink DTD.

I agree, and that is basically the way I had been thinking.
Specific descriptions of data should be as small and modular as possible
(sequence, structure, phylogeny, etc). LocusML should also be able to
describe relationships between those pieces of data, if necessary,
however. We might need specific DTDs for relationships (e.g. a restriction
map, which contains a number of short sequence components), as a lot of
relationships will be very hard to express generically.

> we could use a simplified BSML for sequence information (and just
> sequences).

I don't like how BSML is structured, but I do like the detail it allows. I
prefer the inner sections of BioML for their "flow". I had planned to
merge the two, looking more like BioML but with the versatility of BSML.
Also, BSML doesn't cover amino acid sequences (if I remember correctly),
while BioML does. The two different structures probably merit unique DTDs
anyway, though.

> a different XML dialect could be used for structure information
> (including structural annotations in sequences).

Yes. I'm not sure where to begin on structure. Someone here had ideas
on this, but I'm not sure who or what became of them.

Also, it sounds like we'll need a DTD for phylogeny, too. There are
probably others as well, but the concept remains the same. Describe just
the relevant data, and use a unique ID to find reference and data
relations elsewhere.

> separate XML DTDs could be defined for references, options to be passed
> to a program, and work paths.

A generic reference DTD is fairly simple. Describing relationships between
data will take a little more thought.
Loci specific information will probably be filled in as we go further into
development.

Although, just so no one is confused, the XML format is really only for
transfer (data) and storage (both data and Loci info).
Actual Loci info will be kept in the Paos object as attributes rather than
an XML stream that has to be parsed all the time. It will be written out
to XML for non-Paos storage. 
Generic data will always be handled by the Loci framework as XML (since
it's basically meaningless to it), and the data specific tools will handle
it internally in whatever way is appropriate (hash table, binary tree,
etc).

But a DTD is a good way to describe the data Loci uses.

> I retrieve a nucleotide sequence from Genbank, in genbank format, it
> is parsed into several XML objects: a nucleotide sequence object, several
> bibliographic reference objects, a protein sequence object for the
> "/translation=" feature found in the original genbank file.

Exactly. I believe the translation component is the gatekeeper, in Loci
terminology.

> Each XML object is displayed on the benchtop by the appropriate locus.
> Now I click on the button to perform a restriction map of my sequence.

I haven't thought about the UI much yet.

> The workspace contacts the restriction map locus, which returns an XML
> object describing the parameters and options this restriction map
> locus requires or supports. 

_That_ is an interesting idea. I had just been assuming a generic
interface for types of loci (for example, a restriction map locus has
three arguments and it doesn't vary), but rather than having a bunch of
hardcoded loci types, we can query the locus for its interface (of course
we'll want to cache interfaces).

> An option handling locus can then prompt
> me for the enzymes I want to cut with, the output format I prefer,
> etc.

Going back to Jeff's idea about embedding python in XML, a locus could
return an interface description with UI code to handle the query
configuration (probably optional for exotic cases; most of the time it
would be generic fields with default UI handlers).

> The restriction map locus can now return the results as several xml
> objects:  a bibliographic reference object describing the algorithm
> used to perform the analysis; a result object containing the requested
> results; a locus object containing the gnome-python source code for
> a gui-locus that can display the results.

Before we go overboard with passing interface code around though, I'd like
to strongly encourage the presence of powerful, high-level widgets in the
workspace app. We don't want to be passing around a generic sequence
viewer all the time.

> The workspace can check if it already has a gui-locus that can display
> the results, and passes the results to it, or downloads the code and
> generates the gui-locus.

Like I said just above, I'd like to see a nice API (from the loci 
perspective) for the UI stuff. Ranging from low-level building block
widgets to higher-level generic viewers, as well as the ability to plug-in
additional generic viewers. That way if you're always using some
non-standard locus gui, you can just load the script locally (and even
replace it with faster compiled code).

> As loci are loaded into the workspace, they can register the ability
> to handle a particular DTD or set of DTDs.

Possibly even more than that -- for instance, a locus to handle a specific
relationship between sets of DTDs (I don't have a good example, though).

> We also need not pass around the entire XML object each time, for
> example only a url for a reference need be included in the results
> from an analysis, not the entire paper.

Yes. It was my intention for the workflow system to just give a locus what
it needs (probably by creating a second Paos object). It should present it
with the necessary data and control information, rather than sending the
whole object with potentially extraneous data and control info.
The locus updates the object with status information (recorded to the
master Paos object, which the gui can get info from). And then transmits
the generated data back via Paos. That's consolidated into the master
object and fed to the gui client.


Also, for the Paos representation of the Loci XML info, I was imagining a
DOM-like interface. The XML is represented in a tree.

So, this Loci info:
<query id="aaaa">
 <action>restriction map</action>
 <option name="distinguish enzyme cuts" value="yes"/>
 <data argument="template">#sequence_id</data>
 <data argument="restriction enzyme">EcoRI</data>
 <data argument="restriction enzyme">BamHI</data>
</query>

becomes this Paos object:

query.id = "aaaa"
query.action = "restriction map"
query.option{distinguish enzyme cuts} = "yes"
query.data{template} = "#sequence_id"
query.data{restriction enzyme} = "EcoRI"
query.data{restriction enzyme} = "BamHI"

where #sequence_id means the XML can be extracted from the Paos data
attribute under the key "sequence_id".

Or something along these lines. This example is missing a lot of things.
I'm not sure how python handles hashes either. This is actually perl/c-ish
here.
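
For comparison, a minimal Python sketch of the same mapping.  It uses a
plain dictionary as a stand-in, not the actual Paos interface, and the
function name query_to_dict is made up for the example.  The <option>
element is written as an empty element so the XML is well-formed, and
repeated "data" arguments are collected in lists, since a second
assignment to the same hash key would otherwise overwrite the first.

```python
import xml.etree.ElementTree as ET

QUERY_XML = """
<query id="aaaa">
 <action>restriction map</action>
 <option name="distinguish enzyme cuts" value="yes"/>
 <data argument="template">#sequence_id</data>
 <data argument="restriction enzyme">EcoRI</data>
 <data argument="restriction enzyme">BamHI</data>
</query>
"""

def query_to_dict(xml_text):
    """Turn a <query> document into nested dictionaries."""
    root = ET.fromstring(xml_text)
    query = {
        "id": root.get("id"),
        "action": root.findtext("action"),
        "option": {},
        "data": {},
    }
    for opt in root.findall("option"):
        query["option"][opt.get("name")] = opt.get("value")
    for item in root.findall("data"):
        # One argument name may occur several times, so keep a list.
        query["data"].setdefault(item.get("argument"), []).append(item.text)
    return query
```

In Python the hash lookups would then read
query["data"]["restriction enzyme"], which here holds both enzymes.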

(note: Ok, now I'm going to ramble some...)

Although, perhaps we don't even need to bother trying to express the
internal Loci data stuff as XML. Will we ever need to write it out to XML?
Possibly only the actual biological data needs XML expression, just to
facilitate interaction between Loci derived data and non-Loci tools.
Theoretically, we don't need XML for anything, since structures in Paos
could hold all of the biological data, too. It just seems like a good way
to describe things for stuff that isn't entirely internal to Loci. But on
similar grounds, we will need to define the internal Loci info interface
adequately for tools to make use of it, and perhaps an XML representation
of that would make it more clear.

I'd really like to rig up a working demo.  Does anyone have a pretty
simple analysis tool we could use for an example? In particular, the view
of the resulting data should be simple (that's probably where the most
programming is). Actually a restriction map wouldn't be too bad...

Justin Bradford
justin@ukans.edu


From hortiz at neurobio.upr.clu.edu  Tue Mar 30 09:22:35 1999
From: hortiz at neurobio.upr.clu.edu (Humberto Ortiz Zuazaga)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] Another XML proposal. 
In-Reply-To: Your message of "Tue, 30 Mar 1999 02:19:20 CST."
             <Pine.OSF.4.03.9903300049290.10608-100000@busboy.sped.ukans.edu> 
Message-ID: <199903301422.KAA05393@chimbo.neurobio.upr.clu.edu>

> > I'd like to propose that Loci use many small XML DTDs instead of
> > trying for a kitchen sink DTD.

> LocusML should also be able to
> describe relationships between those pieces of data, if necessary,
> however. We might need specific DTDs for relationships (e.g. a restriction
> map, which contains a number of short sequence components), as a lot of
> relationships will be very hard to express generically.

Yes, I propose a DTD for each kind of relationship.  For example, a
structural alignment could have a structural alignment DTD that allows
embedding a multi-sequence alignment entity, which in turn contains
several protein sequence entities and structure entities.  Each protein
sequence could contain a set of reference entities.  Each entity could be
in a different DTD.

We also need a DTD for page or canvas composition of multiple display loci,
for embedding a figure in a figure, for example.

> I don't like how BSML is structured, but I do like the detail it allows.

I didn't mean BSML specifically, just that a sequence DTD should stick to 
describing only sequence information.

> > a different XML dialect could be used for structure information
> > (including structural annotations in sequences).
> 
> Yes. I'm not sure where to begin on structure. Someone here had ideas
> on this, but I'm not sure who or what became of them.

With my proposal, we can defer defining a structure DTD until we actually
have more of a clue.

> > The workspace contacts the restriction map locus, which returns an XML
> > object describing the parameters and options this restriction map
> > locus requires or supports. 
> 
> _That_ is an interesting idea. I had just been assuming a generic
> interface for types of loci (for example, a restriction map locus has
> three arguments and it doesn't vary), but rather than having a bunch of
> > hardcoded loci types, we can query the locus for its interface (of course
> we'll want to cache interfaces).

The gatekeeper can also handle finding appropriate loci:

The workspace says: I have a BICML nucleotide sequence v4.1 object; I want to
perform a restriction map with these enzymes and see the sizes of the digested
fragments.

A tacg locus on server.example.com can reply: I can do the analysis;
please send me the sequence and the enzymes you want off of this list.  To
view the output you need a locus that can display v3.5 digest files; here is a
URL for a gnome-python locus for a compatible viewer.

> > An option handling locus can then prompt
> > me for the enzymes I want to cut with, the output format I prefer,
> > etc.
> 
> Going back to Jeff's idea about embedding python in XML, a locus could
> return an interface description with UI code to handle the query
> configuration (probably optional for exotic cases; most of the time it
> would be generic fields with default UI handlers).

Again, we don't have to pass back the UI code, just a URL to it; the workspace
may well already have a copy locally.  I think it's a bad idea to embed the
python code in the xml.  It violates the principle that the DTDs should stick
to the point, and it really gets ugly when you consider the security
implications.  Locus will ship with loci for displaying many kinds of DTDs,
and a site manager may well not allow the workspace to download untrusted
code.  With my proposal, the workspace just has to locate any locus that can
display the result DTD; you may well have several sequence viewers on your
machine already.

> > The restriction map locus can now return the results as several xml
> > objects:  a bibliographic reference object describing the algorithm
> > used to perform the analysis; a result object containing the requested
> > results; a locus object containing the gnome-python source code for
> > a gui-locus that can display the results.
> 
> Before we go overboard with passing interface code around though, I'd like
> to strongly encourage the presence of powerful, high-level widgets in the
> workspace app. We don't want to be passing around a generic sequence
> viewer all the time.

That's what I mean. An analysis locus can just say: my output is in BICML v3.2
format; here is the URL for a viewer if you don't have one.  The workspace
then chooses whether or not to retrieve the UI code.

> Although, perhaps we don't even need to bother trying to express the
> internal Loci data stuff as XML. Will we ever need to write it out to XML?
> Possibly only the actual biological data needs XML expression, just to
> facilitate interaction between Loci derived data and non-Loci tools.

I argue that all our data structures should be representable as XML.  This 
would let people write loci in any language, export individual components for 
other tools, and facilitate exchange of data.

Storing data in python specific or binary formats restricts your options.

Hopefully, we'll soon be able to embed Loci figures in our gnome word 
processor papers!

-- 
Humberto Ortiz Zuazaga
Bioinformatics Specialist
Institute of Neurobiology
hortiz@neurobio.upr.clu.edu


From hinsen at cnrs-orleans.fr  Tue Mar 30 11:52:03 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:06 2006
Subject: [Pipet Devel] Another XML proposal.
In-Reply-To: <199903300417.AAA03711@chimbo.neurobio.upr.clu.edu> (message from
	Humberto Ortiz Zuazaga on Tue, 30 Mar 1999 00:17:35 -0400)
References: <199903300417.AAA03711@chimbo.neurobio.upr.clu.edu>
Message-ID: <199903301652.SAA20878@dirac.cnrs-orleans.fr>

> I'd like to propose that Loci use many small XML DTDs instead of
> trying for a kitchen sink DTD.

I agree. The only reason for having one big DTD is that any combination
of information could easily be integrated into one file. This is useful
for hand-typed material, but hardly matters for computer-generated
data. Let's rather stay flexible.

> This is in keeping with the philosophy of having many small loci for
> analysis and display.  The requirement is that any of our DTDs must be
> able to contain objects in any of the others.

That should always be possible by using namespaces, if I understood
them correctly (which may not be the case!)
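
A minimal Python sketch of the idea: one document mixing elements from
two vocabularies, kept apart by namespaces.  The namespace URIs and
element names are placeholders invented for the example.  Note that
namespaces only keep the names from colliding; a DTD by itself is not
namespace-aware, so validating such mixed documents is a separate
question.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, one per (hypothetical) small DTD.
NS = {
    "seq": "http://example.org/loci/sequence",
    "ref": "http://example.org/loci/reference",
}

DOC = """
<seq:sequence xmlns:seq="http://example.org/loci/sequence"
              xmlns:ref="http://example.org/loci/reference"
              id="s1" type="dna">
  <seq:residues>GAATTCGGATCC</seq:residues>
  <ref:reference href="http://example.org/papers/1"/>
</seq:sequence>
"""

root = ET.fromstring(DOC)

# A sequence tool reads only the elements in its own namespace ...
residues = root.findtext("seq:residues", namespaces=NS)

# ... while a reference tool picks out the embedded reference objects.
references = [r.get("href") for r in root.findall("ref:reference", NS)]
```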

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From hinsen at cnrs-orleans.fr  Tue Mar 30 11:56:36 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:07 2006
Subject: [Pipet Devel] Another XML proposal.
In-Reply-To: <Pine.OSF.4.03.9903300049290.10608-100000@busboy.sped.ukans.edu>
	(message from Justin Bradford on Tue, 30 Mar 1999 02:19:20 -0600
	(CST))
References: <Pine.OSF.4.03.9903300049290.10608-100000@busboy.sped.ukans.edu>
Message-ID: <199903301656.SAA25516@dirac.cnrs-orleans.fr>

> > a different XML dialect could be used for structure information
> > (including structural annotations in sequences).
> 
> Yes. I'm not sure where to begin on structure. Someone here had ideas
> on this, but I'm not sure who or what became of them.

I am still in contact with people at EMBL who are working on
a DTD that is equivalent to mmCIF plus a converter in both
directions (written in Python). There have been some delays
(caused by real work ;-)  but they are optimistic about having it
ready soon. This seems the most promising structure DTD to me.

> Although, perhaps we don't even need to bother trying to express the
> internal Loci data stuff as XML. Will we ever need to write it out to XML?

I'd say everything that is saved to a file for archiving or non-immediate
reuse should be XML, if only to give users a chance to understand
what's inside without any special program!

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From bizzaro at bc.edu  Wed Mar 31 03:24:26 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:07 2006
Subject: [Pipet Devel] Another XML proposal - Part 1.
References: <Pine.OSF.4.03.9903300049290.10608-100000@busboy.sped.ukans.edu>
Message-ID: <3701DBBA.8CF11553@bc.edu>

Justin Bradford wrote:

> Although, just so no one is confused, the XML format is really only for
> transfer (data) and storage (both data and Loci info).
> Actual Loci info will be kept in the Paos object as attributes rather than
> an XML stream that has to be parsed all the time. It will be written out
> to XML for non-Paos storage.
> Generic data will always be handled by the Loci framework as XML (since
> it's basically meaningless to it), and the data specific tools will handle
> it internally in whatever way is appropriate (hash table, binary tree,
> etc).

We have gone back and forth on this point.  I just want to leave open the
possibility that "workflow data", which is the "Loci info" you are referring to,
may be _actively_ transferred as XML.

If the workflow data is only for archiving, I guess there is some point along
the pathway where the decision is made to start reading/writing workflow as
XML.  Is this something then that we want turned on and off along the path?

If it is kept "on", it will make for a more robust system, in case of a Loci
crash or OS crash.

Just a thought.

> > I retrieve a nucleotide sequence from Genbank, in genbank format, it
> > is parsed into several XML objects: a nucleotide sequence object, several
> > bibliographic reference objects, a protein sequence object for the
> > "/translation=" feature found in the original genbank file.
> 
> Exactly. I believe the translation component is the gatekeeper, in Loci
> terminology.

The "Gatekeeper" is the "Internet Application Broker" or "Locus IAB".  You're
talking about the "Document Translator" or "Locus DT".  (Don't you love these
names? ;-)  I'd actually like to take on the GenBank translation part of this,
since I made a GenBank parser once.

> 
> > Each XML object is displayed on the benchtop by the appropriate locus.
> > Now I click on the button to perform a restriction map of my sequence.
> 
> I haven't thought about the UI much yet.

Each separable XML object (biodata) will be displayed on the benchtop...as a box
or button.  And yes, you can click on it (maybe even a right-mousebutton click)
to bring up a list of loci/tools that can perform work on that type of data
(this is where we need a local database of available loci and what they can do).

> 
> > The workspace contacts the restriction map locus, which returns an XML
> > object describing the parameters and options this restriction map
> > locus requires or supports.
> 
> _That_ is an interesting idea. I had just been assuming a generic
> interface for types of loci (for example, a restriction map locus has
> three arguments and it doesn't vary), but rather than having a bunch of
> > hardcoded loci types, we can query the locus for its interface (of course
> we'll want to cache interfaces).

Ohhh yes!  This is the database I was just talking about.  Maybe it's a part of
the benchtop, but it keeps track of all loci available to the user and what they
can do.  But instead of the locus being queried when it is about to be used, it
is queried when it becomes accessible to the workspace.

This is important for hot-plugging loci.  The user can add loci while Loci is
running.  Maybe at a certain time interval, the workspace (database part)
queries all accessible loci, and the loci return values informing the workspace
what they can do.
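
A minimal Python sketch of that polling step.  It assumes each accessible
locus can be represented by a callable that returns the data types it
handles; the function names refresh and tools_for and the data type
strings are hypothetical.

```python
def refresh(database, accessible_loci):
    """Re-query all accessible loci and update the local database.

    database:        dict mapping locus name -> set of data types handled
    accessible_loci: dict mapping locus name -> callable returning the
                     data types that locus says it can handle
    Returns (added, removed) locus names, so hot-plugged loci show up.
    """
    added = sorted(name for name in accessible_loci if name not in database)
    removed = sorted(name for name in database if name not in accessible_loci)
    for name in removed:
        del database[name]
    for name, describe in accessible_loci.items():
        database[name] = set(describe())
    return added, removed

def tools_for(database, data_type):
    """List the loci that can work on data_type (e.g. for a click menu)."""
    return sorted(name for name, types in database.items() if data_type in types)
```

The workspace could call refresh() on a timer and use tools_for() to fill
the list shown when the user clicks on a data object.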

> 
> > An option handling locus can then prompt
> > me for the enzymes I want to cut with, the output format I prefer,
> > etc.
> 
> Going back to Jeff's idea about embedding python in XML, a locus could
> return an interface description with UI code to handle the query
> configuration (probably optional for exotic cases; most of the time it
> would be generic fields with default UI handlers).

Yep.

> 
> > The restriction map locus can now return the results as several xml
> > objects:  a bibliographic reference object describing the algorithm
> > used to perform the analysis; a result object containing the requested
> > results; a locus object containing the gnome-python source code for
> > a gui-locus that can display the results.
> 
> Before we go overboard with passing interface code around though, I'd like
> to strongly encourage the presence of powerful, high-level widgets in the
> workspace app. We don't want to be passing around a generic sequence
> viewer all the time.

Absolutely correct!  I did not want to be passing 100k Python modules to
recreate common code.  I think we should have high-level widgets that may really
be mostly C-GTK binaries wrapped in Python.  Python-GTK bindings (by James
Henstridge) can be used where the user may not have a particular C-GTK
megawidget in their Loci library.

> 
> > The workspace can check if it already has a gui-locus that can display
> > the results, and passes the results to it, or downloads the code and
> > generates the gui-locus.
> 
> Like I said just above, I'd like to see a nice API (from the loci
> perspective) for the UI stuff. Ranging from low-level building block
> widgets

...PyGTK...

> to higher-level generic viewers,

...C-GTK...

> as well as the ability to plug-in
> additional generic viewers. That way if you're always using some
> non-standard locus gui, you can just load the script locally (and even
> replace it with faster compiled code).

I'm not exactly sure what you mean here.


TO BE CONTINUED....


Cheers,
Jeff
-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

I have always appreciated your ability to ________, whenever
there has been a blank to fill.
--

From bizzaro at bc.edu  Wed Mar 31 03:40:58 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:07 2006
Subject: [Pipet Devel] Another XML proposal - Part deux.
References: <Pine.OSF.4.03.9903300049290.10608-100000@busboy.sped.ukans.edu>
Message-ID: <3701DF9A.F0D8000D@bc.edu>

Justin Bradford wrote:
> 
> > As loci are loaded into the workspace, they can register the ability
> > to handle a particular DTD or set of DTDs.
> 
> Possibly even more than that -- for instance, a locus to handle a specific
> relationship between sets of DTDs (I don't have a good example, though).

Okay.  But again, the Workspace continually updates a database with this
information.

> 
> > We also need not pass around the entire XML object each time, for
> > example only a url for a reference need be included in the results
> > from an analysis, not the entire paper.
> 
> Yes. It was my intention for the workflow system to just give a locus what
> it needs (probably by creating a second Paos object). It should present it
> with the necessary data and control information, rather than sending the
> whole object with potentially extraneous data and control info.
> The locus updates the object with status information (recorded to the
> master Paos object, which the gui can get info from). And then transmits
> the generated data back via Paos. That's consolidated into the master
> object and fed to the gui client.

This reminds me of the Notebook, which we have talked very little about.  It
will give the user a written log, in HTML, of the analyses, but where large
amounts of data have been developed, the HTML gives only a link to the file
(archived XML).  At a later point, the user can view the log using the Notebook,
click on a link, and the archived XML will be brought back to life and sent to
the appropriate view.
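
A minimal Python sketch of one such log entry.  The function name
log_entry, the HTML layout, and the archive path are assumptions made for
illustration; nothing here reflects an agreed Notebook format.

```python
from html import escape

def log_entry(title, summary, archive_path=None):
    """Return an HTML fragment for one analysis in the Notebook log.

    Large results are not inlined; the entry only links to the archived
    XML file, which a viewer can load again later.
    """
    parts = [
        "<h2>%s</h2>" % escape(title),
        "<p>%s</p>" % escape(summary),
    ]
    if archive_path is not None:
        parts.append(
            '<p><a href="%s">archived XML</a></p>' % escape(archive_path, quote=True)
        )
    return "\n".join(parts)
```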

Damn this project is complex, but fun! ;-)

> Or something along these lines. This example is missing a lot of things.
> I'm not sure how python handles hashes either. This is actually perl/c-ish
> here.

Python doesn't use symbols to type variables.

> 
> (note: Ok, now I'm going to ramble some...)
> 
> Although, perhaps we don't even need to bother trying to express the
> internal Loci data stuff as XML. Will we ever need to write it out to XML?
> Possibly only the actual biological data needs XML expression, just to
> facilitate interaction between Loci derived data and non-Loci tools.
> Theoretically, we don't need XML for anything, since structures in Paos
> could hold all of the biological data, too. It just seems like a good way
> to describe things for stuff that isn't entirely internal to Loci. But on
> similar grounds, we will need to define the internal Loci info interface
> adequately for tools to make use of it, and perhaps an XML representation
> of that would make it more clear.

Ahh.  The old PAOS vs. XML argument.  It's a good argument.  I think we could go
100% PAOS or 100% XML, and both ways would work.  But I think the combination of
the two can give us some advantages.  The way I see it, and I guess you do too:

    PAOS -> for active communication about the internals of Loci
    XML  -> for archiving and translating bio data

Actually, I'll say it again:  I want PAOS to have built-in XML parsing to make
this more uniform.


Cheers,
Jeff
-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/


From bizzaro at bc.edu  Wed Mar 31 04:15:30 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:07 2006
Subject: [Pipet Devel] Another XML proposal - Part 3.
References: <199903301422.KAA05393@chimbo.neurobio.upr.clu.edu>
Message-ID: <3701E7B2.4BF2F65A@bc.edu>

Humberto Ortiz Zuazaga wrote:
> 
> Yes, I propose a DTD for each kind of relationship, where for example a
> structural alignment could have a structural alignment DTD, and that DTD
> allowed for embedding a multi-sequence alignment entity that in turn contained
> several protein sequence entities, structure entities, each protein sequence
> could contain a set of reference entities.  Each entity could be in a
> different DTD.

I am assuming that multiple DTDs don't mean multiple files, one per DTD.

> 
> We also need a DTD for page or canvas composition of multiple display loci,
> for embedding a figure in a figure, for example.

Okay.
> 
> The gatekeeper can also handle finding appropriate loci:

Okay, now we _are_ talking about Locus IAB ;-)

> 
> workspace says I have a BICML nucleotide sequence v4.1 object, I want to
> perform a restriction map with these enzymes and see the sizes of the digested
> fragments.
> 
> A tacg locus on server.example.com can reply saying, I can do the analysis,
> please send me the sequence, and the enzymes you want off of this list, to
> view the output you need a locus that can display v3.5 digest files, here is a
> url for a gnome-python locus for a compatible viewer.

That's an excellent point.  I have been assuming that the Workspace will only
use loci (including Locus IAB) that have the appropriate capabilities.  On the
remote end, as with Locus IAB, the programs may be more up to date than those on
the user's machine.

I guess Locus IAB can tell the Workspace (locus database) that it has native
v4.1 capabilities, but can provide a script to handle the older v3.5.  If v3.5
is the only option for the user's version of Loci, it will have to use it (but
it would use v4.1 if at all available).
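
A minimal Python sketch of that choice: pick the newest format version
that both sides support.  The function names and the "v4.1"-style version
strings are assumptions for the example.

```python
def parse_version(text):
    """Turn a string such as "v4.1" into a comparable tuple, e.g. (4, 1)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))

def choose_version(offered, supported):
    """Return the newest version present in both lists, or None."""
    common = set(offered) & set(supported)
    if not common:
        return None
    return max(common, key=parse_version)
```

For instance, if the remote side offers v4.1 and v3.5 but the user's copy
of Loci only supports v3.5, the result is "v3.5"; if both support v4.1,
that one is preferred.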

> Again, we don't have to pass back the UI code, just a URL to it, the workspace
> may well already have a copy locally.

Hmmmm.  Either way, we're still talking about transferring a script.

> I think it's a bad idea to embed the
> python code in the xml.  It violates the principle that the DTDs should stick
> to the point, and it really gets ugly when you consider the security
> implications.

I wasn't talking about anything that would require a DTD, just a marker to say
<thescript> </thescript>.

> Locus will ship with loci for displaying many kinds of DTDs,
> and a site manager may well not allow the workspace to download untrusted
> code.

Security is an issue for executing _any_ code from a remote source, whether or
not it happens to be in the same file as the XML.

> With my proposal, the workspace just has to locate any locus that can
> display the result DTD, you may well have several sequence viewers on your
> machine already.

It's true that there may be more than one locus (viewer) to handle the same
job.  It will be an interesting challenge to get the Workspace to figure this out
without the user's help.

But I don't see how pointing to a URL for the GUI script will be any more
secure.

> That's what I mean. An analysis locus can just say my output is in BICML v3.2
> format, here is the url for a viewer if you don't have one.  The workspace
> then chooses whether or not to retrieve the UI code.

Oh.  But how about keeping the UI code in the same file, as originally planned,
and giving Loci/Workspace a setting to not execute the UI code if it comes from
the Internet or from an unknown URL?

Won't this work too?  I just want to keep everything in the same data stream.

> I argue that all our data structures should be representable as XML.  This
> would let people write loci in any language, export individual components for
> other tools, and facilitate exchange of data.

Add that to my list:

    PAOS -> for active communication about the internals of Loci
    XML  -> for archiving and translating bio data AND LOCI INTERNALS

> 
> Storing data in python specific or binary formats restricts your options.

Yes.

> 
> Hopefully, we'll soon be able to embed Loci figures in our gnome word
> processor papers!

Hmmmm.  I suppose, if someone writes a translator :-)

BTW, I have been including the GNOME API to take advantage of GNOME features
like ORBit and even the GNUmeric spreadsheet, which can be accessed via
ORBit/CORBA.


Jeff
--
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

I have always appreciated your ability to ________, whenever
there has been a blank to fill.
--

From rahul at photino.sid.rice.edu  Wed Mar 31 15:57:45 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:07 2006
Subject: [Pipet Devel] Another XML proposal - Part deux.
In-Reply-To: <3701DF9A.F0D8000D@bc.edu>
Message-ID: <Pine.LNX.4.10.9903311442260.1719-100000@photino.sid.rice.edu>

On Wed, 31 Mar 1999, J.W. Bizzaro wrote:

> > Or something along these lines. This example is missing a lot of things.
> > I'm not sure how python handles hashes either. This is actually perl/c-ish
> > here.
> 
> Python doesn't use symbols to type variables.

Glib implements hashes. We should be able to use it since we're using gtk
anyway. However, we might not want to require the remote loci to have glib
installed.

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From hortiz at neurobio.upr.clu.edu  Wed Mar 31 13:23:55 1999
From: hortiz at neurobio.upr.clu.edu (Humberto Ortiz Zuazaga)
Date: Fri Feb 10 19:18:07 2006
Subject: [Pipet Devel] Another XML proposal - Part 3. 
In-Reply-To: Your message of "Wed, 31 Mar 1999 09:15:30 GMT."
             <3701E7B2.4BF2F65A@bc.edu> 
Message-ID: <199903311823.OAA24194@chimbo.neurobio.upr.clu.edu>

> > Yes, I propose a DTD for each kind of relationship
> 
> I am assuming that multiple DTD's doesn't mean multiple files, one per DTD.

I hope not, but don't know enough about XML yet.

> > Again, we don't have to pass back the UI code, just a URL to it, the workspace
> > may well already have a copy locally.
> 
> Hmmmm.  Either way, we're still talking about transferring a script.

Perhaps not.  Say I'm doing an analysis on a locus that returns multiple 
alignments.  I get my results back with the url for a multiple sequence 
viewer, but I already have one installed from a trusted source.  I won't need 
to get a new one.

> > I think it's a bad idea to embed the
> > python code in the xml.  It violates the principle that the DTDs should stick
> > to the point, and it really gets ugly when you consider the security
> > implications.
> 
> I wasn't talking about anything that would require a DTD, just a marker to say
> <thescript> </thescript>.

But that's what I mean.  What does <thescript> have to do with describing 
nucleotide sequences?

> > Locus will ship with loci for displaying many kinds of DTDs,
> > and a site manager may well not allow the workspace to download untrusted
> > code.
> 
> Security is an issue for executing _any_ code from a remote source, whether or
> not it happens to be in the same file as the XML.

Presumably, any loci shipped with the Locus distribution are safe.

> > With my proposal, the workspace just has to locate any locus that can
> > display the result DTD, you may well have several sequence viewers on your
> > machine already.
> 
> It's true that there may be more than one locus (viewer) to handle the same
> job.  It will be an interesting challenge to get the Workspace to figure this out
> without the user's help.
> 
> But I don't see how pointing to a URL for the GUI script will be any more
> secure.

No, say I want to view a multiple alignment.  If I have any appropriate 
display locus already installed from a trusted source, then I won't need to 
execute _any_ remote source.

It's like with HTML.  If I write valid HTML 4.0, then it doesn't matter what 
browser a user views it with, or where he got it from, it's sufficient that 
the user has a compliant browser.

I say all loci should return results that say "This is valid bicml nucleotide 
sequence v4.2"

I don't want a locus to say "Best viewed with Internet Genome Explorer 10.3".

> Oh.  But how about the UI code is in the same file, as originally planned, but
> Loci/Workspace will have a setting to not execute the UI code if it comes from
> the Internet or from an unknown URL.
> 
> Won't this work too?  I just want to keep everything in the same data stream.

No, I specifically think putting the UI code into the data stream is bad.  
After the first time I get the UI, why should I keep downloading it?  Imagine 
having to download Netscape every time you wanted to view an HTML file.

As a matter of fact, an analysis locus doesn't even have to publish the url 
for a display locus in every result, we can use the AppBroker (can we call her 
the Matchmaker instead? she facilitates the exchange of loci) to keep track of 
where to get a display locus for the results.

So if I write a new analysis locus and want to make it available to the 
community I can publish the location of the analysis locus (source or service) 
and the output formats.  Display loci publish the input formats they support, 
and the AppBroker can make sure your workspace has the appropriate display 
locus, or help you locate one, or find a translation locus that can do the 
conversion for you.
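
The matching described above can be sketched roughly as follows, in
present-day Python.  The class AppBroker, its method names, and the format
and locus names ("seqview", "digest2seq", and so on) are all hypothetical and
only illustrate the idea of publishing input and output formats.

```python
class AppBroker:
    """Hypothetical registry that matches result formats to display loci."""

    def __init__(self):
        self.displays = {}     # input format -> list of display locus names
        self.translators = {}  # (source format, target format) -> translator name

    def register_display(self, name, input_formats):
        # A display locus publishes every input format it supports.
        for fmt in input_formats:
            self.displays.setdefault(fmt, []).append(name)

    def register_translator(self, name, source, target):
        self.translators[(source, target)] = name

    def find_viewer(self, fmt):
        """Return (translator or None, display locus), or None if nothing fits."""
        if fmt in self.displays:
            return (None, self.displays[fmt][0])
        # No direct viewer: look for a one-step translation to a viewable format.
        for (source, target), translator in self.translators.items():
            if source == fmt and target in self.displays:
                return (translator, self.displays[target][0])
        return None


broker = AppBroker()
broker.register_display("seqview", ["bicml-nucleotide-4.2"])
broker.register_translator("digest2seq", "digest-3.5", "bicml-nucleotide-4.2")

direct = broker.find_viewer("bicml-nucleotide-4.2")
translated = broker.find_viewer("digest-3.5")
missing = broker.find_viewer("frobnicated-1.0")
print(direct, translated, missing)
```

The analysis locus never names a viewer here; it only names its output
format, and the broker does the lookup.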

-- 
Humberto Ortiz Zuazaga
Bioinformatics Specialist
Institute of Neurobiology
hortiz@neurobio.upr.clu.edu


From hortiz at neurobio.upr.clu.edu  Wed Mar 31 11:28:40 1999
From: hortiz at neurobio.upr.clu.edu (Humberto Ortiz Zuazaga)
Date: Fri Feb 10 19:18:07 2006
Subject: [Pipet Devel] Another XML proposal - Part 1. 
In-Reply-To: Your message of "Wed, 31 Mar 1999 08:24:26 GMT."
             <3701DBBA.8CF11553@bc.edu> 
Message-ID: <199903311628.MAA24038@chimbo.neurobio.upr.clu.edu>

> > but rather than having a bunch of
> > hardcoded loci types, we can query the locus for its interface (of course
> > we'll want to cache interfaces).
> 
> Ohhh yes!  This is the database I was just talking about.  Maybe it's a part of
> the benchtop, but it keeps track of all loci available to the user and what they
> can do.  But instead of the locus being queried when it is about to be used, it
> is queried when it becomes accessible to the workspace.
> 
> This is important for hot-plugging loci.  The user can add loci while Loci is
> running.  Maybe at a certain time interval, the workspace (database part)
> queries all accessible loci, and the loci return values informing the workspace
> what they can do.

Yes, this is good.  Installing a display locus (from wherever, more on this 
later) should register the locus as able to display a certain set of DTDs, or 
an analysis locus can register the type of analysis performed and the input 
and output formats it handles.  Once registered, the workspace can find these 
locally and dispatch them immediately.

The app broker locus can also be queried at run time to find display loci or 
analysis loci meeting certain requirements.  Perhaps the workspace could ask 
the app broker to find source for a widget to display frobnicated sequences.  
This source could then be downloaded and registered locally.

I could set up my app broker to only accept source loci from "trusted" 
sources, or to only send my data to trusted analysis loci.  If I were really 
paranoid, I could turn off the app broker, and only run locally registered 
loci.
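
A minimal sketch of such a policy, in present-day Python.  The class
TrustPolicy, its attributes, and the host names are assumptions made for
illustration; how trust would really be established is still an open question.

```python
from urllib.parse import urlparse


class TrustPolicy:
    """Hypothetical app broker setting: which remote sources may be used."""

    def __init__(self, trusted_hosts=(), broker_enabled=True):
        self.trusted_hosts = set(trusted_hosts)
        self.broker_enabled = broker_enabled

    def may_fetch(self, url):
        """True if source for a locus may be downloaded from this URL."""
        if not self.broker_enabled:
            # The paranoid setting: only locally registered loci are run.
            return False
        return urlparse(url).hostname in self.trusted_hosts


policy = TrustPolicy(trusted_hosts=["server.example.com"])
allowed = policy.may_fetch("http://server.example.com/loci/viewer.py")
refused = policy.may_fetch("http://unknown.example.org/loci/viewer.py")

paranoid = TrustPolicy(trusted_hosts=["server.example.com"], broker_enabled=False)
blocked = paranoid.may_fetch("http://server.example.com/loci/viewer.py")
print(allowed, refused, blocked)
```

The same check could guard outgoing data, by asking may_fetch-style questions
about the analysis locus a sequence is about to be sent to.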

Is the locus database different from the app broker, or can they be merged?

The trick then is how the app broker finds out about loci at remote sites.
-- 
Humberto Ortiz Zuazaga
Bioinformatics Specialist
Institute of Neurobiology
hortiz@neurobio.upr.clu.edu


From bizzaro at bc.edu  Mon Mar 22 18:45:38 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:09 2006
Subject: [Pipet Devel] SGML book
Message-ID: <36F6D622.2331D42F@bc.edu>

Locians,

I want to mention a book I bought the other day.

Most references to XML deal with "Web applications" for the language.  Of
course, Loci at its core has nothing to do with the Web.  But I came across this
book that addresses the use of SGML (XML's parent) for non-Web software
development:

    McGrath, Sean
    PARSEME.1ST
    Prentice Hall PTR, 1998
    ISBN 0-13-488967-3

For those who are interested in the use of XML in this project, you may want to
check your library for it.  There are many useful examples (I think), and some
are even in Python.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Mon Mar  1 03:58:45 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:19 2006
Subject: [Pipet Devel] and still more infrastructure things
References: <Pine.GSU.4.05.9902281636000.5677-100001@moet.cs.colorado.edu>
Message-ID: <36DA56C5.C708C804@bc.edu>

Carlos,

Thank you for the e-mail and the GIF!

Carlos Maltzahn wrote:
> 
> I totally agree that Paos shouldn't shuffle around real data. I see the
> role of Paos as a coordination tool but not as a database management
> system.

Yeah.  I'm ironing out how I think Paos and XML fit into this project.  It does
appear that they are both needed, being complementary.  XML I think is best as
manager for the biological data, while Paos is best as "guide" for the path
taken.

> I attached a GIF picture to this mail. This picture contains Gnome
> clients, Paos server, and Tool Manager (excuse me if I introduce yet
> another set of terms). Gnome clients and Tool Manager are Paos clients. A
> Gnome client consists of a GCL editor and progress monitor, among other
> things. A Tool Manager
> 
> - parses XML data and forwards it to the actual tool,

What should be the ratio of tool managers to tools?  I didn't see the actual
tool represented in the GIF, so I'm assuming it is 1:1.  If so, could each tool
manager be _embedded_ in the code of a tool?...at least using the "include"
command.

> - turns the result of a tool into XML data and sends it to another tool
>   manager

Hmmm.  You see again here that the tool manager does what I'd expect the tool to
do.  Maybe we're thinking the same way about this.

> - sends status information to a Paos server (e.g. processing started or
>   completed, or processing ran out of memory),
> - receives notifications from a Paos server (e.g. "suspend", "abort",
>   or status query),
> - queries a Paos server about where to send results to,

Okay.
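
To make the first two duties of a Tool Manager concrete, here is a minimal
sketch in present-day Python.  The function run_tool_manager, the element
names <request>, <result> and <sequence>, and the stand-in tool reverse are
all assumptions for the example; the Paos status and routing messages are
left out entirely.

```python
import xml.etree.ElementTree as ET


def run_tool_manager(xml_text, tool):
    """Parse XML input, hand the data to the tool, wrap the result in XML."""
    request = ET.fromstring(xml_text)
    sequence = request.findtext("sequence")
    result = ET.Element("result")
    ET.SubElement(result, "sequence").text = tool(sequence)
    return ET.tostring(result, encoding="unicode")


def reverse(seq):
    """Stand-in for an actual tool that knows nothing about XML."""
    return seq[::-1]


output = run_tool_manager("<request><sequence>actg</sequence></request>", reverse)
print(output)
```

Written this way the tool itself stays XML-free, which is one argument for a
1:1 wrapper around each tool rather than XML handling inside the tool.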

> The thin lines are communicating Python objects, the thick
> lines communicate XML structures. Note that the destination of Tool
> Manager can also be a Gnome client which is used to visualize results.

...the XML is sent back to the user at the end?

> Another question in the discussion was whether to use Python objects for
> communication or XML. XML is safer because it is an accepted and
> extensible standard. However, transferring serialized objects was the
> performance bottleneck in the Chautauqua workflow system (which uses
> Paos) and I introduced a bit of trickery to reduce this overhead.

What really worried me was the thought of a _single_ Paos process managing
_everything_, including every read and write for every single XML, just to get
workflow information.  Agreed it would be best to leave active workflow
information to RAM.

> So I
> would recommend sticking with Python objects for Paos communications but
> use XML for everything else.
> 

So Justin, do you agree then that our new XML should not include workflow
information?  The top of the XML, however, may still need an ID# to better track
it through the system.

...and good luck on your exam this morning!

I'm on spring break this week, which I'll use to get a bunch of things taken
care of, including work on this project.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From Thomas.Sicheritz at molbio.uu.se  Mon Mar  1 04:20:07 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] more nice interfaces
In-Reply-To: <199902261451.PAA15154@dirac.cnrs-orleans.fr>
References: <36D34556.673C4A2B@bc.edu>
	<14038.43467.520971.536971@beagle.bmc.uu.se>
	<199902261451.PAA15154@dirac.cnrs-orleans.fr>
Message-ID: <14042.19355.294867.863525@beagle.bmc.uu.se>

 > > Questions:
 > >  * how can I combine a python module with a python class definition
 > >    I want to add python code to the c-module ...
 > 
 > Sorry, I don't understand what you are trying to do. Something
 > with Python and C and modules... Could you give a more detailed
 > description?

I have a c-module with functions bb_sequence.reverse,
bb_sequence.complement etc. and I want to create a class in python called
bb_sequence ... the original question was about how to add the c-module
into the class ... but I think I am going to rename the c-functions and
wrap them in the python code.

 > >  * how can I implement this tcl code in python ?
 > >    foreach i  "reverse coplement antiparallel" {
 > >        puts [eval bb_sequence.$i $seq]
 > >    }
 > 
 > I'd have to know what the Tcl code means! I suppose it's a loop
 > over three strings, which in Python is
 > 
 > for i in ["reverse", "complement", "antiparallel"]:
 >     ....
 > 
 > But I don't understand the stuff with "puts" etc.

I'd like to evaluate variables with function names, like:
for i in ["reverse" "coplement" "antiparallel"]:
    print i,": ",bb_sequence.i

where bb_sequence.bb_sequence.i should be substituted/bound to 
reverse coplement and antiparallel

puts = print
and [eval bb_sequence.$i $seq] tells the tcl interpreter to substitute all
variables before evaluating the expression
e.g [eval bb_sequence.$i $seq] -> bb_sequence.reverse "actgactagctagcatcgatcgat"


Do we have access to a mailing list archive ? - I have been away too long
from this list to keep an <what data structure/markup languages to use>
overview ...

thx
-thomas


-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From hinsen at cnrs-orleans.fr  Mon Mar  1 04:43:09 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] Loci markup language and infrastructure things
In-Reply-To: <Pine.OSF.4.03.9902271715010.59-100000@busboy.sped.ukans.edu>
	(message from Justin Bradford on Sat, 27 Feb 1999 17:24:27 -0600
	(CST))
References: <Pine.OSF.4.03.9902271715010.59-100000@busboy.sped.ukans.edu>
Message-ID: <199903010943.KAA15272@dirac.cnrs-orleans.fr>

> Also, for structure, there don't appear to be any MLs even attempting to
> do this, with the exception of CML. So, my idea is to take the PDB file
> format and XMLize it. If any of you know any glaring holes in PDB let
> me know, and we can work around those.

During a visit at EMBL last week, I talked to some people who are
working on an mmCIF to XML converter, using an XML version of the
mmCIF dictionary (no DTD yet, but it is planned). The goal is to save
everybody else the work of writing a parser for the STAR format that
CIF and mmCIF are based on. XML parsers are much more widely
available.

I can't judge how eager the crystallography community is to move to
mmCIF, but from what I heard at EMBL, it seems that mmCIF is getting
more and more attention - the PDB will finally accept submitted mmCIF
files which contain information that cannot easily be converted to PDB
format. So I think it's worth waiting for the mmCIF/XML stuff (which
is supposed to be ready soon) instead of doing our own format based on
the less flexible PDB format.

As for glaring holes in PDB, I think there are many, although
mostly related to the rather loose interpretation of the format
description that most programs have applied.

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From hinsen at cnrs-orleans.fr  Mon Mar  1 04:55:37 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] and still more infrastructure things
In-Reply-To: <36D92C03.1AF788EF@bc.edu> (bizzaro@bc.edu)
References: <Pine.OSF.4.03.9902280410120.11892-100000@busboy.sped.ukans.edu> <36D92C03.1AF788EF@bc.edu>
Message-ID: <199903010955.KAA27218@dirac.cnrs-orleans.fr>

> That brings up a big question I had, and where I've been getting confused...
> 
> Is there really any such thing as an "XML object"?  I mean, XML is a way to save

The confusion seems to be widespread. XML is of course a file format
(or rather metaformat), but it is particularly useful to store
plain-text file representations of objects. That's the philosophy
behind DOM (which defines a standard OO interface to XML documents),
and also XML-RPC. In the end it's just a difference of point of view;
files store data and objects store data!
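
A small illustration of the "XML document as object" view, using the DOM
implementation in present-day Python's standard library (xml.dom.minidom).
The <sequence> element and its id attribute are made up for the example.

```python
from xml.dom.minidom import parseString

# The same data, first as text in a file format, then as an object tree.
doc = parseString('<sequence id="s1">actg</sequence>')
node = doc.documentElement

tag = node.tagName                # element name
seq_id = node.getAttribute("id")  # attribute value
residues = node.firstChild.data   # text content of the element
print(tag, seq_id, residues)
```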

> structured data as a _file_.  Python objects, on the other hand, are data
> structures in memory.  We would just be going back and forth between file and
> object using XML.

Right.

> So, where do we really need XML? Could the data just be a Python
> object? If we need to save the object, I think it can just be
> "pickled"?

Right, in principle. But there are advantages to using XML instead of
Python's pickling format, and these are the same advantages that XML
has compared to any other format: readability (plain ASCII) and
standard syntax. A Python pickle file looks like garbage in an editor,
and processing it without using Python requires significant effort.

There have been discussions of implementing a pickle-compatible
Python module that uses XML files. I don't know how far implementation
has progressed, but I definitely expect this to happen soon.
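
To show the difference in readability, here is a minimal sketch in
present-day Python that stores the same record both ways.  The record layout
and the <sequence> element are assumptions for the example, and this is not
the pickle-compatible XML module mentioned above, just a hand-written
conversion.

```python
import pickle
import xml.etree.ElementTree as ET

record = {"name": "seq1", "residues": "actg"}

# Pickle: compact and automatic, but opaque outside Python.
pickled = pickle.dumps(record)
unpickled = pickle.loads(pickled)

# XML: plain text with a standard syntax that any XML parser can read.
element = ET.Element("sequence", name=record["name"])
element.text = record["residues"]
as_xml = ET.tostring(element, encoding="unicode")

# Reading the XML back into the same kind of record.
parsed = ET.fromstring(as_xml)
restored = {"name": parsed.get("name"), "residues": parsed.text}
print(as_xml)
```

The price of the XML route is that the mapping between object and elements
has to be written by hand, which is exactly what a pickle-style XML module
would automate.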

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From Thomas.Sicheritz at molbio.uu.se  Mon Mar  1 05:03:38 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] more nice interfaces
In-Reply-To: <36DA6529.9A4EE77@bc.edu>
References: <36D34556.673C4A2B@bc.edu>
	<14038.43467.520971.536971@beagle.bmc.uu.se>
	<199902261451.PAA15154@dirac.cnrs-orleans.fr>
	<14042.19355.294867.863525@beagle.bmc.uu.se>
	<36DA6529.9A4EE77@bc.edu>
Message-ID: <14042.25294.567808.856677@beagle.bmc.uu.se>

Hej Jeff,

 > One major part of the Loci Project is to create a "library" of Python modules
 > (and C wrapped in Python) that handle common sequence and structure
 > manipulations.  The library for structure is something Konrad will hopefully
 > contribute, along the line of MMTK.
 > 
 > If you are writing or rewriting code to manipulate sequences (like reversing or
 > complementing), the code should become part of this library.
 > 
 > Tim, a guy we haven't heard from in a while, was going to write code to
 > convert codons into amino acids.  Tim, this also needs to be in the
 > library, in Python or Python/C.

I see ... that would make life easier. How are we going to do this
practically ? Are we using different namespaces in the library ?
How should I rewrite the rewrite of my lib ?


 > If you want, I can forward you e-mails from any time span, since I save Loci
 > e-mail on my computer.

No thanks - I already have an overfull INBOX ... :-)
What is the latest news concerning markup languages ?

-thomas


-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From hinsen at cnrs-orleans.fr  Mon Mar  1 05:16:02 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:20 2006
Subject: [Fwd: [Pipet Devel] more nice interfaces]
In-Reply-To: <36DA6B68.F8F71BFF@bc.edu> (bizzaro@bc.edu)
References: <36DA6B68.F8F71BFF@bc.edu>
Message-ID: <199903011016.LAA23134@dirac.cnrs-orleans.fr>

> I have a c-module with functions bb_sequence.reverse,
> bb_sequence.complement etc. and I want to create a class in python called
> bb_sequence ... the original question was about how to add the c-module
> into the class ... but I think I am going to rename the c-functions and
> wrap them in the python code.

That's the best solution. You *could* write some C-level type and
inherit from it in a Python class by using the ExtensionClass
package, but there's no point unless you really want/need to
provide a generally useful C type.

> I'd like to evaluate variables with function names, like:
> for i in ["reverse" "coplement" "antiparallel"]:
>     print i,": ",bb_sequence.i

That just needs a slight rewrite:

for i in ["reverse", "complement", "antiparallel"]:
    print i,": ", getattr(bb_sequence, i)

getattr() works for method names as well, it simply returns a method
object, which can be called like a function. From my current
understanding of your original Tcl example, the Python equivalent would
be:

for i in ["reverse", "complement", "antiparallel"]:
    print i,": ", getattr(bb_sequence, i)(seq)
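
For anyone who wants to try this without the C module, here is a
self-contained sketch in present-day Python (print as a function).  The class
below is a pure-Python stand-in for bb_sequence, written only for this
example; the real functions live in the C module and may behave differently.
Note that the list needs commas: Python concatenates adjacent string
literals, so ["reverse" "complement"] is a list with a single element.

```python
class bb_sequence:
    """Pure-Python stand-in for the functions of the C module."""

    _pairs = str.maketrans("acgt", "tgca")

    @staticmethod
    def reverse(seq):
        return seq[::-1]

    @staticmethod
    def complement(seq):
        return seq.translate(bb_sequence._pairs)

    @staticmethod
    def antiparallel(seq):
        # Reverse complement: the opposite strand read 5' to 3'.
        return bb_sequence.complement(bb_sequence.reverse(seq))


seq = "actg"
results = {}
for name in ["reverse", "complement", "antiparallel"]:
    # getattr() looks the function up by its name at run time,
    # which is what the Tcl [eval bb_sequence.$i $seq] does.
    results[name] = getattr(bb_sequence, name)(seq)
    print(name, ":", results[name])
```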

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From bizzaro at bc.edu  Mon Mar  1 05:26:48 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:20 2006
Subject: [Fwd: [Pipet Devel] more nice interfaces]
Message-ID: <36DA6B68.F8F71BFF@bc.edu>

>From Thomas...
-------------- next part --------------
An embedded message was scrubbed...
From: Thomas.Sicheritz@molbio.uu.se
Subject: Re: [Pipet Devel] more nice interfaces
Date: Mon,  1 Mar 1999 10:20:07 +0100 (MET)
Size: 3494
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990301/3ee52c0e/attachment.mht
From bizzaro at bc.edu  Mon Mar  1 05:28:07 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:20 2006
Subject: [Fwd: [Pipet Devel] more nice interfaces]
Message-ID: <36DA6BB7.AD59D5EA@bc.edu>

My reply to Thomas.  For some reason we left the mailing list...
-------------- next part --------------
An embedded message was scrubbed...
From: "J.W. Bizzaro" <bizzaro@bc.edu>
Subject: Re: [Pipet Devel] more nice interfaces
Date: Mon, 01 Mar 1999 10:00:09 +0000
Size: 2242
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990301/7b12f1ba/attachment.mht
From bizzaro at bc.edu  Mon Mar  1 05:28:40 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:20 2006
Subject: [Fwd: [Pipet Devel] more nice interfaces]
Message-ID: <36DA6BD8.A0733627@bc.edu>

Again from Thomas...
-------------- next part --------------
An embedded message was scrubbed...
From: Thomas.Sicheritz@molbio.uu.se
Subject: Re: [Pipet Devel] more nice interfaces
Date: Mon,  1 Mar 1999 11:03:38 +0100 (MET)
Size: 2944
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990301/6c6ab03a/attachment.mht
From hinsen at cnrs-orleans.fr  Mon Mar  1 05:48:16 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] libraries - was more nice interfaces
In-Reply-To: <36DA7242.76312BC7@bc.edu> (bizzaro@bc.edu)
References: <36DA6BD8.A0733627@bc.edu> <36DA7242.76312BC7@bc.edu>
Message-ID: <199903011048.LAA12448@dirac.cnrs-orleans.fr>

> I think ExtensionClass is great in that the OO paradigm of Python is brought to
> the C module.  I would like to see this become standard Python in a future
> release.  But is it worth the effort to use a new Python package for the
> library?

If we need it, yes ;-)  It's too early to decide, in my opinion.

> Here is the URL:
> 
>     http://www.digicool.com/releases/ExtensionClass/
> 
> Konrad, do you know the license?  I can't find it.

Here's the file COPYRIGHT.txt from the source distribution:

   Copyright (C) 1996-1998, Digital Creations, Fredericksburg, VA, USA.  
   All rights reserved.

     Redistribution and use in source and binary forms, with or without
     modification, are permitted provided that the following conditions are
     met:

       o Redistributions of source code must retain the above copyright
	 notice, this list of conditions, and the disclaimer that follows.

       o Redistributions in binary form must reproduce the above copyright
	 notice, this list of conditions, and the following disclaimer in
	 the documentation and/or other materials provided with the
	 distribution.

       o Neither the name of Digital Creations nor the names of its
	 contributors may be used to endorse or promote products derived
	 from this software without specific prior written permission.


     THIS SOFTWARE IS PROVIDED BY DIGITAL CREATIONS AND CONTRIBUTORS *AS
     IS* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
     TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
     PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL DIGITAL
     CREATIONS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
     INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
     BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
     OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
     ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
     TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
     USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
     DAMAGE.

Looks OK to me.

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From Thomas.Sicheritz at molbio.uu.se  Mon Mar  1 05:55:46 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] libraries - was more nice interfaces
In-Reply-To: <36DA7242.76312BC7@bc.edu>
References: <36DA6BD8.A0733627@bc.edu>
	<36DA7242.76312BC7@bc.edu>
Message-ID: <14042.28588.223446.421175@beagle.bmc.uu.se>

Hej all,


 > Well, I haven't really thought about naming conventions for the libraries.  Since
 > you are the first to do this, you get the honor of inventing the namespace.  You
 > can start everything with "locus_", if that is along the line of your question.

? Que ? locus_ ? ... have I missed something ? Why locus ?


 > I think we have to consider though just how much we will be using C to speed
 > things up in Loci.  Much of what we intend to do will be very
 > compute-intensive.  So maybe.  Think about it.
 > 
 > Here is the URL:
 > 
 >     http://www.digicool.com/releases/ExtensionClass/

Seems like slight overkill to me ... I'll stick to simple wrapping for the
sequence stuff.
 > 
 > Konrad, do you know the license?  I can't find it.
http://www.digicool.com/releases/ExtensionClass/COPYRIGHT.html


And don't expect any great news from my part ... I have to spend most of my
time writing my thesis and keeping our genome projects going. Right now
brewing more coffee feels like a luxurious waste of time  ... ;-)


 > Have you been getting all of the e-mails from the mailing list?
 > 
 > I count ~20 messages from this weekend and today.

Yes ... I have just scanned them. I have also saved all other mails from
the list, but I am very short on time right now, so I am saving the fun part
until later.


c ya
-thomas
-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From bizzaro at bc.edu  Mon Mar  1 05:56:02 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] libraries - was more nice interfaces
References: <36DA6BD8.A0733627@bc.edu>
Message-ID: <36DA7242.76312BC7@bc.edu>

Thomas wrote:

> Hej Jeff,
> 
>  > One major part of the Loci Project is to create a "library" of Python modules
>  > (and C wrapped in Python) that handle common sequence and structure
>  > manipulations.  The library for structure is something Konrad will hopefully
>  > contribute, along the line of MMTK.
>  >
>  > If you are writing or rewriting code to manipulate sequences (like reversing or
>  > complementing), the code should become part of this library.
>  >
>  > Tim, a guy we haven't heard from in a while, was going to write code to
>  > convert codons into amino acids.  Tim, this also needs to be in the
>  > library, in Python or Python/C.
> 
> I see ... that would make life easier. How are we going to do this
> practically ? Are we using different namespaces in the library ?

Well, I haven't really thought about naming conventions for the libraries.  Since
you are the first to do this, you get the honor of inventing the namespace.  You
can start everything with "locus_", if that is along the line of your question.

> How should I rewrite the rewrite of my lib ?

Quoting Konrad:

    You *could* write some C-level type and
    inherit from it in a Python class by using the ExtensionClass
    package, but there's no point unless you really want/need to
    provide a generally useful C type.

I think ExtensionClass is great in that the OO paradigm of Python is brought to
the C module.  I would like to see this become standard Python in a future
release.  But is it worth the effort to use a new Python package for the
library?

I think we have to consider though just how much we will be using C to speed
things up in Loci.  Much of what we intend to do will be very
compute-intensive.  So maybe.  Think about it.

Here is the URL:

    http://www.digicool.com/releases/ExtensionClass/

Konrad, do you know the license?  I can't find it.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From carlosm at moet.cs.colorado.edu  Mon Mar  1 15:07:17 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] and still more infrastructure things
In-Reply-To: <36DA56C5.C708C804@bc.edu>
Message-ID: <Pine.GSU.4.05.9903011227120.7512-100000@moet.cs.colorado.edu>


    [Carlos Maltzahn]

    > I attached a GIF picture to this mail. This picture contains Gnome
    > clients, Paos server, and Tool Manager (excuse me if I introduce yet
    > another set of terms). Gnome clients and Tool Manager are Paos clients. A
    > Gnome client consists of a GCL editor and progress monitor, among other
    > things. A Tool Manager
    > 
    > - parses XML data and forwards it to the actual tool,
    
    [J.W. Bizzaro]
    What should be the ratio of tool managers to tools?  I didn't see
    the actual tool represented in the GIF, so I'm assuming it is 1:1.  
    If so, could each tool manager be _embedded_ in the code of a
    tool?...at least using the "include" command.

I used "Tool Managers" because I know very little about the nature of the
tools you are planning to use. In a 1:1 scenario, the tools allow you to
import a Python module (e.g., the tools are Python programs or run an
embedded Python interpreter). In this case the Tool Manager is a Python
module that uses the Paos Client module. But if the tools are supposed to
be more independent, the Tool Manager could be something like a remote
Unix shell which communicates with Paos and that can control a variety of
tools and has access to system information such as memory or CPU usage or
process status information. 

It might make sense to support both solutions.
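
To make the shell-style variant concrete, here is a minimal sketch, assuming
a Tool Manager that receives an XML request, hands the payload to a
command-line tool on standard input, and wraps the tool's output in XML
again. The element names <request>, <input> and <result> and the function
name run_tool are made up for this illustration; nothing here is the actual
Paos API or an agreed Loci format.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


def run_tool(request_xml, command):
    """Feed the text of <input> to `command` on stdin; wrap stdout in <result>.

    The element names are hypothetical placeholders, not a Loci format.
    """
    request = ET.fromstring(request_xml)
    payload = request.findtext("input", default="")
    completed = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    result = ET.Element("result")
    result.text = completed.stdout.strip()
    return ET.tostring(result, encoding="unicode")


# Stand-in "tool": a Python one-liner that reverses its standard input.
reverse_cmd = [
    sys.executable,
    "-c",
    "import sys; print(sys.stdin.read().strip()[::-1])",
]
reply = run_tool("<request><input>ATGC</input></request>", reverse_cmd)
```

In the 1:1 scenario the same function could simply be imported by a Python
tool and called directly instead of spawning a separate process.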
    
    > - turn the result of a tool into XML data and send it to another tool
    >   manager
    
    Hmmm.  You see again here that the tool manager does what I'd
    expect the tool to do.  Maybe we're thinking the same way about
    this.
    
See above.

    > The thin lines are communicating Python objects, the thick
    > lines communicate XML structures. Note that the destination of Tool
    > Manager can also be a Gnome client which is used to visualize results.
    
    ...the XML is sent back to the user at the end?

Not necessarily. At some point the user wants to see the results, of
course -- but this could either happen "on-line" (i.e. while the
processing is going on), or "off-line" (i.e. after the results are
archived). For long-running processing it might be useful to see the
result as it emerges (e.g. histograms, scatter plots, etc). This might be
interesting not only at the end of a processing pipe but also at
intermediate steps. At any rate, I think XML is the way to go in all cases
where you want to communicate domain-specific data.
    
    > Another question in the discussion was whether to use Python objects for
    > communication or XML. XML is safer because it is an accepted and
    > extensible standard. However, transferring serialized objects was the
    > performance bottleneck in the Chautauqua workflow system (which uses
    > Paos) and I introduced a bit of trickery to reduce this overhead.
    
    What really worried me was the thought of a _single_ Paos process
    managing _everything_, including every read and write for every
    single XML, just to get workflow information.  Agreed it would be
    best to leave active workflow information to RAM.

What is RAM?

Carlos

From justin at ukans.edu  Mon Mar  1 15:51:30 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:20 2006
Subject: [Fwd: [Pipet Devel] more nice interfaces]
In-Reply-To: <36DA6BB7.AD59D5EA@bc.edu>
Message-ID: <Pine.OSF.4.03.9903011448260.30419-100000@busboy.sped.ukans.edu>

>> Do we have access to a mailing list archive ? - I have been away too
>> long from this list to keep an overview of what data structure/markup
>> languages to use ...
>
> Justin is managing the majordomo account at UKansas.  I know there are
> some utilities to convert e-mails to HTML.

I've set up MHonArc to archive our email to the web.
It's at http://toaster.sped.ukans.edu/tulip-list/

> If you want, I can forward you e-mails from any time span, since I save
> Loci e-mail on my computer.

I have all of the mail since I joined the project up on it, but if you can
send me the mail prior to January 4th, I'll put that up, too.

Justin Bradford
justin@ukans.edu


From bizzaro at bc.edu  Mon Mar  1 17:29:24 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] mialing list - was more nice interfaces
References: <Pine.OSF.4.03.9903011448260.30419-100000@busboy.sped.ukans.edu>
Message-ID: <36DB14C4.18FBD7CD@bc.edu>

Justin Bradford wrote:
> 
> I've set up a mhonarc to archive our email to the web.
> It's at http://toaster.sped.ukans.edu/tulip-list/

That's great!  Thank you!

Question:  Will the headers always appear on a single page, or can they be split
up by month?

> 
> > If you want, I can forward you e-mails from any time span, since I save
> > Loci e-mail on my computer.
> 
> I have all of the mail since I joined the project up on it, but if you can
> send me the mail prior to January 4th, I'll put that up, too.
> 

What format?  I have everything under Netscape Mail...so it's one text file.  I
can send it that way, unless you need it some other way.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Mon Mar  1 17:41:01 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] and still more infrastructure things
References: <Pine.GSU.4.05.9903011227120.7512-100000@moet.cs.colorado.edu>
Message-ID: <36DB177D.6F25852B@bc.edu>

Carlos Maltzahn wrote:

> I used "Tool Managers" because I know very little about the nature of the
> tools you are planning to use. In a 1:1 scenario, the tools allow you to
> import a Python module (e.g., the tools are Python programs or run an
> embedded Python interpreter). In this case the Tool Manager is a Python
> module that uses the Paos Client module.

Okay.  That's what I was thinking.

> But if the tools are supposed to
> be more independent, the Tool Manager could be something like a remote
> Unix shell which communicates with Paos and that can control a variety of
> tools and has access to system information such as memory or CPU usage or
> process status information.

Okey dokey.  I guess in either case, though, the "tool" would lock up during
communication...if communication were 2-way and it waited for a reply.

> It might make sense to support both solutions.

I think so.  We'll have more flexibility that way.

> Not necessarily. At some point the user wants to see the results, of
> course -- but this could either happen "on-line" (i.e. while the
> processing is going on), or "off-line" (i.e. after the results are
> archived). For long-running processing it might be useful to see the
> result as it emerges (e.g. histograms, scatter plots, etc). This might be
> interesting not only at the end of a processing pipe but also at
> intermediate steps. At any rate, I think XML is the way to go in all cases
> where you want to communicate domain-specific data.

..."domain-specific data" meaning the scientific data.

>     What really worried me was the thought of a _single_ Paos process
>     managing _everything_, including every read and write for every
>     single XML, just to get workflow information.  Agreed it would be
>     best to leave active workflow information to RAM.
> 
> What is RAM?

:-) Random Access Memory, as in 64 MB RAM, system memory, not on disk.

How's your thesis coming along?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From justin at ukans.edu  Mon Mar  1 17:44:03 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] mailing list and some confusion
In-Reply-To: <36DB14C4.18FBD7CD@bc.edu>
Message-ID: <Pine.OSF.4.03.9903011635320.30419-100000@busboy.sped.ukans.edu>

> Question:  Will the headers always appear on a single page, or can they
> be split up by month?

I'm sure I can split them by month; it might just take a small script.
I'll get it working eventually.

> What format?  I have everything under Netscape Mail...so it's one text
> file.  I can send it that way, unless you need it some other way.

I'm not positive MHonArc will read Netscape Mail files. I can always try
it, and if it doesn't work we can go from there.

Also, would it be possible for you to explain your vision for how the
various components of Loci interact again. I'm somewhat fuzzy on how some
things interconnect at the network level.
For instance, network connections occur between what points?
What decides the path? What receives status updates? What happens when
problems arise? Does something always have to be running on the user's
side, and if so, what does it do?

I had a vision for the structure, which I don't think is what you had, and
all of the terminology we've been using has gotten muddled in my mind.

Justin Bradford
justin@ukans.edu

From carlosm at moet.cs.colorado.edu  Mon Mar  1 18:15:12 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] and still more infrastructure things
In-Reply-To: <36DB177D.6F25852B@bc.edu>
Message-ID: <Pine.GSU.4.05.9903011534210.7512-100000@moet.cs.colorado.edu>

    > But if the tools are supposed to
    > be more independent, the Tool Manager could be something like a remote
    > Unix shell which communicates with Paos and that can control a variety of
    > tools and has access to system information such as memory or CPU usage or
    > process status information.
    
    Okey dokey.  I guess in either case, though, the "tool" would lock up during
    communication...if communication were 2-way and it waited for a reply.

Yes -- but 2-way communication can also be non-blocking (and should be) --
this is the beauty of notification requests: "tools" can register
notification requests that are designed in such a way that a Paos server
can effectively query tools. This requires that tools maintain some sort
of event loop that periodically checks for events either from Paos or from
the actual processing. The Paos Client module supports multiple ways of
implementing this: (1) the Client module forks a separate process that
listens to the Paos server; upon receiving a notification it interrupts
the main process and forwards the notification to the main process (the
necessary signal handlers and pipes are all installed by the Client
module), (2) Client uses a pre-defined pipe to receive notifications; this
is useful if the application does its own event management, (3) same as
(1) but it assumes that the application has installed its own signal
handler (this is useful if the actual event processing is done in a
language other than Python).
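
Option (2) can be illustrated with a minimal sketch, assuming a Unix system
where notifications arrive as text lines on a pipe and the tool runs its own
event loop. The names poll_notifications and run_tool, and the "status?"
message, are invented for this example; it does not use the Paos Client
module.

```python
import os
import select


def poll_notifications(read_fd, timeout=0.0):
    """Return notification lines waiting on read_fd without blocking."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return []
    data = os.read(read_fd, 4096)
    return data.decode("ascii").splitlines()


def run_tool(read_fd, work_items):
    """Process work items, checking for notifications between steps."""
    log = []
    for item in work_items:
        for note in poll_notifications(read_fd):
            log.append(("notified", note))
        log.append(("processed", item))
    return log


# Stand-in for the server side: write one notification into the pipe.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"status?\n")
events = run_tool(read_fd, ["seq1", "seq2"])
os.close(read_fd)
os.close(write_fd)
```

Because select() is called with a zero timeout, the tool never waits on the
pipe: it picks up whatever has arrived and goes straight back to its own
processing.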

    
    > It might make sense to support both solutions.
    
    I think so.  We'll have more flexibility that way.

A shell approach has the additional advantage of being more universal.
    
    > Not necessarily. At some point the user wants to see the results, of
    > course -- but this could either happen "on-line" (i.e. while the
    > processing is going on), or "off-line" (i.e. after the results are
    > archived). For long-running processing it might be useful to see the
    > result as it emerges (e.g. histograms, scatter plots, etc). This might be
    > interesting not only at the end of a processing pipe but also at
    > intermediate steps. At any rate, I think XML is the way to go in all cases
    > where you want to communicate domain-specific data.
    
    ..."domain-specific data" meaning the scientific data.
    
Yes.

    >     What really worried me was the thought of a _single_ Paos process
    >     managing _everything_, including every read and write for every
    >     single XML, just to get workflow information.  Agreed it would be
    >     best to leave active workflow information to RAM.
    > 
    > What is RAM?
    
    :-) Random Access Memory, as in 64 MB RAM, system memory, not on disk.
    
Huh? What has physical memory to do with this?

    How's your thesis coming along?    
    
Thesis writing sucks! I wish I had more time to finish the Paos tutorial
(it's half done).

Carlos

From carlosm at moet.cs.colorado.edu  Mon Mar  1 18:20:33 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] mailing list and some confusion
In-Reply-To: <Pine.OSF.4.03.9903011635320.30419-100000@busboy.sped.ukans.edu>
Message-ID: <Pine.GSU.4.05.9903011616010.7512-100000@moet.cs.colorado.edu>


    [Justin Bradford]
    Also, would it be possible for you to explain your vision for how the
    various components of Loci interact again. I'm somewhat fuzzy on how some
    things interconnect at the network level.
    For instance, network connections occur between what points?
    What decides the path? What receives status updates? What happens when
    problems arise? Does something always have to be running on the user's
    side, and if so, what does it do?
    
    I had a vision for the structure, which I don't think is what you had, and
    all of the terminology we've been using has gotten muddled in my mind.

I agree. A new design diagram or a glossary would be very useful at this
point.

Carlos

From bizzaro at bc.edu  Mon Mar  1 18:55:37 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] some confusion
References: <Pine.OSF.4.03.9903011635320.30419-100000@busboy.sped.ukans.edu>
Message-ID: <36DB28F9.CD61F34C@bc.edu>

Justin Bradford wrote:

> Also, would it be possible for you to explain your vision for how the
> various components of Loci interact again. I'm somewhat fuzzy on how some
> things interconnect at the network level.
> For instance, network connections occur between what points?
> What decides the path? What receives status updates? What happens when
> problems arise? Does something always have to be running on the user's
> side, and if so, what does it do?
> 
> I had a vision for the structure, which I don't think is what you had, and
> all of the terminology we've been using has gotten muddled in my mind.
> 

I guess you're looking for an updated "diagram or glossary", as Carlos just
mentioned.  I'll get that out shortly.

As far as the details of network communication are concerned, Paos will be more
heavily involved in this than I originally planned and should handle all the
networking for us.  I expect networked communication between tools to be almost
identical to communication between tools on the same computer.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Mon Mar  1 19:01:45 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] and still more infrastructure things
References: <Pine.GSU.4.05.9903011534210.7512-100000@moet.cs.colorado.edu>
Message-ID: <36DB2A69.F097EEA@bc.edu>

Carlos Maltzahn wrote:
> 
>     >     What really worried me was the thought of a _single_ Paos process
>     >     managing _everything_, including every read and write for every
>     >     single XML, just to get workflow information.  Agreed it would be
>     >     best to leave active workflow information to RAM.
>     >
>     > What is RAM?
> 
>     :-) Random Access Memory, as in 64 MB RAM, system memory, not on disk.
> 
> Huh? What has physical memory to do with this?
> 

I mean workflow information is best kept in a data structure, which I'm
assuming is held in physical memory.  I guess I'm wrong.  No big deal.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From carlosm at moet.cs.colorado.edu  Mon Mar  1 19:11:19 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:20 2006
Subject: [Pipet Devel] some confusion
In-Reply-To: <36DB28F9.CD61F34C@bc.edu>
Message-ID: <Pine.GSU.4.05.9903011653450.7512-100000@moet.cs.colorado.edu>


    [J.W. Bizzaro] 
    As far as the details of network communication is concerned, Paos
    will be more heavily involved in this than I originally planned
    and should handle all the networking for us.  I expect networked
    communication between tools to be almost identical to
    communication between tools on the same computer.

Except the transfer of XML from tool to tool. 

It might make sense to make a distinction between streamed tool input and
input files. If a tool has to have access to the entire result of the
previous tool before it can do anything useful and the tool runs on a host
that shares the same NFS with the host of the previous tool, it doesn't
make sense to transfer any data: all the tool needs is a pointer to a file
that contains the result of the previous tool. 

On the other hand, if the tool is able to process a stream of data, the
entire process can be pipelined which saves a lot of time.
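
The two cases can be sketched as follows, assuming two toy tools written as
Python generators. The tool names, the file name tool1.out and the GC-count
task are placeholders for illustration only. In the streamed case the second
tool starts on the first record before the first tool has produced the
second one; in the file case only a path (the "pointer") is handed over.

```python
import os
import tempfile


def produce(sequences):
    """First tool: yield one record at a time instead of a full list."""
    for seq in sequences:
        yield seq.upper()


def gc_counts(records):
    """Second tool: consume records as they arrive, yield a G+C count."""
    for rec in records:
        yield rec.count("G") + rec.count("C")


# Streamed: the two tools are chained and run in lock step.
streamed = list(gc_counts(produce(["atgc", "ggcc", "atat"])))


def write_results(records, path):
    """Write the complete result to a file and return its path."""
    with open(path, "w") as handle:
        for rec in records:
            handle.write(rec + "\n")
    return path


def read_results(path):
    """Read a previous tool's result back from the shared file."""
    with open(path) as handle:
        for line in handle:
            yield line.rstrip("\n")


# File handoff: on a shared file system only the path changes hands.
with tempfile.TemporaryDirectory() as shared_dir:
    pointer = write_results(
        produce(["atgc", "ggcc", "atat"]),
        os.path.join(shared_dir, "tool1.out"),
    )
    from_file = list(gc_counts(read_results(pointer)))
```

Both routes give the same answer; the difference is only in when the second
tool can begin and how much data has to move.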

Carlos
    

From bizzaro at bc.edu  Mon Mar  1 20:58:56 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:21 2006
Subject: [Pipet Devel] some confusion
References: <Pine.GSU.4.05.9903011653450.7512-100000@moet.cs.colorado.edu>
Message-ID: <36DB45E0.157BD528@bc.edu>

Carlos Maltzahn wrote:
> 
> Except the transfer of XML from tool to tool.

Yep, we're going to get killed on the terminology in this project.

When you say "transfer", you mean reading the XML file from disk and then
"streaming" it to the next tool, without writing back to disk?

As you wrote below, we shouldn't have to do this if both tools are on the same
NFS.  But we will need some mechanism for XML transfers between NFS's.

> It might make sense to make a distinction between streamed tool input and
> input files.

Right.  So workflow info is streamed and biological data (XML) is read/written
from/to a file.

> If a tool has to have access to the entire result of the
> previous tool before it can do anything useful and the tool runs on a host
> that shares the same NFS with the host of the previous tool, it doesn't
> make sense to transfer any data: all the tool needs is a pointer to a file
> that contains the result of the previous tool.

Yes.

> On the other hand, if the tool is able to process a stream of data, the
> entire process can be pipelined which saves a lot of time.

Now by "pipelined", you mean one tool starts getting input before the other tool
is even finished with the data...for serialized analyses?

I agree we need a glossary.  I'll get started on one, but some of the
terminology I use may not be "correct" and will need to be changed.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Mon Mar  1 21:55:00 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:21 2006
Subject: [Fwd: [Pipet Devel] libraries - was more nice interfaces]
Message-ID: <36DB5304.665B61FC@bc.edu>

My reply to Thomas...
-------------- next part --------------
An embedded message was scrubbed...
From: "J.W. Bizzaro" <bizzaro@bc.edu>
Subject: Re: [Pipet Devel] libraries - was more nice interfaces
Date: Tue, 02 Mar 1999 02:06:15 +0000
Size: 1485
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990302/c464e2f9/attachment.mht

From bizzaro at bc.edu  Mon Mar  1 23:57:08 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:21 2006
Subject: [Pipet Devel] glossary
Message-ID: <36DB6FA4.51E260D9@bc.edu>

Attached is a glossary of the terms we've been using to describe Loci.

Please let me know if there is any confusion, changes, or additions.

You will find some names that you haven't seen before.  For example, I'd like to
call our XML, "BICML", as in "The BIC Group", instead of LociML.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--
-------------- next part --------------
Glossary of Terms for The Loci Project
Version 0.1; 01 Mar 1999


analytical tool
    A small Python module, or program with a Python interface,
    with the ability to work on biological data of type
    sequence or structure.  This is non-graphical.  See
    also Library.

Benchtop
    Part of the Workspace.  A highly graphical client that
    is the primary user interface for managing data and the
    Work Flow System.

BICML
    (formerly LociML or LocusML) The Biomolecular Informatics
    and Computation Markup Language.  The XML describing
    the biological data.

client
    Any tool that runs under its own process, usually
    a graphical tool or the Workspace.

collaboratory
    The system by which multiple users, on separate computers,
    can collaborate on a research project.

command/command-line
    The characters typed to start a program that would normally
    run in a console.

Gatekeeper
    The application broker that resides on a remote computer
    and translates Loci data streams and files into commands
    (input) for remote algorithms and queries for remote
    databases.  Output from the algorithms and databases
    is translated back into Loci data streams and files.

Graphical Command Language (GCL)
    (now deprecated) A representation of piped commands and
    files using pictures.  See Work Flow Diagram.

graphical/gui tool
    A type of client and type of tool.  What is actually
    seen and used by the user to work on the biological
    data.  This doesn't include the Workspace, which is not
    a "tool".

hub
    A remote computer that connects Loci to registered
    Gatekeepers on the Internet.

library
    Used in the common sense.  Loci can be considered a
    library of tools, both graphical and analytical.

Notebook
    Part of the Workspace.  Keeps a running log, written in
    HTML, of all work performed using Loci.  The notebook can
    take inserted text from the user but no deletions.  This
    is an electronic version of a laboratory notebook.

object
    Used in the common sense.  Objects are data that can be
    streamed or stored.  BICML files are not considered
    objects.

tool
    A Python module or program wrapped in Python.  Can be
    either graphical or analytical, but is used for work
    on biological data.

local
    On the user's very own computer or NFS.

loci
    Plural for locus.

locus
    Any real part of Loci: modules, tools, clients, and
    libraries, including hubs and remote programs.  Usually
    doesn't include data.  "Locus" can in fact appear before
    every other name in this glossary beginning with a capital
    letter. The loci are represented on the Work Flow
    Diagram as boxes.

Locus AA1DV
    Amino Acid 1-Dimensional Viewer.  A graphical tool.

Locus AA1DE
    Amino Acid 1-Dimensional Editor.  A graphical tool.

Locus AA2DV
    Amino Acid 2-Dimensional Viewer.  A graphical tool.

Locus AA3DV
    Amino Acid 3-Dimensional Viewer.  A graphical tool.

Locus NA1DE
    Nucleic Acid 1-Dimensional Editor.  A graphical tool.

Locus NA1DV-L
    Nucleic Acid 1-Dimensional Viewer for linear strands.
    A graphical tool.

Locus NA1DV-Ci
    Nucleic Acid 1-Dimensional Viewer for circular strands.
    A graphical tool.

Locus NA1DV-Ch
    Nucleic Acid 1-Dimensional Viewer for chromosomes.
    A graphical tool.

Locus NA3DV
    Nucleic Acid 3-Dimensional Viewer.  A graphical tool.

path/pathway
    The progress of work performed by the user, manually
    or automatically, that results in one locus calling on
    another and another, so that the progress can be traced.  
    The path is represented on the Work Flow Diagram as
    a line connecting boxes (loci).

Paos
    The "active object server" written by Carlos Maltzahn.
    Paos acts as the communication backbone for the Loci
    system and a guide through the work path.

porta
    Any connection between the local Loci system and another
    system, be it the Gatekeeper, CORBA system, or whatever.
    Porta can be local or remote.

Porta Internet
    The connection between the local Loci system and the
    Gatekeeper.

Porta CORBA
    The connection between the local Loci system and a
    CORBA system.

Python
    The de-facto programming/scripting language of Loci.

query
    A database query sent to the remote database via
    data stream.

remote
    Not on the user's very own computer or NFS.  Across a
    network or the Internet.

remote algorithm
    A command-line program for complex biological analyses,
    which resides on a remote computer.

remote database
    A biological database that resides on a remote computer.

remote program
    A remote algorithm or database that resides on a remote
    computer.

server
    Used in the common sense.  Any program serving or
    controlling a client.

stream
    Active passing of objects from one locus to another,
    without writing to disk.  Usually done via Paos.

Translator
    Client that converts common formats for biological
    data (such as PDB or GenBank) into BICML, and vice
    versa.

transfer
    Passing files across a porta.

workflow
    The flow of all work being performed on the Loci system.

Work Flow Diagram (WFD)
    The representation or choreography of work.  Part of
    the Workspace.  The WFD is a dynamic flow chart where
    loci are represented as boxes and paths are represented
    as lines between boxes.

Work Flow System (WFS)
    Paos control and monitoring of workflow.

Workspace or Locus Primus
    The client(s) that provide user control and monitoring
    of workflow.  This includes the Benchtop and Notebook.
    The Workspace is not considered a "tool".

From Thomas.Sicheritz at molbio.uu.se  Tue Mar  2 04:17:45 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:21 2006
Subject: [Pipet Devel] libraries - was more nice interfaces
In-Reply-To: <36DB4796.BE47E7D4@bc.edu>
References: <36DA6BD8.A0733627@bc.edu>
	<36DA7242.76312BC7@bc.edu>
	<14042.28588.223446.421175@beagle.bmc.uu.se>
	<36DB4796.BE47E7D4@bc.edu>
Message-ID: <14043.39828.639775.924162@beagle.bmc.uu.se>

 > > Que ? locus_ ? ... have I missed something ? Why locus ?
 > 
 > You are talking about how modules and objects will be named, right?  To give
 > each a unique name so there is no confusion, I think names could start with
 > "locus", as in _one_ location, singular for loci.  PyGTK classes/objects are
 > named "gtk_whatever".

I c

 > 
 > What do you mean by "Que"?
They don't show "Fawlty Towers" overseas ? - ¿Que? is the Spanish quote from
the "continental cretin" ... (http://www.metronet.co.uk/cultv/fawlty.htm)


 > 
 > What do you think should be done regarding "namespace"?
This basic library - "loci" - should we build it as a shared library - load
on demand, or will it contain all vital structures/functions statically
linked to the core ... hmm ... in that case what is the core ?

A question concerning addon modules:
  If I am going to write new phylogenomic tools in e.g. Python - what do I
  need to think about to make the programs loci-compatible ?
  What should I tell other programmers ? Do we need a "locus style guide" ?


-thomas

-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From bizzaro at bc.edu  Tue Mar  2 23:18:48 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:21 2006
Subject: [Pipet Devel] libraries - was more nice interfaces
References: <36DA6BD8.A0733627@bc.edu>
		<36DA7242.76312BC7@bc.edu>
		<14042.28588.223446.421175@beagle.bmc.uu.se>
		<36DB4796.BE47E7D4@bc.edu> <14043.39828.639775.924162@beagle.bmc.uu.se>
Message-ID: <36DCB828.6E8D962C@bc.edu>

Thomas.Sicheritz@molbio.uu.se wrote:

>  > What do you mean by "Que"?
> They don't show "Fawlty Towers" overseas ? - ¿Que? is the Spanish quote from
> the "continental cretin" ... (http://www.metronet.co.uk/cultv/fawlty.htm)

I've seen it on Public Television here.  What is the character just before Que? 
I can't read it on Netscape Mail.

>  >
>  > What do you think should be done regarding "namespace"?
> This basic library - "loci" - should we build it as a shared library - load
> on demand, or will it contain all vital structures/functions statically
> linked to the core ... hmm ... in that case what is the core ?

There is no "core" other than Paos, which is called on demand and is not a
single process.

The library of analytical tools (see the glossary) will be shared, only loaded
if and when needed, using "import".  So really, they're Python modules.
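
As a rough illustration of what such a module might hold, here is a minimal
sketch, assuming plain Python strings for DNA and the standard genetic code.
The function names locus_reverse_complement and locus_translate only follow
the "locus_" prefix suggested earlier in this thread; they are not an
agreed-upon API.

```python
# Complement table for unambiguous DNA bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Standard genetic code, codons ordered with each base running T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
)
_CODON_TABLE = {
    b1 + b2 + b3: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}


def locus_reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def locus_translate(dna):
    """Translate DNA in frame 1; unknown codons become 'X', stops '*'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        _CODON_TABLE.get(dna[pos:pos + 3], "X")
        for pos in range(0, usable, 3)
    )


revcomp = locus_reverse_complement("ATGC")
protein = locus_translate("ATGGCCTAA")
```

A tool would pull these in with an ordinary import only when it needs them.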

To get a better idea, take a look at the modules for Konrad's "Molecular
Modeling Toolkit".  This may be pretty much what Konrad will contribute to Loci:

    http://starship.python.net/crew/hinsen/mmtk_manual/examples.html


> A question concerning addon modules:
>   If I am going to write new phylogenomic tools in e.g. Python - what do I
>   need to think about to make the programs loci-compatible ?

Good point.  With the development of the first graphical tools for Loci, such as
your sequence editor, we will be constructing a standard for Loci tools.  So,
there will be a certain amount of Python code in each tool that is the same for
all tools, even new ones.  And eventually we should be able to provide other
programmers with a nonfunctioning core to which they can add their code.

What should the standard/core of each graphical tool be able to do?

  (1) Read workflow data from Paos
  (2) Get bio data from Paos or directly from file
  (3) Convert bio data to layout for graphics
  (4) Have drawing capability (links to gnome-canvas)
  (5) Send workflow data to Paos
  (6) Send bio data to Paos or write to file
  (7) Find available tools to use or launch (from Paos?)

Anything else?
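
A minimal sketch of what such a skeleton could look like, one method per
item in the list above. Everything here is an assumption made for
illustration: the class names, the method names, and the DictBroker
stand-in are hypothetical, and nothing in it is the real Paos API.

```python
class DictBroker:
    """In-memory stand-in for Paos, used only to make the sketch runnable."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def put(self, key, value):
        self.store[key] = value


class ToolSkeleton:
    """Hypothetical skeleton; a tool author overrides layout() and draw()."""

    def __init__(self, broker):
        self.broker = broker

    def read_workflow(self):                # (1) read workflow data
        return self.broker.get("workflow")

    def read_data(self, path=None):         # (2) bio data from broker or file
        if path is None:
            return self.broker.get("data")
        with open(path) as handle:
            return handle.read()

    def layout(self, data):                 # (3) convert bio data to a layout
        raise NotImplementedError

    def draw(self, layout):                 # (4) drawing, e.g. on a canvas
        raise NotImplementedError

    def send_workflow(self, workflow):      # (5) send workflow data
        self.broker.put("workflow", workflow)

    def send_data(self, data, path=None):   # (6) bio data to broker or file
        if path is None:
            self.broker.put("data", data)
        else:
            with open(path, "w") as handle:
                handle.write(data)

    def find_tools(self):                   # (7) list the available tools
        return self.broker.get("tools") or []


class LengthTool(ToolSkeleton):
    """Toy tool: its 'drawing' is just a text label with the sequence length."""

    def layout(self, data):
        return {"label": "%d residues" % len(data)}

    def draw(self, layout):
        return layout["label"]
```

A new tool would then only supply layout() and draw(); the rest is the
shared part that is the same for all tools.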

>   What should I tell other programmers ? Do we need a "locus style guide" ?

In a sense.  We should have clear instructions on how to add to the core of the
graphical tool, and what the programmer should know about how Loci functions.

This is all very important, because Loci will have to be very easy to expand for
it to get expanded by others at all.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Tue Mar  2 23:20:27 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:21 2006
Subject: [Fwd: [Pipet Devel] libraries - was more nice interfaces]
Message-ID: <36DCB88B.1A0DB35C@bc.edu>

>From Thomas...
-------------- next part --------------
An embedded message was scrubbed...
From: Thomas.Sicheritz@molbio.uu.se
Subject: Re: [Pipet Devel] libraries - was more nice interfaces
Date: Tue,  2 Mar 1999 10:17:45 +0100 (MET)
Size: 3005
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990303/dd8fa99d/attachment.mht
From bizzaro at bc.edu  Tue Mar  2 23:26:21 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:21 2006
Subject: [Pipet Devel] libraries - was more nice interfaces
References: <36DA6BD8.A0733627@bc.edu>
			<36DA7242.76312BC7@bc.edu>
			<14042.28588.223446.421175@beagle.bmc.uu.se>
			<36DB4796.BE47E7D4@bc.edu> <14043.39828.639775.924162@beagle.bmc.uu.se> <36DCB828.6E8D962C@bc.edu>
Message-ID: <36DCB9EC.FDB3AA9F@bc.edu>

"J.W. Bizzaro" wrote:
> 
> all tools, even new ones.  And eventually we should be able to provide other
> programmers with a nonfunctioning core to which they can add their code.
> 
> What should the standard/core of each graphical tool be able to do?

To avoid confusion, we should call this a code "skeleton", not a "core".


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Tue Mar  2 23:36:03 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:21 2006
Subject: [Pipet Devel] libraries - was more nice interfaces
References: <36DA6BD8.A0733627@bc.edu>
			<36DA7242.76312BC7@bc.edu>
			<14042.28588.223446.421175@beagle.bmc.uu.se>
			<36DB4796.BE47E7D4@bc.edu> <14043.39828.639775.924162@beagle.bmc.uu.se> <36DCB828.6E8D962C@bc.edu>
Message-ID: <36DCBC33.7609717F@bc.edu>

"J.W. Bizzaro" wrote:

> The library of analytical tools (see the glossary) will be shared, only loaded
> if and when needed, using "include".  So really, they're Python modules.

Of course I mean "import".

"include" is C, see? :-)


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From rahul at photino.sid.rice.edu  Wed Mar  3 00:20:14 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:21 2006
Subject: [Pipet Devel] and still more infrastructure things
In-Reply-To: <Pine.OSF.4.03.9902280410120.11892-100000@busboy.sped.ukans.edu>
Message-ID: <Pine.LNX.4.05.9903022317260.27929-100000@photino.sid.rice.edu>

On Sun, 28 Feb 1999, Justin Bradford wrote:

> I miss enclosed blocks, but otherwise I'm doing ok.
> {
>    whitespace   usage should 
>       be random  . you can just  parse  around
> 
>  it.
> }

Regarding this problem, I think we could hack up a preprocessor in perl to
convert stuff like this to standard python, sorta like cpp but customized
for python. There could also be a line we put at the top that defines
the format of the file. We could also make deprocessors to convert from
one format to the canonical style of the other.

Just some of my many ramblings...

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GED/S/CS/MD/M/P/O/U/! d->-- s:-(--) a--->? C++(+++)$ UL++++$ P+++$>++++
L+++$>++++ !E--(----)? W++$>+++ N+(--) o>++++$ K? !w---()>? !O? M+ !V--?
!PS+? PE+() Y+(++) PGP>+ t !5-->? !X-- R>+ !tv-(+) b+>++ DI+(+++)>++++
D++@>$ G e(*)>++++>+++++>$ h-()>++ r? y?
------END GEEK CODE BLOCK------
See also: http://www.hewgill.com/ogr/  http://www.douglasadams.com
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From bizzaro at bc.edu  Wed Mar  3 00:42:44 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:21 2006
Subject: [Pipet Devel] and still more infrastructure things
References: <Pine.LNX.4.05.9903022317260.27929-100000@photino.sid.rice.edu>
Message-ID: <36DCCBD4.9FF39A95@bc.edu>

Rahul Jain wrote:
> 
> On Sun, 28 Feb 1999, Justin Bradford wrote:
> 
> > I miss enclosed blocks, but otherwise I'm doing ok.
> > {
> >    whitespace   usage should
> >       be random  . you can just  parse  around
> >
> >  it.
> > }
> 
> Regarding this problem, I think we could hack up a preprocessor in perl to
> convert stuff like this to standard python, sorta like cpp but customized
> for python. There could also be a line we put at the top that defines
> the format of the file. We could also make deprocessors to convert from
> one format to the canonical style of the other.
> 

To help ease someone's transition into Python, it might be an interesting
project, although Python programmers won't understand it being written in Perl
;-)

You would have to replace not only {} but ;
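
Here is a rough sketch of such a preprocessor, written in Python rather
than Perl. It assumes one statement per line, a "{" at the end of the line
that opens a block, and a "}" at the start of the line that closes one. The
function name braces_to_indent is made up for this example. It does not
look inside strings or dictionary literals, so it is a toy and not a real
parser.

```python
def braces_to_indent(source, indent="    "):
    """Convert brace-delimited, semicolon-terminated code to indented Python."""
    out = []
    depth = 0
    for raw in source.splitlines():
        line = raw.strip()
        # Each leading "}" closes one block.
        while line.startswith("}"):
            depth -= 1
            line = line[1:].lstrip()
        if depth < 0:
            raise ValueError("unbalanced '}'")
        if not line:
            continue
        if line.endswith("{"):
            # A block header: drop the brace, add the colon, indent what follows.
            out.append(indent * depth + line[:-1].rstrip() + ":")
            depth += 1
        else:
            # An ordinary statement: drop any trailing semicolons.
            statement = line.rstrip(";").rstrip()
            if statement:
                out.append(indent * depth + statement)
    if depth != 0:
        raise ValueError("unbalanced '{'")
    return "\n".join(out) + "\n"
```

Since the original whitespace is thrown away, the input can be indented as
randomly as you like.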


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From rahul at photino.sid.rice.edu  Wed Mar  3 01:26:15 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:21 2006
Subject: [Pipet Devel] sorta related school project
Message-ID: <Pine.LNX.4.05.9903030015420.27929-100000@photino.sid.rice.edu>

Hi guys,

For a programming class (Visualization in Science and Engineering), I need
to write a Mathematica module that will essentially be a guided
exploration to implement a specific algorithm in Mathematica. I want to do
something bioinformatics-related, so I was wondering what algorithm lends
itself to a simple, but non-trivial, exploration. Mathematica can do some
pretty intense mathematical stuff (That's what it was made for, duh), but
it's also great at visualization.

This project is meant to be for students in a variety of science and
engineering fields, so shouldn't rely on more than high school biology. It
should be a derivation of the algorithm from first principles for a
special case and then a generalization. Check out
http://www.owlnet.rice.edu/~comp260/ for some examples of the kind of
stuff he wants us to write (You need Mathematica to view them).

Does anybody have any ideas as to what algorithm I might use? I haven't
done much in bioinformatics, so I'll need an explanation of the algorithm
or a reference to a book or journal article.

Thanks for the help,

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GED/S/CS/MD/M/P/O/U/! d->-- s:-(--) a--->? C++(+++)$ UL++++$ P+++$>++++
L+++$>++++ !E--(----)? W++$>+++ N+(--) o>++++$ K? !w---()>? !O? M+ !V--?
!PS+? PE+() Y+(++) PGP>+ t !5-->? !X-- R>+ !tv-(+) b+>++ DI+(+++)>++++
D++@>$ G e(*)>++++>+++++>$ h-()>++ r? y?
------END GEEK CODE BLOCK------
See also: http://www.hewgill.com/ogr/  http://www.douglasadams.com
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From bizzaro at bc.edu  Wed Mar  3 01:58:01 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:21 2006
Subject: [Pipet Devel] sorta related school project
References: <Pine.LNX.4.05.9903030015420.27929-100000@photino.sid.rice.edu>
Message-ID: <36DCDD79.78BF4AD9@bc.edu>

Rahul,

There is a list of biology computation projects, some with Mathematica code, in
this book:

Richard E. Crandall
Projects in Scientific Computation
Springer-Verlag, N.Y., 1994
Pages 79-92


Jeff


Rahul Jain wrote:
> 
> Hi guys,
> 
> For a programming class (Visualization in Science and Engineering), I need
> to write a Mathematica module that will essentially be a guided
> exploration to implement a specific algorithm in Mathematica. I want to do
> something bioinformatics-related, so I was wondering what algorithm lends
> itself to a simple, but non-trivial, exploration. Mathematica can do some
> pretty intense mathematical stuff (That's what it was made for, duh), but
> it's also great at visualization.
> 
> This project is meant to be for students in a variety of science and
> engineering fields, so shouldn't rely on more than high school biology. It
> should be a derivation of the algorithm from first principles for a
> special case and then a generalization. Check out
> http://www.owlnet.rice.edu/~comp260/ for some examples of the kind of
> stuff he wants us to write (You need Mathematica to view them).
> 
> Does anybody have any ideas as to what algorithm I might use? I haven't
> done much in bioinformatics, so I'll need an explanation of the algorithm
> or a reference to a book or journal article.
> 
> Thanks for the help,
> 

-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From rahul at photino.sid.rice.edu  Wed Mar  3 03:13:57 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:21 2006
Subject: [Pipet Devel] inter-locus communication
In-Reply-To: <36DCB828.6E8D962C@bc.edu>
Message-ID: <Pine.LNX.4.05.9903030028230.27929-100000@photino.sid.rice.edu>

What exactly is the role of Paos in Loci? As I understand it, Paos is
simply a way of moving objects back and forth. I think that Justin's
original suggestion was good, with some details that need to be worked
out.

With an XML parser, it should be trivial to convert a LociML file to an
object in any language (Python, Perl, C++, etc.).

Each of the four sections should be accessible separately, and the part of
that section corresponding to a specific step should be equally easy to
access. The state is the section that will need to be passed around
frequently, the others should be static, except for the data. That can
have one part for the original data, and then subsequent parts for the
results of the analyses. The main difference from Justin's original model
that I'm suggesting regarding the structure is that _all_ sequence and
related data be stored in the <data> section and then referred to in the
queries. Parts would be appended to that with the output of a specific
step, containing chunks of data surrounded in an identifying XML tag, such
as <protein>, <dna>, <rna>, etc. with whatever identifiers would seem
appropriate for that step. Whatever acts as the master controller for this
analysis sequence will be in charge of putting all of these pieces
together and sending them back to the Workspace. The status section could
be sent over an open socket every time the client requests it (so that the
updating occurs as fast as the client and server can handle it, but not
faster). The data can be streamed over a separate socket for each step or
even for each part of each step.

So for a generalized example:
** indicates a "URL" that is accessed
" indicates data that is transmitted over the actual socket

** lociwfs://some.wfs.server/my-analysis?create

would be used to create a query with the name my-analysis + a unique
suffix. This name would be sent back to the client and would be the
'session ID'. The client would send all the relevant information at this
time, the <query> and the <data>. The server would return

" <control sessionID="ID_name">
"   <step id="q1" server="some.locus" />
"   [more steps]
"   <step id="q5" serverpending="true" />
"   [more steps]
" </control>

after figuring out which servers will do which step, or indicating that it
doesn't yet know. The connection can now be closed or left open. This
could be specified in the request or not specified at all (if the client
closes the connection, then it's closed).

** lociwfs://some.wfs.server/sessionID?status

This would be the status socket, which could be closed and then
reconnected at any time. Upon connection, the server would send a control
section and then a status section

" <status>
"   <step id="q1" state="finished">
"     <output type="protein" id="protein" size="121331" />
"     <message>Analysis finished.</message>
"   </step>
"   <step id="q2" state="failed">
"     <message>Analysis failed. Error: ... </message>
"   </step>
"   <step id="q3" state="aborted">
"     <message>Aborted by user</message>
"   </step>
"   <step id="q4" state="processing" completion="2%">
"     <output type="dna" id="dna1" size="121331" />
"        <!-- reports size of data available so far -->
"     <output type="dna" id="dna2" size="0" />
"     <message>Processing... Reading Sequence...</message>
"   </step>
"   <step id="q5" state="waiting">
"     <message>Waiting for output from step q4</message>
"   </step>
"   <step id="q6" state="pending">
"     <message>Searching for available server...</message>
"   </step>
" </status>

The client would then send
" <querystatus />
to get another status section.
Every time the control section is updated (a new server is found), a new
control section would be sent. This could also be requested explicitly by
" <querycontrol />

if the client sends
" <cancel />
at any time, the analysis would be aborted,
" <cancel stepid="q1" />
should also be possible.

The socket would be closed by the server when all of the steps are either
aborted, failed, or finished. It could be closed by the client at any
time.
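
To make the status section and the closing rule concrete, here is a small
sketch of how a client might read a <status> section with a stock XML
parser and decide whether the server is about to close the socket. The
helper names parse_status and all_done are invented for this example, and
the sample document is a shortened copy of the one above.

```python
import xml.etree.ElementTree as ET

STATUS_XML = """
<status>
  <step id="q1" state="finished">
    <output type="protein" id="protein" size="121331" />
    <message>Analysis finished.</message>
  </step>
  <step id="q4" state="processing" completion="2%">
    <output type="dna" id="dna1" size="121331" />
    <output type="dna" id="dna2" size="0" />
    <message>Processing... Reading Sequence...</message>
  </step>
</status>
"""


def parse_status(text):
    """Return a dict mapping each step id to its state, message and outputs."""
    root = ET.fromstring(text)
    steps = {}
    for step in root.findall("step"):
        steps[step.get("id")] = {
            "state": step.get("state"),
            "completion": step.get("completion"),  # None when absent
            "message": step.findtext("message", default=""),
            "outputs": [
                (out.get("id"), out.get("type"), int(out.get("size")))
                for out in step.findall("output")
            ],
        }
    return steps


def all_done(steps):
    """True once every step is aborted, failed, or finished."""
    return all(
        info["state"] in ("aborted", "failed", "finished")
        for info in steps.values()
    )
```

With the sample above, all_done() is false because q4 is still processing,
so the client would keep sending <querystatus />.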

** lociwfs://some.wfs.server/sessionID?cancel[.stepid]

would do the same as the <cancel ... /> 'command'. The server would then
send a complete <control> and <status> section.

** lociwfs://some.wfs.server/sessionID?data[.stepid[.blockid#offset]]

would send all available data (or from a specific set or block of a set).
Offsets only make sense for the specific blocks of the step, as they
change as the output grows. The data would be streamed to the client as
more is made available if the request was for a specific block. Actually,
now that I think about it, the data for a step or all the data shouldn't
be made available like this until all of the parts/steps are finished
processing. Partial data should only be available for a specific part.
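
As a sketch of how a server or client might take such a "URL" apart, the
standard urllib.parse module is enough. The function name parse_lociwfs
and the keys of the returned dict are assumptions for this example, and it
assumes that step and block ids contain no dots.

```python
from urllib.parse import urlsplit


def parse_lociwfs(url):
    """Split lociwfs://server/sessionID?command[.stepid[.blockid#offset]]."""
    parts = urlsplit(url)
    if parts.scheme != "lociwfs":
        raise ValueError("not a lociwfs URL: %r" % url)
    fields = parts.query.split(".")
    return {
        "server": parts.netloc,
        "session": parts.path.lstrip("/"),
        "command": fields[0],
        "step": fields[1] if len(fields) > 1 else None,
        "block": fields[2] if len(fields) > 2 else None,
        # The offset travels in the fragment part, after the "#".
        "offset": int(parts.fragment) if parts.fragment else None,
    }
```

For example, parse_lociwfs("lociwfs://some.wfs.server/my-analysis-17?data.q4.dna1#4096")
gives the command "data", the step "q4", the block "dna1" and the offset 4096.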

** lociwfs://some.wfs.server/sessionID?reject

I think we should include this to prevent the server from being loaded up
with unnecessary sessions. However, that brings up an important point:

Should we implement a login/password system and the ability to control
readability permissions? I guess permissions should be rw for the creator
of the session and ro for others. If a wfs wants to keep certain data
unreadable to certain people, I don't think we need to implement that,
they should either allow general access or access only to those with
accounts. This information should also be available, possibly in a
separate <meta> section that would be sent in the final type of request:

** lociwfs://some.wfs.server/sessionID?info
** lociwfs://some.wfs.server/sessionID?report
** lociwfs://some.wfs.server/sessionID?fullreport

would send an entire report of the session, complete with all control
information, the final status information (to keep error messages
available), and the query. If ...?report is specified, then the server
would send output data as well. If ...?fullreport is specified, input data
would also be sent. This would only be accessible after the analysis is
complete and the session is closed.

The structure of the data section would be as follows:
<data>
  <input>
    [data block]
  </input>
  <output>
    <step id="q1">
      [data block]
    </step>
    [more steps]
  </output>
</data>

A data block would be structured as follows (I'm open to better ideas
here):
<protein id="ID_name" [other parameters specific to proteins]>
  [data]
</protein>

likewise, you can have <dna> and <rna> data blocks.

Of course we'll have to devise a protocol-level error reporting system to
report, for example, that a sessionID is non-existent or that data is not
available, or that the offset is larger than the current data.

I included a lot of detail here, but I'm open to discussion, I just put
the details in so we had a concrete example on the table to work
with/argue about. Nothing here is set in stone, especially since I haven't
a clue what I'm talking about :).

This is only for the communication between the wfs server and the
workspace. The communication between the wfs server and the loci can be
done in a different way. That can and maybe should involve Paos
specifically. We can worry about that later.

Also, I think that the definition for transfer in the glossary should also
include objects. Whatever, it's 2AM and I feel like nitpicking. There, I
feel much better. :)P

Wheew, my brain is _tired_. Now to get some sleep...

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GED/S/CS/MD/M/P/O/U/! d->-- s:-(--) a--->? C++(+++)$ UL++++$ P+++$>++++
L+++$>++++ !E--(----)? W++$>+++ N+(--) o>++++$ K? !w---()>? !O? M+ !V--?
!PS+? PE+() Y+(++) PGP>+ t !5-->? !X-- R>+ !tv-(+) b+>++ DI+(+++)>++++
D++@>$ G e(*)>++++>+++++>$ h-()>++ r? y?
------END GEEK CODE BLOCK------
See also: http://www.hewgill.com/ogr/  http://www.douglasadams.com
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From bizzaro at bc.edu  Thu Mar  4 04:20:03 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:21 2006
Subject: [Pipet Devel] Paos article
Message-ID: <36DE5043.6B4D7C24@bc.edu>

Locians,

Attached is the Linux Magazin article on Paos, translated to English.  I did
this with the help of Babelfish, but it still required/s some cleanup.

I made this effort because there is much confusion about just what Paos will do
for Loci.  I hope it helps.

Enjoy!


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--
-------------- next part --------------
Persistent Objects and Workflow-Management with Paos
by Carlos Maltzahn

----------------------------------------------------------------------------

Paos is a system for the network-wide, consistent management of Python objects. In this installment of Python Tools, Carlos gives us the background on this interesting tool. He also introduces Chautauqua, a workflow management system and a larger application built alongside Paos, which demonstrates what Python can do in real applications.

----------------------------------------------------------------------------

The Python module shelve supports storing Python objects in a file. Once a shelve file has been opened, any object can be stored in it under a name:

>>> import shelve
>>> db = shelve.open('database')
>>> db['first object'] = [1, 2, 3]
>>> db['second object'] = ('hallo', [])
>>> db['first object']
[1, 2, 3]
>>> db.close()

If the interpreter session is now terminated, the stored objects are preserved. The next time Python is started, these objects can be loaded again with shelve.open('database'), i.e. the objects are persistent. The implementation behind shelve can be configured when the Python interpreter is compiled. Several database C libraries are available, e.g. dbm, gdbm and bsddb. These libraries implement data structures that enable fast access to the stored data. In two important respects, however, shelve offers no support: it does not implement concurrency control, i.e. if several processes access persistent objects in the same shelve file, the file can become inconsistent. In addition, shelve provides no query language.
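
The persistence described here can be tried out in a single script by closing and reopening the shelve file. The following is only a sketch using the standard library. The temporary directory and the variable names are choices made for this example.

```python
import os
import shelve
import tempfile

# Keep the example self-contained by working in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "database")

# "First session": store an object and close the file.
db = shelve.open(path)
db["first object"] = [1, 2, 3]
db.close()

# "Second session": the object is still there after reopening.
db = shelve.open(path)
restored = db["first object"]
# With the default settings a change made in place is not written back,
# so the modified list is assigned to its key again.
restored.append(4)
db["first object"] = restored
db.close()

# "Third session": the update was persisted as well.
db = shelve.open(path)
final = db["first object"]
db.close()
```

Nothing in this script protects the file against a second process writing to it at the same time, which is the gap that Paos addresses.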

Paos (Python Active Object Server) builds on shelve and implements a client/server architecture with concurrency control and a simple query language.

    http://www.cs.colorado.edu/~carlosm/paos1_arch.gif

In addition, Paos provides a notification service through which Python applications can have the server inform them about certain conditions. These conditions are defined by applications in the form of queries and registered with the notification service in the server. Every time an application stores something on the server, the server applies the registered queries again to the stored objects. If the result of a query is not empty, the server sends the result to the application that registered the query.

An example and somewhat more (or too much?) detail

The following example illustrates how an application opens a connection to the server, issues a query, and accesses attributes of a loaded, persistent object. We assume that a Paos server is running on the machine cheesy.cs.colorado.edu and is waiting for queries on port 5000. The example creates some objects of the class Person, stores them, and executes a query.

import Client
import ExampleSchema

# open a connection to the Paos server
conn = Client.Connection('cheesy.cs.colorado.edu', 5000, 'example')

# create the objects
john = ExampleSchema.Person()
john.name = 'John'
sue = ExampleSchema.Person()
sue.name = 'Sue'
john.loves = sue
bill = ExampleSchema.Person()
bill.name = 'Bill'
sue.loves = bill
bill.loves = sue

# register the objects with the server
conn.register_objs([john, sue, bill])

# store the objects
conn.commit([john, sue, bill])

# get all instances of 'Person' that love sue
answer = conn.get('r', 'Person', [('loves', '==', sue)])

# for each object in the result, print its name and the name of the one it loves
for obj in answer:
  if obj.hasattr('loves'):
    print obj.name, obj.loves.name

First we import the module Client so that we can open a connection to the Paos server. Then we import the module ExampleSchema, which defines the class Person (see further below). Finally we open a connection by instantiating the class Connection. We specify the host name and the port on which the Paos server is running. The third argument can be any name for the application. This name then appears in the corresponding server log entries.

We then create three Person instances and assign them attribute values. Before we can store these objects, they must first be registered with the server. Registration assigns a unique database number to each new object. This helps us in the query that follows the store: "give me all objects of the class Person that love the object sue." If sue had not been registered, the server could not compare this object with the stored objects.

The first argument 'r' in the query means that the application will only read the objects in the result. If we want to modify objects, we must either pass 'rw' instead of 'r' in the query, or afterwards acquire write permission for the objects to be modified with the method conn.lock. We can acquire write permission only if no one else holds it. While we hold write permission, no one else can modify the corresponding objects. With every conn.commit, and when the program terminates, we lose all acquired write permissions.

The query returns a list of two new objects, which are equivalent to john and bill. In the loop that follows we print the name of each one's beloved (both times 'Sue'). There is more to this harmless-looking loop than meets the eye: the Client module guarantees that sue, john.loves and bill.loves point to the same object. This is made possible by the registration of sue and by a resolution process that is built into the attribute access of john.loves and bill.loves. This resolution process can be implemented through the built-in Python method __getattr__, which allows attribute access to be redefined. In addition, this extended attribute access takes care of dynamically loading objects that do not yet exist in the client application. (The implementation of the attribute access is defined in Schema.py in the class DBobject. The method register_objs of the class Connection in Client.py installs this attribute access for each object in the argument list.)
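
The general idea of resolving references through __getattr__ can be shown with a small sketch. This is not the code from Schema.py. The classes Ref and DBObject, the method set_ref, and the class-level cache are hypothetical stand-ins, and the sketch assumes that references are stored as database numbers and looked up in a cache on access.

```python
class Ref:
    """Placeholder that remembers only the database number of its target."""

    def __init__(self, db_id):
        self.db_id = db_id


class DBObject:
    # db_id -> most recently loaded object (stand-in for a connection cache)
    cache = {}

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, so
        # ordinary attributes such as name and db_id never come through here.
        refs = self.__dict__.get("_refs", {})
        if name in refs:
            return DBObject.cache[refs[name].db_id]
        raise AttributeError(name)

    def set_ref(self, name, target):
        # Store the reference by database number instead of by object.
        self.__dict__.setdefault("_refs", {})[name] = Ref(target.db_id)


def make_person(db_id, name):
    person = DBObject()
    person.db_id = db_id
    person.name = name
    DBObject.cache[db_id] = person
    return person


sue = make_person(1, "Sue")
john = make_person(2, "John")
bill = make_person(3, "Bill")
john.set_ref("loves", sue)
bill.set_ref("loves", sue)

# Both references resolve to the very same object in the cache.
same_object = john.loves is bill.loves

# If a newer version of object 1 is loaded, both references see it at once.
make_person(1, "Susan")
renamed = (john.loves.name, bill.loves.name)
```

Because the lookup happens on every access, john.loves and bill.loves always resolve to whatever the cache currently holds for that database number.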

It is important to understand that this resolution process can ensure referential consistency only among registered objects. In the example above, the variables john and bill point to objects that are not contained in the result. It is the programmer's responsibility to detect when variables point to outdated objects. With a simple trick, variables can be "refreshed": john = conn.cache[john.db_id]. To use it, one needs to know that the Connection object maintains a cache of loaded objects and that this cache is indexed by the database numbers of the registered objects. Each registered object has the attribute db_id holding its unique database number. The cache always contains the most recently loaded version of every loaded object.

Paos's most interesting feature, however, is the notification service. The following example shows how this service is used:

import Client
import ExampleSchema
import Utilities
import os
import pickle

# define a pipe for notifications
(read_pipe_fd, write_pipe_fd) = os.pipe()

# open a connection to the server
conn = Client.Connection('cheesy.cs.colorado.edu', 5000,
                         'example', (read_pipe_fd, write_pipe_fd))

# register a query with the notification service
request_id = conn.register('Person', [('name', '==', 'Sue')])

while 1:

  # wait for a notification and read it
  data = Utilities.READ(read_pipe_fd, 10000)

  # unpack the notification
  (req_id, obj_list, other_client) = pickle.loads(data)

  # unpack the identification of the other client
  (other_host, other_pid, other_uid, other_name) = other_client

  # do something with ...

Compared with the first example, three additional modules must be loaded: Utilities is a module of helper functions that are used in all Paos modules. os and pickle are built-in Python modules that provide operating system functions and functions for converting objects into a string (serialization), respectively.

First we define a pipe, which will later serve as the receiver for notifications. We then open a connection to the Paos server. The call to Connection takes the pipe as its fourth argument, so that the pipe can be associated with the connection object. Then we register a query which ensures that the server sends us all Person objects with the name 'Sue' as soon as such objects are newly entered into the database. The call conn.register(...) returns a registration number.

We receive the notification through the pipe. For this we use a helper function which guarantees that the full length of the notification is read from the pipe. The notification is sent over the network as a string and must be converted back into a Python object on the receiving side. This is done with the call pickle.loads(data). A notification consists of a triple containing

    * the registration number of the query,
    * the result of the query in the form of a non-empty object list, and
    * the identification of the application that stored the objects and thereby triggered the notification.

This identification in turn consists of

    * the host name,
    * the process ID of the client process,
    * the user ID of the user,
    * and the third argument of the Client.Connection call in the application.
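
The receiving side can be imitated with nothing but the standard library. The sketch below sends a pickled triple through a pipe and reads it back in full. The length prefix and the names write_message, read_exactly and read_message are assumptions made for this example and say nothing about how Utilities.READ works. The values in the sample notification are invented.

```python
import os
import pickle
import struct


def write_message(fd, obj):
    """Pickle obj and write it with a 4-byte length prefix."""
    payload = pickle.dumps(obj)
    os.write(fd, struct.pack("!I", len(payload)) + payload)


def read_exactly(fd, count):
    """Keep reading until exactly count bytes have arrived."""
    chunks = []
    while count > 0:
        chunk = os.read(fd, count)
        if not chunk:
            raise EOFError("pipe closed before the full message arrived")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def read_message(fd):
    """Read one length-prefixed message and unpickle it."""
    (length,) = struct.unpack("!I", read_exactly(fd, 4))
    return pickle.loads(read_exactly(fd, length))


read_fd, write_fd = os.pipe()

# A made-up notification: (registration number, object list, client id).
notification = (7, [{"name": "Sue"}],
                ("cheesy.cs.colorado.edu", 4242, 1000, "example"))
write_message(write_fd, notification)

req_id, obj_list, other_client = read_message(read_fd)
other_host, other_pid, other_uid, other_name = other_client

os.close(read_fd)
os.close(write_fd)
```

Note that pickle.loads should only be applied to data from a trusted sender, since unpickling can execute arbitrary code.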

Chautauqua: A larger application with Paos

Paos is a " spin off " product of a Workflow research project. One of the results of this project is the experimental Workflow system Chautauqua. Paos makes notifications available with the service the communication infrastructure for the different Chautauqua system components. Chautauqua users interact with web browser and with the system over a graph wordprocessor. The web browser displays dynamically generated "to-cDo" lists for each coworker, and is used for filling out forms. The graph wordprocessor displays the structure of an office process and the status of differently jobs. E.g. if an office coworker stores contents of a form in Paos, Paos notifies the Chautauqua Workflow manager, who delegates the form to the next office coworker. This process can track each user on their graph wordprocessor, since each wordprocessor receives and converts the appropriate notifications immediately into graphic representations.

The special feature of Chautauqua is that it enables users to change the structure of office processes while jobs are running. The following snapshot shows the structure of an office process:

    http://www.cs.colorado.edu/~carlosm/paos2_ICN.gif

Coworkers are represented by asterisks, office roles by squares, and activities by sets and triangles. Small points at the upper right of activities represent "tokens", which indicate the status of the work and move through the graph as the work progresses. With the graph editor it is possible to change any part of the graph. If activities are deleted, tokens can lose their location. Chautauqua offers mechanisms to collect these lost tokens and assign them new locations in the changed graph.

Information 

Paos and Chautauqua are written entirely in Python and are freely available at:

    ftp://ftp.cs.colorado.edu/users/carlosm/paos-1.4.tar.gz 

    ftp://ftp.cs.colorado.edu/users/carlosm/chautauqua-1.4.tar.gz
    (Chautauqua contains Paos)

More detailed documentation for Paos and Chautauqua is currently in preparation and will be announced in the newsgroup comp.lang.python.

----------------------------------------------------------------------------

Carlos Maltzahn is currently a computer science student in the Ph.D. program of the University of Colorado in Boulder. His research interests at the moment concentrate on Internet caches and distribution indicating. In his spare time he either roams somewhere in the fantastically beautiful Rocky Mountains or spends his time building mobile robots from Fischertechnik. He can be reached at carlosm@cs.colorado.edu

----------------------------------------------------------------------------

Copyright © Linux Magazin
From bizzaro at bc.edu  Thu Mar  4 04:32:33 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] Paos README
Message-ID: <36DE5331.A33471D6@bc.edu>

For more information on Paos, attached is the README file, in English.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--
-------------- next part --------------
#
# Copyright 1995 Carlos Maltzahn
#
# Permission to use, copy, modify, distribute, and sell this software
# and its documentation for any purpose is hereby granted without fee,
# provided that the above copyright notice appear in all copies and that
# both that copyright notice and this permission notice appear in
# supporting documentation, and that the name of Carlos Maltzahn or
# the University of Colorado not be used in advertising or publicity
# pertaining to distribution of the software without specific, written
# prior permission.  Carlos Maltzahn makes no representations about the
# suitability of this software for any purpose.  It is provided "as is"
# without express or implied warranty.
#
# CARLOS MALTZAHN AND THE UNIVERSITY OF COLORADO DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE UNIVERSITY OF COLORADO
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
# IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
# OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
#
# Author:
#       Carlos Maltzahn
#       Dept. of Computer Science
#       Campus Box 430
#       Univ. of Colorado, Boulder
#       Boulder, CO 80309
#
#       carlosm@cs.colorado.edu
#

                               Paos
                               ====

DISTRIBUTION
------------
Paos (Python active object server) is an active multi-user object server with a
simple query language. All software is written in Python. The distribution
consists of the following files:

  Store.py      - implements storing and locking of objects, the query language
                  and registration of notifications.

  Server.py     - implements the network interface of Store.py. Server.py
                  imports Store.py and is started by "python Server.py <port>"

  Client.py     - implements the network interface of a client. It is used
                  by importing it into a Python program.

  Schema.py     - defines the class DBobject. All objects that are to be
                  stored in the object server need to be of this class or
                  a class that inherits this class directly or indirectly.

  Utilities.py  - contains a number of functions that are used in
                  above modules.

  example/
  --------
  Producer.py   - implements a producer that accepts input lines and stores them
                  to the object server. Started by
                  "python Producer.py <host> <port>"

  Consumer.py   - implements a consumer that prints out lines produced by
                  a producer and is started by
                  "python Consumer.py <host> <port>"

  Talk.py       - implements two way communication (accepts input lines and
                  prints out lines received from the server as notifications).
                  Uses select call and the new pipe feature.

  ExSchema.py   - contains the schema necessary for Talk.py, Producer.py and
                  Consumer.py

INSTALLATION
------------
Look at http://www.python.org/ for information on how to get and
install Python.

During installation make sure that you include at least one database
module of either dbhash, gdbm, dbm, or macdb. I would recommend dbhash
or dbm with the ndbm library because these do not limit length of records
(which gdbm and the default library of dbm do; I don't know anything about
macdb).

Second you need to include the home directory of Paos and all your
application directories into the environment variable PYTHONPATH. In
this case the application directory would be <Paos home>/example.
Sometimes this environment variable is not accessible to the Python
application (e.g. in CGI programs for a WWW server). Then your
application programs need to import the module "sys" and set the
variable "sys.path" appropriately.
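
A minimal sketch of setting "sys.path" from within an application follows. The install location PAOS_HOME and the helper name add_paos_to_path are hypothetical; substitute your own Paos home directory.

```python
import os
import sys

# Hypothetical install location; replace with your own Paos home.
PAOS_HOME = "/usr/local/lib/paos"

def add_paos_to_path(paos_home, path=None):
    """Prepend the Paos home and its example directory to a search path."""
    if path is None:
        path = sys.path
    # Insert the example directory first so that the Paos home ends up
    # at the very front of the search path.
    for directory in (os.path.join(paos_home, "example"), paos_home):
        if directory not in path:
            path.insert(0, directory)
    return path

add_paos_to_path(PAOS_HOME)
```

After this call, "import Client" and "import ExSchema" would be looked up in those directories first, provided the files actually exist there.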

STARTING THE SERVER
-------------------
You start the server by "python Server.py <port number>
[<database file name>]". The database file name is optional. The
default database file name is "database". The server then looks for a
file <database file name>.db. If it does not exist, the server
creates a new file of this name.

CONNECTING TO THE SERVER
------------------------
The client can be either a standalone or an embedded Python program.
It needs to import Client.py. This module defines a class called
"Connection" which is instantiated as follows:

import Client

conn = Client.Connection(<host name>,
                         <port number>,
                         <client name> [,
                         <callback function>])

If host and port are correctly specified this creates a TCP connection to the
server.

<client name> can be an arbitrary string which is only useful
for debugging purposes and possible future extensions.

<callback function> is optional. If specified, this function is called
if the client receives a notification from the object server (see below on
how to register notification requests).

NEW in v0.2: Instead of the callback function you can now pass a pipe
(a tuple of a read and a write file descriptor, as returned by
os.pipe()). You can use select.select(...)
on the read descriptor of the pipe. Use Utilities.READ(...) and
pickle.loads(...) to receive the notification (see below for the format
of a notification). You also need to apply conn.register_objs(...) on
the notification's object list. See the example application.
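
The following is a self-contained sketch of the pipe approach on a Unix system. It does not use Paos itself: the write into the pipe stands in for what the connection would do, and the 4-byte length prefix is an assumption made for this example, since the actual wire format read by Utilities.READ(...) is not described here. The helper names read_exact and read_notification are hypothetical.

```python
import os
import pickle
import select
import struct

def read_exact(fd, nbytes):
    """Read exactly nbytes from a file descriptor (hypothetical helper)."""
    chunks = []
    while nbytes > 0:
        chunk = os.read(fd, nbytes)
        if not chunk:
            raise EOFError("pipe closed before the full message arrived")
        chunks.append(chunk)
        nbytes -= len(chunk)
    return b"".join(chunks)

def read_notification(fd):
    """Read one length-prefixed, pickled notification (assumed framing)."""
    (length,) = struct.unpack("!I", read_exact(fd, 4))
    return pickle.loads(read_exact(fd, length))

read_fd, write_fd = os.pipe()

# Stand-in for the connection writing a notification into the pipe.
payload = pickle.dumps((1, ["line of text"], "producer"))
os.write(write_fd, struct.pack("!I", len(payload)) + payload)

# Wait up to one second for the read end to become readable.
readable, _, _ = select.select([read_fd], [], [], 1.0)
notification = read_notification(read_fd) if readable else None
print(notification)

os.close(read_fd)
os.close(write_fd)
```

In a real client the read descriptor would typically sit in the same select call as other inputs (for example standard input), and the object list of each notification would still be passed to conn.register_objs(...).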

All interactions with the server are defined as methods of the
Connection instance. Note also that you can have multiple connections
to the same or to different servers. However, currently each object server has
a separate object ID name space. Also, each client registers with a
client-specific name, not a connection-specific name. Therefore, the
client programmer has to take care of possible name collisions. A future
version will introduce client naming that is unique over all connections
and object ID naming that is unique over all Paos object servers.

Use

conn.close()

to close the connection.

QUERYING THE OBJECT SERVER
--------------------------
In order to query the object server you use

answer = conn.get(<access mode>, <scope>, <property list>)

answer is a list of objects.

<access mode> can be either 'r' for read-only access or 'rw'
  for write-locking all objects contained in the answer. If some of the
  objects contained in answer are already write-locked by another client
  then the answer is None. Note the difference to an empty list that
  merely indicates that there is no object in the object server that
  matches the query. Note that each failure to acquire write-locks results
  in the loss of all write-locks acquired so far!

<scope> can be either a list of persistent object references or a class name.
  A persistent object reference is a tuple as follows:
  ('__db', <db_id>).

<db_id> is an integer issued to each object that is stored in the object server.

<property list> is a list of properties. A property is a tuple as follows:
  (<attribute name>, <relation>, <value>).

<attribute name> is a string specifying the name of an attribute of objects
  specified by <scope>.

<relation> can have '==', '!=', 'in', 'not in', 'has', 'has not',
  'all in', 'not all in', 'some in', 'none in'.

  The meaning of '==', ..., 'not in' is the same as in Python.

  A list 'has' element iff element 'in' a list.

  A list 'has not' element iff not list 'has' element.

  List A 'all in' list B iff the elements of A are a subset of elements of B.

  List A 'not all in' list B iff not list A 'all in' list B

  List A 'some in' list B iff there exists a non-empty subset C of elements of A
    which is also a subset of elements of B.

  List A 'none in' list B iff not list A 'some in' list B

  Note that 'some in' is not the same as 'not all in'. In the first case
  the subset C has to be non-empty; in the second case C can be empty.
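
The relations above can be sketched in plain Python as follows. This is an illustration of the definitions, not the code in Store.py. Two assumptions are made: the attribute value is the left operand of each relation, and all properties in a property list must hold at once. The names RELATIONS and matches, and the Person class, are hypothetical.

```python
def has(container, element):
    return element in container

def all_in(a, b):
    # True for an empty list a: the empty set is a subset of any set.
    return all(x in b for x in a)

def some_in(a, b):
    # False for an empty list a: the common subset must be non-empty.
    return any(x in b for x in a)

RELATIONS = {
    "==": lambda value, target: value == target,
    "!=": lambda value, target: value != target,
    "in": lambda value, target: value in target,
    "not in": lambda value, target: value not in target,
    "has": has,
    "has not": lambda value, target: not has(value, target),
    "all in": all_in,
    "not all in": lambda value, target: not all_in(value, target),
    "some in": some_in,
    "none in": lambda value, target: not some_in(value, target),
}

def matches(obj, property_list):
    """True if obj satisfies every (attribute, relation, value) property."""
    return all(
        hasattr(obj, name) and RELATIONS[relation](getattr(obj, name), value)
        for name, relation, value in property_list
    )

class Person:
    def __init__(self, name, hobbies):
        self.name = name
        self.hobbies = hobbies

sue = Person("Sue", ["chess", "hiking"])
print(matches(sue, [("name", "==", "Sue"), ("hobbies", "has", "chess")]))
print(all_in([], ["x"]), some_in([], ["x"]))
```

The last line shows the difference noted above: for an empty list A, 'all in' holds while 'some in' does not.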

CREATING NEW OBJECTS
--------------------

Each new object that is created in a client and that is eventually
written to the object server needs to be registered with the server
PRIOR TO COMMIT TIME. Objects that are not registered at commit time can
cause bad inconsistencies! In general new objects should be registered
before your first access to one of their attributes with references
to other persistent objects. Each registered object receives a unique
persistent object ID under the attribute name "db_id". Use

db_id_list = conn.register_objs(<obj_list>)

db_id_list is a list of db_id integers in the order corresponding to <obj_list>.

<obj_list> is a list of objects. It can contain registered and unregistered
  objects. Registering registered objects is useful in connection with
  notifications (see below). All unregistered objects in <obj_list>
  acquire write-locks.

STORING OBJECTS
---------------
Objects are stored by using

ret = conn.commit(<obj_list>)

ret is either 'ok' or None if an error at the server occurred
  (the diagnostics printed out by the server will give more information
  about the error - I'm aware that this is not a good solution; future
  versions will hopefully offer a better error handling).

<obj_list> is a list of objects. <obj_list> contains all the objects
  that are supposed to be written to the database. However, only objects
  that were previously locked will be written to the object server; read-only
  objects are simply ignored.

LOCKING OBJECTS
---------------
It is possible to write-lock objects once they are loaded. Use

answer = conn.lock(<obj_list>)

answer is a list of objects locked. The order of the list corresponds to
  <obj_list>. However, answer contains the versions of objects
  as they were found in the object server at locking time. If the lock
  failed answer is None and all previously acquired locks are released.

<obj_list> is a list of persistent objects to be locked. Objects that are not
  explicitly mentioned in the list (i.e., are only directly or indirectly
  referenced by objects explicitly mentioned in the list) are ignored.

Note: 'lock' is faster than 'get' in the case of failed locking: 'get'
retrieves objects before checking their locks while 'lock' checks locks first.

Note also that there are three occasions where all previously acquired
locks are lost: (1) calling "commit", (2) calling "lock" which fails, and
(3) closing the connection or terminating the client.
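
The all-or-nothing behaviour of locks can be modelled with a small in-memory table. This is a toy sketch of the rules stated above, not the implementation in Store.py; the class LockTable and the client names are hypothetical.

```python
class LockTable:
    """Toy in-memory model of all-or-nothing write-locking."""

    def __init__(self):
        self.owner = {}  # db_id -> name of the client holding the lock

    def lock(self, client, db_ids):
        """Lock every id for client, or release all of client's locks."""
        if any(self.owner.get(i, client) != client for i in db_ids):
            self.release_all(client)
            return None
        for i in db_ids:
            self.owner[i] = client
        return list(db_ids)

    def release_all(self, client):
        """Drop every lock held by client (commit, failed lock, or close)."""
        for i in [k for k, v in self.owner.items() if v == client]:
            del self.owner[i]

table = LockTable()
first = table.lock("alice", [1, 2])     # succeeds
taken = table.lock("bob", [3])          # succeeds
failed = table.lock("alice", [3, 4])    # fails: 3 is locked by bob
held_by_alice = [k for k, v in table.owner.items() if v == "alice"]
print(first, taken, failed, held_by_alice)
```

After the failed call, alice no longer holds the locks on 1 and 2 that she acquired earlier, which is the case a client has to be prepared for.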

ATTRIBUTE ACCESS
----------------
Assume you load objects a and b, and a.attr = b, i.e. a.attr contains a
pointer to b. Now you issue a query that loads b and c. Without further
precautions, a.attr and b would now refer to different objects, because
a.attr points to an older version of b. With many objects referring to
each other it can become quite difficult to keep track of all the
different versions of objects.

In Paos each connection instance maintains an object cache that is
updated by all connection methods except get_raw_notification() (see
below). Attribute access of registered objects always accesses objects
in the cache. Thus, in the above example a.attr always refers to the
newest version of b. If a user wants to keep the older version of b
she needs to assign it to a variable v before the next query. However,
b's references to other persistent objects always point to the newest
versions.

Another advantage of this policy of attribute access is that the client
will load objects from the object server as needed. For example, if
you load object a and you assign v = a.attr then the client will
automatically load b unless it is already in the cache.

This convenience comes with a price: When you define persistent object
classes you need to enumerate those attribute names that can have
attribute values which contain references to other persistent objects.
This information is kept in a special attribute called '__refs'. For
example:

import Schema

class A(Schema.DBobject):
  def __init__(self):
    Schema.DBobject.__init__(self)
    self.__refs = ['attr']

This assumes that instances of class A have an attribute called 'attr' that
can refer to other persistent objects.
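
The idea of resolving references through a per-connection cache can be sketched as follows. This is a toy model, not Paos code: it assumes that a reference is held as a ('__db', <db_id>) tuple and looked up at access time, and it uses plain dictionaries instead of DBobject instances. The names ObjectCache and fake_loader are hypothetical.

```python
class ObjectCache:
    """Toy cache mapping db_id to the newest known version of an object."""

    def __init__(self, loader):
        self.objects = {}
        self.loader = loader  # called with a db_id on a cache miss

    def update(self, db_id, obj):
        self.objects[db_id] = obj

    def resolve(self, value):
        """Follow a ('__db', db_id) reference; return other values as-is."""
        if isinstance(value, tuple) and len(value) == 2 and value[0] == "__db":
            db_id = value[1]
            if db_id not in self.objects:
                self.objects[db_id] = self.loader(db_id)
            return self.objects[db_id]
        return value

loads = []

def fake_loader(db_id):
    # Stand-in for fetching an object from the object server.
    loads.append(db_id)
    return {"db_id": db_id, "version": 1}

cache = ObjectCache(fake_loader)
a = {"db_id": 1, "attr": ("__db", 2)}        # a.attr refers to object 2

b_first = cache.resolve(a["attr"])           # cache miss: loads object 2
cache.update(2, {"db_id": 2, "version": 2})  # a later query refreshes b
b_now = cache.resolve(a["attr"])             # same reference, newest version
print(b_first["version"], b_now["version"], loads)
```

Because b_first was assigned to a variable before the update, it still holds the older version, which mirrors the advice above about keeping an old version in a variable v.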

NOTIFICATIONS
-------------
With

request_id = conn.register(<scope>, <property list>)

you can register a notification request. <scope> and <property list> have
the same meaning as in "get". A notification request is a query that is
stored at the object server and evaluated in each subsequent "commit"
against the set of objects that is written to the object server. If the
result of such a query is not empty the client which registered the
notification request is notified. The format of the notification is

(<request_id>, <obj_list>, <committing client>)

<request_id> corresponds with the returned value of the corresponding
  "register" call, i.e. identifies the corresponding query.

<obj_list> is the list of objects that matches the query.

<committing client> identifies the client that triggered the notification.

Note that no client can register a notification request for
another client; each notification request corresponds to exactly one
client. Also note that notification requests do not survive a client's
lifetime: If a client terminates (or crashes), all notification requests
owned by that client are deleted.

There are multiple ways for a client to process notifications. If the
connection to the server was created with a pointer to a callback
function in the fourth argument then the client is interrupted at each
notification (with the signal SIGUSR1) and the callback function is
called. Otherwise the client needs to poll for notifications. In both
cases notifications are retrieved by

notification = conn.get_notification()

Note that a notification is generated for each registered notification
request. For example, if a client registered two requests and a
subsequent commit contains objects matching both requests then the
object server sends two notifications to the client. Also note that
multiple notifications triggered by one commit are sent in the order
they were registered.
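
The rules above (one notification per matching request, sent in registration order, none for empty results) can be modelled in a few lines. This is a toy sketch and not the server code: the class NotificationServer is hypothetical, plain predicates stand in for <scope> and <property list>, and an "outbox" list stands in for the network.

```python
class NotificationServer:
    """Toy model: stored queries are evaluated against each commit."""

    def __init__(self):
        self.requests = []   # (request_id, client, predicate), in order
        self.next_id = 1
        self.outbox = []     # (client, notification) pairs "sent" so far

    def register(self, client, predicate):
        request_id = self.next_id
        self.next_id += 1
        self.requests.append((request_id, client, predicate))
        return request_id

    def unregister(self, request_id):
        self.requests = [r for r in self.requests if r[0] != request_id]

    def commit(self, committing_client, objects):
        for request_id, client, predicate in self.requests:
            matched = [obj for obj in objects if predicate(obj)]
            if matched:  # an empty result produces no notification
                self.outbox.append(
                    (client, (request_id, matched, committing_client)))
        return "ok"

server = NotificationServer()
sue_id = server.register("consumer", lambda o: o.get("name") == "Sue")
any_id = server.register("consumer", lambda o: "name" in o)
server.commit("producer", [{"name": "Sue"}, {"name": "Bob"}])
for client, notification in server.outbox:
    print(client, notification)
```

Here one commit matches both requests, so the consumer receives two notifications, the first for sue_id and the second for any_id.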

Each "get_notification" updates the object cache (see paragraph about
attribute access). One can avoid this by using

notification = conn.get_raw_notification()

Note however, that attribute access in objects within the notification
is not resolved correctly since these objects are disconnected from the
attribute resolution mechanism discussed above. To connect these objects
to the resolution mechanism use "register_objs" (this updates the object
cache).

If there are no notifications "get_notification" returns None.
With

conn.unregister(<request_id>)

you can retract a notification request.
From bizzaro at bc.edu  Thu Mar  4 11:49:39 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] [Fwd: An EMBOSS release to play with]
Message-ID: <36DEB9A3.6A8318F@bc.edu>

From the EMBOSS mailing list:
-------------- next part --------------
An embedded message was scrubbed...
From: Peter Rice <pmr@sanger.ac.uk>
Subject: An EMBOSS release to play with
Date: Thu, 4 Mar 1999 15:08:50 GMT
Size: 3654
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990304/83005b51/attachment.mht
From david.lapointe at umassmed.edu  Thu Mar  4 15:02:54 1999
From: david.lapointe at umassmed.edu (david.lapointe@umassmed.edu)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] New Book Release
Message-ID: <93307F07DE63D211B2F30000F808E9E525D6EF@edunivexch02.umassmed.edu>

Developing Linux Applications using GTK+ and GDK (Feb 1999). It seems to be
less a reference than a collection of applications using GTK+ (a notepad
editor, a molecule viewer (PDB), a graphical Apache log analyzer, etc.).

http://www.mcp.com/publishers/new_riders/catalog/new_riders_nr_bud.cfm

David Lapointe
Manager - Research Computing Services
UMass Medical School
Worcester, MA 01655
508/856-5141


From carlosm at moet.cs.colorado.edu  Thu Mar  4 15:25:37 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] Paos article
In-Reply-To: <36DE5043.6B4D7C24@bc.edu>
Message-ID: <Pine.GSU.4.05.9903041322400.26561-100000@moet.cs.colorado.edu>


Thanks a bunch Jeff! I put a proof-read version of your translation on the
web: www.cs.colorado.edu/~carlosm/paos-english.html

This version also fixes some bugs in the original Linux Magazin version.

Carlos 

On Thu, 4 Mar 1999, J.W. Bizzaro wrote:

    Locians,
    
    Attached is the Linux Magazin article on Paos, translated to English.  I did
    this with the help of Babelfish, but it still required/s some cleanup.
    
    I made this effort because there is much confusion about just what Paos will do
    for Loci.  I hope it helps.
    
    Enjoy!
    
    
    Jeff
    -- 
    J.W. Bizzaro                  Phone: 617-552-3905
    Boston College                mailto:bizzaro@bc.edu
    Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
    --

From bizzaro at bc.edu  Mon Mar 15 11:05:57 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] inter-locus communication
References: <Pine.LNX.4.05.9903030028230.27929-100000@photino.sid.rice.edu>
Message-ID: <36ED2FE5.85724F@bc.edu>

Sorry for the delay in replying to this.

Rahul Jain wrote:
> 
> What exactly is the role of Paos in Loci? As I understand it, Paos is
> simply a way of moving objects back and forth. I think that Justin's
> original suggestion was good, with some details that need to be worked
> out.

Justin's original suggestion, as I understood it, was to embed workflow
information into the XML.  I agreed that it is a novel idea, but I have two
problems with it:

  (1) It would greatly increase the amount of parsing and writing involved
  (2) It would greatly diminish the role of the object server...
            (thus I asked, "why use PAOS?")

We do need PAOS for object serving, but what the XML cannot do on its own, is
handle active links between multiple loci.  What would XML do in these cases?

  (1) Workflow information has to be reported back along the path to several
loci
  (2) Several loci need to update a single XML

The best solution for this is to have a server manage XML usage.  But you see,
this is where we need PAOS.  And if PAOS can handle the workflow information as
an XML, wouldn't it be more efficient to just keep this information as data
structure objects?

> 
> With an XML parser, it should be trivial to convert a LociML file to an
> object in any language (Python, Perl, C++, etc.).

Yes.  I do see the use of XML for archiving and transferring objects, even
workflow objects.

But I think the advantage to having an XML that is biodata-only, is that it
might be used outside of Loci.  Maybe it will be more accepted than BSML or
BioML.  But if it contains workflow structures that are inseparable from the
biological, it may never be used.

Perhaps we can make BICML so that it does not *need* workflow data to be
complete, but that it can *handle* it.

I think if BICML strongly labels biodata with ID#'s, workflow data can be
appended to the XML, kept in another XML format, or just kept in PAOS as objects,
and in each case be easier to track.

So we have 4 options for the workflow data:

  (1) Put it in BICML, mixed with the biodata
  (2) Put it in BICML, separate from the biodata
  (3) Put it in a separate XML
  (4) Leave it as pure objects in PAOS

In all cases, I would like PAOS to handle the workflow data.

Carlos, I'm curious if an XML parser can be integrated with PAOS.  I think it
would make all of this simpler, even though a parser could be separate.

[cut to save space]
> 
> This is only for the communication between the wfs server and the
> workspace. The communication between the wfs server and the loci can be
> done in a different way. That can and maybe should involve Paos
> specifically. We can worry about that later.

Thank you for the prototype.  It brings us closer to a format definition for our
XML.  But I think communication should be handled via PAOS rather than inventing
a new system that requires each client to access Internet sockets.

Can we come up with a system for options 2 and 3 above?

  (2) Put it in BICML, separate from the biodata
  (3) Put it in a separate XML

> 
> Also, I think that the definition for transfer in the glossary should also
> include objects. Whatever, it's 2AM and I feel like nitpicking. There, I
> feel much better. :)P

I guess we just wanted a short term to describe parse/write.  Of course someone
can "transfer an object".


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Mon Mar 15 12:36:32 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] inter-locus communication
References: <Pine.LNX.4.05.9903030028230.27929-100000@photino.sid.rice.edu> <36ED2FE5.85724F@bc.edu>
Message-ID: <36ED4520.67992C2C@bc.edu>

"J.W. Bizzaro" wrote:
> 
> Perhaps we can make BICML so that it does not *need* workflow data to be
> complete, but that it can *handle* it.

[cut]

> So we have 4 options for the workflow data:
> 
>   (1) Put it in BICML, mixed with the biodata
>   (2) Put it in BICML, separate from the biodata
>   (3) Put it in a separate XML
>   (4) Leave it as pure objects in PAOS
> 

I want to stress that in all cases we should make an XML that can include
workflow data, but is complete without it, having only bio data.

So with option 1, where the XML has workflow and bio data mixed, can the
workflow data be left out by someone who wants to use it as a purely biological
ML?

If we can do this, and include an XML parser with PAOS, I'll be happy.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From rahul at photino.sid.rice.edu  Mon Mar 15 17:55:59 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] inter-locus communication
In-Reply-To: <36ED4520.67992C2C@bc.edu>
Message-ID: <Pine.LNX.4.05.9903151627410.11161-100000@photino.sid.rice.edu>

In the plan I gave you guys, I only meant for the XML to be a way to
communicate between a wfs and the Workspace (GUI). It was designed so that
intermittently connected clients (or people who need to log out) can check
up on the status of their analysis from time to time, esp. on a really
long analysis. PAOS would most likely be used as the mode of communication
between the wfs and the loci.

Regarding the comment on making workflow information independent from the
biological stuff, I think the data section covers that separation quite
well. Keep the bio-related stuff in between the <data> and </data> tags
and the rest is Loci-specific workflow information. This format also makes
it easy to store the results of an analysis for archival purposes.

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From bizzaro at bc.edu  Tue Mar 16 17:11:33 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] inter-locus communication
References: <Pine.LNX.4.05.9903151627410.11161-100000@photino.sid.rice.edu>
Message-ID: <36EED715.9DD9C338@bc.edu>

Rahul Jain wrote:
> 
> In the plan I gave you guys, I only meant for the XML to be a way to
> communicate between a wfs and the Workspace (GUI).

I don't see WFS <---> Workspace (Work Flow Diagram and Notebook) communication
being much different from WFS <---> Tool communication.

> It was designed so that
> intermittently connected clients (or people who need to log out) can check
> up on the status of their analysis from time to time, esp. on a really
> long analysis.

I like this idea.  Was it Justin who first suggested it?  It's a good argument
for keeping a "hard copy" of the workflow data on disk via XML.  Imagine that
the system goes down for some reason, or even that the user wants to exit Loci
and log out.  Loci could just pick up later where it left off.

> Regarding the comment on making workflow information independent from the
> biological stuff, I think the data section covers that separation quite
> well. Keep the bio-related stuff in between the <data> and </data> tags
> and the rest is Loci-specific workflow information. This format also makes
> it easy to store the results of an analysis for archival purposes.

This is from your message:

<data>
  <input>
    [data block]
  </input>
  <output>
    <step id="q1">
      [data block]
    </step>
    [more steps]
  </output>
</data>

Let's see..

<data> is either workflow or bio
<input> and <output> are workflow
<step> is workflow
[data block] is bio

If that is correct, bio data is nested directly in workflow sections in 2
cases.  I suppose this is acceptable if the definition of BICML will allow for
bio data to go directly under <data>:

<data>
  <aa1d id="12378728937">
    (amino acid 1-dimensional/sequence)
  </aa1d>
</data>

Something like that :-)
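
The workflow/bio split listed above can be sketched in Python as follows. This is only an illustration: the set of workflow tags is taken from the list in this message, every other tag is assumed to be bio data, and the sequences in EXAMPLE are made-up placeholders.

```python
import xml.etree.ElementTree as ET

# Tags treated as workflow structure; anything else is assumed to be bio data.
WORKFLOW_TAGS = {"data", "input", "output", "step"}

EXAMPLE = """
<data>
  <input>
    <aa1d id="12378728937">MKV</aa1d>
  </input>
  <output>
    <step id="q1">
      <aa1d id="2">MKVL</aa1d>
    </step>
  </output>
</data>
"""


def split_tags(xml_text):
    """Return (workflow_tags, bio_tags) in document order."""
    root = ET.fromstring(xml_text.strip())
    workflow, bio = [], []
    for elem in root.iter():
        (workflow if elem.tag in WORKFLOW_TAGS else bio).append(elem.tag)
    return workflow, bio
```

A walk like this would let a workflow system handle the outer structure without having to understand the bio payload it carries.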


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Tue Mar 16 18:23:41 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] new server
Message-ID: <36EEE7FD.D2821B12@bc.edu>

Locians,

Tomorrow I will try to finish setting up our new server.  It's not much, but
it'll work for now:

Pentium I/100 MHz
16 MB RAM (maybe I should get more)
10 GB HDD (brand new!)
RedHat Linux 5.2

I will register the domain name bicgroup.org.  The new Loci Web site will likely
be at www.bicgroup.org/loci.  But we'll just have an IP address for a while.

The computer is partly owned by UMass Lowell, but we will work something out as
we (Ken Marx and I) do not want to associate the Loci Project with the
University.  (I'm trying to avoid intellectual property problems here.  I'm not
paid by the University or a student there any longer, and I don't want the
school to claim rights just because the server is there.)

As time passes and funds pass my way, I will set up servers in my home.

I want to give everyone an account.  This way, we can upload and download what
each of us has done.  I have considered issues like CVS and patches, but the
nature of Loci, being all smallish scripts with one author per script, allows us
to avoid these things rather nicely.  Isn't Python wonderful?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Wed Mar 17 12:54:45 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] Dave Beck joins!
Message-ID: <36EFEC65.50CACC93@bc.edu>

Here is Dave's latest e-mail:
-------------- next part --------------
An embedded message was scrubbed...
From: Dave Beck <dave@arginine.umdnj.edu>
Subject: Re: Loci / TULIP
Date: Wed, 17 Mar 1999 12:17:26 -0500
Size: 5212
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990317/911f0a46/attachment.mht

From bizzaro at bc.edu  Thu Mar 18 00:28:24 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] new server up
Message-ID: <36F08EF8.880A6043@bc.edu>

Okay.  We've got a dedicated server, guys!

    129.63.144.25

This will be shortly named

    onsager.uml.edu

But use the IP for now.

Everyone gets an account.  I will send the pwords to you directly.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From carlosm at moet.cs.colorado.edu  Thu Mar 18 02:39:02 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] new server up
In-Reply-To: <36F08EF8.880A6043@bc.edu>
Message-ID: <Pine.GSU.4.05.9903172330470.25792-100000@moet.cs.colorado.edu>


Thanks Jeff. 

Be aware that onsager is not behind a firewall and seems to run Red Hat.
I wouldn't use onsager for anything that cannot be restored very easily.

Jeff, are you planning to give us some tulip-related web space on onsager?

Carlos 

On Thu, 18 Mar 1999, J.W. Bizzaro wrote:

    Okay.  We've got a dedicated server guys!
    
        129.63.144.25
    
    This will be shortly named
    
        onsager.uml.edu
    
    But use the IP for now.
    
    Everyone gets an account.  I will send the pwords to you directly.
    
    
    Jeff
    -- 
    J.W. Bizzaro                  Phone: 617-552-3905
    Boston College                mailto:bizzaro@bc.edu
    Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
    --
    

From bizzaro at bc.edu  Thu Mar 18 03:07:25 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] new server up
References: <Pine.GSU.4.05.9903172330470.25792-100000@moet.cs.colorado.edu>
Message-ID: <36F0B43D.F5684219@bc.edu>

Carlos Maltzahn wrote:
> 
> Thanks Jeff.
> 
> Be aware that onsager is not behind a firewall and seems to run Red Hat.
> I wouldn't use onsager for anything that cannot be restored very easily.

I know there is no firewall.  But what's wrong with Red Hat?

> 
> Jeff, are you planning to give us some tulip-related web space on onsager?

Anything you want.  What did you have in mind?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From jabbo at mindless.com  Thu Mar 18 05:02:00 1999
From: jabbo at mindless.com (Tim)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] new server up
References: <Pine.GSU.4.05.9903172330470.25792-100000@moet.cs.colorado.edu> <36F0B43D.F5684219@bc.edu>
Message-ID: <99Mar18.100530est.131770@gateway.macroint.com>

>> I know there is no firewall.  But what's wrong with Red Hat?

Two words: script kiddies


Either use ipchains or a packet filtering router (eg. a POS with a PCI
bus ;-)).

-- 
    "Lisp has all the visual appeal of oatmeal
       with fingernail clippings mixed in." 

                                 --Larry Wall

From dave at arginine.umdnj.edu  Thu Mar 18 08:17:59 1999
From: dave at arginine.umdnj.edu (Dave Beck)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] new server up
In-Reply-To: <36F0B43D.F5684219@bc.edu>; from J.W. Bizzaro on Thu, Mar 18, 1999 at 08:07:25AM +0000
References: <Pine.GSU.4.05.9903172330470.25792-100000@moet.cs.colorado.edu> <36F0B43D.F5684219@bc.edu>
Message-ID: <19990318081759.C18203@arginine.umdnj.edu>

If enough people have access to the clients, Jeff, or even if only a few
might, you could install ssh (http://www.cs.hut.fi/ssh/).  Will there be
a CVS repository on that box?

Quoting J.W. Bizzaro (bizzaro@bc.edu):
> Carlos Maltzahn wrote:
> > 
> > Thanks Jeff.
> > 
> > Be aware that onsager is not behind a firewall and seems to run Red Hat.
> > I wouldn't use onsager for anything that cannot be restored very easily.
> 
> I know there is no firewall.  But what's wrong with Red Hat?
> 
> > 
> > Jeff, are you planning to give us some tulip-related web space on onsager?
> 
> Anything you want.  What did you have in mind?
> 
> 
> Jeff
> -- 
> J.W. Bizzaro                  Phone: 617-552-3905
> Boston College                mailto:bizzaro@bc.edu
> Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
> --

-- 
Dave Beck 
dave@arginine.umdnj.edu                 Sites of interest (set 1):
Computer Science and Biology            http://locus.umdnj.edu/nigms/
Drexel University, Philadelphia PA      http://www.bio.net/

From jabbo at mindless.com  Thu Mar 18 08:18:07 1999
From: jabbo at mindless.com (Tim)
Date: Fri Feb 10 19:18:22 2006
Subject: [Pipet Devel] new server up
References: <Pine.LNX.4.04.9903181042550.487-100000@schpamb>
Message-ID: <99Mar18.132138est.131763@gateway.macroint.com>

That reminds me, you should consider putting up a packet filter and only
allowing connections on ports 80 and <whatever SSH uses; forgetting
right now>.

Plaintext logins are a Bad Thing... SSH is a good thing.  And CVS can
run inside of SSH (duh, but worth noting).
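
As a rough sketch of the allow-list rule described here: SSH's standard port is 22, so the filter would accept only ports 22 and 80. The Python below only models the decision. A real filter would live in ipchains or on a router, and the function name is hypothetical.

```python
# Destination ports the filter lets through: 22 = SSH, 80 = HTTP.
ALLOWED_PORTS = {22, 80}


def accept(dest_port):
    """Return True if an inbound connection to dest_port should be allowed."""
    return dest_port in ALLOWED_PORTS
```

Under this rule, telnet (port 23) and FTP (port 21) connections from outside would be dropped.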

-- 

                "A goal is a dream with a deadline."

                          -- Harvey Mackay

From jabbo at mindless.com  Thu Mar 18 11:24:30 1999
From: jabbo at mindless.com (Tim)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
References: <Pine.LNX.4.10.9903181316570.2788-100000@photino.sid.rice.edu>
Message-ID: <99Mar18.162759est.131743@gateway.macroint.com>

That reminds me -- what sort of RAM does the machine take?  I can pick
some up at auction and beef the onsager.uml.edu server up to a
respectable amount if you tell me what type (EDO or FP, ECC or not, how
many pins, how many nanoseconds) it takes.  16MB won't cut it for
anything exciting.  (hell, my workstation has 128MB, but that's so I can
cache the OS into memory ;-))

Also, I apologize for being almost dead to the world.  I have been under
a lot of pressure to pull off a lesser miracle ... as of April 1st that
pressure is off.  I have been playing with PyGTK and trying to get back
into the swing of things, but the codon code I thought was finished
isn't around, and I'd like to stick an interface on it anyways.  I will
have a lot of leverage here after my deadline.

One thing that (thanks to work) I've been playing with a whole lot is
servlets; I know that a web interface isn't really what we're after, but
there are some stupendous projects out there that might allow us to run
JPython versions of some of the code on a webserver.  That, combined
with the ability to do cool stuff with corba, equals a lot of freedom
for showing prototypes to the people that would actually use this
package.  Anyways, I'll write more on this after my deadline.

Konrad -- I know French crypto laws are sort of fascist but is there any
way to use something similar to ssh?  Or alternatively could we set up a
mirroring type of thing?  Or... hell, this could be interesting.  We
gotta work around it.  I guess it would be way better to risk a
corrupted codebase than to have barriers to people like Konrad's
contributing easily.

Jon Stevens at clearink has a bunch of notes on setting up CVS and
managing stuff behind-the-scenes for the Java-Apache project:

http://www.working-dogs.com

Or alternatively I could help out after April 1st.  (there's a theme
here ;-))
Seriously though I can think of some other solutions now that I'm
writing; we have an interactive system here at Macro that runs under
SSL, maybe that would work, if so I can help you set it up (that'd be
port 443) and we could work on it from that angle.  Being fascist is
silly, but so is losing work!

-- 
        "When it is not necessary to make a decision,
          it is necessary not to make a decision."

                                    --Lord Falkland

From carlosm at mroe.cs.colorado.edu  Thu Mar 18 12:59:07 1999
From: carlosm at mroe.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
In-Reply-To: <36F0B43D.F5684219@bc.edu>
Message-ID: <Pine.LNX.4.04.9903181042550.487-100000@schpamb>


> > Be aware that onsager is not behind a firewall and seems to run Red Hat.
> > I wouldn't use onsager for anything that cannot be restored very easily.
> 
> I know there is no firewall.  But what's wrong with Red Hat?

Our passwords are going through the Internet in plain text. It's extremely
easy to snoop them and then login. Red Hat's user friendly admin tools
have the tendency to permit users to acquire root access among other
things. RH's distributions are so unsecure that our department
doesn't allow us to connect RH computers to the network inside the
firewall. The Debian distribution tends to be more secure. 

I would recommend putting onsager behind a firewall and allowing us to log in
through the firewall using ssh or at least one-time passwords. 
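
For readers unfamiliar with one-time passwords, here is a minimal sketch of the hash-chain idea used by S/Key-style systems. It is an illustration only: SHA-256 is swapped in for the hash, and the names make_chain and Server are hypothetical. The server stores only the last hash in the chain, and each login reveals the previous one, so a snooped password cannot be replayed.

```python
import hashlib
import hmac


def h(data):
    """One application of the hash function."""
    return hashlib.sha256(data).digest()


def make_chain(seed, n):
    """Return [h(seed), h(h(seed)), ...] with n entries."""
    chain, value = [], seed
    for _ in range(n):
        value = h(value)
        chain.append(value)
    return chain


class Server:
    def __init__(self, last):
        self.expected = last  # the final hash in the chain

    def login(self, otp):
        """Accept otp only if hashing it gives the stored value."""
        if hmac.compare_digest(h(otp), self.expected):
            self.expected = otp  # next login must reveal the preceding entry
            return True
        return False
```

The user works backwards through the chain, one entry per login, and generates a new chain when it runs out.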

> > Jeff, are you planning to give us some tulip-related web space on onsager?
> 
> Anything you want.  What did you have in mind?

I will start working at a company two months from now and eventually lose
my CU account. At that point I'd like to have a neutral place for Paos. I
was thinking about putting it on onsager -- but it needs to be more secure
than it is now. I hate to discover one day that the Paos distribution
contains a Trojan horse or something else ugly. 

More generally, I think onsager is not a safe repository for Tulip
development right now.

Carlos

From dave at arginine.umdnj.edu  Thu Mar 18 13:55:17 1999
From: dave at arginine.umdnj.edu (Dave Beck)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
In-Reply-To: <99Mar18.132138est.131763@gateway.macroint.com>; from Tim on Thu, Mar 18, 1999 at 08:18:07AM -0500
References: <Pine.LNX.4.04.9903181042550.487-100000@schpamb> <99Mar18.132138est.131763@gateway.macroint.com>
Message-ID: <19990318135517.A21261@arginine.umdnj.edu>

Tim has the idea...  I don't quite agree with Carlos's assessment of
Red Hat's security flaws, but I don't think that matters if /etc/hosts.*
files were set up properly and only SSH, port 80, and perhaps anonymous
FTP were allowed from "unknown" hosts.  As far as Paos being on a server
that could be cracked, granted Carlos knows best of the potential dangers
of Paos, but it would seem to me that ANY machine is potentially vulnerable
especially with man in the middle attacks possible.  If there is potential
for trojan horses being sent via Paos then Paos needs to deal with that
(by providing some kind of encryption / tamper proofing on its messages)
and not the server or operating system.  I don't think it is reasonable
to expect every locus server that might want to participate to ensure that
its local network and every network between source and destination be
secure and "tamper proof."  It's more realistic to put a seatbelt in every
car than it is to expect everyone to be a perfect driver.
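
One way to picture the "tamper proofing" suggested above is a message authentication code. The sketch below uses Python's hmac module and assumes the two ends already share a secret key. It is not Paos's actual API, and the function names are hypothetical. An HMAC detects modification in transit but does not hide the message contents.

```python
import hashlib
import hmac


def sign(key, message):
    """Return a hex tag that depends on both the key and the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key, message, tag):
    """Return True only if the tag matches the message under this key."""
    return hmac.compare_digest(sign(key, message), tag)
```

A receiver that gets a message whose tag fails verify() would discard it, whatever the state of the networks in between.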

Quoting Tim (jabbo@mindless.com):
> That reminds me, you should consider putting up a packet filter and only
> allowing connections on ports 80 and <whatever SSH uses; forgetting
> right now>.
> 
> Plaintext logins are a Bad Thing... SSH is a good thing.  And CVS can
> run inside of SSH (duh, but worth noting).
> 
> -- 
> 
>                 "A goal is a dream with a deadline."
> 
>                           -- Harvey Mackay

-- 
Dave Beck 
dave@arginine.umdnj.edu                 Sites of interest (set 1):
Computer Science and Biology            http://locus.umdnj.edu/nigms/
Drexel University, Philadelphia PA      http://www.bio.net/

From rahul at photino.sid.rice.edu  Thu Mar 18 14:21:50 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
In-Reply-To: <Pine.LNX.4.04.9903181042550.487-100000@schpamb>
Message-ID: <Pine.LNX.4.10.9903181316570.2788-100000@photino.sid.rice.edu>

On Thu, 18 Mar 1999, Carlos Maltzahn wrote:

> Our passwords are going through the Internet in plain text. It's extremely
> easy to snoop them and then login. Red Hat's user friendly admin tools
> have the tendency to permit users to acquire root access among other
> things. RH's distributions are so unsecure that our department
> doesn't allow us to connect RH computers to the network inside the
> firewall. The Debian distribution tends to be more secure. 

I agree that Debian is generally more secure, but what's this about
getting root with the admin tools? They're not suid root.

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From hinsen at cnrs-orleans.fr  Thu Mar 18 14:41:58 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
In-Reply-To: <19990318135517.A21261@arginine.umdnj.edu> (message from Dave
	Beck on Thu, 18 Mar 1999 13:55:17 -0500)
References: <Pine.LNX.4.04.9903181042550.487-100000@schpamb> <99Mar18.132138est.131763@gateway.macroint.com> <19990318135517.A21261@arginine.umdnj.edu>
Message-ID: <199903181941.UAA26378@dirac.cnrs-orleans.fr>

> Tim has the idea...  I don't quite agree with Carlos's assessment of
> Red Hat's security flaws, but I don't think that matters if /etc/hosts.*
> files were set up properly and only SSH, port 80, and perhaps anonymous
> FTP were allowed from "unknown" hosts.  As far as Paos being on a server

In principle I like the idea of SSH as much as others, but I have a
small problem: French cryptography law does not allow me to use SSH.
There are plans to change it, but as far as I know nothing has
happened yet.

On the other hand, I see no problem with restricting telnet and ftp
access to specific hosts; I can always go through my home machine
if necessary.

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From carlosm at moet.cs.colorado.edu  Thu Mar 18 15:07:38 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
In-Reply-To: <Pine.LNX.4.10.9903181316570.2788-100000@photino.sid.rice.edu>
Message-ID: <Pine.GSU.4.05.9903181259020.27963-100000@moet.cs.colorado.edu>


I don't know the details of these security flaws. But if you look at the
RH errata you see a lot of updates regarding users being able to get root
access. It might all be fixed by now -- or it might not. I know of
multiple groups here who switched to Debian because they had
problems with people being able to hack into their RH systems.

I'm not a firewall expert either. All I know is that breakins at the CU CS
department were very frequent until we introduced a firewall, ssh, and
one-time passwords.

Carlos 

On Thu, 18 Mar 1999, Rahul Jain wrote:

    On Thu, 18 Mar 1999, Carlos Maltzahn wrote:
    
    > Our passwords are going through the Internet in plain text. It's extremely
    > easy to snoop them and then login. Red Hat's user friendly admin tools
    > have the tendency to permit users to acquire root access among other
    > things. RH's distributions are so unsecure that our department
    > doesn't allow us to connect RH computers to the network inside the
    > firewall. The Debian distribution tends to be more secure. 
    
    I agree that Debian is generally more secure, but what's this about
    getting root with the admin tools? They're not suid root.
    
    -- 
    -> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
    -> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
    -> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
    -> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
    |--|--------|--------------|----|-------------|------|---------|-----|-|
       Version 11.423.999.210000101.23.50110101.042
       (c)1996-1999, All rights reserved. Disclaimer available upon request.
    
    
From rahul at photino.sid.rice.edu  Thu Mar 18 18:56:42 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
In-Reply-To: <Pine.GSU.4.05.9903181259020.27963-100000@moet.cs.colorado.edu>
Message-ID: <Pine.LNX.4.10.9903181754360.12428-100000@photino.sid.rice.edu>

On Thu, 18 Mar 1999, Carlos Maltzahn wrote:

> 
> I don't know the details of these security flaws. But if you look at the
> RH errata you see a lot of updates regarding users being able to get root
> access. It might all be fixed by now -- or it might not. I know of
> multiple groups here who switched to Debian because they had
> problems with people being able to hack into their RH systems.

AFAIK, these were bugs in the original packages, and were present in all
distros. RH is probably just more vocal about the bugfixes because they
have more corporate customers to worry about and they may not watch lists
such as BUGTRAQ.

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From bizzaro at bc.edu  Thu Mar 18 19:58:43 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
References: <Pine.GSU.4.05.9903172330470.25792-100000@moet.cs.colorado.edu> <36F0B43D.F5684219@bc.edu> <19990318081759.C18203@arginine.umdnj.edu>
Message-ID: <36F1A143.52EC0104@bc.edu>

Dave Beck wrote:
> 
> If enough people have access to the clients, Jeff, or even if only a few
> might, you could install ssh (http://www.cs.hut.fi/ssh/).  Will there be
> a CVS repository on that box?
> 

I have the ssh2 package already, but I have never used it.

CVS is another area I will need some help with.

I am more an ambitious junior scientist than an OSS hacker ;-)


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Thu Mar 18 20:10:08 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
References: <Pine.LNX.4.10.9903181316570.2788-100000@photino.sid.rice.edu> <99Mar18.162759est.131743@gateway.macroint.com>
Message-ID: <36F1A3F0.88F06833@bc.edu>

Tim wrote:
> 
> That reminds me -- what sort of RAM does the machine take?  I can pick
> some up at auction and beef the onsager.uml.edu server up to a
> respectable amount if you tell me what type (EDO or FP, ECC or not, how
> many pins, how many nanoseconds) it takes.  16MB won't cut it for
> anything exciting.  (hell, my workstation has 128MB, but that's so I can
> cache the OS into memory ;-))

A friend of mine just gave me a fistful of SIMMs, 72-pin EDO.  We don't even
know the MB yet.  I'll just have to plug them in and try.  But the computer will
get much more than 16 MB if I can help it.  Thanks for the offer!  But I'll see
how this works out first.

You didn't mention the fact that it is a Pentium 100.  I know that's pathetic,
but it's the best I can do for now.

> 
> Also, I apologize for being almost dead to the world.  I have been under
> a lot of pressure to pull off a lesser miracle ... as of April 1st that
> pressure is off.  I have been playing with PyGTK and trying to get back
> into the swing of things, but the codon code I thought was finished
> isn't around, and I'd like to stick an interface on it anyways.  I will
> have a lot of leverage here after my deadline.

Are you working on a thesis?

> 
> One thing that (thanks to work) I've been playing with a whole lot is
> servlets; I know that a web interface isn't really what we're after, but
> there are some stupendous projects out there that might allow us to run
> JPython versions of some of the code on a webserver.

Ahhh.  You may want to get together with Rahul about this.  Since Sun made Java
somewhat open source, I can accept a limited implementation of it for the Web
front end.  That project is, after all, not part of the Loci core, so it can be
licensed any way we want.  Since Loci is LGPL rather than GPL, the guts of the
Web interface are irrelevant to the rest of Loci.

> That, combined
> with the ability to do cool stuff with corba, equals a lot of freedom
> for showing prototypes to the people that would actually use this
> package.  Anyways, I'll write more on this after my deadline.

Hmmm.  Looking forward to it.

> Jon Stevens at clearink has a bunch of notes on setting up CVS and
> managing stuff behind-the-scenes for the Java-Apache project:
> 
> http://www.working-dogs.com

I'll check it out.

> 
> Or alternatively I could help out after April 1st.  (there's a theme
> here ;-))
> Seriously though I can think of some other solutions now that I'm
> writing; we have an interactive system here at Macro that runs under
> SSL, maybe that would work, if so I can help you set it up (that'd be
> port 443) and we could work on it from that angle.  Being fascist is
> silly, but so is losing work!
> 

Do you mean SSL or SSH?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Thu Mar 18 20:22:43 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
References: <Pine.LNX.4.04.9903181042550.487-100000@schpamb>
Message-ID: <36F1A6E3.7DEA0ABE@bc.edu>

Carlos Maltzahn wrote:
> 
> Our passwords are going through the Internet in plain text. It's extremely
> easy to snoop them and then login. Red Hat's user friendly admin tools
> have the tendency to permit users to acquire root access among other
> things. RH's distributions are so unsecure that our department
> doesn't allow us to connect RH computers to the network inside the
> firewall.

Even _inside_ of a firewall?

I know of one case where password snooping led to a security breach on a Solaris
system.  They used one-time passwords after that...pain.

> I would recommend putting onsager behind a firewall and allowing us to log in
> through the firewall using ssh or at least one-time passwords.

UMass Lowell just doesn't seem so concerned about firewalls.

Actually, I just set up a Web server at Boston College using Red Hat.  But BC
has this firewall set up for every system on the network that prevents every
attempt to make a connection from the outside, which naturally blocks the Web
server.  I asked to have the firewall removed, and as nutty as they are about
security, BC said all I have to do is disable finger and update sendmail.

And the system administrator is a real Linux guru.  He seemed to have little
concern about using Red Hat.

> > > Jeff, are you planning to give us some tulip-related web space on onsager?
> >
> > Anything you want.  What did you have in mind?
> 
> I will start working at a company two months from now and eventually lose
> my CU account. At that point I'd like to have a neutral place for Paos. I
> was thinking about putting it on onsager -- but it needs to be more secure
> than it is now. I hate to discover one day that the Paos distribution
> contains a Trojan horse or something else ugly.

I would be honored to host PAOS.  We'll get this security problem settled.

> More generally, I think onsager is not a safe repository for Tulip
> development right now.

Where do you think the biggest threat comes from, other developers or the
occasional cracker?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Thu Mar 18 20:33:42 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
References: <Pine.LNX.4.04.9903181042550.487-100000@schpamb> <99Mar18.132138est.131763@gateway.macroint.com> <19990318135517.A21261@arginine.umdnj.edu>
Message-ID: <36F1A976.D014FF34@bc.edu>

Dave Beck wrote:
> 
> Tim has the idea...  I don't quite agree with Carlos's assessment of
> Red Hat's security flaws, but I don't think that matters if /etc/hosts.*
> files were set up properly and only SSH, port 80, and perhaps anonymous
> FTP were allowed from "unknown" hosts.

Okay.  We need someone to volunteer to be our anti-cracker.

Tim?  Carlos?  Dave?  Rahul?

> As far as Paos being on a server
> that could be cracked, granted Carlos knows best of the potential dangers
> of Paos, but it would seem to me that ANY machine is potentially vulnerable
> especially with man in the middle attacks possible.  If there is potential
> for trojan horses being sent via Paos then Paos needs to deal with that
> (by providing some kind of encryption / tamper proofing on its messages)
> and not the server or operating system.  I don't think it is reasonable
> to expect every locus server that might want to participate to ensure that
> its local network and every network between source and destination be
> secure and "tamper proof."  It's more realistic to put a seatbelt in every
> car than it is to expect everyone to be a perfect driver.

I'm sure Carlos was referring to the PAOS source code tree or whatever being
compromised on an insecure server.

But the reality of the Loci communication process being "secure" has not escaped
me.  We cannot guarantee that every Loci client (locus) on the Internet is
legitimate, but we can take measures to keep loci communication in sort of a
"sandbox", to use a Java term.

Another concern is that companies using Loci will want to keep communication
private, so that no one steals their million-dollar discovery.  Maybe someone
into encryption would like to take on that project.
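
As a rough illustration of the "tamper proofing" idea Dave raises above, here is a minimal Python sketch that tags each message with an HMAC so that a receiving locus can detect modification in transit. It assumes the two loci already share a secret key; the function names and the "tag, newline, payload" layout are hypothetical and are not part of Paos or Loci. This gives integrity only, not privacy; keeping communication private would need encryption as a separate layer.

```python
import hashlib
import hmac


def sign_message(key: bytes, payload: bytes) -> bytes:
    """Return the payload prefixed with a hex HMAC-SHA256 tag and a newline."""
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    return tag + b"\n" + payload


def verify_message(key: bytes, message: bytes) -> bytes:
    """Return the payload if its tag matches; raise ValueError otherwise."""
    tag, sep, payload = message.partition(b"\n")
    if not sep:
        raise ValueError("malformed message: no tag separator")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    # compare_digest avoids leaking how many leading characters matched
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message failed integrity check")
    return payload
```

A sender would call `sign_message(key, data)` before transmitting, and the receiver would call `verify_message(key, received)` and discard anything that raises.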


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Thu Mar 18 20:35:52 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
References: <Pine.LNX.4.10.9903181316570.2788-100000@photino.sid.rice.edu>
Message-ID: <36F1A9F8.39EC5CC3@bc.edu>

Rahul Jain wrote:
> 
> On Thu, 18 Mar 1999, Carlos Maltzahn wrote:
> 
> > Our passwords are going through the Internet in plain text. It's extremely
> > easy to snoop them and then login. Red Hat's user friendly admin tools
> > have the tendency to permit users to acquire root access among other
> > things. RH's distributions are so insecure that our department
> > doesn't allow us to connect RH computers to the network inside the
> > firewall. The Debian distribution tends to be more secure.
> 
> I agree that Debian is generally more secure, but what's this about
> getting root with the admin tools? They're not suid root.
> 

Red Hat doesn't allow a direct login to root from a remote host.  But you can
log into a user account and use "su", if that is what you're referring to.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Thu Mar 18 20:37:02 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
References: <Pine.LNX.4.04.9903181042550.487-100000@schpamb> <99Mar18.132138est.131763@gateway.macroint.com> <19990318135517.A21261@arginine.umdnj.edu> <199903181941.UAA26378@dirac.cnrs-orleans.fr>
Message-ID: <36F1AA3E.DFAFE99A@bc.edu>

Konrad Hinsen wrote:

> On the other hand, I see no problem with restricting telnet and ftp
> access to specific hosts; I can always go through my home machine
> if necessary.

Or to specific domains.  That's a good idea.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Thu Mar 18 22:35:12 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] and a new idea!
Message-ID: <36F1C5F0.8D4765D8@bc.edu>

Boy, you guys now have a mailbox full of my messages ;-)

I did originally want to make the focus of Loci the production of
publication-quality figures.  This is where some of the comparisons to The GIMP
came in.  Every graphical locus is really supposed to be preparing an
illustration/picture/image.

Well, how about this:  We have a _central_ "canvas", where the user can grab
figures from other loci and drop them into the canvas.

The way I see it, someone can take a nucleotide sequence from one locus, drop it
onto the canvas, and then take the 3D structure of the DNA or RNA, and drop it
right below the sequence for comparison.

So, the user is really building a figure for publication.

The workflow system comes in to play now because I would like to see each figure
on the canvas dynamically updated by the originating locus.  E.g., the user
wants to go back and edit the sequence.  When this is done, the user won't have
to drag it back over to the canvas; Loci does it automatically.
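
The dynamic-update idea can be sketched as a simple observer arrangement: the canvas subscribes to each locus it holds a figure from, and the locus notifies its subscribers whenever it is edited. This is only a sketch under assumptions; the `Locus` and `Canvas` classes and their methods are made up for illustration and do not reflect any existing Loci code.

```python
class Locus:
    """A data source that notifies subscribers when its content changes."""

    def __init__(self, name, content):
        self.name = name
        self._content = content
        self._subscribers = []

    @property
    def content(self):
        return self._content

    def subscribe(self, callback):
        """Register a callable to be invoked with this locus after each edit."""
        self._subscribers.append(callback)

    def edit(self, new_content):
        """Change the content and tell every subscriber about it."""
        self._content = new_content
        for callback in self._subscribers:
            callback(self)


class Canvas:
    """Holds one figure per dropped locus and refreshes it on change."""

    def __init__(self):
        self.figures = {}

    def drop(self, locus):
        """Place the locus's current figure on the canvas and track updates."""
        self.figures[locus.name] = locus.content
        locus.subscribe(self._refresh)

    def _refresh(self, locus):
        self.figures[locus.name] = locus.content


seq = Locus("sequence", "ATGC")
canvas = Canvas()
canvas.drop(seq)
seq.edit("ATGCGT")  # the canvas copy is refreshed without another drag
```

In a distributed setting the callback would presumably become a message sent through the workflow system rather than a direct method call.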

I think it does sort of tie together the graphical loci, as they weren't so much
before.  You know, the user really does have a single task in mind.  They want
to publish their results.

Any comments?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From justin at ukans.edu  Fri Mar 19 01:37:48 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
In-Reply-To: <36F1A143.52EC0104@bc.edu>
Message-ID: <Pine.OSF.4.03.9903190032580.23123-100000@busboy.sped.ukans.edu>

On Fri, 19 Mar 1999, J.W. Bizzaro wrote:

> I have the ssh2 package already, but I have never used it.
> CVS is another area I will need some help with.

I can give you a hand with both.
If you want, this weekend I'll put both on (but it'll require temporary
root access).

Justin


From Thomas.Sicheritz at molbio.uu.se  Fri Mar 19 03:13:50 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
In-Reply-To: <36F1A976.D014FF34@bc.edu>
References: <Pine.LNX.4.04.9903181042550.487-100000@schpamb>
	<99Mar18.132138est.131763@gateway.macroint.com>
	<19990318135517.A21261@arginine.umdnj.edu>
	<36F1A976.D014FF34@bc.edu>
Message-ID: <14066.631.492397.523413@beagle.bmc.uu.se>


 > > Tim has the idea...  I don't quite agree with Carlos's assessment of
 > > Red Hat's security flaws, but I don't think that matters if /etc/hosts.*
 > > files were set up properly and only SSH, port 80, and perhaps anonymous
 > > FTP were allowed from "unknown" hosts.
 > 
 > Okay.  We need someone to volunteer to be our anti-cracker.
 > 
 > Tim?  Carlos?  Dave?  Rahul?
 > 

I agree that Red Hat is the least secure of all distributions - I switched
from Debian & RH to SuSE on all of the department's and my personal
machines.
One of our freshly installed RH machines was on the net for 7 minutes before
the first successful break-in ... :-(

My policy here is:
* restricted secure shell
* if ssh is not an alternative: tcp_wrapper-protected telnet/ftp
  and I do NOT close all ports - instead I wrap/twist/fake them with tcp_wrapper
  so that we get a chance to notice any cracking attempts; read: script kiddies
  (try to finger me at beagle.bmc.uu.se - I assure you we don't have users
  named fritz or bertram)
* of course ... no rsh, rcp, rhosts, etc.

My suggestion is to (at least) wrap all open ports directly in inetd.
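
To make the access-control side of this concrete, here is a small Python model of how the hosts.allow / hosts.deny decision is commonly described for tcp_wrappers: a match in the allow list grants access, otherwise a match in the deny list refuses it, otherwise access is granted. This is a deliberately simplified sketch, not the real library; it only understands `ALL`, exact names, and `.domain` suffixes, and the rule lists and host names below are invented examples.

```python
def _matches(pattern, value):
    """Match 'ALL', an exact name, or a '.domain' suffix pattern."""
    if pattern == "ALL":
        return True
    if pattern.startswith("."):
        return value.endswith(pattern)
    return pattern == value


def _rule_hits(rules, daemon, client):
    """True if any (daemon_pattern, client_pattern) pair matches."""
    return any(_matches(d, daemon) and _matches(c, client) for d, c in rules)


def is_allowed(daemon, client, allow_rules, deny_rules):
    """Allow list wins, then deny list, then default-allow."""
    if _rule_hits(allow_rules, daemon, client):
        return True
    if _rule_hits(deny_rules, daemon, client):
        return False
    return True


# Hypothetical policy: ssh from anywhere, telnet only from one domain,
# everything else refused.
allow = [("sshd", "ALL"), ("in.telnetd", ".bc.edu")]
deny = [("ALL", "ALL")]
```

With these rules, `is_allowed("in.telnetd", "host.bc.edu", allow, deny)` is true while the same request from any other domain is refused, which matches the "restrict telnet and ftp to specific domains" idea discussed earlier in the thread.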


I fear that I have to stop looking at python and the sequence editor for a
while ... too many meetings and too many unwritten theses (=1) 


-thomas

-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From hinsen at cnrs-orleans.fr  Fri Mar 19 04:09:38 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
In-Reply-To: <99Mar18.162759est.131743@gateway.macroint.com> (message from Tim
	on Thu, 18 Mar 1999 11:24:30 -0500)
References: <Pine.LNX.4.10.9903181316570.2788-100000@photino.sid.rice.edu> <99Mar18.162759est.131743@gateway.macroint.com>
Message-ID: <199903190909.KAA14982@dirac.cnrs-orleans.fr>

> Konrad -- I know French crypto laws are sort of fascist but is there any
> way to use something similar to ssh?  Or alternatively could we set up a

I don't know for sure, but in principle anything using cryptography
is not allowed. I have seen the opinion that using ssh for password
protection is OK as long as the following session is not encrypted;
I think this was deduced by analogy to e-mail encryption, which is
allowed for signatures but not for encrypting content.

To make things worse, I work for a French government institution,
so our system administrators won't tolerate anything which looks
just the slightest bit illegal.

On the other hand, if I am the only one in France (and it seems so at
the moment), then don't worry. I can always go through my account on
Starship Python, and use ssh from there.

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From dave at arginine.umdnj.edu  Fri Mar 19 09:37:10 1999
From: dave at arginine.umdnj.edu (Dave Beck)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] sequence editor (WAS: new server up)
In-Reply-To: <14066.631.492397.523413@beagle.bmc.uu.se>; from Thomas.Sicheritz@molbio.uu.se on Fri, Mar 19, 1999 at 09:13:50AM +0100
References: <Pine.LNX.4.04.9903181042550.487-100000@schpamb> <99Mar18.132138est.131763@gateway.macroint.com> <19990318135517.A21261@arginine.umdnj.edu> <36F1A976.D014FF34@bc.edu> <14066.631.492397.523413@beagle.bmc.uu.se>
Message-ID: <19990319093710.A26189@arginine.umdnj.edu>

Quoting Thomas.Sicheritz@molbio.uu.se (Thomas.Sicheritz@molbio.uu.se):
> I fear that I have to stop looking at python and the sequence editor for a
> while ... too many meetings and too many unwritten theses (=1) 

Thomas, would you mind if I started futzing with it?  I'd like to start
porting my QT/C++ based sequence editor to Python/C/GTK and what you have
is a terrific start...

-- 
Dave Beck 
dave@arginine.umdnj.edu                 Sites of interest (set 2):
Computer Science and Biology            http://www.cyc.com/cyc-2-1/toc.html
Drexel University, Philadelphia PA      http://arginine.umdnj.edu/

From jabbo at mindless.com  Fri Mar 19 09:56:40 1999
From: jabbo at mindless.com (Tim)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
References: <Pine.LNX.4.10.9903181316570.2788-100000@photino.sid.rice.edu> <99Mar18.162759est.131743@gateway.macroint.com> <36F1A3F0.88F06833@bc.edu>
Message-ID: <99Mar19.150002est.131730@gateway.macroint.com>

>> Or alternatively I could help out after April 1st.  (there's a theme
>> here ;-))
>> Seriously though I can think of some other solutions now that I'm
>> writing; we have an interactive system here at Macro that runs under
>> SSL, maybe that would work, if so I can help you set it up (that'd be
>> port 443) and we could work on it from that angle.  Being fascist is
>> silly, but so is losing work!

>Do you mean SSL or SSH?

SSL is port 443 (usually, you can change this but I think it's a bad
idea).
I can't remember what port SSH connects to by default; I looked around a
bit but it's been several months since I used ssh through a firewall.

We need to set it up here (security at my company is pathetic) so pretty
soon it will come back to me.

-- 
     "An organization is like a tree full of monkeys, all on different
      levels, some climbing up.  The monkeys on the top look down and see
      a tree full of smiling faces.  The monkeys on the bottom look up and
      see nothing but assholes."                     --Tom Schuneman

From Thomas.Sicheritz at molbio.uu.se  Fri Mar 19 10:41:41 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] sequence editor
In-Reply-To: <19990319093710.A26189@arginine.umdnj.edu>
References: <Pine.LNX.4.04.9903181042550.487-100000@schpamb>
	<99Mar18.132138est.131763@gateway.macroint.com>
	<19990318135517.A21261@arginine.umdnj.edu>
	<36F1A976.D014FF34@bc.edu>
	<14066.631.492397.523413@beagle.bmc.uu.se>
	<19990319093710.A26189@arginine.umdnj.edu>
Message-ID: <14066.27762.236938.449442@beagle.bmc.uu.se>

Dave Beck writes:
 > Quoting Thomas.Sicheritz@molbio.uu.se (Thomas.Sicheritz@molbio.uu.se):
 > > I fear that I have to stop looking at python and the sequence editor for a
 > > while ... too many meetings and too many unwritten theses (=1) 
 > 
 > Thomas, would you mind if I started futzing with it?  I'd like to start
 > porting my QT/C++ based sequence editor to Python/C/GTK and what you have
 > is a terrific start...

Sure - I feel that I have no time at all to start with the graphical stuff
(I haven't yet succeeded in compiling gnomelibs on my Sun). But I'd like to
keep working a little on the python based - behind the scenes/nongraphic -
sequence classes. Could we cooperate on this? Besides my thesis I have another
genome which has to be analysed, parsed and annotated. I feel that I have
the basic python sequence class ready to build my usual tools on it
(read: I almost got used to python and don't really want to stop messing
around with it)
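
For readers wondering what a non-graphical "basic sequence class" might look like, here is a minimal Python sketch. It is not Thomas's actual class, which is not shown in this thread; the class name, methods, and behaviour are assumptions chosen purely for illustration.

```python
class NucleotideSequence:
    """A small, GUI-free DNA sequence container (illustrative only)."""

    # Translation table mapping each base to its complement, preserving case
    _COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

    def __init__(self, residues, name=""):
        self.name = name
        # Drop whitespace and newlines so pasted, wrapped sequences work
        self.residues = "".join(residues.split())

    def __len__(self):
        return len(self.residues)

    def reverse_complement(self):
        """Return a new sequence that is the reverse complement of this one."""
        rc = self.residues.translate(self._COMPLEMENT)[::-1]
        return NucleotideSequence(rc, name=self.name + "_rc")

    def gc_content(self):
        """Return the fraction of G and C residues (0.0 for an empty sequence)."""
        if not self.residues:
            return 0.0
        upper = self.residues.upper()
        return (upper.count("G") + upper.count("C")) / len(upper)
```

Keeping a class like this free of any GUI imports is what would let a GTK editor and command-line analysis tools share it.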

Suggestions ?
-thomas

-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From dave at arginine.umdnj.edu  Fri Mar 19 11:08:12 1999
From: dave at arginine.umdnj.edu (Dave Beck)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] sequence editor
In-Reply-To: <14066.27762.236938.449442@beagle.bmc.uu.se>; from Thomas.Sicheritz@molbio.uu.se on Fri, Mar 19, 1999 at 04:41:41PM +0100
References: <Pine.LNX.4.04.9903181042550.487-100000@schpamb> <99Mar18.132138est.131763@gateway.macroint.com> <19990318135517.A21261@arginine.umdnj.edu> <36F1A976.D014FF34@bc.edu> <14066.631.492397.523413@beagle.bmc.uu.se> <19990319093710.A26189@arginine.umdnj.edu> <14066.27762.236938.449442@beagle.bmc.uu.se>
Message-ID: <19990319110812.A27565@arginine.umdnj.edu>

We need a CVS repository and strong documentation skills. ;)  I don't have
any problem working on shared sources....  I have found that it is pretty easy
when everyone uses the changelogs and people set up watches on sources they
are actively developing..

Quoting Thomas.Sicheritz@molbio.uu.se (Thomas.Sicheritz@molbio.uu.se):
> Dave Beck writes:
>  > Quoting Thomas.Sicheritz@molbio.uu.se (Thomas.Sicheritz@molbio.uu.se):
>  > > I fear that I have to stop looking at python and the sequence editor for a
>  > > while ... too many meetings and too many unwritten theses (=1) 
>  > 
>  > Thomas, would you mind if I started futzing with it?  I'd like to start
>  > porting my QT/C++ based sequence editor to Python/C/GTK and what you have
>  > is a terrific start...
> 
> Sure - I feel that I have no time at all to start with the graphical stuff
> (I haven't yet succeeded in compiling gnomelibs on my Sun). But I'd like to
> keep working a little on the python based - behind the scenes/nongraphic -
> sequence classes. Could we cooperate on this? Besides my thesis I have another
> genome which has to be analysed, parsed and annotated. I feel that I have
> the basic python sequence class ready to build my usual tools on it
> (read: I almost got used to python and don't really want to stop messing
> around with it)
> 
> Suggestions ?
> -thomas
> 

-- 
Dave Beck 
dave@arginine.umdnj.edu                 Sites of interest (set 2):
Computer Science and Biology            http://www.cyc.com/cyc-2-1/toc.html
Drexel University, Philadelphia PA      http://arginine.umdnj.edu/

From bizzaro at bc.edu  Fri Mar 19 12:12:42 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] new server up
References: <Pine.OSF.4.03.9903190032580.23123-100000@busboy.sped.ukans.edu>
Message-ID: <36F2858A.937CFC10@bc.edu>

Justin,

The ssh2 is on my machine at home, not on "biohacker/onsager".  Do you want me
to get the packages (in RPM) for you first?

I appreciate the help.  I'll send you an e-mail with the pword in the body. 
That is, if you really are Justin :-)

Everyone, Justin will set these up.  I guess we'll start with SSH, and then we
can consider the neat tricks mentioned by Tim and Thomas.


Jeff


Justin Bradford wrote:
> 
> On Fri, 19 Mar 1999, J.W. Bizzaro wrote:
> 
> > I have the ssh2 package already, but I have never used it.
> > CVS is another area I will need some help with.
> 
> I can give you a hand with both.
> If you want, this weekend I'll put both on (but it'll require temporary
> root access).
> 
> Justin

-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Fri Mar 19 12:26:22 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:23 2006
Subject: [Pipet Devel] Greg Waltz joins!
Message-ID: <36F288BE.D42A7C6B@bc.edu>

Locians,

I recruited Greg Waltz, who is an OpenGL guru developing his own modeler with
GTK.  He will be taking charge of the rendering engines for the 3D loci.

The messages to follow will be from our initial conversations.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Fri Mar 19 12:28:29 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] Greg/Jeff
Message-ID: <36F2893D.22DBB01D@bc.edu>

Forwarded message I sent to Greg...
-------------- next part --------------
An embedded message was scrubbed...
From: "J.W. Bizzaro" <bizzaro@bc.edu>
Subject: Re: 3D modeller for structural biology
Date: Thu, 18 Mar 1999 05:13:51 +0000
Size: 4836
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990319/33adb5e7/attachment.mht
From bizzaro at bc.edu  Fri Mar 19 12:29:35 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] Greg/Jeff
Message-ID: <36F2897F.ABDAEB1F@bc.edu>

Forwarded message Greg sent back to me...
-------------- next part --------------
An embedded message was scrubbed...
From: greg waltz <finklesk@Op.Net>
Subject: Re: 3D modeller for structural biology
Date: Thu, 18 Mar 1999 11:55:01 -0500 (EST)
Size: 4497
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990319/5aaea797/attachment.mht
From dave at arginine.umdnj.edu  Fri Mar 19 16:04:23 1999
From: dave at arginine.umdnj.edu (Dave Beck)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] www pages
Message-ID: <19990319160423.A30492@arginine.umdnj.edu>

I got tired of trying to find all the references to the tools mentioned
in the list archives, so I have created a WWW page (in the J. W./ Loci
style) which has the relevant homepages and download pages:
	http://cimr.umdnj.edu/~dave/loci	# goto What You Need
If I have left anything off, let me know...

-- 
Dave Beck 
dave@arginine.umdnj.edu                 Sites of interest (set 2):
Computer Science and Biology            http://www.cyc.com/cyc-2-1/toc.html
Drexel University, Philadelphia PA      http://arginine.umdnj.edu/

From rahul at photino.sid.rice.edu  Fri Mar 19 18:31:47 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] new server up
In-Reply-To: <99Mar19.150002est.131730@gateway.macroint.com>
Message-ID: <Pine.LNX.4.10.9903191731190.22556-100000@photino.sid.rice.edu>

On Fri, 19 Mar 1999, Tim wrote:

> I can't remember what port SSH connects to by default; I looked around a
> bit but it's been several months since I used ssh through a firewall.

It's port 22 by default.

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From bizzaro at bc.edu  Fri Mar 19 22:40:15 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] www pages
References: <19990319160423.A30492@arginine.umdnj.edu>
Message-ID: <36F3189F.7F2BFA1@bc.edu>

Thanks Dave!

I am making a new site of course on biohacker/onsager.  The way I am organizing
the pages, the information on your pages would go under "Developers".

BTW, you did see my "PyG Tools" Web site, didn't you?  I have many many links
there to Python and GTK sites...but not all directly to the download sites.  I
guess I should have that.

What do you mean by "tools we are tentatively going to use"?  Do you have
something else in mind? :-)


Jeff


Dave Beck wrote:
> 
> I got tired of trying to find all the references to the tools mentioned
> in the list archives, so I have created a WWW page (in the J. W./ Loci
> style) which has the relevant homepages and download pages:
>         http://cimr.umdnj.edu/~dave/loci        # goto What You Need
> If I have left anything off, let me know...
> 

-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From jabbo at mindless.com  Sat Mar 20 08:43:19 1999
From: jabbo at mindless.com (Tim)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] interesting...
Message-ID: <99Mar20.134637est.131718@gateway.macroint.com>

http://www.inxight.com/Inxight_Corporate_Web_Site/Edu_Org_Program/Intro_to_Program.html

Take a look at this tool...  looks like it could be useful for browsing
phylogenetic trees.

I'm working a bit on a molecule viewer and the frontend for codon.  If
I'm lucky it could be done tomorrow night.  If not, well, it will wait
another week.

-- 

  "We don't like their sound, and guitar music is on the way out."

               --Decca Recording Co. rejecting the Beatles, 1962

From dave at arginine.umdnj.edu  Sat Mar 20 09:50:46 1999
From: dave at arginine.umdnj.edu (Dave Beck)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] www pages
In-Reply-To: <36F3189F.7F2BFA1@bc.edu>; from J.W. Bizzaro on Sat, Mar 20, 1999 at 03:40:15AM +0000
References: <19990319160423.A30492@arginine.umdnj.edu> <36F3189F.7F2BFA1@bc.edu>
Message-ID: <19990320095046.C32426@arginine.umdnj.edu>

Quoting J.W. Bizzaro (bizzaro@bc.edu):
> I am making a new site of course on biohacker/onsager.  The way I am organizing
> the pages, the information on your pages would go under "Developers".
OK...

> BTW, you did see my "PyG Tools" Web site, didn't you?  I have many many links
> there to Python and GTK sites...but not all directly to the download sites.  I
> guess I should have that.
I'm a very to-the-point kind of man. ;)  I was trying to prep 6 different 
boxes for this and I didn't want to navigate back to the download pages
every time.  BTW: Python 1.5.1, PyGTK 0.5.9, GTK+ 1.2, GLib 1.2, PAOS, and
egcs 1.1.2 (C only) compile effortlessly on Linux (duh), Solaris (not
terribly surprising), IRIX (wow), AIX (double wow), and LinuxPPC (duh).

> What do you mean by "tools we are tentatively going to use"?  Do you have
> something else in mind? :-)
No, I just never like to commit until someone has put a gun to my head.  That
way you can at least LIE and say, "No, it was just a tentative plan to kill
the Godfather."

> Jeff
> Dave Beck wrote:
> > 
> > I got tired of trying to find all the references to the tools mentioned
> > in the list archives, so I have created a WWW page (in the J. W./ Loci
> > style) which has the relevant homepages and download pages:
> >         http://cimr.umdnj.edu/~dave/loci        # goto What You Need
> > If I have left anything off, let me know...
> > 

-- 
Dave Beck 
dave@arginine.umdnj.edu                 Sites of interest (set 3):
Computer Science and Biology            http://selene.biochem.uga.edu/tutorial/
Drexel University, Philadelphia PA      http://www.cold.org/

From bizzaro at bc.edu  Sat Mar 20 13:26:49 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] www pages
References: <19990319160423.A30492@arginine.umdnj.edu> <36F3189F.7F2BFA1@bc.edu> <19990320095046.C32426@arginine.umdnj.edu>
Message-ID: <36F3E868.837CF4A8@bc.edu>

Dave Beck wrote:

> I'm a very to-the-point kind of man. ;)  I was trying to prep 6 different
> boxes for this and I didn't want to navigate back to the download pages
> every time.  BTW: Python 1.5.1, PyGTK 0.5.9, GTK+ 1.2, GLib 1.2, PAOS, and
> egcs 1.1.2 (C only) compile effortlessly on Linux (duh), Solaris (not
> terribly surprising), IRIX (wow), AIX (double wow), and LinuxPPC (duh).

Your effort deserves a "wow!"  Thanks.

> 
> > What do you mean by "tools we are tentatively going to use"?  Do you have
> > something else in mind? :-)
> No, I just never like to commit until someone has put a gun to my head.  That
> way you can at least LIE and say, "No, it was just a tentative plan to kill
> the Godfather."

Okey dokey.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu  Sat Mar 20 13:47:50 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] 3D modelers (was Greg/Jeff)
References: <36F2897F.ABDAEB1F@bc.edu>
Message-ID: <36F3ED56.BF57BB6D@bc.edu>

Hi Greg.

> gui programming isn't too bad, but i know nothing about python. we'll see
> how this goes, what jobs come up and when, and how much time i have when
> they do. i certainly would not be averse to learning a new language.

For more information on Python bindings to GTK, you can check out my info page:

    http://www.uml.edu/Dept/Chem/BICGroup/PyGTools/

> right now it isn't very big. but it may be in the future. i've been
> thinking about that, and i have come to the conclusion that mg^2 will have
> the functions required to be useful to you in a few releases from now. in
> fact, even now it has the features i think you would want (multiple views,
> solid and wireframe rendering, translate, rotate, scale, zoom, etc.), but
> those features need some work and some of the data structures need
> redesigning (i'm working on that this afternoon). so, my idea is that we
> can take a version of mg^2 that has the functionality you need while it's
> still small and then build in the specifics to your application. but this
> all depends on what you want. i don't mind starting from scratch.

It sounds like a plan to me :-)  I'll get more detail to you ASAP.

> 
> to solve your light weight constraint, why not make a main app that runs
> the other functions as plug-ins? however, from what i saw on your site it
> seems that you want a lot of it to be command line driven so maybe plug-ins
> wouldn't help so much.

The command-line programs are implemented as a special feature of Loci.  The
other tools communicate via a Python object server.

If we had a small "engine" that could be modified by plug-in to make several
special-purpose tools, that would be along the line I was thinking.
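
One way to picture a small "engine" extended by plug-ins is a registry that maps tool names to callables, so that special-purpose tools are added without touching the core. The Python sketch below is only an assumption about how this could look; `RenderEngine`, its methods, and the two toy plug-ins are invented here and have nothing to do with mg^2's actual design.

```python
class RenderEngine:
    """Tiny core that knows nothing about specific tools; plug-ins add them."""

    def __init__(self):
        self._plugins = {}

    def register(self, name):
        """Return a decorator that registers a callable under a tool name."""
        def decorator(func):
            self._plugins[name] = func
            return func
        return decorator

    def run(self, name, data):
        """Dispatch the data to the named plug-in."""
        if name not in self._plugins:
            raise KeyError(f"no plug-in registered for {name!r}")
        return self._plugins[name](data)


engine = RenderEngine()


@engine.register("wireframe")
def wireframe(atoms):
    # Placeholder for real drawing code
    return f"wireframe view of {len(atoms)} atoms"


@engine.register("solid")
def solid(atoms):
    # Placeholder for real drawing code
    return f"solid view of {len(atoms)} atoms"
```

Calling `engine.run("wireframe", atoms)` then dispatches to whichever plug-in was registered under that name, and each special-purpose tool would just be the same engine with a different set of plug-ins loaded.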

> 
> a friend might be interested in the 3d area of this project also. i'll ask
> him.

Great.  Anyone you know who'd like to help is more than welcome!


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From rahul at photino.sid.rice.edu  Sun Mar 21 00:35:45 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] interesting...
In-Reply-To: <99Mar20.134637est.131718@gateway.macroint.com>
Message-ID: <Pine.LNX.4.10.9903202333330.12253-100000@photino.sid.rice.edu>

On Sat, 20 Mar 1999, Tim wrote:

> http://www.inxight.com/Inxight_Corporate_Web_Site/Edu_Org_Program/Intro_to_Program.html
> 
> Take a look at this tool...  looks like it could be useful for browsing
> phylogenetic trees.

That's what I thought it was from the article on Slashdot. Unfortunately their
site was slashdotted so I couldn't get to it before.

It's just the same as a Metainformation format that Apple developed about
a year ago. I forget what it was called, but it was much cooler and free. Not
open source, tho.

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From bizzaro at bc.edu  Sun Mar 21 00:41:11 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] interesting...
References: <99Mar20.134637est.131718@gateway.macroint.com>
Message-ID: <36F48677.D7936BB0@bc.edu>

Tim wrote:
> 
> http://www.inxight.com/Inxight_Corporate_Web_Site/Edu_Org_Program/Intro_to_Program.html
> 
> Take a look at this tool...  looks like it could be useful for browsing
> phylogenetic trees.

Hmmm.  I can't get the Java to run, as usual.

> 
> I'm working a bit on a molecule viewer and the frontend for codon.  If
> I'm lucky it could be done tomorrow night.  If not, well, it will wait
> another week.
> 

Could you tell me more about these?  How are these written?  Are they meant for
Loci?


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From Thomas.Sicheritz at molbio.uu.se  Mon Mar 22 03:43:51 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] interesting...
In-Reply-To: <99Mar20.134637est.131718@gateway.macroint.com>
References: <99Mar20.134637est.131718@gateway.macroint.com>
Message-ID: <14069.64306.977925.47568@beagle.bmc.uu.se>

Tim writes:
 > http://www.inxight.com/Inxight_Corporate_Web_Site/Edu_Org_Program/Intro_to_Program.html
 > 
 > Take a look at this tool...  looks like it could be useful for browsing
 > phylogenetic trees.

Hmm ... :-) ... it looks quite fun ... (look at the spider phylogeny)
We could adapt the rotating/zooming idea to the python phylogentic tool
(pyphy or phypy .. or physpampy ?) 
Actually I really like the idea ... phylogenetic reconstructions tend to get 
large amounts of taxa - which is not easy to see in one window.
I have to write a treeviewer module for my (hopefully) last bigger project
(phylogenomics) in my thesis. I thought I would hack it in Tcl/Tk but with
some help from other loci'ers I could try it in python.

There is no good treeviewing program for all platforms (read: nothing for
Linux and Solaris which doesn't need 8bpp color mode)
I have always had problems coding tree-parsing scripts and representing
the trees in a "good" way on the screen (trifurcation, distances etc.)
I could use some help here ... 

I started on some smaller versions where the branches or taxa labels (in my
case SWISSPROT ID's) are linked to yank (sequence retrieval), SWISSPROT
database, blast and clustalw - which should be connected/linked from the
whole genome map/sequence ... that seems to fit perfectly into the LOCI
way of thinking.

My time schedule:
* mar,apr,may: finish my current paper 
* apr: bioinformatics meeting in Lyon(France)  (RECOMB99)
* apr: bioinformatics meeting in Lund(Sweden)  (bioinformatics'99)
  (am I going to meet some of you in Lyon or Lund ?)
* ???: start with the phylogenomic project

I am very tempted to leave the whole sequence editor part to Dave and
only keep on with the basic_nucleotide_sequence and phylogenetic tools.

Suggestions ?

-thomas

-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From pmr at sanger.ac.uk  Mon Mar 22 04:19:46 1999
From: pmr at sanger.ac.uk (Peter Rice)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] interesting...
In-Reply-To: <14069.64306.977925.47568@beagle.bmc.uu.se>
	(Thomas.Sicheritz@molbio.uu.se)
References: <99Mar20.134637est.131718@gateway.macroint.com> <14069.64306.977925.47568@beagle.bmc.uu.se>
Message-ID: <199903220919.JAA05972@unst.sanger.ac.uk>

Thomas.Sicheritz@molbio.uu.se writes:

>There is no good treeviewing program for all platforms (read: nothing for
>Linux and Solaris which doesn't need 8bpp color mode)

You could look at the European Bioinformatics Institute's hyperbolic
viewer for taxonomy. It can generalize to all kinds of tree-based data.

http://industry.ebi.ac.uk/~alan/BioWidget/

-- 
----------------------------------------------------------------------
Peter Rice                | Informatics Division, The Sanger Centre,
E-mail: pmr@sanger.ac.uk  | Wellcome Trust Genome Campus,
Tel: (44) 1223 494967     | Hinxton, Cambridge, CB10 1SA, England
Fax: (44) 1223 494919     | URL: http://www.sanger.ac.uk/Users/pmr/

From Thomas.Sicheritz at molbio.uu.se  Mon Mar 22 05:05:02 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] interesting...
In-Reply-To: <199903220919.JAA05972@unst.sanger.ac.uk>
References: <99Mar20.134637est.131718@gateway.macroint.com>
	<14069.64306.977925.47568@beagle.bmc.uu.se>
	<199903220919.JAA05972@unst.sanger.ac.uk>
Message-ID: <14070.3927.402222.785804@beagle.bmc.uu.se>

Peter Rice writes:

 > Thomas.Sicheritz@molbio.uu.se writes:
 > >There is no good treeviewing program for all platforms (read: nothing for
 > >Linux and Solaris which doesn't need 8bpp color mode)
 > 
 > You could look at the European Bioinformatics Institute's hyperbolic
 > viewer for taxonomy. It can generalize to all kinds of tree-based data.
 > 
 > http://industry.ebi.ac.uk/~alan/BioWidget/

Thx - looks nice. But what I had in mind was more a viewer and editor. 
- what about the performance - is this hyperbolic viewer really usable on a 
normal workstation ?

(I really like the fish-eye views ...)

-thomas
-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From pmr at sanger.ac.uk  Mon Mar 22 05:47:03 1999
From: pmr at sanger.ac.uk (Peter Rice)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] interesting...
In-Reply-To: <14070.3927.402222.785804@beagle.bmc.uu.se>
	(Thomas.Sicheritz@molbio.uu.se)
References: <99Mar20.134637est.131718@gateway.macroint.com>
	<14069.64306.977925.47568@beagle.bmc.uu.se>
	<199903220919.JAA05972@unst.sanger.ac.uk> <14070.3927.402222.785804@beagle.bmc.uu.se>
Message-ID: <199903221047.KAA06102@unst.sanger.ac.uk>

Thomas,

>Thx - looks nice. But what I had in mind was more a viewer and editor. 
>- what about the performance - is this hyperbolic viewer really usable on a 
>normal workstation ?
>
>(I really like the fish-eye views ...)

It should be adaptable to become an editor. Alan Robinson at the EBI
is the best contact for it.

					Peter
-- 
----------------------------------------------------------------------
Peter Rice                | Informatics Division, The Sanger Centre,
E-mail: pmr@sanger.ac.uk  | Wellcome Trust Genome Campus,
Tel: (44) 1223 494967     | Hinxton, Cambridge, CB10 1SA, England
Fax: (44) 1223 494919     | URL: http://www.sanger.ac.uk/Users/pmr/

From bizzaro at bc.edu  Mon Mar 22 16:10:52 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] interesting...
References: <99Mar20.134637est.131718@gateway.macroint.com> <14069.64306.977925.47568@beagle.bmc.uu.se>
Message-ID: <36F6B1DC.487EB75E@bc.edu>

Thomas.Sicheritz@molbio.uu.se wrote:

> We could adapt the rotating/zooming idea to the Python phylogenetic tool
> (pyphy or phypy .. or physpampy ?)
>

Or how about locus_phy  :-)


> Actually I really like the idea ... phylogenetic reconstructions tend to include
> large numbers of taxa - which are not easy to see in one window.
> I have to write a treeviewer module for my (hopefully) last bigger project
> (phylogenomics) in my thesis. I thought I would hack it in Tcl/Tk but with
> some help from other loci'ers I could try it in python.
>

I'll help all that I can!

>
> There is no good treeviewing program for all platforms (read: nothing for
> Linux and Solaris which doesn't need 8bpp color mode)
> I have always had problems coding tree-parsing scripts and representing
> the trees in a "good" way on the screen (trifurcation, distances etc.)
> I could use some help here ...
>

I do like the representation Peter showed us:

   http://industry.ebi.ac.uk/~alan/BioWidget/

There are numerous ways people have chosen to represent this type of data.  I think a good
look at some of the literature will help us.  I'll see if I can find anything else.

>
> I started on some smaller versions where the branches or taxa labels (in my
> case SWISSPROT ID's) are linked to yank (sequence retrieval), SWISSPROT
> database, blast and clustalw - which should be connected/linked from the
> whole genome map/sequence ... that seems to fit perfectly into the LOCI
> way of thinking.
>

Phylogeny is something we should be very concerned with in developing LocusML.  Your input to Justin
and Rahul would be helpful.


>
> My time schedule:
> * mar,apr,may: finish my current paper
> * apr: bioinformatics meeting in Lyon(France)  (RECOMB99)
> * apr: bioinformatics meeting in Lund(Sweden)  (bioinformatics'99)
>   (am I going to meet some of you in Lyon or Lund ?)
>

I wish.  I haven't heard of Bioinformatics'99.  Do you have a URL?

Konrad, will you be at RECOMB99?

>
> I am very tempted to leave the whole sequence editor part to Dave and
> only keep on with the basic_nucleotide_sequence and phylogenetic tools.
>
>

If you wish.  Be sure to work with Tim on the basic sequence tools, as he has been developing
some (Codon).


Jeff
bizzaro@bc.edu

From bizzaro at bc.edu  Mon Mar 22 22:24:17 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] molecule viewer
Message-ID: <36F70960.8B6B10BB@bc.edu>

Locians,

I found a simple C/GTK molecule viewer; it may be the only GTK+ molecule
viewer.  It comes from Eric Harlow's new book on GTK development.

I have a link to the source code at the new Web site:

    http://129.63.144.25/

It compiled on my system, but I can't seem to display any molecules.  If anyone
gets it to work right, let me know.

Greg, this might be something you want to take a close look at, since it deals
with molecules and a GTK GUI.

I can't find the license for it, but I think it is GNU GPL.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From justin at ukans.edu  Tue Mar 23 02:24:39 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] phylogeny and and overview question
In-Reply-To: <36F6B1DC.487EB75E@bc.edu>
Message-ID: <Pine.OSF.4.03.9903230118120.17316-100000@busboy.sped.ukans.edu>

> Phylogeny is something we should be very concerned with in developing LocusML.
> Your input to Justin and Rahul would be helpful.

Phylogeny, too?!? Ok. Well, I'm going to need input here.
I'd like to make a draft of the LocusML, but I need input on structure and
phylogeny. Sequence I'm going to take from bioml and bsml.

As for the structure, I want to clear up something I'm a little confused
about. Does the Loci system work like this:

Desktop <-> wfs <--|----> analysis locus #1
                   |
                   |----> analysis locus #2
                   |
                   |----> database
                   |
                   |----> etc...


And things from the third column only talk to the wfs, and not directly to
each other. Right?

Justin


From justin at ukans.edu  Tue Mar 23 02:26:28 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] phylogeny and and overview question
In-Reply-To: <Pine.OSF.4.03.9903230118120.17316-100000@busboy.sped.ukans.edu>
Message-ID: <Pine.OSF.4.03.9903230125121.17316-100000@busboy.sped.ukans.edu>

> As for the structure, I want to clear up something I'm a little confused
> about. Does the Loci system work like this:

I just realized this might be a bit unclear.
When I said structure in that sentence, I meant the structure of the Loci
framework of tools.

Justin


From hinsen at cnrs-orleans.fr  Tue Mar 23 04:47:08 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] interesting...
In-Reply-To: <36F6B1DC.487EB75E@bc.edu> (bizzaro@bc.edu)
References: <99Mar20.134637est.131718@gateway.macroint.com> <14069.64306.977925.47568@beagle.bmc.uu.se> <36F6B1DC.487EB75E@bc.edu>
Message-ID: <199903230947.KAA22818@dirac.cnrs-orleans.fr>

> > My time schedule:
> > * mar,apr,may: finish my current paper
> > * apr: bioinformatics meeting in Lyon(France)  (RECOMB99)
> > * apr: bioinformatics meeting in Lund(Sweden)  (bioinformatics'99)
> >   (am I going to meet some of you in Lyon or Lund ?)
> 
> I wish.  I haven't heard of Bioinformatics'99.  Do you have a URL?
> 
> Konrad, will you be at RECOMB99?

I didn't even know about it until now! Which perhaps proves that
I am not in bioinformatics...

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From Thomas.Sicheritz at molbio.uu.se  Tue Mar 23 04:53:13 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] interesting...
In-Reply-To: <199903230947.KAA22818@dirac.cnrs-orleans.fr>
References: <99Mar20.134637est.131718@gateway.macroint.com>
	<14069.64306.977925.47568@beagle.bmc.uu.se>
	<36F6B1DC.487EB75E@bc.edu>
	<199903230947.KAA22818@dirac.cnrs-orleans.fr>
Message-ID: <14071.25652.327532.298103@beagle.bmc.uu.se>

Konrad Hinsen writes:
 > > > My time schedule:
 > > > * mar,apr,may: finish my current paper
 > > > * apr: bioinformatics meeting in Lyon(France)  (RECOMB99)
 > > > * apr: bioinformatics meeting in Lund(Sweden)  (bioinformatics'99)
 > > >   (am I going to meet some of you in Lyon or Lund ?)
 > > 
 > > I wish.  I haven't heard of Bioinformatics'99.  Do you have a URL?
 > > Konrad, will you be at RECOMB99?
 > 
 > I didn't even know about it until now! Which perhaps proves that
 > I am not in bioinformatics...

http://www.loria.fr/~kucherov/RECOMB99/
http://www.biokemi.su.se/bioinformatics99/

-thomas
-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From rahul at photino.sid.rice.edu  Tue Mar 23 05:47:52 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:24 2006
Subject: [Pipet Devel] interesting...
In-Reply-To: <14070.3927.402222.785804@beagle.bmc.uu.se>
Message-ID: <Pine.LNX.4.10.9903230442470.2165-100000@photino.sid.rice.edu>

On Mon, 22 Mar 1999 Thomas.Sicheritz@molbio.uu.se wrote:

> Thx - looks nice. But what I had in mind was more a viewer and editor. 

The Apple project (XCF?) had an integrated viewer/editor/generator(from an
HTML nested list or by following links). The interface allowed the user to
position the node in space and then use the mouse to "fly through" the
tree. It was really quite impressive.

> - what about the performance - is this hyperbolic viewer really usable on a 
> normal workstation ?

The Java applet gives reasonable performance on my P150 under Linux 2.2 
and Netscape 4.08. The Apple version was a bit less than twice as fast
as a Windows binary on the same system.

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From rahul at photino.sid.rice.edu  Tue Mar 23 06:06:15 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] interesting...
In-Reply-To: <Pine.LNX.4.10.9903230442470.2165-100000@photino.sid.rice.edu>
Message-ID: <Pine.LNX.4.10.9903230457471.2165-100000@photino.sid.rice.edu>

On Tue, 23 Mar 1999, Rahul Jain wrote:

> On Mon, 22 Mar 1999 Thomas.Sicheritz@molbio.uu.se wrote:
> 
> > Thx - looks nice. But what I had in mind was more a viewer and editor. 
> 
> > - what about the performance - is this hyperbolic viewer really usable on a 
> > normal workstation ?

Oops, I didn't realize that you were talking about another tree-viewer....
This one seems a bit slow and the labels tend to overlap a bit. The slow
responsiveness makes it even harder to get to a place where you can read
them well. The zoombar should also have a slightly different scale...

Personally, I loved the Apple viewer and that model would be really nice
to follow:

It's sort of in 3D, with the top node in front and each lower level
farther back. Clicking on a part of the figure flies in that direction,
pressing Ctrl speeds it up and pressing Shift reverses direction.
Double-clicking on a node centers it and brings it to a reasonable
distance. You really have to use it to see how cool the view and interface
were. Unfortunately, I think Apple canceled the project and scrapped all
the code. I can't find a single reference to it now...

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From rahul at photino.sid.rice.edu  Tue Mar 23 07:29:57 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] interesting...
In-Reply-To: <Pine.LNX.4.10.9903230457471.2165-100000@photino.sid.rice.edu>
Message-ID: <Pine.LNX.4.10.9903230627090.2339-100000@photino.sid.rice.edu>

Ya know, we could support all of these if we code this thing correctly.
Just make an abstract(virtual) class that's a "TreeViewer".
Then we can have concrete subclasses such as FlyThruTreeViewer and
SphereOverlaidTreeViewer or whatever. We can implement whatever's easiest
at first and then add more as time goes on and as we get more developers.
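
A minimal sketch of that layout in present-day Python, just to make the
idea concrete. The render() method, the (label, children) tuple used for
the tree, and the two toy subclasses are assumptions made for this
illustration; they stand in for FlyThruTreeViewer and friends and are not
existing Loci code.

```python
from abc import ABC, abstractmethod


class TreeViewer(ABC):
    # Abstract base class: stores the tree, leaves the drawing to subclasses.
    # A tree node is assumed to be a (label, list_of_child_nodes) tuple.
    def __init__(self, tree):
        self.tree = tree

    @abstractmethod
    def render(self):
        # Return a textual view of self.tree.
        ...


class IndentedTreeViewer(TreeViewer):
    # Concrete viewer: one node per line, indented by depth.
    def render(self):
        lines = []

        def walk(node, depth):
            label, children = node
            lines.append("  " * depth + label)
            for child in children:
                walk(child, depth + 1)

        walk(self.tree, 0)
        return "\n".join(lines)


class LeafListTreeViewer(TreeViewer):
    # Concrete viewer: only the leaf labels, in left-to-right order.
    def render(self):
        leaves = []

        def walk(node):
            label, children = node
            if not children:
                leaves.append(label)
            for child in children:
                walk(child)

        walk(self.tree)
        return ", ".join(leaves)


tree = ("root", [("A", []), ("clade", [("B", []), ("C", [])])])
for viewer in (IndentedTreeViewer(tree), LeafListTreeViewer(tree)):
    print(viewer.render())
```

Code that only calls render() never needs to know which viewer it was
handed, so new viewers can be added later without touching the callers.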

I better get some sleep...

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From Thomas.Sicheritz at molbio.uu.se  Tue Mar 23 07:38:01 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] interesting...
In-Reply-To: <Pine.LNX.4.10.9903230627090.2339-100000@photino.sid.rice.edu>
References: <Pine.LNX.4.10.9903230457471.2165-100000@photino.sid.rice.edu>
	<Pine.LNX.4.10.9903230627090.2339-100000@photino.sid.rice.edu>
Message-ID: <14071.35325.208490.652352@beagle.bmc.uu.se>

Rahul Jain writes:
 > Ya know, we could support all of these if we code this thing correctly.
 > Just make an abstract(virtual) class that's a "TreeViewer".
 > Then we can have concrete subclasses such as FlyThruTreeViewer and
 > SphereOverlaidTreeViewer or whatever. We can implement whatever's easiest
 > at first and then add more as time goes on and as we get more developers.

It's a pity that I cannot check the Apple viewer - it sounds really
interesting ...
But you are right - we can implement (step by step) everything we want ...
I'll think about what's needed.

 > I better get some sleep...
Ok - I better get some lunch ...

c ya
-thomas


-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From Thomas.Sicheritz at molbio.uu.se  Tue Mar 23 10:10:25 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] Tree Data Structure
Message-ID: <14071.44136.217697.452404@beagle.bmc.uu.se>

Hej,

What is the best way to implement a tree data structure in Python ?
The example tree (in newick format) looks like:
(chlamydia:100.000000,((PARDE:100.000000,RECAM:100.000000,RICKY:100.000000,(PORPU:100.000000,CHOCR:100.000000):100.000000):100.000000,(ECOLI:100.000000,COXBU:100.000000):100.000000):100.000000,MYCTUB:100.000000);

drawn as an unrooted tree:
/--------------------------------------------------- chlamydia(1)
|						   
|                           /----------------------- PARDE(2)
|                           |			   
|                           +----------------------- RECAM(3)
|                           |			   
|           /------100------+----------------------- RICKY(6)
|           |               |			   
|           |               |                /------ PORPU(7)
|           |               \------100-------+	   
+--100------+                                \------ CHOCR(9)
|           |					   
|           |                                /------ ECOLI(4)
|           \--------------100---------------+	   
|                                            \------ COXBU(5)
|						   
\--------------------------------------------------- MYCTUB(8)

Should we choose nested lists, NL parser or available tree modules ?
I am not fluent in Python yet - so I could use help with the basic
structure and parsing classes.
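
As one possible starting point, here is a rough sketch of a small node
class plus a recursive-descent Newick parser, written in present-day
Python. The names Node and parse_newick are made up for this example, the
branch values from the tree above are shortened to 100, and quoted labels
and comments in Newick files are not handled.

```python
class Node:
    # One tree node: optional name, optional branch value, list of children.
    def __init__(self, name="", length=None, children=None):
        self.name = name
        self.length = length
        self.children = children or []

    def leaves(self):
        # Leaf names in left-to-right order.
        if not self.children:
            return [self.name]
        names = []
        for child in self.children:
            names.extend(child.leaves())
        return names


def parse_newick(text):
    # Parse a Newick string (trailing ";" optional) into a Node tree.
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    node, pos = _parse_node(text, 0)
    if pos != len(text):
        raise ValueError("unexpected text at position %d" % pos)
    return node


def _parse_node(text, pos):
    # Parse one node starting at pos; return (Node, position after it).
    children = []
    if pos < len(text) and text[pos] == "(":
        pos += 1  # skip the opening parenthesis
        while True:
            child, pos = _parse_node(text, pos)
            children.append(child)
            if pos >= len(text):
                raise ValueError("unbalanced parentheses")
            if text[pos] == ",":
                pos += 1
            elif text[pos] == ")":
                pos += 1
                break
            else:
                raise ValueError("unexpected character at position %d" % pos)
    # Optional label, then optional ":value".
    start = pos
    while pos < len(text) and text[pos] not in ",():":
        pos += 1
    name = text[start:pos].strip()
    length = None
    if pos < len(text) and text[pos] == ":":
        pos += 1
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        length = float(text[start:pos])
    return Node(name, length, children), pos


newick = ("(chlamydia:100,((PARDE:100,RECAM:100,RICKY:100,"
          "(PORPU:100,CHOCR:100):100):100,"
          "(ECOLI:100,COXBU:100):100):100,MYCTUB:100);")
tree = parse_newick(newick)
print(tree.leaves())
print(len(tree.children))
```

The root comes out with three children (the trifurcation), and a viewer
can walk the children lists recursively to lay out the branches.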

Suggestions ?

thx
-thomas
-- 
Sicheritz Ponten Thomas E.  Department of Molecular Biology
blippblopp@linux.nu         BMC, Uppsala University
BMC:  +46 18 4714214        BOX 590 S-751 24 UPPSALA Sweden
Fax   +46 18  557723        http://evolution.bmc.uu.se/~thomas
Molecular Tcl:   http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux

	De Chelonian Mobile ... The Turtle Moves ...

From bizzaro at bc.edu  Tue Mar 23 10:40:13 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] phylogeny and and overview question
References: <Pine.OSF.4.03.9903230118120.17316-100000@busboy.sped.ukans.edu>
Message-ID: <36F7B5DD.D4AC2173@bc.edu>

Justin Bradford wrote:

> As for the structure, I want to clear up something I'm a little confused
> about. Does the Loci system work like this:
> 
> Desktop <-> wfs <--|----> analysis locus #1
>                    |
>                    |----> analysis locus #2
>                    |
>                    |----> database
>                    |
>                    |----> etc...
> 
> And things from the third column only talk to the wfs, and not directly to
> each other. Right?
> 

Maybe more like this:

Workspace <-> wfs <--|----> analysis locus #1
                     |
                    wfs
                     |
                     |----> analysis locus #2
                     |
                    wfs
                     |
                     |----> gui locus #1
                     |
                    wfs
                     |
                     |----> gui locus #2
                     |
                    wfs
                     |
                     |----> database
                     |
                    wfs
                     |
                     |----> etc...

where "Workspace" is (1) the workflow diagram/monitor, (2) the notebook/logger,
and (3) the central canvas.  Communication to these isn't really any different. 
It's just that these are for user monitoring and control.


Jeff
-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From hinsen at cnrs-orleans.fr  Tue Mar 23 14:17:56 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] Tree Data Structure
In-Reply-To: <14071.44136.217697.452404@beagle.bmc.uu.se>
	(Thomas.Sicheritz@molbio.uu.se)
References: <14071.44136.217697.452404@beagle.bmc.uu.se>
Message-ID: <199903231917.UAA17142@dirac.cnrs-orleans.fr>

Thomas.Sicheritz@molbio.uu.se writes:

> What is the best way to implement a tree data structure in Python ?

That depends on the operations that are to be performed on the
data!

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From carlosm at moet.cs.colorado.edu  Tue Mar 23 15:41:18 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] Tree Data Structure
In-Reply-To: <199903231917.UAA17142@dirac.cnrs-orleans.fr>
Message-ID: <Pine.GSU.4.05.9903231335130.13637-100000@moet.cs.colorado.edu>


I agree with Konrad. If you are using the tree data structure for fast
access to a large amount of data, use the B-tree portion of bsddb
(www.sleepycat.com). Python has bindings to the (old) 1.85 version
(somebody might have swigged the newer versions, too). It's faster than
any Python program and you get persistency for free.

Carlos 

On Tue, 23 Mar 1999, Konrad Hinsen wrote:

    Thomas.Sicheritz@molbio.uu.se writes:
    
    > What is the best way to implement a tree data structure in Python ?
    
    That depends on the operations that are to be performed on the
    data!
    
    Konrad.
    -- 
    -------------------------------------------------------------------------------
    Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
    Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
    Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
    45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
    France                                   | Nederlands/Francais
    -------------------------------------------------------------------------------
    

From bizzaro at bc.edu  Tue Mar 23 17:43:47 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] Tree Data Structure
References: <Pine.GSU.4.05.9903231335130.13637-100000@moet.cs.colorado.edu>
Message-ID: <36F81923.CB21F0EF@bc.edu>

Thomas,

The B+Tree module for Python (bplustree.py), written by Aaron Watters, is
attached.

I'm not sure if this is the Berkeley DB version that Carlos was referring to, but it
is not a binding.  It's all Python.

Carlos, have you seen this module before?  Is it any good?

There appears to be no license for this module other than this:

    This code is provided for arbitrary use, but without warrantee of
    any kind.  At present it seems to work, but I'll call it an beta
    until it's better tested.


Jeff
bizzaro@bc.edu


Carlos Maltzahn wrote:
> 
> I agree with Konrad. If you are using the tree data structure for fast
> access to a large amount of data, use the B-tree portion of bsddb
> (www.sleepycat.com). Python has bindings to the (old) 1.85 version
> (somebody might have swigged the newer versions, too). It's faster than
> any Python program and you get persistency for free.
> 
> Carlos
> 
> On Tue, 23 Mar 1999, Konrad Hinsen wrote:
> 
>     Thomas.Sicheritz@molbio.uu.se writes:
> 
>     > What is the best way to implement a tree data structure in Python ?
> 
>     That depends on the operations that are to be performed on the
>     data!
> 
>     Konrad.
>     --
>     -------------------------------------------------------------------------------
>     Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
>     Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
>     Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
>     45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
>     France                                   | Nederlands/Francais
>     -------------------------------------------------------------------------------
> 

-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--
-------------- next part --------------


"""
B+tree implementation.
======================
B+ trees are an efficient index structure for mapping
a dictionary type object into a disk file.  All keys for
these dictionary structures are strings with a fixed
maximum length.  The values can be strings or 
integers (often representing seek positions in a secondary
file) depending on the implementation.

B+ trees can be useful for storing large mappings on disk
in such a way that a small number of keys/values can be
retrieved very quickly (with very few disk accesses).
B+ trees can also be useful for sorting a very large number
(millions) of records by unique string key values.

In this implementation all keys must 
not exceed the maximum length for a
given tree.  For string values there is no limitation on
size of content.  Note that in my tests updates are
2-3 times slower than retrieves, except for walking
which is much faster than normal retrieves.

As an add-on this module also provides a dbm compatible
interface that permits arbitrary length keys and values.
See below.

Provided here are several implementations:

BplusTree():
  defines a mapping from strings to integers.

caching_BPT():
  subclass of BplusTree that caches key,value
  pairs already seen.  This one cannot be updated.
  Construct a compatible index file using BplusTree
  and for read only access that touches a manageable
  number of keys, reopen the file using caching_BPT.

SBplusTree():
  defines a mapping from strings to strings.
  Updatable, but overwrites or deletions will
  leave "unreachable garbage" in the "value space"
  of the index file.  Use recopy_sbplus() to
  recopy the file, eliminating the garbage.

caching_SBPT():
  analogous to caching_BPT, but mapping to strings.

File creation:
==============
To create an index file do the following:

  f = open(filename, "w+b")
  B = SBplusTree(f, seek_position, nodesize, keymax)
  B.startup()

where seek_position is the file position at which to "start"
the tree (usually the start of file, 0), nodesize is the
number of keys to keep at each node of the tree (pick an
even number between 2 and 255), and keymax is the maximum
size for the string keys in the mapping.

When choosing nodesize remember that larger nodesizes
make Python do more work and the file system do less work.
I think 212 is probably a pretty good number.  Of course
choose keymax to be as large as you will need.  A too large
key size, however, may waste considerable space in the file.

Now that you have a tree you can populate it with values
just like a dictionary.

   B["this"] = "that"
   B["willy"] = "wonka"
   x = B["this"]
   del B["this"]
   print len(B)
   ...
   f.close()

The supported dictionary operations are indexed retrieval
B[k], indexed assignment B[k] = v, key deletion del B[k] and
length len(B).  Retrieval and deletion will raise KeyError
on absent key.  Assignment will raise ValueError if the key
is too large.

B.keys(), B.values(), B.items() are not directly
supported, but see "Walking" below.

Note that the "basic" B-plus tree implementations only accept and
return integers as values.  The SB-plus implementation will
accept anything as values, but will use the str(x) function
to convert them to a string before storing the value in the
file.  The value returned will always be the string value
stored.  For example,

   B["okeydoke"] = 23
   print `B["okeydoke"]`

prints "'23'", with the quotes.  The controlling 
application must control the
serialization/deserialization of values if it needs to store
something other than strings.
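
One way an application might handle that is sketched below in present-day
Python 3, using the standard json module. The JsonValues wrapper is a
hypothetical name, not part of this module, and a plain dict stands in
for the string-to-string tree so the sketch runs on its own.

```python
import json


class JsonValues:
    # Wraps any string-to-string mapping (assumed here: something that
    # behaves like SBplusTree) and converts values on the way in and out.
    def __init__(self, store):
        self.store = store

    def __setitem__(self, key, value):
        # Serialize to a JSON string before handing it to the store.
        self.store[key] = json.dumps(value)

    def __getitem__(self, key):
        # Deserialize the stored string back into a Python object.
        return json.loads(self.store[key])

    def __delitem__(self, key):
        del self.store[key]

    def __len__(self):
        return len(self.store)


raw = {}  # stand-in for the string-to-string tree
B = JsonValues(raw)
B["okeydoke"] = 23
B["taxa"] = ["ECOLI", "COXBU"]
print(repr(B["okeydoke"]))   # the integer comes back as an integer
print(repr(raw["okeydoke"])) # the store itself only ever holds strings
```

Only the wrapper knows about the encoding, so the index file still
contains nothing but string values.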

Read only file access:
======================
Once an index file exists it can be re-opened in "read only"
mode.

   f = open(filename, "rb")
   B = caching_SBPT(f)
   B.open()
   print B["willy"]

The configuration parameters for the tree are read from a
"file header".  However, a file written to store integers
using BplusTree should not be opened for strings using
SBplusTree, or undefined and undesirable behaviour will
result.  Opening an SBplusTree file as a BplusTree is not
advisable either.

If the seek position for the start of the tree is anything
other than 0, it must be specified:

   B = caching_SBPT(f, position)

or undefined behaviour will result.

In this mode, retrieval and walking are permitted, but attempts
to modify the structure will cause an exception.  For read only
access the programmer may prefer to use the "caching" versions if
they expect to retrieve the same keys many times and if the number
of keys to touch is not huge (say, in the millions).

Re-open for modification:
=========================
An existing index file can also be reopened for modification.

   f = open(filename, "r+b")
   B = SBplusTree(f)
   B.open()
   B["this"] = "is fun!"
   ...
   f.close()

Again, modifications are disallowed for cached trees.

Walking:
========
One of the neat features of B-plus trees is that they keep
their keys in sorted order.  Hence it is easy and efficient
to retrieve the keys/values sorted by the keys, and also to
do range queries.

To support this feature the tree implementations provide
a "walker" interface.

   walker = tree.walker(lowerkey, includelower, 
                        upperkey, includeupper)
   while walker.valid:
      print (walker.current_key(), walker.current_value())
      walker.next()
   walker.first()

Or to traverse all pairs in key-sorted order

   walker = tree.walker()
   while walker.valid:
      print (walker.current_key(), walker.current_value())
      walker.next()
   walker.first()

The lowerkey/upperkey parameters indicate where to start/end
walking (interpreted as the beginning/end if they are
omitted or set to None).  includelower indicates whether
to include the lower key if it is present in the tree; if
it is absent, the walk starts at the next greater key.
includeupper does the same for the upper key.

For example to walk from key "m" (or just past it if absent)
to the end:

    w = tree.walker("m", 1)

or to walk between "mzzz" and "nzzz" not inclusive:

    w = tree.walker("mzzz", 0, "nzzz", 0)

or walk from the beginning to "m", not inclusive

    w = tree.walker(None, None, "m", 0)

Here w.current_key() and w.current_value() retrieve the current
key and value respectively, w.next() moves to the next pair (if
there is one), w.valid indicates whether there is a current pair,
and w.first() resets the walker to the first pair (if there is
one).  At initialization the walker is already at the first pair,
if it exists.
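
To make the bounds concrete, here is a rough in-memory sketch in
present-day Python.  ListWalker and walk are hypothetical names; the
class walks a plain dict rather than a tree file, and its handling of
includeupper is an assumption modelled on includelower.

```python
from bisect import bisect_left, bisect_right

class ListWalker:
    # Hypothetical stand-in for the walker interface described above.
    def __init__(self, mapping, lowerkey=None, includelower=None,
                 upperkey=None, includeupper=None):
        self.mapping = mapping
        keys = sorted(mapping)
        if lowerkey is None:
            start = 0                               # from the beginning
        elif includelower:
            start = bisect_left(keys, lowerkey)     # at the key, or just past it
        else:
            start = bisect_right(keys, lowerkey)    # strictly after the key
        if upperkey is None:
            stop = len(keys)                        # to the end
        elif includeupper:
            stop = bisect_right(keys, upperkey)
        else:
            stop = bisect_left(keys, upperkey)
        self.keys = keys[start:stop]
        self.first()

    def first(self):
        self.pos = 0
        self.valid = self.pos < len(self.keys)

    def next(self):
        self.pos += 1
        self.valid = self.pos < len(self.keys)

    def current_key(self):
        return self.keys[self.pos]

    def current_value(self):
        return self.mapping[self.keys[self.pos]]

def walk(walker):
    # Collect the keys a walker visits, in order.
    found = []
    while walker.valid:
        found.append(walker.current_key())
        walker.next()
    return found

data = {"a": 1, "m": 2, "mzzz": 3, "n": 4, "nzzz": 5, "z": 6}
from_m = walk(ListWalker(data, "m", 1))
between = walk(ListWalker(data, "mzzz", 0, "nzzz", 0))
before_m = walk(ListWalker(data, None, None, "m", 0))
```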

Multiaccess optimizations:
==========================

To make updates and retrievals run faster you can enable/disable
a tree-global least-recently-used fifo mechanism which reduces
reads and writes, but be *sure* to disable it before closing any
BTree file that has been modified, or the tree may well become
corrupt:

    try:
       B.enable_fifo()
       do_updates(B)
    finally:
       B.disable_fifo()

The fifo may also improve performance for read only access,
in which case it is not important to disable the mechanism later.
The optimizations help most when key accesses are localized
(e.g., a bunch of inserts with keys starting "abc..."
or 10000 inserts in [almost] key-sorted order).
For only one access, it's no help at all!  The fifo mechanism
will not help for walking, so don't enable it if you will only walk
a portion of the tree once.  You might want to try passing
various values as the optional argument to enable_fifo, e.g.,
B.enable_fifo(1000) (but that's probably past the point of
diminishing returns...).  Large fifos will consume lots of "core" memory.
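
Why the disable step matters can be seen in a small sketch of a
write-back cache, in present-day Python.  WriteBackCache is a
hypothetical illustration of deferred writes in general, not the
module's Node_Fifo, and the "disk" here is just a dict.

```python
from collections import OrderedDict

class WriteBackCache:
    # Hypothetical least-recently-used cache in front of a slow store.
    def __init__(self, disk, size=33):
        self.disk = disk
        self.size = size
        self.nodes = OrderedDict()      # position -> (value, dirty)

    def read(self, position):
        if position in self.nodes:
            self.nodes.move_to_end(position)    # mark as recently used
            return self.nodes[position][0]
        value = self.disk[position]
        self._remember(position, value, False)
        return value

    def write(self, position, value):
        # Deferred: only the cache changes until eviction or flush.
        self._remember(position, value, True)

    def _remember(self, position, value, dirty):
        self.nodes[position] = (value, dirty)
        self.nodes.move_to_end(position)
        if len(self.nodes) > self.size:
            old_pos, (old_val, old_dirty) = self.nodes.popitem(last=False)
            if old_dirty:
                self.disk[old_pos] = old_val    # write back on eviction

    def flush(self):
        # Write every pending change, then forget the cache.
        for pos, (val, dirty) in self.nodes.items():
            if dirty:
                self.disk[pos] = val
        self.nodes.clear()

disk = {0: "root", 1: "leaf"}
cache = WriteBackCache(disk, size=2)
cache.write(1, "leaf v2")
stale = disk[1]        # the store still holds the old record
cache.flush()
fresh = disk[1]        # only now has the change reached the store
```

Closing the file between the write and the flush would lose the
change, which is the situation the try/finally above guards against.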

Trash compacting
================

The functions recopy_bplus(f1, f2) and recopy_sbplus(f1, f2)
copy the tree in file f1 (open in "rb" mode) to file f2 (open
in "w+b" mode) for BplusTrees and SBplusTrees respectively.  The
copy f2 will have no "garbage" and almost all leaf nodes will be
full.  This can reduce file size by about 1/3.
Both files must have headers at seek 0 and hold nothing but
the tree nodes and tree data.  Also look at recopy_tree(t1, t2).

DBM compatibility
=================

As an application of SBplusTree this module also provides
a plug-compatible implementation of the standard python dbm
style functionality, except that the "mode" parameter is not
supported on initialization.  See the Python Lib manual entry
on dbm.  Both keys and values may be of *arbitrary* length in
this case, but keys are not kept in key-sorted order and
overwrites and key collisions will result in unused garbage
in the file (keys and values occur as SBplustree "values"
using a PORTABLE bucket hashing scheme).

   d = dbm(filename, flag)
   
creates a dictionary-like structure with d[key]=value, x=d[key],
d.has_key(key), del d[key], len(d), and d.keys().  After
any modification be sure that d gets explicitly closed with
d.close() or the file *may* become corrupt.
Also, d.copy(otherfilename, "c") will create a more
compact copy of d in another file with garbage discarded.
The dbm implementation uses a very large fifo, so many accesses
may consume a lot of "core" memory.

DBM comparison
==============
Alternatives to this module for file indexing are gdbm and
dbm, both supported by available Python extension
modules.

Expect dbm to be generally faster than this module, but
remember:
  - dbm doesn't do key-sorted walking.
  - dbm often isn't portable across machines.
  - dbm isn't written in Python (ie, requires an extension
    module).
  - dbm sometimes doesn't allow arbitrary value lengths
    (but gdbm allows arbitrary length keys and values...)
whereas this module has none of these limitations.  I don't
know precisely how much faster dbm is, and for some types of
use it may turn out to actually be slower, for all I know.
Please let me know!  Probably the most compelling advantage
is that the index files generated by this module are portable
across platforms.

Fun
===
For fun or debugging try tree.dump().
There is also a test suite for the module at the
bottom (test() and retest()) which create a test index
called "test" in the current directory.  Also testdbm().

Caveats:
========
NOTE: only the standard string ordering is supported for
  walking at present.  This could be fixed...

WARNING: Never modify a tree while it is being walked.  Always
  recreate all walkers after a tree modification.
  NEVER open the same tree for modification twice!
  ALWAYS make sure a modified tree has disabled the fifo and
  the file has been closed before reopening the tree.

WARNING: This implementation has no support for concurrent
  modification.  It is designed for "write once by one process",
  "read many by (possibly) several processes, but not with
  concurrent modification."

WARNING: If any exception other than a KeyError/ValueError is raised
  during modification and not caught, the indexed file structure *may*
  become corrupt (because some operations completed and others didn't).
  Walking all values of an index or B.dump() may detect some corrupt
  states (***Note I should write a sanity-check routine***).

WARNING: As noted above, an overwrite or delete for an SBplusTree (mapping
  to strings) will leave unreachable junk in the "value space" of
  the index.

This code is provided for arbitrary use, but without warranty of
any kind.  At present it seems to work, but I'll call it a beta
until it's better tested.

Aaron Watters, arw@pythonpros.com
http://starship.skyport.net/crew/aaron_watters
http://www.pythonpros.com
"""

import string

nilseek = -1

from marshal import dumps
sequence_overhead = len(dumps(""))
intsize = len(dumps(1))

# bisect algorithm with bounds (in 1.5 this is in /Lib)
# Insert item x in list a, and keep it sorted assuming a is sorted

def insort(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)/2
        if x < a[mid]: hi = mid
        else: lo = mid+1
    a.insert(lo, x)


# Find the index where to insert item x in list a, assuming a is sorted

def bisect(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)/2
        if x < a[mid]: hi = mid
        else: lo = mid+1
    return lo


NOROOMERROR = "NOROOMERROR"
    
Rootflag = 1
Interiorflag = 2
Freeflag = 3
Leafflag = 4
LeafandRootflag = 5
Leafflags = (Leafflag, LeafandRootflag)
Interiorflags = (Interiorflag, Rootflag)

class Node_Fifo:
   """fifo of nodes for locality access optimization"""
   def __init__(self, size=30):
       self.fifo = [] # fifo of active nodes, if used.
       self.fifosize = size
       self.fifo_dict = {}

   def flush_fifo(self):
       for node in self.fifo:
           if node.dirty:
              node.store(1)
       self.fifo = []
       self.fifo_dict = {}

class Node:
   """B+ tree node.
      Follows the Silberschatz & Korth database intro book closely.
      Each node has a number self.validkeys > 1 of valid keys (except for
      a tree with only 0 or 1 entries).  For leaves each
         self.keys[i] that is valid is associated with int value
         self.indices[i]
      For nonleaves the next-node integer reference for self.keys[i] is at
         self.indices[i+1] and
         self.indices[0]
      is for entries with keys<self.keys[0].
      For leaves self.indices[self.size] is the "pointer" to the
      next sequential leaf.
   """

   # for update optimization
   dirty = 0
   fifo = None
   
   def __init__(self, flag, size, keylen, position, infile, cloner=None):
       self.flag = flag # one of Rootflag...
       self.size = size # num of pointers from this Node
       #if size>255: raise ValueError, "size too large: "+`size`
       if size<0: #or size%2==1: 
          raise ValueError, "size must not be negative"
       self.position = position # seek position in file
       self.infile = infile # open file for storage
       self.keylen = keylen # maximum key length (no nulls!)
       # seek pointers for descendents (root/interior)
       # all but last is a value for a leaf, last is successor seek
       self.indices = [-1] * (size+1)
       # key storage
       # for leaves value for key[i] is at indices[i]
       # for others keys[i] is at indices[i+1],
       #   indices[0] points to keys preceding keys[0].
       # for freelist nodes, nodes are stored on
       #   linked list with indices[0] forward
       self.keys = [""] * size
       # linearized storage length in file
       #self.intstorage = intsize * (size+1)
       #self.keystorage = keylen * size
       # in debug mode the seek position is prepended
       #if debug:
       #   self.intstorage = self.intstorage + intsize
       #self.storage = (2 +           # flag, valid
       #                self.intstorage + self.keystorage)
       if cloner is None:
          self.storage = (sequence_overhead + # list overhead
                       2*intsize +         # flag, valid
                       (size+1)*intsize +  # indices
                       size*(sequence_overhead + keylen) # keys
                       )
       else:
          self.storage = cloner.storage
          self.fifo = cloner.fifo
       # note, for interior nodes
       #    validkey of 0 means one valid pointer, -1 means none
       # for leaves validkeys should be positive
       if flag in [Interiorflag, Rootflag]:
          self.validkeys = -1 # number of valid entries
       else:
          self.validkeys = 0
          
   def clear(self):
       # reinitialize keys, indices for self.
       size = self.size
       self.keys = [""] * size
       self.validkeys = 0
       if self.flag in Interiorflags:
          # reinit all indices
          self.indices = [-1] * (size+1)
          self.validkeys = -1
       else:
          # don't clobber forward pointer
          self.indices[:size] = [-1] * size
       
   # interior node operation.
   def putnode(self, key, node):
       """place a node for key into self.  Raise NOROOMERROR if no room."""
       from types import StringType
       if type(key)!=StringType:
          raise TypeError, "bad key "+`key`
       position = node.position
       self.putposition(key, position)
       
   def putfirstindex(self, index):
       #print "putfirstindex", index
       if self.validkeys>=0:
          raise ValueError, "putfirstindex on non-empty node"
       self.indices[0] = index
       self.validkeys = 0
       
   def putposition(self, key, position):
       #print "putposition", (key, position), self.indices, self.keys
       if self.flag not in Interiorflags:
          raise ValueError, "cannot insert into leaf node"
       validkeys = self.validkeys
       last = validkeys + 1
       if self.validkeys>=self.size: raise NOROOMERROR, "no room"
       # store the key
       if validkeys<0: # no nodes currently
          #print "no keys"
          self.validkeys = 0
          self.indices[0] = position
       else:
          # yes nodes
          keys = self.keys
          # is the key there already?
          if key in keys:
             if keys.index(key)<validkeys:
                raise ValueError, "reinsert of node for existing key"
          place = bisect(keys, key, 0, validkeys)
          keys.insert(place, key)
          del keys[last]
          # store the index
          indices = self.indices
          #print "inserting", position, "before", indices
          indices.insert( place+1, position)
          del indices[last+1]
          #print "after", indices
          self.validkeys = last
       
   def delnode(self, key):
       """delete node for key, (key==None means "start" node)
          key must match exactly."""
       if self.flag not in Interiorflags:
          raise ValueError, "cannot delete node from leaf node"
       if self.validkeys<0: raise KeyError, "no such key (empty)"
       validkeys = self.validkeys
       indices = self.indices
       keys = self.keys
       #print "delnode before", key, keys, indices, validkeys
       if key is None:
          # delete first node (shouldn't happen?)
          place = 0
          indexplace = 0
       else:
          # delete non-first node
          place = keys.index(key)
          indexplace = place+1
       del indices[indexplace]
       indices.append(nilseek)
       del keys[place]
       keys.append("")
       #keys[validkeys-1] = ""
       #print "delnode after", keys, indices
       self.validkeys = validkeys-1
       
   def get_keys(self):
       """return a list of valid keys for self."""
       validkeys = self.validkeys
       if validkeys<=0: return []
       else: return self.keys[0:validkeys]
       
   def keys_indices(self, leftmost):
       """return [(leftmost, firstindex), (nodekey, nodeindex), ...]"""
       keys = self.get_keys()
       if self.flag in Interiorflags:
          # nonleaf, must add leftmost to complete keys
          keys = [leftmost] + keys
       indices = self.indices[:len(keys)]
       # return pairing
       return map(None, keys, indices)
              
   def getnode(self, key):
       """get node that exactly matches key (None for first)"""
       if self.flag not in Interiorflags:
          raise ValueError, "cannot getnode from leaf node"
       if key is None: index = 0
       else: index = self.keys.index(key) + 1
       place = self.indices[index]
       if place<0: raise IndexError, "invalid position! "+`(place, key)`
       # short-circuit optimization: check fifo
       fifo = self.fifo
       if fifo:
          ff = fifo.fifo
          fd = fifo.fifo_dict
          if fd.has_key(place):
             node = fd[place]
             ff.remove(node)
             ff.insert(0, node)
             return node
       node = self.clone(place)
       node = node.materialize()
       return node
       
   # leaf mode operations
   def next(self):
       """get next node from self in linear leaf sequence, or return None."""
       if self.flag not in Leafflags:
          raise ValueError, "cannot get next for non-leaf."
       place = self.indices[self.size]
       if place == nilseek: return None
       else:
          node = self.clone(place)
          node = node.materialize()
          return node
          
   def putvalue(self, key, value):
       """put key->value mapping into leaf node.
       """
       from types import StringType, IntType
       if type(key)!=StringType or type(value)!=IntType:
          raise ValueError, "bad key, value"+ `(key,value)`
       if self.flag not in Leafflags:
          raise ValueError, "cannot put value into non-leaf."
       validkeys = self.validkeys
       indices = self.indices
       keys = self.keys
       if validkeys<=0:  # empty
          # "first entry", (key, value)
          indices[0] = value
          keys[0] = key
          self.validkeys = 1
       else:
          place=None
          if key in keys:
             place = keys.index(key)
             if place>=validkeys: place=None
          if place is not None:
             keys[place] = key
             indices[place] = value
          else:  
             if validkeys>=self.size: 
                #print "node out of room"
                #for x in self.__dict__.items(): print x
                raise NOROOMERROR, "no room"
             place = bisect(keys, key, 0, validkeys)
             #print "next entry at", place
             #next = place+1
             last = validkeys+1
             del keys[validkeys]
             del indices[validkeys]
             keys.insert(place, key)
             indices.insert(place, value)
             self.validkeys = last
             

   def put_all_values(self, keys_indices):
       """optimization for node restructuring."""
       self.clear()
       indices = self.indices
       keys = self.keys
       length = self.validkeys = len(keys_indices)
       if length>self.size:
          raise IndexError, "bad length "+`length`
       #if length<self.size/2-1: # not valid for delete (?)
       #   raise IndexError, "not enough keys"+`length`
       for i in xrange(length):
           (keys[i], indices[i]) = keys_indices[i]
           
   def put_all_positions(self, first_position, keys_positions):
       """optimization for restructuring."""
       self.clear()
       indices = self.indices
       keys = self.keys
       length = self.validkeys = len(keys_positions)
       if length>self.size:
          raise IndexError, "bad length "+`length`
       #if length<self.size/2: # not valid for delete (?)
       #   raise IndexError, "not enough keys"+`length`
       indices[0] = first_position
       for i in xrange(length):
           (keys[i], indices[i+1]) = keys_positions[i]

   def delvalue(self,key):
       keys = self.keys
       indices = self.indices
       if key not in keys:
          raise KeyError, "missing key, can't delete"
       place = keys.index(key)
       validkeys = self.validkeys
       #next = place + 1
       prev = validkeys -1
       #keys[place:prev] = keys[next:validkeys]
       #indices[place:prev] = indices[next:validkeys]
       del keys[place]
       del indices[place]
       keys.insert(prev, "")
       indices.insert(prev, nilseek)
       self.validkeys = validkeys-1
       #keys[prev] = ""
       #indices[prev] = nilseek
          
   def getvalue(self, key):
       """get value exactly matching key."""
       try:
           place = self.keys.index(key)
       except ValueError:
           raise KeyError, "key not found: " + `key`
       else:
           return self.indices[place]
          
   def newneighbor(self, position):
       """make a new leaf adjacent to self"""
       if self.flag not in Leafflags:
          raise ValueError, "cannot make leaf neighbor for non-leaf."
       neighbor = self.clone(position)
       size = self.size
       indices = self.indices
       neighbor.indices[size] = indices[size]
       indices[size] = position
       return neighbor

   def nextneighbor(self):
       """return next leaf in tree, or None."""
       if self.flag not in Leafflags:
          raise ValueError, "cannot get leaf neighbor for non-leaf."
       size = self.size
       position = self.indices[size]
       if position==nilseek:
          return None
       else:
          neighbor = self.clone(position)
          neighbor = neighbor.materialize()
          return neighbor
       
   def delnext(self, next, free):
       #print "delnext"
       #print self.indices, self.position
       #print next.indices, next.position
       size = self.size
       if self.indices[size]!=next.position:
          raise ValueError, "invalid next pointer"
       self.indices[size] = next.indices[size]
       return next.free(free)
       
   # free list mode operations
   def free(self, freenodeposition):
       """add self to free list, return position as new
          free position."""
       self.flag = Freeflag
       self.indices[0] = freenodeposition
       self.store()
       return self.position
       
   def unfree(self, flag):
       """Assuming self is head of free list,
          pop self off freelist, return next free elt position
          DOES NOT STORE.
          """
       next = self.indices[0]
       self.flag = flag
       self.validkeys = 0
       self.indices[0] = nilseek
       self.clear()
       return next
          
   def clone(self, position):
       """return a Node of same shape as self."""
       if self.fifo:
          dict = self.fifo.fifo_dict
          if dict.has_key(position):
             return dict[position]
       return Node(self.flag, self.size, self.keylen,
                   position, self.infile, self)
                   
   def getfreenode(self, freeposition, freenode_callback=None):
       """get free node of same shape as self from self.file
          make one if none exists.  Assume freeposition is
          seek position of next free node.
          returns (node, newfreeposition)
          if freenode_callback is specified, it is a function to call
          with a new free list head, if needed freenode_callback(int)
          """
       file = self.infile
       if freeposition==nilseek:
          # add at last position in file
          #save = file.tell()
          file.seek(0, 2)  # goto eof
          position = file.tell()
          thenode = self.clone(position)
          thenode.store() # write new record
          #file.seek(save)
          return (thenode, nilseek)
       else:
          # get node at position
          position = freeposition
          thenode = self.clone(position)
          thenode = thenode.materialize() # get old node
          next = thenode.indices[0]
          if freenode_callback is not None:
             freenode_callback(next)
          thenode.__init__(self.flag, self.size, 
             self.keylen, position, self.infile)
          thenode.store() # save reinitialized node
          return (thenode, next)
       
   def materialize(self):
       """read self from file."""
       #print "materialize", self.position
       position = self.position
       if self.fifo:
          fifo = self.fifo
          # look in fifo
          dict = fifo.fifo_dict
          ff = fifo.fifo
          if dict.has_key(position):
             #print "using fifo", position
             node = dict[position]
             if node is not ff[0]:
                ff.remove(node)
                ff.insert(0, node)
             #if len(ff)!=len(dict): raise "whoops"
             return node
       f = self.infile
       #f.flush() # ? needed ?
       #save = f.tell()
       f.seek(position)
       data = f.read(self.storage)
       self.delinearize(data)
       #f.seek(save) # go back
       if self.fifo:
          self.add_to_fifo()
       return self
       
   def add_to_fifo(self):
          fifo = self.fifo
          ff = fifo.fifo
          dict = fifo.fifo_dict
          #if len(dict)!=len(ff): raise "whoops before"
          position = self.position
          if dict.has_key(position):
             old = dict[position]
             del dict[position]
             ff.remove(old)
          dict[self.position] = self
          #if self in ff: ff.remove(self)
          ff.insert(0, self)
          if len(ff)>self.fifo.fifosize:
             last = ff[-1]
             del ff[-1]
             del dict[last.position]
             #print "storing", last.position
             if last.dirty:
                last.store(1)
          #if len(dict)!=len(fifo): raise "whoops"
             
   def enable_fifo(self, size = 33):
       "you better disable it later!"
       if size<5 or size>1000000:
          raise ValueError, "size not valid: "+`size`
       self.fifo = Node_Fifo(size)
       
   def disable_fifo(self):
       #print "disabling fifo", self.fifo_dict.keys()
       #global fifo_on
       if self.fifo:
          self.fifo.flush_fifo()
          self.fifo = None
       
   def store(self, force=0):
       """write self to file at self.position
          return end of record seek position."""
       #print "store", self.position
       position = self.position
       fifo = self.fifo
       if not force and fifo:
          fd = fifo.fifo_dict
          if fd.has_key(self.position) and fd[position] is self:
             self.dirty = 1
             return # defer
       f = self.infile
       #save = f.tell()
       f.seek(position)
       data = self.linearize()
       f.write(data)
       last = f.tell()
       #f.seek(save)
       self.dirty = 0
       if not force and self.fifo:
          self.add_to_fifo()
       return last
       
   def linearize(self):
       """create record format for self."""
       from marshal import dumps
       all = [self.flag, self.validkeys] + self.indices + self.keys
       s = dumps(all)
       ls = len(s)
       storage = self.storage
       if (ls > storage):
          raise ValueError, "bad storage: " + `s`
       s = s + "X" * (storage-ls)
       return s
       
       #indices = self.indices
       # in debug prepend seek position
       #if debug: indices = [self.position] + indices
       #ints = encodeints(indices)
       #keys = encodestrs(self.keys, self.keylen)
       #validkeys = self.validkeys
       #if validkeys<0: v = "*" # dummy purposes only (prewrites)
       #else: v = chr(self.validkeys ^ CMASK) # try to make v readable
       #return "%s%s%s%s%s" % (self.flag, v, ints, keys, SEPARATOR)
       
   __print__ = linearize
       
   def delinearize(self, str):
       """parse, store from record format from self."""
       from marshal import loads
       all = loads(str)
       [self.flag, self.validkeys] = all[:2]
       #self.flag = chr(ordflag)
       s = self.size
       next = 2+s+1
       indices = self.indices = all[2:next]
       keys = self.keys = all[next:]
       if len(keys) != s:
          raise ValueError, "bad keys: " + `keys` + `len(keys)`
         
   def dump(self, indent=""):
       flag = self.flag
       if flag==Freeflag:
          print 'free->', self.position,
          nextp = self.indices[0]
          if nextp!=nilseek:
             next = self.clone(nextp)
             next = next.materialize()
             next.dump()
          else:
             print "!last"
          return
       nextindent = indent + "   "
       print indent,
       if flag == Rootflag: print "root",
       elif flag == Interiorflag: print "interior",
       elif flag == Leafflag: print "leaf", 
       elif flag == LeafandRootflag: print "root and leaf",
       else: print "invalid flag???", flag,
       print self.position, "valid=", self.validkeys
       print indent, "keys", self.keys
       print indent, "seeks", self.indices
       if flag in [Rootflag, Interiorflag]:
          # interior
          for i in self.indices:
              if i != nilseek:
                 n = self.clone(i)
                 n = n.materialize()
                 n.dump(nextindent)
       else:
          # leaf
          pass
       print indent, "*****"
       
class BplusTree:
   """Basic B+tree maps fixed length strings to integers
      (could be seek positions)"""

   length = None # fill in later
   
   dirty = 0 # default
      
   # length, keylen, nodesize, root_seek, free
   header_format = "%10d %10d %10d %10d %10d\n" 

   def __init__(self, infile, position=None, nodesize=None, keylen=None):
       """infile should be open file in "rb" or "w+b" mode.
          if optional args are not given they are determined
          from first line in file.
       """
       #print "BPlusTree(%s, %s, %s)" % (position, nodesize, keylen)
       if keylen is not None and keylen<=2:
          raise ValueError, "keylen must be greater than 2"
       self.root_seek = nilseek # dummy
       self.free = nilseek
       self.root = None
       self.file = infile
       self.nodesize = nodesize
       self.keylen = keylen
       if position is None:
          position = 0
       self.position = position
       #if nodesize is None:
       #   self.get_parameters()

   def walker(self, 
                      keylower=None, includelower=None,
                      keyupper=None, includeupper=None):
       return BplusWalker(self, keylower, includelower,
                                keyupper, includeupper)

   def init_params(self):
       return (self.file, self.position, self.nodesize, self.keylen)

   def getfile(self):
       return self.file

   def getroot(self):
       return self.root
          
   def update_freelist(self, position):
       if self.free!= position:
          self.free = position
          self.reset_header()

   def startup(self):
       """startup the file, write header, set root"""
       if self.nodesize is None or self.keylen is None:
          raise ValueError, \
           "cannot initialize without nodesize, keylen specified"
       self.length = 0
       self.reset_header()
       file = self.file
       file.seek(0,2) # goto eof
       self.root_seek = file.tell()
       self.reset_header()
       root = self.root = Node(LeafandRootflag, self.nodesize, self.keylen,
                        self.root_seek, file)
       root.store()

   def open(self):
       """get info on existing file."""
       file = self.file
       self.get_parameters()
       self.root = Node(LeafandRootflag, self.nodesize, self.keylen,
                        self.root_seek, file)
       self.root = self.root.materialize()
       

   fifo_enabled = 0
   
   def enable_fifo(self,size=33):
       #print "fifo enabled"
       self.fifo_enabled = 1
       self.root.enable_fifo(size)
       
   def disable_fifo(self):
       #print "fifo disabled"
       self.fifo_enabled = 0
       if self.dirty: 
          self.reset_header()
          self.dirty = 0
       self.root.disable_fifo()
 
   def reset_header(self):
       """reset the header of the file"""
       if self.fifo_enabled: 
          self.dirty = 1
          return # defer
       file = self.file
       file.seek(self.position)
       #file.write( self.header_format % 
       # (self.length, self.keylen, self.nodesize, self.root_seek, self.free) )
       from marshal import dump
       dump( (self.length, self.keylen, self.nodesize, self.root_seek, self.free),
             file)
          
   def get_parameters(self):
       file = self.file
       #save = file.tell()
       file.seek(self.position)
       from marshal import load
       temp = load(file)
       #print temp, self.position
       (self.length, self.keylen, self.nodesize, self.root_seek, self.free)=\
          temp
       #file.seek(save)

   def __len__(self):
       if self.length is None:
          self.get_parameters()
       return self.length
       
   def __getitem__(self, key):
       """self[key] -- get item associated with key"""
       if self.root is None: raise ValueError, "not open!"
       return self.find(key, self.root)

   def has_key(self, key):
       try:
           test = self[key]
       except KeyError:
           return 0
       else:
           return 1
       
   def __setitem__(self, key, value):
       """self[key]=value -- set map for key to value"""
       from types import StringType, IntType
       if type(key)!=StringType: raise ValueError, "key must be string"
       if type(value)!=IntType: raise ValueError, "value must be int"
       if len(key)>self.keylen: raise ValueError, "key too long"
       if value<0: raise ValueError, "value must be non-negative"
       current_length = self.length
       #if FORBIDDEN in key: 
       #   raise ValueError, "key cannot contain "+`FORBIDDEN`
       root = self.root
       if root is None: raise ValueError, "not open!"
       #global test1 #debug
       test1 = self.set(key, value, self.root)
       # do we need to split root?
       if test1 is not None:
          #print "splitting root", `test1`
          (leftmost, node) = test1
          #print "leftmost", leftmost, node
          # make a non-leaf root
          (newroot, self.free) = root.getfreenode(self.free)
          newroot.flag = Rootflag
          if root.flag is LeafandRootflag:
             root.flag = Leafflag
          else:
             root.flag = Interiorflag
          newroot.clear()
          newroot.putfirstindex(root.position)
          newroot.putnode(leftmost, node)
          self.root = newroot
          self.root_seek = newroot.position
          newroot.store()
          root.store()
          self.reset_header()
       else:
          if self.length!=current_length:
             self.reset_header()
       
   def __delitem__(self, key):
       """del self[key] -- remove map for key to value"""
       root = self.root
       currentlength = self.length
       self.remove(key, root)
       if root.flag==Rootflag:
          validkeys = root.validkeys
          if validkeys<1:
             if validkeys<0:
                raise ValueError, "invalid empty non-leaf root"
             newroot = self.root = root.getnode(None)
             self.root_seek = newroot.position
             self.free = root.free(self.free)
             self.reset_header()
             if newroot.flag==Leafflag:
                newroot.flag = LeafandRootflag
             else:
                newroot.flag = Rootflag
             newroot.store()
          elif self.length!=currentlength:
             self.reset_header()
       elif root.flag!=LeafandRootflag:
          raise ValueError, "invalid flag for root"
       elif self.length!=currentlength:
          self.reset_header()
       
   def set(self, key, value, node):
       """insert key-->value starting at node.
          return None if no split, else return
             (leftmostkey, newnode)
       """
       keys = node.keys
       validkeys = node.validkeys
       if node.flag in Interiorflags:
          # non leaf
          # find the descendent to insert in
          place = bisect(keys, key, 0, validkeys)
          #print place, key, validkeys, keys
          if place>=validkeys or keys[place]>=key:
             # insert at previous node
             index = place
          else:
             # index at node
             index = place+1
          if index==0: nodekey=None
          else: nodekey=keys[place-1]
          #print "nodekey", nodekey, node.indices
          nextnode = node.getnode(nodekey)
          test = self.set(key, value, nextnode)
          # split?
          if test is not None:
             (leftmost, insertnode) = test
             try:
                 # insert if room
                 node.putnode(leftmost, insertnode)
             except NOROOMERROR:
                 # no room, split
                 insertindex = insertnode.position
                 (newnode, self.free) = node.getfreenode(
                   self.free, self.update_freelist)
                 newnode.flag = Interiorflag
                 ki = node.keys_indices("dummy")
                 (dummy, firstindex) = ki[0]
                 # remove dummy
                 ki = ki[1:]
                 # insert new pair
                 insort(ki, (leftmost, insertindex))
                 newleftmost = self.divide_entries(firstindex, node, newnode, ki)
                 node.store()
                 newnode.store()
                 return (newleftmost, newnode)
             else:
                 node.store()
                 return None # no split
       else:
          # leaf
          if key not in keys or keys.index(key)>=validkeys:
              newlength = self.length+1
          else:
              newlength = self.length
          try:
              # insert if room
              node.putvalue(key, value)
          except NOROOMERROR:
              # no room: split
              # get entries (dummy is ignored for leaves)
              ki = node.keys_indices("dummy")
              insort(ki, (key, value))
              (newnode, self.free) = node.getfreenode(
                self.free, self.update_freelist)
              newnode = node.newneighbor(newnode.position)
              newnode.flag = Leafflag
              # 0 is dummy firstindex, ignored for leaves
              newleftmost = self.divide_entries(0, node, newnode, ki)
              node.store()
              newnode.store()
              self.length = newlength
              return (newleftmost, newnode)
          else:
              node.store()
              self.length = newlength
              return None
              
   def remove(self, key, node):
       """remove key from tree at node.
          raise KeyError if absent.
          return (leftmost, size) if leftmost changes.
          otherwise return (None, size).
          Caller is responsible for restructuring node, if needed.
       """
       newnodekey = None
       if node.flag in Interiorflags:
          # nonleaf
          keys = node.keys
          validkeys = node.validkeys
          place = bisect(keys, key, 0, validkeys)
          if place>=validkeys or keys[place]>=key:
             # delete at tree before place
             index = place
          else:
             # delete at tree for place
             index = place+1
          if index==0: nodekey=None
          else: nodekey=keys[place-1]
          nextnode = node.getnode(nodekey)
          # recursively remove from nextnode
          (lm, size) = self.remove(key, nextnode)
          # is nextnode now too small?
          nodesize = self.nodesize
          half = nodesize/2
          if (size<half):
             # restructure, ugly!
             # find another node for redistribution
             if nodekey is None and validkeys==0:
                raise IndexError, "invalid node, only one child!"
             if place>=validkeys:
                # final node, get previous
                rightnode = nextnode
                rightkey = nodekey
                if validkeys<=1: leftkey = None
                else: leftkey = keys[place-2]
                leftnode = node.getnode(leftkey)
             else:
                # non-final, get next
                leftnode = nextnode
                leftkey = nodekey
                if index==0: rightkey=keys[0]
                else: rightkey = keys[place]
                rightnode = node.getnode(rightkey)
             # get all keys, indices
             rightki = rightnode.keys_indices(rightkey)
             leftki = leftnode.keys_indices(leftkey)
             ki = leftki + rightki
             # redistribute or merge?
             #print "ki, nodesize", ki, nodesize
             lki = len(ki)
             if lki>nodesize or (leftnode.flag!=Leafflag and lki>=nodesize):
                # redistribute
                (newleftkey, firstindex) = ki[0]
                if leftkey==None:
                   newleftkey = lm
                if leftnode.flag!=Leafflag:
                   # nuke first ki
                   ki = ki[1:]
                newrightkey = self.divide_entries(
                     firstindex, leftnode, rightnode, ki)
                # delete, reinsert right
                node.delnode(rightkey)
                node.putnode(newrightkey, rightnode)
                # ditto for left if first changed
                if (leftkey!=None and leftkey!=newleftkey):
                   node.delnode(leftkey)
                   node.putnode(newleftkey, leftnode)
                node.store()
                leftnode.store()
                rightnode.store()
             else:
                # merge into left, free right
                (newleftkey, firstindex) = ki[0]
                #leftnode.clear()
                if leftnode.flag!=Leafflag:
                   #leftnode.putfirstindex(firstindex)
                   #del ki[0]
                   #for (k,i) in ki:
                   #    leftnode.putposition(k,i)
                   leftnode.put_all_positions(firstindex, ki[1:])
                else:
                   #for (k,i) in ki:
                   #    leftnode.putvalue(k,i)
                   leftnode.put_all_values(ki)
                if rightnode.flag==Leafflag:
                   self.free = leftnode.delnext(rightnode, self.free)
                else:
                   self.free = rightnode.free(self.free)
                if leftkey is not None and newleftkey!=leftkey:
                   node.delnode(leftkey)
                   node.putnode(newleftkey, leftnode)
                node.delnode(rightkey)
                node.store()
                leftnode.store()
                self.reset_header()
             if leftkey is None: newnodekey = lm
          else:
             # no restructure
             # update leftmost, if needed
             if nodekey is None: newnodekey = lm
             elif lm is not None:
                node.delnode(nodekey)
                node.putnode(lm, nextnode)
          # end of restructure if
       else:
          # leaf, base case: just delete it
          if node.validkeys<1:
             # should only happen for empty root
             raise KeyError, "no such key"
          first = node.keys[0]
          node.delvalue(key)
          rest = node.keys[0]
          if first!=rest:
             newnodekey = rest
          node.store()
          self.length = self.length - 1
       return (newnodekey, node.validkeys)

   def divide_entries(self, firstindex, node1, node2, entries):
       """divide presorted entries evenly among node1, node2
          return leftmost of node2.
          firstindex is ignored for leaves
       """
       middle = len(entries)/2 + 1
       #node1.clear()
       #node2.clear()
       if node1.flag in Interiorflags:
          #middle = middle+1
          left = entries[:middle]
          right = entries[middle:]
          #print "left, right", left, right
          # nonleaf
          #node1.putfirstindex(firstindex)
          #for (k,i) in left:
          #    node1.putposition(k,i)
          (leftmost, midindex) = right[0]
          #node2.putfirstindex(midindex)
          #for (k,i) in right[1:]:
          #    node2.putposition(k, i)
          node1.put_all_positions(firstindex, left)
          node2.put_all_positions(midindex, right[1:])
          return leftmost
       else:
          # leaf
          left = entries[:middle]
          right = entries[middle:]
          #for (k,i) in left:
          #    node1.putvalue(k,i)
          #for (k,i) in right:
          #    node2.putvalue(k,i)
          node1.put_all_values(left)
          node2.put_all_values(right)
          return right[0][0]
       
   def find(self, key, node):
       """find key starting at node."""
       while node.flag in Interiorflags:
          # non-leaf
          thesekeys = node.keys
          validkeys = node.validkeys
          # find place at or just beyond key
          place = bisect(thesekeys, key, 0, validkeys)
          if place>=validkeys or thesekeys[place]>key:
             if place==0: nodekey=None
             else: nodekey=thesekeys[place-1]
          else:
             nodekey = key
          node = node.getnode(nodekey)
       return node.getvalue(key)
          
   def dump(self):
       self.root.dump()
       if self.free!=nilseek:
          free = self.root.clone(self.free)
          free = free.materialize()
          free.dump()
          
   def __del__(self):
       if self.fifo_enabled:
          self.disable_fifo()

class BplusWalker:
   """iterative walker for bplustree leaf nodes."""

   def __init__(self, tree, 
                      keylower=None, includelower=None,
                      keyupper=None, includeupper=None):
       """initialize a walker for tree with key values bounded
          by upper/lower, if given, included or excluded as specified.
          Tree should never be updated while walker is active,
          otherwise behaviour of walker is undefined."""
       self.tree = tree
       self.keylower = keylower
       self.includelower = includelower
       self.keyupper = keyupper
       self.includeupper = includeupper
       if self.tree.getroot() == None:
          self.tree.open()
       # get the first pertinent leaf in tree
       node = self.tree.getroot()
       while node.flag in Interiorflags:
          # interior node, seek a leaf
          if keylower is None:
             nkey = None
          else:
             keys = node.get_keys()
             place = bisect(keys, keylower)
             if place==0: nkey = None
             elif place>len(keys): nkey = keys[-1]
             else: nkey = keys[place-1]
          node = node.getnode(nkey)
       self.node = self.startnode = node
       # preinit
       self.node_index = None
       self.valid = 0 # pessimism
       self.first()

   def first(self):
       """reset walker to first position, or raise IndexError
          if keyrange is empty."""
       node = self.node = self.startnode
       # is the key in the node?
       keys = node.keys
       #print "first at", keys
       keylower = self.keylower
       keyupper = self.keyupper
       validkeys = node.validkeys
       self.valid = 0
       if keylower==None:
          self.node_index = 0
          self.valid = 1
       elif keylower in keys and self.includelower:
          index = self.node_index = keys.index(keylower)
          if index<validkeys:
             self.valid = 1 # hurrah!
       if not self.valid:
          # look for next
          place = bisect(keys, keylower, 0, validkeys)
          if place<validkeys:
             index = self.node_index = place
             testk = keys[index]
             if (testk>keylower or 
                 (self.includelower and testk==keylower)):
                self.valid = 1
             else:
                self.valid = 0
          else:
             # advance to the next node
             next = node.nextneighbor()
             if next is not None:
                self.startnode = next
                self.first()
                return
             else:
                self.valid = 0
       # test keyupper
       if self.valid and keyupper is not None:
          key = self.current_key()
          if key<keyupper or (self.includeupper and key==keyupper):
             self.valid = 1
          else:
             self.valid = 0

   def current_key(self):
       """key the walker currently "points at"."""
       if self.valid: return self.node.keys[self.node_index]
       else: raise IndexError, "not at valid index"

   def current_value(self):
       """value the walker currently "points at"."""
       #print "current at", self.node_index, self.node.indices
       if self.valid: return self.node.indices[self.node_index]
       else: raise IndexError, "not at valid index"

   def next(self):
       """advance to next position, or set to invalid."""
       nextp = self.node_index+1
       node = self.node
       if nextp>=node.validkeys:
          # goto next node
          next = node.nextneighbor()
          if next is None:
             self.valid = 0
             return
          node = self.node = next
          nextp = 0
       #print "next at", node.keys, node.indices, nextp, node.validkeys
       if node.validkeys<=nextp:
          self.valid = 0
       else:
          testkey = node.keys[nextp]
          keyupper = self.keyupper
          if (keyupper is None or
              testkey<keyupper or 
              (self.includeupper and testkey==keyupper)):
             self.node_index = nextp
             self.valid = 1
          else:
             self.valid = 0

class caching_BPT(BplusTree):

   """simple caching.  No updates allowed."""

   def __init__(self, infile, position=None, nodesize=None, keylen=None):
       BplusTree.__init__(self, infile, position, nodesize, keylen)
       self.cache = {}

   def __getitem__(self, key):
       try:
           return self.cache[key]
       except KeyError:
           r = self.cache[key] = BplusTree.__getitem__(self, key)
           return r

   def reset_cache(self):
       self.cache = {}

   def nope(self, *args):
       raise ValueError, "op not permitted for caching_BPT"

   __setitem__ = __delitem__ = nope

class SBplusTree:
   """Wrapper for BPlusTree, maps strings-->strings.
      Key strings are fixed length as in BPlusTree.
      Value strings are arbitrary length but space for
      overwritten or deleted values will be wasted in
      the file (they aren't GC'd, unlike tree nodes, which are).
   """

   # can be overridden.
   treeclass = BplusTree
   
   def __init__(self, infile, position=None, nodesize=None, keylen=None):
       self.infile = infile
       self.tree = self.treeclass(infile, position, nodesize, keylen)

   def walker(self, 
                      keylower=None, includelower=None,
                      keyupper=None, includeupper=None):
       return SBplusWalker(self, keylower, includelower,
                                 keyupper, includeupper)

   def __len__(self):
       return len(self.tree)

   def init_params(self):
       return self.tree.init_params()

   def getroot(self):
       return self.tree.getroot()

   def getfile(self):
       return self.infile
       
   def enable_fifo(self, size=33):
       self.tree.enable_fifo(size)
       
   def disable_fifo(self):
       self.tree.disable_fifo()

   def dump(self):
       """ignore real values here, should fix.""" 
       self.tree.dump()

   def startup(self):
       self.tree.startup()

   def open(self):
       self.tree.open()

   def __getitem__(self, key):
       seek = self.tree[key]
       return getstring(self.infile, seek)

   def __setitem__(self, key, value):
       """Warning: overwrite "loses" old value space."""
       #try:
       #   test = self[key]
       #except KeyError:
       #   go = 1
       #else:
       #   go = (test != key)
       #if go:
       # assume overwrite (optimize)
       seek = putstring(self.infile, value)
       self.tree[key] = seek

   def __delitem__(self, key):
       """Warning: loses old value storage."""
       del self.tree[key]

   def has_key(self, key):
       return self.tree.has_key(key)

class caching_SBPT(SBplusTree):
   """string-->string caching b-plus tree."""
   treeclass = caching_BPT

class SBplusWalker:
   """iterator for string-->string Bplus tree."""

   # can be overridden
   walkerclass = BplusWalker

   def __init__(self, tree,
                      keylower=None, includelower=None,
                      keyupper=None, includeupper=None):
       self.walker = self.walkerclass(tree, keylower, includelower,
                keyupper, includeupper)
       self.file = tree.getfile()
       self.valid = self.walker.valid

   def first(self):
       self.walker.first()
       self.valid = self.walker.valid

   def current_key(self):
       return self.walker.current_key()

   def current_value(self):
       seek = self.walker.current_value()
       return getstring(self.file, seek)

   def next(self):
       self.walker.next()
       self.valid = self.walker.valid

def putstring(infile, s):
       """Add a new string record to eof. return start seek."""
       #save = infile.tell()
       # seek to eof
       infile.seek(0,2)
       last = infile.tell()
       from marshal import dump
       dump(s, infile)
       #infile.seek(save)
       return last

def getstring(infile, i):
       """get an old string record at i"""
       #save = infile.tell()
       infile.seek(i)
       from marshal import load
       s = load(infile)
       #infile.seek(save)
       return s

def recopy_bplus(fromfile, tofile, 
                 treeclass=BplusTree):
    """copy BplusTree from fromfile to tofile.
       fromfile should be opened "rb", tofile "w+b"."""
    fromtree = treeclass(fromfile)
    fromtree.open()
    (f, p, n, k) = fromtree.init_params()
    totree = treeclass(tofile, p, n, k)
    totree.startup()
    return recopy_tree(fromtree, totree)
    
def recopy_tree(fromtree, totree):
    """copy fromtree contents to totree.
       trees must be compatible.
       copy attempts to "compactize" totree."""
    (f,p,n,k) = totree.init_params()
    try:
        totree.enable_fifo()
        walker = fromtree.walker()
        # fill up first node in totree
        part1 = n/2 +1
        part2 = part1-2
        defer = []
        while walker.valid:
           # pseudooptimization: defer n/2-1 tail elements
           # for n even this makes all leaves full (in tests)
           for i in xrange(part1):
               if not walker.valid: break
               totree[ walker.current_key() ] = walker.current_value()
               walker.next()
           for (k,v) in defer:
               totree[k]=v
           defer = []
           for i in xrange(part2):
               if not walker.valid: break
               defer.append( (walker.current_key(), walker.current_value()) )
               walker.next()
        for (k,v) in defer:
            totree[k] = v
        return (fromtree, totree)
    finally:
        #print "disabling fifo"
        totree.disable_fifo()

def recopy_sbplus(fromfile, tofile,
                 treeclass=SBplusTree):
    """copy SBplusTree from fromfile to tofile.
       fromfile should be opened "rb", tofile "w+b".
       this will create a new file without "lost garbage"."""
    return recopy_bplus(fromfile, tofile, treeclass)
    
##### simple dbm compatibility
bignum = 0x7efe77 # 8 million buckets

def myhash(s):
    """portable string hash function.
       (because builtin hash isn't portable)."""
    o = ord
    B = bignum
    result = 775 + len(s)*1001
    for c in s:
        #print result
        result = (result*253 + o(c)*113) % B
    return result

class dbm:
   """dbm compatible index file with unlimited key/value size.
      overwrites, dels and hash collisions leave "junk" in index.
      Alternate implementations left to reader, or to future.
      
      Keys are hashed into buckets stored in an SBplusTree.
      Each bucket holds a marshalled dict of {key: value}
      for the elements that hash to that bucket.
   """
      
   flagmap = {"r": "rb", "w": "r+b", "c": "w+b"}
   openmodes = ("r", "w")
   treeclass = SBplusTree
   nodesize = 202
      
   def __init__(self, filename, flag="r", mode=None):
       #print "init", filename, flag, mode
       if mode is not None:
          raise ValueError, "sorry mode not supported (portability)"
       self.fileflag = flag
       rf = self.realflag = self.flagmap[flag]
       self.filename = filename
       f = self.file = open(filename, rf)
       # length record at start of file
       if flag in self.openmodes:
          from marshal import load
          self.length = load(f)
          # parameters determined from header
          #print "reopening", self.length, f.tell()
          t = self.tree = self.treeclass(f, f.tell())
          t.open()
       else:
          # put length record
          from marshal import dump
          dump(0, f)
          self.length = 0
          #print "creating", self.length, f.tell()
          t = self.tree = self.treeclass(f, f.tell(), self.nodesize, intsize-1)
          t.startup()
       self.tree.enable_fifo(self.nodesize+3)
          
   closed = 0
          
   def close(self):
       if self.closed: return
       self.tree.disable_fifo()
       # put length record
       if self.length<0: 
          raise ValueError, "negative len?"+`(self.length, self.filename)`
       f = self.file
       if self.fileflag in ("c", "w"):
          f.seek(0)
          from marshal import dump
          dump(self.length, f)
       f.close()
       self.closed = 1
       
   def __del__(self):
       self.close()
       
   def __len__(self):
       return self.length
          
   def hash(self, key):
       from marshal import dumps
       h = myhash(key)
       hs = dumps(h)[1:] # nuke indicator
       return hs
          
   def pairs(self, hash):
       try:
           spairs = self.tree[hash]
       except KeyError:
           return {}
       from marshal import loads
       return loads(spairs)
       
   def setpairs(self, hash, pairs):
       from marshal import dumps
       spairs = dumps(pairs)
       self.tree[hash] = spairs
          
   def __getitem__(self, item):
       h = self.hash(item)
       pairs = self.pairs(h)
       return pairs[item]
   
   def __setitem__(self, item, value):
       h = self.hash(item)
       pairs = self.pairs(h)
       if not pairs.has_key(item):
          self.length = self.length+1
       pairs[item] = value
       self.setpairs(h, pairs)
       #print self.length
   
   def __delitem__(self, item):
       h = self.hash(item)
       pairs = self.pairs(h)
       del pairs[item]
       if pairs:
          self.setpairs(h, pairs)
       else:
          del self.tree[h]
       self.length = self.length-1
       #print self.length
       
   def has_key(self, item):
       try:
           test = self[item]
       except KeyError:
           return 0
       else:
           return 1
   
   def keys(self):
       """not terribly efficient! (should optimize?)"""
       result = []
       w = self.tree.walker()
       from marshal import loads
       while w.valid:
          spairs = w.current_value()
          pairs = loads(spairs)
          for k in pairs.keys():
              result.append(k)
          w.next()
       if len(result)!=self.length:
          raise IndexError, "bad tree length:"+`(len(result), self.length)`
       return result
       
   
   def copy(self, tofilename, flag, mode=None):
       if flag=="r":
          raise ValueError, "nonsense! can't copy into read only index"
       #print "copy", tofilename, flag
       other = dbm(tofilename, flag, mode)
       if flag=="c":
          # create: make optimal
          recopy_tree(self.tree, other.tree)
          other.length = self.length
          other.tree.enable_fifo(other.nodesize+3)
       elif flag=="w":
          # insert-into: simple traversal (collisions waste space)
          w = self.tree.walker()
          from marshal import loads
          while w.valid:
             spairs = w.current_value()
             pairs = loads(spairs)
             for (k,v) in pairs.items():
                 other[k] = v
             w.next()
       return other
       
def testdbm():
    print "creating files test1, 2, 3 for dbm test"
    d1 = dbm("test1", "c")
    for x in range(10):
        key = "hello"*x
        d1[key] = "01234567890"[:-x]
        print key, d1[key]
    print d1.keys()
    for x in range(300):
        d1[oct(x)] = hex(x)
    del d1['']
    print len(d1), d1.keys()
    print "should be 0:", d1.has_key(""), d1.has_key("abd")
    print "copying"
    d2 = d1.copy("test2", "c")
    beforedel = len(d1)
    del d2["hello"]
    print len(d2), d2.keys()
    d2.close()
    d2 = dbm("test2", "r")
    print "should be equal", beforedel-1, len(d2)
    print "keys", d2.keys()
    print "testing copy-into"
    d3 = dbm("test3", "c")
    d3["willy"] = "wally"
    d3.close()
    d3 = d2.copy("test3", "w")
    print "should be equal", beforedel, len(d3)
    print "keys", d3.keys()
    
### test
def test():
    """test program: creates a bplustree file "test".
       try messing with the node size.
    """
    print "creating file 'test' in current directory for test data."
    f = open("test", "w+b")
    B = SBplusTree(f, 0, 202, 10)
    B.startup()
    B.enable_fifo()
    #return B
    B["this"] = 0xdad
    from string import letters, digits
    for x in letters+digits: B[x] = ord(x)
    for x in "13579finalmopq": del B[x]
    print "final pass"
    from time import time
    s = time()
    for x in range(1000): B[hex(x)] = x; #print x
    print "one thousand assigns", time()-s
    #B.dump()
    B.disable_fifo()
    return (B, f)

def retest():
    from time import time
    f = open("test", "rb")
    B = caching_SBPT(f)
    B.open()
    B.enable_fifo()
    print "retesting"
    for x in "abcdefghi012345":
        try:
             print x, "-->", B[x]
        except KeyError:
             print x, "absent"
    print "entering torture chamber"
    s = time()
    for x in range(1000): l = B[hex(x)]
    print "1 thousand retrieves: ", time()-s
    return B  # early return: the walker tests below are never reached
    print "keys, values between 4 and C (including C)"
    W = SBplusWalker(B, "4", 0, "C", 1)
    while W.valid:
       print (W.current_key(), W.current_value()),
       W.next() 
    print
    print "keys, values between 4 (including 4) and C (excluding C)"
    W = SBplusWalker(B, "4", 1, "C", 0)
    while W.valid:
       print (W.current_key(), W.current_value()),
       W.next()
    print
    print "all keys"
    W = SBplusWalker(B)
    while W.valid:
       print W.current_key(),
       W.next()
    print
    print "A to A inclusive (1 elt)"
    W = SBplusWalker(B, "A", 1, "A", 1)
    while W.valid:
       print W.current_key(),
       W.next()
    print
    print "A to A exclusive (0 elt)"
    W = SBplusWalker(B, "A", 1, "A", 0)
    while W.valid:
       print W.current_key(),
       W.next()
    print
    print "AA to AA inclusive (0 elt)"
    W = SBplusWalker(B, "AA", 1, "AA", 1)
    while W.valid:
       print W.current_key(),
       W.next()
    print
    print
    B.disable_fifo()
    return (W, B, f)

if __name__=="__main__": 
   (B,f) = test()
   B=None
   f.close()
   retest()
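
The inclusive/exclusive bounds that BplusWalker accepts can be illustrated
without any file I/O.  What follows is a minimal sketch, not part of the module
above: walk_range is a hypothetical helper, written for Python 3, that applies
the same keylower/includelower/keyupper/includeupper rules to a plain sorted
list of (key, value) pairs using the standard bisect module.

```python
from bisect import bisect_left, bisect_right

def walk_range(pairs, keylower=None, includelower=False,
               keyupper=None, includeupper=False):
    """Yield (key, value) from sorted `pairs` within the given bounds.

    Hypothetical stand-in for BplusWalker: same bound semantics, but over
    an in-memory sorted list instead of linked leaf nodes on disk.
    """
    keys = [k for (k, v) in pairs]
    # Lower bound: no bound starts at 0; inclusive starts at the first
    # key >= keylower; exclusive starts at the first key > keylower.
    if keylower is None:
        start = 0
    elif includelower:
        start = bisect_left(keys, keylower)
    else:
        start = bisect_right(keys, keylower)
    # Upper bound: mirror image of the lower bound.
    if keyupper is None:
        stop = len(keys)
    elif includeupper:
        stop = bisect_right(keys, keyupper)
    else:
        stop = bisect_left(keys, keyupper)
    for i in range(start, stop):
        yield pairs[i]

pairs = [("4", 52), ("A", 65), ("B", 66), ("C", 67), ("D", 68)]
between = list(walk_range(pairs, "4", False, "C", True))   # excludes "4", includes "C"
one = list(walk_range(pairs, "A", True, "A", True))        # A to A inclusive
none = list(walk_range(pairs, "A", True, "A", False))      # A to A, upper excluded
print(between)
print(one, none)
```

The three calls mirror the range cases exercised at the end of retest(); the
real walker reaches the same answers by following leaf neighbors rather than
slicing a list.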
    
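To see how the dbm class layers on top of the tree, the sketch below ports
myhash to Python 3 and mimics the bucket scheme with an ordinary dict standing
in for the SBplusTree.  BucketIndex is a hypothetical name introduced here; it
skips marshalling and file storage entirely and only shows that keys with the
same hash share one bucket dict, and that the length is tracked separately.

```python
BIGNUM = 0x7efe77  # same modulus as bignum in the module above

def myhash(s):
    """Python 3 port of the module's portable string hash."""
    result = 775 + len(s) * 1001
    for c in s:
        result = (result * 253 + ord(c) * 113) % BIGNUM
    return result

class BucketIndex:
    """Hypothetical in-memory stand-in for the dbm class.

    A plain dict replaces the SBplusTree; each bucket is a dict of
    {key: value} for every key whose hash selects that bucket.
    """
    def __init__(self):
        self.buckets = {}
        self.length = 0

    def __setitem__(self, key, value):
        pairs = self.buckets.setdefault(myhash(key), {})
        if key not in pairs:
            self.length += 1
        pairs[key] = value

    def __getitem__(self, key):
        # KeyError if the bucket, or the key inside it, is missing
        return self.buckets[myhash(key)][key]

    def __delitem__(self, key):
        h = myhash(key)
        pairs = self.buckets[h]
        del pairs[key]
        if not pairs:
            del self.buckets[h]  # drop empty buckets, as dbm does
        self.length -= 1

    def __len__(self):
        return self.length

d = BucketIndex()
for x in range(300):
    d[oct(x)] = hex(x)
d["hello"] = "world"
del d["hello"]
print(len(d), d[oct(8)])
```

In the real class every bucket update rewrites the whole marshalled dict at
end of file, which is why overwrites, deletes and collisions leave "junk"
behind until the index is recopied.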
From bizzaro at bc.edu  Tue Mar 23 17:56:51 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] phylogeny and and overview question
References: <Pine.OSF.4.03.9903230118120.17316-100000@busboy.sped.ukans.edu> <36F7B5DD.D4AC2173@bc.edu>
Message-ID: <36F81C33.39FFCF1D@bc.edu>

But if you're saying,

    Desktop = Workspace + all other GUI loci

then you are correct.  But the Workspace and GUI loci use the wfs to communicate
with each other as well.  In short, EVERYTHING uses the wfs :-)


Jeff
bizzaro@bc.edu


"J.W. Bizzaro" wrote:
> 
> Justin Bradford wrote:
> 
> > As for the structure, I want to clear up something I'm a little confused
> > about. Does the Loci system work like this:
> >
> > Desktop <-> wfs <--|----> analysis locus #1
> >                    |
> >                    |----> analysis locus #2
> >                    |
> >                    |----> database
> >                    |
> >                    |----> etc...
> >
> > And things from the third column only talk to the wfs, and not directly to
> > each other. Right?
> >
> 
> Maybe more like this:
> 
> Workspace <-> wfs <--|----> analysis locus #1
>                      |
>                     wfs
>                      |
>                      |----> analysis locus #2
>                      |
>                     wfs
>                      |
>                      |----> gui locus #1
>                      |
>                     wfs
>                      |
>                      |----> gui locus #2
>                      |
>                     wfs
>                      |
>                      |----> database
>                      |
>                     wfs
>                      |
>                      |----> etc...
> 
> where "Workspace" is (1) the workflow diagram/monitor, (2) the notebook/logger,
> and (3) the central canvas.  Communication to these isn't really any different.
> It's just that these are for user monitoring and control.
> 

-- 
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro@bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--

From justin at ukans.edu  Tue Mar 23 18:28:49 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] phylogeny and and overview question
In-Reply-To: <36F81C33.39FFCF1D@bc.edu>
Message-ID: <Pine.OSF.4.03.9903231726520.25836-100000@busboy.sped.ukans.edu>

> But if you're saying,
> 
>     Desktop = Workspace + all other GUI loci
> 
> then you are correct.  But the Workspace and GUI loci use the wfs to
> communicate with each other as well.  In short, EVERYTHING uses the wfs
> :-)

Yes, I am. I believe I understand it now.

Justin


From bizzaro at bc.edu  Thu Mar 25 02:05:37 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] official GNU project status
Message-ID: <36F9E041.900EBCB3@bc.edu>

Locians,

I've been mulling over the prospect of making The Loci Project part of the GNU
Project.  I had a number of questions, which I sent to the FSF, and below is
their reply.  What are your thoughts regarding this?  I can't see how we can
lose.  Maybe some of you are just RMS haters...?


Jeff
bizzaro@bc.edu

---------
 jwb> Quoting Georg from Issue #1:
 jwb> "The TrustCenter in Hamburg (Germany) [5] released its PKCS#11
 jwb> implementation under the GPL and made it an official GNU Project [6]."

 jwb> What makes a project "an official GNU Project"?  What you wrote gives the
 jwb> impression that it was The TrustCenter's decision in this case.  But it
 jwb> must of course be the FSF's decision.

This is a question that might be worth answering in the issue #3...

So you don't have to wait: here is your private answer. .-)

"Official GNU Projects" are projects that are "officially accredited"
by the FSF / GNU Project. The official GNU Projects are considered to
be part of the GNU System and they are distributed on the GNU CD-ROMs.

All GNU Projects follow the GNU coding guidelines (long commandline
options, a help available via "--help" and such) as can be viewed on
the GNU Webpage.

 jwb> Are all "GNU Projects" for the creation of the GNU OS and OS-related?  The
 jwb> GIMP gained this status but is not a program critical for the existence of
 jwb> an OS.

All projects under the GPL or Lesser GPL may become "official GNU
Projects" as long as they are of interest for a group of people (just
one or two isn't enough..). As you already said: Not all GNU Projects
are necessarily system-related.

 jwb> Also, are there restrictions to using "GNU" in a program name?  If someone
 jwb> calls their program "GNU CD Player or GCP", do they or must they have
 jwb> permission to use "GNU"?

Well. There are not really restrictions on the usage of "GNU" in a
name although using GNU suggests a GNU affiliation and it would
probably be a good idea to use it only if you have an official GNU
Project.

How to make a project "an official GNU Project" is easy. Contact the
FSF / GNU Project and tell us what you are planning to do (or have
done already) and (optional) why you think it'd be interesting to have 
this as a part of the GNU System. In most cases we'll give you an
account on the GNU machines, offer you webspace for your project on
www.gnu.org/software/your-cool-gnu-project and welcome you in the GNU
community. :-)

 jwb> If a developer's project does become an accredited GNU project, what is
 jwb> the developer expected to give to the FSF?  Particularly, does the FSF gain
 jwb> any copyright or legal claim to the software?

That is up to you. If you ask to make your project a GNU Project
you'll be asked whether you want to transfer your copyright to the
FSF. This is not necessary, though. If you take my GNU Project, the
Xlogmaster, for instance: I still hold the copyright although it is an 
official GNU Project.

The only thing that is really "expected" from you is to comply with the
GNU coding standards which ensure that all applications have a similar 
"feel" to them (like everything should support the "--version" and
"--help" commandline options). Those are never strictly enforced -
it's more like something that "we would like you to do". Since those
standards are basically common sense I never had a problem to follow
them. 
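
As a rough illustration of the two options mentioned above, here is how a
Python tool might provide "--help" and "--version".  The program name
"restrictionmap", its version number, and the "--input-file" option are
made-up placeholders, not anything defined by the Loci project or the FSF.

```python
import argparse


def build_parser():
    # "restrictionmap" and its version number are hypothetical placeholders.
    parser = argparse.ArgumentParser(
        prog="restrictionmap",
        description="Hypothetical locus that maps restriction sites.",
    )
    # argparse adds -h/--help automatically; --version must be declared.
    parser.add_argument("--version", action="version", version="%(prog)s 0.1")
    # A long option in the style the GNU guidelines ask for.
    parser.add_argument("--input-file", help="sequence file to analyse")
    return parser


args = build_parser().parse_args(["--input-file", "seq.gb"])
print(args.input_file)
```

Running such a script with "--help" or "--version" prints the corresponding
text and exits, which is the behaviour the guidelines describe.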

We also encourage people to write clean code and write as much
documentation as possible. Again this is never enforced.

You will decide what happens to your project. Even after transferring
the copyright you'd become the maintainer of the package and would
determine its course.

No matter how you decide yourself: You'll always stay the original
author and the maintainer as long as you want. The program will still
be "your baby". .-)

 jwb> And what is the developer expected to
 jwb> get from the FSF in return?

If you mean any monetary compensation: The FSF does not pay any money
for making things official GNU Projects - sometimes you'll find
announcements for special projects that might be funded by FSF money
but in general we won't pay an author to make something a GNU Project.

The other advantages are more than worth it in my eyes, though. You'll 
get an account on the GNU machines (together with a nifty 
"yourchoice@gnu.org" email address). You will be able to access GNU
internal mailinglists and your project can have its own homepage on
www.gnu.org. If you want you can get a GNU Mailinglist and/or
Newsgroup (gnu.your.project) for your program.

Your project will be on the GNU CD-ROMs and ftp.gnu.org plus all its
mirrors.

What I consider very pleasing myself is the feeling to have "given
back" something to the community that allowed me to use all this cool
software before. 

 jwb> Thank you for answering my questions!

No problem. Hope my answers helped you a bit.

Regards,
                Georg


-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

Studies show that 93% of all people are below average.
--

From bizzaro at bc.edu  Fri Mar 26 08:15:53 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] Re: Future of TULIP
References: <99032607283800.22282@mrnlinux.hg.med.umich.edu>
Message-ID: <36FB8889.73FD4F00@bc.edu>

Hi Matthew!

TULIP/Loci is very much alive and growing.  The lack of any signs of life is due
to my failure to keep updating the "old" site.  Maybe my excuse is that I have
been planning on a "new", dedicated server for the project, which BTW is being
set up now.  Another excuse may be that the project is being defined and
redefined so often that I can't put together a clear presentation.

What I don't mention on the old site is that we now have a mailing list.  To
subscribe, send an e-mail to:

    majordomo@busboy.sped.ukans.edu

with this text in the body of the message:

    subscribe tulip-list

And we have an archive for the mailing list:

    http://toaster.sped.ukans.edu/tulip-list/

Take a look at some of the recent discussions.

Your help would be very much appreciated!

BTW, could you give me some background on your interests in research and
experience in programming?


Jeff
bizzaro@bc.edu


"Matthew R. Nelson" wrote:
> 
> With your most recent news occurring within a one month period of time
> almost four months ago, it looks like TULIP is dying or dead.  Is this
> true?  If not, it is important to have your website at least give the
> appearance of a living beast, providing any updates in decisions, class
> definitions, prototypes, etc.  Such activity can help reassure people like
> me that it might be worth it to contribute to the effort.
> 
> I'll continue to revisit your pages for signs of life.
> 
> Regards,
> 
> Matt
> ----------------------------------------------------------------------------
> Matthew R. Nelson
> Dept. of Human Genetics             http://www-personal.umich.edu/~ticul/
> University of Michigan              email: ticul@umich.edu
> 4711 Medical Science II             phone: (734) 763-8090 or 647-3151
> Ann Arbor, MI  48109-0618           fax:   (734) 763-5477
> ----------------------------------------------------------------------------

-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

Studies show that 93% of all people are below average.
--

From bizzaro at bc.edu  Fri Mar 26 16:42:37 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] [Fwd: Future of TULIP]
Message-ID: <36FBFF4D.CE320E37@bc.edu>

Reply from Matthew...
-------------- next part --------------
An embedded message was scrubbed...
From: "Matthew R. Nelson" <mrn@supermta.hg.med.umich.edu>
Subject: Re: Future of TULIP
Date: Fri, 26 Mar 1999 08:15:27 -0500
Size: 3195
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990326/0d0f80c7/attachment.mht
From bizzaro at bc.edu  Fri Mar 26 17:53:54 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] an interesting thought
Message-ID: <36FC1002.63362451@bc.edu>

Locians,

It's strange that I just came across this article:

  http://www.news.com/News/Item/0%2C4%2C34314%2C00.html?dd.ne.txt.0326.04

Strange, because a couple days ago I was thinking about the future of GUI
development and the role of XML and the Internet.  I thought that the Web
browser of today may some day become so customizable that it will be a portable
GUI toolkit, with an Internet backbone.

But this is also the direction Loci is heading.

The article talks about using XUL (an XML) to provide the browser with GUI
information (buttons, etc.).  The problems are, (1) XUL would require an
enormous and complex DTD, and (2) the browser would need all of the widgets
built-in, ready to be called upon at runtime.

I realized these would be insurmountable problems for Loci, if it were to go
this route, simply because Loci is not the scale of Mozilla.

But what if the GUI information for Loci were included in the XML, as with XUL,
but

                      >>>as a Python-GTK script<<<

Yes, a functional program/module embedded in the XML.  Has anyone heard of this
being done before?  Try that with compiled binaries!

So each locus would be the same: just a shell that can process our XML (workflow
+ bio + GUI) and make an application on the fly.

Any thoughts?  What would be the advantages and disadvantages?


Jeff
bizzaro@bc.edu

From justin at ukans.edu  Sat Mar 27 04:37:13 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] an interesting thought
In-Reply-To: <36FC1002.63362451@bc.edu>
Message-ID: <Pine.OSF.4.03.9903270334020.4848-100000@busboy.sped.ukans.edu>

> So each locus would be the same: just a shell that can process our XML
> (workflow + bio + GUI) and make an application on the fly.
> 
> Any thoughts?  What would be the advantages and disadvantages?

The only problem I see is one of speed. However, if we had widgets along
the lines of render_3d_molecule, etc, it could work.

You don't mean the analysis tool generates the script, though, right?

Justin


From David.Lapointe at umassmed.edu  Sat Mar 27 09:34:54 1999
From: David.Lapointe at umassmed.edu (Lapointe, David)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] an interesting thought
Message-ID: <93307F07DE63D211B2F30000F808E9E525D739@edunivexch02.umassmed.edu>

Jeff,

This has been puzzling me for a while. What with the emphasis
toward using Python as a base language for Loci, what about Grail as a
browser/GUI ? I haven't tried this but apparently python scripts can be
downloaded and client side executed. <diatribe issue="What about
security?"/> 

Also, at the BAMBCT meeting Thursday, I brought up the issue of using Linux
to do Computational Biology. Lance thought a June meeting Show 'n Tell would
be interesting. There are *many* scientists who are using Linux scattered
around the Boston area. 

David

-----Original Message-----
From: J.W. Bizzaro
To: tulip-list
Sent: 3/26/99 5:53 PM
Subject: [Pipet Devel] an interesting thought

Locians,

It's strange that I just came across this article:

 
http://www.news.com/News/Item/0%2C4%2C34314%2C00.html?dd.ne.txt.0326.04

Strange, because a couple days ago I was thinking about the future of GUI
development and the role of XML and the Internet.  I thought that the Web
browser of today may some day become so customizable that it will be a portable
GUI toolkit, with an Internet backbone.

But this is also the direction Loci is heading.

The article talks about using XUL (an XML) to provide the browser with GUI
information (buttons, etc.).  The problems are, (1) XUL would require an
enormous and complex DTD, and (2) the browser would need all of the widgets
built-in, ready to be called upon at runtime.

I realized these would be insurmountable problems for Loci, if it were to go
this route, simply because Loci is not the scale of Mozilla.

But what if the GUI information for Loci were included in the XML, as with XUL,
but

                      >>>as a Python-GTK script<<<

Yes, a functional program/module embedded in the XML.  Has anyone heard of this
being done before?  Try that with compiled binaries!

So each locus would be the same: just a shell that can process our XML (workflow
+ bio + GUI) and make an application on the fly.

Any thoughts?  What would be the advantages and disadvantages?


Jeff
bizzaro@bc.edu

From bizzaro at bc.edu  Sat Mar 27 21:03:36 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] an interesting thought
References: <Pine.OSF.4.03.9903270334020.4848-100000@busboy.sped.ukans.edu>
Message-ID: <36FD8DF8.C6453A88@bc.edu>

Justin Bradford wrote:
> 
> > So each locus would be the same: just a shell that can process our XML
> > (workflow + bio + GUI) and make an application on the fly.
> >
> > Any thoughts?  What would be the advantages and disadvantages?
> 
> The only problem I see is one of speed. However, if we had widgets along
> the lines of render_3d_molecule, etc, it could work.

Yes, I was thinking about making high-level widgets (bindings to C-GTK
concoctions).  In fact, I have always considered Loci to be a "library" of
high-level biowidgets.

But I would make the basic widgets available too, probably through the
Python-GTK bindings.

I don't think it would be all that slow.  The tools/loci will be in Python
anyway.  As Python works, modules are "imported" from other files.  The script
would be in the LocusML like this:

<guiscript>
#!/usr/bin/env python

import sys
from Gtkinter import *
import GtkExtra

class Application:

From bizzaro at bc.edu  Sat Mar 27 21:08:44 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] an interesting thought
References: <Pine.OSF.4.03.9903270334020.4848-100000@busboy.sped.ukans.edu> <36FD8DF8.C6453A88@bc.edu>
Message-ID: <36FD8F2C.CEC4671A@bc.edu>

Hehehe.  The marker I put in at the end of my last message: "/guiscript" cut off
the message.

Here is the rest...

and perhaps the script is read in by the locus and then written to a temporary
file.  Once in that file, the locus can "import" the module as if it were always
available.
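
A minimal sketch of that step, written in present-day Python rather than the
Python of this thread.  Only the <guiscript> element comes from the
discussion; the <locus> and <biodata> elements, the describe() function, and
the load_guiscript() helper are assumptions made for illustration, and the
embedded script is kept GUI-free so the example runs anywhere.

```python
import importlib.util
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical LocusML document; only <guiscript> comes from the thread.
LOCUSML = """<locus>
  <biodata>ACGT</biodata>
  <guiscript><![CDATA[
def describe(sequence):
    return "sequence of length %d" % len(sequence)
]]></guiscript>
</locus>"""


def load_guiscript(locusml_text, module_name="guiscript"):
    """Write the embedded script to a temporary file and import it."""
    root = ET.fromstring(locusml_text)
    source = root.findtext("guiscript")
    if source is None:
        raise ValueError("no <guiscript> element found")
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(source)
        path = handle.name
    try:
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        # This executes the embedded code, so it is only safe for trusted input.
        spec.loader.exec_module(module)
    finally:
        os.remove(path)
    return module


root = ET.fromstring(LOCUSML)
gui = load_guiscript(LOCUSML)
print(gui.describe(root.findtext("biodata")))  # prints "sequence of length 4"
```

Wrapping the script in a CDATA section keeps characters such as "<" in the
Python source from being read as markup.  Importing the module runs whatever
the document contains, so nothing here addresses the Trojan Horse question.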

> 
> You don't mean the analysis tool generates the script, though, right?
> 
Not from scratch.  But the analysis tool can pick out a script from a
library/repository that is appropriate for the biodata in the XML.

IOW, we are looking at "browsing" data that contains its own instructions for
display and manipulation.  It may be somewhat akin to Javascript in an HTML
doc.  I think that's the best way to look at it.

But what does the user gain?  I think that, just as the original plan for Loci
was to allow the user to access analysis tools without having them all on the
user's machine, this would allow the user to access graphical tools the same
way.

What I think may go along with this very well (in fact may be required), is a
GUI builder (or Locus builder), sort of like a specialized Delphi.  This can
help developers create custom loci without having to get into too much of the
Python and LocusML.

Maybe this is very complicated, but I think it is the direction software
development is heading.  If we don't go this way now, we may see AOL Navigator
do the same stuff in 10 years.


Ciao,
Jeff
-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

Studies show that 93% of all people are below average.
--

From bizzaro at bc.edu  Sun Mar 28 04:08:51 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:25 2006
Subject: [Pipet Devel] an interesting thought
References: <93307F07DE63D211B2F30000F808E9E525D739@edunivexch02.umassmed.edu>
Message-ID: <36FDF1A3.53E95E0E@bc.edu>

"Lapointe, David" wrote:
> 
> This has been puzzling me for a while. What with the emphasis
> toward using Python as a base language for Loci, what about Grail as a
> browser/GUI ? I haven't tried this but apparently python scripts can be
> downloaded and client side executed.

An excellent point!  I have seen Grail before, and looking at it again, it does
do what I was talking about.  It would be a good exercise for everyone to
download it:

    http://grail.cnri.reston.va.us/grail/source/

Grail is able to run Python-Tk scripts in the browser window.  Check out these
demos:

    http://grail.cnri.reston.va.us/grail/demo/

Yes, Grail is 100% Python (with Tk) and a good model for Loci, with a couple
exceptions...

    (1) It uses Tkinter rather than GTK.
    (2) It is not GNU LGPL or GPL (but it is free and open source).

> <diatribe issue="What about security?"/>

Yes, this brings up the old Trojan Horse issue.  Grail seems to use the standard
"sandbox", and can be a model for Loci in security too.  But they're not
altogether certain about security either:

    http://grail.cnri.reston.va.us/grail/info/security.html

> 
> Also, at the BAMBCT meeting Thursday, I brought up the issue of using Linux
> to do Computational Biology. Lance thought a June meeting Show 'n Tell would
> be interesting. There are *many* scientists who are using Linux scattered
> around the Boston area.
> 
So, maybe I can show Loci if anything works by then ;-)

But of course Loci will be made to run on all flavors of UNIX, if we can help
it.


Jeff
-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

Studies show that 93% of all people are below average.
--

From bizzaro at bc.edu  Mon Mar 29 08:05:15 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] Open Labs
Message-ID: <36FF7A8B.B5BC430E@bc.edu>

Locians,

I uploaded the start of the new Web site for "Open Labs" and Loci:

    http://129.63.144.25/

Also, we will probably go with the name "Open Labs".  I have been communicating
with the person using openlab.org.  He is creating an organization for
open-source development, and I think there may be some confusion if we used
"Open Lab".  The plural form works well for us, since I was planning on having
several "labs", each with a different project.  The new server will likely be
named:

    openlabs.uml.edu


Jeff
-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

Studies show that 93% of all people are below average.
--

From David.Lapointe at umassmed.edu  Mon Mar 29 10:23:08 1999
From: David.Lapointe at umassmed.edu (Lapointe, David)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] an interesting thought
Message-ID: <93307F07DE63D211B2F30000F808E9E525D73A@edunivexch02.umassmed.edu>

What about Grail? That would allow embedded python scripts. Has anyone tried
that?

David

> -----Original Message-----
> From: J.W. Bizzaro [mailto:bizzaro@bc.edu]
> Sent: Friday, March 26, 1999 5:54 PM
> To: tulip-list
> Subject: [Pipet Devel] an interesting thought
> 
> 
> Locians,
> 
> It's strange that I just came across this article:
> 
>   http://www.news.com/News/Item/0%2C4%2C34314%2C00.html?dd.ne.txt.0326.04
> 
> Strange, because a couple days ago I was thinking about the future of GUI
> development and the role of XML and the Internet.  I thought that the Web
> browser of today may some day become so customizable that it will be a portable
> GUI toolkit, with an Internet backbone.
> 
> But this is also the direction Loci is heading.
> 
> The article talks about using XUL (an XML) to provide the browser with GUI
> information (buttons, etc.).  The problems are, (1) XUL would require an
> enormous and complex DTD, and (2) the browser would need all of the widgets
> built-in, ready to be called upon at runtime.
> 
> I realized these would be insurmountable problems for Loci, if it were to go
> this route, simply because Loci is not the scale of Mozilla.
> 
> But what if the GUI information for Loci were included in the XML, as with XUL,
> but
> 
>                       >>>as a Python-GTK script<<<
> 
> Yes, a functional program/module embedded in the XML.  Has anyone heard of this
> being done before?  Try that with compiled binaries!
> 
> So each locus would be the same: just a shell that can process our XML (workflow
> + bio + GUI) and make an application on the fly.
> 
> Any thoughts?  What would be the advantages and disadvantages?
> 
> 
> Jeff
> bizzaro@bc.edu
> 

From David.Lapointe at umassmed.edu  Mon Mar 29 10:34:48 1999
From: David.Lapointe at umassmed.edu (Lapointe, David)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] an interesting thought
Message-ID: <93307F07DE63D211B2F30000F808E9E525D73B@edunivexch02.umassmed.edu>

Has anyone benchmarked GTK+ ? At least for our purposes? Eric Harlow has
some cautionary things to say about real time performance ( in the games
chapter ).


> -----Original Message-----
> From: J.W. Bizzaro [mailto:bizzaro@bc.edu]
> Sent: Saturday, March 27, 1999 9:04 PM
> To: tulip-list@busboy.sped.ukans.edu
> Subject: Re: [Pipet Devel] an interesting thought
> 
> 
> Justin Bradford wrote:
> > 
> > > So each locus would be the same: just a shell that can process our XML
> > > (workflow + bio + GUI) and make an application on the fly.
> > >
> > > Any thoughts?  What would be the advantages and disadvantages?
> > 
> > The only problem I see is one of speed. However, if we had widgets along
> > the lines of render_3d_molecule, etc, it could work.
> 
> Yes, I was thinking about making high-level widgets (bindings to C-GTK
> concoctions).  In fact, I have always considered Loci to be a "library" of
> high-level biowidgets.
> 
> But I would make the basic widgets available too, probably through the
> Python-GTK bindings.
> 
> I don't think it would be all that slow.  The tools/loci will be in Python
> anyway.  As Python works, modules are "imported" from other files.  The script
> would be in the LocusML like this:
> 
> <guiscript>
> #!/usr/bin/env python
> 
> import sys
> from Gtkinter import *
> import GtkExtra
> 
> class Application:
> 

From hortiz at neurobio.upr.clu.edu  Mon Mar 29 12:27:24 1999
From: hortiz at neurobio.upr.clu.edu (Humberto Ortiz Zuazaga)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] I'd like to contribute.
Message-ID: <199903291727.NAA01917@chimbo.neurobio.upr.clu.edu>

Hello, I'm Humberto Ortiz-Zuazaga, I work with the University of Puerto Rico's 
Institute of Neurobiology:

http://www.neurobio.upr.clu.edu/

I've been given the go ahead to develop sequence analysis tools for a 
molecular biology core facility being set up here.

I'd like to contribute to the Loci project, as it looks like you've designed 
the very system I'd like to create (by the way, where is the design document?  
I read it friday night, but it's not on the new web site yet).

I've got a couple of years experience developing sequence analysis and genetic 
mapping tools, some of which are available on the web:

http://www-bio.cnnet.clu.edu/analysis/ (when it's up)
http://www.neurobio.upr.clu.edu/~hortiz/cmb/tkmap/
http://www.neurobio.upr.clu.edu/~hortiz/cmb/bpe/

I've dabbled a little in python, and done no gnome programming, but I agree 
with both these choices for an analysis GUI.  I've gotten the python-gnome 
package set up on my machine, and looked over the mailing list archives (and 
subscribed).

I look forward to contributing.

-- 
Humberto Ortiz Zuazaga
Bioinformatics Specialist
Institute of Neurobiology
hortiz@neurobio.upr.clu.edu


From rahul at photino.sid.rice.edu  Mon Mar 29 15:35:51 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] an interesting thought
In-Reply-To: <93307F07DE63D211B2F30000F808E9E525D73B@edunivexch02.umassmed.edu>
Message-ID: <Pine.LNX.4.10.9903291434520.7830-100000@photino.sid.rice.edu>

On Mon, 29 Mar 1999, Lapointe, David wrote:

> Has anyone benchmarked GTK+ ? At least for our purposes? Eric Harlow has
> some cautionary things to say about real time performance ( in the games
> chapter ).
> 

I use GNOME all the time. It's reasonably fast on a P150/32MB RAM as long
as you're using a new version.

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.

From bizzaro at bc.edu  Mon Mar 29 21:22:05 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] I'd like to contribute.
References: <199903291727.NAA01917@chimbo.neurobio.upr.clu.edu>
Message-ID: <3700354D.3D6B7A70@bc.edu>

Hello Humberto!

Humberto Ortiz Zuazaga wrote:

> I've been given the go ahead to develop sequence analysis tools for a
> molecular biology core facility being set up here.
> 
> I'd like to contribute to the Loci project, as it looks like you've designed
> the very system I'd like to create

We'd love to have you help.  We want to develop both a system for networking
biotools and a set of basic biotools (loci).

> (by the way, where is the design document?
> I read it friday night, but it's not on the new web site yet).

I don't have anything called a "design document".  It could have been a few
things.  Did you see it on the mailing list archive or the old Web site?

The new site is just starting to come together, so it is missing quite a bit. 
But as far as an actual design is concerned, it has been changing lately, and
the best way to tell what we're going to do (until the Web site is ready) is to
read the mailing list archive:

    http://toaster.sped.ukans.edu/tulip-list/

> 
> I've got a couple of years experience developing sequence analysis and genetic
> mapping tools, some of which are available on the web:
> 
> http://www-bio.cnnet.clu.edu/analysis/ (when it's up)
> http://www.neurobio.upr.clu.edu/~hortiz/cmb/tkmap/
> http://www.neurobio.upr.clu.edu/~hortiz/cmb/bpe/

I took a look at the sites.  You seem very capable.  I like the "TkMap", since
we will want a genome map viewer, and the sequence viewers will show a map as
well.

> 
> I've dabbled a little in python, and done no gnome programming, but I agree
> with both these choices for an analysis GUI.  I've gotten the python-gnome
> package set up on my machine, and looked over the mailing list archives (and
> subscribed).

You may want to check out my PyG Tools Web site:

    http://www.uml.edu/Dept/Chem/BICGroup/PyGTools/

> 
> I look forward to contributing.
> 

Great!

I will make an account for you on our Linux box:

    129.63.144.25

And I will soon send out a list of loci that need developers.


Jeff
-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

I have always appreciated your ability to ________, whenever
there has been a blank to fill.
--

From hortiz at neurobio.upr.clu.edu  Mon Mar 29 22:14:11 1999
From: hortiz at neurobio.upr.clu.edu (Humberto Ortiz Zuazaga)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] Old design document.
In-Reply-To: Your message of "Tue, 30 Mar 1999 02:22:05 GMT."
             <3700354D.3D6B7A70@bc.edu> 
Message-ID: <199903300314.XAA03604@chimbo.neurobio.upr.clu.edu>

> > (by the way, where is the design document?
> > I read it friday night, but it's not on the new web site yet).
> 
> I don't have anything called a "design document".  It could have been a few
> things.  Did you see it on the mailing list archive or the old Web site?

Old web site.  A page describing how loci would use XML to pass
results back and forth.  I've been reading the list archives, so I see
most of the design has changed (mostly for the better).

> But as far as an actual design is concerned, it has been changing lately

I've noticed, I'll work from the glossary then.  You should put that
up on the web site.

From bizzaro at bc.edu  Mon Mar 29 22:45:00 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] Old design document.
References: <199903300314.XAA03604@chimbo.neurobio.upr.clu.edu>
Message-ID: <370048BC.E07E1259@bc.edu>

Humberto Ortiz Zuazaga wrote:
> 
> > > (by the way, where is the design document?
> > > I read it friday night, but it's not on the new web site yet).
> >
> > I don't have anything called a "design document".  It could have been a few
> > things.  Did you see it on the mailing list archive or the old Web site?
> 
> Old web site.  A page describing how loci would use XML to pass
> results back and forth.  I've been reading the list archives, so I see
> most of the design has changed (mostly for the better).

Okay.  That page is gone and pretty out of date.  I have a copy of it, if you'd
like to see it again.

> 
> > But as far as an actual design is concerned, it has been changing lately
> 
> I've noticed, I'll work from the glossary then.  You should put that
> up on the web site.

Yep.  That's going up soon.


Jeff
-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

I have always appreciated your ability to ________, whenever
there has been a blank to fill.
--

From bizzaro at bc.edu  Tue Mar 30 00:43:12 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] new stuff up
Message-ID: <37006470.39DFE0B2@bc.edu>

Locians,

I just posted an updated glossary.  Note that the biggest changes from the last
version are (1) tools == loci == clients (there is no distinction now) and (2) I
added the "FigureBuilder".

    http://129.63.144.25/loci/docs/gloss.html

Also, I posted some developer bios.  Let me know if you have anything to
change.  I don't have any bio for Greg...Hey Greg!

    http://129.63.144.25/loci/devel.html


Jeff
-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

I have always appreciated your ability to ________, whenever
there has been a blank to fill.
--

From bizzaro at bc.edu  Tue Mar 30 01:26:26 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] Open Labs
References: <36FF7A8B.B5BC430E@bc.edu>
Message-ID: <37006E92.7B61EA4@bc.edu>

Nah!  The heck with it.  I like "The Open Lab".  So the server will be...

    bioinformatics.org

And I will register...

    theopenlab.org
    theopenlab.net


Jeff


"J.W. Bizzaro" wrote:
> 
> Locians,
> 
> I uploaded the start of the new Web site for "Open Labs" and Loci:
> 
>     http://129.63.144.25/
> 
> Also, we will probably go with the name "Open Labs".  I have been communicating
> with the person using openlab.org.  He is creating an organization for
> open-source development, and I think there may be some confusion if we used
> "Open Lab".  The plural form works well for us, since I was planning on having
> several "labs", each with a different project.  The new server will likely be
> named:
> 
>     openlabs.uml.edu
> 
> Jeff
> --
> J.W. Bizzaro                  mailto:bizzaro@bc.edu
> Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/
> 
> Studies show that 93% of all people are below average.
> --

-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

I have always appreciated your ability to ________, whenever
there has been a blank to fill.
--

From justin at ukans.edu  Tue Mar 30 19:41:52 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] SooHaeng joins
In-Reply-To: <37017261.ED400ECB@bc.edu>
Message-ID: <Pine.OSF.4.03.9903301841010.15365-100000@busboy.sped.ukans.edu>

> He is using Mesa right now, though, while you said you are using
> OpenGL.  Are there any compatibility problems?

Mesa is an OpenGL compatible library, so theoretically, there shouldn't
be any compatibility problems ;)

Justin Bradford
justin@ukans.edu


From bizzaro at bc.edu  Tue Mar 30 19:54:57 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] SooHaeng joins
Message-ID: <37017261.ED400ECB@bc.edu>

I recruited SooHaeng (and I guess his "partner"), who I found were in the middle
of developing an MD modeler with GTK:

    http://dnd98.freeservers.com/

Greg, I think that SooHaeng and his program will complement your development of
a rendering engine.  Perhaps you can help add different representations (cartoon
views, etc.).  He is using Mesa right now, though, while you said you are using
OpenGL.  Are there any compatibility problems?

SooHaeng's e-mail is attached.


Jeff
-------------- next part --------------
An embedded message was scrubbed...
From: yoo@theoalpha.korea.ac.kr
Subject: Thank you.
Date: Mon, 29 Mar 1999 13:26:19 +0900
Size: 1400
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990331/930a36ce/attachment.mht

From bizzaro at bc.edu  Wed Mar 31 04:33:54 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] an interesting thought
References: <93307F07DE63D211B2F30000F808E9E525D73B@edunivexch02.umassmed.edu>
Message-ID: <3701EC02.8C762EF7@bc.edu>

"Lapointe, David" wrote:
> 
> Has anyone benchmarked GTK+ ? At least for our purposes? Eric Harlow has
> some cautionary things to say about real time performance ( in the games
> chapter ).
> 

Just qualitatively, it's one of the fastest GUIs for Linux I've seen.  And
PyGTK beats Tkinter hands down, AFAIC.


Jeff
-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

I have always appreciated your ability to ________, whenever
there has been a blank to fill.
--

From bizzaro at bc.edu  Wed Mar 31 06:56:07 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] new glossary
Message-ID: <37020D57.4181F4FB@bc.edu>

A new glossary is up:

    http://129.63.144.25/loci/docs/gloss.html

I added a number of things we've been talking about lately:

    Locus PhyE     -     Phylogenic Editor

    Locus PhyV     -     Phylogenic Viewer

    Techie         -     (as in "lab technician") the name for the daemon
                         that builds a database of what loci are available
                         and what they can do

    Locus IAB & CAB  -   the application brokers (IAB == Gatekeeper)

Also, thinking about the relationship between viewers and editors, I think we
need to make it clear that editors are attached to or dependent on viewers.  For
example, if the user wants to edit a nucleotide sequence, the Benchtop doesn't
call up a sequence editor; it calls up a sequence viewer, and the viewer calls
the editor.

Why is this important?  The viewer manages the display of all the biological
data, so the user should be able to see changes made by the editor.  This
allows the editor to be simpler.  Also, the editor can do without the code to
manage workflow, since it will only communicate with the viewer.  Make sense?
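
A minimal Python sketch of that arrangement, just to make the call chain
concrete.  All class and method names here (Benchtop, SequenceViewer,
SequenceEditor, open_viewer, open_editor) are hypothetical and not part of any
existing Loci code, and a plain list stands in for the real display:

```python
class SequenceEditor:
    """Hypothetical editor: changes the data and reports only to its viewer."""

    def __init__(self, viewer):
        self.viewer = viewer

    def replace(self, start, end, new_bases):
        # Build the edited sequence and hand it back to the viewer.
        seq = self.viewer.sequence
        self.viewer.update(seq[:start] + new_bases + seq[end:])


class SequenceViewer:
    """Hypothetical viewer: owns the display and is the one that calls the editor."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.rendered = []  # stand-in for a real display

    def render(self):
        self.rendered.append(self.sequence)

    def update(self, sequence):
        # Every change made by the editor is shown right away.
        self.sequence = sequence
        self.render()

    def open_editor(self):
        return SequenceEditor(self)


class Benchtop:
    """Hypothetical benchtop: it only ever calls up viewers, never editors."""

    def open_viewer(self, sequence):
        viewer = SequenceViewer(sequence)
        viewer.render()
        return viewer


benchtop = Benchtop()
viewer = benchtop.open_viewer("GATTACA")
editor = viewer.open_editor()      # the viewer, not the Benchtop, calls the editor
editor.replace(0, 3, "CCC")
print(viewer.sequence)
```

Note that the editor holds no display or workflow code at all; it only knows
about the viewer that opened it.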


Cheers,
Jeff
-- 
J.W. Bizzaro                  mailto:bizzaro@bc.edu
Boston College Chemistry      http://www.uml.edu/Dept/Chem/Bizzaro/

I have always appreciated your ability to ________, whenever
there has been a blank to fill.
--

From rahul at photino.sid.rice.edu  Wed Mar 31 14:23:57 1999
From: rahul at photino.sid.rice.edu (Rahul Jain)
Date: Fri Feb 10 19:18:26 2006
Subject: [Pipet Devel] an interesting thought
In-Reply-To: <3701EC02.8C762EF7@bc.edu>
Message-ID: <Pine.LNX.4.10.9903311320300.1418-100000@photino.sid.rice.edu>

On Wed, 31 Mar 1999, J.W. Bizzaro wrote:

> Just qualitatively, it's one of the fastest GUIs for Linux I've seen.  And
> PyGTK beats Tkinter hands down, AFAIC.

Absolutely. The real problem w/ Tk is that it is layered on top of so many
other toolkits. GTK talks to GDK which talks directly to the X server.

Although it's not that fast with the pixmap theme... :)

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.210000101.23.50110101.042
   (c)1996-1999, All rights reserved. Disclaimer available upon request.