From justin at ukans.edu  Wed Dec  1 00:30:15 1999
From: justin at ukans.edu (Justin Bradford)
Date: Fri Feb 10 19:18:59 2006
Subject: [Pipet Devel] python-bonobo (was Re: desktop as...)
In-Reply-To: <Pine.OSF.4.03.9911302101180.14297-100000@busboy.sped.ukans.edu>
Message-ID: <Pine.OSF.4.03.9911302237480.14297-100000@busboy.sped.ukans.edu>

> It is my understanding that while bonobo uses ORBit, it is just a library.
> If I am correct, non-elegant bindings should be relatively easy. By
> "non-elegant", I mean it the Python interface would feel more C-ish than
> Python-ish.

Ok, upon reviewing the documentation, I was way off before. However, one
could still contain all of the ORBit code within the C stubs, and have
only the Embeddable, Container, View, ViewFrame, ClientSite, etc object
interfaces exposed in Python. So, ORBit bindings would not be necessary,
and it would make components/containers pretty easy to write in Python.

Anyone know if James Henstridge has started/implemented any of the bonobo
stuff?

Justin


From bizzaro at geoserve.net  Wed Dec  1 02:26:02 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] Databases/Languages (was New guy speaks up)
References: <l03130301b469420087ec@[172.16.0.2]> <l03130307b46a371421a1@[172.16.0.2]>
Message-ID: <3844CD8A.51BB0253@geoserve.net>

Brad Chapman wrote:
> 
> Gary Van Domselaar wrote:
> > Loci as a Graphical Shell/ Graphical
> >Scripting language with a database 'locus', but the actual database, and
> >data model used to store the sequence data (and annotations) would be an
> >'option' depending on what the developers have provided for loci.  so
> >there may be a relational database, an object database etc.,
>
>         Based on this thinking and the 'plug-in' idea mentioned so often,
> why not implement a database as a loci? (maybe this could be some kind of
> derivative of the container?). This way, the user can use or not use a
> database depending on their work with Loci.

I think you said just what Gary did: a container is a locus that is a type of
a database.

>         For instance, if I am using Loci to pull single files from the PDB
> and pipe them into RasMol for viewing, it would be really stupid to have an
> intermediate step where I stick a single object into a database. By
> contrast, if I am parsing the current UniGene text file (100MB), it  is
> crazy to not have some structured way to store this. I mean, I would be
> none too happy a user if I watched my computer spend an hour parsing a huge
> document and another several BLASTing the results and then had the computer
> crash losing all of my data. A 'plug-in' database loci could serve as the
> storage for huge data files--allowing data backup and easy access to the
> important parts of the data (sequences in this case). Is this the kind of
> plan everyone was thinking of?

Yep, you got it!  The plan is, you can have any sort of intermediate between
data and processor, depending on the needs of the processor (and your needs,
if you're developing new extensions for Loci).

> I agree completely. A database is still a data intermediate, but I think it
> at least has the advantages of: 1) being readily storable 2) allowing
> specific parts of the data to be individually queried. 3) being flexible
> enough to allow a wide variety of data types to be stored without data loss.

Those are some good points we can use for our documentation.

> Gary Van Domselaar wrote:
> >Loci's own database requirements may not be so
> >much for sequence storage as much as it is for things like the container
> >locus, which is a queriable locus that contains other loci.
> 
> One thing I'm not clear about--how does all this relate to the idea of the
> container and storing loci? I guess I'm not clear about exactly what it
> means to store a converter loci, for instance? Even more confusing to me,
> how do you query a container locus? How does this relate with storing
> actual data? Confusion, confusion over here!

You simply have to define the word 'database' rather broadly.  Literally
ANYTHING that can store information in a queriable fashion is a 'database',
for our purposes.  But we'll just use the word 'container' to keep the
language lawyers at bay.

Does a filesystem 'store information in a queriable fashion'?  Yes.  Now this
is where things get interesting: In Loci, you can open a container that
represents a filesystem directory.  Since loci (data, programs, etc.) are
individual files in a directory, they can be thought of as being stored in a
container.  Subdirectories are then container loci within container loci.  BUT
KEEP IN MIND: This is ONE type of container.  Not all container loci represent
filesystem directories.

> I will agree that I would rather not mess around with two languages/two
> interpreters and all of that jazz. No fun! However there are a number of
> things that are implemented in perl currently that are not available in
> python. For instance, the bioperl modules. Although the biopython project
> is dealing with building the same functionalities in python they currently
> have no code (and the list has been relatively silent!). In addition, a
> number of excellent programmers are coding in perl and so there are a lot
> of good scripts/code available. Should all perl scripts either: a) be run
> through CORBA to be used with Loci? or b) have to be reimplemented in
> python to be used with Loci?

This is something we can't expect for ANY program ported to Loci: to be
modified, whether it is by making it use CORBA or by translating it to
Python.  The only things that should require compliance with a Loci
specification for interoperability are (1) GUI widgets, (2) programs that need
direct access or control of Loci internals, (3) wrappers.  THESE will use
CORBA (especially if not written in Python) and/or Python.

> For example, how are we planning on connecting
> to AceDB servers--rewriting AcePerl or running it through CORBA? I think
> our disadvantage if we try and rewrite everything is that we can't take
> full advantage of work done in other languages and have to work really hard
> to keep the python implementation "up to date" with the perl. In addition,
> there is an unhealthy competition between perl and python for programmer
> time, regardless of which is a "better" language.

As I mention above, all this will be done via a large variety of wrapppers. 
But for the most part, if it runs on the command line, the Workspace will
allow the user to make his/her own wrappers.  See an earlier post to the list
about 'constructing the command line' or something like that.  It should be
part of the documentation.  Let me know if you can't find it in the archives.

> I mention above the kind of things I had in mind. Specifically, where is
> the point where it takes more effort to connect things with CORBA then it
> does to rewrite them in python? What should be rewritten and what should be
> connected? From what I've read, gnome development (which you all seem to be
> wisely following closely!) seems to do a good job of making CORBA tie a lot
> together, but I'm not completely positive how this all can translate to
> Loci. Once again, confusion is overwhealming me!

Just remember: CORBA and/or Python for

    (1) Widgets
    (2) Low-level customization of Loci
    (3) Wrappers

Gary described this as 'middleware', which is a good way to think of it.

    Front-endware: Loci's Workspace/GUI
    Middleware: CORBA and/or Python for points mentioned above
    Back-endware: Bioinformatics apps, unmodified


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From bizzaro at geoserve.net  Wed Dec  1 02:58:31 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] OMG approves new, revised CORBA specs
Message-ID: <3844D527.26ED8A74@geoserve.net>

FYI:

  http://news.cnet.com/news/0-1003-200-1472911.html


Jeff


From bizzaro at geoserve.net  Wed Dec  1 03:40:08 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] python praise
Message-ID: <3844DEE8.9BB875F6@geoserve.net>

Bruce Eckel, who has written articles and books on Java and C++, has some
surprisingly nice things to say about Python in his article at Borland.com:

--------------------------
Where does Python fit? Everywhere else. It's both a
programming language and a scripting language, but it's very
nicely object-oriented from the ground up, easy to learn and
use. In fact, I think it could be the ideal beginner's language.
You can write command-line programs and GUI programs.
You can write programs to test your design, then re-code the
programs in C++ or Java once you've gotten the kinks out. But
to me the key is productivity. I seem to be able to develop
programs 10 times faster than in C++ or Java, and for that
reason I'm willing to write programs in Python that I wouldn't
trouble myself with in other languages, simply because using
those languages would take too long. Although many programs
for Linux will be written in Java or C++, there will be lots of
smaller solutions as well because of Python. Perl, Tcl/TK, and
Rebol will also be used, but I don't think those languages
scale as well as Python. Nor is the code they produce as
maintainable, which means they won't be as heavily used in
the end. 

http://community.borland.com/devnews/article/1,1714,20173,00.html
--------------------------

Bruce's publications:

--------------------------
Since 1986, Bruce Eckel (www.BruceEckel.com) has published over 150
computer articles and 6 books, four of which were on C++, and
given hundreds of lectures and seminars throughout the world. He is
the author of Thinking in Java (Prentice-Hall 1998, freely available
at www.BruceEckel.com; 2nd edition in progress on the Web site),
the Hands-On Java Seminar CD ROM (available at
www.BruceEckel.com), Thinking in C++ (Prentice-Hall, 1995; 2nd
edition in progress on the Web site), C++ Inside & Out
(Osborne/McGraw-Hill 1993; the 2nd edition of Using C++,
Osborne/McGraw-Hill 1989) and was the editor of the anthology
Black Belt C++ (M&T/Holt 1994). He was a founding member of the
ANSI/ISO C++ committee. He speaks regularly at conferences and
is the track chair for both C++ and Java at the Software
Development conference.
--------------------------


Jeff


From bizzaro at geoserve.net  Wed Dec  1 03:53:53 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] [Fwd: [Pipet Devel] constructing the command-line]
Message-ID: <3844E221.E114B259@geoserve.net>

Just for kicks, I'm reposting my June message about 'constructing the
command-line' (well, and because I mentioned it to Brad).  Note that I refer
to 'our own' XML for bioinformatics + Loci internals: LocusML.  The plans for
a LocusML have changed a bit since then.   Jeff
-------------- next part --------------
An embedded message was scrubbed...
From: "J.W. Bizzaro" <bizzaro@bc.edu>
Subject: [Pipet Devel] constructing the command-line
Date: Sun, 06 Jun 1999 12:21:00 +0000
Size: 6694
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19991201/bf9ec5da/attachment.mht
From dlapointe at mediaone.net  Wed Dec  1 06:53:25 1999
From: dlapointe at mediaone.net (David Lapointe)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] [Fwd: [Pipet Devel] constructing the command-line]
In-Reply-To: <3844E221.E114B259@geoserve.net>
References: <3844E221.E114B259@geoserve.net>
Message-ID: <99120107414600.00536@gnomen>

On Wed, 01 Dec 1999, J.W. Bizzaro wrote:
> 
> Just for kicks, I'm reposting my June message about 'constructing the
> command-line' (well, and because I mentioned it to Brad).  Note that I refer
> to 'our own' XML for bioinformatics + Loci internals: LocusML.  The plans for
> a LocusML have changed a bit since then.   Jeff

I am glad Jeff reposted this.  I have been creating perl CGI interfaces to EMBOSS programs. 
I was writing to Jeff about this and how it would be great to parse the *.acd files for each program ( these define the 
input and output data types, which are required, the data ranges, etc) into a GUI interface. This might be similar to 
GDE but Glade seems very promising. Alternatively, for a loci interface, parsing the *.acd files might generate 
a series of linked loci.  One hassle with doing this is the acd interface will change, incrementally ( see below).

As an aside on the internal data representation,  you could either have one or not, similar to what Brad just 
mentioned about  using databases. Personally I think  format conversions are too lossy wrt  annotations. Also, short 
of rewriting (almost) every application outside of loci, you would need to deal with format conversions at some point.  

The EMBOSS list has interesting thread going about protein sequences with very high ATCG content, so they must 
be forced to protein type otherwise the program thinks they are nucleic acids. The issue is adding a new 
flag for this forcing, what will be the flags name.  The diversity of opinion on this issue is heartening.  BLAST for example
does this up front. You have to tell the program what type you have. Other programs tag sequences at the top with their type,
but that would involve changing the databases, to create a new data format, like FBF.

-- 
 .david
 David Lapointe
"The meek will inherit the earth," noted tycoon 
J. Paul Getty. "But not the mineral rights."


From bizzaro at geoserve.net  Wed Dec  1 13:09:40 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] [Fwd: [Pipet Devel] constructing the command-line]
References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen>
Message-ID: <38456464.56B2E0C0@geoserve.net>

David Lapointe wrote:
> 
> I am glad Jeff reposted this.

And I'm glad that you're glad.

> I have been creating perl CGI interfaces to EMBOSS programs.
> I was writing to Jeff about this and how it would be great
> to parse the *.acd files for each program ( these define the
> input and output data types, which are required, the data
> ranges, etc) into a GUI interface. This might be similar to
> GDE but Glade seems very promising. Alternatively, for a loci
> interface, parsing the *.acd files might generate
> a series of linked loci.

...which can be combined into one composite locus.

> One hassle with doing this is the
> acd interface will change, incrementally ( see below).

Will it change because the entire interface is still under development, or
because individual programs will require changes to their *.acd files?

> As an aside on the internal data representation,  you could
> either have one or not, similar to what Brad just
> mentioned about  using databases. Personally I think  format
> conversions are too lossy wrt  annotations. Also, short
> of rewriting (almost) every application outside of loci, you
> would need to deal with format conversions at some point.

Again, we can promote something as our 'preferred format' and use it as an
intermediate in format conversions.  Just because we don't hard-code a data
format into Loci, it doesn't mean we can't push for some new standard.  I've
heard some interesting ideas for a universal bioinformatics XML.  Peter
Murray-Rust even started a mailing list to promote the development of an
_open_ standard for such a beast.  But the list now seems dead.  If some Lab
Rats want to start an effort here, I'm all for it.

> The EMBOSS list has interesting thread going about protein
> sequences with very high ATCG content, so they must
> be forced to protein type otherwise the program thinks they
> are nucleic acids. The issue is adding a new flag for this
> forcing, what will be the flags name.  The diversity of
> opinion on this issue is heartening.  BLAST for example
> does this up front. You have to tell the program what type you
> have. Other programs tag sequences at the top with their type,
> but that would involve changing the databases, to create a new
> data format, like FBF.

Yeah, I've been following the EMBOSS list.  It's funny that some programs
'assume' you are using a certain type of data.  And the same goes for data
formats.  How hard is it to have one word to say what it is you're dealing
with?

    <dna>
      GCATAAGCATGCAGATC
    </dna>

    <protein>
      ACGATCATCAGCATCAG
    </protein>

I had a problem like this with GenBank once.  You might think GenBank has all
the descriptors needed to annotate a nucleotide sequence.  But...hmmm...where
did that DNA come from anyway?  The nucleus?  The mitochondria?  The
chloroplasts?  There's no descriptor for that!!!


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From dlapointe at mediaone.net  Wed Dec  1 20:44:26 1999
From: dlapointe at mediaone.net (David Lapointe)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] [Fwd: [Pipet Devel] constructing the command-line]
In-Reply-To: <38456464.56B2E0C0@geoserve.net>
References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net>
Message-ID: <99120122172000.00534@gnomen>

> > Alternatively, for a loci
> > interface, parsing the *.acd files might generate
> > a series of linked loci.
> 
> ...which can be combined into one composite locus.
> 

Yes, that would be the idea. The composite locus would encapsulate the I/O and 
parameters.   

> > One hassle with doing this is the
> > acd interface will change, incrementally ( see below).
> 
> Will it change because the entire interface is still under development, or
> because individual programs will require changes to their *.acd files?

Well, it seems like anything  that has a version 0.0.4 will change 8-).  But I would imagine
that before all is said and done it will be different.  I am not doing justice to the acd scheme, perhaps 
because I am trying to use it in a different way.  

> Again, we can promote something as our 'preferred format' and use it as an
> intermediate in format conversions.  Just because we don't hard-code a data
> format into Loci, it doesn't mean we can't push for some new standard.  I've
> heard some interesting ideas for a universal bioinformatics XML.  Peter
> Murray-Rust even started a mailing list to promote the development of an
> _open_ standard for such a beast.  But the list now seems dead.  If some Lab
> Rats want to start an effort here, I'm all for it.

I think there are two things here to consider. First, if you are going from genbank to 
fasta, why have an intermediate format?  Second, if you were going to write de novo 
some analysis program to work with loci, what format would you use? If you could settle on that,
that would be the internal format, which might not be a format at all but rather a sequence object.

> Yeah, I've been following the EMBOSS list.  It's funny that some programs
> 'assume' you are using a certain type of data.  And the same goes for data
> formats.  How hard is it to have one word to say what it is you're dealing
> with?

Some programs work with both Nucs and Prots,  FASTA, BLAST, CLUSTAL to name a few.  I think historically
someone thought it was a good idea to consider sequences with >80% AT(U)CG as nucleic acids, of course
that has problems right away, just like 99 0r 88.
 
>     <dna>
>       GCATAAGCATGCAGATC
>     </dna>
> 
>     <protein>
>       ACGATCATCAGCATCAG
>     </protein>
> 

Heh heh or ATCGRTSNRYTACG.

> I had a problem like this with GenBank once.  You might think GenBank has all
> the descriptors needed to annotate a nucleotide sequence.  But...hmmm...where
> did that DNA come from anyway?  The nucleus?  The mitochondria?  The
> chloroplasts?  There's no descriptor for that!!!

There is but someone has to annotate that section.  Check out Sequin on the NCBI site. There is a section for location
of the sequence ( genomic, mitochondrial, ...).  Or check out seq.asn  in the NCBI toolkit.      


> 
> Cheers.
> Jeff

-- 
 .david
 David Lapointe
"Hokey religions and ancient weapons are no 
match for a good blaster at your side, kid,"


From stein at fmppr.fmnh.org  Thu Dec  2 11:03:37 1999
From: stein at fmppr.fmnh.org (J. Steinbachs)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] databases
References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen>
Message-ID: <38469859.2F2F8185@fmnh.org>

 
Hey all...

I went to an interesting seminar yesterday at U of Chicago.  Susan
Davidson (co-director, Center for Bioinformatics, UPenn) gave a talk on
"Refreshing the Tower of Babel." 

Caveat: I know very little about databases.

The application: EpoDB, a database created at UPenn Center for
Bioinformatics, designed to study gene regulation during differentiation
and development of vertebrate red blood cells.

The problems:  extracting data from a sorts of databases with different
underlying structures; cleansing the data (error removal); integration;
annotation; updating (particularly, updating without losing the
information added/removed during data cleansing).

I guess Susan is a strong proponent in the DB field for complex value
databases (blah blah blah ginger... don't ask me what those are). 
However, for this problem, she and her colleagues have chosen to use
XML, modifying it a bit into something they call WHAX.

The data can be represented as a "WHAX tree", with the tag representing
the branches and the tag value representing the node.  Additions to the
a subset of the data can be integrated into the larger database by
simple manipulations of WHAX trees.

I originally went because of the application to genetic data.  But then
I got sidetracked...  Here at the Museum, we have specimen data (21+
million specimens in total) in which species names change, higher
taxonomic information changes, and so on, all of which should be tracked
within the database.  In some cases, we are integrating the traditional
genetic data into our specimen databases; i.e., in newer portions of our
collection of specimens, we have a one-to-one correspondence between the
dead dried pressed plant (or the stuffed animal and corresponding
skeleton), the DNA extracted from said plant (or animal), and a record
in our developing databases (birds are separate from plants are separate
from fishes...).  The computer scientists were intrigued by this type of
data :)  This WHAX "thing" would be perfect for tracking all that
information.

Perhaps "bioinformatics" is currently too narrowly defined (organisms
have more characteristics about them than just their DNA).  If we, the
community of manipulators of biological data, do come up with an open
standard for representing said data, that standard should be flexible
enough to encompass all the characteristics about the organisms.  And,
in light of all the stupid patenting going on, perhaps an open standard
is needed before big bad multinational corporation patents it first.

Just a few thoughts...
-jennifer

--------------------------
J. Steinbachs, PhD
Computational Biologist
Dept of Botany
The Field Museum
Chicago, IL 60605-2496

office: 312-665-7810
fax: 312-665-7158
--------------------------


From mangalam at home.com  Thu Dec  2 13:44:11 1999
From: mangalam at home.com (Harry Mangalam)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] databases
References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org>
Message-ID: <3846BDFB.8AA9F4B9@home.com>

That's interesting.  One of the things I'm working on (just about the only
thing I'm working on, it seems) is a gene expression database that will
support multiple species as well as multiple technologies (glasss
microarrays, Affy chips, AFLP, SAGE, etc).  As you might imagine, it;s one
thing to store multiple species information; it's quite another ot try to
make it interpretable and queriable and expect to get anything sensible
back.

We're currently using the NCBI taxonomy tree and looking forward to seeing
more effort thru the Gene Ontology project, but it sounds like this might
be a  more flexible solution if you can do dynamic modifications to the
tree by manipulating these WHAX trees.

I'm off to check out her site, but it's unresponsive right now:
http://cbil.humgen.upenn.edu/epodb/epodb.html

How are you currently representing this problem at the Field Museum? 
Especially the dynamic nature of the problem?

Cheers
Harry


"J. Steinbachs" wrote:
> 
> 
> Hey all...
> 
> I went to an interesting seminar yesterday at U of Chicago.  Susan
> Davidson (co-director, Center for Bioinformatics, UPenn) gave a talk on
> "Refreshing the Tower of Babel."
> 
> Caveat: I know very little about databases.
> 
> The application: EpoDB, a database created at UPenn Center for
> Bioinformatics, designed to study gene regulation during differentiation
> and development of vertebrate red blood cells.
> 
> The problems:  extracting data from a sorts of databases with different
> underlying structures; cleansing the data (error removal); integration;
> annotation; updating (particularly, updating without losing the
> information added/removed during data cleansing).
> 
> I guess Susan is a strong proponent in the DB field for complex value
> databases (blah blah blah ginger... don't ask me what those are).
> However, for this problem, she and her colleagues have chosen to use
> XML, modifying it a bit into something they call WHAX.
> 
> The data can be represented as a "WHAX tree", with the tag representing
> the branches and the tag value representing the node.  Additions to the
> a subset of the data can be integrated into the larger database by
> simple manipulations of WHAX trees.
> 
> I originally went because of the application to genetic data.  But then
> I got sidetracked...  Here at the Museum, we have specimen data (21+
> million specimens in total) in which species names change, higher
> taxonomic information changes, and so on, all of which should be tracked
> within the database.  In some cases, we are integrating the traditional
> genetic data into our specimen databases; i.e., in newer portions of our
> collection of specimens, we have a one-to-one correspondence between the
> dead dried pressed plant (or the stuffed animal and corresponding
> skeleton), the DNA extracted from said plant (or animal), and a record
> in our developing databases (birds are separate from plants are separate
> from fishes...).  The computer scientists were intrigued by this type of
> data :)  This WHAX "thing" would be perfect for tracking all that
> information.
> 
> Perhaps "bioinformatics" is currently too narrowly defined (organisms
> have more characteristics about them than just their DNA).  If we, the
> community of manipulators of biological data, do come up with an open
> standard for representing said data, that standard should be flexible
> enough to encompass all the characteristics about the organisms.  And,
> in light of all the stupid patenting going on, perhaps an open standard
> is needed before big bad multinational corporation patents it first.
> 
> Just a few thoughts...
> -jennifer
> 
> --------------------------
> J. Steinbachs, PhD
> Computational Biologist
> Dept of Botany
> The Field Museum
> Chicago, IL 60605-2496
> 
> office: 312-665-7810
> fax: 312-665-7158
> --------------------------
> 
> _______________________________________________
> pipet-devel maillist  -  pipet-devel@bioinformatics.org
> http://bioinformatics.org/mailman/listinfo/pipet-devel

-- 
Cheers,
Harry

Harry J Mangalam -- (949) 856 2847 -- mangalam@home.com


From stein at fmppr.fmnh.org  Thu Dec  2 14:45:44 1999
From: stein at fmppr.fmnh.org (J. Steinbachs)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] databases
References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org> <3846BDFB.8AA9F4B9@home.com>
Message-ID: <3846CC68.A8EBA111@fmnh.org>

Harry Mangalam wrote:

<snip>

> 
> How are you currently representing this problem at the Field Museum?
> Especially the dynamic nature of the problem?
> 
Keep in mind that I'm not here working on databases... that I've only
become peripherally interested and involved because these are problems
that are not being addressed well by computing services.  Being at a
Museum has huge drawbacks - we don't have easy access to experts in the
fields who could help put together theory and applications solve some of
these informational issues.

That said, our databases are separate - a big problem in and of itself.

Our databases are relational databases which clearly do not easily
address the problem of changing identifiers (or any other
characteristic), especially when temp workers are hired to enter data
and make on-the-fly corrections without consulting the curator.  e.g., a
pot that was recorded in field notes as being collected in Rhodesia is
entered into the database as being collected from the modern political
equivalent; bad move as the listing of location as Rhodesia is an
important time stamp as to when the pot was collected.  Time stamps on
databases are clearly of utmost importance.  I don't know how these
relational databases are currently keeping track of species name (or
political country) changes; I would guess that some kind of "memo" field
might be in use.

Currently, anybody conducting a historical biodiversity survey of our
collections ("What organisms are in your collection from the Pacific
Northwest?") has to consult over half a dozen different databases, all
relational, but using different products.  Most have limited
web-accessibility.

On the molecular end, we've got individuals working on particular genes
for different groups of species.  They do their alignment (by eye only -
*cringe*), then plunk the data into NEXUS format for use in Paup*.  So
they have bunches of different text files floating around their hard
drives.  A really useful thing would be a database of aligned genes for
the different groups (e.g., the ribosomal database project)... but how
would one keep the alignment up-to-date?  What would be the best
underlying structure for such data?

Lots of problems, no clear solutions...

-jennifer  

--------------------------
J. Steinbachs, PhD
Computational Biologist
Dept of Botany
The Field Museum
Chicago, IL 60605-2496

office: 312-665-7810
fax: 312-665-7158
--------------------------


From bizzaro at geoserve.net  Thu Dec  2 16:10:39 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] databases
References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org>
Message-ID: <3846E04F.4F59F82E@geoserve.net>

"J. Steinbachs" wrote:
> 
> The data can be represented as a "WHAX tree", with the tag representing
> the branches and the tag value representing the node.  Additions to the
> a subset of the data can be integrated into the larger database by
> simple manipulations of WHAX trees.

As a complete aside from the database issue, Loci needs to represent the
Workflow Diagram / Graphical Script in XML.  This is of course a tree
structure.  Perhaps we should look at how WHAX trees work for this purpose.

Is this the URL?

    http://cbil.humgen.upenn.edu/epodb/epodb.html

As Harry said, it's unresponsive.

> Perhaps "bioinformatics" is currently too narrowly defined (organisms
> have more characteristics about them than just their DNA).  If we, the
> community of manipulators of biological data, do come up with an open
> standard for representing said data, that standard should be flexible
> enough to encompass all the characteristics about the organisms.  And,
> in light of all the stupid patenting going on, perhaps an open standard
> is needed before big bad multinational corporation patents it first.

I couldn't have said it better.  And how does the media define bioinformatics?

    "The use of computer databases to organize the huge
     amount of biological information obtained by sequencing
     the human genome [DNA]..."

  http://www.newsalert.com/bin/story?StoryId=Coenz0bKbyta1nJu


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From stein at fmppr.fmnh.org  Thu Dec  2 16:12:39 1999
From: stein at fmppr.fmnh.org (J. Steinbachs)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] UPenn EpoDB URL
References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org> <3846E04F.4F59F82E@geoserve.net>
Message-ID: <3846E0C7.E4685332@fmnh.org>

"J.W. Bizzaro" wrote:

<Snip>

> Is this the URL?
> 
>     http://cbil.humgen.upenn.edu/epodb/epodb.html
> 

Try http://www.cbil.upenn.edu/EpoDB/index.html instead :)

-j


--------------------------
J. Steinbachs, PhD
Computational Biologist
Dept of Botany
The Field Museum
Chicago, IL 60605-2496

office: 312-665-7810
fax: 312-665-7158
--------------------------


From bizzaro at geoserve.net  Thu Dec  2 16:40:44 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] UPenn EpoDB URL
References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org> <3846E04F.4F59F82E@geoserve.net> <3846E0C7.E4685332@fmnh.org>
Message-ID: <3846E75C.949B5CA6@geoserve.net>

"J. Steinbachs" wrote:
> 
> Try http://www.cbil.upenn.edu/EpoDB/index.html instead :)

Jennifer, do you have any reference for WHAX? <:-)  I couldn't find anything
about it in the EpoDB literature.


Thank you.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From stein at fmppr.fmnh.org  Thu Dec  2 16:50:01 1999
From: stein at fmppr.fmnh.org (J. Steinbachs)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] UPenn EpoDB URL
References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org> <3846E04F.4F59F82E@geoserve.net> <3846E0C7.E4685332@fmnh.org> <3846E75C.949B5CA6@geoserve.net>
Message-ID: <3846E989.2E4348D7@fmnh.org>

"J.W. Bizzaro" wrote:
> 
> "J. Steinbachs" wrote:
> >
> > Try http://www.cbil.upenn.edu/EpoDB/index.html instead :)
> 
> Jennifer, do you have any reference for WHAX? <:-)  I couldn't find anything
> about it in the EpoDB literature.
> 

Sadly, I do not.  I do recall that Susan mentioned that the WHAX stuff
had only been done within the past four months, so a publication is not
likely to be forthcoming soon.  It might be worthwhile contacting her
directly (see the CBIL web page for contact details) for more
information (especially people actually doing the work).

-jennifer
 
--------------------------
J. Steinbachs, PhD
Computational Biologist
Dept of Botany
The Field Museum
Chicago, IL 60605-2496

office: 312-665-7810
fax: 312-665-7158
--------------------------


From chapmanb at arches.uga.edu  Thu Dec  2 18:07:49 1999
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] databases
In-Reply-To: <3846E04F.4F59F82E@geoserve.net>
References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen>
 <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen>
 <38469859.2F2F8185@fmnh.org>
Message-ID: <l03130304b46ca46f268d@[172.16.0.2]>

Fresh off of my schooling on what Loci is/is not, here are my thoughts on a
tree representation of a workflow diagram:


>> The data can be represented as a "WHAX tree", with the tag representing
>> the branches and the tag value representing the node.  Additions to the
>> a subset of the data can be integrated into the larger database by
>> simple manipulations of WHAX trees.
>
>As a complete aside from the database issue, Loci needs to represent the
>Workflow Diagram / Graphical Script in XML.  This is of course a tree
>structure.  Perhaps we should look at how WHAX trees work for this purpose.
>

	It seems to me that a balanced tree data structure (B-Tree, from my
Intro to Algorithms text) would be an excellent way to represent a workflow
diagram! I haven't looked through enough Python libs yet, but I'm positive
there must be some nice tree classes already implemented that could be
extended so we wouldn't have to do all the work (just implement a suitable
holder class and the additional functions to deal with it). The tree seems
to flow kind of naturally from the structure of how I picture a loci
diagram working. In addition, since it would be *just* a data stucture
(although a big one!), this could help in passing it around (I think I saw
a mention somewhere about the idea of interchanging loci implementations
between collaborators).
	The only problem I picture is when mutiple branches feed into a
single node:

			big container loci
			|		|
			|		|
		  document		document
		(genbank sequence) (genbank sequence 2)
		|			|
		|			|
		converter		converter
			|		|
			|		|
			----------------
				|
 				processor
		(ie. seqalign to align the 2 sequences)

Does this corrupt a tree? I don't think this is explicitly disallowed in
the rules on a branched tree, but I can't really every recall seeing a tree
like this. I would think it would then become a graph, but this seems too
general to represent the type of data, since it still has a lot of "tree"
characteristics.
	Anyways, that is just my naive "I have read the intro to algorithms
text" thoughts on the WHAX tree idea. I would be very interested in seeing
the WHAX algorithms, and also in hearing other people's input on this (all
the computer science people out there can straighten me out!). Many thanks
to Jennifer for posting the original info.

Brad


From bizzaro at geoserve.net  Fri Dec  3 01:10:33 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] Entity
Message-ID: <38475ED9.982B3287@geoserve.net>

Another source for ideas on the use of XML:

    http://entity.netidea.com/


Cheers.
Jeff


From bizzaro at geoserve.net  Fri Dec  3 01:36:01 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:00 2006
Subject: [Pipet Devel] BLADE
Message-ID: <384764D1.A7893148@geoserve.net>

Perhaps what we could use for a Web interface to Loci:

    http://www.thestuff.net/bob/projects/blade/


Cheers.
Jeff


From bizzaro at geoserve.net  Fri Dec  3 20:39:40 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] [Fwd: WHAX]
Message-ID: <384870DB.D915F35B@geoserve.net>

This message is from Susan Davidson, who resently spoke with Jennifer at
UChicago.  Susan mentioned in her talk there the WHAX XML model for tree
structures.   Jeff
-------------- next part --------------
An embedded message was scrubbed...
From: Susan Davidson <susan@central.cis.upenn.edu>
Subject: WHAX
Date: Fri, 03 Dec 1999 17:29:28 EST
Size: 1723
Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19991204/90ca56ff/attachment.mht
From chapmanb at arches.uga.edu  Sun Dec  5 13:53:54 1999
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] WHAX and Loci storage ideas
In-Reply-To: <384870DB.D915F35B@geoserve.net>
Message-ID: <l03130302b47064d3c3c4@[172.16.0.2]>

Oh Great Locians;
	Hello! I have been doing some more thinking about data storage for
loci and along these lines have read up on the WHAX stuff. Below I kind of
give a quick overview of WHAX (for those of you who didn't like the looks
of the 60 page technical document about it!) based on what I was able to
get out of it (since I'm not a database expert). Then I follow up with a
plan for data storage for Loci based on WHAX ideas, info from the archives,
and my own random ideas. Sorry it's so long, but I would really be
interested in hearing everyone's comments if they can make it all of the
way though!

WHAX (Warehouse Architechture for XML)
--------------------------------------
	Basically, this is a technical document detailing the
implementation of WHAX. Basically, what WHAX is designed to do is to take
selected
information from a data source, which can be either a database or an XML
document, and represent it as an "XML Warehouse." This XML Warehouse
contains specific information from a database which has been selected by
the user. For instance, if you had a database full of books you've read,
you could create an XML warehouse of all of the books you've read that
were written by Stephen King. Some key characteristics of an XML Warehouse
is that it is in XML format and is represented by a  tree structure. So
based on my limited XML knowledge, this seems analagous to a Document
Object Model (DOM).
	What WHAX does is define a method for upkeeping this XML Warehouse.
The upkeep is unique from upkeep of databases because XML is in a
semi-structured format--the paper describes it as "self-describing,
irregular data." That paper details methods for changing the XML warehouse
when new data is added or removed, and for keeping the warehouse consistent
with changes in the underlying database where the XML warehouse got its
information from.


Data Storage in Loci
--------------------
	Reading through this document got me thinking about how this could
be applied to Loci and I came up with the following model of data storage
in Loci.

To make things simpler in my head, I split the data storage needs of Loci
(according to my, hopefully correct!, model of Loci) into three categories:

1. The data that comes in as a document (for instance, a set of
	sequences in FASTA format). These are the input files provided by
the user.

2. The actual setup of a workflow diagram--the underlying structure of the
diagram (how all of the loci are connected together). This is supplied by
the user in the workflow diagram by connecting all of the dots together and
constructing the command-lines (in the words of Jeff!).

3. The internal XML warehouse (to use my new WHAX-learned term!). This
would be a subset of the supplied data (1.) that is passed from loci to
loci according to the work flow diagram. Jeff describes this very well
(Data Storage Interfaces--June 11) as an XML document that travels from
loci to loci and changes XML formats (ie. changes to different document
structures according to the specific DTD (document type definition) needed
at that loci).

Each of these points has a specific storage needs, so I have come up with a
separate plan for each of them:

1. Input Data: Since the user supplied this data, it is their choice to
determine how they want to deal with it. If they want to store it as a
backup in a database of some sort, then they can do this through the work
flow diagram. So the data can be stored in a 'plug-in' database (what Gary
and Jeff mentioned to be). This type of interface/data storage component
isn't "essential" to the functioning of Loci, so I will go on to the
essential data storage needs.

2. Workflow Data: Loci will need a method to store the user defined
workflow diagram. This diagram includes: 1. the setup of the workflow
diagram (how everything is connected together) 2. The constructed command
line for each program 3. more???. This is the kind of storage need I was
thinking about when I wrote my incoherent message a couple of days ago
about trees and graphs. Basically, my thinking is that we can stick all of
the information from a workflow diagram into a data stucture, and then move
through this structure in the specified order to execute the contents of
the workflow diagram. My new data structure of choice is a flow network
(still from Intro Algorithms). Basically I think each element of network
would have a setup kind of like the following pseudo-code:

data-structure loci:
	array[pointers] TheNextLoci #pointers to the loci which come next in
				    #the flow diagram
	string Type # The loci type
	string IOName #the program or document represented by the loci
	tuple CommandLine #all of the command line arguments
	pointer XMLDocument #the info being processed
	pointer DTD #the document definition for the particular loci
	pointer ActionInstructions #a document with what to do at that loci

Of course, this would require each loci to setup a DTD type file that has
the specifications to create a document for the particular program (I talk
more about how I think this would work in point 3. below) and also an
ActionInstruction to determine what to do at that loci (ie. display a pdb
file in RasMol, align sequences from the XML document etc.).
	My mental image is that the XML document would move into a
particular locus, be converted to the DTD required for that particular
locus, and then processed according to the specifications of the program at
that locus. I imagine the setup of the DTD and action instructions would be
part of the plug-in process for each program that needs to read a document
into or get info from the workflow diagram.

3. Internal XML warehouse: My thoughts on this on pretty directly based off
the WHAX paper. Here is kind of what I imagine happening with a document
that comes into Loci. First the document will be converted into XML format
based on the DTD of the locus (ie. the type of data in the document). This
XML document will then be put into an XML database (Note: This is kind of
what I was thinking before--have a database to store info instead of a
specific internal format.) Then, as you progress through the work-flow
diagram, each loci will create an XML warehouse from the XML database based
on the DTD requirements of the particular loci. So what I am thinking is
that we can use the WHAX system to maintain an XML document that has all of
the info needed for a particular locus. For instance, if we come to a
processor that requires sequences in the database in FASTA format, we can
pull out the sequences and other required info from the database and update
the XML warehouse to have this info. So we would maintain a view of the
data available in the database and update it for the needs of a locus.
Okay, I should stop talking about this point before I get any more
confusing!

More ranting
----------------------

Basically, I am proposing a plan whereby we eliminate a specific internal
storage format and essentially put everything into a database. Of course,
this type of plan "requires" a database, and here I was thinking that we
could use dbXML (http://www.dbXML.org), mentioned by Jeff in the archives.
The database is under a BSD-style license (which I think is compatible with
the LGPL) and although it still doesn't "do" anything yet, it is under
current development (most recent tarball = November 27th) and we could try
and coordinate development with Tom Bradford, the developer there. He is
developing it in C++ with a CORBA interface (he is using ORBacus as his
ORB), so ultimately the database could also be pluggable (you could use any
XML storage database), which fits in well with the Loci schema.
The reason that I think this kind of plan is better than an internal format
is that it gives us a lot of flexibility to input any kind of information,
as Jennifer was talking about. For instance, say we had a program to plug
in that uses specific animal descriptors to build an evolutionary tree. So
you might have data for an anteater in the input file like:

<Claws> Sharp and Pointy </Claws>
<Nose> Long </Nose>
<Tounge> Really Long </Tounge>

(Okay, so I don't know anything about anteaters! Sorry!). With an internal
data format, we could have to define a new DTD to include these three
elements but with a database format, I don't think this would be necessary.


Okay, well basically this is what has been on my mind for the past couple
of days and hopefully I've managed to scrape it together in a
semi-organized fashion. I would be really interested to hear everyone's
comments about the ideas to see if they are along the lines of other
peoples' thinking or just really crazy. Also, thank you very much if you
read this through all of the way to the end!

Brad


.


From gvd at redpoll.pharmacy.ualberta.ca  Sun Dec  5 23:30:16 1999
From: gvd at redpoll.pharmacy.ualberta.ca (Gary Van Domselaar)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] WHAX and Loci storage ideas
References: <l03130302b47064d3c3c4@[172.16.0.2]>
Message-ID: <384B3BD8.6B557542@redpoll.pharmacy.ualberta.ca>

Brad,

It good to know that someone is thinking about data storage issues for
Loci.  This is an important and (in my personal opinion) underdiscussed
topic.  Let's disscuss some of these ideas now.  For clarity, lets keep
in mind that Loci is constructed in a 'three-tier' architecture:
1. The GUI 'Front-end' with 'bindings' to the 'Middleware'. 
2. The 'Middleware', which is the CORBA, or command line interface, or
http protocol, or whatever is needed to access the 'Back-end'.  These
will be the services that allow the backend to interoperate, as dictated
by the WFD. A 'data translator locus' is a good example of loci
'middleware'.  The database used to store the individual loci contained
within a 'container locus' would be another example.  
3. The Back-end, which are the information repositories (filesystems,
databases, and so on), and the analysis programs that manipulate the
data.  The back-end  likely is diverse, both  architecturally and
geographically.  Note that nowhere in this description is there any
mention of data-type:  Loci can work for physicists as well as it can
for bioinformaticists, but we are all bioinformaticists here, so we
always provide our scenarios (and will use Loci) as a bioinformatics
application. A multiple-alignment program is a good example of a
'back-end' locus.

The back-end 'resources' are the 'loci'.  They are represented by the
icons / nodes in the Front-End, and made interoperable by the
middleware.   The front-end and the back-end dont even know about each
other.  

Although I'm not the absolute authority on Loci's architecture, and the
architecture likely will cotinue to evolve, I'm relatively certain that
this is the current 'Loci architectural paradigm'.  

I'm pretty certain that you already understand this paradigm, but I
thought I should make it explicit for the sake of discussing your ideas
on data storage for Loci.


Brad Chapman wrote:

> WHAX (Warehouse Architechture for XML)
> --------------------------------------
>         Basically, this is a technical document detailing the
> implementation of WHAX. Basically, what WHAX is designed to do is to take
> selected
> information from a data source, which can be either a database or an XML
> document, and represent it as an "XML Warehouse." This XML Warehouse
> contains specific information from a database which has been selected by
> the user. For instance, if you had a database full of books you've read,
> you could create an XML warehouse of all of the books you've read that
> were written by Stephen King. Some key characteristics of an XML Warehouse
> is that it is in XML format and is represented by a  tree structure. So
> based on my limited XML knowledge, this seems analagous to a Document
> Object Model (DOM).
>         What WHAX does is define a method for upkeeping this XML Warehouse.
> The upkeep is unique from upkeep of databases because XML is in a
> semi-structured format--the paper describes it as "self-describing,
> irregular data." That paper details methods for changing the XML warehouse
> when new data is added or removed, and for keeping the warehouse consistent
> with changes in the underlying database where the XML warehouse got its
> information from.

The URL for this document is 
http://db.cis.upenn.edu/cgi-bin/Person.perl?susan

The document title is:  Efficient View Maintenance in XML Data
Warehouses 


> Data Storage in Loci
> --------------------
>         Reading through this document got me thinking about how this could
> be applied to Loci and I came up with the following model of data storage
> in Loci.
> 
> To make things simpler in my head, I split the data storage needs of Loci
> (according to my, hopefully correct!, model of Loci) into three categories:
> 
> 1. The data that comes in as a document (for instance, a set of
>         sequences in FASTA format). These are the input files provided by
> the user.

Or retrieved from a database query, or output by an analysis program.

> 
> 2. The actual setup of a workflow diagram--the underlying structure of the
> diagram (how all of the loci are connected together). This is supplied by
> the user in the workflow diagram by connecting all of the dots together and
> constructing the command-lines (in the words of Jeff!).

This is my understanding as well, although the WFD will be constructed
via a graphical shell, which has a 'thin interface' to the middleware. 
When you say 'constructing the command-lines', do you mean 'generating
the interface to the middleware'?  

> 
> 3. The internal XML warehouse (to use my new WHAX-learned term!). This
> would be a subset of the supplied data (1.) that is passed from loci to
> loci according to the work flow diagram. Jeff describes this very well
> (Data Storage Interfaces--June 11) as an XML document that travels from
> loci to loci and changes XML formats (ie. changes to different document
> structures according to the specific DTD (document type definition) needed
> at that loci).


> 
> Each of these points has a specific storage needs, so I have come up with a
> separate plan for each of them:
> 
> 1. Input Data: Since the user supplied this data, it is their choice to
> determine how they want to deal with it. If they want to store it as a
> backup in a database of some sort, then they can do this through the work
> flow diagram. So the data can be stored in a 'plug-in' database (what Gary
> and Jeff mentioned to be). This type of interface/data storage component
> isn't "essential" to the functioning of Loci, so I will go on to the
> essential data storage needs.

Exactly.  Using Jeff's analogy, what if we were to retrieve an entire 2
Terabyte sequence file, in GenBank format, from the NCBI database, and
wanted to search the entire file against the cDNA for alpha-hemoglobin.
Lets suppose further that we had access to a remote analysis program
running on a fancy supercomputer that did BLAST searches for us and
required GenBank formatted files to perform the search. Suppose further
that the NCBI database and the Supercomputer were on the same machine. 
We could construct a WFD where we retrieve the 2 Terabyte file from NCBI
and 'pipe' it directly to the analysis program, along with our
a-hemoglobin cDNA, and BLAST away. In theory, Loci would send the data
from the database and through the analysis program, possibly without the
data ever touching a network-interface card, and without ever being
reformatted If however, Loci required the data to be reformatted and
stored in an intermediate database, say on my 66Mhz 486 with 400 MB Hard
drive and 4Mb ram, I'd be running for the fire-extinguisher as my cpu
exploded in a core-dumping ball of fire.

On the other hand, what if we planned to do our entire thesis project
based upon the information kept in that 2 Terabyte file? Would we want
to retrieve it from the NCBI database everytime we wanted to do an
analysis on it, especially if we wanted only to search a small segment
of it? No way! we would wan to have that file stored in a fashion
wherein we could easily extract only the parts that we are interested in
performing an analysis on. This is where Loci's ability to store
sequence data in a database becomes important.


> 
> 2. Workflow Data: Loci will need a method to store the user defined
> workflow diagram. This diagram includes: 1. the setup of the workflow
> diagram (how everything is connected together) 2. The constructed command
> line for each program 3. more???. This is the kind of storage need I was
> thinking about when I wrote my incoherent message a couple of days ago
> about trees and graphs. Basically, my thinking is that we can stick all of
> the information from a workflow diagram into a data stucture, and then move
> through this structure in the specified order to execute the contents of
> the workflow diagram. My new data structure of choice is a flow network
> (still from Intro Algorithms). Basically I think each element of network
> would have a setup kind of like the following pseudo-code:
> 
> data-structure loci:
>         array[pointers] TheNextLoci #pointers to the loci which come next in
>                                     #the flow diagram
>         string Type # The loci type
>         string IOName #the program or document represented by the loci
>         tuple CommandLine #all of the command line arguments
>         pointer XMLDocument #the info being processed
>         pointer DTD #the document definition for the particular loci
>         pointer ActionInstructions #a document with what to do at that loci

We still need to formalize the interface to the the command-line-run
backend apps.  but this sounds about right to me.

The OMG LSR ( http://www.omg.org/homepages/lsr/) Biomolecular Sequence
Analysis working group has a nearly complete RFP
(http://www.omg.org/techprocess/meetings/schedule/Biomolecular_Sequ._Analysis_RFP.html)
for sequences and their alignment and annotation.  Loci plans to adopt
their CORBA IDL for passing biomolecular sequence objects to
CORBA-compliant backend apps.  This RFP has 'XML extensions' for future
compatability, btw.

> 
> Of course, this would require each loci to setup a DTD type file that has
> the specifications to create a document for the particular program (I talk
> more about how I think this would work in point 3. below) and also an
> ActionInstruction to determine what to do at that loci (ie. display a pdb
> file in RasMol, align sequences from the XML document etc.).
>         My mental image is that the XML document would move into a
> particular locus, be converted to the DTD required for that particular
> locus, and then processed according to the specifications of the program at
> that locus. I imagine the setup of the DTD and action instructions would be
> part of the plug-in process for each program that needs to read a document
> into or get info from the workflow diagram.

My understanding is that Loci will come with 'data translators'
(middleware) that will be placed between a document / database to
accomodate the formatting requirements of the analysis program that will
operate on the document.


> 
> 3. Internal XML warehouse: My thoughts on this on pretty directly based off
> the WHAX paper. Here is kind of what I imagine happening with a document
> that comes into Loci. First the document will be converted into XML format
> based on the DTD of the locus (ie. the type of data in the document). This
> XML document will then be put into an XML database (Note: This is kind of
> what I was thinking before--have a database to store info instead of a
> specific internal format.) 

I think this is appropriate only for Loci's own internal data
requirements, but violates Loci's 'laissez-faire' paradigm for operating
on 'exogenous' data. Jeff explained to me best when he said that Loci
should be like the Bash shell: the bash shell has redirection operators
and pipes, which you can combine to do some fairly sophisticated data
processing, for example:

bash$ cat /var/adm/messages | grep "root" > /tmp/root.txt

Here bash will pipe the contents of /var/adm/messages to grep, which
will extract all the lines containing the word 'root' and place them in
the /tmp/root.txt file.  Bash itself cares not about the contents of
/var/adm/messages, doesnt reformat it, doesnt store it in an
intermediate database, then re-extract it from the database, reformat it
once again, and finally pump out the /tmp/root.txt file according to
some xml dtd.  Neither should Loci, in its most abstracted form.
Instead,the data conversions and XML operations should be the modular
extensions to Loci that we provide as valuable options for the end-user,
so that Loci becomes not just a graphical 'bash', but a sophisticated
distributed data processing system.  Not that a graphical bash wouldn't
be nice:  the gnome dudes have talked about using Loci's graphical shell
to do just that!  Bottom line:  maximum abstraction + maximum
modularization = maximum flexibility = maximum power!

gary


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Gary Van Domselaar		gvd@redpoll.pharmacy.ualberta.ca
Faculty of Pharmacy 		Phone: (780) 492-4493
University of Alberta		FAX:   (780) 492-5305
Edmonton, Alberta, Canada       http://redpoll.pharmacy.ualberta.ca/~gvd


From David.Lapointe at umassmed.edu  Mon Dec  6 13:21:40 1999
From: David.Lapointe at umassmed.edu (Lapointe, David)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] Linux Clusters vs SMP
Message-ID: <93307F07DE63D211B2F30000F808E9E501644F33@edunivexch02.umassmed.edu>

There is an interesting discussion about SMP vs Beowulf going on on
bionet.software.

Here's an attachment: I missed the first article by David Mathog.

 <<smpbeowulf.txt>> 

David Lapointe, Ph.D.
Research Computing Manager
6-5141
"What we obtain too cheap, we esteem too lightly." - T. Paine

-------------- next part --------------
From: wrp@alpha0.bioch.virginia.edu (William R. Pearson)
Subject: Re: SMP vs. Beowulf?
Newsgroups: bionet.software
Date: 02 Dec 1999 15:18:22 -0500
Organization: University of Virginia


We have not looked into SMP vs Beowulf exhaustively, but we have quite
a bit of experience.

(1) SMP is far easier to configure and run than PVM (or MPI or
    others). You just run the program; if its threaded SMP, it runs
    faster.  SMP programs are also much easier to develop and debug.

(2) Our current PVM implementation is not as CPU efficient as spawning
    a bunch of threaded fasta33_t runs when the algorithm is fast.
    For Smith-Waterman, which is compute bound, they are equally
    efficient.  In line with point (1), I think it is easier to
    improve the performance of an SMP program.  I don't think this is
    an inherent shortcoming of PVM, but reflect the fact that our PVM
    implementaion (and very primitive scheduling system) was build
    when machines and interconnections were much slower.

(3) However, we have not yet found a version of Linux Pthreads that
    works 100% of the time.  With the kernal and C-libraries that we
    use, we see failures which are almost certainly caused by Linux
    Pthreads.  (We never see them in any other environment, and we
    don't see them unthreaded.)  Linux PVM is very reliable.

So we use both.  We use PVM for genome-vs-genome Smith-Waterman
searches, and we use SMP threaded versions for our WWW
server. Starting up PVM (or any other system that spawns large numbers
of jobs on other machines) has a high overhead, which isn't worth the
cost when the search will be done in a few minutes - we don't see
nearly as much overhead with SMP machines.  But large SMP machines are
considerably more expensive.  A cost-effective solution is a WWW
server that sends its searches to a bank of 1-CPU or 2-CPU machines.

Bill Pearson

############
From: Tim Cutts <timc@chiark.greenend.org.uk>
Subject: Re: SMP vs. Beowulf?
Newsgroups: bionet.software
Date: 03 Dec 1999 11:30:28 +0000 (GMT)
Organization: Linux Unlimited

William R. Pearson <wrp@alpha0.bioch.virginia.edu> wrote:
>
>We have not looked into SMP vs Beowulf exhaustively, but we have quite
>a bit of experience.
>
>(1) SMP is far easier to configure and run than PVM (or MPI or
>    others). You just run the program; if its threaded SMP, it runs
>    faster.  SMP programs are also much easier to develop and debug.

There are a couple of points to make here.  1)  MPI is far more
efficient than PVM.  No-one should be using PVM these days.  2) MPI is
more flexible than threads in that an MPI version of a program can still
be run on an SMP machine, as well as on a distributed network.

Programs like BLAST and FASTA have a problem in that their I/O
requirements are large, and this can be a real performance problem on a
distributed network.

For example, you could think of implementing your parallel program by
giving each MPI process part of the database to work on.  The problem
there is that you have a large overhead in getting the database to the
processor.  Ethernet is too slow, and will destroy any performance gain
from the parallel code.

A better solution, easier to implement, and probably more useful for
most purposes, is a workstation farm with each node having a local copy
of all the target databases, and run normal single threaded blast on
each.  For large scale work, you typically want to blast lots of
sequences against several databases, so such coarse grained
parallelisation is fine.  You just need some way of distributing the
blast jobs to your farm.  You can either do this with some fairly
trivial perl scripting, or you can use some more flexible commercial
offering.  I can highly recommend platform computing's LSF package.
It's expensive, but it extremely good at managing workstation farms, in
particular with cycle stealing from machines when they're idle.

Using LSF at the University of Cambridge, I got 100 %CPU utilisation on
a 20 workstation farm.  These were interactive workstations too; people
doing NMR spectrum assignment at the workstations weren't even aware
their machines were also performing highly CPU intensive analysis jobs
in the background.  Efficient use of the workstations like this
ultimately saved money, since they realised that they no longer needed
to buy further machines.

Tim.

##########
From: Piotr Kozbial <piotrk@ibb.waw.pl>
Subject: Re: SMP vs. Beowulf?
Newsgroups: bionet.software
Date: Sat, 04 Dec 1999 15:11:40 +0100
Organization: http://news.icm.edu.pl/
Reply-To: piotrk-NO@SPAMM-ibb.waw.pl

There are other kinds of Linux clusters. You can read discussion about
"Choosing the Right Cluster System"
http://slashdot.org/article.pl?sid=99/11/12/0354238

For example (posted by SEWilco):

Beowulf is one of a family of parallel programming API tools. Programs
must use the API to accomplish parallel programming. 
http://cesdis.gsfc.nasa.gov/linux/beowulf/beowulf.html
           
SCI is fast hardware with support for distributed shared memory,
messaging, and data transfers. Again, if you don't use the API then no
gain. 
http://nicewww.cern.ch/~hmuller/sci.htm
            
DIPC is distributed System V IPC. Programs which use the IPC API can be
converted to DIPC easily, such as just by adding the DIPC flag to the
IPC call. 
http://wallybox.cei.net/dipc/dipc.html
            
MOSIX is the most general-purpose. Processes are scattered across a
cluster automatically without having to modify the programs. No API
needed other than usual Unix-level process use. Allows parallel
execution of any program, although full use requires a parallel program
design.
http://www.cnds.jhu.edu/mirrors/mosix/
From bizzaro at geoserve.net  Mon Dec  6 14:10:19 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] Conglomerate
Message-ID: <384C0A1B.9F24D3A5@geoserve.net>

This may be interesting to us in more than one way:

    http://www.conglomerate.org/


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From chapmanb at arches.uga.edu  Mon Dec  6 20:18:08 1999
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] WHAX and Loci storage ideas
In-Reply-To: <384B3BD8.6B557542@redpoll.pharmacy.ualberta.ca>
References: <l03130302b47064d3c3c4@[172.16.0.2]>
Message-ID: <l03130307b471fffd6330@[172.16.0.2]>

Gary et al.;
	Thanks for getting back with me about my data storage thinking! I
think I may have the idea now--so I kind of work through everything in the
rest of this e-mail, and then humbly propose a short-term development
plan(!) just for the sake of argument.

Gary Van Domselaar wrote:
>in mind that Loci is constructed in a 'three-tier' architecture:
>1. The GUI 'Front-end' with 'bindings' to the 'Middleware'.
>2. The 'Middleware', which is the CORBA, or command line interface, or
>3. The Back-end, which are the information repositories (filesystems,
>I'm pretty certain that you already understand this paradigm, but I
>thought I should make it explicit for the sake of discussing your ideas
>on data storage for Loci.

Yeah, I have a firm grasp on the theory but in practice, I know that I have
a lot of difficulty separating Front-End (ie. Loci proper) and Middleware
(ie. plug-ins to Loci). I apologize about that--I know that some of my
thoughts probably reflect my inability to separate these components. I'm
working at it!

Gary Van Domselaar wrote:
>> WHAX (Warehouse Architechture for XML)
>> --------------------------------------
>The URL for this document is
>http://db.cis.upenn.edu/cgi-bin/Person.perl?susan
>
>The document title is:  Efficient View Maintenance in XML Data Warehouses

Thanks, I meant to include that info!

Gary Van Domselaar wrote:
>> 1. The data that comes in as a document (for instance, a set of
>>         sequences in FASTA format). These are the input files provided by
>> the user.
>
>Or retrieved from a database query, or output by an analysis program.

Right-o!

Gary Van Domselaar wrote:
>>
>> 2. The actual setup of a workflow diagram--the underlying structure of the
>> diagram (how all of the loci are connected together). This is supplied by
>> the user in the workflow diagram by connecting all of the dots together and
>> constructing the command-lines (in the words of Jeff!).
>
>This is my understanding as well, although the WFD will be constructed
>via a graphical shell, which has a 'thin interface' to the middleware.
>When you say 'constructing the command-lines', do you mean 'generating
>the interface to the middleware'?

What I think this refers to is generating a command-line for a program by
using a GUI to input all of the switches. For instance, if I were using
program foo that used a -l switch to specify a log file, I would use the
Loci interface to generate the equivalent of 'foo -l /var/mylogfile.' My
thinking was that 'the interface to the middleware' would be worked out
during the programming of the plug-in to work with Loci. For instance, to
get Loci to use my sequence viewer program, I would have to tell it by
writing the plug-in:

1. What kind of file the program needs (ie. PDB, FASTA, etc)
2. How to work the program (ie. the command line stuff: the switches it
takes, etc)

Loci would then take this info and have a GUI for 'constructing the command
line' (getting the switches set up) and do error checking do make sure the
user supplies the right file for the program.
At least, this is my current understanding of how stuff would work

Gary Van Domselaar wrote:

>We still need to formalize the interface to the the command-line-run
>backend apps.  but this sounds about right to me.
>
>The OMG LSR ( http://www.omg.org/homepages/lsr/) Biomolecular Sequence
>Analysis working group has a nearly complete RFP
>(http://www.omg.org/techprocess/meetings/schedule/Biomolecular_Sequ._Analysis_R
>FP.html)
>for sequences and their alignment and annotation.  Loci plans to adopt
>their CORBA IDL for passing biomolecular sequence objects to
>CORBA-compliant backend apps.  This RFP has 'XML extensions' for future
>compatability, btw.

Thanks--I'll take a look at it (whenever I am feeling up to looking at a
huge document with half the lines crossed out!). I just came up with that
"interface" specification off the top of my head--just wanted to make sure
I was on the right track.


Gary Van Domselaar wrote:
>I think this is appropriate only for Loci's own internal data
>requirements, but violates Loci's 'laissez-faire' paradigm for operating
>on 'exogenous' data. Jeff explained to me best when he said that Loci
>should be like the Bash shell: the bash shell has redirection operators
>and pipes, which you can combine to do some fairly sophisticated data
>processing, for example:
>
>bash$ cat /var/adm/messages | grep "root" > /tmp/root.txt
>
>Here bash will pipe the contents of /var/adm/messages to grep, which
>will extract all the lines containing the word 'root' and place them in
>the /tmp/root.txt file.  Bash itself cares not about the contents of
>/var/adm/messages, doesnt reformat it, doesnt store it in an
>intermediate database, then re-extract it from the database, reformat it
>once again, and finally pump out the /tmp/root.txt file according to
>some xml dtd.  Neither should Loci, in its most abstracted form.

I really like the idea of piping! You (and Jeff) are right, there is no
reason to stick stuff in a database if you could just pipe it around.
However, I have a couple of practical questions for using a piping approach
like this:

1. If you have data from a number of sources in a bunch of different
formats, how would you get them together to pipe them into a program that
would require them all in one text document in, say, FASTA format? Would
you have to run each of them through a converter to get them in a common
format, then pipe them all into a processor that would stick them into a
single file?

2. Conversely, what if you had a huge document and wanted to break it up
into smaller documents? For example, what if you had a swiss-prot file and
wanted to get just the protein sequences for all Zea mays (corn)
accessions--how would this be done?

3. How could individual parts of the data be queried or reordered? For
instance, if I wanted to separate all sequences with a particular motif out
of a file and then reorder them by organism.

4. What about doing things like generating GUIs on the fly, as Jeff talked
about  in the 'constructing the command line' mail? He mentioned getting a
pyGTK GUI directly from a Glade output XML document in this case, but
similary, what if we wanted to put the output into a web browser? Would we
convert the file to XML, then process it into HTML/GladeXML and then output
it?

These are just a few concerns I thought up for discussion regarding the
piping system you described. I really like the idea, and think it would be
a more straightforward to do, but my only concern is how well it would
scale as operations got more complicated. I guess I have been thinking of
Loci more as a graphical scripting language, which I imagine having a lot
more options then just a redirection shell.

Gary Van Domselaar wrote:
>Instead,the data conversions and XML operations should be the modular
>extensions to Loci that we provide as valuable options for the end-user,
>so that Loci becomes not just a graphical 'bash', but a sophisticated
>distributed data processing system.  Not that a graphical bash wouldn't
>be nice:  the gnome dudes have talked about using Loci's graphical shell
>to do just that!  Bottom line:  maximum abstraction + maximum
>modularization = maximum flexibility = maximum power!

	You are absolutely right! The best way to combine the piping
backbone with the scripting extensions would be to use a pluggable database
type option (the container) within the pipeline as I was mentioning before.
There I was thinking more in the context of a relational database for long
term storage but now I am thinking more in terms of an XML type database
for stort term storage for Loci's internal data requirements. Alright, yet
another separation between Front-end and Middleware! Sorry that I did not
grasp this sooner!
	So, how does this new paradigm for storage sound?:

1. Front-end: No storage capabilities of its own. Used to organize the
connections to the middleware and pass data around.

2. Middleware--2 storage options:
a. Provide option for XML storage of an "internal XML format." If a user
has a need for more complicated data-handling (as I described in my
questions above), they can utilize this option to place things in an
internal XML database and then use the XML warehouse kind of stuff I
described in point 3 in my last e-mail.
b. Provide an option for permanent storage with relational databases (ie.
MySQL, PostgreSQL, Sybase ...), so that the data can be available after
Loci has quit.

The middleware would handle the connections between the Loci front-end,
which asks for a database or internal format, and the back-end, which
provides it.

3. Back-end: All of the databases themselves.

If this sounds like a plan, then I would like to humbly propose an
immediate development focus: Get the piping stuff working with the Loci
front-end so that we can do something like the following: 1. Input a
sequence in FASTA format 2. Convert it to a new format 3. View it in a
sequence viewer. This type of activity would not require any storage
options, so this would simplify things. In addition, Jeff has the GUI
set-up to make the connections, so we are currently able to construct this
kind of workflow diagram. I think reaching this kind of short term goal
would be extremely exciting as Loci would actually "do" something and would
provide us with a base for further development. How does this sound? Anyone
for this? Hip-hip-hooray? Booooo? Whatta you think?
	Well, if you are to the end again, thank you very much! I would
love to hear comments, etc. Also, I hope I don't step on any toes by making
a development direction suggestion. I just want to get an idea of the short
and long term goals of Loci and kind of find my place somewhere in there so
I can have Loci working for my thesis project needs. Thanks again for
listening!

Brad


From bizzaro at geoserve.net  Tue Dec  7 09:26:00 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] WHAX and Loci storage ideas
References: <l03130302b47064d3c3c4@[172.16.0.2]>
Message-ID: <384D18F8.76F10B4B@geoserve.net>

Hey Brad!

Having read through most of your message at this point, I want to first rehash
a couple issues about the use of an 'internal format' or 'database':

(1) We have to distinguish between 'data to be processed' and 'workflow
data'.  My objection to a _required_ internal format or database, is for data
to be processed, NOT workflow data.  Of course we need our own system of
handling workflow data, and as Brad suggested, they can be kept in a database.

(2) As for data to be processed (biological/bioinformatics data), we can come
up with our own system, using XML or a database, or whatever.  I just don't
want to _require_ that every bioinformatics datum be converted to that format,
without the user's knowledge.  As Brad says, the user is responsible for
knowing what to do with the data.

(3) Processable data can be encapsulated in the workflow data, providing the
format of the processable data is maintained.  So, if locus represents a FASTA
document, our workflow data should just insert the whole document, unchanged,
between some tags: <document></document>.  Or if a database is used for Loci's
infrastructure and workflow management, the whole document is kept there.  BUT
THE DATA WITHIN THE DOCUMENT IS NOT CHANGED BY LOCI: Only the user can make
the change, and it is done via 'converter' loci.

Brad Chapman wrote:
> 
> To make things simpler in my head, I split the data storage needs of Loci
> (according to my, hopefully correct!, model of Loci) into three categories:
> 
> 1. The data that comes in as a document (for instance, a set of
>         sequences in FASTA format). These are the input files provided by
> the user.

Okay, in this case we're talking about data to be processed.

> 2. The actual setup of a workflow diagram--the underlying structure of the
> diagram (how all of the loci are connected together). This is supplied by
> the user in the workflow diagram by connecting all of the dots together and
> constructing the command-lines (in the words of Jeff!).

This is workflow data.

We're saying the WFD is a graphical script, which is a _script_ nonetheless,
that has to be represented as text (underneath it all) and parsed by an
interpreter (of our own invention) during execution.  It may be obvious to
some here that this is what we're aiming for (a scripting language), but some
may be scared off thinking this is an enormous task.  I like to think that it
is exciting and challenging.

> 3. The internal XML warehouse (to use my new WHAX-learned term!). This
> would be a subset of the supplied data (1.) that is passed from loci to
> loci according to the work flow diagram. Jeff describes this very well
> (Data Storage Interfaces--June 11) as an XML document that travels from
> loci to loci

I think you're talking about workflow data here too.

I learned that 'travel' is not the best word to use here, because it implies
that everything has to be parsed and rewritten (literally moved) between every
locus, even if all loci are on the local system.  Humberto and Justin have
correctly remarked that we want to minimize 'travel' where we can.  In the
case of all local loci, Loci (the program) should 'know' there is no need to
move anything: Everything stays on the local filesystem.

And in most cases, the data accompanying any communication between remote loci
should _point_ (via URI) to where loci (documents, programs, etc.) lie and not
assume the user wants or needs them: The user may already have the locus on
his/her local computer.  Also, since the remote system may be only the first
in a chain/workpath of connected systems, it would be most efficient to have a
pointer to any loci, rather than moving the whole thing across some umpteen
nodes.  IOW, I want the DNA doc on the 13th system I'm connected to.  I can
either make a direct connection to the 13th server via IP, or I can have the
13th send the doc to the 12th, which sends the doc to the 11th, which sends
the doc to the 10th... (Get the picture?)

> and changes XML formats (ie. changes to different document
> structures according to the specific DTD (document type definition) needed
> at that loci).

I'm not sure if you're talking about workflow or processable data here.

> Each of these points has a specific storage needs, so I have come up with a
> separate plan for each of them:
> 
> 1. Input Data: Since the user supplied this data, it is their choice to
> determine how they want to deal with it.

Amen brother!

> If they want to store it as a
> backup in a database of some sort, then they can do this through the work
> flow diagram. So the data can be stored in a 'plug-in' database (what Gary
> and Jeff mentioned to be). This type of interface/data storage component
> isn't "essential" to the functioning of Loci, so I will go on to the
> essential data storage needs.

Something that needs serious thought, however, on the extensions end of this
project.

> 2. Workflow Data: Loci will need a method to store the user defined
> workflow diagram. This diagram includes: 1. the setup of the workflow
> diagram (how everything is connected together) 2. The constructed command
> line for each program 3. more???. This is the kind of storage need I was
> thinking about when I wrote my incoherent message a couple of days ago
> about trees and graphs. Basically, my thinking is that we can stick all of
> the information from a workflow diagram into a data stucture, and then move
> through this structure in the specified order to execute the contents of
> the workflow diagram. My new data structure of choice is a flow network
> (still from Intro Algorithms). Basically I think each element of network
> would have a setup kind of like the following pseudo-code:
> 
> data-structure loci:
>         array[pointers] TheNextLoci #pointers to the loci which come next in
>                                     #the flow diagram
>         string Type # The loci type
>         string IOName #the program or document represented by the loci
>         tuple CommandLine #all of the command line arguments
>         pointer XMLDocument #the info being processed
>         pointer DTD #the document definition for the particular loci
>         pointer ActionInstructions #a document with what to do at that loci

There is some talk about the format of 'workflow data' in the mail archives. 
There were even thoughts that workflow and processable data could be
mixed...which gets back to a required internal data format.

> Of course, this would require each loci to setup a DTD type file that has
> the specifications to create a document for the particular program (I talk
> more about how I think this would work in point 3. below) and also an
> ActionInstruction to determine what to do at that loci (ie. display a pdb
> file in RasMol, align sequences from the XML document etc.).

Hmmm.

>         My mental image is that the XML document would move into a
> particular locus, be converted to the DTD required for that particular
> locus, and then processed according to the specifications of the program at
> that locus. I imagine the setup of the DTD and action instructions would be
> part of the plug-in process for each program that needs to read a document
> into or get info from the workflow diagram.

Oh okay, you're talking about wrapping programs not designed for Loci, to be
used in Loci: workflow data.  As I think you're suggesting, the same wrapping
system should be used for all loci, whether they be data or programs.  To a
large extent, _something_ has to accompany each locus.

> 3. Internal XML warehouse: My thoughts on this on pretty directly based off
> the WHAX paper. Here is kind of what I imagine happening with a document
> that comes into Loci. First the document will be converted into XML format
> based on the DTD of the locus (ie. the type of data in the document). This
> XML document will then be put into an XML database (Note: This is kind of
> what I was thinking before--have a database to store info instead of a
> specific internal format.)

I'm not sure what you mean by 'document'.  I usually use that word for
processable data, but I think you're referring to workflow data.

> Then, as you progress through the work-flow
> diagram, each loci will create an XML warehouse from the XML database based
> on the DTD requirements of the particular loci. So what I am thinking is
> that we can use the WHAX system to maintain an XML document that has all of
> the info needed for a particular locus. For instance, if we come to a
> processor that requires sequences in the database in FASTA format, we can
> pull out the sequences and other required info from the database and update
> the XML warehouse to have this info. So we would maintain a view of the
> data available in the database and update it for the needs of a locus.
> Okay, I should stop talking about this point before I get any more
> confusing!

I think I may need some hand-holding on this.

> More ranting
> ----------------------
> 
> Basically, I am proposing a plan whereby we eliminate a specific internal
> storage format and essentially put everything into a database. Of course,
> this type of plan "requires" a database, and here I was thinking that we
> could use dbXML (http://www.dbXML.org), mentioned by Jeff in the archives.

I'm still not sure if you're suggesting that all processable (bioinformatics)
data be broken up and converted into XML tags.

> The database is under a BSD-style license (which I think is compatible with
> the LGPL)

It is.  BSD allows proprietary/closed-source derivatives of your program,
which I don't like.  But it's not our program anyway.  Providing we can ship
it with Loci, that's all that matters to us.

> and although it still doesn't "do" anything yet, it is under
> current development (most recent tarball = November 27th) and we could try
> and coordinate development with Tom Bradford, the developer there.

Justin had some of his own ideas for an XML database, which he mentions on
this list.  He didn't give any details, so it's not worth searching for.  But
I thought you should know.  Of course, our own would be LGPL'd.

> He is
> developing it in C++ with a CORBA interface (he is using ORBacus as his
> ORB), so ultimately the database could also be pluggable (you could use any
> XML storage database), which fits in well with the Loci schema.

We could use it until something better (uses ORBit, Python, LGPL) comes along.

> The reason that I think this kind of plan is better than an internal format
> is that it gives us a lot of flexibility to input any kind of information,
> as Jennifer was talking about. For instance, say we had a program to plug
> in that uses specific animal descriptors to build an evolutionary tree. So
> you might have data for an anteater in the input file like:
> 
> <Claws> Sharp and Pointy </Claws>
> <Nose> Long </Nose>
> <Tounge> Really Long </Tounge>
> 
> (Okay, so I don't know anything about anteaters! Sorry!). With an internal
> data format, we could have to define a new DTD to include these three
> elements but with a database format, I don't think this would be necessary.

I would consider this for a plug-in database and not mix processable data with
workflow data.

So, are we looking at parallel (two, interconnected) databases?  If someone
wants to use Loci for, say physics, would this be a problem?


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From bizzaro at geoserve.net  Tue Dec  7 09:51:02 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] WHAX and Loci storage ideas
References: <l03130302b47064d3c3c4@[172.16.0.2]> <384B3BD8.6B557542@redpoll.pharmacy.ualberta.ca>
Message-ID: <384D1ED6.1C18564F@geoserve.net>

Gary Van Domselaar wrote:
> 
> It good to know that someone is thinking about data storage issues for
> Loci.

...cuz we can't rely on Jeff ;-)

> 'middleware'.  The database used to store the individual loci contained
> within a 'container locus' would be another example.

Interesting point.  A database for a locus's workflow data is middleware, but
a database for a locus's processable data is back-endware.

> mention of data-type:  Loci can work for physicists as well as it can
> for bioinformaticists, but we are all bioinformaticists here, so we

And Brad, this is why you can give an example of ant-eater physiology.  If any
one of us designed Loci for ourselves, the audience would be very small.  Even
within the scope of bioinformatics, it would be limited.

> Although I'm not the absolute authority on Loci's architecture,

No one is ;-)

> On the other hand, what if we planned to do our entire thesis project
> based upon the information kept in that 2 Terabyte file? Would we want
> to retrieve it from the NCBI database everytime we wanted to do an
> analysis on it, especially if we wanted only to search a small segment
> of it? No way! we would wan to have that file stored in a fashion
> wherein we could easily extract only the parts that we are interested in
> performing an analysis on. This is where Loci's ability to store
> sequence data in a database becomes important.

Everytime Loci 'points' to a locus (see my last message), the user should have
the option to download the whole thing.  If remote_locus_1 is a processor and
remote_locus_2 is the data, and they both reside on the same remote computer,
NOTHING should be passed back to the user but the results of the process. 
This is why we use pointers (URI's - not C pointers): low bandwith usage,
convenience.  But if the user really wants remote_locus_2 on his/her computer,
he/she should be able to 'get it'.  I haven't thought about how the user
interface for this would work.

> The OMG LSR ( http://www.omg.org/homepages/lsr/) Biomolecular Sequence
> Analysis working group has a nearly complete RFP
> (http://www.omg.org/techprocess/meetings/schedule/Biomolecular_Sequ._Analysis_RFP.html)
> for sequences and their alignment and annotation.  Loci plans to adopt
> their CORBA IDL for passing biomolecular sequence objects to
> CORBA-compliant backend apps.  This RFP has 'XML extensions' for future
> compatability, btw.

Right, and AppLab and some others have adopted the RFP.

> My understanding is that Loci will come with 'data translators'
> (middleware) that will be placed between a document / database to
> accomodate the formatting requirements of the analysis program that will
> operate on the document.

Again, it depends on whether Brad was talking about workflow data or
processable data.

> I think this is appropriate only for Loci's own internal data
> requirements, but violates Loci's 'laissez-faire' paradigm for operating
> on 'exogenous' data. Jeff explained to me best when he said that Loci
> should be like the Bash shell: the bash shell has redirection operators
> and pipes, which you can combine to do some fairly sophisticated data
> processing, for example:
> 
> bash$ cat /var/adm/messages | grep "root" > /tmp/root.txt
> 
> Here bash will pipe the contents of /var/adm/messages to grep, which
> will extract all the lines containing the word 'root' and place them in
> the /tmp/root.txt file.  Bash itself cares not about the contents of
> /var/adm/messages, doesnt reformat it, doesnt store it in an
> intermediate database, then re-extract it from the database, reformat it
> once again, and finally pump out the /tmp/root.txt file according to
> some xml dtd.  Neither should Loci, in its most abstracted form.
> Instead,the data conversions and XML operations should be the modular
> extensions to Loci that we provide as valuable options for the end-user,
> so that Loci becomes not just a graphical 'bash', but a sophisticated
> distributed data processing system.

You said it so well!

> Not that a graphical bash wouldn't
> be nice:  the gnome dudes have talked about using Loci's graphical shell
> to do just that!  Bottom line:  maximum abstraction + maximum
> modularization = maximum flexibility = maximum power!

So, Loci is more like a graphical bash + some nifty programs to go with it.

Regarding processable data format conversions, a bash command might work like
this:

  echo data.fasta | fasta2xml | bioxmlview.py

Does bash need to know ANYTHING about biological data???


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From bizzaro at geoserve.net  Tue Dec  7 10:44:55 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] WHAX and Loci storage ideas
References: <l03130302b47064d3c3c4@[172.16.0.2]> <l03130307b471fffd6330@[172.16.0.2]>
Message-ID: <384D2B77.36B555C4@geoserve.net>

Brad Chapman wrote:
> 
> >This is my understanding as well, although the WFD will be constructed
> >via a graphical shell, which has a 'thin interface' to the middleware.
> >When you say 'constructing the command-lines', do you mean 'generating
> >the interface to the middleware'?
> 
> What I think this refers to is generating a command-line for a program by
> using a GUI to input all of the switches. For instance, if I were using
> program foo that used a -l switch to specify a log file, I would use the
> Loci interface to generate the equivalent of 'foo -l /var/mylogfile.'

That's exactly right, and applies pretty much to generating commands for
command-line applications.  I would like as much as possible for other
interface constructions to work in a similar fashion.  The idea is this:

    LOCI IS IT'S OWN SOFTWARE DEVELOPMENT KIT (SDK).

If you think about it, most programming and GUI building paradigms use tree
and workflow models.  If we can carry these over to Loci, you have a capable
and flexible development environment TOO!

> My
> thinking was that 'the interface to the middleware' would be worked out
> during the programming of the plug-in to work with Loci. For instance, to
> get Loci to use my sequence viewer program, I would have to tell it by
> writing the plug-in:
> 
> 1. What kind of file the program needs (ie. PDB, FASTA, etc)
> 2. How to work the program (ie. the command line stuff: the switches it
> takes, etc)
> 
> Loci would then take this info and have a GUI for 'constructing the command
> line' (getting the switches set up) and do error checking do make sure the
> user supplies the right file for the program.
> At least, this is my current understanding of how stuff would work

It sounds about right to me.  Later, we'll need some people thinking about how
to add these features to the Loci 'SDK'.

> I really like the idea of piping! You (and Jeff) are right, there is no
> reason to stick stuff in a database if you could just pipe it around.
> However, I have a couple of practical questions for using a piping approach
> like this:
> 
> 1. If you have data from a number of sources in a bunch of different
> formats, how would you get them together to pipe them into a program that
> would require them all in one text document in, say, FASTA format? Would
> you have to run each of them through a converter to get them in a common
> format, then pipe them all into a processor that would stick them into a
> single file?

I think you hit the nail on the head.

> 2. Conversely, what if you had a huge document and wanted to break it up
> into smaller documents? For example, what if you had a swiss-prot file and
> wanted to get just the protein sequences for all Zea mays (corn)
> accessions--how would this be done?

You'd need a processor (or database query) to do this.  It'd be better to have
a more general-purpose processor (can handle extracting all sorts of data)
than a special purpose one.  And (if we make our own) the processor should
work from one 'good' data format, leaving translation from swiss-prot to a
converter locus.  Let's say the 'good' data format is 'HumbertoXML' ;-)


    swiss-prot         swiss-prot          breaker-   ---->  Zea mays
      document  ---->  to HXML      ---->    upper    ---->  sequences
                       converter                      ---->  in HXML


> 3. How could individual parts of the data be queried or reordered? For
> instance, if I wanted to separate all sequences with a particular motif out
> of a file and then reorder them by organism.

If this stuff was databased first, you could use a more sophisticated query
system than above.  So, you may want to pipe your data into a database to
start.

> 4. What about doing things like generating GUIs on the fly, as Jeff talked
> about  in the 'constructing the command line' mail? He mentioned getting a
> pyGTK GUI directly from a Glade output XML document in this case, but
> similary, what if we wanted to put the output into a web browser? Would we
> convert the file to XML, then process it into HTML/GladeXML and then output
> it?

Web output of Loci interfaces is a tricky problem, and the whole Web interface
project is the biggest sub-project to Loci.  I can think of some ways to make
simple and limited Web interfaces, but just like you cannot get MS Word to run
via HTML browser, many Loci interfaces cannot not be run this way.  This is
why people made Java applets, etc.

What I am hoping to be able to do is convert diagrams or illustrations (for
example, protein motifs) made by Loci into JPG's for Web display.  I'm trying
to be realistic about this part of Loci.

> These are just a few concerns I thought up for discussion regarding the
> piping system you described. I really like the idea, and think it would be
> a more straightforward to do, but my only concern is how well it would
> scale as operations got more complicated. I guess I have been thinking of
> Loci more as a graphical scripting language, which I imagine having a lot
> more options then just a redirection shell.

Alright, as far as scripting languages are concerned, Loci is very limited. 
But I'd like to think of it as being 'high-level' or a '4GL' (fourth
generation language).  And I think that keeping Loci agnostic of data type
does not deminish its capabilities.  How can one turn a redirection shell into
a scripting language? As long as we're looking at bash as an analogy, we can
consider SHELL SCRIPTING, which is really just a more structured command-line.

> 2. Middleware--2 storage options:
> a. Provide option for XML storage of an "internal XML format." If a user
> has a need for more complicated data-handling (as I described in my
> questions above), they can utilize this option to place things in an
> internal XML database and then use the XML warehouse kind of stuff I
> described in point 3 in my last e-mail.
> b. Provide an option for permanent storage with relational databases (ie.
> MySQL, PostgreSQL, Sybase ...), so that the data can be available after
> Loci has quit.
> 
> The middleware would handle the connections between the Loci front-end,
> which asks for a database or internal format, and the back-end, which
> provides it.

I think you're suggesting a generic XML database as an 'internal database',
which can handle any processable data that are marked up.  I like the idea,
providing it is very generic.

But included in this list of middleware should be the mechanism (database?)
for knowing what locus is connected to what...basically handling all of the
workflow data...and a parser/interpreter  You mentioned this before.

> 3. Back-end: All of the databases themselves.

All the programs, data, converters...everything called a 'locus'.

> If this sounds like a plan, then I would like to humbly propose an
> immediate development focus: Get the piping stuff working with the Loci
> front-end so that we can do something like the following: 1. Input a
> sequence in FASTA format 2. Convert it to a new format 3. View it in a
> sequence viewer. This type of activity would not require any storage
> options, so this would simplify things. In addition, Jeff has the GUI
> set-up to make the connections, so we are currently able to construct this
> kind of workflow diagram. I think reaching this kind of short term goal
> would be extremely exciting as Loci would actually "do" something and would
> provide us with a base for further development. How does this sound? Anyone
> for this? Hip-hip-hooray? Booooo? Whatta you think?

As a focus or goal, this sounds good.  It doesn't say how we'll get there. 
But I never mentioned what the simplest senario for running Loci should be.

> Also, I hope I don't step on any toes by making
> a development direction suggestion. I just want to get an idea of the short
> and long term goals of Loci and kind of find my place somewhere in there so
> I can have Loci working for my thesis project needs.

No problem.  I still owe you a TODO list.  I worked on one, and I will pass it
to Gary for some comments before making it official.  If anyone else wants to
see the unofficial version so that they can comment on it, mail me directly.


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From gvd at redpoll.pharmacy.ualberta.ca  Wed Dec  8 10:48:47 1999
From: gvd at redpoll.pharmacy.ualberta.ca (Gary Van Domselaar)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] Conglomerate
References: <384C0A1B.9F24D3A5@geoserve.net>
Message-ID: <384E7DDF.F5CD8064@redpoll.pharmacy.ualberta.ca>

"J.W. Bizzaro" wrote:
> 
> This may be interesting to us in more than one way:
> 
>     http://www.conglomerate.org/

"Conglomerate" is a structured document authoring application with an
intuitive interface.

I haven't downloaded a copy, but this does look like a viable
alternative to docbook. I'll check it out soon, and would like to have
the honourable Dr. Lapointe take a look at it as well.

The issues for me, in terms of using conglomerate for writing
documentation is that their software is new and probably not widely
adopted.  Their own documentation is sparse, which is ironic considering
that they are writing a structured document development interface. 
Their general news web archive is only month old, and their development
web archive link is broken.  On the other hnad, it does use XML, can
produce multiple output (HTML, TeX) from a single source, uses XML, and
provides a very nice interface for authoring structured docs.


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Gary Van Domselaar		gvd@redpoll.pharmacy.ualberta.ca
Faculty of Pharmacy 		Phone: (780) 492-4493
University of Alberta		FAX:   (780) 492-5305
Edmonton, Alberta, Canada       http://redpoll.pharmacy.ualberta.ca/~gvd


From toneman at phil.uu.nl  Wed Dec  8 11:01:48 1999
From: toneman at phil.uu.nl (Michiel Toneman)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] Conglomerate
In-Reply-To: <384E7DDF.F5CD8064@redpoll.pharmacy.ualberta.ca>
Message-ID: <Pine.LNX.4.04.9912081650320.15909-100000@apenstaartje.dyndns.org>

On Wed, 8 Dec 1999, Gary Van Domselaar wrote:

> "J.W. Bizzaro" wrote:
> > 
> > This may be interesting to us in more than one way:
> > 
> >     http://www.conglomerate.org/
> 
> "Conglomerate" is a structured document authoring application with an
> intuitive interface.
> 
> I haven't downloaded a copy, but this does look like a viable
> alternative to docbook. I'll check it out soon, and would like to have
> the honourable Dr. Lapointe take a look at it as well.
> 
> The issues for me, in terms of using conglomerate for writing
> documentation is that their software is new and probably not widely
> adopted.  Their own documentation is sparse, which is ironic considering
> that they are writing a structured document development interface. 
> Their general news web archive is only month old, and their development
> web archive link is broken.  On the other hnad, it does use XML, can
> produce multiple output (HTML, TeX) from a single source, uses XML, and
> provides a very nice interface for authoring structured docs.
> 
> 

I think you will see much progress on Conglomerate, because 
when it was announced on the Gnome Notices (Gnotices, see
http://www.gnome.org/) it got a warm welcome. I think there
is much motivation to make this a "killer app" for gnome.

Greetings,

Michiel Toneman

-- 
I wish there was a knob on the TV to turn up the intelligence. 
There's a knob called "brightness", but it doesn't work.


From dlapointe at mediaone.net  Wed Dec  8 22:04:42 1999
From: dlapointe at mediaone.net (David Lapointe)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] Conglomerate
In-Reply-To: <384C0A1B.9F24D3A5@geoserve.net>
References: <384C0A1B.9F24D3A5@geoserve.net>
Message-ID: <99120822122500.00658@gnomen>

On Mon, 06 Dec 1999, J.W. Bizzaro wrote:
> This may be interesting to us in more than one way:
> 
>     http://www.conglomerate.org/

That's a very interesting application.  It would ( or anything for that matter ) be great if it could read
DTD's and generate the proper tags ( withing the hierarchy ) automagically.  

However,  I believe we have to wait for the availablity of conglomerate.  
-- 
 .david
 David Lapointe
"There are two things that are infinite; Human stupidity and the
universe. And I'm not sure about the universe." - Albert Einstein


From bizzaro at geoserve.net  Thu Dec  9 16:07:21 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] Directory-XML
Message-ID: <38501A09.6D91978C@geoserve.net>

DSML may be useful for Loci's directory service ('hub'):

    http://www.internetwk.com/story/INW19991207S0007

BTW, I sent Gary the TODO list for review.

Also, fixes to the Workspace bugs, found by Brad, have been added to the CVS.


Cheers.
Jeff


From bizzaro at geoserve.net  Thu Dec 16 19:40:47 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] Gnome stuff
Message-ID: <3859868F.E4E3FCA2@geoserve.net>

Locians,

If you're tracking Gnome development (as I am), you might find this interview
with the developers very interesting:

    http://news.gnome.org/gnome-news/945331082/index_html


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From gvd at redpoll.pharmacy.ualberta.ca  Thu Dec 16 20:10:20 1999
From: gvd at redpoll.pharmacy.ualberta.ca (Gary Van Domselaar)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] Gnome stuff
In-Reply-To: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec 17, 99 00:40:47 am
Message-ID: <199912170110.SAA25755@redpoll.pharmacy.ualberta.ca>

> 
> Locians,
> 
> If you're tracking Gnome development (as I am), you might find this interview
> with the developers very interesting:
> 
>     http://news.gnome.org/gnome-news/945331082/index_html

Jeff,

Thanks for pointing this interview out.  There is a lot of discussion
relevant to the Loci project, considering our heavy reliance on the GNOME
application development environment.  Most relevant to me are the
discussions related to Conglomerate, the structured text that we reviewed
recently.  It looks like the GNOME team wants very much to use
Conglomerate as the front end to write DocBook documents.  If we can use
Conglomerate to write our DocBook documents, then we _definitely_ should. 
I'm gonna give it a try right away.  

I strongly suggest all Locians review this interview.


From dlapointe at mediaone.net  Thu Dec 16 20:28:54 1999
From: dlapointe at mediaone.net (David Lapointe)
Date: Fri Feb 10 19:19:01 2006
Subject: [Pipet Devel] Re: New Python Book
Message-ID: <99121620304000.00623@gnomen>

The Beasley book is very impressive.  It's nicely organized as a reference book, with examples.

-- 
 .david
 David Lapointe
It is good to have an end to journey toward; but it is the
journey that matters, in the end.--Ursula K. Le Guin


From bizzaro at geoserve.net  Thu Dec 16 21:32:49 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] on a personal note
Message-ID: <3859A0D1.7E102232@geoserve.net>

Greetings fellow Lab Rats!

I would like to announce that I have earned my "Master of Science in
Chemistry/Biochemistry" degree from Boston College.  This comes with my
completion of the degree requirements this fall semester.

My future plans are centered entirely around this organization (The Open Lab)
and The Loci Project.  I will be entering the Doctoral Biochemistry Program at
the University of Massachusetts Lowell, where operations have been based since
the organization's inception.  Yes, this is where I earned my undergraduate
degree in chemistry, but it is most important to me that I am able to continue
my work with The Open Lab, and I believe I would not have that luxury anywhere
else.

Bigger and better: I am working with our advisors (Ken Marx, Rob Harrison and
David Lapointe) and administrators (Gary Van Domselaar, Pete St. Onge and Mark
Luo) to expand and improve services at The Open Lab.  Each expansion or
improvement will be announced as it is made, but I can give you a hint as to
our plans: (1) donation and purchase of more computing hardware, including
some Big Iron to serve Loci applications; (2) change of name to include
"BIOINFORMATICS.ORG", which we will soon be able to use as a domain; (3)
continued development of the bioinformatics portal, which started with the
"Bioinformatics GNU's" news list; and (4) grant awards and corporate
sponsorship.

I would like to thank the advisors, administrators, project coordinators
(Justin Bradford, SooHaeng Yoo, Thomas Sicheritz, Rick Ree and Carlos
Maltzahn), and everyone else, ALL volunteers of their time and effort to make
this organization successful.  I think we have a bright future ahead!


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From bizzaro at geoserve.net  Thu Dec 16 23:04:20 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] TODO!
Message-ID: <3859B644.7BBE8A46@geoserve.net>

Locians,

We finally have a TODO list (attached).  There are many projects within Loci
and sub-projects within those.  So, I listed them all out in outline format
with a very brief description under each project.  Each project (and most
sub-projects) needs a 'project leader', and some leaders are named BY MY
ASSUMPTION (please confirm).  Where no leaders are identified, you will see a
'???'.  This is where we need YOUR help!  Any suggestions/additions for this
list are of course welcome.

Jeff
-------------- next part --------------
THE LOCI PROJECT; TODO 19991216

LOCI PROJECT LEADER: J.W. Bizzaro <jeff@bioinformatics.org>
ASSISTANT: Gary Van Domselaar <gary@bioinformatics.org>


I.    CORE WORKSPACE

      PROJECT LEADER: J.W. Bizzaro <jeff@bioinformatics.org>

      Part of loci-core.  This project covers the entire GUI for Loci and
      the implementation of GUI extentions.


      A. GUI construction via XML


      B. Dynamic menu generation


      C. Themes


      D. CORBA integration


      E. Bonobo integration


II.   CORE SCRIPTING LANGUAGE (XML-based)

      PROJECT LEADER: ???

      Part of loci-core.  Once a graphical script is generated by the user,
      via the Workspace, it can be executed.  The graphical script will
      therefore need to be represented in text (XML) and executed by an
      interpreter.


      A. Language definition


      B. Interpreter


III.  CORE DATABASE CONNECTIVITY

      PROJECT LEADER: Brad Chapman? <chapmanb@arches.uga.edu>

      Part of loci-core.  The line is blurred between what is a 'real'
      database being used by Loci and just about everything else.


      A. Representation of filesystem as containers


      B. Representation of databases as containers


IV.   CORE DIRECTORY SERVICES (formerly called 'hub')

      PROJECT LEADER: ???

      Part of loci-core.  Akin to domain name serving, a world-wide registry
      needs to be made containing what loci are available where.  Each copy
      of Loci will in fact have the ability to contact others to find out
      what is _pulicly_ available there.  All copies of Loci should register
      their available loci with a central registry too.


V.    CORE UTILITIES

      PROJECT LEADER: ???

      Part of loci-core.  This includes helper applications that are
      external to Loci.  What would be interesting is finding a way to
      run these as loci.


      1. Installation Manager

         SUB-PROJECT LEADER: ???


      2. User Preferences Configuration

         SUB-PROJECT LEADER: ???


VI.   PYTHON BINDINGS

      PROJECT LEADER: Justin Bradford <justin@ukans.edu>

      Since most/all of Loci's core is written in Python and uses Gnome
      libraries, several bindings are needed.


      A. GTK/GNOME

         These already exist, thanks to James Henstridge.


      B. ORBit


      C. Bonobo


VII.  WEB INTERFACE (loci-web)

      PROJECT LEADER: David Lapointe? <david.lapointe@umassmed.edu>

      This would replace the Workspace and allow a limited number of
      loci to run via Web browser.


VIII. CORE WRAPPERS AND EXTENSIONS

      PROJECT LEADER: J.W. Bizzaro <jeff@bioinformatics.org>

      Part of loci-core.  These are basic loci that come with each copy
      of Loci.


      A. Locus output to command-line

         SUB-PROJECT LEADER: J.W. Bizzaro <jeff@bioinformatics.org>


      B. Locus input from command-line/stdout

         SUB-PROJECT LEADER: Thomas Junier? <thomas.junier@isrec.unil.ch>


      C. Generic XML database

         SUB-PROJECT LEADER: Brad Chapman? <chapmanb@arches.uga.edu>


IX.   BIOINFORMATICS WRAPPERS AND EXTENSIONS (loci-bio)

      PROJECT LEADER: ???

      These are loci for basic bioinformatics research.


      A. Bioinformatics XML and Converters
         ('internal format')

         SUB-PROJECT LEADER: Humberto Otiz Zuazaga? <hortiz@neurobio.upr.clu.edu>


      B. Misc. Converters

         SUB-PROJECT LEADER: ???

         1. GenBank to Raw Sequence


X.    EMBOSS WRAPPERS (loci-emboss)

      PROJECT LEADER: David Lapointe <david.lapointe@umassmed.edu>

      These are loci for running EMBOSS under Loci.


XI.   DOCUMENTATION

      PROJECT LEADER: Gary Van Domselaar <gary@bioinformatics.org>
      ASSISTANT: David Lapointe? <david.lapointe@umassmed.edu>


From chapmanb at arches.uga.edu  Sat Dec 18 01:29:09 1999
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Gnome stuff
In-Reply-To: <199912170110.SAA25755@redpoll.pharmacy.ualberta.ca>
References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec
 17, 99 00:40:47 am
Message-ID: <l03130301b480d0faab55@[172.16.0.2]>

Oh great locians;
	Jeff and Gary--thanks much for mentioning this interview. As I read
throught it with the brand new ToDo list in hand, I had a couple of
questions about what is relevant/not relevant to Loci:

1. Bonobo: What are everyone's thoughts on the reliance of Loci on Bonobo?
It sounds like, if I read the description correctly, Bonobo implements a
wrapper around CORBA which allows linking of multiple objects or, to quote
from the interview: 'Think "multi-directional pipes".' Is the plan for
implementing Loci to make it a wrapper around Bonobo so that we have:

(((Loci ((Bonobo (CORBA/ORBit) Bonobo)) Loci)))

or rather, a wrapper around a wrapper around ORBit? I guess this falls into
ID. and IE. in the ToDo outline: CORBA and Bonobo integration (well, and
also VIC. python bindings for bonobo!)

2. GConf: This is described in the interview as "an API for storing
configuration data...for now just XML text files." Is this something that
can be utilized for storing the core scripting language described in II. of
the ToDo?

3. The as-yet-unamed replacement for the GMC file manager: According to the
interview this new manager "..is designed to be able to plug in Bonobo
components so that you can install viewers for different types of files or
different file systems altogether." Is this something that we should
investigate for representing the Loci file system (ie. IIIA.) or am I
totally off in thinking it does a simiar thing to what we need for managing
files?

	Is there more stuff in there that could be useful to us? How about
other gnome stuff that I haven't mentioned here? I guess I am not
completely clear on how much Loci will be integrated into the GNOME project
so if anyone could "throw me a friggin' bone" on this, I would be quite
appreciative!
	Along these lines, if we are going to be using a lot of gnome
libraries/programs, do you think it would be worthwhile to keep a listing
of "useful gnome stuff" or something along those lines, to make it easier
to dig into the gnome api's? Also, maybe this way we could divide up the
process of understanding different parts of gnome and thus making the
learning curve for diving into it a little less steep... (at least for me!)

Brad


From chapmanb at arches.uga.edu  Sat Dec 18 01:58:34 1999
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Random thoughts on application wrapping
Message-ID: <l03130302b480da45da97@[172.16.0.2]>

Hello all!
	I was messing around with CORBA and trying to make myself a little
piped program that called an already existing program when I got hard into
thinking about how to wrap applications so that they can be run within the
python framework of Loci. So I was just wondering, what is the proposed
mechanism for taking an existing program (say the dnacomp program of
phylip, written in c) and allowing it to be called from a loci script. I
could come up with two possible ways to do this:

1. The Applab way: Applab, a java application wrapper for CORBA (which has
been mentioned on the list several times) does the following to incorporate
a program:
  a. has an IDL interface for controlling and running outside apps (ie.
  our dnacomp program)
  b. requires the construction of a meta-data file describing the interface.
  c. parses (using a perl script) the meta-data file into java code which fits
  into the server side implementation and wraps the program.

2. The other way I could think of: This way would be to generate a wrapper
for each individual program based on its language and the methods that are
avaiable to do that. For instance:
  a. we could wrap C and C++ programs using SWIG (http://www.swig.org)
  b. we could deal with Java programs by using JPython to input their classes
  and then do scripting between them.
  c. we could deal with Perl/Tcl programs by using Minotaur
  (http://mini.net/pub/ts2/minotaur.html) to imbed perl and tcl scripts into
  python classes and then run them from there.

Either way has pluses and minuses. I think the first way is nice because it
allows a consistent method to "port" a program to Loci. However, unless we
decided to use the applab language and/or parser, we would have to describe
our own input language and then design a parser to deal with it. The second
way uses already exciting programs, but it a lot uglier because it is
different for every app ported and thus makes the porting process very
difficult for a non-programming-user of Loci.
	Is any of these two ways I mentioned about something anyone else
was thinking for wrapping programs so they can run in Loci? Or have I
completely forgotten an obvious way. Thanks in advance for any help anyone
can provide on this dillemna of mine!

Brad


From bizzaro at geoserve.net  Sat Dec 18 20:01:40 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Gnome stuff
References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec
	 17, 99 00:40:47 am <l03130301b480d0faab55@[172.16.0.2]>
Message-ID: <385C2E74.CF6F6056@geoserve.net>

Brad Chapman wrote:
> 
> 1. Bonobo: What are everyone's thoughts on the reliance of Loci on Bonobo?
> It sounds like, if I read the description correctly, Bonobo implements a
> wrapper around CORBA which allows linking of multiple objects or, to quote
> from the interview: 'Think "multi-directional pipes".' Is the plan for
> implementing Loci to make it a wrapper around Bonobo so that we have:
> 
> (((Loci ((Bonobo (CORBA/ORBit) Bonobo)) Loci)))
> 
> or rather, a wrapper around a wrapper around ORBit? I guess this falls into
> ID. and IE. in the ToDo outline: CORBA and Bonobo integration (well, and
> also VIC. python bindings for bonobo!)

Well, if you consider the relationship between a program and its library to be
the same as a wrapper and its 'wrappee', then yes...sort of.  Bonobo is just
one of the libraries we're using, and we're using it to...

    (1) Include non-python GUIs in the Workspace
    (2) Include Loci's GUI in other apps, if there is a need to do so

> 2. GConf: This is described in the interview as "an API for storing
> configuration data...for now just XML text files." Is this something that
> can be utilized for storing the core scripting language described in II. of
> the ToDo?

Hmmmm.  I understood GConf to be akin to the Windows Registry.  I can't
imagine trying to put all of our XML into it, especially if the script XML
includes the GUI XML, etc.

> 3. The as-yet-unamed replacement for the GMC file manager:

GFM: Gnome File Manager, right?  That's what I had seen.

> According to the
> interview this new manager "..is designed to be able to plug in Bonobo
> components so that you can install viewers for different types of files or
> different file systems altogether." Is this something that we should
> investigate for representing the Loci file system (ie. IIIA.) or am I
> totally off in thinking it does a simiar thing to what we need for managing
> files?

GFM is, disappointingly, a clone of Windows Explorer running with the Active
Desktop.  So, the viewers in GFM are 'just' giving you a preview/thumbnail of
the file in one corner of GFM's window.

I never really thought that Loci's 'file system', or the way files and
directories are shown, would provide previews or thumbnails.  It's an
interesting idea that we can pursue later.  But for now, I'd like to see each
directory on the file system be represented as a 'container locus'.  If you
double-click on such a container, you get a windowlet just like any other
locus.  But a container's windowlet is a 'list' widget that lists the contents
of the directory:

        +------+
        | cont |                 <----- icon
        | ainer|
        +------+
    +---------------+
    |    file       |
    |    file       |
    |    file       |            <------ windowlet
    |    container  |
    |    file       |
    +---------------+

Since the contents are either files or directories, and these are
automatically simple loci, a container is (by definition) a locus that
contains other loci.  And some of these loci inside of said container are
directories, which are again containers, so we have the directory heirarchy
represented as loci inside of loci ad infinitum (or containers inside of
containers...).

For now the 'list' widget just gives the names and icons of the files and
directories (loci) held by the container.  And you can drag-and-drop loci
to-and-from the container's list and Workspace!

We need to start off with a sufficiently high level container/directory, say
the system's root directory (?)

>         Is there more stuff in there that could be useful to us? How about
> other gnome stuff that I haven't mentioned here? I guess I am not
> completely clear on how much Loci will be integrated into the GNOME project
> so if anyone could "throw me a friggin' bone" on this, I would be quite
> appreciative!

I wouldn't say that Loci is being integrated into Gnome, although the thought
of having Loci serve as the desktop for Gnome has surfaced recently (I think
it's unlikely to happen).  Gnome rather serves as some development tools for
Loci.  And communication with other Gnome applications can be facilitated via
CORBA and Bonobo.

>         Along these lines, if we are going to be using a lot of gnome
> libraries/programs, do you think it would be worthwhile to keep a listing
> of "useful gnome stuff" or something along those lines, to make it easier
> to dig into the gnome api's? Also, maybe this way we could divide up the
> process of understanding different parts of gnome and thus making the
> learning curve for diving into it a little less steep... (at least for me!)

For now, I'm sure we're using...

    gnome-libs           (what gnome-python wraps)
    bonobo
    orbit

These are 3 distinct packages/parts to Gnome.  There is no need to look beyond
these, so our use of Gnome is less confusing than you may be thinking.


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From bizzaro at geoserve.net  Sat Dec 18 20:24:47 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Random thoughts on application wrapping
References: <l03130302b480da45da97@[172.16.0.2]>
Message-ID: <385C33DF.FF079473@geoserve.net>

Brad Chapman wrote:
> 
>         I was messing around with CORBA and trying to make myself a little
> piped program that called an already existing program when I got hard into
> thinking about how to wrap applications so that they can be run within the
> python framework of Loci. So I was just wondering, what is the proposed
> mechanism for taking an existing program (say the dnacomp program of
> phylip, written in c) and allowing it to be called from a loci script. I
> could come up with two possible ways to do this:

If it runs from the command-line and is non-interactive, the default (although
somewhat sloppy perhaps) method is the one I outlined in the message
'constructing the command-line'.

Otherwise, we should have a SET of tools available for wrapping proggies FROM
THE WORKSPACE.

> 1. The Applab way: Applab, a java application wrapper for CORBA (which has
> been mentioned on the list several times) does the following to incorporate
> a program:
>   a. has an IDL interface for controlling and running outside apps (ie.
>   our dnacomp program)
>   b. requires the construction of a meta-data file describing the interface.
>   c. parses (using a perl script) the meta-data file into java code which fits
>   into the server side implementation and wraps the program.

For more sophisticated wrappings, we want to use the AppLab approach.  Since
nothing has really been finalized about just how we will use CORBA, we will
simply copy AppLab.

SEView, by Thomas Junier who is on this list, will also give us some ideas
about converting text output into graphical presentations:

    http://www.bioinfo.de/isb/1998/01/0003/

> 2. The other way I could think of: This way would be to generate a wrapper
> for each individual program based on its language and the methods that are
> avaiable to do that. For instance:
>   a. we could wrap C and C++ programs using SWIG (http://www.swig.org)
>   b. we could deal with Java programs by using JPython to input their classes
>   and then do scripting between them.
>   c. we could deal with Perl/Tcl programs by using Minotaur
>   (http://mini.net/pub/ts2/minotaur.html) to imbed perl and tcl scripts into
>   python classes and then run them from there.

Again, the person who wraps the program must be able to choose what will work
best, and this will mean having many options.  For example, David Lapointe
will be working on a method to wrap EMBOSS apps.  It will be generic enough to
handle any EMBOSS application but will still be a wrapping solution for EMBOSS
alone.

> Either way has pluses and minuses. I think the first way is nice because it
> allows a consistent method to "port" a program to Loci. However, unless we
> decided to use the applab language and/or parser, we would have to describe
> our own input language and then design a parser to deal with it. The second
> way uses already exciting programs, but it a lot uglier because it is
> different for every app ported and thus makes the porting process very
> difficult for a non-programming-user of Loci.
>         Is any of these two ways I mentioned about something anyone else
> was thinking for wrapping programs so they can run in Loci? Or have I
> completely forgotten an obvious way. Thanks in advance for any help anyone
> can provide on this dillemna of mine!

Perhaps all of these should be pursued.  This is the sort of 'middleware' that
doesn't affect the 'front-endware' (Workspace) and is not specific to any one
'back-endware' application.  Like almost everthing else in Loci, wrapping
solutions are plug-ins/loci.

Bottom line: let's start with AppLab's approach and then look into the others
a little later.


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From chapmanb at arches.uga.edu  Sun Dec 19 03:13:36 1999
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Gnome stuff
In-Reply-To: <385C2E74.CF6F6056@geoserve.net>
References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec	
 17, 99 00:40:47 am <l03130301b480d0faab55@[172.16.0.2]>
Message-ID: <l03130301b48231d175ce@[172.16.0.2]>

>
>Well, if you consider the relationship between a program and its library to be
>the same as a wrapper and its 'wrappee', then yes...sort of.  Bonobo is just
>one of the libraries we're using, and we're using it to...
>
>    (1) Include non-python GUIs in the Workspace
>    (2) Include Loci's GUI in other apps, if there is a need to do so
>

Aaaaa, gotcha. Sorry, I was thinking of bonobo as being more than it
actually was (I was thinking of it as covering CORBA, so that you make all
of your interfaces through bonobo, rather than CORBA). I don't consider a
program:library relationship to be the same as a wrapper:wrappee
relationship at all! So, bonobo is just for embedding a component of one
program inside the container of another program. Okay.

>
>Hmmmm.  I understood GConf to be akin to the Windows Registry.  I can't
>imagine trying to put all of our XML into it, especially if the script XML
>includes the GUI XML, etc.
>

Ack. Windows Registry. Bad! Bad! I'll forget I even mentioned GConf!

>
>GFM is, disappointingly, a clone of Windows Explorer running with the Active
>Desktop.  So, the viewers in GFM are 'just' giving you a preview/thumbnail of
>the file in one corner of GFM's window.
>>I never really thought that Loci's 'file system', or the way files and
>directories are shown, would provide previews or thumbnails.  It's an
>interesting idea that we can pursue later.

Okee-dokee. I agree, I don't think it's really necessary to have previews now.

>But for now, I'd like to see each
>directory on the file system be represented as a 'container locus'.  If you
>double-click on such a container, you get a windowlet just like any other
>locus.  But a container's windowlet is a 'list' widget that lists the contents
>of the directory:
>
>        +------+
>        | cont |                 <----- icon
>        | ainer|
>        +------+
>    +---------------+
>    |    file       |
>    |    file       |
>    |    file       |            <------ windowlet
>    |    container  |
>    |    file       |
>    +---------------+
>

Okay--will this be stored directly in the XML representing the workspace
graphical script? For instance if we had a container representation like
the following:

        +-------+
        | my seq|                 <----- icon
        | files |
        +-------+
    +-------------------+
    |    gb file 1      |
    |    gb file 2      |
    |  			|            <------ windowlet
    |    fasta_files    |
    |    phylip_files	|
    +-------------------+

where my_seq_files is a directory I store all of my sequence files in, gb
files 1 and 2 are just genbank formatted files, and fasta_files and
phylip_files are directories with fasta and phylip files, respectively
(sorry, I should be thinking up physics examples instead of bioinformatics
examples!). Then, if we represent this in the XML script (assuming that
this container is located on the main workspace) as:

<container>
 <name>loci_root</name>
 <location>/usr/local/loci/workspace</location>

 <container>
   <name>my_seq_files</name>
   <location>/usr/home/chapmanb/my_seq_files</location>

   <file>
    <name>gb_file_1.gb</name>
   </file>

   <file>
    <name>gb_file_2.gb</name>
   </file>

   <container>
    <name>fasta_files</name>
    ...the contents of the directory
   </container>

   <container>
    <name>phylip_files</name>
    ...the contents of the directory
   </container>
 </container>
</container>

If /usr/local/loci/workspace is where everything is analagous to the root
directory in a web server, this is where the "Loci filesystem" starts. Then
if a user double clicks on the my_seq_files container icon, we would go
through the XML to look for my_seq_files and find it at
loci_root.my_seq_files, which would by default be located at
/usr/local/loci/workspace/my_seq_files. In this example, I figured that the
user would probably have their sequence files located in some home
directory and not on the loci filesystem (/usr/home/chapmanb/my_seq_files,
in this example). So then we have something analagous to a symbolic link,
with /usr/local/loci/workspace/my_seq_files ->
/usr/home/chapmanb/my_seq_files. So my idea here is that the location of a
file or directory would be with respect to the loci_root directory unless
there is a <location> tag directly specifying to look elsewhere.
	Anyways, is this along the lines of what people were thinking for
the representation? I haven't really said anything about how to actually
generate this kind of XML, but I just wanted to make sure I was on the
right track!

>
>For now the 'list' widget just gives the names and icons of the files and
>directories (loci) held by the container.  And you can drag-and-drop loci
>to-and-from the container's list and Workspace!
>

Okay, so if in the above example I wanted to move gb_file_1.gb from the
container to the main workspace, I would drag it into the workspace and in
the real directory structure, the file would move from
/usr/home/chapmanb/my_seq_files/gb_file_1.gb to
/usr/local/loci/workspace/gb_file_1.gb? Do you want the file to actually
move, or just to create a link to the file from inside the
/usr/local/loci/workspace directory system?

>
>For now, I'm sure we're using...
>
>    gnome-libs           (what gnome-python wraps)
>    bonobo
>    orbit
>
>These are 3 distinct packages/parts to Gnome.  There is no need to look beyond
>these, so our use of Gnome is less confusing than you may be thinking.

Okay, that is what I was originally thinking, then I started to confuse
myself by thinking about all of these other gnome programs etc. Thanks for
clarifying!

Brad


From bizzaro at geoserve.net  Sun Dec 19 12:12:27 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Gnome stuff
References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec	
	 17, 99 00:40:47 am <l03130301b480d0faab55@[172.16.0.2]> <l03130301b48231d175ce@[172.16.0.2]>
Message-ID: <385D11FB.364D9A9A@geoserve.net>

Brad Chapman wrote:
> 
> So, bonobo is just for embedding a component of one
> program inside the container of another program. Okay.

Right, and I think only GUI components.

> Okay--will this be stored directly in the XML representing the workspace
> graphical script? For instance if we had a container representation like
> the following:
[cut]
>  <location>/usr/local/loci/workspace</location>

But since the network is transparent to Loci, all <locations>s have a URI. 
So, the above location would be something like this:

    <location>locus://localhost/</location>

where

    /usr/local/loci/

Is the root directory for Loci.

***JUST USE APACHE AS AN EXAMPLE***

    /home/httpd/html/

Is the root directory for Apache Web pages (on RedHat), and this is given the
URI/URL

    http://localhost/

Note that Apache can access _local_ directories using the same URL mechanism
for _remote_ access.  THIS IS HOW LOCI WILL WORK.

Perhaps Loci's root should be

    /home/loci/

(I can see we'll get some arguments about this from BSD users :-))

>  <container>
>    <name>my_seq_files</name>
>    <location>/usr/home/chapmanb/my_seq_files</location>
> 
>    <file>
>     <name>gb_file_1.gb</name>
>    </file>
> 
>    <file>
>     <name>gb_file_2.gb</name>
>    </file>
> 
>    <container>
>     <name>fasta_files</name>
>     ...the contents of the directory
>    </container>
> 
>    <container>
>     <name>phylip_files</name>
>     ...the contents of the directory
>    </container>
>  </container>
> </container>

Right.  When the container locus is made, all of the XML is generated,
including that of the windowlet contents, which in the case of a container, is
the _directory_ contents.

> If /usr/local/loci/workspace is where everything is analagous to the root
> directory in a web server, this is where the "Loci filesystem" starts.

You're correct that Loci should not have access to '/'.  This would be a
security problem, especially when a container can point to (via URI) the
filesystem of a remote computer.  Maybe we should have 2 branches under Loci's
root directory:

    /home/loci/public/
    /home/loci/private/

Remote Loci can then only access what is in the public directory.

> Then
> if a user double clicks on the my_seq_files container icon, we would go
> through the XML to look for my_seq_files and find it at
> loci_root.my_seq_files, which would by default be located at
> /usr/local/loci/workspace/my_seq_files.

Yes.

> In this example, I figured that the
> user would probably have their sequence files located in some home
> directory and not on the loci filesystem (/usr/home/chapmanb/my_seq_files,
> in this example). So then we have something analagous to a symbolic link,
> with /usr/local/loci/workspace/my_seq_files ->
> /usr/home/chapmanb/my_seq_files. So my idea here is that the location of a
> file or directory would be with respect to the loci_root directory unless
> there is a <location> tag directly specifying to look elsewhere.

Good idea.  You get a star.

Regarding a user accessing things in his home directory, and even making some
things public, we can do what Apache does and have

    locus://bradcom.com/~brad/

point to

    /home/brad/loci/

and you would have

    /home/brad/loci/public/
    /home/brad/loci/private/
    /home/brad/loci/workspace/

too.

>         Anyways, is this along the lines of what people were thinking for
> the representation? I haven't really said anything about how to actually
> generate this kind of XML, but I just wanted to make sure I was on the
> right track!

You are correct, sir!

> Okay, so if in the above example I wanted to move gb_file_1.gb from the
> container to the main workspace, I would drag it into the workspace and in
> the real directory structure, the file would move from
> /usr/home/chapmanb/my_seq_files/gb_file_1.gb to
> /usr/local/loci/workspace/gb_file_1.gb? Do you want the file to actually
> move, or just to create a link to the file from inside the
> /usr/local/loci/workspace directory system?

Hmmmm.  I suppose the user can either 'copy' or 'move' something onto/from the
Workspace (and other areas) just like using a file manager (copy means the
original stays, and move means the original is deleted).  But I don't think
the user should be able to 'move' a file to/from a _remote_ system, so only
copying would be allowed in such a case.  I'd say that by default a DnD would
move a locus, unless it is remote (not on the local filesystem).

Keep in mind though, that when the user manipulates loci, he/she is only
manipulating an XML _representation_ of something that can exist anywhere on
the Internet (and is _always_ referenced to via URI).  Since that XML
representation should be small (about the size of a typical Web page), the
transfer of it should be trivial.  So, I wouldn't create symlinks but just
copy or move the XML representations.

The transfer of the actual program or data that the locus represents is
another case altogether.  I think this can be handled (in a GUI sense) via
pop-up menu option and not DnD.


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From bizzaro at geoserve.net  Sun Dec 19 13:49:45 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Gnome stuff
References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec	
		 17, 99 00:40:47 am <l03130301b480d0faab55@[172.16.0.2]> <l03130301b48231d175ce@[172.16.0.2]> <385D11FB.364D9A9A@geoserve.net>
Message-ID: <385D28C9.BB3CBCF2@geoserve.net>

"J.W. Bizzaro" wrote:
> 
> filesystem of a remote computer.  Maybe we should have 2 branches under Loci's
> root directory:
> 
>     /home/loci/public/
>     /home/loci/private/

I changed my mind.  I think to facilitate the use of Loci as a shell,
'private' access can be from _anywhere_ on the local filesystem.  Public
access would be from

    /home/loci/public/

or

    /home/username/loci/public/


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From bizzaro at geoserve.net  Sun Dec 19 15:19:01 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Loci as a locus
Message-ID: <385D3DB5.6AA76839@geoserve.net>

Wise and mighty Locians,

I haven't mentioned this before, but the thought came a while back about
embedding a copy of Loci within Loci so that it runs as a locus.

Where did this idea come from?  Well, I was thinking about what would happen
if you made a Workflow Diagram or graphical script where some outputs were
left unspecified (little dots not connected).  Loci should then send the
outputs to stdout, right?  Then I realized the same would apply to unspecified
inputs: They should come from stdin.  Or maybe, since we could have multiple
connectors unconnected, you could specify on THE COMMAND-LINE, what to do with
them:

    $ loci -i1 <input1> -i2 <input2> -o1 <output1>

So, hmmm, if Loci can run like this from the command-line, maybe Loci too can
be wrapped to run inside of Loci!

What's the use of this?  I'm thinking along the line of a composite locus. 
Since you can put parts of a WFD/script inside of composite locus (note that a
composite locus differs from a container locus in that with the former, the
connections/workflow are preserved), you should be able to view them (in a
windowlet) as being in their own Workspace.  And, if you look at the
unconnected dots/lines (connectors) on the composite's Workspace, they should
match the dots/lines on the composite's icon.  (We'll probably then need a way
to dynamically add and remove dots/lines (connectors) from an icon (locus).

This also gives Loci a 'workspace in a workspace' functionality like that of
AVS:

    http://www.avs.com/products/expdev/images/NE.GIF

I just thought I'd fill you in on this new feature.


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From bizzaro at geoserve.net  Sun Dec 19 15:40:19 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Loci as a locus
References: <385D3DB5.6AA76839@geoserve.net>
Message-ID: <385D42B3.497A76CD@geoserve.net>

"J.W. Bizzaro" wrote:
> 
> Since you can put parts of a WFD/script inside of composite locus (note that a
> composite locus differs from a container locus in that with the former, the
> connections/workflow are preserved), you should be able to view them (in a
> windowlet) as being in their own Workspace.

IOW, a composite locus is an instance of Loci.

But visa versa is true: Loci is a composite locus.


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From gvd at redpoll.pharmacy.ualberta.ca  Sun Dec 19 16:51:36 1999
From: gvd at redpoll.pharmacy.ualberta.ca (Gary Van Domselaar)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Gnome stuff
References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec	
		 17, 99 00:40:47 am <l03130301b480d0faab55@[172.16.0.2]> <l03130301b48231d175ce@[172.16.0.2]> <385D11FB.364D9A9A@geoserve.net>
Message-ID: <385D5368.D998F7AC@redpoll.pharmacy.ualberta.ca>

> 
> Regarding a user accessing things in his home directory, and even making some
> things public, we can do what Apache does and have
> 
>     locus://bradcom.com/~brad/
> 
> point to
> 
>     /home/brad/loci/
> 
> and you would have
> 
>     /home/brad/loci/public/
>     /home/brad/loci/private/
>     /home/brad/loci/workspace/
> 
> too.

If we were to follow the apache example, we would not specify a public
and private directory explicitly, but rather use an authentication
procedure (like apache's .htaccess) to create private (or perhaps
'restricted') directories from publically accessible ones. So

/home/brad/loci/public_loci/   			//unrestricted access, network
viewable
/home/brad/loci/public_loci/germ_warfare/	//restricted access, network
viewable 


Of course, like apache, there's nothing stopping you from _making_ a
separate directory to contain your private files

/home/brad/loci/private_loci/			//completely private, network hidden

> 
> >         Anyways, is this along the lines of what people were thinking for
> > the representation? I haven't really said anything about how to actually
> > generate this kind of XML, but I just wanted to make sure I was on the
> > right track!
> 
> You are correct, sir!
> 
> > Okay, so if in the above example I wanted to move gb_file_1.gb from the
> > container to the main workspace, I would drag it into the workspace and in
> > the real directory structure, the file would move from
> > /usr/home/chapmanb/my_seq_files/gb_file_1.gb to
> > /usr/local/loci/workspace/gb_file_1.gb? Do you want the file to actually
> > move, or just to create a link to the file from inside the
> > /usr/local/loci/workspace directory system?
> 
> Hmmmm.  I suppose the user can either 'copy' or 'move' something onto/from the
> Workspace (and other areas) just like using a file manager (copy means the
> original stays, and move means the original is deleted).  But I don't think
> the user should be able to 'move' a file to/from a _remote_ system, so only
> copying would be allowed in such a case.  I'd say that by default a DnD would
> move a locus, unless it is remote (not on the local filesystem).
> 
> Keep in mind though, that when the user manipulates loci, he/she is only
> manipulating an XML _representation_ of something that can exist anywhere on
> the Internet (and is _always_ referenced to via URI).  Since that XML
> representation should be small (about the size of a typical Web page), the
> transfer of it should be trivial.  So, I wouldn't create symlinks but just
> copy or move the XML representations.
> 
> The transfer of the actual program or data that the locus represents is
> another case altogether.  I think this can be handled (in a GUI sense) via
> pop-up menu option and not DnD.

For DnD, you may want to consider providing the user with option to do a
move, copy, or symbolic link, via pop-up menu, in direct analogy to
right-button DnD in Windoze.


gary
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Gary Van Domselaar		gvd@redpoll.pharmacy.ualberta.ca
Faculty of Pharmacy 		Phone: (780) 492-4493
University of Alberta		FAX:   (780) 492-5305
Edmonton, Alberta, Canada       http://redpoll.pharmacy.ualberta.ca/~gvd


From bizzaro at geoserve.net  Sun Dec 19 18:41:40 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Gnome stuff
References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec	
			 17, 99 00:40:47 am <l03130301b480d0faab55@[172.16.0.2]> <l03130301b48231d175ce@[172.16.0.2]> <385D11FB.364D9A9A@geoserve.net> <385D5368.D998F7AC@redpoll.pharmacy.ualberta.ca>
Message-ID: <385D6D34.327CC401@geoserve.net>

Gary Van Domselaar wrote:
> 
> If we were to follow the apache example, we would not specify a public
> and private directory explicitly, but rather use an authentication
> procedure (like apache's .htaccess) to create private (or perhaps
> 'restricted') directories from publically accessible ones. So
> 
> /home/brad/loci/public_loci/                    //unrestricted access, network
> viewable

Is this directory _automatically_ an unrestricted area?

Like I was saying in my follow-up message, we probably just need some
loci/public/ directories as security 'sandboxes'.

> /home/brad/loci/public_loci/germ_warfare/       //restricted access, network
> viewable

So, we can have a '.access' file that will cause Loci to ask for a login?  I
like that.

> Of course, like apache, there's nothing stopping you from _making_ a
> separate directory to contain your private files
> 
> /home/brad/loci/private_loci/                   //completely private, network hidden

Of course, EVERYTHING outside of loci/public/ should be private.  You can make
a loci/private directory, but it won't be any different from any non-
loci/public/ directory.  IOW, it wouldn't be neccessary.

I wonder how this 'Apache approach' meshes with CORBA.  CORBA has its own
security protocols, right?  Would anyone in-the-know care to comment on this?

> > The transfer of the actual program or data that the locus represents is
> > another case altogether.  I think this can be handled (in a GUI sense) via
> > pop-up menu option and not DnD.
> 
> For DnD, you may want to consider providing the user with option to do a
> move, copy, or symbolic link, via pop-up menu, in direct analogy to
> right-button DnD in Windoze.

So, a button3 DnD would bring up a dialog.

Button1 DnD would by default move a locus if source and destination are both
local.

Button1 DnD would by default copy a locus if either source or destination (or
both) are remote.  (This is typically how inter-filesystem transfers work on
the Mac and Windows.)

What about _writing_ to a _remote_ container?  If I do a DnD from my local
Workspace to a remote container, should I have write permissions?  This might
be a good mechanism for 'sharing loci'.  This certainly would require a login
of some sort.

So, should a .access file be required for any remote writing to a filesystem? 
Or should 'writers' have a shell account, as we have CVS set up (I think you
can give CVS write access to someone who doesn't have a shell account)?


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From gvd at redpoll.pharmacy.ualberta.ca  Sun Dec 19 20:50:02 1999
From: gvd at redpoll.pharmacy.ualberta.ca (Gary Van Domselaar)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Loci as a locus
References: <385D3DB5.6AA76839@geoserve.net>
Message-ID: <385D8B4A.4870DBE7@redpoll.pharmacy.ualberta.ca>

"J.W. Bizzaro" wrote:
> 
> Wise and mighty Locians,
> 
> I haven't mentioned this before, but the thought came a while back about
> embedding a copy of Loci within Loci so that it runs as a locus.
> 
> Where did this idea come from?  Well, I was thinking about what would happen
> if you made a Workflow Diagram or graphical script where some outputs were
> left unspecified (little dots not connected).  Loci should then send the
> outputs to stdout, right?  Then I realized the same would apply to unspecified
> inputs: They should come from stdin.  Or maybe, since we could have multiple
> connectors unconnected, you could specify on THE COMMAND-LINE, what to do with
> them:
> 
>     $ loci -i1 <input1> -i2 <input2> -o1 <output1>
> 
> So, hmmm, if Loci can run like this from the command-line, maybe Loci too can
> be wrapped to run inside of Loci!

This strange loopiness reeks of Godel, Escher and Bach. I love it.  
Actually, I was thinking about some of Brad's suggestions for wrapping
backend apps and it struck me that, AFAIK, the only programs that Loci
can really 'wrap' are the ones in which Loci can control the redirection
of stdin and stdout.  This includes command-line driven apps, CORBAfied
apps, cgi scripts, and so on, but not apps with 'fixed' stdin and
stdout. This includes a large number of apps where the processing and
the GUI are 'integrated'. Input is typically restricted to file (or
database), keyboard, and mouse, and output is typically restricted to
the GUI display or file (or database).  We would be remiss to make an
application-wrapping framework that itself cannot be wrapped.  I just
wonder if CORBA might be a better solution than the command-line?

gary
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Gary Van Domselaar		gvd@redpoll.pharmacy.ualberta.ca
Faculty of Pharmacy 		Phone: (780) 492-4493
University of Alberta		FAX:   (780) 492-5305
Edmonton, Alberta, Canada       http://redpoll.pharmacy.ualberta.ca/~gvd


From gvd at redpoll.pharmacy.ualberta.ca  Sun Dec 19 22:39:24 1999
From: gvd at redpoll.pharmacy.ualberta.ca (Gary Van Domselaar)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] Gnome stuff
References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec	
				 17, 99 00:40:47 am <l03130301b480d0faab55@[172.16.0.2]> <l03130301b48231d175ce@[172.16.0.2]> <385D11FB.364D9A9A@geoserve.net> <385D5368.D998F7AC@redpoll.pharmacy.ualberta.ca> <385D6D34.327CC401@geoserve.net>
Message-ID: <385DA4EC.3698D0E6@redpoll.pharmacy.ualberta.ca>

"J.W. Bizzaro" wrote:
> 
> Gary Van Domselaar wrote:
> >
> > If we were to follow the apache example, we would not specify a public
> > and private directory explicitly, but rather use an authentication
> > procedure (like apache's .htaccess) to create private (or perhaps
> > 'restricted') directories from publically accessible ones. So
> >
> > /home/brad/loci/public_loci/                    //unrestricted access, network
> > viewable
> 
> Is this directory _automatically_ an unrestricted area?

I would suggest that loci's configuration utility would provide a
directive for identifying the default 'public_loci', but as with any
Unix filesystem, would require the proper permissions attributes in
order to make it truly 'world readable'.  The directory would not be
created by loci, but created by the Loci user who wants to have a 'Loci
site' ;-)


> > For DnD, you may want to consider providing the user with option to do a
> > move, copy, or symbolic link, via pop-up menu, in direct analogy to
> > right-button DnD in Windoze.
> 
> So, a button3 DnD would bring up a dialog.
> 
> Button1 DnD would by default move a locus if source and destination are both
> local.
> 
> Button1 DnD would by default copy a locus if either source or destination (or
> both) are remote.  (This is typically how inter-filesystem transfers work on
> the Mac and Windows.)
> 
> What about _writing_ to a _remote_ container?  If I do a DnD from my local
> Workspace to a remote container, should I have write permissions?  This might
> be a good mechanism for 'sharing loci'.  This certainly would require a login
> of some sort.
> 
> So, should a .access file be required for any remote writing to a filesystem?
> Or should 'writers' have a shell account, as we have CVS set up (I think you
> can give CVS write access to someone who doesn't have a shell account)?

I like the .access idea, for writing to remote filesystems, but I dont
know enough about CORBA's authentication capabilities (although I know
that they are provided for in the OMG's LSG's IDL) to make a decent
comparison between the two approaches.  I suspect a .access file would
require extra work for the Loci developers, if CORBA can do the same
thing, perhaps that is a better solution.  Maybe Justin has some
relevant comments on this...


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Gary Van Domselaar		gvd@redpoll.pharmacy.ualberta.ca
Faculty of Pharmacy 		Phone: (780) 492-4493
University of Alberta		FAX:   (780) 492-5305
Edmonton, Alberta, Canada       http://redpoll.pharmacy.ualberta.ca/~gvd


From bizzaro at geoserve.net  Sun Dec 19 23:41:53 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:02 2006
Subject: [Pipet Devel] get it while it's hot
Message-ID: <385DB391.80F3F565@geoserve.net>

Locians,

Here is the latest snapshot:

  http://bioinformatics.org/loci/download/snapshots/loci-core-19991219.tar.gz

I made some improvements to dialog and menu handling, which you may or may not
notice.

Also, I managed to do what I wrote about today: make Loci appear in a
composite locus's windowlet.  Check it out, but I have to warn you, the
windowlet Loci doesn't work...something to do with event handling.  If anyone
on the list is a PyGTK expert, it'd be great if you could take a look.

Some of the errors Brad found are in this snapshot.

Note that you can get all of this from CVS too.


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From bizzaro at geoserve.net  Sun Dec 19 23:56:46 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:03 2006
Subject: [Pipet Devel] get it while it's hot
References: <385DB391.80F3F565@geoserve.net>
Message-ID: <385DB70E.B0F44F87@geoserve.net>

"J.W. Bizzaro" wrote:
> 
> Some of the errors Brad found are in this snapshot.

I mean, "are fixed in this snapshot."  :-)


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From chapmanb at arches.uga.edu  Mon Dec 20 07:29:48 1999
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Fri Feb 10 19:19:03 2006
Subject: [Pipet Devel] While you're in CVS...
In-Reply-To: <385DB391.80F3F565@geoserve.net>
Message-ID: <l03130302b483ca6bea77@[172.16.0.2]>

Dearest Locians;
	I have done some actual coding (woo-hoo!) on some of the
filesystem/container stuff we were talking about and just committed it to
cvs in a brand new directory: loci-file. I decided to stay out of loci-core
for now because:

1. I didn't want to screw stuff up.
2. I'm trying to learn how to do GUIs with pyGTK/pyGNOME from the ground
up, so that I can get a good grasp on how it all works.
3. I didn't want to screw stuff up.

So anyways, if you decide you would like to look at the mess in loci-file,
you will find:

filegui.py: The (ugly) GUI interface
file2XML.py: A program that converts a directory structure into an XMLish
document
other random stuff: TODO, CHANGES...

Since it is rough, this is the step by step on how to work it:

1. It requires all the same stuff as Loci: pyGNOME/pyGTK, python,
gnome-libs (well, maybe it doesn't need all this--who knows!)
2. Just move into the loci-files directory and type './filegui.py &'
3. You'll be presented with window containing one nasty little button
labelled "container" that fills the entire window. This is my temporary
substitute for a container.
4. Click on the button and you'll be presented with a file dialog. Pick a
directory.
5. Click okay, and the program will write out a file 'XMLoutput.xml'
containing the directory substructure modelled as an XML-type document.
6. Click on File and Exit in the main window, since that's all it does.
7. Check out XMLoutput.xml and see if it accurately represents the filesystem.

The XML file has indentations and everything so it isn't too bad to look at
and check. I've just tested it on a few directories and it seems to do
okay.
	So if you are interested, please check it out, try it out on your
favorite directories and be sure it is modeling them okay, take a look at
the code and send suggestions/mocking comments, and generally have a nifty
time with it. Please let me know if I messed up the cvs commit or if
anything else is horrible wrong and I'll try to fix it.
	I'm off school and without formal responsibilities, so I'll be
doing more coding on it in the next couple days (while I am near my
computer) and it should *hopefully* improve and do more.

	WRT all of the discussion on the list--these are my quick thoughts
on the loci as Apache type filesystem stuff:

1. As much as I hate config files, we probably need a loci.conf file to
specify things like $LOCI_ROOT and private and public directories. I think
instead of having specific defined directories for public and private, we
should just take the Apache-type approach and specifically specify private
directories within the $LOCI_ROOT file system. Although I could really care
less where Loci is on my file system, people, in general, like to have
control over the location of their programs and will probably want to
specify it.
2. How are we going to deal with security issues surrounding programs? Will
all programs running under Loci need to be located within the $LOCI_ROOT
filesystem? If so, will they all need to be in a specific directory
($LOCI_ROOT/bin?) like cgi-scripts in apache? I really know nothing about
security so I'm just throwing out an idea.

I still need to digest most of the conversation before I can make half-way
rational comments on it.

Oh, and Jeff--with the new snapshot I lost the nice scrollbars that I had
previously. So now I can move loci off the desktop and they just end up
disappearing instead of the desktop scrolling. I like the new direction it
is going though, and will be excited to see some loci inside loci!

Brad


From bizzaro at geoserve.net  Mon Dec 20 12:52:11 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:03 2006
Subject: [Pipet Devel] While you're in CVS...
References: <l03130302b483ca6bea77@[172.16.0.2]>
Message-ID: <385E6CCB.E0ACDB1B@geoserve.net>

Brad Chapman wrote:
> 
>         I have done some actual coding (woo-hoo!) on some of the
> filesystem/container stuff we were talking about and just committed it to
> cvs in a brand new directory: loci-file.

Woo-hoo!

> 5. Click okay, and the program will write out a file 'XMLoutput.xml'
> containing the directory substructure modelled as an XML-type document.
> 6. Click on File and Exit in the main window, since that's all it does.
> 7. Check out XMLoutput.xml and see if it accurately represents the filesystem.

The XML looks decent to me.

I'd suggest looking at the 'xmllib' module that actually comes with Python. 
We probably should use it (an althernative is Gnome's LibXML) for all of our
XML work.  Of course it'll save quite a bit of coding for reading and (I
think) writing XML.  Look at 'xmlparse.py' (bottom half of file) in the
loci-core module for an example of its use in parsing.

There is one catch to 'file2XML.py': It recursively descends subdirectories,
which is not needed (it's neat that you made it do that though).  A container
only needs to know the contents (types, etc.) of its top level directory. 
Only if you DnD a 'sub-container'/subdirectory out of the list, will the
contents of the subdir need to be known: That's when a new container,
containing the subdirectory, is made.

> time with it. Please let me know if I messed up the cvs commit or if
> anything else is horrible wrong and I'll try to fix it.

The only 'problem' we're having with making new directories via CVS is that
the ownership is by default username.username.  It needs to be username.cvs. 
I can fix that as root, and any user can fix it, by going to

    /home/cvs/

and typing

    $ chown -R .cvs <modulename>

I already did that for loci-file, but if you make a new directory in
loci-file, you need to check that it is set to group cvs (and that it is group
read/writable).  Otherwise, another user will get a 'permission denied' error
doing a checkout.

>         WRT all of the discussion on the list--these are my quick thoughts
> on the loci as Apache type filesystem stuff:
> 
> 1. As much as I hate config files, we probably need a loci.conf file to
> specify things like $LOCI_ROOT and private and public directories.

We need to define all sorts of settings anyway.  Maybe we'll use XML.

> I think
> instead of having specific defined directories for public and private, we
> should just take the Apache-type approach and specifically specify private
> directories within the $LOCI_ROOT file system. Although I could really care
> less where Loci is on my file system, people, in general, like to have
> control over the location of their programs and will probably want to
> specify it.

So all of $LOCI_ROOT is public unless otherwise specified?  I'd agree if the
rest of the filesystem was _privately_ accessible.

For example, most binaries are installed to

    /usr/bin/

If $LOCI_ROOT were set to

    /home/loci/

All those lovely bioinformatics apps in /usr/bin/ would be inaccessible, even
privately :-(

> 2. How are we going to deal with security issues surrounding programs? Will
> all programs running under Loci need to be located within the $LOCI_ROOT
> filesystem? If so, will they all need to be in a specific directory
> ($LOCI_ROOT/bin?) like cgi-scripts in apache? I really know nothing about
> security so I'm just throwing out an idea.

Pretty much what I addressed above.  You wouldn't have access to /usr/bin/
unless you used symlinks, which is a possibility.

I'm not a security guru myself, but some on this list seem to know quite a
bit.  Perhaps someone should be appointed 'Security Guru'.  Any volunteers?

> Oh, and Jeff--with the new snapshot I lost the nice scrollbars that I had
> previously. So now I can move loci off the desktop and they just end up
> disappearing instead of the desktop scrolling.

Are you sure the Workspace scrolled automatically when you dragged a locus?  I
never added that functionality.

Scrollbars are there but only show up when the window is smaller than the
Workspace.  Try resizing the window.

> I like the new direction it
> is going though, and will be excited to see some loci inside loci!

Me too ;-)


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From bizzaro at geoserve.net  Mon Dec 20 13:36:49 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:03 2006
Subject: [Pipet Devel] Brad: I've got a little task for you
Message-ID: <385E7741.656C94C2@geoserve.net>

Brad,

Since you're working on containers and getting into PyGTK, how about making a
prototype widget to go in the windowlet of a container: a list widget.

Just name the file 'container_list.py' or whatever.  Look in the PyGTK
examples for a list widget.  You can put some fake values in for now.  As an
example of how Loci widgets are structured, look at 'testwidget1.py' (pasted
below) in loci-core.  Basically, we're making a 'composite widget' (no
relation to composite locus), which is a widget that inherits the objects of a
standard GTK widget.  So, WidgetMain inherits GtkVBox....

---------------------------------------------------
>from gtk import *
from gnome.ui import *


class WidgetMain(GtkVBox):

    get_type = GtkVBox(spacing=5).get_type()

    def __init__(self):

        self._o = GtkVBox(spacing=5)._o

        self.width = 150
        self.height = 50

        self.set_usize(self.width, self.height)

        self.set_border_width(5)

        w = GtkLabel('Label')
        self.add(w)
        w.show()
---------------------------------------------------

The actual widget I'm using here is a GtkLabel and is 'added' to self
(WidgetMain).

That's all that is really needed, and then I can get it to show up in the
windowlet.


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From chapmanb at arches.uga.edu  Tue Dec 21 21:01:38 1999
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Fri Feb 10 19:19:03 2006
Subject: [Pipet Devel] Brad: I've got a little task for you
In-Reply-To: <385E7741.656C94C2@geoserve.net>
Message-ID: <l03130304b485d78753bc@[172.16.0.2]>

Oh great Locians;

J.W. Bizzaro wrote:
>Since you're working on containers and getting into PyGTK, how about making a
>prototype widget to go in the windowlet of a container: a list widget.
>

Surely! I just updated loci-file with two (double the excitement!) widgets:

1. container_list.py: A GtkCList embedded in a scrolling window that you
can initiated with listWidget().

2. container_tree.py: A GtkCTree embedded in a scrolling window that can be
initiated with treeWidget().

Both widgets have an optional initiation with some data already added. To
get these, just call them up and pass them a 1 (ie. myList =
listWidget(1)). I hope these widgets are what you were looking for!

You can take a look at the widgets in action by updating your copy of
loci-file. I just committed some changes so there are now three buttons to
push (wow!):

1. 'Container loci': same as before, outputs XML from a selected directory
structure into XMLoutput.xml

2. 'Display a listing': Takes the info from XMLoutput.xml and displays it
in a list widget. I still need to learn to add pictures next to the names
so that you can tell the difference between documents and directories.

3. 'Display a tree': Displays the example tree. I'm currently stuck trying
to figure out how to parse the XML into a tree.

Note that since this now does XML parsing, loci-file requires something
new. I used the SAX (simple API for XML) complient parser from the python
xml toolkit. You can get the toolkit by going to:
http://www.python.org/topics/xml/download.html. The newest version is
PyXML-0.5.2.tar.gz, but I couldn't get this to install for me, so I am
using PyXML-0.5.1. So whatever you can get to work should be okay, I'm not
doing anything really fancy. If anyone has the chance to check out the
changes, please drop me any comments you have!

J.W. Bizzaro wrote:
>I'd suggest looking at the 'xmllib' module that actually comes with Python.
>We probably should use it (an althernative is Gnome's LibXML) for all of our
>XML work.  Of course it'll save quite a bit of coding for reading and (I
>think) writing XML.  Look at 'xmlparse.py' (bottom half of file) in the
>loci-core module for an example of its use in parsing.

Sorry I had to go above this. I couldn't figure enough out from xmlparse.py
to know how to work this lib, and there is some pretty helpful
documentation on the SAX stuff. I'm not sure about all the differences, but
if necessary I can probably scale back later not to use the XML toolkit.

J.W. Bizzaro wrote:
>There is one catch to 'file2XML.py': It recursively descends subdirectories,
>which is not needed (it's neat that you made it do that though).  A container
>only needs to know the contents (types, etc.) of its top level directory.
>Only if you DnD a 'sub-container'/subdirectory out of the list, will the
>contents of the subdir need to be known: That's when a new container,
>containing the subdirectory, is made.

Well, give all the credit for the recursive descending to the writers of
os.path.walk(), not me! The reason I did this is for flexibility. I would
really like to represent the contents of a container as a tree, instead of
a list (hence the two widgets) so that a user can look into subdirectories
to see what is there, without having to create a new container. I am having
trouble parsing the XML into a tree, so I can't demo this yet, but I'll
keep working on it. Even with the recursive descending, it is no problem to
display the info as a list. I would be interested to hear people's thoughts
on the tree vs. list representations.

J.W. Bizzaro wrote:
>The only 'problem' we're having with making new directories via CVS is that
>the ownership is by default username.username.  It needs to be username.cvs.
>I can fix that as root, and any user can fix it, by going to

I will definately do this next time. I tried to use the command 'cvs add
loci-file', which, according to the book "Open Source Development with
CVS", *should* allow me to add a directory, but this gives something like
the following error:

cvs add: in directory .:
cvs [add aborted]: there is no version here: do 'cvs checkout' first

Any ideas why that is?

J.W. Bizzaro wrote:
>So all of $LOCI_ROOT is public unless otherwise specified?  I'd agree if the
>rest of the filesystem was _privately_ accessible.

That is the picture I was imagining. If we are following the Apache model,
there are no opportunites for outside users to reach any directories
besides those inside $LOCI_ROOT. Same with ftp servers, right? I'm not
positive how to do this, but it seems definately possible!

>Are you sure the Workspace scrolled automatically when you dragged a locus?  I
>never added that functionality.

I don't know, I may have just been drinking heavily and imagined it!

Brad


From bizzaro at geoserve.net  Tue Dec 21 22:37:31 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:03 2006
Subject: [Pipet Devel] Brad: I've got a little task for you
References: <l03130304b485d78753bc@[172.16.0.2]>
Message-ID: <3860477B.5B9DF95D@geoserve.net>

Brad Chapman wrote:
> 
> Note that since this now does XML parsing, loci-file requires something
> new. I used the SAX (simple API for XML) complient parser from the python
> xml toolkit.

The licenses are acceptable.  I'd like to keep the number of extra packages
that the user needs to install to a minimum though.  Does SAX provide what
straight xmllib cannot?

> You can get the toolkit by going to:
> http://www.python.org/topics/xml/download.html. The newest version is
> PyXML-0.5.2.tar.gz, but I couldn't get this to install for me, so I am
> using PyXML-0.5.1.

There is no rule for 'make install' in 0.5.2.

I just copied xml/ to /usr/lib/python1.5/site-packages/ after 'make'.

> Sorry I had to go above this. I couldn't figure enough out from xmlparse.py
> to know how to work this lib, and there is some pretty helpful
> documentation on the SAX stuff. I'm not sure about all the differences, but
> if necessary I can probably scale back later not to use the XML toolkit.

SAX actually includes a modified version of xmllib, so if SAX is well
documented, you may find xmllib docs there too.

> Well, give all the credit for the recursive descending to the writers of
> os.path.walk(), not me! The reason I did this is for flexibility. I would
> really like to represent the contents of a container as a tree, instead of
> a list (hence the two widgets) so that a user can look into subdirectories
> to see what is there, without having to create a new container.

It's certainly more convenient than openning up new containers.  But since
containers can represent large databases, it wouldn't be a good idea to use
trees everywhere.

BTW, It looks good, Brad.  Nice work!

> I am having
> trouble parsing the XML into a tree, so I can't demo this yet, but I'll
> keep working on it. Even with the recursive descending, it is no problem to
> display the info as a list. I would be interested to hear people's thoughts
> on the tree vs. list representations.

I wonder about speed:

    1. Recursively descend a directory (of any size and at any _location_).

    2. Write to XML.

    3. Create tree widget.

    4. Parse the XML and put into tree.

How long would it take to do this for a large filesystem? at a remote
location?

These are the reasons why I wanted to use a list.  If trees are (1) made fast
enough and (2) aren't used for every container, they would be good to use. 
Can you give us some feedback about these issues?

> I will definately do this next time. I tried to use the command 'cvs add
> loci-file', which, according to the book "Open Source Development with
> CVS", *should* allow me to add a directory, but this gives something like
> the following error:
> 
> cvs add: in directory .:
> cvs [add aborted]: there is no version here: do 'cvs checkout' first
> 
> Any ideas why that is?

loci-file was the module, and modules need to be made using

    cvs import (etc.)

>from _within_ loci-file, you can add directories using

    cvs add <directory>

after having made the directory in the filesystem.

> That is the picture I was imagining. If we are following the Apache model,
> there are no opportunites for outside users to reach any directories
> besides those inside $LOCI_ROOT. Same with ftp servers, right? I'm not
> positive how to do this, but it seems definately possible!

Yeah, anonymous ftp is a good example too.  You can going anywhere in the
filesystem with ftp providing you log in with an account.  Anonymous users are
limited in where they can go.


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From bizzaro at geoserve.net  Tue Dec 21 22:49:49 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:03 2006
Subject: [Pipet Devel] WidgetMain()
Message-ID: <38604A5D.67872C5B@geoserve.net>

BTW Brad,

In the source for the widgets, you use

    class listWidget()

and

    class treeWidget()

But when the Workspace builds a windowlet, it has no knowledge of widget
details, only that the class is

    class WidgetMain()

The name may change, but it should be the same for all widgets.


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From David.Lapointe at umassmed.edu  Wed Dec 22 13:03:35 1999
From: David.Lapointe at umassmed.edu (Lapointe, David)
Date: Fri Feb 10 19:19:03 2006
Subject: [Pipet Devel] FW: Bioperl: BPlite.pm
Message-ID: <93307F07DE63D211B2F30000F808E9E501644F68@edunivexch02.umassmed.edu>

A forward from the bioperl list.

-----Original Message-----
From: Jeffrey Chang [mailto:jchang@SMI.Stanford.EDU] 
Sent: Wednesday, December 22, 1999 11:31 AM
To: Ewan Birney
Cc: Ian Korf; vsns-bcd-perl@lists.uni-bielefeld.de
Subject: Re: Bioperl: BPlite.pm


Hi Everybody,

Just popping in from biopython!  I thought I'd mention that over there,
we're using an event-oriented design for our parsers, which is described
in a mail:
http://www.biopython.org/pipermail/biopython/1999-December/000149.html

How it works, is that a Scanner object chews through a data file and
generates events when it runs across information.  The events are then
handled by a Consumer.

This design is nice because it decouples a lot of the parsing work from
the final representation, and makes it easy to accomodate parsers of
varying complexity.  You can create Consumers to handle as much or as
little of the data as you want.  The plan for biopython is to distribute
Scanners, and a Consumer that shoves all the information into some data
structure.  Advanced users, however, will have the option of using the
scanner but building their own high performance Consumer tailored
specifically for their own purposes.

The code for this is sitting on my local drive now, and will be in the
biopython CVS repository soon.

Jeff


On Wed, 22 Dec 1999, Ewan Birney wrote:

> On Tue, 21 Dec 1999, Ian Korf wrote:
> 
> > I've been getting requests recently for old BLAST parsers.
> > Seems as though some people are looking for a lighweight
> > parser. At http://sapiens.wustl.edu/~ikorf/BPlite.pm you
> > can find my version of such a module. It parses both NCBI-
> > and WU-BLAST, and works well in pipes since it reads one
> > subject and one alignment at a time.
> 
> I'd really like to see a lighter blast parser with less embedded
> functionality in bioperl, ideally with the main features of steve's
> blast parser. If I can persuade someone to look at this Ian, is it
> ok to bring it inside bioperl? (any chance of you wanting to do that? I
> guess not...)
> 
> Steve - we *do* need to think of upgrading the blast parser - only
> you know the code, and the largest set of bugs are found in it.
> 
> 
> > 
> > The pod2text version of the documentation follows.
> > 
> > -Ian Korf
> > 
> > 
> > NAME
> >     BPlite - Lightweight BLAST parser
> > 
> > SYNOPSIS
> >      use BPlite;
> >      my $report = new BPlite(\*STDIN);
> >      $report->query;
> >      $report->database;
> >      while(my $sbjct = $report->nextSbjct) {
> >          $sbjct->name;
> >          while (my $hsp = $sbjct->nextHSP) {
> >              $hsp->score;
> >              $hsp->bits;
> >              $hsp->percent;
> >              $hsp->P;
> >              $hsp->queryBegin;
> >              $hsp->queryEnd;
> >              $hsp->sbjctBegin;
> >              $hsp->sbjctEnd;
> >              $hsp->queryAlignment;
> >              $hsp->sbjctAlignment;
> >          }
> >      }
> > 
> > DESCRIPTION
> >     BPlite is a package for parsing BLAST reports. The BLAST
> >     programs are a family of widely used algorithms for sequence
> >     database searches. The reports are non-trivial to parse, and
> >     there are differences in the formats of the various flavors of
> >     BLAST. BPlite parses BLASTN, BLASTP, BLASTX, TBLASTN, and
> >     TBLASTX reports from both the high performance WU-BLAST, and the
> >     more generic NCBI-BLAST.
> > 
> >     Many people have developed BLAST parsers (I myself have made at
> >     least three). BPlite is for those people who would rather not
> >     have a giant object specification, but rather a simple handle to
> >     a BLAST report that works well in pipes.
> > 
> >   Object
> > 
> >     BPlite has three kinds of objects, the report, the subject, and
> >     the HSP. To create a new report, you pass a filehandle reference
> >     to the BPlite constructor.
> > 
> >      my $report = new BPlite(\*STDIN); # or any other filehandle
> > 
> >     The report has two attributes (query and database), and one
> >     method (nextSbjct).
> > 
> >      $report->query;     # access to the query name
> >      $report->database;  # access to the database name
> >      $report->nextSbjct; # gets the next subject
> >      while(my $sbjct = $report->nextSbjct) {
> >          # canonical form of use is in a while loop
> >      }
> > 
> >     A subject is a BLAST hit, which should not be confused with an
> >     HSP (below). A BLAST hit may have several alignments associated
> >     with it. A useful way of thinking about it is that a subject is
> >     a gene and HSPs are the exons. Subjects have one attribute
> >     (name) and one method (nextHSP).
> > 
> >      $sbjct->name;    # access to the subject name
> >      "$sbjct";        # overloaded to return name
> >      $sbjct->nextHSP; # gets the next HSP from the sbjct
> >      while(my $hsp = $sbjct->nextHSP) {
> >          # canonical form is again a while loop
> >      }
> > 
> >     An HSP is a high scoring pair, or simply an alignment. HSP
> >     objects do not have any methods, just attributes (score, bits,
> >     percent, P, queryBegin, queryEnd, sbjctBegin, sbjctEnd,
> >     queryAliignment, sbjctAlignment) that should be familiar to
> >     anyone who has seen a blast report. For lazy/efficient coders,
> >     two-letter abbreviations are available for the attributes with
> >     long names (qb, qe, sb, se, qa, sa).
> > 
> >      $hsp->score;
> >      $hsp->bits;
> >      $hsp->percent;
> >      $hsp->P;
> >      $hsp->queryBegin;     $hsp->qb;
> >      $hsp->queryEnd;       $hsp->qe;
> >      $hsp->sbjctBegin;     $hsp->sb;
> >      $hsp->sbjctEnd;       $hsp->se;
> >      $hsp->queryAlignment; $hsp->qa;
> >      $hsp->sbjctAlignment; $hsp->sa;
> >      "$hsp"; # overloaded for begin..end bits
> > 
> >     I've included a little bit of overloading for double quote
> >     variable interpolation convenience. A subject will return its
> >     name and an HSP will return its queryBegin, queryEnd, and bits
> >     in the alignment. Feel free to modify this to whatever is most
> >     frequently used by you.
> > 
> >     So a very simple look into a BLAST report might look like this.
> > 
> >      my $report = new BPlite(\*STDIN);
> >      while(my $sbjct = $report->nextSbjct) {
> >          print "$scbjct\n";
> >          while(my $hsp = $sbjct->nextHSP) {
> >                     print "\t$hsp\n";
> >          }
> >      }
> > 
> >     The output of such code might look like this:
> > 
> >      >foo
> >          100..155 29.5
> >          268..300 20.1
> >      >bar
> >          100..153 28.5
> >          265..290 22.1
> > 
> > AUTHOR
> >     Ian Korf (ikorf@sapiens.wustl.edu,
> >     http://sapiens.wustl.edu/~ikorf)
> > 
> > ACKNOWLEDGEMENTS
> >     This software was developed at the Genome Sequencing Center at
> >     Washington Univeristy, St. Louis, MO.
> > 
> > COPYRIGHT
> >     Copyright (C) 1999 Ian Korf. All Rights Reserved.
> > 
> > DISCLAIMER
> >     This software is provided "as is" without warranty of any kind.
> > 
> > =========== Bioperl Project Mailing List Message Footer =======
> > Project URL: http://bio.perl.org/
> > For info about how to (un)subscribe, where messages are archived, etc:
> > http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> > ====================================================================
> > 
> 
> -----------------------------------------------------------------
> Ewan Birney. Work: +44 (0)1223 494992. Mobile: +44 (0)7970 151230
> <birney@sanger.ac.uk>
> http://www.sanger.ac.uk/Users/birney/
> -----------------------------------------------------------------
> 
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
> 

=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================


From bizzaro at geoserve.net  Wed Dec 22 21:03:57 1999
From: bizzaro at geoserve.net (J.W. Bizzaro)
Date: Fri Feb 10 19:19:03 2006
Subject: [Pipet Devel] naming madness
Message-ID: <3861830D.1912DF15@geoserve.net>

The non-profit "Leonardo Association" in France was raided by police following
a lawsuit filed by the "Leonardo Finance" company.  Why?  Leonardo
Association's Web site showed up on an Internet search of the word
"Leonardo".  Apparently no one but the Leonardo Finance may use the word.

    http://mitpress.mit.edu/e-journals/Leonardo/

This reminds me of the time the Frenchman whose last name is "Montana" sued
the U.S. state of Montana for use of his name.  So, maybe it's just the French
;-)

What does this have to do with Loci?  I'm just thinking about how seriously
some people take alleged idea-theft.  Not that we do that, but as with the
above cases, it's only the perception that matters.

Maybe we should have used some obscure name like "tacg" :-)


Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+


From mangalam at home.com  Thu Dec 30 16:22:20 1999
From: mangalam at home.com (Harry Mangalam)
Date: Fri Feb 10 19:19:03 2006
Subject: [Pipet Devel] naming madness
References: <3861830D.1912DF15@geoserve.net>
Message-ID: <386BCD0C.FC3755C1@home.com>

My lawyers (Dewey, Cheatem and Howe) will be contacting you shortly...

Harry

"J.W. Bizzaro" wrote:
> 
> The non-profit "Leonardo Association" in France was raided by police following
> a lawsuit filed by the "Leonardo Finance" company.  Why?  Leonardo
> Association's Web site showed up on an Internet search of the word
> "Leonardo".  Apparently no one but the Leonardo Finance may use the word.
> 
>     http://mitpress.mit.edu/e-journals/Leonardo/
> 
> This reminds me of the time the Frenchman whose last name is "Montana" sued
> the U.S. state of Montana for use of his name.  So, maybe it's just the French
> ;-)
> 
> What does this have to do with Loci?  I'm just thinking about how seriously
> some people take alleged idea-theft.  Not that we do that, but as with the
> above cases, it's only the perception that matters.
> 
> Maybe we should have used some obscure name like "tacg" :-)
> 
> Cheers.
> Jeff
> --
>                       +----------------------------------+
>                       |           J.W. Bizzaro           |
>                       |                                  |
>                       | http://bioinformatics.org/~jeff/ |
>                       |                                  |
>                       |           THE OPEN LAB           |
>                       |    Open Source Bioinformatics    |
>                       |                                  |
>                       |    http://bioinformatics.org/    |
>                       +----------------------------------+
> 
> _______________________________________________
> pipet-devel maillist  -  pipet-devel@bioinformatics.org
> http://bioinformatics.org/mailman/listinfo/pipet-devel

-- 
Cheers,
Harry

Harry J Mangalam -- (949) 856 2847 -- mangalam@home.com