From justin at ukans.edu Wed Dec 1 00:30:15 1999 From: justin at ukans.edu (Justin Bradford) Date: Fri Feb 10 19:18:59 2006 Subject: [Pipet Devel] python-bonobo (was Re: desktop as...) In-Reply-To: Message-ID: > It is my understanding that while bonobo uses ORBit, it is just a library. > If I am correct, non-elegant bindings should be relatively easy. By > "non-elegant", I mean it the Python interface would feel more C-ish than > Python-ish. Ok, upon reviewing the documentation, I was way off before. However, one could still contain all of the ORBit code within the C stubs, and have only the Embeddable, Container, View, ViewFrame, ClientSite, etc object interfaces exposed in Python. So, ORBit bindings would not be necessary, and it would make components/containers pretty easy to write in Python. Anyone know if James Henstridge has started/implemented any of the bonobo stuff? Justin From bizzaro at geoserve.net Wed Dec 1 02:26:02 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] Databases/Languages (was New guy speaks up) References: Message-ID: <3844CD8A.51BB0253@geoserve.net> Brad Chapman wrote: > > Gary Van Domselaar wrote: > > Loci as a Graphical Shell/ Graphical > >Scripting language with a database 'locus', but the actual database, and > >data model used to store the sequence data (and annotations) would be an > >'option' depending on what the developers have provided for loci. so > >there may be a relational database, an object database etc., > > Based on this thinking and the 'plug-in' idea mentioned so often, > why not implement a database as a loci? (maybe this could be some kind of > derivative of the container?). This way, the user can use or not use a > database depending on their work with Loci. I think you said just what Gary did: a container is a locus that is a type of a database. > For instance, if I am using Loci to pull single files from the PDB > and pipe them into RasMol for viewing, it would be really stupid to have an > intermediate step where I stick a single object into a database. By > contrast, if I am parsing the current UniGene text file (100MB), it is > crazy to not have some structured way to store this. I mean, I would be > none too happy a user if I watched my computer spend an hour parsing a huge > document and another several BLASTing the results and then had the computer > crash losing all of my data. A 'plug-in' database loci could serve as the > storage for huge data files--allowing data backup and easy access to the > important parts of the data (sequences in this case). Is this the kind of > plan everyone was thinking of? Yep, you got it! The plan is, you can have any sort of intermediate between data and processor, depending on the needs of the processor (and your needs, if you're developing new extensions for Loci). > I agree completely. A database is still a data intermediate, but I think it > at least has the advantages of: 1) being readily storable 2) allowing > specific parts of the data to be individually queried. 3) being flexible > enough to allow a wide variety of data types to be stored without data loss. Those are some good points we can use for our documentation. > Gary Van Domselaar wrote: > >Loci's own database requirements may not be so > >much for sequence storage as much as it is for things like the container > >locus, which is a queriable locus that contains other loci. > > One thing I'm not clear about--how does all this relate to the idea of the > container and storing loci? 
I guess I'm not clear about exactly what it > means to store a converter loci, for instance? Even more confusing to me, > how do you query a container locus? How does this relate with storing > actual data? Confusion, confusion over here! You simply have to define the word 'database' rather broadly. Literally ANYTHING that can store information in a queriable fashion is a 'database', for our purposes. But we'll just use the word 'container' to keep the language lawyers at bay. Does a filesystem 'store information in a queriable fashion'? Yes. Now this is where things get interesting: In Loci, you can open a container that represents a filesystem directory. Since loci (data, programs, etc.) are individual files in a directory, they can be thought of as being stored in a container. Subdirectories are then container loci within container loci. BUT KEEP IN MIND: This is ONE type of container. Not all container loci represent filesystem directories. > I will agree that I would rather not mess around with two languages/two > interpreters and all of that jazz. No fun! However there are a number of > things that are implemented in perl currently that are not available in > python. For instance, the bioperl modules. Although the biopython project > is dealing with building the same functionalities in python they currently > have no code (and the list has been relatively silent!). In addition, a > number of excellent programmers are coding in perl and so there are a lot > of good scripts/code available. Should all perl scripts either: a) be run > through CORBA to be used with Loci? or b) have to be reimplemented in > python to be used with Loci? This is something we can't expect of ANY program ported to Loci: that it be modified, whether by making it use CORBA or by translating it to Python. The only things that should require compliance with a Loci specification for interoperability are (1) GUI widgets, (2) programs that need direct access to or control of Loci internals, (3) wrappers. THESE will use CORBA (especially if not written in Python) and/or Python. > For example, how are we planning on connecting > to AceDB servers--rewriting AcePerl or running it through CORBA? I think > our disadvantage if we try and rewrite everything is that we can't take > full advantage of work done in other languages and have to work really hard > to keep the python implementation "up to date" with the perl. In addition, > there is an unhealthy competition between perl and python for programmer > time, regardless of which is a "better" language. As I mention above, all this will be done via a large variety of wrappers. But for the most part, if it runs on the command line, the Workspace will allow the user to make his/her own wrappers. See an earlier post to the list about 'constructing the command line' or something like that. It should be part of the documentation. Let me know if you can't find it in the archives. > I mention above the kind of things I had in mind. Specifically, where is > the point where it takes more effort to connect things with CORBA than it > does to rewrite them in python? What should be rewritten and what should be > connected? From what I've read, gnome development (which you all seem to be > wisely following closely!) seems to do a good job of making CORBA tie a lot > together, but I'm not completely positive how this all can translate to > Loci. Once again, confusion is overwhelming me!
Just remember: CORBA and/or Python for (1) Widgets (2) Low-level customization of Loci (3) Wrappers Gary described this as 'middleware', which is a good way to think of it. Front-endware: Loci's Workspace/GUI Middleware: CORBA and/or Python for points mentioned above Back-endware: Bioinformatics apps, unmodified Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From bizzaro at geoserve.net Wed Dec 1 02:58:31 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] OMG approves new, revised CORBA specs Message-ID: <3844D527.26ED8A74@geoserve.net> FYI: http://news.cnet.com/news/0-1003-200-1472911.html Jeff From bizzaro at geoserve.net Wed Dec 1 03:40:08 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] python praise Message-ID: <3844DEE8.9BB875F6@geoserve.net> Bruce Eckel, who has written articles and books on Java and C++, has some surprisingly nice things to say about Python in his article at Borland.com: -------------------------- Where does Python fit? Everywhere else. It's both a programming language and a scripting language, but it's very nicely object-oriented from the ground up, easy to learn and use. In fact, I think it could be the ideal beginner's language. You can write command-line programs and GUI programs. You can write programs to test your design, then re-code the programs in C++ or Java once you've gotten the kinks out. But to me the key is productivity. I seem to be able to develop programs 10 times faster than in C++ or Java, and for that reason I'm willing to write programs in Python that I wouldn't trouble myself with in other languages, simply because using those languages would take too long. Although many programs for Linux will be written in Java or C++, there will be lots of smaller solutions as well because of Python. Perl, Tcl/TK, and Rebol will also be used, but I don't think those languages scale as well as Python. Nor is the code they produce as maintainable, which means they won't be as heavily used in the end. http://community.borland.com/devnews/article/1,1714,20173,00.html -------------------------- Bruce's publications: -------------------------- Since 1986, Bruce Eckel (www.BruceEckel.com) has published over 150 computer articles and 6 books, four of which were on C++, and given hundreds of lectures and seminars throughout the world. He is the author of Thinking in Java (Prentice-Hall 1998, freely available at www.BruceEckel.com; 2nd edition in progress on the Web site), the Hands-On Java Seminar CD ROM (available at www.BruceEckel.com), Thinking in C++ (Prentice-Hall, 1995; 2nd edition in progress on the Web site), C++ Inside & Out (Osborne/McGraw-Hill 1993; the 2nd edition of Using C++, Osborne/McGraw-Hill 1989) and was the editor of the anthology Black Belt C++ (M&T/Holt 1994). He was a founding member of the ANSI/ISO C++ committee. He speaks regularly at conferences and is the track chair for both C++ and Java at the Software Development conference. -------------------------- Jeff From bizzaro at geoserve.net Wed Dec 1 03:53:53 1999 From: bizzaro at geoserve.net (J.W. 
Bizzaro) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] [Fwd: [Pipet Devel] constructing the command-line] Message-ID: <3844E221.E114B259@geoserve.net> Just for kicks, I'm reposting my June message about 'constructing the command-line' (well, and because I mentioned it to Brad). Note that I refer to 'our own' XML for bioinformatics + Loci internals: LocusML. The plans for a LocusML have changed a bit since then. Jeff -------------- next part -------------- An embedded message was scrubbed... From: "J.W. Bizzaro" Subject: [Pipet Devel] constructing the command-line Date: Sun, 06 Jun 1999 12:21:00 +0000 Size: 6694 Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19991201/bf9ec5da/attachment.mht From dlapointe at mediaone.net Wed Dec 1 06:53:25 1999 From: dlapointe at mediaone.net (David Lapointe) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] [Fwd: [Pipet Devel] constructing the command-line] In-Reply-To: <3844E221.E114B259@geoserve.net> References: <3844E221.E114B259@geoserve.net> Message-ID: <99120107414600.00536@gnomen> On Wed, 01 Dec 1999, J.W. Bizzaro wrote: > > Just for kicks, I'm reposting my June message about 'constructing the > command-line' (well, and because I mentioned it to Brad). Note that I refer > to 'our own' XML for bioinformatics + Loci internals: LocusML. The plans for > a LocusML have changed a bit since then. Jeff I am glad Jeff reposted this. I have been creating perl CGI interfaces to EMBOSS programs. I was writing to Jeff about this and how it would be great to parse the *.acd files for each program ( these define the input and output data types, which are required, the data ranges, etc) into a GUI interface. This might be similar to GDE but Glade seems very promising. Alternatively, for a loci interface, parsing the *.acd files might generate a series of linked loci. One hassle with doing this is the acd interface will change, incrementally ( see below). As an aside on the internal data representation, you could either have one or not, similar to what Brad just mentioned about using databases. Personally I think format conversions are too lossy wrt annotations. Also, short of rewriting (almost) every application outside of loci, you would need to deal with format conversions at some point. The EMBOSS list has interesting thread going about protein sequences with very high ATCG content, so they must be forced to protein type otherwise the program thinks they are nucleic acids. The issue is adding a new flag for this forcing, what will be the flags name. The diversity of opinion on this issue is heartening. BLAST for example does this up front. You have to tell the program what type you have. Other programs tag sequences at the top with their type, but that would involve changing the databases, to create a new data format, like FBF. -- .david David Lapointe "The meek will inherit the earth," noted tycoon J. Paul Getty. "But not the mineral rights." From bizzaro at geoserve.net Wed Dec 1 13:09:40 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] [Fwd: [Pipet Devel] constructing the command-line] References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> Message-ID: <38456464.56B2E0C0@geoserve.net> David Lapointe wrote: > > I am glad Jeff reposted this. And I'm glad that you're glad. > I have been creating perl CGI interfaces to EMBOSS programs. 
> I was writing to Jeff about this and how it would be great > to parse the *.acd files for each program ( these define the > input and output data types, which are required, the data > ranges, etc) into a GUI interface. This might be similar to > GDE but Glade seems very promising. Alternatively, for a loci > interface, parsing the *.acd files might generate > a series of linked loci. ...which can be combined into one composite locus. > One hassle with doing this is the > acd interface will change, incrementally ( see below). Will it change because the entire interface is still under development, or because individual programs will require changes to their *.acd files? > As an aside on the internal data representation, you could > either have one or not, similar to what Brad just > mentioned about using databases. Personally I think format > conversions are too lossy wrt annotations. Also, short > of rewriting (almost) every application outside of loci, you > would need to deal with format conversions at some point. Again, we can promote something as our 'preferred format' and use it as an intermediate in format conversions. Just because we don't hard-code a data format into Loci, it doesn't mean we can't push for some new standard. I've heard some interesting ideas for a universal bioinformatics XML. Peter Murray-Rust even started a mailing list to promote the development of an _open_ standard for such a beast. But the list now seems dead. If some Lab Rats want to start an effort here, I'm all for it. > The EMBOSS list has interesting thread going about protein > sequences with very high ATCG content, so they must > be forced to protein type otherwise the program thinks they > are nucleic acids. The issue is adding a new flag for this > forcing, what will be the flags name. The diversity of > opinion on this issue is heartening. BLAST for example > does this up front. You have to tell the program what type you > have. Other programs tag sequences at the top with their type, > but that would involve changing the databases, to create a new > data format, like FBF. Yeah, I've been following the EMBOSS list. It's funny that some programs 'assume' you are using a certain type of data. And the same goes for data formats. How hard is it to have one word to say what it is you're dealing with? GCATAAGCATGCAGATC ACGATCATCAGCATCAG I had a problem like this with GenBank once. You might think GenBank has all the descriptors needed to annotate a nucleotide sequence. But...hmmm...where did that DNA come from anyway? The nucleus? The mitochondria? The chloroplasts? There's no descriptor for that!!! Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From dlapointe at mediaone.net Wed Dec 1 20:44:26 1999 From: dlapointe at mediaone.net (David Lapointe) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] [Fwd: [Pipet Devel] constructing the command-line] In-Reply-To: <38456464.56B2E0C0@geoserve.net> References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> Message-ID: <99120122172000.00534@gnomen> > > Alternatively, for a loci > > interface, parsing the *.acd files might generate > > a series of linked loci. > > ...which can be combined into one composite locus. > Yes, that would be the idea. The composite locus would encapsulate the I/O and parameters. 
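To make this concrete, here is a rough sketch of how a parsed *.acd-style description might turn into the parameter list for such a composite locus. This is only an illustration: the syntax and field names below are simplified stand-ins, not the real EMBOSS ACD grammar.

    import re

    # Rough sketch only: parse a *simplified, made-up* .acd-style description
    # into parameter definitions that a composite locus could expose.
    SAMPLE_ACD = '''
    sequence: asequence [ type: "dna" required: "Y" ]
    integer:  gapopen   [ default: "10" required: "N" ]
    outfile:  outseq    [ required: "Y" ]
    '''

    def parse_acd(text):
        """Return a list of {'name': ..., 'datatype': ..., attribute: value} dicts."""
        params = []
        for match in re.finditer(r'(\w+):\s*(\w+)\s*\[(.*?)\]', text, re.S):
            datatype, name, body = match.groups()
            param = {"name": name, "datatype": datatype}
            param.update(dict(re.findall(r'(\w+):\s*"([^"]*)"', body)))
            params.append(param)
        return params

    if __name__ == "__main__":
        for param in parse_acd(SAMPLE_ACD):
            print(param)

Each resulting dict could then become one input/output 'dot' on the composite locus, with the GUI widgets generated from the attributes.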
> > One hassle with doing this is the > > acd interface will change, incrementally ( see below). > > Will it change because the entire interface is still under development, or > because individual programs will require changes to their *.acd files? Well, it seems like anything that has a version 0.0.4 will change 8-). But I would imagine that before all is said and done it will be different. I am not doing justice to the acd scheme, perhaps because I am trying to use it in a different way. > Again, we can promote something as our 'preferred format' and use it as an > intermediate in format conversions. Just because we don't hard-code a data > format into Loci, it doesn't mean we can't push for some new standard. I've > heard some interesting ideas for a universal bioinformatics XML. Peter > Murray-Rust even started a mailing list to promote the development of an > _open_ standard for such a beast. But the list now seems dead. If some Lab > Rats want to start an effort here, I'm all for it. I think there are two things here to consider. First, if you are going from genbank to fasta, why have an intermediate format? Second, if you were going to write de novo some analysis program to work with loci, what format would you use? If you could settle on that, that would be the internal format, which might not be a format at all but rather a sequence object. > Yeah, I've been following the EMBOSS list. It's funny that some programs > 'assume' you are using a certain type of data. And the same goes for data > formats. How hard is it to have one word to say what it is you're dealing > with? Some programs work with both Nucs and Prots, FASTA, BLAST, CLUSTAL to name a few. I think historically someone thought it was a good idea to consider sequences with >80% AT(U)CG as nucleic acids, of course that has problems right away, just like 99 0r 88. > > GCATAAGCATGCAGATC > > > > ACGATCATCAGCATCAG > > Heh heh or ATCGRTSNRYTACG. > I had a problem like this with GenBank once. You might think GenBank has all > the descriptors needed to annotate a nucleotide sequence. But...hmmm...where > did that DNA come from anyway? The nucleus? The mitochondria? The > chloroplasts? There's no descriptor for that!!! There is but someone has to annotate that section. Check out Sequin on the NCBI site. There is a section for location of the sequence ( genomic, mitochondrial, ...). Or check out seq.asn in the NCBI toolkit. > > Cheers. > Jeff -- .david David Lapointe "Hokey religions and ancient weapons are no match for a good blaster at your side, kid," From stein at fmppr.fmnh.org Thu Dec 2 11:03:37 1999 From: stein at fmppr.fmnh.org (J. Steinbachs) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] databases References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> Message-ID: <38469859.2F2F8185@fmnh.org> Hey all... I went to an interesting seminar yesterday at U of Chicago. Susan Davidson (co-director, Center for Bioinformatics, UPenn) gave a talk on "Refreshing the Tower of Babel." Caveat: I know very little about databases. The application: EpoDB, a database created at UPenn Center for Bioinformatics, designed to study gene regulation during differentiation and development of vertebrate red blood cells. 
The problems: extracting data from all sorts of databases with different underlying structures; cleansing the data (error removal); integration; annotation; updating (particularly, updating without losing the information added/removed during data cleansing). I guess Susan is a strong proponent in the DB field for complex value databases (blah blah blah ginger... don't ask me what those are). However, for this problem, she and her colleagues have chosen to use XML, modifying it a bit into something they call WHAX. The data can be represented as a "WHAX tree", with the tag representing the branches and the tag value representing the node. Additions to a subset of the data can be integrated into the larger database by simple manipulations of WHAX trees. I originally went because of the application to genetic data. But then I got sidetracked... Here at the Museum, we have specimen data (21+ million specimens in total) in which species names change, higher taxonomic information changes, and so on, all of which should be tracked within the database. In some cases, we are integrating the traditional genetic data into our specimen databases; i.e., in newer portions of our collection of specimens, we have a one-to-one correspondence between the dead dried pressed plant (or the stuffed animal and corresponding skeleton), the DNA extracted from said plant (or animal), and a record in our developing databases (birds are separate from plants are separate from fishes...). The computer scientists were intrigued by this type of data :) This WHAX "thing" would be perfect for tracking all that information. Perhaps "bioinformatics" is currently too narrowly defined (organisms have more characteristics about them than just their DNA). If we, the community of manipulators of biological data, do come up with an open standard for representing said data, that standard should be flexible enough to encompass all the characteristics about the organisms. And, in light of all the stupid patenting going on, perhaps an open standard is needed before a big bad multinational corporation patents it first. Just a few thoughts... -jennifer -------------------------- J. Steinbachs, PhD Computational Biologist Dept of Botany The Field Museum Chicago, IL 60605-2496 office: 312-665-7810 fax: 312-665-7158 -------------------------- From mangalam at home.com Thu Dec 2 13:44:11 1999 From: mangalam at home.com (Harry Mangalam) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] databases References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org> Message-ID: <3846BDFB.8AA9F4B9@home.com> That's interesting. One of the things I'm working on (just about the only thing I'm working on, it seems) is a gene expression database that will support multiple species as well as multiple technologies (glass microarrays, Affy chips, AFLP, SAGE, etc). As you might imagine, it's one thing to store multiple species information; it's quite another to try to make it interpretable and queriable and expect to get anything sensible back. We're currently using the NCBI taxonomy tree and looking forward to seeing more effort thru the Gene Ontology project, but it sounds like this might be a more flexible solution if you can do dynamic modifications to the tree by manipulating these WHAX trees.
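As a very rough illustration of the 'record the change instead of overwriting it' idea that both of these problems seem to call for -- this is a toy sketch, not how WHAX or EpoDB actually work, and every name in it is invented:

    from datetime import date

    # Toy sketch (invented names, not the WHAX/EpoDB design): each taxonomy
    # node keeps a dated history of names, so a rename never erases the old one.
    class TaxonNode:
        def __init__(self, name, parent=None):
            self.history = [(date.today(), name)]   # list of (when, name) pairs
            self.parent = parent
            self.children = []
            if parent is not None:
                parent.children.append(self)

        @property
        def name(self):
            return self.history[-1][1]              # current accepted name

        def rename(self, new_name, when=None):
            self.history.append((when or date.today(), new_name))

    # Specimen records point at the node, not at a name string, so renames
    # propagate automatically while old names stay queriable from the history.
    plantae = TaxonNode("Plantae")
    taxon = TaxonNode("Genus oldname", parent=plantae)
    specimen = {"catalog_no": "F-000001", "taxon": taxon}
    taxon.rename("Genus newname")
    print(specimen["taxon"].name)       # -> Genus newname
    print(specimen["taxon"].history)    # -> both names, with dates

The same trick works for higher taxonomic changes: move or relabel a node and everything hanging off it follows, while the old names remain available for queries.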
I'm off to check out her site, but it's unresponsive right now: http://cbil.humgen.upenn.edu/epodb/epodb.html How are you currently representing this problem at the Field Museum? Especially the dynamic nature of the problem? Cheers Harry "J. Steinbachs" wrote: > > > Hey all... > > I went to an interesting seminar yesterday at U of Chicago. Susan > Davidson (co-director, Center for Bioinformatics, UPenn) gave a talk on > "Refreshing the Tower of Babel." > > Caveat: I know very little about databases. > > The application: EpoDB, a database created at UPenn Center for > Bioinformatics, designed to study gene regulation during differentiation > and development of vertebrate red blood cells. > > The problems: extracting data from a sorts of databases with different > underlying structures; cleansing the data (error removal); integration; > annotation; updating (particularly, updating without losing the > information added/removed during data cleansing). > > I guess Susan is a strong proponent in the DB field for complex value > databases (blah blah blah ginger... don't ask me what those are). > However, for this problem, she and her colleagues have chosen to use > XML, modifying it a bit into something they call WHAX. > > The data can be represented as a "WHAX tree", with the tag representing > the branches and the tag value representing the node. Additions to the > a subset of the data can be integrated into the larger database by > simple manipulations of WHAX trees. > > I originally went because of the application to genetic data. But then > I got sidetracked... Here at the Museum, we have specimen data (21+ > million specimens in total) in which species names change, higher > taxonomic information changes, and so on, all of which should be tracked > within the database. In some cases, we are integrating the traditional > genetic data into our specimen databases; i.e., in newer portions of our > collection of specimens, we have a one-to-one correspondence between the > dead dried pressed plant (or the stuffed animal and corresponding > skeleton), the DNA extracted from said plant (or animal), and a record > in our developing databases (birds are separate from plants are separate > from fishes...). The computer scientists were intrigued by this type of > data :) This WHAX "thing" would be perfect for tracking all that > information. > > Perhaps "bioinformatics" is currently too narrowly defined (organisms > have more characteristics about them than just their DNA). If we, the > community of manipulators of biological data, do come up with an open > standard for representing said data, that standard should be flexible > enough to encompass all the characteristics about the organisms. And, > in light of all the stupid patenting going on, perhaps an open standard > is needed before big bad multinational corporation patents it first. > > Just a few thoughts... > -jennifer > > -------------------------- > J. Steinbachs, PhD > Computational Biologist > Dept of Botany > The Field Museum > Chicago, IL 60605-2496 > > office: 312-665-7810 > fax: 312-665-7158 > -------------------------- > > _______________________________________________ > pipet-devel maillist - pipet-devel@bioinformatics.org > http://bioinformatics.org/mailman/listinfo/pipet-devel -- Cheers, Harry Harry J Mangalam -- (949) 856 2847 -- mangalam@home.com From stein at fmppr.fmnh.org Thu Dec 2 14:45:44 1999 From: stein at fmppr.fmnh.org (J. 
Steinbachs) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] databases References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org> <3846BDFB.8AA9F4B9@home.com> Message-ID: <3846CC68.A8EBA111@fmnh.org> Harry Mangalam wrote: > > How are you currently representing this problem at the Field Museum? > Especially the dynamic nature of the problem? > Keep in mind that I'm not here working on databases... that I've only become peripherally interested and involved because these are problems that are not being addressed well by computing services. Being at a Museum has huge drawbacks - we don't have easy access to experts in the fields who could help put together theory and applications solve some of these informational issues. That said, our databases are separate - a big problem in and of itself. Our databases are relational databases which clearly do not easily address the problem of changing identifiers (or any other characteristic), especially when temp workers are hired to enter data and make on-the-fly corrections without consulting the curator. e.g., a pot that was recorded in field notes as being collected in Rhodesia is entered into the database as being collected from the modern political equivalent; bad move as the listing of location as Rhodesia is an important time stamp as to when the pot was collected. Time stamps on databases are clearly of utmost importance. I don't know how these relational databases are currently keeping track of species name (or political country) changes; I would guess that some kind of "memo" field might be in use. Currently, anybody conducting a historical biodiversity survey of our collections ("What organisms are in your collection from the Pacific Northwest?") has to consult over half a dozen different databases, all relational, but using different products. Most have limited web-accessibility. On the molecular end, we've got individuals working on particular genes for different groups of species. They do their alignment (by eye only - *cringe*), then plunk the data into NEXUS format for use in Paup*. So they have bunches of different text files floating around their hard drives. A really useful thing would be a database of aligned genes for the different groups (e.g., the ribosomal database project)... but how would one keep the alignment up-to-date? What would be the best underlying structure for such data? Lots of problems, no clear solutions... -jennifer -------------------------- J. Steinbachs, PhD Computational Biologist Dept of Botany The Field Museum Chicago, IL 60605-2496 office: 312-665-7810 fax: 312-665-7158 -------------------------- From bizzaro at geoserve.net Thu Dec 2 16:10:39 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] databases References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org> Message-ID: <3846E04F.4F59F82E@geoserve.net> "J. Steinbachs" wrote: > > The data can be represented as a "WHAX tree", with the tag representing > the branches and the tag value representing the node. Additions to the > a subset of the data can be integrated into the larger database by > simple manipulations of WHAX trees. As a complete aside from the database issue, Loci needs to represent the Workflow Diagram / Graphical Script in XML. This is of course a tree structure. 
Perhaps we should look at how WHAX trees work for this purpose. Is this the URL? http://cbil.humgen.upenn.edu/epodb/epodb.html As Harry said, it's unresponsive. > Perhaps "bioinformatics" is currently too narrowly defined (organisms > have more characteristics about them than just their DNA). If we, the > community of manipulators of biological data, do come up with an open > standard for representing said data, that standard should be flexible > enough to encompass all the characteristics about the organisms. And, > in light of all the stupid patenting going on, perhaps an open standard > is needed before big bad multinational corporation patents it first. I couldn't have said it better. And how does the media define bioinformatics? "The use of computer databases to organize the huge amount of biological information obtained by sequencing the human genome [DNA]..." http://www.newsalert.com/bin/story?StoryId=Coenz0bKbyta1nJu Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From stein at fmppr.fmnh.org Thu Dec 2 16:12:39 1999 From: stein at fmppr.fmnh.org (J. Steinbachs) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] UPenn EpoDB URL References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org> <3846E04F.4F59F82E@geoserve.net> Message-ID: <3846E0C7.E4685332@fmnh.org> "J.W. Bizzaro" wrote: > Is this the URL? > > http://cbil.humgen.upenn.edu/epodb/epodb.html > Try http://www.cbil.upenn.edu/EpoDB/index.html instead :) -j -------------------------- J. Steinbachs, PhD Computational Biologist Dept of Botany The Field Museum Chicago, IL 60605-2496 office: 312-665-7810 fax: 312-665-7158 -------------------------- From bizzaro at geoserve.net Thu Dec 2 16:40:44 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] UPenn EpoDB URL References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org> <3846E04F.4F59F82E@geoserve.net> <3846E0C7.E4685332@fmnh.org> Message-ID: <3846E75C.949B5CA6@geoserve.net> "J. Steinbachs" wrote: > > Try http://www.cbil.upenn.edu/EpoDB/index.html instead :) Jennifer, do you have any reference for WHAX? <:-) I couldn't find anything about it in the EpoDB literature. Thank you. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From stein at fmppr.fmnh.org Thu Dec 2 16:50:01 1999 From: stein at fmppr.fmnh.org (J. Steinbachs) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] UPenn EpoDB URL References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org> <3846E04F.4F59F82E@geoserve.net> <3846E0C7.E4685332@fmnh.org> <3846E75C.949B5CA6@geoserve.net> Message-ID: <3846E989.2E4348D7@fmnh.org> "J.W. Bizzaro" wrote: > > "J. Steinbachs" wrote: > > > > Try http://www.cbil.upenn.edu/EpoDB/index.html instead :) > > Jennifer, do you have any reference for WHAX? <:-) I couldn't find anything > about it in the EpoDB literature. > Sadly, I do not. 
I do recall that Susan mentioned that the WHAX stuff had only been done within the past four months, so a publication is not likely to be forthcoming soon. It might be worthwhile contacting her directly (see the CBIL web page for contact details) for more information (especially people actually doing the work). -jennifer -------------------------- J. Steinbachs, PhD Computational Biologist Dept of Botany The Field Museum Chicago, IL 60605-2496 office: 312-665-7810 fax: 312-665-7158 -------------------------- From chapmanb at arches.uga.edu Thu Dec 2 18:07:49 1999 From: chapmanb at arches.uga.edu (Brad Chapman) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] databases In-Reply-To: <3846E04F.4F59F82E@geoserve.net> References: <3844E221.E114B259@geoserve.net> <99120107414600.00536@gnomen> <38456464.56B2E0C0@geoserve.net> <99120122172000.00534@gnomen> <38469859.2F2F8185@fmnh.org> Message-ID: Fresh off of my schooling on what Loci is/is not, here are my thoughts on a tree representation of a workflow diagram: >> The data can be represented as a "WHAX tree", with the tag representing >> the branches and the tag value representing the node. Additions to the >> a subset of the data can be integrated into the larger database by >> simple manipulations of WHAX trees. > >As a complete aside from the database issue, Loci needs to represent the >Workflow Diagram / Graphical Script in XML. This is of course a tree >structure. Perhaps we should look at how WHAX trees work for this purpose. > It seems to me that a balanced tree data structure (B-Tree, from my Intro to Algorithms text) would be an excellent way to represent a workflow diagram! I haven't looked through enough Python libs yet, but I'm positive there must be some nice tree classes already implemented that could be extended so we wouldn't have to do all the work (just implement a suitable holder class and the additional functions to deal with it). The tree seems to flow kind of naturally from the structure of how I picture a loci diagram working. In addition, since it would be *just* a data structure (although a big one!), this could help in passing it around (I think I saw a mention somewhere about the idea of interchanging loci implementations between collaborators). The only problem I picture is when multiple branches feed into a single node:

             big container loci
              |             |
              |             |
          document        document
    (genbank sequence)  (genbank sequence 2)
              |             |
              |             |
          converter       converter
              |             |
              |             |
              -----------------
                      |
                  processor
    (ie. seqalign to align the 2 sequences)

Does this corrupt a tree? I don't think this is explicitly disallowed in the rules on a branched tree, but I can't really ever recall seeing a tree like this. I would think it would then become a graph, but this seems too general to represent the type of data, since it still has a lot of "tree" characteristics. Anyways, that is just my naive "I have read the intro to algorithms text" thoughts on the WHAX tree idea. I would be very interested in seeing the WHAX algorithms, and also in hearing other people's input on this (all the computer science people out there can straighten me out!). Many thanks to Jennifer for posting the original info. Brad From bizzaro at geoserve.net Fri Dec 3 01:10:33 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] Entity Message-ID: <38475ED9.982B3287@geoserve.net> Another source for ideas on the use of XML: http://entity.netidea.com/ Cheers.
Jeff From bizzaro at geoserve.net Fri Dec 3 01:36:01 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:00 2006 Subject: [Pipet Devel] BLADE Message-ID: <384764D1.A7893148@geoserve.net> Perhaps what we could use for a Web interface to Loci: http://www.thestuff.net/bob/projects/blade/ Cheers. Jeff From bizzaro at geoserve.net Fri Dec 3 20:39:40 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] [Fwd: WHAX] Message-ID: <384870DB.D915F35B@geoserve.net> This message is from Susan Davidson, who resently spoke with Jennifer at UChicago. Susan mentioned in her talk there the WHAX XML model for tree structures. Jeff -------------- next part -------------- An embedded message was scrubbed... From: Susan Davidson Subject: WHAX Date: Fri, 03 Dec 1999 17:29:28 EST Size: 1723 Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19991204/90ca56ff/attachment.mht From chapmanb at arches.uga.edu Sun Dec 5 13:53:54 1999 From: chapmanb at arches.uga.edu (Brad Chapman) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] WHAX and Loci storage ideas In-Reply-To: <384870DB.D915F35B@geoserve.net> Message-ID: Oh Great Locians; Hello! I have been doing some more thinking about data storage for loci and along these lines have read up on the WHAX stuff. Below I kind of give a quick overview of WHAX (for those of you who didn't like the looks of the 60 page technical document about it!) based on what I was able to get out of it (since I'm not a database expert). Then I follow up with a plan for data storage for Loci based on WHAX ideas, info from the archives, and my own random ideas. Sorry it's so long, but I would really be interested in hearing everyone's comments if they can make it all of the way though! WHAX (Warehouse Architechture for XML) -------------------------------------- Basically, this is a technical document detailing the implementation of WHAX. Basically, what WHAX is designed to do is to take selected information from a data source, which can be either a database or an XML document, and represent it as an "XML Warehouse." This XML Warehouse contains specific information from a database which has been selected by the user. For instance, if you had a database full of books you've read, you could create an XML warehouse of all of the books you've read that were written by Stephen King. Some key characteristics of an XML Warehouse is that it is in XML format and is represented by a tree structure. So based on my limited XML knowledge, this seems analagous to a Document Object Model (DOM). What WHAX does is define a method for upkeeping this XML Warehouse. The upkeep is unique from upkeep of databases because XML is in a semi-structured format--the paper describes it as "self-describing, irregular data." That paper details methods for changing the XML warehouse when new data is added or removed, and for keeping the warehouse consistent with changes in the underlying database where the XML warehouse got its information from. Data Storage in Loci -------------------- Reading through this document got me thinking about how this could be applied to Loci and I came up with the following model of data storage in Loci. To make things simpler in my head, I split the data storage needs of Loci (according to my, hopefully correct!, model of Loci) into three categories: 1. The data that comes in as a document (for instance, a set of sequences in FASTA format). These are the input files provided by the user. 2. 
The actual setup of a workflow diagram--the underlying structure of the diagram (how all of the loci are connected together). This is supplied by the user in the workflow diagram by connecting all of the dots together and constructing the command-lines (in the words of Jeff!). 3. The internal XML warehouse (to use my new WHAX-learned term!). This would be a subset of the supplied data (1.) that is passed from loci to loci according to the work flow diagram. Jeff describes this very well (Data Storage Interfaces--June 11) as an XML document that travels from loci to loci and changes XML formats (ie. changes to different document structures according to the specific DTD (document type definition) needed at that loci). Each of these points has a specific storage needs, so I have come up with a separate plan for each of them: 1. Input Data: Since the user supplied this data, it is their choice to determine how they want to deal with it. If they want to store it as a backup in a database of some sort, then they can do this through the work flow diagram. So the data can be stored in a 'plug-in' database (what Gary and Jeff mentioned to be). This type of interface/data storage component isn't "essential" to the functioning of Loci, so I will go on to the essential data storage needs. 2. Workflow Data: Loci will need a method to store the user defined workflow diagram. This diagram includes: 1. the setup of the workflow diagram (how everything is connected together) 2. The constructed command line for each program 3. more???. This is the kind of storage need I was thinking about when I wrote my incoherent message a couple of days ago about trees and graphs. Basically, my thinking is that we can stick all of the information from a workflow diagram into a data stucture, and then move through this structure in the specified order to execute the contents of the workflow diagram. My new data structure of choice is a flow network (still from Intro Algorithms). Basically I think each element of network would have a setup kind of like the following pseudo-code:

    data-structure loci:
        array[pointers] TheNextLoci   # pointers to the loci which come next in
                                      # the flow diagram
        string  Type                  # The loci type
        string  IOName                # the program or document represented by the loci
        tuple   CommandLine           # all of the command line arguments
        pointer XMLDocument           # the info being processed
        pointer DTD                   # the document definition for the particular loci
        pointer ActionInstructions    # a document with what to do at that loci

Of course, this would require each loci to setup a DTD type file that has the specifications to create a document for the particular program (I talk more about how I think this would work in point 3. below) and also an ActionInstruction to determine what to do at that loci (ie. display a pdb file in RasMol, align sequences from the XML document etc.). My mental image is that the XML document would move into a particular locus, be converted to the DTD required for that particular locus, and then processed according to the specifications of the program at that locus. I imagine the setup of the DTD and action instructions would be part of the plug-in process for each program that needs to read a document into or get info from the workflow diagram. 3. Internal XML warehouse: My thoughts on this on pretty directly based off the WHAX paper. Here is kind of what I imagine happening with a document that comes into Loci. First the document will be converted into XML format based on the DTD of the locus (ie.
the type of data in the document). This XML document will then be put into an XML database (Note: This is kind of what I was thinking before--have a database to store info instead of a specific internal format.) Then, as you progress through the work-flow diagram, each loci will create an XML warehouse from the XML database based on the DTD requirements of the particular loci. So what I am thinking is that we can use the WHAX system to maintain an XML document that has all of the info needed for a particular locus. For instance, if we come to a processor that requires sequences in the database in FASTA format, we can pull out the sequences and other required info from the database and update the XML warehouse to have this info. So we would maintain a view of the data available in the database and update it for the needs of a locus. Okay, I should stop talking about this point before I get any more confusing! More ranting ---------------------- Basically, I am proposing a plan whereby we eliminate a specific internal storage format and essentially put everything into a database. Of course, this type of plan "requires" a database, and here I was thinking that we could use dbXML (http://www.dbXML.org), mentioned by Jeff in the archives. The database is under a BSD-style license (which I think is compatible with the LGPL) and although it still doesn't "do" anything yet, it is under current development (most recent tarball = November 27th) and we could try and coordinate development with Tom Bradford, the developer there. He is developing it in C++ with a CORBA interface (he is using ORBacus as his ORB), so ultimately the database could also be pluggable (you could use any XML storage database), which fits in well with the Loci schema. The reason that I think this kind of plan is better than an internal format is that it gives us a lot of flexibility to input any kind of information, as Jennifer was talking about. For instance, say we had a program to plug in that uses specific animal descriptors to build an evolutionary tree. So you might have data for an anteater in the input file like: Sharp and Pointy Long Really Long (Okay, so I don't know anything about anteaters! Sorry!). With an internal data format, we could have to define a new DTD to include these three elements but with a database format, I don't think this would be necessary. Okay, well basically this is what has been on my mind for the past couple of days and hopefully I've managed to scrape it together in a semi-organized fashion. I would be really interested to hear everyone's comments about the ideas to see if they are along the lines of other peoples' thinking or just really crazy. Also, thank you very much if you read this through all of the way to the end! Brad . From gvd at redpoll.pharmacy.ualberta.ca Sun Dec 5 23:30:16 1999 From: gvd at redpoll.pharmacy.ualberta.ca (Gary Van Domselaar) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] WHAX and Loci storage ideas References: Message-ID: <384B3BD8.6B557542@redpoll.pharmacy.ualberta.ca> Brad, It good to know that someone is thinking about data storage issues for Loci. This is an important and (in my personal opinion) underdiscussed topic. Let's disscuss some of these ideas now. For clarity, lets keep in mind that Loci is constructed in a 'three-tier' architecture: 1. The GUI 'Front-end' with 'bindings' to the 'Middleware'. 2. The 'Middleware', which is the CORBA, or command line interface, or http protocol, or whatever is needed to access the 'Back-end'. 
These will be the services that allow the backend to interoperate, as dictated by the WFD. A 'data translator locus' is a good example of loci 'middleware'. The database used to store the individual loci contained within a 'container locus' would be another example. 3. The Back-end, which are the information repositories (filesystems, databases, and so on), and the analysis programs that manipulate the data. The back-end likely is diverse, both architecturally and geographically. Note that nowhere in this description is there any mention of data-type: Loci can work for physicists as well as it can for bioinformaticists, but we are all bioinformaticists here, so we always provide our scenarios (and will use Loci) as a bioinformatics application. A multiple-alignment program is a good example of a 'back-end' locus. The back-end 'resources' are the 'loci'. They are represented by the icons / nodes in the Front-End, and made interoperable by the middleware. The front-end and the back-end dont even know about each other. Although I'm not the absolute authority on Loci's architecture, and the architecture likely will cotinue to evolve, I'm relatively certain that this is the current 'Loci architectural paradigm'. I'm pretty certain that you already understand this paradigm, but I thought I should make it explicit for the sake of discussing your ideas on data storage for Loci. Brad Chapman wrote: > WHAX (Warehouse Architechture for XML) > -------------------------------------- > Basically, this is a technical document detailing the > implementation of WHAX. Basically, what WHAX is designed to do is to take > selected > information from a data source, which can be either a database or an XML > document, and represent it as an "XML Warehouse." This XML Warehouse > contains specific information from a database which has been selected by > the user. For instance, if you had a database full of books you've read, > you could create an XML warehouse of all of the books you've read that > were written by Stephen King. Some key characteristics of an XML Warehouse > is that it is in XML format and is represented by a tree structure. So > based on my limited XML knowledge, this seems analagous to a Document > Object Model (DOM). > What WHAX does is define a method for upkeeping this XML Warehouse. > The upkeep is unique from upkeep of databases because XML is in a > semi-structured format--the paper describes it as "self-describing, > irregular data." That paper details methods for changing the XML warehouse > when new data is added or removed, and for keeping the warehouse consistent > with changes in the underlying database where the XML warehouse got its > information from. The URL for this document is http://db.cis.upenn.edu/cgi-bin/Person.perl?susan The document title is: Efficient View Maintenance in XML Data Warehouses > Data Storage in Loci > -------------------- > Reading through this document got me thinking about how this could > be applied to Loci and I came up with the following model of data storage > in Loci. > > To make things simpler in my head, I split the data storage needs of Loci > (according to my, hopefully correct!, model of Loci) into three categories: > > 1. The data that comes in as a document (for instance, a set of > sequences in FASTA format). These are the input files provided by > the user. Or retrieved from a database query, or output by an analysis program. > > 2. 
The actual setup of a workflow diagram--the underlying structure of the > diagram (how all of the loci are connected together). This is supplied by > the user in the workflow diagram by connecting all of the dots together and > constructing the command-lines (in the words of Jeff!). This is my understanding as well, although the WFD will be constructed via a graphical shell, which has a 'thin interface' to the middleware. When you say 'constructing the command-lines', do you mean 'generating the interface to the middleware'? > > 3. The internal XML warehouse (to use my new WHAX-learned term!). This > would be a subset of the supplied data (1.) that is passed from loci to > loci according to the work flow diagram. Jeff describes this very well > (Data Storage Interfaces--June 11) as an XML document that travels from > loci to loci and changes XML formats (ie. changes to different document > structures according to the specific DTD (document type definition) needed > at that loci). > > Each of these points has a specific storage needs, so I have come up with a > separate plan for each of them: > > 1. Input Data: Since the user supplied this data, it is their choice to > determine how they want to deal with it. If they want to store it as a > backup in a database of some sort, then they can do this through the work > flow diagram. So the data can be stored in a 'plug-in' database (what Gary > and Jeff mentioned to be). This type of interface/data storage component > isn't "essential" to the functioning of Loci, so I will go on to the > essential data storage needs. Exactly. Using Jeff's analogy, what if we were to retrieve an entire 2 Terabyte sequence file, in GenBank format, from the NCBI database, and wanted to search the entire file against the cDNA for alpha-hemoglobin. Lets suppose further that we had access to a remote analysis program running on a fancy supercomputer that did BLAST searches for us and required GenBank formatted files to perform the search. Suppose further that the NCBI database and the Supercomputer were on the same machine. We could construct a WFD where we retrieve the 2 Terabyte file from NCBI and 'pipe' it directly to the analysis program, along with our a-hemoglobin cDNA, and BLAST away. In theory, Loci would send the data from the database and through the analysis program, possibly without the data ever touching a network-interface card, and without ever being reformatted If however, Loci required the data to be reformatted and stored in an intermediate database, say on my 66Mhz 486 with 400 MB Hard drive and 4Mb ram, I'd be running for the fire-extinguisher as my cpu exploded in a core-dumping ball of fire. On the other hand, what if we planned to do our entire thesis project based upon the information kept in that 2 Terabyte file? Would we want to retrieve it from the NCBI database everytime we wanted to do an analysis on it, especially if we wanted only to search a small segment of it? No way! we would wan to have that file stored in a fashion wherein we could easily extract only the parts that we are interested in performing an analysis on. This is where Loci's ability to store sequence data in a database becomes important. > > 2. Workflow Data: Loci will need a method to store the user defined > workflow diagram. This diagram includes: 1. the setup of the workflow > diagram (how everything is connected together) 2. The constructed command > line for each program 3. more???. 
This is the kind of storage need I was > thinking about when I wrote my incoherent message a couple of days ago > about trees and graphs. Basically, my thinking is that we can stick all of > the information from a workflow diagram into a data stucture, and then move > through this structure in the specified order to execute the contents of > the workflow diagram. My new data structure of choice is a flow network > (still from Intro Algorithms). Basically I think each element of network > would have a setup kind of like the following pseudo-code: > > data-structure loci: > array[pointers] TheNextLoci #pointers to the loci which come next in > #the flow diagram > string Type # The loci type > string IOName #the program or document represented by the loci > tuple CommandLine #all of the command line arguments > pointer XMLDocument #the info being processed > pointer DTD #the document definition for the particular loci > pointer ActionInstructions #a document with what to do at that loci We still need to formalize the interface to the the command-line-run backend apps. but this sounds about right to me. The OMG LSR ( http://www.omg.org/homepages/lsr/) Biomolecular Sequence Analysis working group has a nearly complete RFP (http://www.omg.org/techprocess/meetings/schedule/Biomolecular_Sequ._Analysis_RFP.html) for sequences and their alignment and annotation. Loci plans to adopt their CORBA IDL for passing biomolecular sequence objects to CORBA-compliant backend apps. This RFP has 'XML extensions' for future compatability, btw. > > Of course, this would require each loci to setup a DTD type file that has > the specifications to create a document for the particular program (I talk > more about how I think this would work in point 3. below) and also an > ActionInstruction to determine what to do at that loci (ie. display a pdb > file in RasMol, align sequences from the XML document etc.). > My mental image is that the XML document would move into a > particular locus, be converted to the DTD required for that particular > locus, and then processed according to the specifications of the program at > that locus. I imagine the setup of the DTD and action instructions would be > part of the plug-in process for each program that needs to read a document > into or get info from the workflow diagram. My understanding is that Loci will come with 'data translators' (middleware) that will be placed between a document / database to accomodate the formatting requirements of the analysis program that will operate on the document. > > 3. Internal XML warehouse: My thoughts on this on pretty directly based off > the WHAX paper. Here is kind of what I imagine happening with a document > that comes into Loci. First the document will be converted into XML format > based on the DTD of the locus (ie. the type of data in the document). This > XML document will then be put into an XML database (Note: This is kind of > what I was thinking before--have a database to store info instead of a > specific internal format.) I think this is appropriate only for Loci's own internal data requirements, but violates Loci's 'laissez-faire' paradigm for operating on 'exogenous' data. 
Jeff explained to me best when he said that Loci should be like the Bash shell: the bash shell has redirection operators and pipes, which you can combine to do some fairly sophisticated data processing, for example: bash$ cat /var/adm/messages | grep "root" > /tmp/root.txt Here bash will pipe the contents of /var/adm/messages to grep, which will extract all the lines containing the word 'root' and place them in the /tmp/root.txt file. Bash itself cares not about the contents of /var/adm/messages, doesn't reformat it, doesn't store it in an intermediate database, then re-extract it from the database, reformat it once again, and finally pump out the /tmp/root.txt file according to some xml dtd. Neither should Loci, in its most abstracted form. Instead, the data conversions and XML operations should be the modular extensions to Loci that we provide as valuable options for the end-user, so that Loci becomes not just a graphical 'bash', but a sophisticated distributed data processing system. Not that a graphical bash wouldn't be nice: the gnome dudes have talked about using Loci's graphical shell to do just that! Bottom line: maximum abstraction + maximum modularization = maximum flexibility = maximum power! gary -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Gary Van Domselaar gvd@redpoll.pharmacy.ualberta.ca Faculty of Pharmacy Phone: (780) 492-4493 University of Alberta FAX: (780) 492-5305 Edmonton, Alberta, Canada http://redpoll.pharmacy.ualberta.ca/~gvd From David.Lapointe at umassmed.edu Mon Dec 6 13:21:40 1999 From: David.Lapointe at umassmed.edu (Lapointe, David) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] Linux Clusters vs SMP Message-ID: <93307F07DE63D211B2F30000F808E9E501644F33@edunivexch02.umassmed.edu> There is an interesting discussion about SMP vs Beowulf going on on bionet.software. Here's an attachment: I missed the first article by David Mathog. <> David Lapointe, Ph.D. Research Computing Manager 6-5141 "What we obtain too cheap, we esteem too lightly." - T. Paine -------------- next part -------------- From: wrp@alpha0.bioch.virginia.edu (William R. Pearson) Subject: Re: SMP vs. Beowulf? Newsgroups: bionet.software Date: 02 Dec 1999 15:18:22 -0500 Organization: University of Virginia We have not looked into SMP vs Beowulf exhaustively, but we have quite a bit of experience. (1) SMP is far easier to configure and run than PVM (or MPI or others). You just run the program; if it's threaded for SMP, it runs faster. SMP programs are also much easier to develop and debug. (2) Our current PVM implementation is not as CPU efficient as spawning a bunch of threaded fasta33_t runs when the algorithm is fast. For Smith-Waterman, which is compute bound, they are equally efficient. In line with point (1), I think it is easier to improve the performance of an SMP program. I don't think this is an inherent shortcoming of PVM, but reflects the fact that our PVM implementation (and very primitive scheduling system) was built when machines and interconnections were much slower. (3) However, we have not yet found a version of Linux Pthreads that works 100% of the time. With the kernel and C-libraries that we use, we see failures which are almost certainly caused by Linux Pthreads. (We never see them in any other environment, and we don't see them unthreaded.) Linux PVM is very reliable. So we use both. We use PVM for genome-vs-genome Smith-Waterman searches, and we use SMP threaded versions for our WWW server.
Starting up PVM (or any other system that spawns large numbers of jobs on other machines) has a high overhead, which isn't worth the cost when the search will be done in a few minutes - we don't see nearly as much overhead with SMP machines. But large SMP machines are considerably more expensive. A cost-effective solution is a WWW server that sends its searches to a bank of 1-CPU or 2-CPU machines. Bill Pearson ############ From: Tim Cutts Subject: Re: SMP vs. Beowulf? Newsgroups: bionet.software Date: 03 Dec 1999 11:30:28 +0000 (GMT) Organization: Linux Unlimited William R. Pearson wrote: > >We have not looked into SMP vs Beowulf exhaustively, but we have quite >a bit of experience. > >(1) SMP is far easier to configure and run than PVM (or MPI or > others). You just run the program; if its threaded SMP, it runs > faster. SMP programs are also much easier to develop and debug. There are a couple of points to make here. 1) MPI is far more efficient than PVM. No-one should be using PVM these days. 2) MPI is more flexible than threads in that an MPI version of a program can still be run on an SMP machine, as well as on a distributed network. Programs like BLAST and FASTA have a problem in that their I/O requirements are large, and this can be a real performance problem on a distributed network. For example, you could think of implementing your parallel program by giving each MPI process part of the database to work on. The problem there is that you have a large overhead in getting the database to the processor. Ethernet is too slow, and will destroy any performance gain from the parallel code. A better solution, easier to implement, and probably more useful for most purposes, is a workstation farm with each node having a local copy of all the target databases, and run normal single threaded blast on each. For large scale work, you typically want to blast lots of sequences against several databases, so such coarse grained parallelisation is fine. You just need some way of distributing the blast jobs to your farm. You can either do this with some fairly trivial perl scripting, or you can use some more flexible commercial offering. I can highly recommend platform computing's LSF package. It's expensive, but it extremely good at managing workstation farms, in particular with cycle stealing from machines when they're idle. Using LSF at the University of Cambridge, I got 100 %CPU utilisation on a 20 workstation farm. These were interactive workstations too; people doing NMR spectrum assignment at the workstations weren't even aware their machines were also performing highly CPU intensive analysis jobs in the background. Efficient use of the workstations like this ultimately saved money, since they realised that they no longer needed to buy further machines. Tim. ########## From: Piotr Kozbial Subject: Re: SMP vs. Beowulf? Newsgroups: bionet.software Date: Sat, 04 Dec 1999 15:11:40 +0100 Organization: http://news.icm.edu.pl/ Reply-To: piotrk-NO@SPAMM-ibb.waw.pl There are other kinds of Linux clusters. You can read discussion about "Choosing the Right Cluster System" http://slashdot.org/article.pl?sid=99/11/12/0354238 For example (posted by SEWilco): Beowulf is one of a family of parallel programming API tools. Programs must use the API to accomplish parallel programming. http://cesdis.gsfc.nasa.gov/linux/beowulf/beowulf.html SCI is fast hardware with support for distributed shared memory, messaging, and data transfers. Again, if you don't use the API then no gain. 
http://nicewww.cern.ch/~hmuller/sci.htm DIPC is distributed System V IPC. Programs which use the IPC API can be converted to DIPC easily, such as just by adding the DIPC flag to the IPC call. http://wallybox.cei.net/dipc/dipc.html MOSIX is the most general-purpose. Processes are scattered across a cluster automatically without having to modify the programs. No API needed other than usual Unix-level process use. Allows parallel execution of any program, although full use requires a parallel program design. http://www.cnds.jhu.edu/mirrors/mosix/ From bizzaro at geoserve.net Mon Dec 6 14:10:19 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] Conglomerate Message-ID: <384C0A1B.9F24D3A5@geoserve.net> This may be interesting to us in more than one way: http://www.conglomerate.org/ Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From chapmanb at arches.uga.edu Mon Dec 6 20:18:08 1999 From: chapmanb at arches.uga.edu (Brad Chapman) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] WHAX and Loci storage ideas In-Reply-To: <384B3BD8.6B557542@redpoll.pharmacy.ualberta.ca> References: Message-ID: Gary et al.; Thanks for getting back with me about my data storage thinking! I think I may have the idea now--so I kind of work through everything in the rest of this e-mail, and then humbly propose a short-term development plan(!) just for the sake of argument. Gary Van Domselaar wrote: >in mind that Loci is constructed in a 'three-tier' architecture: >1. The GUI 'Front-end' with 'bindings' to the 'Middleware'. >2. The 'Middleware', which is the CORBA, or command line interface, or >3. The Back-end, which are the information repositories (filesystems, >I'm pretty certain that you already understand this paradigm, but I >thought I should make it explicit for the sake of discussing your ideas >on data storage for Loci. Yeah, I have a firm grasp on the theory but in practice, I know that I have a lot of difficulty separating Front-End (ie. Loci proper) and Middleware (ie. plug-ins to Loci). I apologize about that--I know that some of my thoughts probably reflect my inability to separate these components. I'm working at it! Gary Van Domselaar wrote: >> WHAX (Warehouse Architechture for XML) >> -------------------------------------- >The URL for this document is >http://db.cis.upenn.edu/cgi-bin/Person.perl?susan > >The document title is: Efficient View Maintenance in XML Data Warehouses Thanks, I meant to include that info! Gary Van Domselaar wrote: >> 1. The data that comes in as a document (for instance, a set of >> sequences in FASTA format). These are the input files provided by >> the user. > >Or retrieved from a database query, or output by an analysis program. Right-o! Gary Van Domselaar wrote: >> >> 2. The actual setup of a workflow diagram--the underlying structure of the >> diagram (how all of the loci are connected together). This is supplied by >> the user in the workflow diagram by connecting all of the dots together and >> constructing the command-lines (in the words of Jeff!). > >This is my understanding as well, although the WFD will be constructed >via a graphical shell, which has a 'thin interface' to the middleware. >When you say 'constructing the command-lines', do you mean 'generating >the interface to the middleware'? 
What I think this refers to is generating a command-line for a program by using a GUI to input all of the switches. For instance, if I were using program foo that used a -l switch to specify a log file, I would use the Loci interface to generate the equivalent of 'foo -l /var/mylogfile.' My thinking was that 'the interface to the middleware' would be worked out during the programming of the plug-in to work with Loci. For instance, to get Loci to use my sequence viewer program, I would have to tell it by writing the plug-in: 1. What kind of file the program needs (ie. PDB, FASTA, etc) 2. How to work the program (ie. the command line stuff: the switches it takes, etc) Loci would then take this info and have a GUI for 'constructing the command line' (getting the switches set up) and do error checking to make sure the user supplies the right file for the program. At least, this is my current understanding of how stuff would work. Gary Van Domselaar wrote: >We still need to formalize the interface to the command-line-run >backend apps, but this sounds about right to me. > >The OMG LSR ( http://www.omg.org/homepages/lsr/) Biomolecular Sequence >Analysis working group has a nearly complete RFP >(http://www.omg.org/techprocess/meetings/schedule/Biomolecular_Sequ._Analysis_R >FP.html) >for sequences and their alignment and annotation. Loci plans to adopt >their CORBA IDL for passing biomolecular sequence objects to >CORBA-compliant backend apps. This RFP has 'XML extensions' for future >compatibility, btw. Thanks--I'll take a look at it (whenever I am feeling up to looking at a huge document with half the lines crossed out!). I just came up with that "interface" specification off the top of my head--just wanted to make sure I was on the right track. Gary Van Domselaar wrote: >I think this is appropriate only for Loci's own internal data >requirements, but violates Loci's 'laissez-faire' paradigm for operating >on 'exogenous' data. Jeff explained to me best when he said that Loci >should be like the Bash shell: the bash shell has redirection operators >and pipes, which you can combine to do some fairly sophisticated data >processing, for example: > >bash$ cat /var/adm/messages | grep "root" > /tmp/root.txt > >Here bash will pipe the contents of /var/adm/messages to grep, which >will extract all the lines containing the word 'root' and place them in >the /tmp/root.txt file. Bash itself cares not about the contents of >/var/adm/messages, doesn't reformat it, doesn't store it in an >intermediate database, then re-extract it from the database, reformat it >once again, and finally pump out the /tmp/root.txt file according to >some xml dtd. Neither should Loci, in its most abstracted form. I really like the idea of piping! You (and Jeff) are right, there is no reason to stick stuff in a database if you could just pipe it around. However, I have a couple of practical questions for using a piping approach like this: 1. If you have data from a number of sources in a bunch of different formats, how would you get them together to pipe them into a program that would require them all in one text document in, say, FASTA format? Would you have to run each of them through a converter to get them in a common format, then pipe them all into a processor that would stick them into a single file? 2. Conversely, what if you had a huge document and wanted to break it up into smaller documents?
For example, what if you had a swiss-prot file and wanted to get just the protein sequences for all Zea mays (corn) accessions--how would this be done? 3. How could individual parts of the data be queried or reordered? For instance, if I wanted to separate all sequences with a particular motif out of a file and then reorder them by organism. 4. What about doing things like generating GUIs on the fly, as Jeff talked about in the 'constructing the command line' mail? He mentioned getting a pyGTK GUI directly from a Glade output XML document in this case, but similarly, what if we wanted to put the output into a web browser? Would we convert the file to XML, then process it into HTML/GladeXML and then output it? These are just a few concerns I thought up for discussion regarding the piping system you described. I really like the idea, and think it would be more straightforward to do, but my only concern is how well it would scale as operations got more complicated. I guess I have been thinking of Loci more as a graphical scripting language, which I imagine having a lot more options than just a redirection shell. Gary Van Domselaar wrote: >Instead, the data conversions and XML operations should be the modular >extensions to Loci that we provide as valuable options for the end-user, >so that Loci becomes not just a graphical 'bash', but a sophisticated >distributed data processing system. Not that a graphical bash wouldn't >be nice: the gnome dudes have talked about using Loci's graphical shell >to do just that! Bottom line: maximum abstraction + maximum >modularization = maximum flexibility = maximum power! You are absolutely right! The best way to combine the piping backbone with the scripting extensions would be to use a pluggable database type option (the container) within the pipeline as I was mentioning before. There I was thinking more in the context of a relational database for long term storage but now I am thinking more in terms of an XML type database for short term storage for Loci's internal data requirements. Alright, yet another separation between Front-end and Middleware! Sorry that I did not grasp this sooner! So, how does this new paradigm for storage sound?: 1. Front-end: No storage capabilities of its own. Used to organize the connections to the middleware and pass data around. 2. Middleware--2 storage options: a. Provide option for XML storage of an "internal XML format." If a user has a need for more complicated data-handling (as I described in my questions above), they can utilize this option to place things in an internal XML database and then use the XML warehouse kind of stuff I described in point 3 in my last e-mail. b. Provide an option for permanent storage with relational databases (ie. MySQL, PostgreSQL, Sybase ...), so that the data can be available after Loci has quit. The middleware would handle the connections between the Loci front-end, which asks for a database or internal format, and the back-end, which provides it. 3. Back-end: All of the databases themselves. If this sounds like a plan, then I would like to humbly propose an immediate development focus: Get the piping stuff working with the Loci front-end so that we can do something like the following: 1. Input a sequence in FASTA format 2. Convert it to a new format 3. View it in a sequence viewer. This type of activity would not require any storage options, so this would simplify things.
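To make that short-term goal concrete, here is a minimal Python sketch of the three-step pipeline, including the 'constructing the command line' step of turning a switch description into an actual command. Everything in it is illustrative only: 'seqconvert' and 'seqview' are placeholder program names, and the little switch dictionary is not Loci's real plug-in interface.

import subprocess

# A toy 'plug-in' description: what file type a program expects and which
# switches it takes.  A real Loci wrapper would carry much more than this.
CONVERTER = {
    "program": "seqconvert",          # hypothetical converter program
    "input_format": "FASTA",
    "switches": {"-informat": "fasta", "-outformat": "genbank"},
}
VIEWER = {
    "program": "seqview",             # hypothetical sequence viewer
    "input_format": "GenBank",
    "switches": {},
}

def build_command(spec, extra_args=()):
    # Flatten the switch dictionary into an argument list, the way a
    # 'construct the command line' GUI might after the user sets the switches.
    cmd = [spec["program"]]
    for flag, value in spec["switches"].items():
        cmd.extend([flag, value])
    cmd.extend(extra_args)
    return cmd

def run_pipeline(fasta_path):
    # Pipe the converter's stdout straight into the viewer: no intermediate
    # database and no internal reformatting, just like the bash analogy.
    convert = subprocess.Popen(build_command(CONVERTER, [fasta_path]),
                               stdout=subprocess.PIPE)
    view = subprocess.Popen(build_command(VIEWER), stdin=convert.stdout)
    convert.stdout.close()
    view.wait()

run_pipeline("example.fasta")

The only point of the sketch is that the front-end never touches the data itself; it just builds the command lines and wires stdout to stdin.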
In addition, Jeff has the GUI set-up to make the connections, so we are currently able to construct this kind of workflow diagram. I think reaching this kind of short term goal would be extremely exciting as Loci would actually "do" something and would provide us with a base for further development. How does this sound? Anyone for this? Hip-hip-hooray? Booooo? Whatta you think? Well, if you are to the end again, thank you very much! I would love to hear comments, etc. Also, I hope I don't step on any toes by making a development direction suggestion. I just want to get an idea of the short and long term goals of Loci and kind of find my place somewhere in there so I can have Loci working for my thesis project needs. Thanks again for listening! Brad From bizzaro at geoserve.net Tue Dec 7 09:26:00 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] WHAX and Loci storage ideas References: Message-ID: <384D18F8.76F10B4B@geoserve.net> Hey Brad! Having read through most of your message at this point, I want to first rehash a couple issues about the use of an 'internal format' or 'database': (1) We have to distinguish between 'data to be processed' and 'workflow data'. My objection to a _required_ internal format or database, is for data to be processed, NOT workflow data. Of course we need our own system of handling workflow data, and as Brad suggested, they can be kept in a database. (2) As for data to be processed (biological/bioinformatics data), we can come up with our own system, using XML or a database, or whatever. I just don't want to _require_ that every bioinformatics datum be converted to that format, without the user's knowledge. As Brad says, the user is responsible for knowing what to do with the data. (3) Processable data can be encapsulated in the workflow data, providing the format of the processable data is maintained. So, if locus represents a FASTA document, our workflow data should just insert the whole document, unchanged, between some tags: . Or if a database is used for Loci's infrastructure and workflow management, the whole document is kept there. BUT THE DATA WITHIN THE DOCUMENT IS NOT CHANGED BY LOCI: Only the user can make the change, and it is done via 'converter' loci. Brad Chapman wrote: > > To make things simpler in my head, I split the data storage needs of Loci > (according to my, hopefully correct!, model of Loci) into three categories: > > 1. The data that comes in as a document (for instance, a set of > sequences in FASTA format). These are the input files provided by > the user. Okay, in this case we're talking about data to be processed. > 2. The actual setup of a workflow diagram--the underlying structure of the > diagram (how all of the loci are connected together). This is supplied by > the user in the workflow diagram by connecting all of the dots together and > constructing the command-lines (in the words of Jeff!). This is workflow data. We're saying the WFD is a graphical script, which is a _script_ nonetheless, that has to be represented as text (underneath it all) and parsed by an interpreter (of our own invention) during execution. It may be obvious to some here that this is what we're aiming for (a scripting language), but some may be scared off thinking this is an enormous task. I like to think that it is exciting and challenging. > 3. The internal XML warehouse (to use my new WHAX-learned term!). This > would be a subset of the supplied data (1.) 
that is passed from loci to > loci according to the work flow diagram. Jeff describes this very well > (Data Storage Interfaces--June 11) as an XML document that travels from > loci to loci I think you're talking about workflow data here too. I learned that 'travel' is not the best word to use here, because it implies that everything has to be parsed and rewritten (literally moved) between every locus, even if all loci are on the local system. Humberto and Justin have correctly remarked that we want to minimize 'travel' where we can. In the case of all local loci, Loci (the program) should 'know' there is no need to move anything: Everything stays on the local filesystem. And in most cases, the data accompanying any communication between remote loci should _point_ (via URI) to where loci (documents, programs, etc.) lie and not assume the user wants or needs them: The user may already have the locus on his/her local computer. Also, since the remote system may be only the first in a chain/workpath of connected systems, it would be most efficient to have a pointer to any loci, rather than moving the whole thing across some umpteen nodes. IOW, I want the DNA doc on the 13th system I'm connected to. I can either make a direct connection to the 13th server via IP, or I can have the 13th send the doc to the 12th, which sends the doc to the 11th, which sends the doc to the 10th... (Get the picture?) > and changes XML formats (ie. changes to different document > structures according to the specific DTD (document type definition) needed > at that loci). I'm not sure if you're talking about workflow or processable data here. > Each of these points has a specific storage needs, so I have come up with a > separate plan for each of them: > > 1. Input Data: Since the user supplied this data, it is their choice to > determine how they want to deal with it. Amen brother! > If they want to store it as a > backup in a database of some sort, then they can do this through the work > flow diagram. So the data can be stored in a 'plug-in' database (what Gary > and Jeff mentioned to be). This type of interface/data storage component > isn't "essential" to the functioning of Loci, so I will go on to the > essential data storage needs. Something that needs serious thought, however, on the extensions end of this project. > 2. Workflow Data: Loci will need a method to store the user defined > workflow diagram. This diagram includes: 1. the setup of the workflow > diagram (how everything is connected together) 2. The constructed command > line for each program 3. more???. This is the kind of storage need I was > thinking about when I wrote my incoherent message a couple of days ago > about trees and graphs. Basically, my thinking is that we can stick all of > the information from a workflow diagram into a data stucture, and then move > through this structure in the specified order to execute the contents of > the workflow diagram. My new data structure of choice is a flow network > (still from Intro Algorithms). 
Basically I think each element of network > would have a setup kind of like the following pseudo-code: > > data-structure loci: > array[pointers] TheNextLoci #pointers to the loci which come next in > #the flow diagram > string Type # The loci type > string IOName #the program or document represented by the loci > tuple CommandLine #all of the command line arguments > pointer XMLDocument #the info being processed > pointer DTD #the document definition for the particular loci > pointer ActionInstructions #a document with what to do at that loci There is some talk about the format of 'workflow data' in the mail archives. There were even thoughts that workflow and processable data could be mixed...which gets back to a required internal data format. > Of course, this would require each loci to setup a DTD type file that has > the specifications to create a document for the particular program (I talk > more about how I think this would work in point 3. below) and also an > ActionInstruction to determine what to do at that loci (ie. display a pdb > file in RasMol, align sequences from the XML document etc.). Hmmm. > My mental image is that the XML document would move into a > particular locus, be converted to the DTD required for that particular > locus, and then processed according to the specifications of the program at > that locus. I imagine the setup of the DTD and action instructions would be > part of the plug-in process for each program that needs to read a document > into or get info from the workflow diagram. Oh okay, you're talking about wrapping programs not designed for Loci, to be used in Loci: workflow data. As I think you're suggesting, the same wrapping system should be used for all loci, whether they be data or programs. To a large extent, _something_ has to accompany each locus. > 3. Internal XML warehouse: My thoughts on this on pretty directly based off > the WHAX paper. Here is kind of what I imagine happening with a document > that comes into Loci. First the document will be converted into XML format > based on the DTD of the locus (ie. the type of data in the document). This > XML document will then be put into an XML database (Note: This is kind of > what I was thinking before--have a database to store info instead of a > specific internal format.) I'm not sure what you mean by 'document'. I usually use that word for processable data, but I think you're referring to workflow data. > Then, as you progress through the work-flow > diagram, each loci will create an XML warehouse from the XML database based > on the DTD requirements of the particular loci. So what I am thinking is > that we can use the WHAX system to maintain an XML document that has all of > the info needed for a particular locus. For instance, if we come to a > processor that requires sequences in the database in FASTA format, we can > pull out the sequences and other required info from the database and update > the XML warehouse to have this info. So we would maintain a view of the > data available in the database and update it for the needs of a locus. > Okay, I should stop talking about this point before I get any more > confusing! I think I may need some hand-holding on this. > More ranting > ---------------------- > > Basically, I am proposing a plan whereby we eliminate a specific internal > storage format and essentially put everything into a database. 
Of course, > this type of plan "requires" a database, and here I was thinking that we > could use dbXML (http://www.dbXML.org), mentioned by Jeff in the archives. I'm still not sure if you're suggesting that all processable (bioinformatics) data be broken up and converted into XML tags. > The database is under a BSD-style license (which I think is compatible with > the LGPL) It is. BSD allows proprietary/closed-source derivatives of your program, which I don't like. But it's not our program anyway. Providing we can ship it with Loci, that's all that matters to us. > and although it still doesn't "do" anything yet, it is under > current development (most recent tarball = November 27th) and we could try > and coordinate development with Tom Bradford, the developer there. Justin had some of his own ideas for an XML database, which he mentions on this list. He didn't give any details, so it's not worth searching for. But I thought you should know. Of course, our own would be LGPL'd. > He is > developing it in C++ with a CORBA interface (he is using ORBacus as his > ORB), so ultimately the database could also be pluggable (you could use any > XML storage database), which fits in well with the Loci schema. We could use it until something better (uses ORBit, Python, LGPL) comes along. > The reason that I think this kind of plan is better than an internal format > is that it gives us a lot of flexibility to input any kind of information, > as Jennifer was talking about. For instance, say we had a program to plug > in that uses specific animal descriptors to build an evolutionary tree. So > you might have data for an anteater in the input file like: > > Sharp and Pointy > Long > Really Long > > (Okay, so I don't know anything about anteaters! Sorry!). With an internal > data format, we could have to define a new DTD to include these three > elements but with a database format, I don't think this would be necessary. I would consider this for a plug-in database and not mix processable data with workflow data. So, are we looking at parallel (two, interconnected) databases? If someone wants to use Loci for, say physics, would this be a problem? Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From bizzaro at geoserve.net Tue Dec 7 09:51:02 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] WHAX and Loci storage ideas References: <384B3BD8.6B557542@redpoll.pharmacy.ualberta.ca> Message-ID: <384D1ED6.1C18564F@geoserve.net> Gary Van Domselaar wrote: > > It good to know that someone is thinking about data storage issues for > Loci. ...cuz we can't rely on Jeff ;-) > 'middleware'. The database used to store the individual loci contained > within a 'container locus' would be another example. Interesting point. A database for a locus's workflow data is middleware, but a database for a locus's processable data is back-endware. > mention of data-type: Loci can work for physicists as well as it can > for bioinformaticists, but we are all bioinformaticists here, so we And Brad, this is why you can give an example of ant-eater physiology. If any one of us designed Loci for ourselves, the audience would be very small. Even within the scope of bioinformatics, it would be limited. 
> Although I'm not the absolute authority on Loci's architecture, No one is ;-) > On the other hand, what if we planned to do our entire thesis project > based upon the information kept in that 2 Terabyte file? Would we want > to retrieve it from the NCBI database every time we wanted to do an > analysis on it, especially if we wanted only to search a small segment > of it? No way! We would want to have that file stored in a fashion > wherein we could easily extract only the parts that we are interested in > performing an analysis on. This is where Loci's ability to store > sequence data in a database becomes important. Every time Loci 'points' to a locus (see my last message), the user should have the option to download the whole thing. If remote_locus_1 is a processor and remote_locus_2 is the data, and they both reside on the same remote computer, NOTHING should be passed back to the user but the results of the process. This is why we use pointers (URIs - not C pointers): low bandwidth usage, convenience. But if the user really wants remote_locus_2 on his/her computer, he/she should be able to 'get it'. I haven't thought about how the user interface for this would work. > The OMG LSR ( http://www.omg.org/homepages/lsr/) Biomolecular Sequence > Analysis working group has a nearly complete RFP > (http://www.omg.org/techprocess/meetings/schedule/Biomolecular_Sequ._Analysis_RFP.html) > for sequences and their alignment and annotation. Loci plans to adopt > their CORBA IDL for passing biomolecular sequence objects to > CORBA-compliant backend apps. This RFP has 'XML extensions' for future > compatibility, btw. Right, and AppLab and some others have adopted the RFP. > My understanding is that Loci will come with 'data translators' > (middleware) that will be placed between a document / database to > accommodate the formatting requirements of the analysis program that will > operate on the document. Again, it depends on whether Brad was talking about workflow data or processable data. > I think this is appropriate only for Loci's own internal data > requirements, but violates Loci's 'laissez-faire' paradigm for operating > on 'exogenous' data. Jeff explained to me best when he said that Loci > should be like the Bash shell: the bash shell has redirection operators > and pipes, which you can combine to do some fairly sophisticated data > processing, for example: > > bash$ cat /var/adm/messages | grep "root" > /tmp/root.txt > > Here bash will pipe the contents of /var/adm/messages to grep, which > will extract all the lines containing the word 'root' and place them in > the /tmp/root.txt file. Bash itself cares not about the contents of > /var/adm/messages, doesn't reformat it, doesn't store it in an > intermediate database, then re-extract it from the database, reformat it > once again, and finally pump out the /tmp/root.txt file according to > some xml dtd. Neither should Loci, in its most abstracted form. > Instead, the data conversions and XML operations should be the modular > extensions to Loci that we provide as valuable options for the end-user, > so that Loci becomes not just a graphical 'bash', but a sophisticated > distributed data processing system. You said it so well! > Not that a graphical bash wouldn't > be nice: the gnome dudes have talked about using Loci's graphical shell > to do just that! Bottom line: maximum abstraction + maximum > modularization = maximum flexibility = maximum power! So, Loci is more like a graphical bash + some nifty programs to go with it.
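Going back to the 'pointers (URIs), not copies' point above, a rough Python sketch of the idea follows; the class name, the example URI, and the fetch-on-demand behavior are purely illustrative and not how Loci actually handles loci.

import urllib.request

class LocusRef:
    # A lightweight pointer to a remote locus (document, program, result...).
    # Handing the reference around costs almost nothing; the data only
    # crosses the network if somebody explicitly asks for it.
    def __init__(self, uri):
        self.uri = uri                # e.g. "http://example.org/loci/dna.fasta"
        self._cached = None

    def get(self):
        # Download the locus only on explicit request, then keep it locally.
        if self._cached is None:
            with urllib.request.urlopen(self.uri) as handle:
                self._cached = handle.read()
        return self._cached

ref = LocusRef("http://example.org/loci/dna.fasta")   # cheap to pass along a chain
# data = ref.get()   # only this call would actually move the document

In the chain-of-servers scenario from the last message, only the small reference needs to hop from machine to machine; whoever finally needs the document can fetch it once, directly.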
Regarding processable data format conversions, a bash command might work like this: echo data.fasta | fasta2xml | bioxmlview.py Does bash need to know ANYTHING about biological data??? Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From bizzaro at geoserve.net Tue Dec 7 10:44:55 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] WHAX and Loci storage ideas References: Message-ID: <384D2B77.36B555C4@geoserve.net> Brad Chapman wrote: > > >This is my understanding as well, although the WFD will be constructed > >via a graphical shell, which has a 'thin interface' to the middleware. > >When you say 'constructing the command-lines', do you mean 'generating > >the interface to the middleware'? > > What I think this refers to is generating a command-line for a program by > using a GUI to input all of the switches. For instance, if I were using > program foo that used a -l switch to specify a log file, I would use the > Loci interface to generate the equivalent of 'foo -l /var/mylogfile.' That's exactly right, and applies pretty much to generating commands for command-line applications. I would like as much as possible for other interface constructions to work in a similar fashion. The idea is this: LOCI IS ITS OWN SOFTWARE DEVELOPMENT KIT (SDK). If you think about it, most programming and GUI building paradigms use tree and workflow models. If we can carry these over to Loci, you have a capable and flexible development environment TOO! > My > thinking was that 'the interface to the middleware' would be worked out > during the programming of the plug-in to work with Loci. For instance, to > get Loci to use my sequence viewer program, I would have to tell it by > writing the plug-in: > > 1. What kind of file the program needs (ie. PDB, FASTA, etc) > 2. How to work the program (ie. the command line stuff: the switches it > takes, etc) > > Loci would then take this info and have a GUI for 'constructing the command > line' (getting the switches set up) and do error checking to make sure the > user supplies the right file for the program. > At least, this is my current understanding of how stuff would work. It sounds about right to me. Later, we'll need some people thinking about how to add these features to the Loci 'SDK'. > I really like the idea of piping! You (and Jeff) are right, there is no > reason to stick stuff in a database if you could just pipe it around. > However, I have a couple of practical questions for using a piping approach > like this: > > 1. If you have data from a number of sources in a bunch of different > formats, how would you get them together to pipe them into a program that > would require them all in one text document in, say, FASTA format? Would > you have to run each of them through a converter to get them in a common > format, then pipe them all into a processor that would stick them into a > single file? I think you hit the nail on the head. > 2. Conversely, what if you had a huge document and wanted to break it up > into smaller documents? For example, what if you had a swiss-prot file and > wanted to get just the protein sequences for all Zea mays (corn) > accessions--how would this be done? You'd need a processor (or database query) to do this.
It'd be better to have a more general-purpose processor (can handle extracting all sorts of data) than a special purpose one. And (if we make our own) the processor should work from one 'good' data format, leaving translation from swiss-prot to a converter locus. Let's say the 'good' data format is 'HumbertoXML' ;-)

swiss-prot          swiss-prot           breaker-          Zea mays
document    ---->   to HXML      ---->   upper     ---->   sequences
                    converter                              in HXML

> 3. How could individual parts of the data be queried or reordered? For > instance, if I wanted to separate all sequences with a particular motif out > of a file and then reorder them by organism. If this stuff was databased first, you could use a more sophisticated query system than above. So, you may want to pipe your data into a database to start. > 4. What about doing things like generating GUIs on the fly, as Jeff talked > about in the 'constructing the command line' mail? He mentioned getting a > pyGTK GUI directly from a Glade output XML document in this case, but > similarly, what if we wanted to put the output into a web browser? Would we > convert the file to XML, then process it into HTML/GladeXML and then output > it? Web output of Loci interfaces is a tricky problem, and the whole Web interface project is the biggest sub-project to Loci. I can think of some ways to make simple and limited Web interfaces, but just like you cannot get MS Word to run via HTML browser, many Loci interfaces cannot be run this way. This is why people made Java applets, etc. What I am hoping to be able to do is convert diagrams or illustrations (for example, protein motifs) made by Loci into JPGs for Web display. I'm trying to be realistic about this part of Loci. > These are just a few concerns I thought up for discussion regarding the > piping system you described. I really like the idea, and think it would be > more straightforward to do, but my only concern is how well it would > scale as operations got more complicated. I guess I have been thinking of > Loci more as a graphical scripting language, which I imagine having a lot > more options than just a redirection shell. Alright, as far as scripting languages are concerned, Loci is very limited. But I'd like to think of it as being 'high-level' or a '4GL' (fourth generation language). And I think that keeping Loci agnostic of data type does not diminish its capabilities. How can one turn a redirection shell into a scripting language? As long as we're looking at bash as an analogy, we can consider SHELL SCRIPTING, which is really just a more structured command-line. > 2. Middleware--2 storage options: > a. Provide option for XML storage of an "internal XML format." If a user > has a need for more complicated data-handling (as I described in my > questions above), they can utilize this option to place things in an > internal XML database and then use the XML warehouse kind of stuff I > described in point 3 in my last e-mail. > b. Provide an option for permanent storage with relational databases (ie. > MySQL, PostgreSQL, Sybase ...), so that the data can be available after > Loci has quit. > > The middleware would handle the connections between the Loci front-end, > which asks for a database or internal format, and the back-end, which > provides it. I think you're suggesting a generic XML database as an 'internal database', which can handle any processable data that are marked up. I like the idea, providing it is very generic. But included in this list of middleware should be the mechanism (database?)
for knowing what locus is connected to what...basically handling all of the workflow data...and a parser/interpreter You mentioned this before. > 3. Back-end: All of the databases themselves. All the programs, data, converters...everything called a 'locus'. > If this sounds like a plan, then I would like to humbly propose an > immediate development focus: Get the piping stuff working with the Loci > front-end so that we can do something like the following: 1. Input a > sequence in FASTA format 2. Convert it to a new format 3. View it in a > sequence viewer. This type of activity would not require any storage > options, so this would simplify things. In addition, Jeff has the GUI > set-up to make the connections, so we are currently able to construct this > kind of workflow diagram. I think reaching this kind of short term goal > would be extremely exciting as Loci would actually "do" something and would > provide us with a base for further development. How does this sound? Anyone > for this? Hip-hip-hooray? Booooo? Whatta you think? As a focus or goal, this sounds good. It doesn't say how we'll get there. But I never mentioned what the simplest senario for running Loci should be. > Also, I hope I don't step on any toes by making > a development direction suggestion. I just want to get an idea of the short > and long term goals of Loci and kind of find my place somewhere in there so > I can have Loci working for my thesis project needs. No problem. I still owe you a TODO list. I worked on one, and I will pass it to Gary for some comments before making it official. If anyone else wants to see the unofficial version so that they can comment on it, mail me directly. Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From gvd at redpoll.pharmacy.ualberta.ca Wed Dec 8 10:48:47 1999 From: gvd at redpoll.pharmacy.ualberta.ca (Gary Van Domselaar) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] Conglomerate References: <384C0A1B.9F24D3A5@geoserve.net> Message-ID: <384E7DDF.F5CD8064@redpoll.pharmacy.ualberta.ca> "J.W. Bizzaro" wrote: > > This may be interesting to us in more than one way: > > http://www.conglomerate.org/ "Conglomerate" is a structured document authoring application with an intuitive interface. I haven't downloaded a copy, but this does look like a viable alternative to docbook. I'll check it out soon, and would like to have the honourable Dr. Lapointe take a look at it as well. The issues for me, in terms of using conglomerate for writing documentation is that their software is new and probably not widely adopted. Their own documentation is sparse, which is ironic considering that they are writing a structured document development interface. Their general news web archive is only month old, and their development web archive link is broken. On the other hnad, it does use XML, can produce multiple output (HTML, TeX) from a single source, uses XML, and provides a very nice interface for authoring structured docs. 
-- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Gary Van Domselaar gvd@redpoll.pharmacy.ualberta.ca Faculty of Pharmacy Phone: (780) 492-4493 University of Alberta FAX: (780) 492-5305 Edmonton, Alberta, Canada http://redpoll.pharmacy.ualberta.ca/~gvd From toneman at phil.uu.nl Wed Dec 8 11:01:48 1999 From: toneman at phil.uu.nl (Michiel Toneman) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] Conglomerate In-Reply-To: <384E7DDF.F5CD8064@redpoll.pharmacy.ualberta.ca> Message-ID: On Wed, 8 Dec 1999, Gary Van Domselaar wrote: > "J.W. Bizzaro" wrote: > > > > This may be interesting to us in more than one way: > > > > http://www.conglomerate.org/ > > "Conglomerate" is a structured document authoring application with an > intuitive interface. > > I haven't downloaded a copy, but this does look like a viable > alternative to docbook. I'll check it out soon, and would like to have > the honourable Dr. Lapointe take a look at it as well. > > The issues for me, in terms of using conglomerate for writing > documentation is that their software is new and probably not widely > adopted. Their own documentation is sparse, which is ironic considering > that they are writing a structured document development interface. > Their general news web archive is only month old, and their development > web archive link is broken. On the other hnad, it does use XML, can > produce multiple output (HTML, TeX) from a single source, uses XML, and > provides a very nice interface for authoring structured docs. > > I think you will see much progress on Conglomerate, because when it was announced on the Gnome Notices (Gnotices, see http://www.gnome.org/) it got a warm welcome. I think there is much motivation to make this a "killer app" for gnome. Greetings, Michiel Toneman -- I wish there was a knob on the TV to turn up the intelligence. There's a knob called "brightness", but it doesn't work. From dlapointe at mediaone.net Wed Dec 8 22:04:42 1999 From: dlapointe at mediaone.net (David Lapointe) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] Conglomerate In-Reply-To: <384C0A1B.9F24D3A5@geoserve.net> References: <384C0A1B.9F24D3A5@geoserve.net> Message-ID: <99120822122500.00658@gnomen> On Mon, 06 Dec 1999, J.W. Bizzaro wrote: > This may be interesting to us in more than one way: > > http://www.conglomerate.org/ That's a very interesting application. It would ( or anything for that matter ) be great if it could read DTD's and generate the proper tags ( withing the hierarchy ) automagically. However, I believe we have to wait for the availablity of conglomerate. -- .david David Lapointe "There are two things that are infinite; Human stupidity and the universe. And I'm not sure about the universe." - Albert Einstein From bizzaro at geoserve.net Thu Dec 9 16:07:21 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] Directory-XML Message-ID: <38501A09.6D91978C@geoserve.net> DSML may be useful for Loci's directory service ('hub'): http://www.internetwk.com/story/INW19991207S0007 BTW, I sent Gary the TODO list for review. Also, fixes to the Workspace bugs, found by Brad, have been added to the CVS. Cheers. Jeff From bizzaro at geoserve.net Thu Dec 16 19:40:47 1999 From: bizzaro at geoserve.net (J.W. 
Bizzaro) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] Gnome stuff Message-ID: <3859868F.E4E3FCA2@geoserve.net> Locians, If you're tracking Gnome development (as I am), you might find this interview with the developers very interesting: http://news.gnome.org/gnome-news/945331082/index_html Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From gvd at redpoll.pharmacy.ualberta.ca Thu Dec 16 20:10:20 1999 From: gvd at redpoll.pharmacy.ualberta.ca (Gary Van Domselaar) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] Gnome stuff In-Reply-To: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec 17, 99 00:40:47 am Message-ID: <199912170110.SAA25755@redpoll.pharmacy.ualberta.ca> > > Locians, > > If you're tracking Gnome development (as I am), you might find this interview > with the developers very interesting: > > http://news.gnome.org/gnome-news/945331082/index_html Jeff, Thanks for pointing this interview out. There is a lot of discussion relevant to the Loci project, considering our heavy reliance on the GNOME application development environment. Most relevant to me are the discussions related to Conglomerate, the structured text that we reviewed recently. It looks like the GNOME team wants very much to use Conglomerate as the front end to write DocBook documents. If we can use Conglomerate to write our DocBook documents, then we _definitely_ should. I'm gonna give it a try right away. I strongly suggest all Locians review this interview. From dlapointe at mediaone.net Thu Dec 16 20:28:54 1999 From: dlapointe at mediaone.net (David Lapointe) Date: Fri Feb 10 19:19:01 2006 Subject: [Pipet Devel] Re: New Python Book Message-ID: <99121620304000.00623@gnomen> The Beasley book is very impressive. It's nicely organized as a reference book, with examples. -- .david David Lapointe It is good to have an end to journey toward; but it is the journey that matters, in the end.--Ursula K. Le Guin From bizzaro at geoserve.net Thu Dec 16 21:32:49 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] on a personal note Message-ID: <3859A0D1.7E102232@geoserve.net> Greetings fellow Lab Rats! I would like to announce that I have earned my "Master of Science in Chemistry/Biochemistry" degree from Boston College. This comes with my completion of the degree requirements this fall semester. My future plans are centered entirely around this organization (The Open Lab) and The Loci Project. I will be entering the Doctoral Biochemistry Program at the University of Massachusetts Lowell, where operations have been based since the organization's inception. Yes, this is where I earned my undergraduate degree in chemistry, but it is most important to me that I am able to continue my work with The Open Lab, and I believe I would not have that luxury anywhere else. Bigger and better: I am working with our advisors (Ken Marx, Rob Harrison and David Lapointe) and administrators (Gary Van Domselaar, Pete St. Onge and Mark Luo) to expand and improve services at The Open Lab. 
Each expansion or improvement will be announced as it is made, but I can give you a hint as to our plans: (1) donation and purchase of more computing hardware, including some Big Iron to serve Loci applications; (2) change of name to include "BIOINFORMATICS.ORG", which we will soon be able to use as a domain; (3) continued development of the bioinformatics portal, which started with the "Bioinformatics GNU's" news list; and (4) grant awards and corporate sponsorship. I would like to thank the advisors, administrators, project coordinators (Justin Bradford, SooHaeng Yoo, Thomas Sicheritz, Rick Ree and Carlos Maltzahn), and everyone else, ALL volunteers of their time and effort to make this organization successful. I think we have a bright future ahead! Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From bizzaro at geoserve.net Thu Dec 16 23:04:20 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] TODO! Message-ID: <3859B644.7BBE8A46@geoserve.net> Locians, We finally have a TODO list (attached). There are many projects within Loci and sub-projects within those. So, I listed them all out in outline format with a very brief description under each project. Each project (and most sub-projects) needs a 'project leader', and some leaders are named BY MY ASSUMPTION (please confirm). Where no leaders are identified, you will see a '???'. This is where we need YOUR help! Any suggestions/additions for this list are of course welcome. Jeff -------------- next part -------------- THE LOCI PROJECT; TODO 19991216 LOCI PROJECT LEADER: J.W. Bizzaro ASSISTANT: Gary Van Domselaar I. CORE WORKSPACE PROJECT LEADER: J.W. Bizzaro Part of loci-core. This project covers the entire GUI for Loci and the implementation of GUI extentions. A. GUI construction via XML B. Dynamic menu generation C. Themes D. CORBA integration E. Bonobo integration II. CORE SCRIPTING LANGUAGE (XML-based) PROJECT LEADER: ??? Part of loci-core. Once a graphical script is generated by the user, via the Workspace, it can be executed. The graphical script will therefore need to be represented in text (XML) and executed by an interpreter. A. Language definition B. Interpreter III. CORE DATABASE CONNECTIVITY PROJECT LEADER: Brad Chapman? Part of loci-core. The line is blurred between what is a 'real' database being used by Loci and just about everything else. A. Representation of filesystem as containers B. Representation of databases as containers IV. CORE DIRECTORY SERVICES (formerly called 'hub') PROJECT LEADER: ??? Part of loci-core. Akin to domain name serving, a world-wide registry needs to be made containing what loci are available where. Each copy of Loci will in fact have the ability to contact others to find out what is _pulicly_ available there. All copies of Loci should register their available loci with a central registry too. V. CORE UTILITIES PROJECT LEADER: ??? Part of loci-core. This includes helper applications that are external to Loci. What would be interesting is finding a way to run these as loci. 1. Installation Manager SUB-PROJECT LEADER: ??? 2. User Preferences Configuration SUB-PROJECT LEADER: ??? VI. PYTHON BINDINGS PROJECT LEADER: Justin Bradford Since most/all of Loci's core is written in Python and uses Gnome libraries, several bindings are needed. A. 
GTK/GNOME These already exist, thanks to James Henstridge. B. ORBit C. Bonobo VII. WEB INTERFACE (loci-web) PROJECT LEADER: David Lapointe? This would replace the Workspace and allow a limited number of loci to run via Web browser. VIII. CORE WRAPPERS AND EXTENSIONS PROJECT LEADER: J.W. Bizzaro Part of loci-core. These are basic loci that come with each copy of Loci. A. Locus output to command-line SUB-PROJECT LEADER: J.W. Bizzaro B. Locus input from command-line/stdout SUB-PROJECT LEADER: Thomas Junier? C. Generic XML database SUB-PROJECT LEADER: Brad Chapman? IX. BIOINFORMATICS WRAPPERS AND EXTENSIONS (loci-bio) PROJECT LEADER: ??? These are loci for basic bioinformatics research. A. Bioinformatics XML and Converters ('internal format') SUB-PROJECT LEADER: Humberto Otiz Zuazaga? B. Misc. Converters SUB-PROJECT LEADER: ??? 1. GenBank to Raw Sequence X. EMBOSS WRAPPERS (loci-emboss) PROJECT LEADER: David Lapointe These are loci for running EMBOSS under Loci. XI. DOCUMENTATION PROJECT LEADER: Gary Van Domselaar ASSISTANT: David Lapointe? From chapmanb at arches.uga.edu Sat Dec 18 01:29:09 1999 From: chapmanb at arches.uga.edu (Brad Chapman) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Gnome stuff In-Reply-To: <199912170110.SAA25755@redpoll.pharmacy.ualberta.ca> References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec 17, 99 00:40:47 am Message-ID: Oh great locians; Jeff and Gary--thanks much for mentioning this interview. As I read throught it with the brand new ToDo list in hand, I had a couple of questions about what is relevant/not relevant to Loci: 1. Bonobo: What are everyone's thoughts on the reliance of Loci on Bonobo? It sounds like, if I read the description correctly, Bonobo implements a wrapper around CORBA which allows linking of multiple objects or, to quote from the interview: 'Think "multi-directional pipes".' Is the plan for implementing Loci to make it a wrapper around Bonobo so that we have: (((Loci ((Bonobo (CORBA/ORBit) Bonobo)) Loci))) or rather, a wrapper around a wrapper around ORBit? I guess this falls into ID. and IE. in the ToDo outline: CORBA and Bonobo integration (well, and also VIC. python bindings for bonobo!) 2. GConf: This is described in the interview as "an API for storing configuration data...for now just XML text files." Is this something that can be utilized for storing the core scripting language described in II. of the ToDo? 3. The as-yet-unamed replacement for the GMC file manager: According to the interview this new manager "..is designed to be able to plug in Bonobo components so that you can install viewers for different types of files or different file systems altogether." Is this something that we should investigate for representing the Loci file system (ie. IIIA.) or am I totally off in thinking it does a simiar thing to what we need for managing files? Is there more stuff in there that could be useful to us? How about other gnome stuff that I haven't mentioned here? I guess I am not completely clear on how much Loci will be integrated into the GNOME project so if anyone could "throw me a friggin' bone" on this, I would be quite appreciative! Along these lines, if we are going to be using a lot of gnome libraries/programs, do you think it would be worthwhile to keep a listing of "useful gnome stuff" or something along those lines, to make it easier to dig into the gnome api's? 
Also, maybe this way we could divide up the process of understanding different parts of gnome and thus making the learning curve for diving into it a little less steep... (at least for me!) Brad From chapmanb at arches.uga.edu Sat Dec 18 01:58:34 1999 From: chapmanb at arches.uga.edu (Brad Chapman) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Random thoughts on application wrapping Message-ID: Hello all! I was messing around with CORBA and trying to make myself a little piped program that called an already existing program when I got hard into thinking about how to wrap applications so that they can be run within the python framework of Loci. So I was just wondering, what is the proposed mechanism for taking an existing program (say the dnacomp program of phylip, written in c) and allowing it to be called from a loci script. I could come up with two possible ways to do this: 1. The Applab way: Applab, a java application wrapper for CORBA (which has been mentioned on the list several times) does the following to incorporate a program: a. has an IDL interface for controlling and running outside apps (ie. our dnacomp program) b. requires the construction of a meta-data file describing the interface. c. parses (using a perl script) the meta-data file into java code which fits into the server side implementation and wraps the program. 2. The other way I could think of: This way would be to generate a wrapper for each individual program based on its language and the methods that are avaiable to do that. For instance: a. we could wrap C and C++ programs using SWIG (http://www.swig.org) b. we could deal with Java programs by using JPython to input their classes and then do scripting between them. c. we could deal with Perl/Tcl programs by using Minotaur (http://mini.net/pub/ts2/minotaur.html) to imbed perl and tcl scripts into python classes and then run them from there. Either way has pluses and minuses. I think the first way is nice because it allows a consistent method to "port" a program to Loci. However, unless we decided to use the applab language and/or parser, we would have to describe our own input language and then design a parser to deal with it. The second way uses already exciting programs, but it a lot uglier because it is different for every app ported and thus makes the porting process very difficult for a non-programming-user of Loci. Is any of these two ways I mentioned about something anyone else was thinking for wrapping programs so they can run in Loci? Or have I completely forgotten an obvious way. Thanks in advance for any help anyone can provide on this dillemna of mine! Brad From bizzaro at geoserve.net Sat Dec 18 20:01:40 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Gnome stuff References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec 17, 99 00:40:47 am Message-ID: <385C2E74.CF6F6056@geoserve.net> Brad Chapman wrote: > > 1. Bonobo: What are everyone's thoughts on the reliance of Loci on Bonobo? > It sounds like, if I read the description correctly, Bonobo implements a > wrapper around CORBA which allows linking of multiple objects or, to quote > from the interview: 'Think "multi-directional pipes".' Is the plan for > implementing Loci to make it a wrapper around Bonobo so that we have: > > (((Loci ((Bonobo (CORBA/ORBit) Bonobo)) Loci))) > > or rather, a wrapper around a wrapper around ORBit? I guess this falls into > ID. and IE. 
in the ToDo outline: CORBA and Bonobo integration (well, and > also VIC. python bindings for bonobo!) Well, if you consider the relationship between a program and its library to be the same as a wrapper and its 'wrappee', then yes...sort of. Bonobo is just one of the libraries we're using, and we're using it to... (1) Include non-Python GUIs in the Workspace (2) Include Loci's GUI in other apps, if there is a need to do so > 2. GConf: This is described in the interview as "an API for storing > configuration data...for now just XML text files." Is this something that > can be utilized for storing the core scripting language described in II. of > the ToDo? Hmmmm. I understood GConf to be akin to the Windows Registry. I can't imagine trying to put all of our XML into it, especially if the script XML includes the GUI XML, etc. > 3. The as-yet-unnamed replacement for the GMC file manager: GFM: Gnome File Manager, right? That's what I had seen. > According to the > interview this new manager "..is designed to be able to plug in Bonobo > components so that you can install viewers for different types of files or > different file systems altogether." Is this something that we should > investigate for representing the Loci file system (i.e. IIIA.) or am I > totally off in thinking it does a similar thing to what we need for managing > files? GFM is, disappointingly, a clone of Windows Explorer running with the Active Desktop. So, the viewers in GFM are 'just' giving you a preview/thumbnail of the file in one corner of GFM's window. I never really thought that Loci's 'file system', or the way files and directories are shown, would provide previews or thumbnails. It's an interesting idea that we can pursue later. But for now, I'd like to see each directory on the file system be represented as a 'container locus'. If you double-click on such a container, you get a windowlet just like any other locus. But a container's windowlet is a 'list' widget that lists the contents of the directory:

    +------+
    | cont |  <----- icon
    | ainer|
    +------+

    +---------------+
    | file          |
    | file          |
    | file          |  <------ windowlet
    | container     |
    | file          |
    +---------------+

Since the contents are either files or directories, and these are automatically simple loci, a container is (by definition) a locus that contains other loci. And some of these loci inside of said container are directories, which are again containers, so we have the directory hierarchy represented as loci inside of loci ad infinitum (or containers inside of containers...). For now the 'list' widget just gives the names and icons of the files and directories (loci) held by the container. And you can drag-and-drop loci to-and-from the container's list and the Workspace! We need to start off with a sufficiently high-level container/directory, say the system's root directory (?) > Is there more stuff in there that could be useful to us? How about > other gnome stuff that I haven't mentioned here? I guess I am not > completely clear on how much Loci will be integrated into the GNOME project > so if anyone could "throw me a friggin' bone" on this, I would be quite > appreciative! I wouldn't say that Loci is being integrated into Gnome, although the thought of having Loci serve as the desktop for Gnome has surfaced recently (I think it's unlikely to happen). Rather, Gnome serves as a set of development tools for Loci. And communication with other Gnome applications can be facilitated via CORBA and Bonobo.
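Going back to the directory-as-container idea above, a minimal sketch of how a container locus might build its windowlet listing from a directory, in modern Python and using only the standard library (the function name and the (name, is_container) convention are hypothetical, not part of loci-core):

---------------------------------------------------
import os

def list_container(path):
    """Return the windowlet contents for a directory-backed container.

    Each entry is (name, is_container): a subdirectory is itself a
    container locus, everything else is a simple locus.
    """
    entries = []
    for name in sorted(os.listdir(path)):   # top level only, no recursion
        entries.append((name, os.path.isdir(os.path.join(path, name))))
    return entries

# e.g. build the listing for the user's home directory:
# for name, is_container in list_container(os.path.expanduser('~')):
#     print(name, '(container)' if is_container else '')
---------------------------------------------------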
> Along these lines, if we are going to be using a lot of gnome > libraries/programs, do you think it would be worthwhile to keep a listing > of "useful gnome stuff" or something along those lines, to make it easier > to dig into the gnome api's? Also, maybe this way we could divide up the > process of understanding different parts of gnome and thus making the > learning curve for diving into it a little less steep... (at least for me!) For now, I'm sure we're using... gnome-libs (what gnome-python wraps) bonobo orbit These are 3 distinct packages/parts to Gnome. There is no need to look beyond these, so our use of Gnome is less confusing than you may be thinking. Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From bizzaro at geoserve.net Sat Dec 18 20:24:47 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Random thoughts on application wrapping References: Message-ID: <385C33DF.FF079473@geoserve.net> Brad Chapman wrote: > > I was messing around with CORBA and trying to make myself a little > piped program that called an already existing program when I got hard into > thinking about how to wrap applications so that they can be run within the > python framework of Loci. So I was just wondering, what is the proposed > mechanism for taking an existing program (say the dnacomp program of > phylip, written in c) and allowing it to be called from a loci script. I > could come up with two possible ways to do this: If it runs from the command-line and is non-interactive, the default (although somewhat sloppy perhaps) method is the one I outlined in the message 'constructing the command-line'. Otherwise, we should have a SET of tools available for wrapping proggies FROM THE WORKSPACE. > 1. The Applab way: Applab, a java application wrapper for CORBA (which has > been mentioned on the list several times) does the following to incorporate > a program: > a. has an IDL interface for controlling and running outside apps (ie. > our dnacomp program) > b. requires the construction of a meta-data file describing the interface. > c. parses (using a perl script) the meta-data file into java code which fits > into the server side implementation and wraps the program. For more sophisticated wrappings, we want to use the AppLab approach. Since nothing has really been finalized about just how we will use CORBA, we will simply copy AppLab. SEView, by Thomas Junier who is on this list, will also give us some ideas about converting text output into graphical presentations: http://www.bioinfo.de/isb/1998/01/0003/ > 2. The other way I could think of: This way would be to generate a wrapper > for each individual program based on its language and the methods that are > avaiable to do that. For instance: > a. we could wrap C and C++ programs using SWIG (http://www.swig.org) > b. we could deal with Java programs by using JPython to input their classes > and then do scripting between them. > c. we could deal with Perl/Tcl programs by using Minotaur > (http://mini.net/pub/ts2/minotaur.html) to imbed perl and tcl scripts into > python classes and then run them from there. Again, the person who wraps the program must be able to choose what will work best, and this will mean having many options. For example, David Lapointe will be working on a method to wrap EMBOSS apps. 
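For the simplest case -- a non-interactive command-line program -- the wrapper really is just 'construct the command line, run it, capture the output'. A rough sketch of that idea in modern Python (the function, the options dict, and the example program names are all hypothetical, not part of loci-core):

---------------------------------------------------
import subprocess

def run_wrapped(program, options, input_text=None):
    """Run a command-line program the way a wrapper locus might:
    build the command line from a dict of options, feed stdin,
    and capture stdout for the next locus in the workflow."""
    argv = [program]
    for flag, value in options.items():
        argv.append(flag)
        if value is not None:
            argv.append(str(value))
    result = subprocess.run(argv, input=input_text, capture_output=True,
                            text=True, check=True)
    return result.stdout

# Hypothetical example, wrapping the old NCBI 'blastall' command line:
# output = run_wrapped('blastall', {'-p': 'blastn', '-d': 'nr', '-i': 'query.fa'})
---------------------------------------------------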
It will be generic enough to handle any EMBOSS application but will still be a wrapping solution for EMBOSS alone. > Either way has pluses and minuses. I think the first way is nice because it > allows a consistent method to "port" a program to Loci. However, unless we > decided to use the applab language and/or parser, we would have to describe > our own input language and then design a parser to deal with it. The second > way uses already exciting programs, but it a lot uglier because it is > different for every app ported and thus makes the porting process very > difficult for a non-programming-user of Loci. > Is any of these two ways I mentioned about something anyone else > was thinking for wrapping programs so they can run in Loci? Or have I > completely forgotten an obvious way. Thanks in advance for any help anyone > can provide on this dillemna of mine! Perhaps all of these should be pursued. This is the sort of 'middleware' that doesn't affect the 'front-endware' (Workspace) and is not specific to any one 'back-endware' application. Like almost everthing else in Loci, wrapping solutions are plug-ins/loci. Bottom line: let's start with AppLab's approach and then look into the others a little later. Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From chapmanb at arches.uga.edu Sun Dec 19 03:13:36 1999 From: chapmanb at arches.uga.edu (Brad Chapman) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Gnome stuff In-Reply-To: <385C2E74.CF6F6056@geoserve.net> References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec 17, 99 00:40:47 am Message-ID: > >Well, if you consider the relationship between a program and its library to be >the same as a wrapper and its 'wrappee', then yes...sort of. Bonobo is just >one of the libraries we're using, and we're using it to... > > (1) Include non-python GUIs in the Workspace > (2) Include Loci's GUI in other apps, if there is a need to do so > Aaaaa, gotcha. Sorry, I was thinking of bonobo as being more than it actually was (I was thinking of it as covering CORBA, so that you make all of your interfaces through bonobo, rather than CORBA). I don't consider a program:library relationship to be the same as a wrapper:wrappee relationship at all! So, bonobo is just for embedding a component of one program inside the container of another program. Okay. > >Hmmmm. I understood GConf to be akin to the Windows Registry. I can't >imagine trying to put all of our XML into it, especially if the script XML >includes the GUI XML, etc. > Ack. Windows Registry. Bad! Bad! I'll forget I even mentioned GConf! > >GFM is, disappointingly, a clone of Windows Explorer running with the Active >Desktop. So, the viewers in GFM are 'just' giving you a preview/thumbnail of >the file in one corner of GFM's window. >>I never really thought that Loci's 'file system', or the way files and >directories are shown, would provide previews or thumbnails. It's an >interesting idea that we can pursue later. Okee-dokee. I agree, I don't think it's really necessary to have previews now. >But for now, I'd like to see each >directory on the file system be represented as a 'container locus'. If you >double-click on such a container, you get a windowlet just like any other >locus. 
But a container's windowlet is a 'list' widget that lists the contents >of the directory:
>
>    +------+
>    | cont |  <----- icon
>    | ainer|
>    +------+
>
>    +---------------+
>    | file          |
>    | file          |
>    | file          |  <------ windowlet
>    | container     |
>    | file          |
>    +---------------+
>
Okay--will this be stored directly in the XML representing the workspace graphical script? For instance if we had a container representation like the following:

    +-------+
    | my seq|  <----- icon
    | files |
    +-------+

    +-------------------+
    | gb file 1         |
    | gb file 2         |
    |                   |  <------ windowlet
    | fasta_files       |
    | phylip_files      |
    +-------------------+

where my_seq_files is a directory I store all of my sequence files in, gb files 1 and 2 are just genbank formatted files, and fasta_files and phylip_files are directories with fasta and phylip files, respectively (sorry, I should be thinking up physics examples instead of bioinformatics examples!). Then, if we represent this in the XML script (assuming that this container is located on the main workspace) as:

    loci_root         /usr/local/loci/workspace
        my_seq_files      /usr/home/chapmanb/my_seq_files
            gb_file_1.gb
            gb_file_2.gb
            fasta_files       ...the contents of the directory
            phylip_files      ...the contents of the directory

If /usr/local/loci/workspace is where everything is, analogous to the root directory in a web server, this is where the "Loci filesystem" starts. Then if a user double clicks on the my_seq_files container icon, we would go through the XML to look for my_seq_files and find it at loci_root.my_seq_files, which would by default be located at /usr/local/loci/workspace/my_seq_files. In this example, I figured that the user would probably have their sequence files located in some home directory and not on the loci filesystem (/usr/home/chapmanb/my_seq_files, in this example). So then we have something analogous to a symbolic link, with /usr/local/loci/workspace/my_seq_files -> /usr/home/chapmanb/my_seq_files. So my idea here is that the location of a file or directory would be with respect to the loci_root directory unless there is a tag directly specifying to look elsewhere. Anyways, is this along the lines of what people were thinking for the representation? I haven't really said anything about how to actually generate this kind of XML, but I just wanted to make sure I was on the right track! > >For now the 'list' widget just gives the names and icons of the files and >directories (loci) held by the container. And you can drag-and-drop loci >to-and-from the container's list and Workspace! > Okay, so if in the above example I wanted to move gb_file_1.gb from the container to the main workspace, I would drag it into the workspace and in the real directory structure, the file would move from /usr/home/chapmanb/my_seq_files/gb_file_1.gb to /usr/local/loci/workspace/gb_file_1.gb? Do you want the file to actually move, or just to create a link to the file from inside the /usr/local/loci/workspace directory system? > >For now, I'm sure we're using... > > gnome-libs (what gnome-python wraps) > bonobo > orbit > >These are 3 distinct packages/parts to Gnome. There is no need to look beyond these, so our use of Gnome is less confusing than you may be thinking. Okay, that is what I was originally thinking, then I started to confuse myself by thinking about all of these other gnome programs etc. Thanks for clarifying! Brad From bizzaro at geoserve.net Sun Dec 19 12:12:27 1999 From: bizzaro at geoserve.net (J.W.
Bizzaro) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Gnome stuff References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec 17, 99 00:40:47 am Message-ID: <385D11FB.364D9A9A@geoserve.net> Brad Chapman wrote: > > So, bonobo is just for embedding a component of one > program inside the container of another program. Okay. Right, and I think only GUI components. > Okay--will this be stored directly in the XML representing the workspace > graphical script? For instance if we had a container representation like > the following: [cut] > /usr/local/loci/workspace But since the network is transparent to Loci, all locations have a URI. So, the above location would be something like this:

    locus://localhost/

where

    /usr/local/loci/

is the root directory for Loci. ***JUST USE APACHE AS AN EXAMPLE***

    /home/httpd/html/

is the root directory for Apache Web pages (on RedHat), and this is given the URI/URL

    http://localhost/

Note that Apache can access _local_ directories using the same URL mechanism for _remote_ access. THIS IS HOW LOCI WILL WORK. Perhaps Loci's root should be /home/loci/ (I can see we'll get some arguments about this from BSD users :-))

>     my_seq_files      /usr/home/chapmanb/my_seq_files
>         gb_file_1.gb
>         gb_file_2.gb
>         fasta_files       ...the contents of the directory
>         phylip_files      ...the contents of the directory

Right. When the container locus is made, all of the XML is generated, including that of the windowlet contents, which in the case of a container, is the _directory_ contents. > If /usr/local/loci/workspace is where everything is, analogous to the root > directory in a web server, this is where the "Loci filesystem" starts. You're correct that Loci should not have access to '/'. This would be a security problem, especially when a container can point to (via URI) the filesystem of a remote computer. Maybe we should have 2 branches under Loci's root directory:

    /home/loci/public/
    /home/loci/private/

Remote Loci can then only access what is in the public directory. > Then > if a user double clicks on the my_seq_files container icon, we would go > through the XML to look for my_seq_files and find it at > loci_root.my_seq_files, which would by default be located at > /usr/local/loci/workspace/my_seq_files. Yes. > In this example, I figured that the > user would probably have their sequence files located in some home > directory and not on the loci filesystem (/usr/home/chapmanb/my_seq_files, > in this example). So then we have something analogous to a symbolic link, > with /usr/local/loci/workspace/my_seq_files -> > /usr/home/chapmanb/my_seq_files. So my idea here is that the location of a > file or directory would be with respect to the loci_root directory unless > there is a tag directly specifying to look elsewhere. Good idea. You get a star. Regarding a user accessing things in his home directory, and even making some things public, we can do what Apache does and have

    locus://bradcom.com/~brad/

point to

    /home/brad/loci/

and you would have

    /home/brad/loci/public/
    /home/brad/loci/private/
    /home/brad/loci/workspace/

too. > Anyways, is this along the lines of what people were thinking for > the representation? I haven't really said anything about how to actually > generate this kind of XML, but I just wanted to make sure I was on the > right track! You are correct, sir!
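A rough sketch of the URI-to-path mapping being described, in modern Python (the helper name is hypothetical, and the 'locus' scheme handling and directory layout are only the examples from this thread, nothing settled):

---------------------------------------------------
import os

LOCI_ROOT = '/home/loci'          # the proposed Loci root

def resolve_locus_uri(uri):
    """Map a locus:// URI onto a local filesystem path.

    locus://localhost/foo        -> /home/loci/foo
    locus://localhost/~brad/foo  -> /home/brad/loci/foo
    Anything with another host is remote and is handed back (as None)
    for the network/CORBA layer to deal with.
    """
    prefix = 'locus://'
    if not uri.startswith(prefix):
        raise ValueError('not a locus URI: %r' % uri)
    host, _, path = uri[len(prefix):].partition('/')
    if host not in ('localhost', ''):
        return None
    if path.startswith('~'):
        user, _, rest = path.partition('/')
        return os.path.join('/home', user[1:], 'loci', rest)
    return os.path.join(LOCI_ROOT, path)

# resolve_locus_uri('locus://localhost/public/gb_file_1.gb')
#   -> '/home/loci/public/gb_file_1.gb'
---------------------------------------------------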
> Okay, so if in the above example I wanted to move gb_file_1.gb from the > container to the main workspace, I would drag it into the workspace and in > the real directory structure, the file would move from > /usr/home/chapmanb/my_seq_files/gb_file_1.gb to > /usr/local/loci/workspace/gb_file_1.gb? Do you want the file to actually > move, or just to create a link to the file from inside the > /usr/local/loci/workspace directory system? Hmmmm. I suppose the user can either 'copy' or 'move' something onto/from the Workspace (and other areas) just like using a file manager (copy means the original stays, and move means the original is deleted). But I don't think the user should be able to 'move' a file to/from a _remote_ system, so only copying would be allowed in such a case. I'd say that by default a DnD would move a locus, unless it is remote (not on the local filesystem). Keep in mind though, that when the user manipulates loci, he/she is only manipulating an XML _representation_ of something that can exist anywhere on the Internet (and is _always_ referenced to via URI). Since that XML representation should be small (about the size of a typical Web page), the transfer of it should be trivial. So, I wouldn't create symlinks but just copy or move the XML representations. The transfer of the actual program or data that the locus represents is another case altogether. I think this can be handled (in a GUI sense) via pop-up menu option and not DnD. Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From bizzaro at geoserve.net Sun Dec 19 13:49:45 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Gnome stuff References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec 17, 99 00:40:47 am <385D11FB.364D9A9A@geoserve.net> Message-ID: <385D28C9.BB3CBCF2@geoserve.net> "J.W. Bizzaro" wrote: > > filesystem of a remote computer. Maybe we should have 2 branches under Loci's > root directory: > > /home/loci/public/ > /home/loci/private/ I changed my mind. I think to facilitate the use of Loci as a shell, 'private' access can be from _anywhere_ on the local filesystem. Public access would be from /home/loci/public/ or /home/username/loci/public/ Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From bizzaro at geoserve.net Sun Dec 19 15:19:01 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Loci as a locus Message-ID: <385D3DB5.6AA76839@geoserve.net> Wise and mighty Locians, I haven't mentioned this before, but the thought came a while back about embedding a copy of Loci within Loci so that it runs as a locus. Where did this idea come from? Well, I was thinking about what would happen if you made a Workflow Diagram or graphical script where some outputs were left unspecified (little dots not connected). Loci should then send the outputs to stdout, right? Then I realized the same would apply to unspecified inputs: They should come from stdin. 
Or maybe, since we could have multiple connectors unconnected, you could specify on THE COMMAND-LINE, what to do with them: $ loci -i1 -i2 -o1 So, hmmm, if Loci can run like this from the command-line, maybe Loci too can be wrapped to run inside of Loci! What's the use of this? I'm thinking along the line of a composite locus. Since you can put parts of a WFD/script inside of composite locus (note that a composite locus differs from a container locus in that with the former, the connections/workflow are preserved), you should be able to view them (in a windowlet) as being in their own Workspace. And, if you look at the unconnected dots/lines (connectors) on the composite's Workspace, they should match the dots/lines on the composite's icon. (We'll probably then need a way to dynamically add and remove dots/lines (connectors) from an icon (locus). This also gives Loci a 'workspace in a workspace' functionality like that of AVS: http://www.avs.com/products/expdev/images/NE.GIF I just thought I'd fill you in on this new feature. Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From bizzaro at geoserve.net Sun Dec 19 15:40:19 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Loci as a locus References: <385D3DB5.6AA76839@geoserve.net> Message-ID: <385D42B3.497A76CD@geoserve.net> "J.W. Bizzaro" wrote: > > Since you can put parts of a WFD/script inside of composite locus (note that a > composite locus differs from a container locus in that with the former, the > connections/workflow are preserved), you should be able to view them (in a > windowlet) as being in their own Workspace. IOW, a composite locus is an instance of Loci. But visa versa is true: Loci is a composite locus. Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From gvd at redpoll.pharmacy.ualberta.ca Sun Dec 19 16:51:36 1999 From: gvd at redpoll.pharmacy.ualberta.ca (Gary Van Domselaar) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Gnome stuff References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec 17, 99 00:40:47 am <385D11FB.364D9A9A@geoserve.net> Message-ID: <385D5368.D998F7AC@redpoll.pharmacy.ualberta.ca> > > Regarding a user accessing things in his home directory, and even making some > things public, we can do what Apache does and have > > locus://bradcom.com/~brad/ > > point to > > /home/brad/loci/ > > and you would have > > /home/brad/loci/public/ > /home/brad/loci/private/ > /home/brad/loci/workspace/ > > too. If we were to follow the apache example, we would not specify a public and private directory explicitly, but rather use an authentication procedure (like apache's .htaccess) to create private (or perhaps 'restricted') directories from publically accessible ones. 
So /home/brad/loci/public_loci/ //unrestricted access, network viewable /home/brad/loci/public_loci/germ_warfare/ //restricted access, network viewable Of course, like apache, there's nothing stopping you from _making_ a separate directory to contain your private files /home/brad/loci/private_loci/ //completely private, network hidden > > > Anyways, is this along the lines of what people were thinking for > > the representation? I haven't really said anything about how to actually > > generate this kind of XML, but I just wanted to make sure I was on the > > right track! > > You are correct, sir! > > > Okay, so if in the above example I wanted to move gb_file_1.gb from the > > container to the main workspace, I would drag it into the workspace and in > > the real directory structure, the file would move from > > /usr/home/chapmanb/my_seq_files/gb_file_1.gb to > > /usr/local/loci/workspace/gb_file_1.gb? Do you want the file to actually > > move, or just to create a link to the file from inside the > > /usr/local/loci/workspace directory system? > > Hmmmm. I suppose the user can either 'copy' or 'move' something onto/from the > Workspace (and other areas) just like using a file manager (copy means the > original stays, and move means the original is deleted). But I don't think > the user should be able to 'move' a file to/from a _remote_ system, so only > copying would be allowed in such a case. I'd say that by default a DnD would > move a locus, unless it is remote (not on the local filesystem). > > Keep in mind though, that when the user manipulates loci, he/she is only > manipulating an XML _representation_ of something that can exist anywhere on > the Internet (and is _always_ referenced to via URI). Since that XML > representation should be small (about the size of a typical Web page), the > transfer of it should be trivial. So, I wouldn't create symlinks but just > copy or move the XML representations. > > The transfer of the actual program or data that the locus represents is > another case altogether. I think this can be handled (in a GUI sense) via > pop-up menu option and not DnD. For DnD, you may want to consider providing the user with option to do a move, copy, or symbolic link, via pop-up menu, in direct analogy to right-button DnD in Windoze. gary -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Gary Van Domselaar gvd@redpoll.pharmacy.ualberta.ca Faculty of Pharmacy Phone: (780) 492-4493 University of Alberta FAX: (780) 492-5305 Edmonton, Alberta, Canada http://redpoll.pharmacy.ualberta.ca/~gvd From bizzaro at geoserve.net Sun Dec 19 18:41:40 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Gnome stuff References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec 17, 99 00:40:47 am <385D11FB.364D9A9A@geoserve.net> <385D5368.D998F7AC@redpoll.pharmacy.ualberta.ca> Message-ID: <385D6D34.327CC401@geoserve.net> Gary Van Domselaar wrote: > > If we were to follow the apache example, we would not specify a public > and private directory explicitly, but rather use an authentication > procedure (like apache's .htaccess) to create private (or perhaps > 'restricted') directories from publically accessible ones. So > > /home/brad/loci/public_loci/ //unrestricted access, network > viewable Is this directory _automatically_ an unrestricted area? Like I was saying in my follow-up message, we probably just need some loci/public/ directories as security 'sandboxes'. 
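A sketch of what that sandbox check could look like, in modern Python (the paths are just the ones proposed in this thread, and the function is hypothetical; none of this is implemented):

---------------------------------------------------
import os

PUBLIC_ROOT = '/home/loci/public'     # the 'sandbox' for remote access

def is_publicly_readable(path):
    """True if a remotely requested path falls inside the public sandbox.

    Everything outside the sandbox stays private to the local user, so a
    remote Loci asking for '/home/loci/public/../private' or '/etc/passwd'
    is refused. (Assumes no symlinks deliberately pointing out of the
    sandbox; realpath() collapses '..' and follows links.)
    """
    real = os.path.realpath(path)
    return real == PUBLIC_ROOT or real.startswith(PUBLIC_ROOT + os.sep)

# is_publicly_readable('/home/loci/public/seqs')        -> True
# is_publicly_readable('/home/loci/public/../private')  -> False
---------------------------------------------------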
> /home/brad/loci/public_loci/germ_warfare/ //restricted access, network > viewable So, we can have a '.access' file that will cause Loci to ask for a login? I like that. > Of course, like apache, there's nothing stopping you from _making_ a > separate directory to contain your private files > > /home/brad/loci/private_loci/ //completely private, network hidden Of course, EVERYTHING outside of loci/public/ should be private. You can make a loci/private directory, but it won't be any different from any non- loci/public/ directory. IOW, it wouldn't be neccessary. I wonder how this 'Apache approach' meshes with CORBA. CORBA has its own security protocols, right? Would anyone in-the-know care to comment on this? > > The transfer of the actual program or data that the locus represents is > > another case altogether. I think this can be handled (in a GUI sense) via > > pop-up menu option and not DnD. > > For DnD, you may want to consider providing the user with option to do a > move, copy, or symbolic link, via pop-up menu, in direct analogy to > right-button DnD in Windoze. So, a button3 DnD would bring up a dialog. Button1 DnD would by default move a locus if source and destination are both local. Button1 DnD would by default copy a locus if either source or destination (or both) are remote. (This is typically how inter-filesystem transfers work on the Mac and Windows.) What about _writing_ to a _remote_ container? If I do a DnD from my local Workspace to a remote container, should I have write permissions? This might be a good mechanism for 'sharing loci'. This certainly would require a login of some sort. So, should a .access file be required for any remote writing to a filesystem? Or should 'writers' have a shell account, as we have CVS set up (I think you can give CVS write access to someone who doesn't have a shell account)? Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From gvd at redpoll.pharmacy.ualberta.ca Sun Dec 19 20:50:02 1999 From: gvd at redpoll.pharmacy.ualberta.ca (Gary Van Domselaar) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Loci as a locus References: <385D3DB5.6AA76839@geoserve.net> Message-ID: <385D8B4A.4870DBE7@redpoll.pharmacy.ualberta.ca> "J.W. Bizzaro" wrote: > > Wise and mighty Locians, > > I haven't mentioned this before, but the thought came a while back about > embedding a copy of Loci within Loci so that it runs as a locus. > > Where did this idea come from? Well, I was thinking about what would happen > if you made a Workflow Diagram or graphical script where some outputs were > left unspecified (little dots not connected). Loci should then send the > outputs to stdout, right? Then I realized the same would apply to unspecified > inputs: They should come from stdin. Or maybe, since we could have multiple > connectors unconnected, you could specify on THE COMMAND-LINE, what to do with > them: > > $ loci -i1 -i2 -o1 > > So, hmmm, if Loci can run like this from the command-line, maybe Loci too can > be wrapped to run inside of Loci! This strange loopiness reeks of Godel, Escher and Bach. I love it. Actually, I was thinking about some of Brad's suggestions for wrapping backend apps and it struck me that, AFAIK, the only programs that Loci can really 'wrap' are the ones in which Loci can control the redirection of stdin and stdout. 
This includes command-line driven apps, CORBAfied apps, cgi scripts, and so on, but not apps with 'fixed' stdin and stdout. This includes a large number of apps where the processing and the GUI are 'integrated'. Input is typically restricted to file (or database), keyboard, and mouse, and output is typically restricted to the GUI display or file (or database). We would be remiss to make an application-wrapping framework that itself cannot be wrapped. I just wonder if CORBA might be a better solution than the command-line? gary -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Gary Van Domselaar gvd@redpoll.pharmacy.ualberta.ca Faculty of Pharmacy Phone: (780) 492-4493 University of Alberta FAX: (780) 492-5305 Edmonton, Alberta, Canada http://redpoll.pharmacy.ualberta.ca/~gvd From gvd at redpoll.pharmacy.ualberta.ca Sun Dec 19 22:39:24 1999 From: gvd at redpoll.pharmacy.ualberta.ca (Gary Van Domselaar) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] Gnome stuff References: <3859868F.E4E3FCA2@geoserve.net> from "J.W. Bizzaro" at Dec 17, 99 00:40:47 am <385D11FB.364D9A9A@geoserve.net> <385D5368.D998F7AC@redpoll.pharmacy.ualberta.ca> <385D6D34.327CC401@geoserve.net> Message-ID: <385DA4EC.3698D0E6@redpoll.pharmacy.ualberta.ca> "J.W. Bizzaro" wrote: > > Gary Van Domselaar wrote: > > > > If we were to follow the apache example, we would not specify a public > > and private directory explicitly, but rather use an authentication > > procedure (like apache's .htaccess) to create private (or perhaps > > 'restricted') directories from publically accessible ones. So > > > > /home/brad/loci/public_loci/ //unrestricted access, network > > viewable > > Is this directory _automatically_ an unrestricted area? I would suggest that loci's configuration utility would provide a directive for identifying the default 'public_loci', but as with any Unix filesystem, would require the proper permissions attributes in order to make it truly 'world readable'. The directory would not be created by loci, but created by the Loci user who wants to have a 'Loci site' ;-) > > For DnD, you may want to consider providing the user with option to do a > > move, copy, or symbolic link, via pop-up menu, in direct analogy to > > right-button DnD in Windoze. > > So, a button3 DnD would bring up a dialog. > > Button1 DnD would by default move a locus if source and destination are both > local. > > Button1 DnD would by default copy a locus if either source or destination (or > both) are remote. (This is typically how inter-filesystem transfers work on > the Mac and Windows.) > > What about _writing_ to a _remote_ container? If I do a DnD from my local > Workspace to a remote container, should I have write permissions? This might > be a good mechanism for 'sharing loci'. This certainly would require a login > of some sort. > > So, should a .access file be required for any remote writing to a filesystem? > Or should 'writers' have a shell account, as we have CVS set up (I think you > can give CVS write access to someone who doesn't have a shell account)? I like the .access idea, for writing to remote filesystems, but I dont know enough about CORBA's authentication capabilities (although I know that they are provided for in the OMG's LSG's IDL) to make a decent comparison between the two approaches. I suspect a .access file would require extra work for the Loci developers, if CORBA can do the same thing, perhaps that is a better solution. Maybe Justin has some relevant comments on this... 
-- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Gary Van Domselaar gvd@redpoll.pharmacy.ualberta.ca Faculty of Pharmacy Phone: (780) 492-4493 University of Alberta FAX: (780) 492-5305 Edmonton, Alberta, Canada http://redpoll.pharmacy.ualberta.ca/~gvd From bizzaro at geoserve.net Sun Dec 19 23:41:53 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:02 2006 Subject: [Pipet Devel] get it while it's hot Message-ID: <385DB391.80F3F565@geoserve.net> Locians, Here is the latest snapshot: http://bioinformatics.org/loci/download/snapshots/loci-core-19991219.tar.gz I made some improvements to dialog and menu handling, which you may or may not notice. Also, I managed to do what I wrote about today: make Loci appear in a composite locus's windowlet. Check it out, but I have to warn you, the windowlet Loci doesn't work...something to do with event handling. If anyone on the list is a PyGTK expert, it'd be great if you could take a look. Some of the errors Brad found are in this snapshot. Note that you can get all of this from CVS too. Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From bizzaro at geoserve.net Sun Dec 19 23:56:46 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:03 2006 Subject: [Pipet Devel] get it while it's hot References: <385DB391.80F3F565@geoserve.net> Message-ID: <385DB70E.B0F44F87@geoserve.net> "J.W. Bizzaro" wrote: > > Some of the errors Brad found are in this snapshot. I mean, "are fixed in this snapshot." :-) Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From chapmanb at arches.uga.edu Mon Dec 20 07:29:48 1999 From: chapmanb at arches.uga.edu (Brad Chapman) Date: Fri Feb 10 19:19:03 2006 Subject: [Pipet Devel] While you're in CVS... In-Reply-To: <385DB391.80F3F565@geoserve.net> Message-ID: Dearest Locians; I have done some actual coding (woo-hoo!) on some of the filesystem/container stuff we were talking about and just committed it to cvs in a brand new directory: loci-file. I decided to stay out of loci-core for now because: 1. I didn't want to screw stuff up. 2. I'm trying to learn how to do GUIs with pyGTK/pyGNOME from the ground up, so that I can get a good grasp on how it all works. 3. I didn't want to screw stuff up. So anyways, if you decide you would like to look at the mess in loci-file, you will find: filegui.py: The (ugly) GUI interface file2XML.py: A program that converts a directory structure into an XMLish document other random stuff: TODO, CHANGES... Since it is rough, this is the step by step on how to work it: 1. It requires all the same stuff as Loci: pyGNOME/pyGTK, python, gnome-libs (well, maybe it doesn't need all this--who knows!) 2. Just move into the loci-files directory and type './filegui.py &' 3. You'll be presented with window containing one nasty little button labelled "container" that fills the entire window. This is my temporary substitute for a container. 4. Click on the button and you'll be presented with a file dialog. Pick a directory. 5. Click okay, and the program will write out a file 'XMLoutput.xml' containing the directory substructure modelled as an XML-type document. 6. 
Click on File and Exit in the main window, since that's all it does. 7. Check out XMLoutput.xml and see if it accurately represents the filesystem. The XML file has indentations and everything so it isn't too bad to look at and check. I've just tested it on a few directories and it seems to do okay. So if you are interested, please check it out, try it out on your favorite directories and be sure it is modeling them okay, take a look at the code and send suggestions/mocking comments, and generally have a nifty time with it. Please let me know if I messed up the cvs commit or if anything else is horrible wrong and I'll try to fix it. I'm off school and without formal responsibilities, so I'll be doing more coding on it in the next couple days (while I am near my computer) and it should *hopefully* improve and do more. WRT all of the discussion on the list--these are my quick thoughts on the loci as Apache type filesystem stuff: 1. As much as I hate config files, we probably need a loci.conf file to specify things like $LOCI_ROOT and private and public directories. I think instead of having specific defined directories for public and private, we should just take the Apache-type approach and specifically specify private directories within the $LOCI_ROOT file system. Although I could really care less where Loci is on my file system, people, in general, like to have control over the location of their programs and will probably want to specify it. 2. How are we going to deal with security issues surrounding programs? Will all programs running under Loci need to be located within the $LOCI_ROOT filesystem? If so, will they all need to be in a specific directory ($LOCI_ROOT/bin?) like cgi-scripts in apache? I really know nothing about security so I'm just throwing out an idea. I still need to digest most of the conversation before I can make half-way rational comments on it. Oh, and Jeff--with the new snapshot I lost the nice scrollbars that I had previously. So now I can move loci off the desktop and they just end up disappearing instead of the desktop scrolling. I like the new direction it is going though, and will be excited to see some loci inside loci! Brad From bizzaro at geoserve.net Mon Dec 20 12:52:11 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:03 2006 Subject: [Pipet Devel] While you're in CVS... References: Message-ID: <385E6CCB.E0ACDB1B@geoserve.net> Brad Chapman wrote: > > I have done some actual coding (woo-hoo!) on some of the > filesystem/container stuff we were talking about and just committed it to > cvs in a brand new directory: loci-file. Woo-hoo! > 5. Click okay, and the program will write out a file 'XMLoutput.xml' > containing the directory substructure modelled as an XML-type document. > 6. Click on File and Exit in the main window, since that's all it does. > 7. Check out XMLoutput.xml and see if it accurately represents the filesystem. The XML looks decent to me. I'd suggest looking at the 'xmllib' module that actually comes with Python. We probably should use it (an althernative is Gnome's LibXML) for all of our XML work. Of course it'll save quite a bit of coding for reading and (I think) writing XML. Look at 'xmlparse.py' (bottom half of file) in the loci-core module for an example of its use in parsing. There is one catch to 'file2XML.py': It recursively descends subdirectories, which is not needed (it's neat that you made it do that though). A container only needs to know the contents (types, etc.) of its top level directory. 
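A single level can be listed without os.path.walk() at all; a minimal sketch in modern Python (the <container> and <locus> element names are made up here, since the real XMLoutput.xml format is Brad's to define):

---------------------------------------------------
import os
from xml.sax.saxutils import quoteattr

def dir_to_xml(path):
    """Describe one directory level as XML -- no recursion.

    Subdirectories become empty <container/> elements; their contents
    are only examined later, if the user drags one out as a new container.
    """
    lines = ['<container location=%s>' % quoteattr(path)]
    for name in sorted(os.listdir(path)):
        if os.path.isdir(os.path.join(path, name)):
            lines.append('    <container name=%s/>' % quoteattr(name))
        else:
            lines.append('    <locus name=%s/>' % quoteattr(name))
    lines.append('</container>')
    return '\n'.join(lines)

# open('XMLoutput.xml', 'w').write(dir_to_xml(os.path.expanduser('~')))
---------------------------------------------------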
Only if you DnD a 'sub-container'/subdirectory out of the list, will the contents of the subdir need to be known: That's when a new container, containing the subdirectory, is made. > time with it. Please let me know if I messed up the cvs commit or if > anything else is horrible wrong and I'll try to fix it. The only 'problem' we're having with making new directories via CVS is that the ownership is by default username.username. It needs to be username.cvs. I can fix that as root, and any user can fix it, by going to /home/cvs/ and typing $ chown -R .cvs I already did that for loci-file, but if you make a new directory in loci-file, you need to check that it is set to group cvs (and that it is group read/writable). Otherwise, another user will get a 'permission denied' error doing a checkout. > WRT all of the discussion on the list--these are my quick thoughts > on the loci as Apache type filesystem stuff: > > 1. As much as I hate config files, we probably need a loci.conf file to > specify things like $LOCI_ROOT and private and public directories. We need to define all sorts of settings anyway. Maybe we'll use XML. > I think > instead of having specific defined directories for public and private, we > should just take the Apache-type approach and specifically specify private > directories within the $LOCI_ROOT file system. Although I could really care > less where Loci is on my file system, people, in general, like to have > control over the location of their programs and will probably want to > specify it. So all of $LOCI_ROOT is public unless otherwise specified? I'd agree if the rest of the filesystem was _privately_ accessible. For example, most binaries are installed to /usr/bin/ If $LOCI_ROOT were set to /home/loci/ All those lovely bioinformatics apps in /usr/bin/ would be inaccessible, even privately :-( > 2. How are we going to deal with security issues surrounding programs? Will > all programs running under Loci need to be located within the $LOCI_ROOT > filesystem? If so, will they all need to be in a specific directory > ($LOCI_ROOT/bin?) like cgi-scripts in apache? I really know nothing about > security so I'm just throwing out an idea. Pretty much what I addressed above. You wouldn't have access to /usr/bin/ unless you used symlinks, which is a possibility. I'm not a security guru myself, but some on this list seem to know quite a bit. Perhaps someone should be appointed 'Security Guru'. Any volunteers? > Oh, and Jeff--with the new snapshot I lost the nice scrollbars that I had > previously. So now I can move loci off the desktop and they just end up > disappearing instead of the desktop scrolling. Are you sure the Workspace scrolled automatically when you dragged a locus? I never added that functionality. Scrollbars are there but only show up when the window is smaller than the Workspace. Try resizing the window. > I like the new direction it > is going though, and will be excited to see some loci inside loci! Me too ;-) Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From bizzaro at geoserve.net Mon Dec 20 13:36:49 1999 From: bizzaro at geoserve.net (J.W. 
Bizzaro) Date: Fri Feb 10 19:19:03 2006 Subject: [Pipet Devel] Brad: I've got a little task for you Message-ID: <385E7741.656C94C2@geoserve.net> Brad, Since you're working on containers and getting into PyGTK, how about making a prototype widget to go in the windowlet of a container: a list widget. Just name the file 'container_list.py' or whatever. Look in the PyGTK examples for a list widget. You can put some fake values in for now. As an example of how Loci widgets are structured, look at 'testwidget1.py' (pasted below) in loci-core. Basically, we're making a 'composite widget' (no relation to composite locus), which is a widget that inherits the objects of a standard GTK widget. So, WidgetMain inherits GtkVBox....

---------------------------------------------------
from gtk import *
from gnome.ui import *

class WidgetMain(GtkVBox):
    get_type = GtkVBox(spacing=5).get_type()

    def __init__(self):
        self._o = GtkVBox(spacing=5)._o

        self.width = 150
        self.height = 50
        self.set_usize(self.width, self.height)
        self.set_border_width(5)

        w = GtkLabel('Label')
        self.add(w)
        w.show()
---------------------------------------------------

The actual widget I'm using here is a GtkLabel and is 'added' to self (WidgetMain). That's all that is really needed, and then I can get it to show up in the windowlet. Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From chapmanb at arches.uga.edu Tue Dec 21 21:01:38 1999 From: chapmanb at arches.uga.edu (Brad Chapman) Date: Fri Feb 10 19:19:03 2006 Subject: [Pipet Devel] Brad: I've got a little task for you In-Reply-To: <385E7741.656C94C2@geoserve.net> Message-ID: Oh great Locians; J.W. Bizzaro wrote: >Since you're working on containers and getting into PyGTK, how about making a >prototype widget to go in the windowlet of a container: a list widget. > Surely! I just updated loci-file with two (double the excitement!) widgets:

1. container_list.py: A GtkCList embedded in a scrolling window that you can instantiate with listWidget().
2. container_tree.py: A GtkCTree embedded in a scrolling window that can be instantiated with treeWidget().

Both widgets can optionally be instantiated with some data already added. To get this, just call them up and pass them a 1 (i.e. myList = listWidget(1)). I hope these widgets are what you were looking for! You can take a look at the widgets in action by updating your copy of loci-file. I just committed some changes so there are now three buttons to push (wow!):

1. 'Container loci': same as before, outputs XML from a selected directory structure into XMLoutput.xml
2. 'Display a listing': Takes the info from XMLoutput.xml and displays it in a list widget. I still need to learn to add pictures next to the names so that you can tell the difference between documents and directories.
3. 'Display a tree': Displays the example tree. I'm currently stuck trying to figure out how to parse the XML into a tree.

Note that since this now does XML parsing, loci-file requires something new. I used the SAX (Simple API for XML) compliant parser from the python xml toolkit. You can get the toolkit by going to: http://www.python.org/topics/xml/download.html. The newest version is PyXML-0.5.2.tar.gz, but I couldn't get this to install for me, so I am using PyXML-0.5.1. So whatever you can get to work should be okay, I'm not doing anything really fancy.
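For reference, SAX parsing is event-driven: the parser walks the document and calls back into a handler object as elements open and close. A minimal sketch against the current xml.sax interface (the PyXML 0.5 API differed in the details, and the element names here are only illustrative, since the real XMLoutput.xml format is not fixed):

---------------------------------------------------
import xml.sax

class ContainerHandler(xml.sax.ContentHandler):
    """Collect (name, is_container) rows from an XMLoutput.xml-style file."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.rows = []

    def startElement(self, tag, attrs):
        # 'container' and 'locus' are placeholder element names
        if tag in ('container', 'locus'):
            name = attrs.get('name', '')
            if name:
                self.rows.append((name, tag == 'container'))

handler = ContainerHandler()
xml.sax.parse('XMLoutput.xml', handler)
for name, is_container in handler.rows:
    print(name, '(container)' if is_container else '')
---------------------------------------------------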
If anyone has the chance to check out the changes, please drop me any comments you have! J.W. Bizzaro wrote: >I'd suggest looking at the 'xmllib' module that actually comes with Python. >We probably should use it (an althernative is Gnome's LibXML) for all of our >XML work. Of course it'll save quite a bit of coding for reading and (I >think) writing XML. Look at 'xmlparse.py' (bottom half of file) in the >loci-core module for an example of its use in parsing. Sorry I had to go above this. I couldn't figure enough out from xmlparse.py to know how to work this lib, and there is some pretty helpful documentation on the SAX stuff. I'm not sure about all the differences, but if necessary I can probably scale back later not to use the XML toolkit. J.W. Bizzaro wrote: >There is one catch to 'file2XML.py': It recursively descends subdirectories, >which is not needed (it's neat that you made it do that though). A container >only needs to know the contents (types, etc.) of its top level directory. >Only if you DnD a 'sub-container'/subdirectory out of the list, will the >contents of the subdir need to be known: That's when a new container, >containing the subdirectory, is made. Well, give all the credit for the recursive descending to the writers of os.path.walk(), not me! The reason I did this is for flexibility. I would really like to represent the contents of a container as a tree, instead of a list (hence the two widgets) so that a user can look into subdirectories to see what is there, without having to create a new container. I am having trouble parsing the XML into a tree, so I can't demo this yet, but I'll keep working on it. Even with the recursive descending, it is no problem to display the info as a list. I would be interested to hear people's thoughts on the tree vs. list representations. J.W. Bizzaro wrote: >The only 'problem' we're having with making new directories via CVS is that >the ownership is by default username.username. It needs to be username.cvs. >I can fix that as root, and any user can fix it, by going to I will definately do this next time. I tried to use the command 'cvs add loci-file', which, according to the book "Open Source Development with CVS", *should* allow me to add a directory, but this gives something like the following error: cvs add: in directory .: cvs [add aborted]: there is no version here: do 'cvs checkout' first Any ideas why that is? J.W. Bizzaro wrote: >So all of $LOCI_ROOT is public unless otherwise specified? I'd agree if the >rest of the filesystem was _privately_ accessible. That is the picture I was imagining. If we are following the Apache model, there are no opportunites for outside users to reach any directories besides those inside $LOCI_ROOT. Same with ftp servers, right? I'm not positive how to do this, but it seems definately possible! >Are you sure the Workspace scrolled automatically when you dragged a locus? I >never added that functionality. I don't know, I may have just been drinking heavily and imagined it! Brad From bizzaro at geoserve.net Tue Dec 21 22:37:31 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:03 2006 Subject: [Pipet Devel] Brad: I've got a little task for you References: Message-ID: <3860477B.5B9DF95D@geoserve.net> Brad Chapman wrote: > > Note that since this now does XML parsing, loci-file requires something > new. I used the SAX (simple API for XML) complient parser from the python > xml toolkit. The licenses are acceptable. 
I'd like to keep the number of extra packages that the user needs to install to a minimum though. Does SAX provide what straight xmllib cannot? > You can get the toolkit by going to: > http://www.python.org/topics/xml/download.html. The newest version is > PyXML-0.5.2.tar.gz, but I couldn't get this to install for me, so I am > using PyXML-0.5.1. There is no rule for 'make install' in 0.5.2. I just copied xml/ to /usr/lib/python1.5/site-packages/ after 'make'. > Sorry I had to go above this. I couldn't figure enough out from xmlparse.py > to know how to work this lib, and there is some pretty helpful > documentation on the SAX stuff. I'm not sure about all the differences, but > if necessary I can probably scale back later not to use the XML toolkit. SAX actually includes a modified version of xmllib, so if SAX is well documented, you may find xmllib docs there too. > Well, give all the credit for the recursive descending to the writers of > os.path.walk(), not me! The reason I did this is for flexibility. I would > really like to represent the contents of a container as a tree, instead of > a list (hence the two widgets) so that a user can look into subdirectories > to see what is there, without having to create a new container. It's certainly more convenient than openning up new containers. But since containers can represent large databases, it wouldn't be a good idea to use trees everywhere. BTW, It looks good, Brad. Nice work! > I am having > trouble parsing the XML into a tree, so I can't demo this yet, but I'll > keep working on it. Even with the recursive descending, it is no problem to > display the info as a list. I would be interested to hear people's thoughts > on the tree vs. list representations. I wonder about speed: 1. Recursively descend a directory (of any size and at any _location_). 2. Write to XML. 3. Create tree widget. 4. Parse the XML and put into tree. How long would it take to do this for a large filesystem? at a remote location? These are the reasons why I wanted to use a list. If trees are (1) made fast enough and (2) aren't used for every container, they would be good to use. Can you give us some feedback about these issues? > I will definately do this next time. I tried to use the command 'cvs add > loci-file', which, according to the book "Open Source Development with > CVS", *should* allow me to add a directory, but this gives something like > the following error: > > cvs add: in directory .: > cvs [add aborted]: there is no version here: do 'cvs checkout' first > > Any ideas why that is? loci-file was the module, and modules need to be made using cvs import (etc.) >from _within_ loci-file, you can add directories using cvs add after having made the directory in the filesystem. > That is the picture I was imagining. If we are following the Apache model, > there are no opportunites for outside users to reach any directories > besides those inside $LOCI_ROOT. Same with ftp servers, right? I'm not > positive how to do this, but it seems definately possible! Yeah, anonymous ftp is a good example too. You can going anywhere in the filesystem with ftp providing you log in with an account. Anonymous users are limited in where they can go. Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From bizzaro at geoserve.net Tue Dec 21 22:49:49 1999 From: bizzaro at geoserve.net (J.W. 
Bizzaro) Date: Fri Feb 10 19:19:03 2006 Subject: [Pipet Devel] WidgetMain() Message-ID: <38604A5D.67872C5B@geoserve.net> BTW Brad, In the source for the widgets, you use class listWidget() and class treeWidget(). But when the Workspace builds a windowlet, it has no knowledge of widget details, only that the class is class WidgetMain(). The name may change, but it should be the same for all widgets. Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From David.Lapointe at umassmed.edu Wed Dec 22 13:03:35 1999 From: David.Lapointe at umassmed.edu (Lapointe, David) Date: Fri Feb 10 19:19:03 2006 Subject: [Pipet Devel] FW: Bioperl: BPlite.pm Message-ID: <93307F07DE63D211B2F30000F808E9E501644F68@edunivexch02.umassmed.edu> A forward from the bioperl list. -----Original Message----- From: Jeffrey Chang [mailto:jchang@SMI.Stanford.EDU] Sent: Wednesday, December 22, 1999 11:31 AM To: Ewan Birney Cc: Ian Korf; vsns-bcd-perl@lists.uni-bielefeld.de Subject: Re: Bioperl: BPlite.pm Hi Everybody, Just popping in from biopython! I thought I'd mention that over there, we're using an event-oriented design for our parsers, which is described in a mail: http://www.biopython.org/pipermail/biopython/1999-December/000149.html How it works is that a Scanner object chews through a data file and generates events when it runs across information. The events are then handled by a Consumer. This design is nice because it decouples a lot of the parsing work from the final representation, and makes it easy to accommodate parsers of varying complexity. You can create Consumers to handle as much or as little of the data as you want. The plan for biopython is to distribute Scanners, and a Consumer that shoves all the information into some data structure. Advanced users, however, will have the option of using the scanner but building their own high-performance Consumer tailored specifically for their own purposes. The code for this is sitting on my local drive now, and will be in the biopython CVS repository soon. Jeff On Wed, 22 Dec 1999, Ewan Birney wrote: > On Tue, 21 Dec 1999, Ian Korf wrote: > > > I've been getting requests recently for old BLAST parsers. > > Seems as though some people are looking for a lightweight > > parser. At http://sapiens.wustl.edu/~ikorf/BPlite.pm you > > can find my version of such a module. It parses both NCBI- > > and WU-BLAST, and works well in pipes since it reads one > > subject and one alignment at a time. > > I'd really like to see a lighter blast parser with less embedded > functionality in bioperl, ideally with the main features of steve's > blast parser. If I can persuade someone to look at this Ian, is it > ok to bring it inside bioperl? (any chance of you wanting to do that? I > guess not...) > > Steve - we *do* need to think of upgrading the blast parser - only > you know the code, and the largest set of bugs are found in it. > > > > > > The pod2text version of the documentation follows.
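The event-oriented design Jeffrey Chang describes above is easier to picture with a toy example. The classes below are illustrative only (they are not biopython's actual Scanner/Consumer API, and the report format is invented): the Scanner knows the file format and fires events, and a Consumer implements only the events it cares about.

class Scanner:
    """Chews through a report and generates events for a consumer."""

    def feed(self, handle, consumer):
        for line in handle:
            line = line.rstrip('\n')
            if line.startswith('Query='):
                self._emit(consumer, 'query', line.split('=', 1)[1].strip())
            elif line.startswith('>'):
                self._emit(consumer, 'subject', line[1:].strip())
            elif line.startswith(' Score ='):
                self._emit(consumer, 'score', line.split('=', 1)[1].strip())
            # ...more patterns; the Scanner knows the format but nothing
            # about what the data will be used for.

    def _emit(self, consumer, event, data):
        # Deliver only the events this consumer actually implements.
        handler = getattr(consumer, event, None)
        if handler is not None:
            handler(data)

class HitListConsumer:
    """A lightweight consumer: collects subject names, ignores the rest."""

    def __init__(self):
        self.hits = []

    def subject(self, name):
        self.hits.append(name)

if __name__ == '__main__':
    import io
    fake_report = io.StringIO('Query= test\n'
                              '>foo\n'
                              ' Score = 100\n'
                              '>bar\n'
                              ' Score = 50\n')
    consumer = HitListConsumer()
    Scanner().feed(fake_report, consumer)
    print(consumer.hits)    # prints ['foo', 'bar']

Swapping in a heavier Consumer (one that also implements query() and score() and builds a full data structure) changes nothing in the Scanner, which is exactly the decoupling being described.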
> > > > -Ian Korf > > > > > > NAME > > BPlite - Lightweight BLAST parser > > > > SYNOPSIS > > use BPlite; > > my $report = new BPlite(\*STDIN); > > $report->query; > > $report->database; > > while(my $sbjct = $report->nextSbjct) { > > $sbjct->name; > > while (my $hsp = $sbjct->nextHSP) { > > $hsp->score; > > $hsp->bits; > > $hsp->percent; > > $hsp->P; > > $hsp->queryBegin; > > $hsp->queryEnd; > > $hsp->sbjctBegin; > > $hsp->sbjctEnd; > > $hsp->queryAlignment; > > $hsp->sbjctAlignment; > > } > > } > > > > DESCRIPTION > > BPlite is a package for parsing BLAST reports. The BLAST > > programs are a family of widely used algorithms for sequence > > database searches. The reports are non-trivial to parse, and > > there are differences in the formats of the various flavors of > > BLAST. BPlite parses BLASTN, BLASTP, BLASTX, TBLASTN, and > > TBLASTX reports from both the high performance WU-BLAST, and the > > more generic NCBI-BLAST. > > > > Many people have developed BLAST parsers (I myself have made at > > least three). BPlite is for those people who would rather not > > have a giant object specification, but rather a simple handle to > > a BLAST report that works well in pipes. > > > > Object > > > > BPlite has three kinds of objects, the report, the subject, and > > the HSP. To create a new report, you pass a filehandle reference > > to the BPlite constructor. > > > > my $report = new BPlite(\*STDIN); # or any other filehandle > > > > The report has two attributes (query and database), and one > > method (nextSbjct). > > > > $report->query; # access to the query name > > $report->database; # access to the database name > > $report->nextSbjct; # gets the next subject > > while(my $sbjct = $report->nextSbjct) { > > # canonical form of use is in a while loop > > } > > > > A subject is a BLAST hit, which should not be confused with an > > HSP (below). A BLAST hit may have several alignments associated > > with it. A useful way of thinking about it is that a subject is > > a gene and HSPs are the exons. Subjects have one attribute > > (name) and one method (nextHSP). > > > > $sbjct->name; # access to the subject name > > "$sbjct"; # overloaded to return name > > $sbjct->nextHSP; # gets the next HSP from the sbjct > > while(my $hsp = $sbjct->nextHSP) { > > # canonical form is again a while loop > > } > > > > An HSP is a high scoring pair, or simply an alignment. HSP > > objects do not have any methods, just attributes (score, bits, > > percent, P, queryBegin, queryEnd, sbjctBegin, sbjctEnd, > > queryAlignment, sbjctAlignment) that should be familiar to > > anyone who has seen a blast report. For lazy/efficient coders, > > two-letter abbreviations are available for the attributes with > > long names (qb, qe, sb, se, qa, sa). > > > > $hsp->score; > > $hsp->bits; > > $hsp->percent; > > $hsp->P; > > $hsp->queryBegin; $hsp->qb; > > $hsp->queryEnd; $hsp->qe; > > $hsp->sbjctBegin; $hsp->sb; > > $hsp->sbjctEnd; $hsp->se; > > $hsp->queryAlignment; $hsp->qa; > > $hsp->sbjctAlignment; $hsp->sa; > > "$hsp"; # overloaded for begin..end bits > > > > I've included a little bit of overloading for double quote > > variable interpolation convenience. A subject will return its > > name and an HSP will return its queryBegin, queryEnd, and bits > > in the alignment. Feel free to modify this to whatever is most > > frequently used by you. > > > > So a very simple look into a BLAST report might look like this.
> > > > my $report = new BPlite(\*STDIN); > > while(my $sbjct = $report->nextSbjct) { > > print "$sbjct\n"; > > while(my $hsp = $sbjct->nextHSP) { > > print "\t$hsp\n"; > > } > > } > > > > The output of such code might look like this: > > > > >foo > > 100..155 29.5 > > 268..300 20.1 > > >bar > > 100..153 28.5 > > 265..290 22.1 > > > > AUTHOR > > Ian Korf (ikorf@sapiens.wustl.edu, > > http://sapiens.wustl.edu/~ikorf) > > > > ACKNOWLEDGEMENTS > > This software was developed at the Genome Sequencing Center at > > Washington University, St. Louis, MO. > > > > COPYRIGHT > > Copyright (C) 1999 Ian Korf. All Rights Reserved. > > > > DISCLAIMER > > This software is provided "as is" without warranty of any kind. > > > > =========== Bioperl Project Mailing List Message Footer ======= > > Project URL: http://bio.perl.org/ > > For info about how to (un)subscribe, where messages are archived, etc: > > http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html > > ==================================================================== > > > > ----------------------------------------------------------------- > Ewan Birney. Work: +44 (0)1223 494992. Mobile: +44 (0)7970 151230 > > http://www.sanger.ac.uk/Users/birney/ > ----------------------------------------------------------------- > > =========== Bioperl Project Mailing List Message Footer ======= > Project URL: http://bio.perl.org/ > For info about how to (un)subscribe, where messages are archived, etc: > http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html > ==================================================================== > =========== Bioperl Project Mailing List Message Footer ======= Project URL: http://bio.perl.org/ For info about how to (un)subscribe, where messages are archived, etc: http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html ==================================================================== From bizzaro at geoserve.net Wed Dec 22 21:03:57 1999 From: bizzaro at geoserve.net (J.W. Bizzaro) Date: Fri Feb 10 19:19:03 2006 Subject: [Pipet Devel] naming madness Message-ID: <3861830D.1912DF15@geoserve.net> The non-profit "Leonardo Association" in France was raided by police following a lawsuit filed by the "Leonardo Finance" company. Why? Leonardo Association's Web site showed up on an Internet search of the word "Leonardo". Apparently no one but the Leonardo Finance may use the word. http://mitpress.mit.edu/e-journals/Leonardo/ This reminds me of the time the Frenchman whose last name is "Montana" sued the U.S. state of Montana for use of his name. So, maybe it's just the French ;-) What does this have to do with Loci? I'm just thinking about how seriously some people take alleged idea-theft. Not that we do that, but as with the above cases, it's only the perception that matters. Maybe we should have used some obscure name like "tacg" :-) Cheers. Jeff -- +----------------------------------+ | J.W. Bizzaro | | | | http://bioinformatics.org/~jeff/ | | | | THE OPEN LAB | | Open Source Bioinformatics | | | | http://bioinformatics.org/ | +----------------------------------+ From mangalam at home.com Thu Dec 30 16:22:20 1999 From: mangalam at home.com (Harry Mangalam) Date: Fri Feb 10 19:19:03 2006 Subject: [Pipet Devel] naming madness References: <3861830D.1912DF15@geoserve.net> Message-ID: <386BCD0C.FC3755C1@home.com> My lawyers (Dewey, Cheatem and Howe) will be contacting you shortly... Harry "J.W.
Bizzaro" wrote: > > The non-profit "Leonardo Association" in France was raided by police following > a lawsuit filed by the "Leonardo Finance" company. Why? Leonardo > Association's Web site showed up on an Internet search of the word > "Leonardo". Apparently no one but the Leonardo Finance may use the word. > > http://mitpress.mit.edu/e-journals/Leonardo/ > > This reminds me of the time the Frenchman whose last name is "Montana" sued > the U.S. state of Montana for use of his name. So, maybe it's just the French > ;-) > > What does this have to do with Loci? I'm just thinking about how seriously > some people take alleged idea-theft. Not that we do that, but as with the > above cases, it's only the perception that matters. > > Maybe we should have used some obscure name like "tacg" :-) > > Cheers. > Jeff > -- > +----------------------------------+ > | J.W. Bizzaro | > | | > | http://bioinformatics.org/~jeff/ | > | | > | THE OPEN LAB | > | Open Source Bioinformatics | > | | > | http://bioinformatics.org/ | > +----------------------------------+ > > _______________________________________________ > pipet-devel maillist - pipet-devel@bioinformatics.org > http://bioinformatics.org/mailman/listinfo/pipet-devel -- Cheers, Harry Harry J Mangalam -- (949) 856 2847 -- mangalam@home.com