From bizzaro at bc.edu Thu Feb 4 03:22:56 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:06 2006 Subject: [Pipet Devel] XML-RPC Message-ID: <36B958E0.2ED2AF61@bc.edu> Fellow Locians, Not to say that we should use this protocol (I guess it was developed by Microsoft), but XML-RPC seems to embed object requests (or "procedure calls") in an XML. Here is a link for those who are not familiar with it: http://www.scripting.com/davenet/98/07/xmlRpcForNewbies.html Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From hinsen at cnrs-orleans.fr Thu Feb 4 03:44:48 1999 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Fri Feb 10 19:18:06 2006 Subject: [Pipet Devel] XML-RPC In-Reply-To: <36B958E0.2ED2AF61@bc.edu> (bizzaro@bc.edu) References: <36B958E0.2ED2AF61@bc.edu> Message-ID: <199902040844.JAA17952@dirac.cnrs-orleans.fr> > Not to say that we should use this protocol (I guess it was developed by > Microsoft), but XML-RPC seems to embed object requests (or "procedure calls") in > an XML. Here is a link for those who are not familiar with it: > > http://www.scripting.com/davenet/98/07/xmlRpcForNewbies.html And here's the Python implementation: http://www.pythonware.com/products/xmlrpc/ -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From bizzaro at bc.edu Thu Feb 4 04:19:02 1999 From: bizzaro at bc.edu (J.W. 
Bizzaro) Date: Fri Feb 10 19:18:06 2006 Subject: [Pipet Devel] better link to Python-XML Message-ID: <36B96606.27415BA7@bc.edu> Here is a link to the Python-XML "home page", which I guess is difficult to find from the XML-SIG page I wrote about earlier: http://www.python.org/topics/xml/index.html See, lots of XML stuff for Python :-) Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From rahul at photino.sid.rice.edu Thu Feb 4 11:29:02 1999 From: rahul at photino.sid.rice.edu (Rahul Jain) Date: Fri Feb 10 19:18:06 2006 Subject: [Pipet Devel] better link to Python-XML In-Reply-To: <36B96606.27415BA7@bc.edu> Message-ID: On Thu, 4 Feb 1999, J.W. Bizzaro wrote: > Here is a link to the Python-XML "home page", which I guess is difficult to find > from the XML-SIG page I wrote about earlier: > > http://www.python.org/topics/xml/index.html > > See, lots of XML stuff for Python :-) My reason for wanting to use Perl in specific parts of the software are not because of a lack of support in Python for specific libraries, but because of the purpose of the language. Perl is *designed* for processing text, Python is like a normal programming language, designed for calculation. If we have all of our communication between the various tools in pure XML, we can use any language we want for the tools. As for C, that should really only be used in the processor-intensive routines, most of which would be called from the Python scripts (as they are handling the computation). Perl would only be use when we want to input text and manipulate it into another text format, e.g. my project, the web interface. That is something where the sloppiness of Perl becomes useful and almost essential. In Python, the project is doable, but not nearly as easy because Python wasn't meant to do all of this stuff, or at least not as much as Perl was meant to do it. 
-- -> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <- -> -/-=-=-=-=-=-=-=-=-=/ { Rahul -<>- Jain } \=-=-=-=-=-=-=-=-=-\- <- -> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <- -> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain@usa.net -\- <- |--|--------|--------------|----|-------------|------|---------|-----|-| Version 10.423.999.211011001.23.20110101.042 (c)1996-1998, All rights reserved. Disclaimer available upon request. From bizzaro at bc.edu Thu Feb 4 17:25:21 1999 From: bizzaro at bc.edu (bizzaro@bc.edu) Date: Fri Feb 10 19:18:06 2006 Subject: [Pipet Devel] better link to Python-XML In-Reply-To: Message-ID: On Thu, 4 Feb 1999 10:29:02 -0600 (CST) rahul@photino.sid.rice.edu (Rahul Jain) wrote: >My reason for wanting to use Perl in specific parts of the software are >not because of a lack of support in Python for specific libraries, but >because of the purpose of the language. Perl is *designed* for processing >text, Python is like a normal programming language, designed for calculation. I don't see such a big difference between Perl and Python regarding their text handling capabilities. Python, like any good UNIX scripting language, uses ASCII as a standard mean of communication...so it has to be designed for it. However, since your project is dealing with even more text than the others, by having to manipulate HTML, you have a good point about needing the best tool for the job. >If we have all of our communication between the various tools in pure >XML, we can use any language we want for the tools. ...providing they can communicate with the Paos object server. > As for C, that should >really only be used in the processor-intensive routines, most of which >would be called from the Python scripts (as they are handling the computation). Right. >Perl would only be use when we want to input text and manipulate it into >another text format, e.g. my project, the web interface. 
That is something >where the sloppiness of Perl becomes useful and almost essential. In >Python, the project is doable, but not nearly as easy because Python >wasn't meant to do all of this stuff, or at least not as much as Perl was >meant to do it. Actually, your project, the Web interface, is not a part of the "core distribution" of Loci, as I've been defining it. It is the core that I am really fighting over trying to keep it all Python. The Web interface is the way for non-UNIX users (and UNIX users who don't have Loci installed) to access all/most of the analysis algorithms that will be tied into Loci via Internet servers. So, I think you can use whatever programming language you want to. But I think you'll find that we may come up with Python alternatives to the Perl tools you'll be using. Also, you want to be sure you can tie into the Paos server, so some Python may be necessary. Jeff bizzaro@bc.edu From bizzaro at bc.edu Wed Feb 10 11:02:48 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:16 2006 Subject: [Pipet Devel] GTK port to BeOS et al Message-ID: <36C1ADA8.6E258A8F@bc.edu> Locians, Here is an article about the porting of GTK to the BeOS. This is of interest to us because the plan for Loci is to develop under Python-GTK and wait for GTK to migrate to non-UNIX systems. I have been confident that the great interest in GTK will solve much of the portability issue for us. http://www.benews.com/story/?ID=623 Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From bizzaro at bc.edu Wed Feb 10 14:44:29 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:16 2006 Subject: [Pipet Devel] Hmmm Message-ID: <36C1E19D.7B8D1EFE@bc.edu> Locians, An interesting footnote: You may have seen the link on Loci overview page that sends you to the page for GCG Wisconsin Package pricing. 
The very first thing I say against GCG is how terribly expensive it is...$10,000 for this and that. http://www.gcg.com/ordering/price_schedule.html Well, that page _used_to_ give prices. They now have a list of phone numbers. It makes me wonder if we've been spotted ;-) Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From bizzaro at bc.edu Mon Feb 15 20:16:09 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:16 2006 Subject: [Pipet Devel] goings on Message-ID: <36C8C6D9.2D2B2F0@bc.edu> Hello Locians! I guess we've all been kind of quiet. I'm working on a better description of the project for the Web site. This is in preparation of some announcements I hope to post at some relevant places on the Internet. Once posted, I hope we can pick up some more developers, particularly people with experience in GTK. We have many more projects than people at this time. I also hope to get a new projects/TODO list to you guys soon. Also, I still haven't heard back from Peter Rice of the EMBOSS project. Everything is really quiet there. For your reading pleasure, I found an interesting editorial by Ajay Shah that seems to hit on some of the key features of our project. Here is an excerpt: (http://ny.us.mirrors.freshmeat.net/news/1998/11/15/911138358.html) Strategies for building applications software * Model 1: Clean core, with third party extensions The development model which fits open source the best, of course, is something like GIMP or Emacs, where a technically solid core is extensible by third parties. This is the most parallelisable development style which obtains the maximum human inputs from across the globe with minimal problems of coordination. If such a design can be applied to build a product, then I believe that `open source' always wins because of the range of extensions, and the code quality therein. 
The entry barrier of knowledge required to obtain the thrills of producing useful code is very low with the scripting languages used in such situations - as compared with starting from scratch writing in C. Hence it's easier for the project to recruit developers. I suspect this design will work for a spreadsheet and (to some extent) for a presentation program, but not really for a word processor. * Model 2: Moving the application onto the network The second way in which open source can make inroads is by making an established product category obsolete. If personal finance programs turn into Internet sites then the personal finance category ceases to exist. I have seen applications which are painful attempts at putting databases (on CD or on hard disk) for local querying under Microsoft Windows. This is ultimately obsolete because it's so much more sensible to simply query this same data over the Internet. Open source developers are in a unique position to apply this principle. Open source developers are innovative, and highly knowledgeable about the Internet. Open source developers have no qualms about cannibalising existing product lines, a hurdle which limits innovation with many shareholder-owned companies. To take a standalone application and convert it into an Internet service scores high marks on the coolness scale; it'd attract development talent. To the extent that innovative open source developers migrate existing application categories into Internet versions, the problem of replicating existing software is sidestepped. Of course, if Microsoft is able to own basic protocols of the Internet or of Internet commerce, then Internet applications could be even more closed than traditional MS Windows applications. Microsoft has thus far had a near--zero impact upon protocol or technology development in the context of the Internet, so this is not going to be easy for them. 
* Model 3: Applications which implement 20% of the features which account for 90% of the use

Every software product manager knows the misery of seeing 90% of users use only 20% of the features. I believe this is the direction from which new projects can rapidly come up against well-established incumbents.

I feel there is something misplaced about the debates about whether `open source' applications software match the features of mainstream commercial products. A product which contains 20% of the features of a mainstream word processor is adequate at the low-end market, since the bulk of the low-end market never uses the complex features anyway.

New projects should work to carefully isolate the features which the `open source' applications should match. It can't be very difficult for the wizards to hack up a filter which logs the features used by existing word processor users. Such a program, running at workplaces all over the world, would yield data about the features that are useful versus the features that aren't.

This is reminiscent of the discovery, in the days that preceded RISC, that compilers were only utilising a small core of the instruction set. I suspect that a program which implements one-fifth the complexity accounts for 90% of the usage. A clean reimplementation of this one-fifth of the features would be lean and bug-free when compared with the bloated implementations that are presently found with commercial user applications.

If this conjecture is on track, it implies that Microsoft's marketing department is confused in what they're trying on applications software complexity. I believe that 90% of humans will enjoy a lean word processor (with one-fifth the features of existing GUI word processors) and the remaining 10% would be better off with TeX.

Free, as in zero dollars

rms has talked at length about the issue of freedom, not price. I agree with him on the way his argument applies to the development process.
However, when we discuss large-scale adoption by computer users worldwide, I wonder if we're losing sight of the power of `free', as in zero dollars. If there was one thing I was surprised to not see in the `haloween memo', it was the discomfort that Microsoft must feel when competing against a price tag of 0. It is common to ask whether linux, apache etc. are beating Microsoft technically, and generally the answers are in the affirmative both on product and on development process. The debate is incomplete unless we also factor in the price at which users access the alternative products. This is where the "20% of the features" product becomes compelling. The competition is not between a 100% product and a 20% product at the same price. The competition is between a 100% product at commercial prices versus a 20% product at zero cost. Users would have to really want the remaining 80% of the features to put up the money for commercial software. My suspicion is that the fraction of users who use those remaining 80% of the features is around 10%. Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From bizzaro at bc.edu Thu Feb 18 10:09:37 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:17 2006 Subject: [Pipet Devel] [Fwd: EMBOSS] Message-ID: <36CC2D31.5E9025A2@bc.edu> Peter Rice from EMBOSS finally sent me a message. It seems they're making progress. Jeff bizzaro@bc.edu -------------- next part -------------- An embedded message was scrubbed... From: Peter Rice Subject: EMBOSS Date: Thu, 18 Feb 1999 09:21:13 GMT Size: 2018 Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990218/0e0fc19b/attachment.mht From bizzaro at bc.edu Thu Feb 18 10:49:28 1999 From: bizzaro at bc.edu (J.W. 
Bizzaro) Date: Fri Feb 10 19:18:17 2006 Subject: [Pipet Devel] Re: EMBOSS References: <199902180921.JAA09093@scarp.sanger.ac.uk> Message-ID: <36CC3688.89D0DA6@bc.edu> Peter Rice wrote: > Richard Durbin tells me you thought I had disappeared :-) Yes, I was beginning to wonder ;-) > We are busy working on an EMBOSS release. We now have 5 more > folk working on applications in Hinxton, with more to join soon. Great. TULIP/Loci is slowly progressing. We have 10 people on the list now, plus we are collaborating with Harry Mangalam of the tacg project: http://hornet.bio.uci.edu/~hjm/projects/tacg/tacg2.main.html Much of our work lately has been innovating a new object management or workflow system, which is the real meat of the project. We have drafted the developer of the Paos project, Carlos Maltzahn: http://www.cs.colorado.edu/~carlosm/software.html We hope TULIP/Loci will be a framework for connecting bioinformatics and structural biology programs of any type to a central GUI. And we still think Loci and EMBOSS can collaborate on this, since our projects are complementary not competing. I invite you to subscribe to our mailing list. Send an e-mail to majordomo@busboy.sped.ukans.edu with "subscribe tulip-list" in the message body. I'd like to know if I can subscribe to your "closed" emboss-dev mailing list...? > > I have been very busy with documentation and support for them. > There is a release 0.0.4 on our FTP server which is a nightly dump > of the current sources. Watch for changes in the file size to catch > new versions as it does not change every day. > > ftp://ftp.sanger.ac.uk/pub/pmr/emboss/EMBOSS_0.0.4.tar.gz > > We will be reviewing the documentation next week before releasing it > to the rest of the world. It may take a few extra days to patch it up. We look forward to it. Have you made any changes to the design since we last communicated, in December? Jeff -- J.W. 
Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From bizzaro at bc.edu Thu Feb 18 15:54:19 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:17 2006 Subject: [Pipet Devel] reply from Peter Message-ID: <36CC7DFB.206B303B@bc.edu> Locians, Attached is the reply from Peter. Peter, tacg may "overlap" EMBOSS, but Loci will not. Loci is only concerned with developing a framework for communication between tools, plus a set of small sequence/structure visualization/manipulation tools. Larger analysis programs will come from elsewhere (such as tacg and EMBOSS). We will not be creating anything new in that respect. Possibly the first thing we would like to implement from the EMBOSS project is Ajax/ACD. We have a "locus" being developed by Justin Bradford called "Gatekeeper", which will act as a gateway between loci and command-line analysis tools. Gatekeeper needs to convert queries/requests from Loci into command-line standard-in (much like Ajax) plus convert standard-out into XML. Jeff bizzaro@bc.edu -------------- next part -------------- An embedded message was scrubbed... From: Peter Rice Subject: Re: EMBOSS Date: Thu, 18 Feb 1999 16:03:37 GMT Size: 2407 Url: http://bioinformatics.org/pipermail/pipet-devel/attachments/19990218/79e9f418/attachment.mht From bizzaro at bc.edu Thu Feb 18 16:09:48 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:17 2006 Subject: [Pipet Devel] correction Message-ID: <36CC819C.439247CA@bc.edu> I guess it is both Justin and Harry that will be developing Gatekeeper. I need to get a new project list out :-) Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From bizzaro at bc.edu Tue Feb 23 19:18:30 1999 From: bizzaro at bc.edu (J.W. 
Bizzaro) Date: Fri Feb 10 19:18:17 2006 Subject: [Pipet Devel] more nice interfaces Message-ID: <36D34556.673C4A2B@bc.edu> Thomas, Attached are screenshots of some more interfaces you might want to look at. One is of a sequence editor, and the other is of a sequence aligner. Both are for Windows, but they are very much along the line of what I was thinking of: publication-quality WYSIWYG tools. These pics are from a commercial package called Vector NTI Suite, by InforMax. I don't know the exact price, but it is in the $1,000s, since they sent me an e-mail about a $700 discount. Here is the Web site: http://www.informaxinc.com/vntisuite/index.html They have a downloadable demo, if anyone is interested. BTW, how's the sequence editor coming along? Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- -------------- next part -------------- A non-text attachment was scrubbed... Name: seqedit.gif Type: image/gif Size: 30399 bytes Desc: not available Url : http://bioinformatics.org/pipermail/pipet-devel/attachments/19990224/4028d437/seqedit.gif -------------- next part -------------- A non-text attachment was scrubbed... Name: seqalign.gif Type: image/gif Size: 31233 bytes Desc: not available Url : http://bioinformatics.org/pipermail/pipet-devel/attachments/19990224/4028d437/seqalign.gif From bizzaro at bc.edu Thu Feb 25 23:27:27 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:17 2006 Subject: [Pipet Devel] stuff Message-ID: <36D622AF.AC5A67EC@bc.edu> Locians, Tidbits: We're getting an Internet line put in at UMass Lowell for the project. And a friend of mine is donating a Linux server to use until Ken Marx or I purchase a new one. We may see it up and running in a couple weeks. Plus I thought you'd like to read an interesting comparison between Python and Perl at the LinuxWorld Web site: http://linuxworld.com/linuxworld/expo/lw-python.html?0225 Jeff -- J.W. 
Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ --

From Thomas.Sicheritz at molbio.uu.se Fri Feb 26 09:20:43 1999
From: Thomas.Sicheritz at molbio.uu.se (Thomas.Sicheritz@molbio.uu.se)
Date: Fri Feb 10 19:18:17 2006
Subject: [Pipet Devel] more nice interfaces
In-Reply-To: <36D34556.673C4A2B@bc.edu>
References: <36D34556.673C4A2B@bc.edu>
Message-ID: <14038.43467.520971.536971@beagle.bmc.uu.se>

Hej again,

First, I don't remember if I have already replied to this ... :-)

> Attached are screenshots of some more interfaces you might want to look at. One
> is of a sequence editor, and the other is of a sequence aligner. Both are for
> Windows, but they are very much along the line of what I was thinking of:
> publication-quality WYSIWYG tools.

If you strip the Windows look and feel ... ok. - but I thought we were
going to make something more gimpish ...

Another commercial application:
A former colleague sent me this screendump (lousy quality)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: xwd.gif
Type: image/gif
Size: 128305 bytes
Desc: not available
Url : http://bioinformatics.org/pipermail/pipet-devel/attachments/19990226/518e40a0/xwd.gif
-------------- next part --------------

> BTW, how's the sequence editor coming along?

I just returned to work. I have just started to take a look into Python and
converted my biowish C module to a Python extension.
If anyone is interested: http://evolution.bmc.uu.se/~thomas/tulip/

Questions:
* how can I combine a Python module with a Python class definition?
  I want to add Python code to the C module ...
* how can I implement this Tcl code in Python?
  foreach i "reverse coplement antiparallel" {
      puts [eval bb_sequence.$i $seq]
  }
* what minimum set do I need for compiling gnome canvas?
  I really don't want to compile all possible (sound, game ..) modules on my
  Solaris box ...

c ya
-thomas
--
Sicheritz Ponten Thomas E.
Department of Molecular Biology  blippblopp@linux.nu
BMC, Uppsala University  BMC: +46 18 4714214
BOX 590 S-751 24 UPPSALA Sweden  Fax +46 18 557723
http://evolution.bmc.uu.se/~thomas
Molecular Tcl: http://evolution.bmc.uu.se/~thomas/tcl
Molecular Linux: http://evolution.bmc.uu.se/~thomas/mol_linux
De Chelonian Mobile ... The Turtle Moves ...

From hinsen at cnrs-orleans.fr Fri Feb 26 09:51:50 1999
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Fri Feb 10 19:18:17 2006
Subject: [Pipet Devel] more nice interfaces
In-Reply-To: <14038.43467.520971.536971@beagle.bmc.uu.se> (Thomas.Sicheritz@molbio.uu.se)
References: <36D34556.673C4A2B@bc.edu> <14038.43467.520971.536971@beagle.bmc.uu.se>
Message-ID: <199902261451.PAA15154@dirac.cnrs-orleans.fr>

> Questions:
> * how can I combine a python module with a python class definition
> I want to add python code to the c-module ...

Sorry, I don't understand what you are trying to do. Something with
Python and C and modules... Could you give a more detailed description?

> * how can I implement this tcl code in python ?
> foreach i "reverse coplement antiparallel" {
> puts [eval bb_sequence.$i $seq]
> }

I'd have to know what the Tcl code means! I suppose it's a loop over
three strings, which in Python is

    for i in ["reverse", "coplement", "antiparallel"]:
        ....

But I don't understand the stuff with "puts" etc.

> * what minimum set do I need for compiling gnome canvas ?
> I really don't want to compile all possible (sound,game ..) modules on my
> solarisbox ...

You mean Python modules? Certainly no more than what is activated
by default in the Python distribution. About Gnome, I don't know...

Konrad.
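
For what it's worth, here is a minimal sketch of how the Tcl loop quoted above might be written in Python. It rests on several assumptions: that "coplement" in the Tcl was meant as "complement", that `bb_sequence` exposes three operations named `reverse`, `complement` and `antiparallel`, and that each takes a sequence string and returns a string. The `BBSequence` class and the `apply_all` helper are hypothetical stand-ins written for illustration (in present-day Python), not the actual biowish extension.

```python
# Hypothetical stand-in for the bb_sequence C extension. The method names
# and their behaviour are assumptions, not taken from the real module.
class BBSequence:
    # Translation table mapping each DNA base to its complement.
    _PAIRS = str.maketrans("ACGTacgt", "TGCAtgca")

    def reverse(self, seq):
        return seq[::-1]

    def complement(self, seq):
        return seq.translate(self._PAIRS)

    def antiparallel(self, seq):
        # Reverse complement: reverse first, then complement each base.
        return self.complement(self.reverse(seq))


bb_sequence = BBSequence()


def apply_all(seq, names=("reverse", "complement", "antiparallel")):
    # getattr looks each operation up by name, which plays the role of
    # Tcl's [eval bb_sequence.$i $seq].
    return [getattr(bb_sequence, name)(seq) for name in names]


if __name__ == "__main__":
    for result in apply_all("ATGC"):
        print(result)  # print plays the role of Tcl's puts
```

If the real extension is a module rather than an object, `getattr(module, name)` should work the same way, since module-level functions are looked up as attributes too.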
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From hjm at cx408397-a.irvn1.occa.home.com Fri Feb 26 12:18:12 1999 From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam) Date: Fri Feb 10 19:18:17 2006 Subject: [Pipet Devel] more nice interfaces In-Reply-To: <14038.43467.520971.536971@beagle.bmc.uu.se> Message-ID: You're right - that is a lousy screen shot :), but brightened up, it becomes readable, and it actually looks pretty nice - what is the application? It would be nice to know what functionality underlies the pretty face. hjm On Fri, 26 Feb 1999 Thomas.Sicheritz@molbio.uu.se wrote: > Hej again, > > First, I don't remeber if I ahve allready replied to this ... :-) > > > Attached are screenshots of some more interfaces you might want to look at. One > > is of a sequence editor, and the other is of a sequence aligner. Both are for > > Windows, but they are very much along the line of what I was thinking of: > > publication-quality WYSIWYG tools. > > If you strip the windows feel and look ... ok. - but I thought we were > going to make something more gimpish ... > > Another commerc. application: > A former colleague send me this screendump (lousy quality) > > Cheers, Harry Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com From david.lapointe at umassmed.edu Fri Feb 26 13:33:30 1999 From: david.lapointe at umassmed.edu (david.lapointe@umassmed.edu) Date: Fri Feb 10 19:18:17 2006 Subject: [Pipet Devel] more nice interfaces In-Reply-To: Message-ID: <93307F07DE63D211B2F30000F808E9E525D6CF@edunivexch02.umassmed.edu> Yes I am curious also. It seems to be java applets. What is BioWeb? 
Next: foreach i "reverse coplement antiparallel" { puts [eval bb_sequence.$i $seq] } I would imagine that bb_sequence.{reverse complement antiparallel} returns $seq as a reversed, complemented, or reverse-complemented string through puts ( write string out ). David -----Original Message----- > From: Harry Mangalam [mailto:hjm@cx408397-a.irvn1.occa.home.com] > Sent: Friday, February 26, 1999 12:18 PM > To: tulip-list@busboy.sped.ukans.edu > Subject: Re: [Pipet Devel] more nice interfaces > > > You're right - that is a lousy screen shot :), but brightened > up, it becomes > readable, and it actually looks pretty nice - what is the > application? It > would be nice to know what functionality underlies the pretty face. > > hjm > > On Fri, 26 Feb 1999 Thomas.Sicheritz@molbio.uu.se wrote: > > > Hej again, > > > > First, I don't remeber if I ahve allready replied to this ... :-) > > > > > Attached are screenshots of some more interfaces you > might want to look at. One > > > is of a sequence editor, and the other is of a sequence > aligner. Both are for > > > Windows, but they are very much along the line of what I > was thinking of: > > > publication-quality WYSIWYG tools. > > > > If you strip the windows feel and look ... ok. - but I > thought we were > > going to make something more gimpish ... > > > > Another commerc. application: > > A former colleague send me this screendump (lousy quality) > > > > > > Cheers, > Harry > > Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com > > From bizzaro at bc.edu Fri Feb 26 15:49:03 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] more nice interfaces References: <36D34556.673C4A2B@bc.edu> <14038.43467.520971.536971@beagle.bmc.uu.se> Message-ID: <36D708BF.44C4AEA8@bc.edu> Thomas.Sicheritz@molbio.uu.se wrote: > If you strip the windows feel and look ... ok. - but I thought we were > going to make something more gimpish ... Gimpish, meaning everything gets its own little window? 
Yes, unless 2+ things are much better being in the same window. A file list on the side, as I keep seeing, seems to be a convenient feature.

In any case, the most important thing (and this is where comparisons to GIMP come in) is that the data appears just as it would be printed in a publication. So, in a sense, what the users are doing is manipulating a picture, image, photo, whatever.

> Another commercial application:
> A former colleague sent me this screendump (lousy quality)

I am also interested in just what that is a picture of. It seems to be a rather comprehensive little package written in Java.

> > BTW, how's the sequence editor coming along?
> I just returned to work. I have just started to take a look into python and
> converted my biowish C module to a python extension.
> if anyone is interested: http://evolution.bmc.uu.se/~thomas/tulip/

Grrreat! ;-)

> Questions:
> * what minimum set do I need for compiling gnome canvas ?
> I really don't want to compile all possible (sound,game ..) modules on my
> solarisbox ...

I think I can answer this one! You need to get just the gnome-libs distribution. For Python bindings, you need just gnome-python, which is at 0.100.0 right now, I think.

BTW, GTK+ 1.2 and PyGTK 0.5.11 just came out. gnome-python 0.100.0 comes with PyGTK 0.5.11. But following the PyGTK developments closely, I have to warn everyone that there are some major revisions occurring now, so that something made in PyGTK 0.5.6 will probably need major revisions to work with PyGTK 1.0, when it comes out. This should not be a great concern to us since we have almost nothing written. But I am still confident that Python-GNOME/GTK is the best path for us.

Along this line, I was reading about Corel's decision to support the WINE project, which lets Windows programs run on UNIX. They consider Windows to be the development/deployment environment, which is then made "portable and transparent" by WINE. I think our use of UNIX works in the reverse.
We can develop for Python/GTK/GNOME/UNIX, for which there are efforts to port to Windows, etc. Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From bizzaro at bc.edu Fri Feb 26 16:17:53 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] licensing Message-ID: <36D70F81.7F3E7103@bc.edu> Locians, I'm sure you know that Loci/TULIP is supposed to be licensed under the GNU General Public License (GPL). But there is also the LGPL or Library GPL. What is the major difference between these two? Why does the LGPL exist? It turns out that the wording of the GPL prevents programs licensed as such from being incorporated into non-free or proprietary programs (GPL says that any project that extends the work covered by GPL must also be GPL). And this would cover links to any library. So, legally, one cannot connect a proprietary program to a GPL program. If you guys have been following the debate over KDE and GNOME, this is at the heart of the issue: KDE is GPL, but Qt (the library) is owned by Troll, which is "illegal". So, what about Loci? If we use GPL, can just anyone link their apps into it, as we intended? No. But this is where the LGPL comes in. Knowing how restrictive it would be licensing libraries under GPL, GNU/FSF made the LGPL. This simply removes the clause in GPL that all programs that link to the library/program be free too. All other aspects of the GPL remain. GTK and GNOME, by the way, are LGPL. But using LGPL doesn't mean your program is a library. GNU/FSF is actually going to change the name of LGPL to "Lesser GPL". Therefore, I think we should license Loci under LGPL. This is an important issue to settle now, even though Loci is vaporware, because the source code will be available as soon as it is written. For example, Thomas's sequence editor is somewhat non-vapor. 
The good news is, Harry, tacg won't have to be GPL to be "a part of" Loci. We wrote before about tacg's license, how it restricts commercial use/distribution. Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From david.lapointe at umassmed.edu Fri Feb 26 16:23:36 1999 From: david.lapointe at umassmed.edu (david.lapointe@umassmed.edu) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] Check out this URL Message-ID: <93307F07DE63D211B2F30000F808E9E525D6D1@edunivexch02.umassmed.edu> I think this is where the Green GIF came from. Pictures at 10:00! http://www.informaxinc.com/ssbm/ssbm.html David Lapointe Manager - Research Computing Services UMass Medical School Worcester, MA 01655 508/856-5141 From bizzaro at bc.edu Fri Feb 26 16:36:53 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] and still more licensing Message-ID: <36D713F5.BC37BEBF@bc.edu> By the way, every source file that we generate must include this copyright statement from GNU. Of course you can use your name for name of author, but please include The BIC Group, which is the rest of us. Example: Copyright (C) 1999 by Konrad Hinsen and The BIC Group -------------------------cut--------------------------------- Copyright (C) by This library is free software; you can redistribute it and/or modify it under the terms of the GNU Library General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public License for more details. 
You should have received a copy of the GNU Library General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. -------------------------cut--------------------------------- Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From hjm at cx408397-a.irvn1.occa.home.com Fri Feb 26 16:38:54 1999 From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] Check out this URL In-Reply-To: <93307F07DE63D211B2F30000F808E9E525D6D1@edunivexch02.umassmed.edu> Message-ID: Ahh yes, Informax's new Oracle-based infosystem - starting at $2M. It better be good... Still, nice of them to do interface prototyping for us.. ;) hjm On Fri, 26 Feb 1999 david.lapointe@umassmed.edu wrote: /I think this is where the Green GIF came from. Pictures at 10:00! / /http://www.informaxinc.com/ssbm/ssbm.html / /David Lapointe /Manager - Research Computing Services /UMass Medical School /Worcester, MA 01655 /508/856-5141 / / Cheers, Harry Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com From bizzaro at bc.edu Fri Feb 26 17:06:12 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] Check out this URL References: <93307F07DE63D211B2F30000F808E9E525D6D1@edunivexch02.umassmed.edu> Message-ID: <36D71AD4.99D145A9@bc.edu> Jeeeez! Does the concept seem a little familiar? BTW, this is the same company from which I got the first pics. You know, I have been thinking seriously about taking Loci one step further and making it a system for Internet-wide research collaboratives, between loosely affiliated people. 
It's something I still have to clear with Ken Marx, but I was thinking that we, The BIC Group, could use Loci to collaborate on some "open" research projects, making an "open laboratory" that treats scientific research like a GNU software project. Any thoughts? Jeff bizzaro@bc.edu david.lapointe@umassmed.edu wrote: > > I think this is where the Green GIF came from. Pictures at 10:00! > > http://www.informaxinc.com/ssbm/ssbm.html > > David Lapointe > Manager - Research Computing Services > UMass Medical School > Worcester, MA 01655 > 508/856-5141 -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From bizzaro at bc.edu Fri Feb 26 17:09:19 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] Check out this URL References: Message-ID: <36D71B8E.68A02D59@bc.edu> Harry Mangalam wrote: > > Ahh yes, Informax's new Oracle-based infosystem - starting at $2M. It > better be good... $2,000,000 or $2,000??? For how many users? I can hardly believe it's 2 million. > Still, nice of them to do interface prototyping for > us.. ;) Yes, hehehe ;-) Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From hjm at cx408397-a.irvn1.occa.home.com Fri Feb 26 17:13:39 1999 From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] Check out this URL In-Reply-To: <36D71B8E.68A02D59@bc.edu> Message-ID: I'm not sure it still stands, but their previous promo listed this as part of a $2M (that's MILLION) system for bioinformatics. You had to buy the oracle db from them as well as pay substantial support costs. cf Incyte's system which uses SGI's Mineset for $1M -$2M/year and it doersn;t sound so bizar .. oops .. strange. hjm On Fri, 26 Feb 1999, J.W. 
Bizzaro wrote: /Harry Mangalam wrote: /> /> Ahh yes, Informax's new Oracle-based infosystem - starting at $2M. It /> better be good... / /$2,000,000 or $2,000??? For how many users? I can hardly believe it's 2 /million. / /> Still, nice of them to do interface prototyping for /> us.. ;) / /Yes, hehehe ;-) / / /Jeff /-- /J.W. Bizzaro Phone: 617-552-3905 /Boston College mailto:bizzaro@bc.edu /Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ /-- / Cheers, Harry Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com From david.lapointe at umassmed.edu Fri Feb 26 17:14:03 1999 From: david.lapointe at umassmed.edu (david.lapointe@umassmed.edu) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] Check out this URL In-Reply-To: <36D71AD4.99D145A9@bc.edu> Message-ID: <93307F07DE63D211B2F30000F808E9E525D6D4@edunivexch02.umassmed.edu> Yeah, I realized that just after I sent that message. $2M ? Seems like a lot but if you've invested $20 million( or more) in sequencing hardware what's $2M to make it work? Are you talking about Collaboratories? That is an interesting concept. David David Lapointe Manager - Research Computing Services UMass Medical School Worcester, MA 01655 508/856-5141 > -----Original Message----- > From: J.W. Bizzaro [mailto:bizzaro@bc.edu] > Sent: Friday, February 26, 1999 5:06 PM > To: tulip-list@busboy.sped.ukans.edu > Subject: Re: [Pipet Devel] Check out this URL > > > Jeeeez! Does the concept seem a little familiar? BTW, this > is the same company > from which I got the first pics. > > You know, I have been thinking seriously about taking Loci > one step further and making it a system for Internet-wide research collaboratives, > between loosely affiliated people. It's something I still have to clear with > Ken Marx, but I was thinking that we, The BIC Group, could use Loci to > collaborate on some "open" research projects, making an "open laboratory" that > treats scientific research like a GNU software project. Any thoughts? 
> > > Jeff > bizzaro@bc.edu > From hjm at cx408397-a.irvn1.occa.home.com Fri Feb 26 17:58:08 1999 From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] Check out this URL In-Reply-To: <93307F07DE63D211B2F30000F808E9E525D6D4@edunivexch02.umassmed.edu> Message-ID: Hi Again, Well, this wasn;t part of my original interest in this group and may be well-suited for it, but let me describe one of the things I'm working on (partially supported by National Center for Genomic Resources (NCGR, out of Santa Fe, NM) in support of a yeast genomics project at UC Irvine. The UCI group has gotten an Affymetrix Genechip machine and is busy subjecting yeast to various stresses, generating whole-genome datasets for time points along this stress. I'm building a relational database with a web interface that will suck up those datasets (and be amenable to accepting data from other such gene expression studies) and allow it to be queried on various params, as well as subjecting the returned values to various statistical analyses with the stats language 'R' (a clone of S/SPlus), using gnuplots for the simple outputs, VRML for complex viz's. Because the size of the datasets are so large (6k orfs x 4 timepoints, plus associated pointers, descriptors, images, etc) and the number of them is going to be pretty big, I'm using mysql as a prototyping system, with perl glue, talking thru Apache/FASTCGI, replacing the perl with C as I identify bottlenecks. There will be a generic interface to commandline apps (other clustering routines, tacg, clustalw, blast, etc, so that it can become pretty extensible. NCGR may rewrite it at commercial quality to support their plant genomics project, but I get to do the fun part... 
I hadn't considered it, but you bring up the possibility of using such a system as a collaboratory by making the analyses persistent in some way, either as paths thru an analysis or the analysis itself (altho that would get very large very fast) so that they might be re-used or extended by others interested in the topic. Or maybe just the paths thru an analysis would be an important resource - if I could somehow record the 'analysis track' that users took, I could identify, then automate them so that the whole pathway could be boiled down to a button. This is WELL off the LOCI topic, but perhaps the 2 could be designed to communicate at some level. As I said, it was never the intent for the above-described project to use LOCI, but if they can be made to better co-exist so much the better. Cheers Harry On Fri, 26 Feb 1999 david.lapointe@umassmed.edu wrote: /Yeah, I realized that just after I sent that message. / /$2M ? Seems like a lot but if you've invested $20 million( or more) /in sequencing hardware what's $2M to make it work? / /Are you talking about Collaboratories? That is an interesting concept. / / /David / /David Lapointe /Manager - Research Computing Services /UMass Medical School /Worcester, MA 01655 /508/856-5141 / / /> -----Original Message----- /> From: J.W. Bizzaro [mailto:bizzaro@bc.edu] /> Sent: Friday, February 26, 1999 5:06 PM /> To: tulip-list@busboy.sped.ukans.edu /> Subject: Re: [Pipet Devel] Check out this URL /> /> /> Jeeeez! Does the concept seem a little familiar? BTW, this /> is the same company /> from which I got the first pics. /> /> You know, I have been thinking seriously about taking Loci /> one step further and making it a system for Internet-wide research /collaboratives, /> between loosely affiliated people. 
It's something I still have to clear /with /> Ken Marx, but I was thinking that we, The BIC Group, could use Loci to /> collaborate on some "open" research projects, making an "open laboratory" /that /> treats scientific research like a GNU software project. Any thoughts? /> /> /> Jeff /> bizzaro@bc.edu /> / Cheers, Harry Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com From bizzaro at bc.edu Fri Feb 26 19:16:26 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] collaboratories - was Check out this URL References: Message-ID: <36D7395A.96CD0CCE@bc.edu> Hello Harry. It does sound as if the project you described is a great example of how Loci could be used as a collaboratory. Even if the base installation of Loci does not include every tool needed to do this, the license would allow NCGR to extend it and repackage it...providing the original Loci code remains LGPL. I don't know if they'd like to sell it (you said they'd like to rewrite it at commercial quality), but contrary to popular belief, GNU programs can be sold, just like Linux. The key is that you don't make it proprietary; you're just selling packaged media and support. ***We're hitting on an important strategy here for Loci. What is most important is that Loci becomes ubiquitous and highly accepted. By not restricting commercial use or redistribution, we're going a long way toward that goal. Personally, I don't care about getting rich off of anything. But anyone can make money from Loci, by distributing CD-ROM's, manuals, etc. I think even selling server time for those server-side analyses is an option. Yeah, any kind of collaboratory may be implemented once we set Loci up to do that sort of thing. I don't see it being a big step beyond the whole concept of a distributed workflow system. Public or private, open or closed, we can do it. Harry, were you referring to an open collaboratory or a closed one? 
Can you guys imagine the impact this could have on the field if Loci were to be successful? Jeff bizzaro@bc.edu Harry Mangalam wrote: > > Hi Again, > > Well, this wasn;t part of my original interest in this group and may be > well-suited for it, but let me describe one of the things I'm > working on (partially supported by National Center for Genomic Resources > (NCGR, out of Santa Fe, NM) in support of a yeast genomics project at UC > Irvine. > > The UCI group has gotten an Affymetrix Genechip machine and is busy > subjecting yeast to various stresses, generating whole-genome datasets for > time points along this stress. I'm building a relational database with a > web interface that will suck up those datasets (and be amenable to accepting > data from other such gene expression studies) and allow it to be queried on > various params, as well as subjecting the returned values to various > statistical analyses with the stats language 'R' (a clone of S/SPlus), using > gnuplots for the simple outputs, VRML for complex viz's. > > Because the size of the datasets are so large (6k orfs x 4 timepoints, plus > associated pointers, descriptors, images, etc) and the number of them is > going to be pretty big, I'm using mysql as a prototyping system, with perl > glue, talking thru Apache/FASTCGI, replacing the perl with C as I > identify bottlenecks. There will be a generic interface to commandline apps > (other clustering routines, tacg, clustalw, blast, etc, so that it can > become pretty extensible. NCGR may rewrite it at commercial > quality to support their plant genomics project, but I get to do the fun > part... > > I hadn't considered it, but you bring up the possibility of using such a > system as a collaboratory by making the analyses persistent in some way, > either as paths thru an analysis or the analysis itself (altho that would > get very large very fast) so that they might be re-used or extended by > others interested in the topic. 
Or maybe just the paths thru an analysis > would be an important resource - if I could somehow record the 'analysis > track' that users took, I could identify, then automate them so that the > whole pathway could be boiled down to a button. > > This is WELL off the LOCI topic, but perhaps the 2 could be designed to > communicate at some level. > As I said, it was never the intent for the above-described project to use > LOCI, but if they can be made to better co-exist so much the better. > > Cheers > Harry > > On Fri, 26 Feb 1999 david.lapointe@umassmed.edu wrote: > > /Yeah, I realized that just after I sent that message. > / > /$2M ? Seems like a lot but if you've invested $20 million( or more) > /in sequencing hardware what's $2M to make it work? > / > /Are you talking about Collaboratories? That is an interesting concept. > / > / > /David > / > /David Lapointe > /Manager - Research Computing Services > /UMass Medical School > /Worcester, MA 01655 > /508/856-5141 > / > / > /> -----Original Message----- > /> From: J.W. Bizzaro [mailto:bizzaro@bc.edu] > /> Sent: Friday, February 26, 1999 5:06 PM > /> To: tulip-list@busboy.sped.ukans.edu > /> Subject: Re: [Pipet Devel] Check out this URL > /> > /> > /> Jeeeez! Does the concept seem a little familiar? BTW, this > /> is the same company > /> from which I got the first pics. > /> > /> You know, I have been thinking seriously about taking Loci > /> one step further and making it a system for Internet-wide research > /collaboratives, > /> between loosely affiliated people. It's something I still have to clear > /with > /> Ken Marx, but I was thinking that we, The BIC Group, could use Loci to > /> collaborate on some "open" research projects, making an "open laboratory" > /that > /> treats scientific research like a GNU software project. Any thoughts? > /> > /> > /> Jeff > /> bizzaro@bc.edu > /> > / > > Cheers, > Harry > > Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com -- J.W. 
Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From bizzaro at bc.edu Fri Feb 26 20:28:12 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] language lawyers Message-ID: <36D74A2C.5B2D9432@bc.edu> For you language lawyers out there, more of what I just wrote about GPL vs. LGPL can be found at the following sites: Richard Stallman argues libraries should be GPL: http://www.gnu.org/philosophy/why-not-lgpl.html Eric Kidd rebuts, says use LGPL: http://www.randomhacks.com/~emk/why-lgpl-good.html Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From hjm at cx408397-a.irvn1.occa.home.com Sat Feb 27 10:40:39 1999 From: hjm at cx408397-a.irvn1.occa.home.com (Harry Mangalam) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] collaboratories - was Check out this URL In-Reply-To: <36D7395A.96CD0CCE@bc.edu> Message-ID: On Sat, 27 Feb 1999, J.W. Bizzaro wrote: /It does sound as if the project you described is a great example of how Loci /could be used as a collaboratory. Even if the base installation of Loci does /not include every tool needed to do this, the license would allow NCGR to extend /it and repackage it...providing the original Loci code remains LGPL. I have to make a distinction here - NCGR is paying me to develop code that does a particular job in support of a project they're very interested in. While I have my own agenda in terms of freedom and redistribution of code (to which they're surprisingly open), they have their own agenda. They are a non-profit org, but they were set up as a hot-house agency to spin off for-profits if possible. 
So while they plan to make all the services they develop freely available to the public, they want to reserve the right to spin off a company to exploit that code for companies that want to replicate the system behind their firewall. Therefore, there are some additional problems in basing their code on LOCI. I know that they're aware of LOCI because I told them about it (one of my functions is to find out about ideas out there that seem to be worth paying attention to, like gnome and other CORBA services, like BSML, like Bioperl, etc that look like they are worthwhile), but it's up to their executive to decide which to support. So while I support the idea of LOCI, and will spend time trying to integrate aspects of the genex db with LOCI, it doesn't mean that NCGR will officially support it. The problem with who owns intellectual property is HUGE in SW (I just resigned from UCI because of it to work on NCGR's project), so don't go looking for large developers to leap onto the free software bandwagon - there is huge resistance, especially from their legal depts. The success of Redhat and Gnu/Linux is changing that, but slowly. I'm counting on it b/c I'm starting a company to try to do (sort of) the same thing, with my software - the core software is free, but I'll sell support, customization, and interface components to those who want/need them... That said, for what NCGR wants to do, it seems to me that the software is almost incidental; what they're really selling is the integration technology and support (not unlike Redhat itself). They COULD give away the software, charge only for support and that would in fact make more $ for them, as they would then benefit from other free software developers contributing to the code base. / /I don't know if they'd like to sell it (you said they'd like to rewrite it at /commercial quality), but contrary to popular belief, GNU programs can be sold, /just like Linux.
The key is that you don't make it proprietary; you're just /selling packaged media and support. EXACTLY. You put the words right in my mouth ;). /***We're hitting on an important strategy here for Loci. What is most important /is that Loci becomes ubiquitous and highly accepted. By not restricting /commercial use or redistribution, we're going a long way toward that goal. / /Personally, I don't care about getting rich off of anything. But anyone can /make money from Loci, by distributing CD-ROM's, manuals, etc. I think even /selling server time for those server-side analyses is an option. / /Yeah, any kind of collaboratory may be implemented once we set Loci up to do /that sort of thing. I don't see it being a big step beyond the whole concept of /a distributed workflow system. Public or private, open or closed, we can do /it. Harry, were you referring to an open collaboratory or a closed one? /Can you guys imagine the impact this could have on the field if Loci were to be /successful? Yup - it would have a big impact, but there are lots of similar projects going on in 'coopetition', so it's important to actually produce something. Bio-perl has already started regular dists of their package, and EMPRESS will start soon. It's demo or die. (I'm one to speak - I really haven't done anthing yet except flap my lips (they move when I type), but as soon as I finish the commandline version of tacg V3 (in final packaging for beta release and documentation now), I'll put some time on trying to LOCI-lize it.) /Jeff /bizzaro@bc.edu / / /Harry Mangalam wrote: /> /> Hi Again, /> /> Well, this wasn;t part of my original interest in this group and may be /> well-suited for it, but let me describe one of the things I'm /> working on (partially supported by National Center for Genomic Resources /> (NCGR, out of Santa Fe, NM) in support of a yeast genomics project at UC /> Irvine. 
/> /> The UCI group has gotten an Affymetrix Genechip machine and is busy /> subjecting yeast to various stresses, generating whole-genome datasets for /> time points along this stress. I'm building a relational database with a /> web interface that will suck up those datasets (and be amenable to accepting /> data from other such gene expression studies) and allow it to be queried on /> various params, as well as subjecting the returned values to various /> statistical analyses with the stats language 'R' (a clone of S/SPlus), using /> gnuplots for the simple outputs, VRML for complex viz's. /> /> Because the size of the datasets are so large (6k orfs x 4 timepoints, plus /> associated pointers, descriptors, images, etc) and the number of them is /> going to be pretty big, I'm using mysql as a prototyping system, with perl /> glue, talking thru Apache/FASTCGI, replacing the perl with C as I /> identify bottlenecks. There will be a generic interface to commandline apps /> (other clustering routines, tacg, clustalw, blast, etc, so that it can /> become pretty extensible. NCGR may rewrite it at commercial /> quality to support their plant genomics project, but I get to do the fun /> part... /> /> I hadn't considered it, but you bring up the possibility of using such a /> system as a collaboratory by making the analyses persistent in some way, /> either as paths thru an analysis or the analysis itself (altho that would /> get very large very fast) so that they might be re-used or extended by /> others interested in the topic. Or maybe just the paths thru an analysis /> would be an important resource - if I could somehow record the 'analysis /> track' that users took, I could identify, then automate them so that the /> whole pathway could be boiled down to a button. /> /> This is WELL off the LOCI topic, but perhaps the 2 could be designed to /> communicate at some level. 
/> As I said, it was never the intent for the above-described project to use /> LOCI, but if they can be made to better co-exist so much the better. /> /> Cheers /> Harry /> /> On Fri, 26 Feb 1999 david.lapointe@umassmed.edu wrote: /> /> /Yeah, I realized that just after I sent that message. /> / /> /$2M ? Seems like a lot but if you've invested $20 million( or more) /> /in sequencing hardware what's $2M to make it work? /> / /> /Are you talking about Collaboratories? That is an interesting concept. /> / /> / /> /David /> / /> /David Lapointe /> /Manager - Research Computing Services /> /UMass Medical School /> /Worcester, MA 01655 /> /508/856-5141 /> / /> / /> /> -----Original Message----- /> /> From: J.W. Bizzaro [mailto:bizzaro@bc.edu] /> /> Sent: Friday, February 26, 1999 5:06 PM /> /> To: tulip-list@busboy.sped.ukans.edu /> /> Subject: Re: [Pipet Devel] Check out this URL /> /> /> /> /> /> Jeeeez! Does the concept seem a little familiar? BTW, this /> /> is the same company /> /> from which I got the first pics. /> /> /> /> You know, I have been thinking seriously about taking Loci /> /> one step further and making it a system for Internet-wide research /> /collaboratives, /> /> between loosely affiliated people. It's something I still have to clear /> /with /> /> Ken Marx, but I was thinking that we, The BIC Group, could use Loci to /> /> collaborate on some "open" research projects, making an "open laboratory" /> /that /> /> treats scientific research like a GNU software project. Any thoughts? /> /> /> /> /> /> Jeff /> /> bizzaro@bc.edu /> /> /> / /> /> Cheers, /> Harry /> /> Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com / /-- /J.W. Bizzaro Phone: 617-552-3905 /Boston College mailto:bizzaro@bc.edu /Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ /-- / Cheers, Harry Harry J Mangalam -- (949) 856 2899 -- mangalam@home.com From bizzaro at bc.edu Sat Feb 27 17:47:06 1999 From: bizzaro at bc.edu (J.W. 
Bizzaro) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] collaboratories - was Check out this URL References: Message-ID: <36D875EA.30CBB70B@bc.edu> Harry Mangalam wrote: > > So while I support the idea of LOCI, it and will spend time trying to > integrate aspects of the genex db with LOCI, it doesn;t mean that NCGR will > officially support it. The problem with who owns intellectual property is > HUGE in SW (I just resigned from UCI because of it to work on NCGR's > project), so don't go looking for large developers to leap onto the freee > software bandwagon - there is huge resistancce, especially from their legal > depts. Yhe success of Redhat and Gnu/Linux is changing that, but slowly. > I'm counting on it b/c I'm starting a company to try to do (sort of) the > same thing, with my software - the core software is free, but I'll sell > support, customization, and interface components to those who want/need > them)... Oh yeah, intellectual property is a very big deal everywhere, with companies and schools getting all of the rights and employees and students getting none...or so it seems. One thing I have to take care of regarding Loci is getting a disclaimer from UMass Lowell. The University is much better about these things than some of the really big schools, like UCI or even MIT. FSF actually says something about this... You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker. signature of Ty Coon, 1 April 1990 Ty Coon, President of Vice Whether or not it is that simple, I'll have to see. I guess I'll be visiting the Chancellor soon. Something that is unclear to me, however, and maybe you guys can give me your opinion, is if a copyrighter can change the GNU license. 
In other words, just because UMass Lowell may be one of the copyright holders on Loci, does that mean they can decide to make it proprietary? The GNU license appears to be immutable, and if so, should it matter if the institution shares the copyright? Have you thought about this with respect to NCGR, Harry? > Yup - it would have a big impact, but there are lots of similar projects > going on in 'coopetition', so it's important to actually produce something. > Bio-perl has already started regular dists of their package, and EMPRESS > will start soon. I don't think I am aware of EMPRESS. Do you have a URL? Unless you mean EMBOSS? > It's demo or die. (I'm one to speak - I really haven't done anthing yet > except flap my lips (they move when I type), but as soon as I finish the > commandline version of tacg V3 (in final packaging for beta release and > documentation now), I'll put some time on trying to LOCI-lize it.) Yeah, me too. I'd like to start pumping some code out, but where do I begin? One big issue now is that most of the GUI tools will share a common core. We should be sure not to reinvent that for each tool. So, in a sense, Thomas will be breaking ground for most others. Also, I want to get a GTK hacker to work on the benchtop (GCL or "Work Flow Diagram"). Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From justin at ukans.edu Sat Feb 27 18:24:27 1999 From: justin at ukans.edu (Justin Bradford) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] Loci markup language and infrastructure things Message-ID: I've been busy with school lately (in fact, I really should be studying right now for an exam Monday), so I haven't gotten much of anything done. However, I've been reading over BioML, BSML, and the bioperl site, and I have some ideas about the markup language. First, reading BSML files makes a lot of things seem overly complex.
Second, BioML looks cleaner, but I hate the organism tag enclosing everything. While that information could be useful for a structure or sequence, it would be better to reference it, rather than enclosing it. Also, BSML doesn't seem to cover protein sequences, while BioML does. However, BSML does seem to allow for more thorough definition of features in the sequence. Aesthetically, I prefer BioML over BSML, and I think that's just because BioML uses different tag names for various features of the sequences, while BSML just has a general feature tag with lots of options. Also, BSML, and even BioML to a degree, try to define display information as well. Do we want that in our ML? I can't see why we would need it, since we have an intelligent client. BSML seems to be intended for direct display in a generic BSML browser, in addition to defining data. BSML has a second DTD with that layout stuff removed, however. BioML has tags for forms, which seem totally unnecessary. I would like to effectively merge BioML and BSML, incorporating protein sequence information and feature specification, and use more descriptive tag names (like BioML) for defining the sequences and features. I wouldn't put any layout information in. Does anyone think we need it? Also, for structure, there don't appear to be any MLs even attempting to do this, with the exception of CML. So, my idea is to take the PDB file format and XMLize it. If any of you know any glaring holes in PDB let me know, and we can work around those. Also, these sections will need some tags to allow for defining relationships between multiple objects. It might describe homology, alignment, etc. between two or more sequences, or for structures, it might relate 3D similarities, regions of high interaction (binding probabilities through free energy calculations), and other similar concepts. Generated data should also return information about the analysis process, like the algorithm used, statistical probabilities, etc. 
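To make the "take the PDB file format and XMLize it" idea above a little more concrete, here is a minimal Python sketch of turning one fixed-width PDB ATOM record into an XML element. It is only an illustration under stated assumptions: the tag name "atom" and the attribute names are invented for this example and are not part of any proposed Loci DTD, and only a handful of the record's columns are handled.

```python
import xml.etree.ElementTree as ET


def atom_record_to_xml(line):
    """Convert one fixed-width PDB ATOM/HETATM record into an XML element.

    The element and attribute names are hypothetical; only the column
    positions come from the PDB format itself.
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record: %r" % record)
    atom = ET.Element("atom")                   # hypothetical tag name
    atom.set("serial", line[6:11].strip())      # columns 7-11
    atom.set("name", line[12:16].strip())       # columns 13-16
    atom.set("residue", line[17:20].strip())    # columns 18-20
    atom.set("chain", line[21:22].strip())      # column 22
    atom.set("resseq", line[22:26].strip())     # columns 23-26
    # Coordinates occupy three 8-character fields starting at column 31.
    for axis, start in (("x", 30), ("y", 38), ("z", 46)):
        atom.set(axis, str(float(line[start:start + 8])))
    return atom


# Build a sample record with explicit field widths so the columns line up.
sample = "%-6s%5d %-4s%1s%3s %1s%4d%1s%3s%8.3f%8.3f%8.3f" % (
    "ATOM", 1, " N", "", "MET", "A", 1, "", "", 27.340, 24.430, 2.614)

if __name__ == "__main__":
    print(ET.tostring(atom_record_to_xml(sample)).decode("ascii"))
```

Keeping the coordinates as attributes rather than child elements is just one possible choice; a real schema would also have to decide how to group atoms into residues and chains.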
Now that is just the "data" section. A LociML file will have a variety of additional information as well. We'll probably need control, status, and query sections, too. Control has to describe the analysis pathway. Status is information concerning the data returned at each analysis step. Query has to hold the actual query at each step. Now, the control section is fairly straightforward, as is the status section, although both will need to be fairly flexible. Both could hold incidental information concerning an analysis that might be useful to the client. I don't really have any good examples, but I imagine some will come up. The query section is more complex, but here's my idea: When the user creates the analysis pathway, all of the query commands are generated at that time as well, but they can make use of variables referencing data from queries in earlier stages. The workflow system will fill in the variables for a query before sending it off for that analysis. Here is a crude example: Analyzing sequence... lots of other stuff here .... data.step[q1].protein.* ... There obviously needs to be a lot of detail filled in here, but I think this gets my basic idea across. Also, there's no particular reason there couldn't be multiple entries for a stage. That's why I defined every component of a query by an id, rather than by its stage. Since the first few steps of an analysis pathway might not depend on previous data, we could have multiple steps occurring simultaneously. There's no reason for all of the steps to be sequential. This would be especially true of a pathway which had a number of database queries. Actually, we could probably get rid of the whole ordering thing completely, since the wfs could just figure out dependencies by the variable references in the queries. Of course, the interface for this could be more complicated... Also, it probably makes more sense to move all of the input data into the data section, and have the query reference it there.
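The idea that the wfs could work out ordering purely from variable references can be sketched as follows. This is only an illustration under assumptions: the reference syntax is modeled on the `data.step[q1].protein.*` fragment above, and the query texts, the ids `q2` through `q4`, and the function names are all hypothetical.

```python
import re

# Hypothetical reference syntax, modeled on "data.step[q1].protein.*".
REF = re.compile(r"data\.step\[(\w+)\]")

def dependencies(queries):
    """Map each query id to the set of query ids its text refers to."""
    return {qid: set(REF.findall(text)) for qid, text in queries.items()}

def run_order(queries):
    """Group query ids into batches that could run simultaneously.

    Every query in a batch depends only on queries in earlier batches.
    """
    deps = dependencies(queries)
    done, batches = set(), []
    while len(done) < len(deps):
        ready = sorted(q for q in deps if q not in done and deps[q] <= done)
        if not ready:
            raise ValueError("circular or unresolved reference")
        batches.append(ready)
        done.update(ready)
    return batches

# Placeholder queries; only the data.step[...] references matter here.
queries = {
    "q1": "fetch a protein sequence",
    "q2": "fetch a second protein sequence",
    "q3": "align data.step[q1].protein.* with data.step[q2].protein.*",
    "q4": "summarize data.step[q3].alignment.*",
}

print(run_order(queries))  # [['q1', 'q2'], ['q3'], ['q4']]
```

Here `q1` and `q2` have no references, so they land in the first batch together, which is the "multiple steps at once" case; no explicit ordering attribute is needed.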
Also, the format of specifying variables and input in general will probably need to be improved. In terms of implementation, I imagine it would work like this: The wfs identifies queries it can currently run, and creates a Paos object on the specified server, giving it only the portions of the XML file necessary for it to run (query and relevant data sections). The input data goes into one attribute of the Paos object. The remote analysis system creates a second attribute containing status tags, and when it's complete, it creates an output section with its new data. The wfs can frequently grab the status attribute on the object, since it's small, and update its local copy for any clients who want to know what is going on. When the analysis is complete, the wfs grabs the output attribute off of the remote object and updates its copy, and moves on. The remote analysis system just drops its object once it has been acknowledged by the wfs. Any thoughts on the markup language, the query syntax, variable references, asynchronous analyses, or the workflow system (wfs, if you were wondering what I was referring to)? I'll start my BioML, BSML, PDB merger/implementation/cleanup. Once we agree on how Loci works underneath, a rough wfs/paos/gatekeeper system can be set up fairly quickly. Then just a quick python wrapper around some analysis tool and a simple viewer program will give us a functioning system (not a particularly easy to use system, but functioning nonetheless). Justin Bradford justin@ukans.edu From bizzaro at bc.edu Sat Feb 27 23:11:04 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:18 2006 Subject: [Pipet Devel] Loci markup language References: Message-ID: <36D8C1D8.DF9B7F3@bc.edu> I'll reply to a few LocusML (or LociML?) points here and then infrastructure points in another e-mail.
Justin Bradford wrote: > > I've been busy with school lately (in fact, I really should be studying > right now for an exam Monday), so I haven't gotten much of anything done. That's fine. We all appreciate the work you've done, especially this e-mail/book ;-) > First, reading BSML files makes a lot of things seem overly complex. > Second, BioML looks cleaner, but I hate the organism tag enclosing > everything. While that information could be useful for a structure or > sequence, it would be better to reference it, rather than enclosing it. I think the importance of an organism tag depends on the audience. Most biochemists couldn't care less about the organism. But to microbiologists, geneticists and the like, this information is very important. What matters to you, it seems, is that the organism information _has_to_be_ present. But I think as long as it _can_ be inserted at some level, we'll do fine. > Also, BSML doesn't seem to cover protein sequences, while BioML does. > However, BSML does seem to allow for more thorough definition of features > in the sequence. Of course we'll take the best of both worlds :-) > Also, BSML, and even BioML to a degree, try to define display information > as well. Do we want that in our ML? I can't see why we would need it, > since we have an intelligent client. No, we don't need display information. You're absolutely correct that each locus should be intelligent enough to know how to interpret the data that are targeted for it (and what locus to pass other data types to if they are encountered). > I would like to effectively merge BioML and BSML, incorporating protein > sequence information and feature specification, and use more descriptive > tag names (like BioML) for defining the sequences and features. I wouldn't > put any layout information in. Does anyone think we need it? By layout, you mean display information? I don't think we need it. 
> Also, for structure, there don't appear to be any MLs even attempting to > do this, with the exception of CML. So, my idea is to take the PDB file > format and XMLize it. If any of you know any glaring holes in PDB let > me know, and we can work around those. Now Konrad's ears should have perked up here. He'll have the final word on a format for structural information, but I recall he does not like any of the well-accepted formats for structure, especially not PDB. This is Konrad's chance to show the world what the perfect description of structure looks like ;-) What I do want, with respect to PDB's however, is an easy way to translate from PDB to LocusML, because PDB is the major format for 3D structure right now. So, Konrad, can you help us make LocusML the perfect structural (among other things) ML? Is there a way we can change CML to describe biomacromolecules the way you want it to? > Also, these sections will need some tags to allow for defining > relationships between multiple objects. It might describe homology, > alignment, etc. between two or more sequences, or for structures, it > might relate 3D similarities, regions of high interaction (binding > probabilities through free energy calculations), and other similar > concepts. Yes. That's something I haven't thought much about. > Generated data should also return information about the analysis process, > like the algorithm used, statistical probabilities, etc. Yes, great! Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From bizzaro at bc.edu Sun Feb 28 03:44:27 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:19 2006 Subject: [Pipet Devel] infrastructure things References: Message-ID: <36D901EB.BAAA97B0@bc.edu> Justin et al, I'll get to what you wrote about infrastructure things in the next e-mail, but first I'd like to make a few points. 
You wrote an e-mail a couple months ago about how you think the workflow system would function, from the point of view of an XML file created by the Benchtop, monitored by the Benchtop/GCL, traveling to the Gatekeeper, and back. But I want to bring up some questions about the true mobility of the XML file. Just how confusing would everything get if each locus got possession of either (1) the one-and-only XML file or (2) just a copy? Problem with case (1): What if the XML needs to be split for forked analyses? E.g., the user has a sequence, gets an aligned sequence from a database, and now wants to do something else with the new sequence. What happens to the XML file? Do we make a copy of the entire file (case 2!) to be used with the new sequence, or do we cut the XML file in half...so to speak? Problem with case (2): Will the information ever have to be sewn back together? E.g., there is a fork in an analysis, as described for case (1). Will we ever have to consider the whole analysis a single XML file, bringing all pieces back together? Or do we consider each fork/child to be a new analysis, never to be rejoined with its parent? Another confusing point is the idea that the XML file actually moves. I referred to it once as a basketball that is passed between players, but everyone should be comfortable with the fact that each file will remain where it was created...AND THIS IS TRUE EVEN FOR SERVER-SIDE ANALYSES! The way I see it, we have a Python program on the client machine that handles all of the interactions with the Gatekeeper. So, EACH LOCUS WON'T HAVE TO DEAL DIRECTLY WITH THE GATEKEEPER! They deal with "Porta Internet", which makes everything transparent or seem like it is all on the client machine. (The same is true for Porta CORBA.) Maybe instead of basketball players tossing a basketball around, the basketball tosses the players around :-) You wrote about how Benchtop/GCL "updates a local copy" of the XML.
I personally think each locus should update the XML it is working with (the "Locus-In-Charge" or LIC), by itself, so as not to overwhelm Benchtop. (Realize that there should be no limit on the number of loci/processes spawned for forked analyses, so Benchtop would have to handle in some cases a lot of communication...maybe hundreds of XML files...in a word, it would be a "bottleneck".) In the case of server-side analyses, going thru Porta Internet and Gatekeeper, Gatekeeper should not use Benchtop to update the XML and take the next step, rather I think it should be Porta Internet, the LIC. Now what about those spawned loci/processes? If Benchtop were the only LIC, all spawned processes would be the first generation children of Benchtop. But if each locus were capable of spawning its own child, and that child capable of spawning its own, the workload would just be much more distributed--each locus would be an LIC. One thing leads to another, if you recall that song by The Fixx. At this point, we need to answer the questions I proposed above. I think if the analysis needs to fork, the LIC should copy the XML, put relevant instructions in each copy, spawn two loci for the task, handing the copies over. (And maybe at this point the parent can be closed.) But the copies won't be automatically sewn back together at the end (we could have an option to combine XML's, as an afterthought here). But, in the way I think things should work, would those little drawings on the Benchtop give the user an indication of what is going on, or what the progress is? You thought that this is how the Benchtop would operate, which is a very good idea. And we do need _some_ sort of communication for this. So if we let the LIC's handle everything, can each LIC just send a simple "hello" back to the Benchtop?
If a new child is spawned, maybe the first thing that child does is tell the Benchtop what it is and where it came from...Maybe this function could also be used to build a database of loci available to the users...? In short, we are thinking of a highly distributed set of intelligent agents existing all over. Benchtop should be the user's eyes to the whole world of Loci, not the brain of Loci. Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From bizzaro at bc.edu Sun Feb 28 04:07:46 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:19 2006 Subject: [Pipet Devel] infrastructure things References: <36D901EB.BAAA97B0@bc.edu> Message-ID: <36D90762.7DD44C37@bc.edu> You know, what I just wrote says nothing about how Paos and the workflow system fit into the whole scheme. And, in fact, some of my points may be point-less in light of the wfs. For example, Justin describes the wfs as being responsible for launching loci; I said each locus launches its own child. We'll just have to look more closely at Paos and wfs as they develop and see how these issues should be resolved. In any case, Benchtop is a no-brainer ;-) Jeff -- J.W. Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From bizzaro at bc.edu Sun Feb 28 05:13:19 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:19 2006 Subject: [Pipet Devel] and still more infrastructure things References: Message-ID: <36D916BF.7852DB4B@bc.edu> Okay, so what you once considered the responsibility of the Benchtop/GCL, you now consider that of the wfs. So, I'll try to look at the XML as an object rather than a file this time. And wfs launches the apps, not individual loci/clients.
Justin Bradford wrote: > Generated data should also return information about the analysis process, > like the algorithm used, statistical probabilities, etc. Of course we should make a sharp division at the start between data that is biological and data that is for the workflow system. I even imagine the very top of the file/object to be all workflow stuff. > Now that is just the "data" section. A LociML file will have a variety of > additional information as well. We'll probably need control, status, and > query sections, too. > Control has to describe the analysis pathway. ...description of the whole pathway > Status is information concerning the data returned at each analysis step. ...what was collected along the way > Query has to hold the actual query at each step. ...what still needs to be collected > Now, the control section is fairly straightforward, as is the status > section, although both will need to be fairly flexible. Incidental > information concerning an analysis that might be useful to the client. I > don't really have any good examples, but I imagine some will come up. Status should contain the "log" of the analyses. Status will say what control says, among other things, when the final destination is reached. So, at the final destination, control is irrelevant. > The query section is more complex, but here's my idea: > When the user creates the analysis pathway, all of query commands are > generated at that time as well, but it can make use of variables > referencing data from queries in earlier stages. The workflow system will > fill in the variables for a query before sending it off for that analysis. Sure. IOW, the query section is dynamic. > Here is a crude example: > > > > > > > > Analyzing sequence... > > So status is reported back, via wfs, to the Benchtop, a la my previous e-mail. Good. > > > lots of other stuff here > > > > > > > > .... > > > > > > > > > > data.step[q1].protein.* > > ... > > > > > > > > Nice. 
But how will Paos handle this? Are we looking at some major changes to Paos itself? > Also, there's no particular reason there couldn't be multiple entries for > a stage. stage == step? Or I guess a step can contain different stages... > That's why I defined every component of a query by an id, rather > than by it's stage. Since the first few steps of an analysis pathway > might not depend on previous data, we could have multiple steps occuring > simultaneously. There's no reason for all of the steps to be sequential. Right. That'd save time, but be difficult to manage. Now we're talking about concurrency. > This would be especially true of a pathway which had a number of database > queries. Actually, we could probably get rid of the whole ordering thing > completely, since the wfs could just figure out dependencies by the > variable references in the queries. Of course, the interface for this > could be more complicated... Hmmm. Now are we dealing with the whole forking/sewing issue here? Once an XML object is split up, will it have to be put back together again? > Also, it probably makes more sense to move all of input data into the > data section, and have the query reference it there. Also, the format > of specifying variables and input in general will probably need to be > improved. I was thinking about keeping workflow data together. Also, ID numbers could be longer and randomly generated. > In terms of implementation, I imagine it would work like this: > The wfs identifies queries it can currently run How? By the database of available loci/clients? > and creates a > Paos object on the specified server ...via Porta Internet or whatever, as long as it appears transparent. > giving it only the portions > of the xml file necessary for it to run (query and relevant data > sections). Yeah, this is where I see Porta Internet or Gatekeeper filtering out stuff the server-side algorithms/databases don't need. 
> The input data goes into one attribute of the Paos object. > The remote analysis system creates a second attribute containing for > status tags, and when it's complete, it creates an output section > with it's new data. Okay. > The wfs can frequently grab the status attribute > on the object, since it's small, and update it's local copy for any > clients who want to know what is going on. Yes. Wonderful! > When the analysis is > complete, the wfs grabs the output attribute off of the remote > object and updates it's copy, and moves on. The remote analysis > system just drops it's object once it has been acknowledged by the > wfs. Okay. > Any thoughts on the markup language, the query syntax, variable > references, asynchronous analyses, or the workflow system (wfs, if > you were wondering what I was refering to)? Just work with Konrad on the markup of structure. > I'll start my BioML, > BSML, PDB merger/implementation/cleanup. Once we agree on how Loci > works underneath, a rough wfs/paos/gatekeeper system can be set up > fairly quickly. Then just a quick python wrapper around some analysis > tool and a simple viewer program will give us a functioning system > (not a particularly easy to use system, but functioning > nonetheless). I'm glad you think this will go quickly. Are you able to work with Paos as it is, or will Carlos need to make changes? How comfortable are you with the Python? Buh-bye! Jeff bizzaro@bc.edu From justin at ukans.edu Sun Feb 28 05:52:17 1999 From: justin at ukans.edu (Justin Bradford) Date: Fri Feb 10 19:18:19 2006 Subject: [Pipet Devel] and still more infrastructure things In-Reply-To: <36D916BF.7852DB4B@bc.edu> Message-ID: > Okay, so what you once considered the responsibility of the Benchtop/GCL, you > now consider that of the wfs. It's a separate process, but like an extension of the Benchtop/GCL, it just handles all of the little details behind the scenes. > So, I'll try to look at the XML as an object rather than a file this > time. 
And > wfs launches the apps, not individual loci/clients. I think I need a clarification on the meaning of a locus. My understanding was a locus is a term covering an instance of Porta/Gatekeeper/analysis tool(s) on a computer somewhere. It's just a place where analysis is done, and that's it. The wfs system worries about direction of the whole object. > Of course we should make a sharp division at the start between data that is > biological and data that is for the workflow system. I even imagine the very > top of the file/object to be all workflow stuff. I agree. I had intended to make a generic C/BS/BioML2 format first. Then this would be what's under the data sections, so LociML would just encapsulate that portion of it. As for the algorithm and statistics stuff, I was thinking of that as something potentially useful to keep in with sequence/structure/relation data. For instance, it could be useful to know a structure was derived using some particular X-ray crystallography technique. That stuff is related to Loci. > > Control has to describe the analysis pathway. > ...description of the whole pathway Yeah. Just an XML version of the GCL view. > > Status is information concerning the data returned at each analysis step. > ...what was collected along the way More specifically, how the collection went. Actual data would get stuck back in a block under . > Nice. But how will Paos handle this? Are we looking at some major changes to > Paos itself? I don't think so. My intention was to have the wfs only send what that specific analysis needed. Input, output, and status each have an attribute on the object. The wfs sends input once, reads output once (and merges the new data with the full object), and gets constant updates on the status attribute. So whenever the analysis tool changes status, the wfs knows, and the benchtop can be updated (assuming any are paying attention at the moment).
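A rough sketch of the input/output/status arrangement just described, using a plain Python class as a stand-in. To be clear, this is not the Paos API; the class, the function names, and the dictionary layout of the "full object" are all assumptions made for illustration, and everything runs in one process rather than across a network.

```python
class AnalysisObject:
    """Local stand-in for the remote object; NOT the Paos API."""

    def __init__(self, input_data):
        self.input = input_data  # written once by the wfs
        self.status = "queued"   # small, so it can be polled often
        self.output = None       # read once, when the analysis is done


def run_tool(obj, tool):
    """What the remote analysis side would do with its object."""
    obj.status = "running"
    obj.output = tool(obj.input)
    obj.status = "done"


def wfs_step(full, query_id, needed_keys, tool):
    """Send only the needed data, then merge output and status back."""
    obj = AnalysisObject({key: full["data"][key] for key in needed_keys})
    run_tool(obj, tool)
    full["status"][query_id] = obj.status
    full["data"][query_id] = obj.output
    return full


# Hypothetical full analysis object held by the wfs.
full = {"data": {"seq1": "MKTAYIAK", "notes": "not needed by the tool"},
        "status": {}}

wfs_step(full, "q1", ["seq1"], lambda d: {"length": len(d["seq1"])})
print(full["status"]["q1"], full["data"]["q1"])  # done {'length': 8}
```

The tool only ever sees `seq1`, which mirrors the point that the analysis gets just the portion it needs, while the wfs keeps the complete object.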
> > Also, there's no particular reason there couldn't be multiple entries for > > a stage. > > stage == step? Or I guess a step can contain different stages... The stage, step, and order terminology I used in the example XML are all bad and need to be changed, but the idea was just that multiple things could be happening at once. > Right. That'd save time, but be difficult to manage. Now we're talking about > concurrency. > > Hmmm. Now are we dealing with the whole forking/sewing issue here? > Once an XML > object is split up, will it have to be put back together again? Concerning the dependency scheduling, it wouldn't be difficult to manage this from a central server, as I was envisioning the wfs. If an object roamed independently, it would be difficult to manage, unless we had all of the threads regroup when data needed to be rejoined. > I was thinking about keeping workflow data together. > > Also, ID numbers could be longer and randomly generated. Yes, it needs to be restructured. Many of the ID numbers would be assigned by the GCL to XML query translator. > > The wfs identifies queries it can currently run > > How? By the database of available loci/clients? However GCL defines it. I imagine explicitly naming a server as one option, or just specifying a type of analysis, where the wfs will use a list of some kind to find one available. But before it contacts the server, it has to make sure it has all of the data available for its query (check dependencies). > > giving it only the portions > > of the xml file necessary for it to run (query and relevant data > > sections). > > Yeah, this is where I see Porta Internet or Gatekeeper filtering out > stuff the > server-side algorithms/databases don't need. I had imagined the wfs server doing that, but I imagine our difference is in semantics. Basically, the analysis tool just gets what it needs. > Just work with Konrad on the markup of structure.
Ok Konrad, I'm interested in hearing your ideas on describing structures. > I'm glad you think this will go quickly. Are you able to work with > Paos as it > is, or will Carlos need to make changes? At the very least, I can pass blocks of XML through attributes on the paos object. It would be interesting to see if the Paos object could be a mirror of the XML, however. So: Ok Becomes: paos_object.status.message = 'Ok' But I can work without that. > How comfortable are you with the > Python? I miss enclosed blocks, but otherwise I'm doing ok. { whitespace usage should be random . you can just parse around it. } What odd things amuse me at 5AM. Justin Bradford justin@ukans.edu From bizzaro at bc.edu Sun Feb 28 06:44:03 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:19 2006 Subject: [Pipet Devel] and still more infrastructure things References: Message-ID: <36D92C03.1AF788EF@bc.edu> We're both late night or early morning people, huh? :-) Justin Bradford wrote: > I think I need a clarification on the meaning of a locus. My understanding > was a locus is term covering an instance of Porta/Gatekeeper/analysis > tool(s) on a computer somewhere. It's just a place where analysis is done, > and that's it. The wfs system worries about direction of the whole object. "Locus" just means any program or object, and I mean _any_. The name "Loci" then emphasizes that this is a distributed system to the extreme. But usually I mean a client or server process. > I agree. I had intended to make a generic C/BS/BioML2 format first. Then > this would be what's under the data sections, so LociML would just > encapsulate that portion of it. Fine with me. > As for the algorithm and statistics stuff, I was thinking of that as > something potentially useful to keep in with sequence/structure/relation > data. For instance, it could be useful to know a structure was derived > using some particular X-ray crystallography technique. That stuff is > related to Loci. Hmmm. 
It almost lies between biological and workflow data. I suppose it could go either place, but the workflow stuff is just temporary really. When the data is to be archived, we don't need to keep old status and query data around. > > > Status is information concerning the data returned at each analysis step. > > ...what was collected along the way > > More specifically, how the collection went. Actual data would get stuck > back in a block under . Okay. > > Nice. But how will Paos handle this? Are we looking at some major changes to > > Paos itself? > > I don't think so. My intention was to have the wfs only send what that > specific analysis needed. Input, output, and status each have an attribute > on the object. The wfs sends input once, reads output once (and merges the > new data with the full object), and gets constant updates on the status > attribute. So whenever the analysis tool changes status, the wfs knows, > and the benchtop can be updates (assuming any are paying attention at the > moment). This brings up a question I wrote at the end of this e-mail. > > Right. That'd save time, but be difficult to manage. Now we're talking about > > concurrency. > > > > Hmmm. Now are we dealing with the whole forking/sewing issue here? > > Once an XML > > object is split up, will it have to be put back together again? > > Concerning the dependency scheduling, it wouldn't be difficult to manage > this from a central server, as I was envisioning the wfs. If an object > roamed independently, it would be difficult to manage, unless we had it > all of the threads regroup when data needed to be rejoined. Of course we can deal with this after we are comfortable with the basic wfs. > Yes, it needs to be restructured. Many of the ID numbers would be assigned > by the GCL to XML query translator. Okay. > > > The wfs identifies queries it can currently run > > > > How? By the database of available loci/clients? > > However GCL defines it. 
I imagine explicitly naming a server as one > option, or just specifying a type of analysis, where the wfs will use a > list of some kind to find one available. > But before it contacts the server, it has to make sure it has all of the > data available for its query (check dependencies). Yes. We define dependencies as data, servers, and clients (loci). > > > giving it only the portions > > > of the xml file necessary for it to run (query and relevant data > > > sections). > > > > Yeah, this is where I see Porta Internet or Gatekeeper filtering out > > stuff the > > server-side algorithms/databases don't need. > > I had imagined the wfs server doing that, but I imagine are difference is > in semantics. Basically, the analysis tool just gets what it needs. Right, we agree on the end but not the mean...We'll sort that out. > At the very least, I can pass blocks of XML through attributes on the paos > object. It would be interesting to see if the Paos object could be a > mirror of the XML, however. > So: > > Ok > > Becomes: > paos_object.status.message = 'Ok' > > But I can work without that. That brings up a big question I had, and where I've been getting confused... Is there really any such thing as an "XML object"? I mean, XML is a way to save structured data as a _file_. Python objects, on the other hand, are data structures in memory. We would just be going back and forth between file and object using XML. So, where do we really need XML? Could the data just be a Python object? If we need to save the object, I think it can just be "pickled"? Konrad? Guten Morgen! -- J.W. 
Bizzaro Phone: 617-552-3905 Boston College mailto:bizzaro@bc.edu Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/ -- From justin at ukans.edu Sun Feb 28 07:09:18 1999 From: justin at ukans.edu (Justin Bradford) Date: Fri Feb 10 19:18:19 2006 Subject: [Pipet Devel] and still more infrastructure things In-Reply-To: <36D92C03.1AF788EF@bc.edu> Message-ID: > We're both late night or early morning people, huh? :-) Yes, I think so. The damned Internet never lets me sleep anymore ;) > > Concerning the dependency scheduling, it wouldn't be difficult to manage > > this from a central server, as I was envisioning the wfs. If an object > > roamed independently, it would be difficult to manage, unless we had it > > all of the threads regroup when data needed to be rejoined. > > Of course we can deal with this after we are comfortable with the basic wfs. Well, I'm not sure this has been entirely answered. Will the wfs handle all the analyses from a single centralized process? Or do you still want for a decentralized analysis object (pathway), where each node sends the object to the next node, rather than going back to the wfs each time? > Yes. We define dependencies as data, servers, and clients (loci). What is the distinction between a client and server? Is the wfs a client and the analysis tool (gatekeeper) a server? > Is there really any such thing as an "XML object"? I mean, XML is a > way to save structured data as a _file_. Python objects, on the other > hand, are data structures in memory. We would just be going back and > forth between file and object using XML. Yes. I just tend to think of the Paos object structured like the XML file. I guess I was basically asking for a DOM interface. > So, where do we really need XML? Could the data just be a Python > object? If we need to save the object, I think it can just be "pickled"? Well for internal network stuff, it makes sense to just use the Python object. 
Like I said, I imagine it structured something like the XML file I described earlier. Also, for saving a Loci analysis locally, I would prefer to see it written out to an XML format. The conversion would be fairly simple, anyway. Rather than an obscure, semi-binary format, why not use an easy to read text format? It'll make it easier for non-Loci tools to get information from our files, too. But you're right, there's no reason to use XML for anything but files and maybe drag and drop (but that's not important for now). I wasn't thinking about that earlier. Can Paos support complex objects, though? Actually, can Python for that matter? Can I have things status.analysis[5].message in both? Justin Bradford justin@ukans.edu From bizzaro at bc.edu Sun Feb 28 07:56:32 1999 From: bizzaro at bc.edu (J.W. Bizzaro) Date: Fri Feb 10 19:18:19 2006 Subject: [Pipet Devel] and still more infrastructure things References: Message-ID: <36D93D00.7123859B@bc.edu> Justin Bradford wrote: > Well, I'm not sure this has been entirely answered. Will the wfs handle > all the analyses from a single centralized process? Or do you still want > for a decentralized analysis object (pathway), where each node sends the > object to the next node, rather than going back to the wfs each time? My intuition and experience tell me a decentralized pathway will be less complex, work more efficiently, and be considerably faster. > What is the distinction between a client and server? > Is the wfs a client and the analysis tool (gatekeeper) a server? (sigh) I'm now trying to use your terminology, I think... client - process performing analysis or visualization server - process controlling workflow and clients What I was referring to in past e-mails is this... client - local machine server - remote machine You can see, mixing these up can be confusing ;-) > > Is there really any such thing as an "XML object"? I mean, XML is a > > way to save structured data as a _file_. 
Python objects, on the other
> > hand, are data structures in memory. We would just be going back and
> > forth between file and object using XML.
>
> Yes. I just tend to think of the Paos object structured like the XML file.
> I guess I was basically asking for a DOM interface.

_Structured_, but we are parsing then writing.

> > So, where do we really need XML? Could the data just be a Python
> > object? If we need to save the object, I think it can just be "pickled"?
>
> Well for internal network stuff, it makes sense to just use the Python
> object. Like I said, I imagine it structured something like the XML file I
> described earlier. Also, for saving a Loci analysis locally, I would
> prefer to see it written out to an XML format. The conversion would be
> fairly simple, anyway. Rather than an obscure, semi-binary format, why not
> use an easy to read text format? It'll make it easier for non-Loci tools
> to get information from our files, too.
>
> But you're right, there's no reason to use XML for anything but files and
> maybe drag and drop (but that's not important for now). I wasn't thinking
> about that earlier.

Thinking about this a bit more, we do need to work from a file on disk because
our data can be so large. If a user has 15-20 GUI loci opened at once, and they
are all from DNA Polymerase PDB's and 100 kb GenBank files, and all of this is
in RAM, we'll hear Scotty in the background saying, "She can't take any more of
this Captain. She's falling apart at the seams!"

But let's reverse the question. If we need XML files for (1) working with large
data, (2) passing data across the Internet and to CORBA systems, and (3)
archiving data, then what do we need Paos for?

I know it was my idea to choose Paos, but I'm asking if everyone thinks it fits,
and where it fits, considering the model I've been describing.

> Can Paos support complex objects, though? Actually, can Python for that
> matter? Can I have things like status.analysis[5].message in both?
I'm not sure about that structure, but from what I know of Python, it'll handle
anything any modern language can.

Jeff
--
J.W. Bizzaro Phone: 617-552-3905
Boston College mailto:bizzaro@bc.edu
Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/
--

From bizzaro at bc.edu Sun Feb 28 08:11:46 1999
From: bizzaro at bc.edu (J.W. Bizzaro)
Date: Fri Feb 10 19:18:19 2006
Subject: [Pipet Devel] and still more infrastructure things
References: <36D93D00.7123859B@bc.edu>
Message-ID: <36D94092.E3E60859@bc.edu>

Replying to my own e-mail...

"J.W. Bizzaro" wrote:
> But let's reverse the question. If we need XML files for (1) working with large
> data, (2) passing data across the Internet and to CORBA systems, and (3)
> archiving data, then what do we need Paos for?

> I know it was my idea to choose Paos, but I'm asking if everyone thinks it fits,
> and where it fits, considering the model I've been describing.

How about having Paos handle the workflow data while XML handles the biological,
instead of defining an XML that mixes the two?

I guess it depends on just how Paos passes the objects. It would be best if
Paos could be called just once, each time the workflow data needed to be passed,
passing it not only to the next process but to the Benchtop too...hmmm. I
wouldn't want Loci to be any more centralized than that.

Jeff
--
J.W. Bizzaro Phone: 617-552-3905
Boston College mailto:bizzaro@bc.edu
Department of Chemistry http://www.uml.edu/Dept/Chem/Bizzaro/
--

From carlosm at moet.cs.colorado.edu Sun Feb 28 19:54:22 1999
From: carlosm at moet.cs.colorado.edu (Carlos Maltzahn)
Date: Fri Feb 10 19:18:19 2006
Subject: [Pipet Devel] and still more infrastructure things
In-Reply-To: <36D94092.E3E60859@bc.edu>
Message-ID:

[J.W. Bizzaro]
But let's reverse the question. If we need XML files for (1) working with
large data, (2) passing data across the Internet and to CORBA systems, and
(3) archiving data, then what do we need Paos for?
I know it was my idea to choose Paos, but I'm asking if everyone thinks it
fits, and where it fits, considering the model I've been describing.

[Later]
How about having Paos handle the workflow data while XML handles the
biological, instead of defining an XML that mixes the two?

I guess it depends on just how Paos passes the objects. It would be best
if Paos could be called just once, each time the workflow data needed to
be passed, passing it not only to the next process but to the Benchtop
too...hmmm. I wouldn't want Loci to be any more centralized than that.

I totally agree that Paos shouldn't shuffle around real data. I see the
role of Paos as a coordination tool but not as a database management
system.

I attached a GIF picture to this mail. This picture contains Gnome
clients, a Paos server, and a Tool Manager (excuse me if I introduce yet
another set of terms). Gnome clients and Tool Managers are Paos clients. A
Gnome client consists of a GCL editor and progress monitor, among other
things. A Tool Manager

- parses XML data and forwards it to the actual tool,
- turns the result of a tool into XML data and sends it to another tool
  manager,
- sends status information to a Paos server (e.g. processing started or
  completed, or processing ran out of memory),
- receives notifications from a Paos server (e.g. "suspend", "abort", or
  status query),
- queries a Paos server about where to send results.

The thin lines communicate Python objects; the thick lines communicate XML
structures. Note that the destination of a Tool Manager can also be a
Gnome client, which is used to visualize results.

Another question in the discussion was whether to use Python objects for
communication or XML. XML is safer because it is an accepted and
extensible standard. However, transferring serialized objects was the
performance bottleneck in the Chautauqua workflow system (which uses Paos)
and I introduced a bit of trickery to reduce this overhead.
So I would recommend sticking with Python objects for Paos communications
but using XML for everything else.

Carlos

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tulip-architecture.gif
Type: image/gif
Size: 3548 bytes
Desc: architecture
Url : http://bioinformatics.org/pipermail/pipet-devel/attachments/19990228/060d6d63/tulip-architecture.gif
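
A minimal sketch of the split discussed in this thread (Python objects for
messaging between processes, XML for files), written in present-day Python.
It is not Paos or Loci code: the field names (tool, message), the XML layout,
and the helpers status_to_xml and status_from_xml are hypothetical, and
types.SimpleNamespace only stands in for whatever object type Paos actually
passes. It also illustrates that a nested access such as
status.analysis[5].message is ordinary Python.

```python
import pickle
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def status_to_xml(status):
    """Write the status object to a small, made-up XML layout (assumption)."""
    root = ET.Element("status")
    for index, step in enumerate(status.analysis):
        node = ET.SubElement(root, "analysis", index=str(index), tool=step.tool)
        ET.SubElement(node, "message").text = step.message
    return ET.tostring(root, encoding="unicode")


def status_from_xml(text):
    """Rebuild the same nested object from the XML text produced above."""
    root = ET.fromstring(text)
    steps = [
        SimpleNamespace(tool=node.get("tool"), message=node.findtext("message"))
        for node in root.findall("analysis")
    ]
    return SimpleNamespace(analysis=steps)


# A nested, attribute-style structure: a status holding a list of steps.
status = SimpleNamespace(
    analysis=[SimpleNamespace(tool=f"tool{i}", message="pending") for i in range(6)]
)
status.analysis[5].message = "processing completed"

# Object-to-object transfer: pickle keeps the Python structure intact.
wire_bytes = pickle.dumps(status)
restored = pickle.loads(wire_bytes)

# File or cross-tool exchange: a readable text form any XML parser can load.
xml_text = status_to_xml(status)
reloaded = status_from_xml(xml_text)

print(restored.analysis[5].message)
print(xml_text)
```

The pickled bytes are compact but only meaningful to Python, while the XML
text is readable by non-Python tools at the cost of an explicit conversion
step in each direction.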