From M.Liebman at wriwindber.org Wed Aug 1 10:31:43 2007
From: M.Liebman at wriwindber.org (Michael Liebman)
Date: Wed, 1 Aug 2007 10:31:43 -0400
Subject: [BiO BB] Re: Bioinformatic and the Smith-waterman
In-Reply-To: <46AFC6FF.9020206@bioinformatics.org>
Message-ID: <7690673B79E70D429A1205791933922397BAC0@wri-xchng.WRIWINDBER.ORG>

Fyi

Liebman, M. N. Molecular Modeling of Protein Structure and Function: A Bioinformatic Approach, J. Comp.-Aided Molecular Design 1, 323-341 1987

Michael N. Liebman, Ph.D.
Executive Director
Windber Research Institute
President and Managing Director
Strategic Medicine, Inc
620 Seventh St
Windber, PA 15963
(814) 467-9844
(814) 361-6932 Office
(814) 659-5450 Mobile

-----Original Message-----
From: bio_bulletin_board-bounces+m.liebman=wriwindber.org at bioinformatics.org [mailto:bio_bulletin_board-bounces+m.liebman=wriwindber.org at bioinformatics.org] On Behalf Of J.W. Bizzaro
Sent: Tuesday, July 31, 2007 7:34 PM
To: Temple F. Smith
Cc: BiO_Bulletin_Board at bioinformatics.org
Subject: [BiO BB] Re: Bioinformatic and the Smith-waterman

Hi Temple!

(For those on the mailing list, the origins of the word "bioinformatics" were brought up in this thread:
http://bioinformatics.org/pipermail/bio_bulletin_board/2002-April/000635.html
and we have a short wiki page on the topic:
http://wiki.bioinformatics.org/Origins_of_bioinformatics)

Thank you for clearing up some misconceptions about the origins of the field, including some of my own. I guess when we spoke about this around 1998, you actually said that you were *incorrectly* credited with having coined the word. In any case, I agree that no one person or group started the field. And it seems there are as many different definitions as there are practitioners.

To me, bioinformatics is a compound of "bio" and the English/common word "informatics" (http://en.wikipedia.org/wiki/Informatics), with the latter being a subdiscipline of computer science, whereas the French "informatique" translates to "computer science" in general. A Wikipedia contributor wrote the following about "informatics," and I pretty much agree with it:

"Used as a compound, in conjunction with the name of a discipline, as in medical informatics, bioinformatics, etc., it denotes the specialization of informatics to the management and processing of data, information and knowledge in the named discipline, and the incorporation of informatic concepts and theories to enrich the other discipline."

So, I think of bioinformatics as a subdiscipline of computational biology, the same way that informatics is a subdiscipline of computer science. But I will concede that most people think that the terms are synonymous. And maybe it just doesn't matter.

I will integrate most of what you've written into our wiki, which will hopefully help clear up some of the confusion about where things started.

Cheers,
Jeff

Temple F. Smith wrote:
> Jeff for the record:
>
> Jean-Michel Claverie of France used the term "bio-informatik"
> in some email at the time of the Waterville Valley computational
> biology meetings in the mid to late 80's. However, in his book,
> Bioinformatics for Dummies, I do not recall that he discusses the
> origin of the term. Recall that the term Informatik is the French
> word for computer science and Jean-Michel was one of the early guys
> in this "field" but true computational biology goes back to Haldane
> (1908) and D'Arcy Thompson (1942), Dayhoff (1966) etc ..... to say
> nothing about the x-ray crystal guys of the late 1950's!! Clearly the
> term was not used in 1980 to 1982 when Dr. Waterman and I were
> starting out!!
> And no one "started the Field of Bioinformatics"; it
> grew out of molecular biology with protein sequencing and then
> DNA/RNA sequencing's need for databases and computer analysis. The
> first such recognized early work was by people like Zuckerkandl and
> Pauling (1965) and Fitch and Margoliash (1967) and then Needleman
> and Wunsch (1970)! Thus unless Dr. Hwa A. Lim can claim to have
> been doing sequence comparative analysis in the late 1960's he is not
> a founder!
>    While I was the organizer of the three Waterville Valley Genes
> and Machines meetings which Jean-Michel attended, I did not use
> the term for at least another two years if I remember correctly. Also
> on my visits with Dayhoff and later Fitch they both agreed that it
> was likely Jean-Michel, then at the FRENCH Institut Pasteur, who
> surely used it first, particularly given his use in email as
> something he had been using at home in the Paris Institute. Thus
> unless Jean-Michel says otherwise all others making such claims
> should stop! This is an old discussion which I find not funny any
> more. In fact the term is a bit out of date these days and the
> better term is computational biology in any case.
>
> Please pass this on to whoever is still asking this now unimportant
> question.
>
> Temple F. Smith, PhD
> BMERC
> Boston University
>

--
J.W. Bizzaro
Bioinformatics Organization, Inc.
(Bioinformatics.Org)
E-mail: jeff at bioinformatics.org
Phone: +1 508 890 8600
--
_______________________________________________
General Forum at Bioinformatics.Org - BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

From browne.ken at virgin.net Wed Aug 1 05:08:46 2007
From: browne.ken at virgin.net (Ken Browne)
Date: Wed, 1 Aug 2007 10:08:46 +0100
Subject: [BiO BB] High Throughput in Silico Docking Training Course - London 23 Oct 2007
Message-ID: <000001c7d466$f6994120$0201a8c0@TechnologyNetworks.local>

--------------------
HALF DAY TRAINING COURSE
--------------------

Design and Deployment of High Throughput in Silico Docking on Grid Infrastructure
23 October 2007, 13:00 - 17:00
London, England

Select Biosciences welcomes you to attend a half day training course presented by Nicolas Jacq, PhD, of HealthGrid, France. The course will take place prior to the Virtual Discovery conference and exhibition, to be held in London. ( www.VirtualDiscovery.net )

Nicolas Jacq is the Technical Coordinator of the European SHARE project for the HealthGrid association and an active member of the WISDOM collaboration.

--------------------
WHO SHOULD ATTEND
--------------------

The course will be suitable for scientists, engineers and PhD students working within the pharmaceutical and biotechnology industry or academia. The course is designed for people who are interested in learning what grid technology is, what the benefits of adopting it in drug discovery are, and what steps are needed to deploy applications on a grid infrastructure. A particular focus will be placed upon high throughput virtual screening by in silico docking.

--------------------
LEARNING OBJECTIVES
--------------------

- Learn about the grid, a new Information Technology
- Find out about the real performance of the grid by analysing several use cases from industry and academia (drug discovery, epidemiology...)
- Become familiar with the grid services of the gLite middleware (security, job submission, data management...)
- Learn how to design and deploy high throughput in silico docking on a cluster grid infrastructure

--------------------
MORE INFORMATION
--------------------

http://www.selectbiosciences.com/conferences/htmlemails/training/Jacq(VirtDisc07).html

Best Regards,
Ken Browne

From letondal at pasteur.Fr Fri Aug 3 04:58:29 2007
From: letondal at pasteur.Fr (Catherine Letondal)
Date: Fri, 3 Aug 2007 10:58:29 +0200
Subject: [BiO BB] Course in informatics for biology 2008 at Institut Pasteur
Message-ID:

Hi,

*************************************************************************
Course in informatics for biology 2008 at Institut Pasteur
http://www.pasteur.fr/formation/infobio-en.html
*************************************************************************

In the series of courses offered at the Pasteur Institute, a course will be offered in informatics in biology. The next session will take place from January to end of April 2008.

The main goal of this course is to provide researchers in biology an initial exposure to informatics. Admission to the course is reserved for those with a degree in biology or a related discipline. With more and more bioinformatics tools available, it becomes increasingly important for researchers in biology to be able to manage their data, implement their ideas, and judge for themselves the usefulness of new algorithms and software.

This course will emphasize fundamental aspects of computer science and apply them to biological examples. Theoretical aspects (algorithm development, logic, problem modeling and design methods), and technical applications (databases and web technologies) that are relevant for biologists will be thoroughly discussed.
Programming is presented through the object-oriented paradigm, using a modern high-level language, Python, provided with tools for biology and enabling both prototyping or scripting and the building of important software systems. Learning of an additional language (C) will be available for interested students.

Learning during the course will be reinforced with computing exercises, and effective training will be provided by a two-month research project. The working language of the course is French.

For further information, please consult:
http://www.pasteur.fr/formation/infobio-en.html

*** Registration will be closed on October 1 2007. ***

Sincerely,
--
Benno Schwikowski & Catherine Letondal
Institut Pasteur -- Course in Informatics for Biology
www.pasteur.fr/formation/infobio

From marty.gollery at gmail.com Fri Aug 3 15:32:20 2007
From: marty.gollery at gmail.com (Martin Gollery)
Date: Fri, 3 Aug 2007 12:32:20 -0700
Subject: [BiO BB] Automatic blast database maintenance/updating
In-Reply-To: <25b698b90707301348n2e655331j7884077dd7e26ef8@mail.gmail.com>
References: <25b698b90707301348n2e655331j7884077dd7e26ef8@mail.gmail.com>
Message-ID:

Hi Rohan,

Check out BioDownloader. It is described in:

Maxim V. Shapovalov, Adrian A. Canutescu, and Roland L. Dunbrack, Jr.
BioDownloader: bioinformatics downloads and updates in a few clicks
Bioinformatics Advance Access published on May 5, 2007
Bioinformatics 2007 23: 1437-1439; doi:10.1093/bioinformatics/btm120

Cheers,
Marty

On 7/30/07, Rohan Sachdeva wrote:
> Hello I've been charged with installing and maintaining a wwwblast server in
> my lab. I've got everything set up but I am looking for an easy way to keep
> all the databases updated. I was hoping someone could point me toward a
> script that used update_blastdb.pl to update whatever databases and then
> extract them too.
>
> Thanks
>

--
Martin Gollery
Senior Bioinformatics Scientist
TimeLogic- a Division of Active Motif
775-833-9113
880 Northwood Blvd. Suite 7
Incline Village, NV 89451

From landman at scalableinformatics.com Fri Aug 3 17:21:20 2007
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 03 Aug 2007 17:21:20 -0400
Subject: [BiO BB] Automatic blast database maintenance/updating
In-Reply-To:
References: <25b698b90707301348n2e655331j7884077dd7e26ef8@mail.gmail.com>
Message-ID: <46B39C50.5050005@scalableinformatics.com>

Must have missed the original email, but I saw Marty's response.

> On 7/30/07, Rohan Sachdeva wrote:
>> Hello I've been charged with installing and maintaining a wwwblast server in
>> my lab. I've got everything set up but I am looking for an easy way to keep
>> all the databases updated. I was hoping someone could point me toward a
>> script that used update_blastdb.pl to update whatever databases and then
>> extract them too.

I haven't used update_blastdb.pl (looks like it came out in 2005, some years after our db_dlaf.pl came out at http://downloads.scalableinformatics.com/downloads/db_dlaf.pl ). With db_dlaf.pl (Database DownLoad And Format), it is pretty easy to set up scripted/cron'ed updates, without mouse clicks...

[landman at crunch-r.scalableinformatics.com:~/q] 6 >./db_dlaf.pl --list
alu.a.gz
alu.n.gz
drosoph.aa.gz
drosoph.nt.gz
ecoli.aa.gz
ecoli.nt.gz
env_nr.gz
env_nt.gz
est.nal
.
.
.
wgs.gz
yeast.aa.gz
yeast.nt.gz

and pulling and formatting, say, yeast.aa as a protein database

[landman at crunch-r.scalableinformatics.com:~/q] 10 >./db_dlaf.pl --db=yeast.aa.gz --fdb /usr/bin/formatdb \
        --formatdb "-p T -o T" --verbose
destination path = ./20070803
url = ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/
Using the following database(s)
yeast.aa.gz
starting transfer of ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/yeast.aa.gz
transfer for file=yeast.aa.gz is 10.1 seconds, rate=0.18 MB/s, size=1.86 MB
file=yeast.aa.gz size=1.86 MB, uncompressed file=yeast.aa, size=3.24 MB
formatdb against db= yeast.aa
preparing to run formatdb
command line = /usr/bin/formatdb -i yeast.aa -p T -o T

It deposits the formatted databases into directories named by date, as shown here:

[landman at crunch-r.scalableinformatics.com:~/q] 11 >ls -alF 20070803/
total 7560
drwxrwxr-x 2 landman landman    4096 Aug 3 17:12 ./
drwxrwxr-x 3 landman landman    4096 Aug 3 16:56 ../
-rw-rw-r-- 1 landman landman     499 Aug 3 17:12 formatdb.log
-rw-rw-r-- 1 landman landman 3399727 Aug 3 17:12 yeast.aa
-rw-rw-r-- 1 landman landman  624891 Aug 3 17:12 yeast.aa.phr
-rw-rw-r-- 1 landman landman   50456 Aug 3 17:12 yeast.aa.pin
-rw-rw-r-- 1 landman landman   50384 Aug 3 17:12 yeast.aa.pnd
-rw-rw-r-- 1 landman landman     244 Aug 3 17:12 yeast.aa.pni
-rw-rw-r-- 1 landman landman  561270 Aug 3 17:12 yeast.aa.psd
-rw-rw-r-- 1 landman landman   12773 Aug 3 17:12 yeast.aa.psi
-rw-rw-r-- 1 landman landman 2980337 Aug 3 17:12 yeast.aa.psq

No mouse clicks required, though a simple foreach loop in a script that you insert into a crontab could help...

#!/bin/tcsh
#
# start with some aa DBs ...
#
foreach d ("pataa.gz" "pdbaa.gz" "swissprot.gz")
   /usr/local/bin/db_dlaf.pl --db=$d --fdb /usr/bin/formatdb \
        --formatdb "-p T -o T" --verbose
end
#
# and move on to some nt DBs ...
#
foreach d ("est_human.gz" "est_mouse.gz" "nt.gz")
   /usr/local/bin/db_dlaf.pl --db=$d --fdb /usr/bin/formatdb \
        --formatdb "-p F -v 1024 -o T" --verbose
end
#
# END OF SCRIPT

Put that in a file somewhere, make it executable, and then run crontab -e to have it run every so often: once a week/month/quarter/year.

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615

From wongls at comp.nus.edu.sg Mon Aug 6 03:30:05 2007
From: wongls at comp.nus.edu.sg (Limsoon Wong)
Date: Mon, 6 Aug 2007 15:30:05 +0800
Subject: [BiO BB] LBM2007 --- 2nd call for papers
Message-ID: <00af01c7d7fb$9c3cec70$8bb81aac@comp.nus.edu.sg>

2nd International Symposium on Languages in Biology and Medicine (LBM 2007)
6-7 December 2007 @ Biopolis, Singapore
(http://lbm2007.biopathway.org/)

Language is a powerful tool that in its many manifestations is a system, used for communication, comprising a finite set of arbitrary symbols and a set of rules (or grammar) by which the manipulation of these symbols is governed. In biology and medicine, the importance of languages used to represent knowledge, communicate and query information is immense. Likewise auxiliary tasks such as translation, summarization and information extraction play important roles supporting scientific research. The automation of such tasks has significantly advanced knowledge discovery in biomedicine.

Incumbent technologies that discover, read and process language are continually stretched by the vigorous demands of bio-medical scientists and there is the continual need and incentive for language techniques to evolve. Despite this, the distinct communities involved in language processing rarely borrow from one another or look over the fence to see what other approaches are in use.
And yet synergistic interactions across methodological disciplines and across different topics are frequently the harbingers of revolutionary technologies. In this context, it is imperative that we adopt diversification, more lateral and creative interaction between language professionals.

The International Symposium on Languages in Biology and Medicine (LBM) 2007 seeks to provide a renewed opportunity for interaction between language professionals with different methodological backgrounds. LBM was established in 2005 and the remit of this event remains highly relevant today. The symposium focuses on the languages that are in active use for biology and medicine. We are calling for original research papers on, but not limited to, the topics listed below. Papers focusing on application aspects of languages in biology and medicine are also invited.

- Natural language: text mining, retrieval and management;
- Ontology language: ontology construction, extension and management;
- Logic language: knowledge representation and induction;
- Sequence language: RNA structure prediction, protein domain prediction;
- Database language: database interface, query language;
- Visualization language: information visualization, molecular visualization

LBM2007 will consist of oral paper and poster presentations, invited speeches and a panel discussion. In addition to on-line conference proceedings, oral paper presentations will be published in BMC Bioinformatics and receive a MEDLINE citation. The symposium is co-located with the 18th International Conference on Genome Informatics (GIW 2007), which will be held in Singapore from December 3 to 5, 2007.

Submission of Papers
====================

Submissions should follow the BMC instructions for authors, which can be found at http://lbm2007.biopathway.org/PaperSubmission, and should not exceed 14 pages including references. Additional pages may be given to figures and tables.
Paper review will be double blind, so papers should not include authors' names and affiliations. Self-references are to be avoided; instead of "As we showed in Smith et al. 1999...", say "As Smith et al. 1999 showed...." Paper submission software will allow authors to enter full author information separately. The paper submission site is http://www.easychair.org/LBM2007. Manuscripts must be submitted no later than September 9, 2007.

Important Dates
===============

Paper submission due:        September 9, 2007
Notification of acceptance:  October 15, 2007
Camera ready due:            November 1, 2007
LBM 2007 Conference:         December 6-7, 2007

Steering Committee
==================

See-Kiong Ng, NTU & I2R, Singapore
Jong C. Park, KAIST, South Korea
Limsoon Wong, NUS, Singapore

General Chairs
==============

Jong C. Park, KAIST, South Korea
Limsoon Wong, NUS, Singapore

Programme Committee Chairs
==========================

Christopher J. O. Baker, I2R, Singapore
Su Jian, I2R, Singapore

Local Organizing Chair
======================

Rajaraman Kanagasabai, I2R, Singapore

Programme Committee
===================

Sophia Ananiadou, University of Manchester, UK
Vlad Bajic, University of the Western Cape, South Africa
Chitta Baral, Arizona State University, USA
Christian Blaschke, Bioalma, Spain
Anita Burgun, Universite de Rennes, France
Werner Ceusters, Buffalo NY, USA
Kevin B. Cohen, University of Colorado Health Sciences Center, USA
Nigel Collier, National Institute for Informatics, Japan
Mark Craven, University of Wisconsin, USA
Rebholz Dietrich, EMBL-EBI, UK
Julian Gough, University of Bristol, UK
Volker Haarslev, Concordia University, Canada
Udo Hahn, Jena University, Germany
Lynette Hirschman, MITRE, USA
Graeme Hirst, University of Toronto, Canada
Ewan Klein, Edinburgh University, UK
Satoshi Kobayashi, University of Electro-Communications, Japan
Michael Krauthammer, Yale University School of Medicine, USA
Patrick Lambrix, Linköping University, Sweden
Liu, Hong Fang, Georgetown University Medical Center, USA
Yves Lussier, University of Chicago, USA
Erik van Mulligen, Erasmus MC, Netherlands
Jinah Park, Information & Communications University, Korea
Tom Rindflesch, National Library of Medicine, USA
Jasmin Saric, Boehringer Ingelheim Pharma GmbH & Co. KG, Germany
Neil Sarkar, Woods Hole, USA
Stefan Schulz, Freiburg University Hospital, Germany
Donia Scott, Open University, UK
Hagit Shatkay, Queen's University, Canada
Margaret Anne Storey, University Victoria, Canada
Jun'ichi Tsujii, University of Tokyo, Japan, University of Manchester, UK
Alfonso Valencia, CNIO, Spain
W. John Wilbur, NIH, USA
Rene Witte, University of Karlsruhe, Germany
Hong Yu, University of Wisconsin-Milwaukee, USA
Pierre Zweigenbaum, LIMSI-CNRS, France

For more information, or to be placed on our mailing list for updates, please contact:
e-mail: lbm2007 at biopathway.org
Website: http://lbm2007.biopathway.org/

From wongls at comp.nus.edu.sg Mon Aug 6 03:32:46 2007
From: wongls at comp.nus.edu.sg (Limsoon Wong)
Date: Mon, 6 Aug 2007 15:32:46 +0800
Subject: [BiO BB] GIW2007 --- 2nd call for posters
Message-ID: <00b601c7d7fb$fc6f3c60$8bb81aac@comp.nus.edu.sg>

2ND CALL FOR POSTERS

The 18th International Conference on Genome Informatics (GIW 2007)
Biopolis, Singapore. December 3-5 2007.
http://www.comp.nus.edu.sg/~giw2007

The 18th International Conference on Genome Informatics (GIW 2007) will be held at the Biopolis in Singapore on December 3-5, 2007. The GIW is the longest running international bioinformatics conference, which has provided unique opportunities that bridge theory and experiments, academia and industry, and East and West.

SUBMISSIONS:
Poster or software demonstration abstracts are limited to 2 pages, including title, figures, tables, text, and bibliography. Please see below for Abstract Templates. All abstracts should be submitted at the site http://www.easychair.org/GIWPoster2007. Accepted posters and software demonstration abstracts will be compiled into "The 18th International Conference on Genome Informatics, Posters and Software Demonstrations". Additionally, a number of notable abstracts will be selected for oral presentations.

IMPORTANT DATES:
Poster submission deadline: 16 September 2007
Poster decision: 14 October 2007

Poster templates: http://www.comp.nus.edu.sg/~giw2007/poster.html

From forward at hongyu.org Thu Aug 16 18:55:48 2007
From: forward at hongyu.org (Hongyu Zhang)
Date: Thu, 16 Aug 2007 15:55:48 -0700 (PDT)
Subject: [BiO BB] Observation: multiple sequence alignment affected by the input sequence order
Message-ID: <633548.20935.qm@web51412.mail.re2.yahoo.com>

Dear all,

I've observed that several multiple sequence alignment programs, including ProbCons, ClustalW and MUSCLE, all share the same behavior: given a group of sequences in FASTA format as the input, if I change the order of the sequences in the input file, the results generated by those programs change as well. It's not just the order of the sequences in the output that changes, but also the amino acid matches. I find this a little counter-intuitive, because one would expect the alignment not to depend on input order.

Is there a program that can output a stable alignment independent of the input sequence order?

Thanks!

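[Archive note] One workaround for the reproducibility part of this question is to put the input into a canonical order before handing it to any aligner. The following is only a minimal sketch under stated assumptions: the function names (`read_fasta`, `canonical_order`, `write_fasta`) are hypothetical, the sort key (sequence, then header) is an arbitrary choice, and nothing here makes an aligner itself order-independent. It only guarantees that every permutation of the same records reaches the aligner in the same order.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def canonical_order(records):
    """Sort by sequence, then header (an assumed, arbitrary key), so that
    any permutation of the same records yields the same ordering."""
    return sorted(records, key=lambda rec: (rec[1], rec[0]))


def write_fasta(records, width=60):
    """Serialize records back to FASTA text, wrapping sequences at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"
```

The canonicalized text would then be written to a file and passed to the alignment program in place of the original input.
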
From marchywka at hotmail.com Thu Aug 16 20:45:01 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Thu, 16 Aug 2007 20:45:01 -0400
Subject: [BiO BB] Observation: multiple sequence alignment affected by the input sequence order
In-Reply-To: <633548.20935.qm@web51412.mail.re2.yahoo.com>
Message-ID:

I've never bothered to check these details but you really have to evaluate these ill-defined fits in light of some objective. That is, given two sequences you really don't know if one was generated from the other by any particular set of operations. It may even make sense, from the standpoint of fitting to an evolution model, to assume one is derived from the other in non-symmetric ways. Perhaps it would make more sense to output a list of steps to turn one into the other? Clustal source code is available.

Having said that, I think I've actually got what you mention but only because I was lazy and my needs don't care about evolution of one string from another. If you take two strings, and generate a matrix of all possible comparisons, you can generate your own "best-fits." This one, for example, recursively takes the largest exact matches irrespective of offset (so I think it is insensitive to order) and tries to align the leftovers in the same way.
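[Archive note] The procedure described above (take the largest exact match, then align the leftovers on either side in the same way) could be sketched roughly as follows. This is not the source of the poster's `string_correlator`, which is not shown in the thread; the function names, the tie-breaking rule (first longest match found), and the choice to pad unmatched leftovers with "-" on the right are all assumptions made for illustration.

```python
def longest_common_substring(a, b):
    """Return (start_a, start_b, length) of the first longest exact match
    between a and b, found by dynamic programming over all offsets.
    length is 0 when the strings share no character."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at (i-1, j-1).
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def align(a, b):
    """Recursively anchor on the largest exact match and align the
    leftovers to its left and right the same way (assumed behaviour)."""
    ia, ib, n = longest_common_substring(a, b)
    if n == 0:
        # Nothing in common: pad the shorter leftover with gaps (assumption).
        width = max(len(a), len(b))
        return a.ljust(width, "-"), b.ljust(width, "-")
    left_a, left_b = align(a[:ia], b[:ib])
    right_a, right_b = align(a[ia + n:], b[ib + n:])
    match = a[ia:ia + n]
    return left_a + match + right_a, left_b + match + right_b


if __name__ == "__main__":
    for row in align("abcdefghijkl", "abdddefhjkl"):
        print(row)
    # prints:
    # abc-defghijkl
    # abdddef-h-jkl
```

Swapping the two arguments swaps the two output rows for this pair of strings. Whether that holds for every input depends on how ties between equally long matches are broken, so the sketch should not be read as a proof of order-insensitivity.
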
I've compared this to clustalw, and the clustalw result "makes more sense," as this thing seems to think nothing of inserting gaps (obviously, adjustable parameters for a figure of merit are a nice feature...):

$ ./string_correlator abcdefghijkl abdddefhjkl
abc-defghijkl
abdddef-h-jkl

$ ./string_correlator abdddefhjkl abcdefghijkl
full one:11 12 132
ab{dd,c}def{,g}h{,i}jkl
abdddef-h-jkl
abc-defghijkl

I've been using this approach to make my own blast database of genomic repeats. While it's too early to tell if this will be useful, initial alignments with known stuff like ORFs seem encouraging (hits from this database seem to occur in only a few consistent places in the few cases I've examined and do not appear to just place litter in and around coding sequences).

Anyway, my question is, now that I have my own text and graphical alignment tools, what software exists for taking a bunch of notes from various sources (blast hits, genome annotations, etc.) and aligning them in one picture or text document? I have my own now that I'd like to discuss with interested parties (I'd be willing to post some gzipped bmp files too).

Thanks.

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com

Note: Hotmail is blocking my mom's entire ISP claiming it is to reduce spam but probably to force users to use hotmail. Please DON'T assume I am ignoring you and try me on marchywka at yahoo.com if no reply here. Thanks.

>From: Hongyu Zhang
[ deleted to meet size limits ]

From Sterten at aol.com Thu Aug 16 21:21:27 2007
From: Sterten at aol.com (Sterten at aol.com)
Date: Thu, 16 Aug 2007 21:21:27 EDT
Subject: [BiO BB] Observation: multiple sequence alignment affected by the input se...
Message-ID:

Of course, you could always re-sort the sequences first (say, alphabetically) so that you get some agreement in the output across several programs. But I assume that's not what you want...

From Sterten at aol.com Thu Aug 16 21:26:45 2007
From: Sterten at aol.com (Sterten at aol.com)
Date: Thu, 16 Aug 2007 21:26:45 EDT
Subject: [BiO BB] gaps in alignment
Message-ID:

Programs that align multiple sequences usually insert a "-" gap in almost all sequences when just one of the sequences has an additional entry at that position, e.g.:

abcdef
abxdef
ajcdef
abbcde

gives:

ab-cdef
ab-xdef
aj-cdef
abbcde-

with 3 gaps at position 3 and only one non-gap. I'd prefer:

abcdef
abxdef
ajcdef
ab{b}cde

Is it possible? Wouldn't it make more sense? Is there a converter?

From barry.hardy at vtxmail.ch Thu Aug 16 13:38:32 2007
From: barry.hardy at vtxmail.ch (Barry Hardy)
Date: Thu, 16 Aug 2007 19:38:32 +0200
Subject: [BiO BB] Best Practices in Virtual Screening
Message-ID: <46C48B98.6090905@vtxmail.ch>

Perhaps the most frequent topic of discussion that I have seen consistently arising in my recent conversations with drug discovery researchers is the topic of Virtual Screening and its complexities, confusions, and varying validity and reliability. John Irwin and I initiated the idea of a best practice initiative last Autumn (http://barryhardy.blogs.com/cheminfostream/2006/10/could_we_take_a.html). We realise this will take time but I believe it is an endeavour worth undertaking that will be of significant benefit to both industry and academic researchers. To this end we are supporting workshop and wiki activity this Autumn to initiate such a program.

The Virtual Screening Community of Practice Workshop and Forum will take place 15-16 October at Bryn Mawr, Philadelphia to further the above goals.

This activity will consist of the following components:

1. Workshop to share experiences on current practices in virtual screening and to collaboratively develop best practices for comparison studies. (morning/afternoon of October 15).

2. Conference session on latest method developments with presentations and panel discussion. (October 16)

3. Poster Session (evening of October 16). NOTE: If interested in presenting a poster, please send an abstract (ca. 300-500 words) for review to eCheminfo (-at-) douglasconnect.com We have also left space on the program schedule to feature a selection of the abstracts submitted as oral presentations.

4. Virtual communication and collaboration approaches will be used pre- and post-event to maximise the benefit of the workshop activity. In particular a wiki will be opened prior to the workshop to commence documentation of supporting materials and to start to populate the area with initial suggestions, ideas, practices and methods. The wiki will also support subsequent practice group activities and development initiatives, including future ongoing meetings and workshops and research and development projects. (Realising this activity needs to be in progress for quite some time.)

The agenda of the workshop will be designed so as to maximise interaction, discussion, issue resolution, and action plans for cooperation.

Workshop activities will address the specific challenges:

* statistically significant relationships between docking scores and ligand affinity
* practices and procedures for the operation of community-based screening and docking comparisons including tests and interpretation of results, in a way that everyone can agree is fair.
* peer review, data compilation, running of programs, judgement of results
* workflow descriptions for comparisons
* beyond conformational energetics in the rank ordering of diverse compounds in high throughput virtual screening
* measurement and benchmarking
* binding mode prediction, virtual screening for lead identification, rank-ordering by affinity for lead optimization
* atom typing, ligand preparation (ionic forms, tautomers, ...), ligand conformer generation, protein preparation (protonation, residue orientation, ...), ligand placement (top-down, bottom-up, fragment based, group based, ...), energy calculation (force field type, grid type, algorithm, ...), constraint handling (global and local optimization strategy? process to escape local minima?), scoring (single-objective, multi-objective, consensus, ...)
* separation of test set information from model development
* validation datasets, results and applicability domains
* objective comparisons of standardized test datasets
* extraction of data from the scientific literature
* methods and procedures for secure testing of commercial data that could be acceptable to industry
* frameworks for computational model testing and validation
* impact of knowledge management approaches
* collaboration and community support structures and environments

We welcome the collaboration and participation of all academic, government and industry practitioners in drug discovery in strengthening the scientific foundations of this valuable set of cheminformatics techniques.

More Information

Website: http://www.echeminfo.com/COMTY_screeningforumbm07
Pdf Download: http://barryhardy.blogs.com/cheminfostream/files/eChemProgramBrynMawr07-web1.PDF

best regards
Barry Hardy
eCheminfo Community of Practice

Barry Hardy, PhD
Douglas Connect
Zeiningen, CH-4314
Switzerland
Tel: +41 61 851 0170
Blog: http://barryhardy.blogs.com/cheminfostream/

From piet.amc at gmail.com Fri Aug 17 05:35:21 2007
From: piet.amc at gmail.com (piet molenaar)
Date: Fri, 17 Aug 2007 11:35:21 +0200
Subject: [BiO BB] Opening registration: "Integrative Bioinformatics": 5th Annual Cytoscape Public Symposium
Message-ID: <554293e20708170235y5efbaaf1h512ac7a61fc57998@mail.gmail.com>

*November 8, 2007: Public Symposium:*
*Integrative Bioinformatics: At the cutting edge of network analysis and biological data integration*

*Leroy Hood*, Institute of Systems Biology
*Ewan Birney*, European Bioinformatics Institute
*Peter Sorger*, Harvard University
*Ruedi Aebersold*, Swiss Federal Institute of Technology
*Trey Ideker*, University of California, San Diego
*Chris Sander*, Memorial-Sloan-Kettering Cancer Center
*Benno Schwikowski*, Institut Pasteur
*Andrew Hopkins*, Pfizer Global Research & Development

*November 6, 7 and 9, 2007: Developers retreat*

*Department of Human Genetics, Academic Medical Center, University of Amsterdam, Netherlands*

We are pleased to announce the opening of registration for the forthcoming 2007 Cytoscape Symposium and Retreat to be held at the Academic Medical Center, University of Amsterdam, Netherlands on 6-9 Nov 2007.

Registration is now possible at: http://www.cytoscape.org/retreat2007

This year's meeting is particularly exciting since it is the first time it will be held in Europe, specifically in the vibrant historic city of Amsterdam. 'Floating' on a web of canals, with its unique combination of old-world charm and cosmopolitan culture, Amsterdam is one of the most popular European cities for international visitors.

To reach Cytoscape's large European user base, a Public Symposium will be held on November 8th, for which there is a formidable list of confirmed speakers. In addition to the Public Symposium, the retreat consists of hands-on demos, user-training sessions and informal, technically focused developer meetings:

- *Tues 6th Nov:* Cytoscape Developer's Discussions
- *Wed 7th Nov:* Application showcase, hands-on sessions, tutorials
- *Thur 8th Nov:* Public Symposium
- *Fri 9th Nov:* Development of Cytoscape Roadmap for 2007, 2008

The Symposium on Thursday 8th November is of general interest to both biologists and informaticians. Current and future users of Cytoscape are also invited to visit the Application showcase on the 7th. The developers' days are targeted at people with an interest in the development of the Cytoscape software and associated plugins.

We hope that you can join us!

The Organizing Committee, 5th Cytoscape Retreat 2007
Annette Adler - Agilent Technologies
Piet Molenaar - Human Genetics Department AMC
Guy Warner - Unilever

*Contact*: cytoretreat at cytoscape.org

*The retreat is supported by*:
Unilever (www.unilever.com)
Netherlands Bioinformatics Center (NBIC) (www.nbic.nl)
Agilent (www.agilent.com)

*About Cytoscape:*
Cytoscape (www.cytoscape.org) is an open source bioinformatics software platform for *visualizing* molecular interaction networks and *integrating* these interactions with gene expression profiles and other state data. The software architecture enables adaptation of Cytoscape functionality to the specific needs of biologists and bioinformaticians. It is jointly developed by the groups of Benno Schwikowski (Pasteur Institute, Paris), Trey Ideker (University of California San Diego), Chris Sander (Memorial Sloan-Kettering Cancer Center), Lee Hood (Institute of Systems Biology, Seattle), Annette Adler (Agilent Technologies, Santa Clara, CA) and Bruce Conklin (Gladstone/UCSF, GenMAPP).
PS: Excuse us for cross-posting; we're trying to avoid this as much as possible. However, to reach a large audience we decided to include several relevant maillists also thereby increasing the risk that users get this mail several times. From dcaragea at ksu.edu Sat Aug 18 23:18:11 2007 From: dcaragea at ksu.edu (Doina Caragea) Date: Sat, 18 Aug 2007 22:18:11 -0500 Subject: [BiO BB] CFC: Computational Methodologies in Gene Regulatory Networks Message-ID: <02166BBA-A338-4AE3-9ED8-76FFA764B375@ksu.edu> CALL FOR CHAPTERS: COMPUTATIONAL METHODOLOGIES IN GENE REGULATORY NETWORKS Dear potential author, Please accept our apologies if you received multiple copies of this invitation. We request you to submit a chapter for our forthcoming book, "Computational Methodologies in Gene Regulatory Networks" on a topic of your interest. http://www.k-state.edu/cmgrn/ http://www.igi-pub.com/requests/details.asp?ID=205 Email: cmgrn at ksu.edu Proposals due: September 15, 2007 Sincerely, Sanjoy Das, Doina Caragea, William H. Hsu, Stephen M. Welch. The details are as follows: CALL FOR CHAPTERS Submission Deadlines: proposals due on September 15, 2007, full manuscripts due on February 15, 2008 COMPUTATIONAL METHODOLOGIES IN GENE REGULATORY NETWORKS URL: www.k-state.edu/cmgrn Email: cmgrn at ksu.edu A book edited by Sanjoy Das, Doina Caragea, W. H. Hsu, Stephen M. Welch, Kansas State University, USA. INTRODUCTION Recent advances in gene sequencing technology are shedding light on the complex interplay between genes that elicit phenotypic behavior characteristic of any given organism. It is now known that in order to mediate external as well as internal signals, an organism's genes are organized into complex signaling pathways. Unfortunately, unraveling the specific details about how these genetic pathways interact to regulate development, life histories, and respond to environmental cues, is proving to be a daunting task. 
A wide variety of models depicting gene-gene interactions, that are commonly referred to as gene regulatory networks (GRNs), have been proposed. A wide variety of computational tools are available for modeling gene regulatory networks. OVERALL OBJECTIVES A gene regulatory network (GRN) must be able to mimic experimentally observed behavior and also be computationally tractable. Under these circumstances, model simplicity is an important trade-off for functional fidelity. Modeling approaches taken by researchers are wide and disparate. Some gene regulatory networks are modeled entirely using non-parametric approaches such as Bayesian or neural networks, while some others represent genes in very physically realistic differential equation formats. The book will focus on the computational methods widely used in modeling gene regulatory networks, including structure discovery, learning and optimization. Both research and survey papers are welcome. TARGET AUDIENCE Biologists: The book can provide a comprehensive overview of computational intelligence approaches for learning and optimization and their use in gene regulatory networks to biologists. Computer Scientists: The book can assist computer scientists interested in gene regulatory network modeling. Classroom instructors and students: Although not a textbook, the book can serve as an excellent reference or supplementary material. Graduate students: As the book would bridge the gap between artificial intelligence and genomic research communities, it will be very useful to graduate students considering interdisciplinary research in this direction. Practicing computer scientists and geneticists: The book would be useful to those interested in gene regulatory network modeling. 
RECOMMENDED TOPICS Recommended topics include, but are not limited to, the following: Introduction to GRNs Introduction to graphical approaches for GRNs Bayesian network models for gene network models Petri nets and GRN models Dynamic Bayesian network GRNs Structure learning of GRNs Neural network based GRNs Boolean GRNs Temporal Boolean GRNs Probabilistic Boolean GRNs Machine learning in Boolean networks for GRNs Differential equation based GRNs Stochastic optimization algorithms for GRNs Evolutionary optimization in GRNs GRNs using the S-system formalism Optimization of S-system GRNs Clustering in GRNs SUBMISSION PROCEDURE Researchers and practitioners are invited to submit on or before September 15, 2007, a 2-5 page manuscript proposal clearly explaining the mission and concerns of the proposed chapter. Authors of accepted proposals will be notified by October 15, 2007 about the status of their proposals and sent chapter organizational guidelines. Full chapters are due on February 15, 2008. All submitted chapters will be reviewed on a double-blind review basis. The book is scheduled to be published by IGI Global, www.igi-pub.com, publisher of the IGI Publishing (formerly Idea Group Publishing), Information Science Publishing, IRM Press, CyberTech Publishing and Information Science Reference (formerly Idea Group Reference) imprints. INQUIRIES Inquiries and submissions can be forwarded electronically (pdf or word document) to: cmgrn at ksu.edu More information can be found at the proposed book's website: http://www.k-state.edu/cmgrn/ or http://www.igi-pub.com/requests/details.asp?ID=205 Individual authors can also be contacted directly: Dr. Sanjoy Das Elect. & Comp. Engg. Dept. Kansas State University sdas at ksu.edu Tel: (785) 532-4642 Dr. Doina Caragea Comp. & Info. Sci. Dept. Kansas State University dcaragea at ksu.edu Tel: (785) 532-7908 Dr. Stephen. M. Welch Dept. of Agronomy Kansas State University welchsm at ksu.edu Tel: (785) 532-7236 Dr. William H. 
Hsu
Comp. & Info. Sci. Dept.
Kansas State University
bhsu at ksu.edu
Tel: (785) 532-7905

From ilya_shl at alum.mit.edu Mon Aug 20 23:44:20 2007
From: ilya_shl at alum.mit.edu (Ilya Shlyakhter)
Date: Mon, 20 Aug 2007 23:44:20 -0400
Subject: [BiO BB] getting a phylogenetic tree w/distances
Message-ID: <4b11f87e0708202044t518353aaw6df9e1e477ec3e31@mail.gmail.com>

Where can I get a phylogenetic tree, with distances for all branches, for the organisms { mouse, rat, dog, cow, human }? I can get the structure of the tree from ToLWeb.org, but not the distances; and TreeBASE.org doesn't have a tree for this specific set. I need the tree for use with a phylogeny-aware Gibbs motif finder.

Thanks for help,

Ilya

From kiran.soorya at gmail.com Tue Aug 21 05:19:34 2007
From: kiran.soorya at gmail.com (soorya kiran)
Date: Tue, 21 Aug 2007 14:49:34 +0530
Subject: [BiO BB] Enhancer databases
Message-ID:

Hi all,

I would appreciate it if anyone could guide me to enhancer databases or related algorithms.

Thanks in advance,

Sri

From dafniosn at post.tau.ac.il Tue Aug 21 11:49:40 2007
From: dafniosn at post.tau.ac.il (Osnat Dafni)
Date: Tue, 21 Aug 2007 18:49:40 +0300
Subject: [BiO BB] Drosophila GO annotation pie charts
Message-ID: <1187711380.46cb0994acee5@webmail.tau.ac.il>

Hello,

I'm looking for a tool that can output a pie chart of GO annotations for a given list of genes.

Is anyone aware of such a thing?

Thanks,
Osnat
From j.a.cotton at qmul.ac.uk Tue Aug 21 11:36:10 2007
From: j.a.cotton at qmul.ac.uk (James Cotton)
Date: Tue, 21 Aug 2007 16:36:10 +0100
Subject: [BiO BB] getting a phylogenetic tree w/distances
In-Reply-To: <4b11f87e0708202044t518353aaw6df9e1e477ec3e31@mail.gmail.com>
References: <4b11f87e0708202044t518353aaw6df9e1e477ec3e31@mail.gmail.com>
Message-ID: <8AD73B31-B171-4876-B745-B05D745E18DE@qmul.ac.uk>

Depends on what you mean by distances, really: distances on molecular phylogenies are usually estimates of sequence divergence, and so depend on what gene is used to build the phylogeny. That might not be ideal for what you want: is the software sensitive to the absolute scale of the branch lengths, or just their relative sizes?

If the latter, you could use the estimated divergence ages of the different taxa to put branch lengths on your tree, i.e. have branch lengths in terms of millions of years since the divergence of the groups. You can find some estimates of those numbers in, e.g., this paper: http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=PubMed&list_uids=9582070 though these estimates might be a bit out of date now.

If the software is sensitive to the absolute scale of the branch lengths, I guess you need some kind of "mean sequence divergence" between the taxa, which is probably around in the literature somewhere, but I don't know where. There are some estimates in textbooks, e.g. Li's Molecular Evolution book, but I'm not sure you'll find all the numbers, and even then, they might not fit on a tree. Perhaps someone else knows a tree with these figures already attached?

Hope this helps,

James

On 21 Aug 2007, at 04:44, Ilya Shlyakhter wrote:

> Where can I get a phylogenetic tree, with distances for all branches,
> for the organisms { mouse, rat, dog, cow, human }? I can get the
> structure of the tree from ToLWeb.org, but not the distances; and
> TreeBASE.org doesn't have a tree for this specific set. I need the
> tree for use with a phylogeny-aware Gibbs motif finder.
>
> Thanks for help,
>
> Ilya
> _______________________________________________
> General Forum at Bioinformatics.Org -
> BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

____________________________________
James Cotton
School of Biological and Chemical Sciences
Queen Mary, University of London
+44 (0)207 882 8287
j.a.cotton at qmul.ac.uk
http://taxonomy.zoology.gla.ac.uk/~jcotton
____________________________________

From dan.bolser at gmail.com Wed Aug 22 06:30:54 2007
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 22 Aug 2007 12:30:54 +0200
Subject: [BiO BB] Enhancer databases
In-Reply-To:
References:
Message-ID: <2c8757af0708220330x5a89ab75jf8088341e7b6950e@mail.gmail.com>

On 21/08/07, soorya kiran wrote:
> Hi all,
>
> I would appreciate it if anyone could guide me to enhancer databases or
> related algorithms.
>
> Thanks in advance,
>
> Sri

Searching 'MetaBase' (http://biodatabase.org) for 'enhancer' gave the following hit:

GeniSys: The GeniSys database is an organized collection of information about 35,000 lines of Enhancer and Promoter (EP)-element-inserted Drosophila melanogaster mutants which can over-express or knock out specific genes when crossed with stage- and tissue-specific GAL4 fly lines.

http://biodatabase.org/index.php/GeniSys

Not sure if that is the kind of thing that you are looking for... perhaps if you provide a bit more detail?

Dan.
> _______________________________________________ > General Forum at Bioinformatics.Org - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -- hello From hlapp at gmx.net Tue Aug 21 18:01:38 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 21 Aug 2007 18:01:38 -0400 Subject: [BiO BB] getting a phylogenetic tree w/distances In-Reply-To: <4b11f87e0708202044t518353aaw6df9e1e477ec3e31@mail.gmail.com> References: <4b11f87e0708202044t518353aaw6df9e1e477ec3e31@mail.gmail.com> Message-ID: <955CAD8A-446F-458C-A200-24E22056C8F0@gmx.net> Have you tried TimeTree (http://www.timetree.net)? It would at least give you pairwise distances. -hilmar On Aug 20, 2007, at 11:44 PM, Ilya Shlyakhter wrote: > Where can I get a phylogenetic tree, with distances for all branches, > for the organisms { mouse, rat, dog, cow, human }? I can get the > structure of the tree from ToLWeb.org, but not the distances; and > TreeBASE.org doesn't have a tree for this specific set. I need the > tree for use with a phylogeny-aware Gibbs motif finder. > > Thanks for help, > > Ilya > _______________________________________________ > General Forum at Bioinformatics.Org - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From ahmed at pobox.com Tue Aug 21 18:40:39 2007 From: ahmed at pobox.com (Ahmed Moustafa) Date: Tue, 21 Aug 2007 17:40:39 -0500 Subject: [BiO BB] Drosophila GO annotation pie charts In-Reply-To: <1187711380.46cb0994acee5@webmail.tau.ac.il> References: <1187711380.46cb0994acee5@webmail.tau.ac.il> Message-ID: <46CB69E7.8010104@pobox.com> Hi Osnat, We used to use Blast2GO (http://www.blast2go.de/) to get pie charts of GO terms for FASTA sequences, so it might help. 
Ahmed

On 8/21/2007 10:49 AM, Osnat Dafni wrote:
> Hello,
>
> I'm looking for a tool that can output a pie chart of GO annotations for a given
> list of genes.
>
> Is anyone aware of such a thing?
>
> Thanks,
> Osnat

From mockeldritch at yahoo.com Thu Aug 23 08:35:53 2007
From: mockeldritch at yahoo.com (Sid)
Date: Thu, 23 Aug 2007 12:35:53 +0000 (UTC)
Subject: [BiO BB] Re: Observation: multiple sequence alignment affected by the input sequence order
References: <633548.20935.qm@web51412.mail.re2.yahoo.com>
Message-ID:

I've never observed this to happen, and it didn't happen with the specific test I tried. Could you tell us specifically which sequences you are aligning with it so that we can try to reproduce the error? If it *does* happen then it is quite definitely a flaw in the program, as it certainly shouldn't.

From iain.m.wallace at gmail.com Fri Aug 24 06:02:06 2007
From: iain.m.wallace at gmail.com (Iain Wallace)
Date: Fri, 24 Aug 2007 11:02:06 +0100
Subject: [BiO BB] Observation: multiple sequence alignment affected by the input sequence order
In-Reply-To: <633548.20935.qm@web51412.mail.re2.yahoo.com>
References: <633548.20935.qm@web51412.mail.re2.yahoo.com>
Message-ID: <8cff3eb80708240302j515b25f7i891472af51abe855@mail.gmail.com>

Hi,

I find this behavior very strange, as the programmes are designed not to exhibit this behavior. The first step in most alignment programmes is an all-against-all comparison, from which a tree is built. This tree is then used to determine the order in which the sequences are aligned. There is no dependence on input order in any of the alignment methods mentioned.
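Whether two runs really differ beyond row order can be checked mechanically. The following is only a minimal Python sketch, assuming both runs were saved as aligned FASTA; the function names are made up for illustration and are not part of ClustalW, MUSCLE, ProbCons or any other package. It keys each gapped row by sequence name and reports the names whose rows differ, so a pure reordering of the input gives an empty list.

```python
def read_aligned_fasta(text):
    """Parse aligned FASTA text into a dict of {sequence name: gapped row}."""
    rows = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the first word of the header is a unique sequence name.
            name = line[1:].split()[0]
            rows[name] = ""
        elif name is not None:
            rows[name] += line
    return rows


def differing_rows(alignment_a, alignment_b):
    """Return the sorted names whose gapped rows differ between two alignments."""
    names = set(alignment_a) | set(alignment_b)
    return sorted(n for n in names if alignment_a.get(n) != alignment_b.get(n))


# Toy data, invented for this example only.
run_1 = ">s1\nAC-GT\n>s2\nACAGT\n"
run_2 = ">s2\nACAGT\n>s1\nAC-GT\n"  # same rows, different order
run_3 = ">s2\nACAGT\n>s1\nACG-T\n"  # the gap in s1 has moved

print(differing_rows(read_aligned_fasta(run_1), read_aligned_fasta(run_2)))  # []
print(differing_rows(read_aligned_fasta(run_1), read_aligned_fasta(run_3)))  # ['s1']
```

This is a strict row-by-row comparison: it does not score partial agreement the way dedicated alignment-comparison tools do, and it would flag two alignments that differ only by an extra all-gap column.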
There are a few methods that can be used to compare alignments (and to make sure that they are identical when only the ordering is changed), such as aln_compare from Cedric Notredame, Q_score from Bob Edgar (http://www.drive5.com/) or veralign from Jaap Heringa (online server, http://zeus.cs.vu.nl/programs/veralignwww/).

I would recommend that you redo your alignment using any of the programmes you mentioned, then change the input order and compare the two alignments. FYI, clustal has an option to output the alignment in the order that the sequences were aligned, and this shouldn't change regardless of the input order.

Hope this helps

Iain

On 8/16/07, Hongyu Zhang wrote:
>
> Dear all,
>
> I've observed that several multiple sequence alignment programs, including
> ProbCons, ClustalW and MUSCLE, all share the same behavior, i.e., given a
> group of sequences in FASTA format as the input, if I change the order of
> the sequences in the input file, the results generated by those programs
> will change as well. It's not just the sequence order that will change, but
> also the amino acid matches.
>
> I think it's a little counter-intuitive because one would expect the
> opposite. Is there a program that can output a stable alignment independent
> of the input sequence order? Thanks!
>

From marchywka at hotmail.com Mon Aug 27 15:16:18 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Mon, 27 Aug 2007 15:16:18 -0400
Subject: [BiO BB] Observation: multiple sequence alignment affected by theinput sequence order
In-Reply-To: <8cff3eb80708240302j515b25f7i891472af51abe855@mail.gmail.com>
Message-ID:

Does there exist an automated test script somewhere for validating clustalw builds?
I tried to send e-mail to the "README" contact points in their distribution but at least 2 bounced; not sure about the third yet.

Anyway, I have a C++ version, as I needed to modify the output to work with some other code I'm writing. That worked: if anyone wants to convert the clustalw data structures to a std::vector, I have a reasonably contained class to do that that appears to work (SeqTy is just a shell right now with sequence and name, but you can add to it). However, I then went ahead and got the rest of their code to build under C++, but I don't have any way to test beyond my immediate interests (there is a lot of hard-to-follow stuff with memory allocation and 0/1-based subscripts all over).

FWIW, if you get the clustalw source, the README contains references to the underlying algorithm papers. I try to put links in my source code and live links ( -help foo ) in scripts that open browsers or download webpages for help (hard to maintain but OK for many things). Grepping source code for links isn't for everyone, however :)

Thanks.

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire ISP claiming it is to reduce spam but probably to force users to use hotmail. Please DON'T assume I am ignoring you and try me on marchywka at yahoo.com if no reply here. Thanks.

>From: "Iain Wallace"
>Reply-To: "General Forum at Bioinformatics.Org"
>To: "General Forum at Bioinformatics.Org"
>Subject: Re: [BiO BB] Observation: multiple sequence alignment affected by theinput sequence order
>Date: Fri, 24 Aug 2007 11:02:06 +0100
>
>Hi,
>
>I find this behavior very strange, as the programmes are designed not to
>exhibit this behavior.
>The first step in most alignment programmes is an all-against-all
>comparison, from which a tree is built. This tree is then used to determine
>the order in which the sequences are aligned. There is no dependence on
>input order in any of the alignment methods mentioned.
>
>There are a few methods that can be used to compare alignments (and to make
>sure that they are identical when only the ordering is changed), such as
>aln_compare from Cedric Notredame, Q_score from Bob Edgar
>(http://www.drive5.com/) or veralign from Jaap Heringa (online server,
>http://zeus.cs.vu.nl/programs/veralignwww/).
>
>I would recommend that you redo your alignment using any of the programmes
>you mentioned, then change the input order and compare the two
>alignments. FYI, clustal has an option to output the alignment in the order
>that the sequences were aligned, and this shouldn't change regardless of the
>input order.
>
>Hope this helps
>
>Iain
>
>On 8/16/07, Hongyu Zhang wrote:
> >
> > Dear all,
> >
> > I've observed that several multiple sequence alignment programs, including
> > ProbCons, ClustalW and MUSCLE, all share the same behavior, i.e., given a
> > group of sequences in FASTA format as the input, if I change the order of
> > the sequences in the input file, the results generated by those programs
> > will change as well. It's not just the sequence order that will change, but
> > also the amino acid matches.
> >
> > I think it's a little counter-intuitive because one would expect the
> > opposite. Is there a program that can output a stable alignment independent
> > of the input sequence order? Thanks!
> >

From bioinfosm at gmail.com Tue Aug 28 17:27:36 2007
From: bioinfosm at gmail.com (Samantha Fox)
Date: Tue, 28 Aug 2007 16:27:36 -0500
Subject: [BiO BB] genomatix
Message-ID: <726450810708281427n71b0825fo855af60fafbf524b@mail.gmail.com>

Hello all ...

Any ideas on the genomatix tool and its usage? Is it useful for analysis, and are there any good tutorials or statistics on how it performs? Any comparison with other available tools?

Thanks !!
~S

From k.ye at lacdr.leidenuniv.nl Thu Aug 30 04:32:07 2007
From: k.ye at lacdr.leidenuniv.nl (Kai Ye)
Date: Thu, 30 Aug 2007 10:32:07 +0200
Subject: [BiO BB] Aligning sequence from pdb to an established MSA
In-Reply-To: <20070829160145.EC527368498@primary.bioinformatics.org>
References: <20070829160145.EC527368498@primary.bioinformatics.org>
Message-ID: <002401c7eae0$40afb7a0$1a71e584@fwnc.net>

Dear all:

Any idea about a Linux program to align a sequence extracted from PDB with an established MSA? In the MSA there may or may not be a sequence identical to the one from PDB.

Thanks in advance!

Kai

From dan.bolser at gmail.com Fri Aug 31 12:31:40 2007
From: dan.bolser at gmail.com (Dan Bolser)
Date: Fri, 31 Aug 2007 18:31:40 +0200
Subject: [BiO BB] Aligning sequence from pdb to an established MSA
In-Reply-To: <002401c7eae0$40afb7a0$1a71e584@fwnc.net>
References: <20070829160145.EC527368498@primary.bioinformatics.org> <002401c7eae0$40afb7a0$1a71e584@fwnc.net>
Message-ID: <2c8757af0708310931r751e863bj2b74acae20523b3a@mail.gmail.com>

HMM like HMMER can do this.

On 30/08/2007, Kai Ye wrote:
>
> Dear all:
>
> Any idea about a Linux program to align a sequence extracted from PDB with
> an established MSA? In the MSA there may or may not be a sequence identical
> to the one from PDB.
>
> Thanks in advance!
>
> Kai
>

--
hello

From christoph.gille at charite.de Fri Aug 31 13:06:59 2007
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Fri, 31 Aug 2007 19:06:59 +0200 (CEST)
Subject: [BiO BB] Aligning sequence from pdb to an established MSA
In-Reply-To: <2c8757af0708310931r751e863bj2b74acae20523b3a@mail.gmail.com>
References: <20070829160145.EC527368498@primary.bioinformatics.org> <002401c7eae0$40afb7a0$1a71e584@fwnc.net> <2c8757af0708310931r751e863bj2b74acae20523b3a@mail.gmail.com>
Message-ID: <61761.84.190.71.176.1188580019.squirrel@webmail.charite.de>

> HMM like HMMER can do this.

Does HMMER consider 3D information? ClustalW and T-Coffee are the two programs I know of that can align a sequence to an alignment, but they just take the sequence and disregard 3D information.

From dtheobald at brandeis.edu Fri Aug 31 13:05:34 2007
From: dtheobald at brandeis.edu (Douglas Theobald)
Date: Fri, 31 Aug 2007 13:05:34 -0400
Subject: [BiO BB] Aligning sequence from pdb to an established MSA
In-Reply-To: <2c8757af0708310931r751e863bj2b74acae20523b3a@mail.gmail.com>
References: <20070829160145.EC527368498@primary.bioinformatics.org> <002401c7eae0$40afb7a0$1a71e584@fwnc.net> <2c8757af0708310931r751e863bj2b74acae20523b3a@mail.gmail.com>
Message-ID: <2100DB51-785D-44B7-9C25-F76DCC0597EF@brandeis.edu>

Many alignment programs can do it, like MUSCLE or even CLUSTALW.

On Aug 31, 2007, at 12:31 PM, Dan Bolser wrote:

> HMM like HMMER can do this.
>
> On 30/08/2007, Kai Ye wrote:
>>
>> Dear all:
>>
>> Any idea about a Linux program to align a sequence extracted from PDB
>> with an established MSA? In the MSA there may or may not be a sequence
>> identical to the one from PDB.
>>
>> Thanks in advance!
>> >> Kai >> >> >> >> _______________________________________________ >> General Forum at Bioinformatics.Org - >> BiO_Bulletin_Board at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board >> > > > > -- > hello > _______________________________________________ > General Forum at Bioinformatics.Org - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From dan.bolser at gmail.com Fri Aug 31 15:56:55 2007 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 31 Aug 2007 21:56:55 +0200 Subject: [BiO BB] Aligning sequence from pdb to an established MSA In-Reply-To: <2c8757af0708311229r4f99d388i9cfbad46958e49d6@mail.gmail.com> References: <2c8757af0708310931r751e863bj2b74acae20523b3a@mail.gmail.com> <2c8757af0708311229r4f99d388i9cfbad46958e49d6@mail.gmail.com> Message-ID: <2c8757af0708311256w3f3b239bq1206f0a5727a3e84@mail.gmail.com> On 31/08/2007, Mike Marchywka wrote: > > > Do you have a link to any HMM code downloads? I've been using clustalw but > looking for other stuff. I think someone posted links before but I don't > have > any handy. You can grab HMMER here http://hmmer.janelia.org/ HMMER does not currently use any structural information to align sequences (although it could in principle). SAM uses various predicted structural descriptors to improve homologue detection, but I am not sure what the release/development status of that program is. I seem to remember that performing explicit structural alignments to generate a seed sequence for an iterative HMM building job actually performs slightly worse at homologue detection than just using a clustalw alignment as the seed. The theory being that because the HMM is iteratively built it doesn't really matter that much how good the seed alignment is. I think that finding was described in the papers describing the SUPERFAMILY database of HMM's. 
HMMER was just one idea for answering the question: given an alignment, how do I align a new sequence to it? You can do that with the 'hmmalign' program, for example here:

http://bioweb.pasteur.fr/seqanal/interfaces/hmmalign.html

I don't know if this is the best or easiest solution.

> Thanks.
>
> Mike Marchywka
>
> >From: "Dan Bolser"
> >Reply-To: "General Forum at Bioinformatics.Org"
> >To: "General Forum at Bioinformatics.Org"
> >Subject: Re: [BiO BB] Aligning sequence from pdb to an established MSA
> >Date: Fri, 31 Aug 2007 18:31:40 +0200
> >
> >HMM like HMMER can do this.
> >
> >On 30/08/2007, Kai Ye wrote:
> > >
> > > Dear all:
> > >
> > > Any idea about a Linux program to align a sequence extracted from PDB
> > > with an established MSA? In the MSA there may or may not be a sequence
> > > identical to the one from PDB.
> > >
> > > Thanks in advance!
> > >
> > > Kai
> > >

--
hello
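For anyone who wants to try the hmmalign route locally rather than through the web form, here is a rough Python sketch of the two steps: build a profile HMM from the existing MSA, then align the new sequence to it while carrying the original alignment along. The file names and the helper functions are hypothetical; the sketch assumes HMMER is installed and on the PATH, that the MSA is in a format hmmbuild accepts (e.g. Stockholm), and that the installed hmmalign supports `--mapali` and `-o`. Please check the options against the man page of your HMMER version before relying on it.

```python
import subprocess


def hmmalign_commands(msa_path, seq_path,
                      hmm_path="profile.hmm", out_path="combined.sto"):
    """Return the two command lines: build the HMM, then align the sequence.

    All paths are placeholders chosen for this example.
    """
    # Step 1: build a profile HMM from the established alignment.
    build = ["hmmbuild", hmm_path, msa_path]
    # Step 2: align the new sequence to the profile; --mapali merges the
    # original alignment (the one the HMM was built from) into the output.
    align = ["hmmalign", "--mapali", msa_path, "-o", out_path,
             hmm_path, seq_path]
    return [build, align]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Show the commands without executing them; run_commands(...) would run them.
for cmd in hmmalign_commands("family.sto", "pdb_chain.fa"):
    print(" ".join(cmd))
```

As noted in the thread, this uses sequence information only; nothing here takes the 3D structure from the PDB entry into account.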