From golharam at umdnj.edu Mon Aug 3 13:45:57 2009
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 03 Aug 2009 13:45:57 -0400
Subject: [BiO BB] time efficient global alignment algorithm
Message-ID: <4A772255.4030301@umdnj.edu>

I'm trying to perform a large amount of sequence alignments of long DNA
sequences, some up to 163,000+ bp in length. I was trying to use the
standard Needleman-Wunsch algorithm, but the matrix used requires a
large amount of memory...about 100 GB of memory. This obviously won't work.

I tried using stretcher from the EMBOSS package, but it takes way too
long to align each pair of sequences. I'm looking for something that
can perform alignments fast using a reasonable amount of memory.

I found one tool, called AVID, but have been unsuccessful in getting it
to run on the sequence set I have.

Before I go and try to develop a new solution to this, does anyone have
or recommend a program to perform a large number of global pairwise
alignments for long sequences?

Ideally, something with the speed similar to BLAST.

Ryan

From marchywka at hotmail.com Mon Aug 3 16:29:02 2009
From: marchywka at hotmail.com (Mike Marchywka)
Date: Mon, 3 Aug 2009 16:29:02 -0400
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To: <4A772255.4030301@umdnj.edu>
References: <4A772255.4030301@umdnj.edu>
Message-ID:

----------------------------------------
> Date: Mon, 3 Aug 2009 13:45:57 -0400
> From: golhara
> To: bbb at bioinformatics.org
> Subject: [BiO BB] time efficient global alignment algorithm
>
> I'm trying to perform a large amount of sequence alignments of long DNA
> sequences, some up to 163,000+ bp in length. I was trying to use the
> standard Needleman-Wunsch algorithm, but the matrix used requires a
> large amount of memory...about 100 GB of memory. This obviously won't work.

How many were you trying to align? You mean 163kb or 163Mb?
I was looking for tests or comparisons for some alignment code I had
which indexed the target sequences. I don't recall the suggestions from
that discussion, but I was able to do simple genomes reasonably well (I
think I used 2 strains of E. coli or something, about 5 megs long) on a
desktop. If you can find responses to my request from a few years ago,
that may (or may not) help. I'd offer my code, and indeed I think I have
it on a website, but I stopped development and I'm not sure it is useful
as-is unless you just want coarse alignment of two similar sequences.

Many implementations of just about anything are bad with memory
management; sometimes just blocking or sorting or compacting the
internal representation can make a big improvement. Not sure what exists
along these lines, but often some simplifications don't change results
but decrease the time/memory spent on futile possibilities.

> I tried using stretcher from the EMBOSS package, but it takes way too
> long to align each pair of sequences. I'm looking for something that
> can perform alignments fast using a reasonable amount of memory.
>
> I found one tool, called AVID, but have been unsuccessful in getting it
> to run on the sequence set I have.
>
> Before I go and try to develop a new solution to this, does anyone have
> or recommend a program to perform a large number of global pairwise
> alignments for long sequences?

Are all of these nominally the same or are you trying to align noise to noise?

> Ideally, something with the speed similar to BLAST.

I guess in an odd way my approach could get there, as it essentially
queries each string for "interesting" short sequences, but I'd have to
check the order (how many of these does it use, etc.). Last time I
checked the academic lit, IIRC this exact-string matching was an open
research area; maybe there have been advancements in the last few years
that are trivial to code or exist in an academic's lab.
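A minimal Python sketch of the "query each string for short sequences"
idea described above. This is not Mike's code, which is not shown in the
thread. The function names build_kmer_index and shared_seeds, the word
length k=4, and the toy sequences are all assumptions made for
illustration. The sketch indexes every k-mer of the target and reports
query k-mers that hit the target at exactly one position. Runs of hits
on the same diagonal (target position minus query position) suggest
regions that could anchor an alignment without filling a full m x n
matrix.

```python
from collections import defaultdict


def build_kmer_index(seq, k):
    """Map each k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_seeds(query, target_index, k):
    """Return (query_pos, target_pos) pairs for query k-mers that occur
    exactly once in the indexed target (repeated k-mers are skipped)."""
    seeds = []
    for i in range(len(query) - k + 1):
        hits = target_index.get(query[i:i + k])
        if hits is not None and len(hits) == 1:
            seeds.append((i, hits[0]))
    return seeds


# Toy sequences (hypothetical): the query is a slice of the target with
# two extra leading bases, so every seed lies on one diagonal.
target = "ACGTACGGTCAAGT"
query = "TTACGGTCAAG"
index = build_kmer_index(target, 4)
seeds = shared_seeds(query, index, 4)
diagonals = {t - q for q, t in seeds}
```

For real data k would be much larger (a dozen bases or more), and the
index for a 163 kb sequence stays small compared with a full dynamic
programming matrix.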
>
> Ryan
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb

From dan.bolser at gmail.com Tue Aug 4 04:41:23 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 4 Aug 2009 09:41:23 +0100
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To: <4A772255.4030301@umdnj.edu>
References: <4A772255.4030301@umdnj.edu>
Message-ID: <2c8757af0908040141p116e87e7rbfa9caab5006feec@mail.gmail.com>

2009/8/3 Ryan Golhar :
> I'm trying to perform a large amount of sequence alignments of long DNA
> sequences, some up to 163,000+ bp in length. I was trying to use the
> standard Needleman-Wunsch algorithm, but the matrix used requires a large
> amount of memory...about 100 GB of memory. This obviously won't work.

For two sequences in the region of > 85% similarity, MUMmer [1] works
very well. For example, aligning two strains of E. coli on my desktop,
both in the region of 4.6 Mb:

 * U00096 (Escherichia coli str. K-12 substr. MG1655)
 * CP000948 (Escherichia coli str. K12 substr. DH10B)

    time nucmer U00096.fasta CP000948.fasta

    real 0m14.035s
    user 0m11.370s
    sys  0m0.400s

It uses exact-match seeding heuristics (maximal matches found with a
suffix tree) to do things very quickly and efficiently.

HTH,
Dan.

[1] http://mummer.sourceforge.net/

> I tried using stretcher from the EMBOSS package, but it takes way too long
> to align each pair of sequences. I'm looking for something that can perform
> alignments fast using a reasonable amount of memory.
>
> I found one tool, called AVID, but have been unsuccessful in getting it to
> run on the sequence set I have.
>
> Before I go and try to develop a new solution to this, does anyone have or
> recommend a program to perform a large number of global pairwise alignments
> for long sequences?
>
> Ideally, something with the speed similar to BLAST.
>
> Ryan
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb
>

From nuin at genedrift.org Mon Aug 3 16:14:01 2009
From: nuin at genedrift.org (Paulo Nuin)
Date: Mon, 3 Aug 2009 16:14:01 -0400
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To: <4A772255.4030301@umdnj.edu>
References: <4A772255.4030301@umdnj.edu>
Message-ID: <6EA37FCD-3E26-4571-895A-ABA50BD97F49@genedrift.org>

Hi

MUMmer never failed me. Check at http://mummer.sourceforge.net/

HTH

Paulo

On 3-Aug-09, at 1:45 PM, Ryan Golhar wrote:

> I'm trying to perform a large amount of sequence alignments of long
> DNA sequences, some up to 163,000+ bp in length. I was trying to
> use the standard Needleman-Wunsch algorithm, but the matrix used
> requires a large amount of memory...about 100 GB of memory. This
> obviously won't work.
>
> I tried using stretcher from the EMBOSS package, but it takes way
> too long to align each pair of sequences. I'm looking for something
> that can perform alignments fast using a reasonable amount of memory.
>
> I found one tool, called AVID, but have been unsuccessful in getting
> it to run on the sequence set I have.
>
> Before I go and try to develop a new solution to this, does anyone
> have or recommend a program to perform a large number of global
> pairwise alignments for long sequences?
>
> Ideally, something with the speed similar to BLAST.
>
> Ryan
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb

From golharam at umdnj.edu Tue Aug 4 10:26:20 2009
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 04 Aug 2009 10:26:20 -0400
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To:
References: <4A772255.4030301@umdnj.edu>
Message-ID: <4A78450C.2030700@umdnj.edu>

>> I'm trying to perform a large amount of sequence alignments of long DNA
>> sequences, some up to 163,000+ bp in length. I was trying to use the
>> standard Needleman-Wunsch algorithm, but the matrix used requires a
>> large amount of memory...about 100 GB of memory. This obviously won't work.
>
> How many were you trying to align? You mean 163kb or 163Mb?
> I was looking for test or comparisons for some alignment code I
> had which indexed the target sequences, don't recall the suggestions
> for that discussion but I was able to do simple genomes reasonably well
> ( I think I used 2 strains of e coli or something about 5 megs long)
> on a desktop. If you can find responses to my request from a few years
> ago that may ( or may not ) help. I'd offer my code, and indeed I think
> I have it on a website, but I stopped development and not sure
> it is nearly useful as-is unless you just want coarse alignment on
> two similar sequences.

Hundreds of thousands. I'm trying to eliminate duplicates or near
duplicates (>90% similarity). I'm using the methodology from cd-hit-est.
However I'm not successful in getting that application to run on the
number of sequences I have. Right now, I'm trying to cluster the nt
database, however later I would like to cluster other sequences from
other sources.

> Many implementations of just about anything are bad with
> memory management- sometimes just blocking or sorting or
> compacting the internal representation can make a big improvement.
> Not sure what exists along these lines but often some simplifications
> don't change results but decrease time/memory on futile possibilities.

Agreed. However, in filling the dynamic programming matrix, you still
need to allocate an m x n matrix of ints. With sequences of 163,000 bp
in length, that is 163,000 x 163,000 four-byte cells, or about 100 GB of
RAM, unless there is a way to use a compact representation of the DP
matrix that I'm not aware of.

> Are all of these nominally the same or are you trying to align
> noise to noise?

Yes, they are nominally the same...they share at least 50% of the
non-overlapping words of the shorter of the two sequences.

>> Ideally, something with the speed similar to BLAST.
>
> I guess in an odd way my approach could get there as it essentially
> queries each string for "interesting" short sequences but I'd have to
> check order ( how many of these does it use etc). Last time I checked the
> academic lit, IIRC this exact-string matching was an open research area maybe there have been advancements
> in last few years that are trivial to code or exist in an academic's lab.

If there are, I haven't heard of any. My thought was to run a BLAST
alignment on the two sequences using bl2seq, then string together the
non-overlapping HSPs and perform a global alignment on the regions in
between the HSPs. This is easy enough, but I want to see if there is a
solution already out there first.

Ryan

From marty.gollery at gmail.com Tue Aug 4 16:54:07 2009
From: marty.gollery at gmail.com (Martin Gollery)
Date: Tue, 4 Aug 2009 13:54:07 -0700
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To: <4A772255.4030301@umdnj.edu>
References: <4A772255.4030301@umdnj.edu>
Message-ID:

Ryan, are you trying to do lots of one-to-one alignments, or one very
large multiple sequence alignment?

Marty

On Mon, Aug 3, 2009 at 10:45 AM, Ryan Golhar wrote:

> I'm trying to perform a large amount of sequence alignments of long DNA
> sequences, some up to 163,000+ bp in length.
I was trying to use the > standard Needleman-Wunsch algorithm, but the matrix used requires a large > amount of memory...about 100 GB of memory. This obviously won't work. > > I tried using stretcher from the EMBOSS package, but it takes way too long > to align each pair of sequences. I'm looking for something that can perform > alignments fast using a reasonable amount of memory. > > I found one tool, called AVID, but have been unsuccessful in getting it to > run to the sequence set I have. > > Before I go an try to develop a new solution to this, does anyone have or > recommend a program to perform a large number of global pairwise alignments > for long sequences? > > Ideally, something with the speed similar to BLAST. > > Ryan > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > -- -- Martin Gollery Senior Bioinformatics Scientist Tahoe Informatics www.bioinformaticist.biz www.hiddenmarkovmodels.com From golharam at umdnj.edu Tue Aug 4 17:00:47 2009 From: golharam at umdnj.edu (Ryan Golhar) Date: Tue, 04 Aug 2009 17:00:47 -0400 Subject: [BiO BB] time efficient global alignment algorithm In-Reply-To: References: <4A772255.4030301@umdnj.edu> Message-ID: <4A78A17F.8050902@umdnj.edu> One-to-one alignments Martin Gollery wrote: > Ryan, are you trying to do lots of one-to-one alignments, or one very large > multiple sequence alignment? > Marty > > On Mon, Aug 3, 2009 at 10:45 AM, Ryan Golhar wrote: > >> I'm trying to perform a large amount of sequence alignments of long DNA >> sequences, some up to 163,000+ bp in length. I was trying to use the >> standard Needleman-Wunsch algorithm, but the matrix used requires a large >> amount of memory...about 100 GB of memory. This obviously won't work. >> >> I tried using stretcher from the EMBOSS package, but it takes way too long >> to align each pair of sequences. 
I'm looking for something that can perform >> alignments fast using a reasonable amount of memory. >> >> I found one tool, called AVID, but have been unsuccessful in getting it to >> run to the sequence set I have. >> >> Before I go an try to develop a new solution to this, does anyone have or >> recommend a program to perform a large number of global pairwise alignments >> for long sequences? >> >> Ideally, something with the speed similar to BLAST. >> >> Ryan >> >> _______________________________________________ >> BBB mailing list >> BBB at bioinformatics.org >> http://www.bioinformatics.org/mailman/listinfo/bbb >> > > > From u4113344 at dcsmail.anu.edu.au Tue Aug 4 19:33:47 2009 From: u4113344 at dcsmail.anu.edu.au (Luke Nguyen-Hoan) Date: Wed, 05 Aug 2009 09:33:47 +1000 Subject: [BiO BB] Short Survey of Scientific Software Development Message-ID: <4A78C55B.3000207@dcsmail.anu.edu.au> My name is Luke Nguyen-Hoan, and I am a PhD candidate in the Department of Computer Science at the Australian National University in the area of software intensive systems engineering. I am running a survey on current practices in scientific software development, and would like to invite you to take part. The survey will take approximately 10 minutes to complete, and is intended for people who have had experience in developing scientific software applications. If this does not apply to you, please accept my apologies. 
If you would like to participate, the survey is available online at
https://apollo.anu.edu.au/default.asp?pid=3900

Thank you for your help with this research,

Luke Nguyen-Hoan
http://cs.anu.edu.au/~Luke.Nguyen-Hoan

From dan.bolser at gmail.com Wed Aug 5 03:36:14 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 5 Aug 2009 08:36:14 +0100
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To: <4A78450C.2030700@umdnj.edu>
References: <4A772255.4030301@umdnj.edu> <4A78450C.2030700@umdnj.edu>
Message-ID: <2c8757af0908050036l122ce716vb8457e5bd80fe536@mail.gmail.com>

2009/8/4 Ryan Golhar :
>>> I'm trying to perform a large amount of sequence alignments of long DNA
>>> sequences, some up to 163,000+ bp in length. I was trying to use the
>>> standard Needleman-Wunsch algorithm, but the matrix used requires a
>>> large amount of memory...about 100 GB of memory. This obviously won't
>>> work.
>>
>> How many were you trying to align? You mean 163kb or 163Mb?
>> I was looking for test or comparisons for some alignment code I had which
>> indexed the target sequences, don't recall the suggestions
>> for that discussion but I was able to do simple genomes reasonably well (
>> I think I used 2 strains of e coli or something about 5 megs long)
>> on a desktop. If you can find responses to my request from a few years ago
>> that may ( or may not ) help. I'd offer my code, and indeed I think
>> I have it on a website, but I stopped development and not sure
>> it is nearly useful as-is unless you just want coarse alignment on
>> two similar sequences.
>
> Hundreds of thousands. I'm trying to eliminate duplicates or near
> duplicates (>90% similarity). I'm using the methodology from cd-hit-est.
> However I'm not successful in getting that application to run on the number
> of sequences I have. Right now, I'm trying to cluster the nt database,
> however later I would like to cluster other sequences from other sources.
First thing that came to mind when I read the above was cd-hit. What is
cd-hit-est and how come it fails?

I'm curious because I'm maintaining (or was) the cd-hit website for the
project on bioinformatics.org:

http://www.bioinformatics.org/cd-hit/

I'm planning to move that over into the wiki where it can (hopefully)
stay more up to date.

Dan.

From paolo.romano at istge.it Thu Aug 6 03:35:37 2009
From: paolo.romano at istge.it (Paolo Romano)
Date: Thu, 06 Aug 2009 09:35:37 +0200
Subject: [BiO BB] CFP: SWAT4LS 2009 Semantic Web Applications and Tools for Life Sciences
Message-ID: <200908060736.n767Zoq7010845@clus2.istge.it>

Apologies for possible multiple posts
-----------------------------------------------------

First CFP: SWAT4LS
Semantic Web Applications and Tools for Life Sciences 2009

***Location and date

Amsterdam, Science Park, November 20th 2009
(http://www.swat4ls.org/2009/)

***Rationale

The adoption of semantic-enabled applications and collaborative social
environments is ever more common in the Life Sciences. The Semantic Web
provides a set of technologies and standards that are key to support
semantic markup, ontology development, distributed information resources
and collaborative social environments. Altogether the adoption of the
Semantic Web in the Life Sciences has potential impact on the future of
publishing, biological research and medicine.

This workshop will provide a venue to present and discuss benefits and
limits of the adoption of these technologies and tools in biomedical
informatics and computational biology. It will showcase experiences,
information resources, tools development and applications. It will bring
together researchers, both developers and users, from the various fields
of Biology, Bioinformatics and Computer Science, to discuss goals,
current limits and some real use cases for Semantic Web technologies in
Life Sciences.
***Topics Topics of interest include, but are not limited to: * Standards, Technologies, Tools for the Semantic Web o Semantic Web standards and new proposals (RDF, OWL, SKOS,... ) o Biomedical Ontologies and related tools o Formal approaches to large biomedical knowledge bases * Systems for a Semantic Web for Bioinformatics o RDF stores, Reasoners, query and visualization systems for life sciences o Semantic biomedical Web Services o Semantics aware Biological Data Integration Systems * Existing and prospective applications of the Semantic Web for Bioinformatics o Semantics aware application tools o Semantic collaborative research environments o Case studies, use cases, and scenarios ***Scientific Committee (committed so far) * Christopher J. O. Baker, Department of Computer Science and Applied Statistics, University of New Brunswick, Saint John, Canada * Pedro Barahona, Department of Informatics, New University of Lisboa, Lisboa, Portugal * Liliana Barrio-Alvers, Transinsight GmbH, Dresden, Germany * Olivier Bodenreider, National Library of Medicine, Bethesda, United States of America * Matt-Mouley Bouamrane, School of Computer Science, University of Manchester, Manchester, United Kingdom * Werner Ceusters, NY CoE in Bioinformatics and Life Sciences, University at Buffalo, Buffalo, United States of America * Kei Cheung, Center for Medical Informatics, Yale University School of Medicine, New Haven, United States of America * Tim Clark, Center for Innovative Computing, Harvard University, United States of America * Marie-Dominique Devignes, LORIA, Vandoeuvre les Nancy, France * Olivier Dameron, INSERM U936, University of Rennes 1, Rennes, France * Michel Dumontier, Carleton University, Ottawa, Ontario, Canada * Huajun Chen, Zhejiang University, Hangzhou, China * Duncan Hull, School of Chemistry, University of Manchester, Manchester, United Kingdom * C. 
Maria Keet, Faculty of Computer Science, Free University of Bozen-Bolzano, Bolzano, Italy
* Graham Kemp, Chalmers University of Technology, Sweden
* Jacob Tilman Koehler, Department of Molecular Biotechnology, Institute of Medical Biology, University of Tromsø, Tromsø, Norway
* Michael Krauthammer, Department of Pathology, Yale University School of Medicine, United States of America
* Martin Kuiper, Department of Pathology, Systems Biology group, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
* Patrick Lambrix, Department of Computer and Information Science, Linköping University, Linköping, Sweden
* Phillip Lord, School of Computing Science, Newcastle University, Newcastle-upon-Tyne, United Kingdom
* M. Scott Marshall, Leiden University Medical Center / University of Amsterdam, Amsterdam, The Netherlands
* Chris Mungall, Lawrence Berkeley National Laboratories, United States of America
* Stephan Philippi, Institute for Software Technology, University of Koblenz-Landau, Koblenz, Germany
* Marco Roos, Instituut voor Informatica, University of Amsterdam, Amsterdam, The Netherlands
* Alan Ruttenberg, Science Commons, Cambridge, United States of America
* Matthias Samwald, DERI, Galway, Ireland, and Konrad Lorenz Institute for Evolution and Cognition Research, Altenberg, Austria
* Nigam Shah, Center for Biomedical Informatics Research, Stanford, United States of America
* Michael Schröder, Biotechnology Centre, TU Dresden, Dresden, Germany
* Robert Stevens, School of Computer Science, University of Manchester, Manchester, United Kingdom
* Tetsuro Toyoda, Genomic Sciences Center, RIKEN, Yokohama, Japan
* Mark D. Wilkinson, iCAPTURE Center, St. Paul Hospital, Vancouver, Canada

and the organizers

***Type of contributions

The following possible contributions are sought:
* Oral communications (regular papers)
* Posters
* Software demos

All accepted oral communications and posters will be published with CEUR-WS.
***Deadlines * Submission openinig: 1 September 2009 * Submission of oral communications: 28 September 2009 * Submission for posters and demos: 15 October 2009 * Communication of acceptance: 23 October 2009 * Camera ready: 6 November 2009 ***Instructions All papers and posters must be in English and must be submitted through the EasyChair review system at http://www.easychair.org/conferences/?conf=swat4ls-09 . Please upload all submissions as PDF files in LNCS format (see http://www.springer.de/comp/lncs/authors.html). To ensure high quality, submitted papers will be carefully peer-reviewed by at least three members of the Scientific Committee. * Submissions for Oral communications should be between 10 and 15 pages. * Posters submissions should be between 4 and 8 pages. * Software demo proposals should also be between 4 and 8 pages. ***Proceedings All accepted oral communications and posters will be published with the CEUR-WS.org Workshop Proceedings service (see http://ceur-ws.org/). We are in the process of negotiating the possibility to have a special issue of a major bioinformatics journal related to the 2009 edition of swat4ls. To this end, a special Call will be launched shortly after the workshop, for extended and revised versions of contributions submitted to the workshop and accepted either as oral communication or poster. ***Organization * M. 
Scott Marshall, Leiden University Medical Center / University of Amsterdam, The Netherlands
* Albert Burger, School of Mathematical and Computer Sciences, Heriot-Watt University, and Human Genetics Unit, Medical Research Council, Edinburgh, Scotland, United Kingdom
* Adrian Paschke, Corporate Semantic Web, Freie Universitaet Berlin, Germany
* Paolo Romano, Bioinformatics, National Cancer Research Institute, Genova, Italy
* Andrea Splendiani, Biomathematics and Bioinformatics dept., Rothamsted Research, UK

-----------
For any further information or clarification, please visit the website
at http://www.swat4ls.org/2009 or contact the organization by email at
info @ swat4ls.org

Paolo Romano (paolo.romano at istge.it)
Bioinformatics
National Cancer Research Institute (IST)

From Lambert at Chatham.edu Mon Aug 10 17:01:11 2009
From: Lambert at Chatham.edu (Lambert, Lisa)
Date: Mon, 10 Aug 2009 17:01:11 -0400
Subject: [BiO BB] What's happened to Softberry?
Message-ID: <370F994DA14AF6449AA6A1FFCBDF6D93F146E97EE9@MAILBOX.chatham.local>

Does anyone know what's happened to Softberry.com? I use their FGENESH
software on a regular basis, but I haven't been able to access their
site at all for several days now.

Lisa Lambert
Chatham University

From sariego9 at yahoo.com Tue Aug 11 15:11:07 2009
From: sariego9 at yahoo.com (Diego Martinez)
Date: Tue, 11 Aug 2009 12:11:07 -0700 (PDT)
Subject: [BiO BB] What's happened to Softberry?
In-Reply-To: <370F994DA14AF6449AA6A1FFCBDF6D93F146E97EE9@MAILBOX.chatham.local>
References: <370F994DA14AF6449AA6A1FFCBDF6D93F146E97EE9@MAILBOX.chatham.local>
Message-ID: <96197.65709.qm@web32501.mail.mud.yahoo.com>

http://www.softberry.ru/

The Russian site is up; I don't know why the .com doesn't resolve. If
this goes down I will ask them; I used to work for one of the owners.

Diego

----- Original Message ----
From: "Lambert, Lisa"
To: "bbb at bioinformatics.org"
Sent: Monday, August 10, 2009 3:01:11 PM
Subject: [BiO BB] What's happened to Softberry?
Does anyone know what's happened to Softberry.com? I use their FGENESH
software on a regular basis, but I haven't been able to access their
site at all for several days now.

Lisa Lambert
Chatham University

_______________________________________________
BBB mailing list
BBB at bioinformatics.org
http://www.bioinformatics.org/mailman/listinfo/bbb

From rsachdev at usc.edu Sun Aug 23 01:03:46 2009
From: rsachdev at usc.edu (Rohan Sachdeva)
Date: Sat, 22 Aug 2009 22:03:46 -0700
Subject: [BiO BB] Creating animation from data?
Message-ID: <25b698b90908222203q5868ff16p74d64354663d20c0@mail.gmail.com>

Hi all,

I have time series data, TRFLP from environmental bacterial samples to
be exact. The outputs look like this:
http://fuhrmanlab.usc.edu:60000/images/5mtrflp.jpg

I would like to animate the data from month to month, but using the
actual data and not just images, so that it looks like a smooth
animation through time. Is there anything out there that can do this?

Thanks,
Rohan

From mbhd.pha at gmail.com Mon Aug 24 19:10:24 2009
From: mbhd.pha at gmail.com (BH)
Date: Mon, 24 Aug 2009 19:10:24 -0400
Subject: [BiO BB] how to find conserved genes among viral genomes?
In-Reply-To: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com>
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com>
Message-ID: <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>

Hi,

Does anyone know how to find the conserved genes among a set of genomes
(virus or phage genomes in particular)? Are there bioinformatics
tools/methods available for this?

I would appreciate your suggestions. Thanks.
From kanagasa at i2r.a-star.edu.sg Tue Aug 25 00:35:14 2009 From: kanagasa at i2r.a-star.edu.sg (Kanagasabai Rajaraman) Date: Tue, 25 Aug 2009 12:35:14 +0800 Subject: [BiO BB] CFP: Bioinformatics Track @ ACM Symposium on Applied Computing (SAC) 2010 - due Sep 8, 2009 Message-ID: <162B8AFBFBBB2148A9A1B8F9C5753428053B5AA8@mailbe01.teak.local.net> CALL FOR PAPERS - SAC BIO 2010 Bioinformatics and Computational Systems Biology Track The 25th ACM Symposium on Applied Computing 22 - 26 March 2010 Sierre, Switzerland http://www.nrcbioinformatics.ca/acmsac2010/ *** Papers Due Sep 8, 2009 Track description and motivations The publishing of the draft of the human genome and the recent advancements in high throughput sequencing and functional genomics technologies has ushered in a new era of rapid and exponential growth of data related to how organisms function at the molecular level. A major part of the information to support this understanding is available on large number of heterogeneous databases in both structured and unstructured formats. One challenge is to obtain information and knowledge from these databases and integrate them in a semantically consistent way, in order to be able to analyze them using novel quantitative conceptual and computational approaches smoothly connecting models and experiments. This can offer life scientists a deeper system-level understanding of fundamental biological principles. Examples of computational challenges in this new research paradigm, called systems biology, include identification of biological pathways, structure annotation of proteins, inference of biochemical networks and pathways using experimental data, information, and knowledge scattered over heterogeneous databases. 
The convergence of computer science and biology is both a data- and model- driven new science that necessitates the development of mathematical/computational models and data mining algorithms, that can enable scientists and bio-engineers to analyze with predictive ability biological information that guide the development of therapeutic and biotechnology solutions. This track is motivated by the rapidly growing importance of the informatics vision for novel levels of understanding in complex biological and biomedical systems, and will address research issues related to the whole spectrum of bioinformatics with a particular focus on integrative, inferential and translational bioinformatics. List of topics Papers are solicited in, but not limited to the following areas: * Algebraic biology * Bio imaging * Bioinformatics for drug design & discovery * Biological databases, warehousing and management * Biomedical data integration, metadata & ontologies * Biomarker identification and annotation * Biomedical text mining * Computational and Comparative genomics * Data visualization and visual analytics * Disease informatics * Evolution and phylogenetics * Gene expression/regulation & microarrays * Healthcare applications * High-performance bio-computing * Inference of biochemical network models from experimental data * Integrative bioinformatics * Laboratory information management systems in biology * Model driven analysis of biological systems * Modeling, analysis and Inference of gene and protein networks * Molecular modeling and simulation * Molecular sequence analysis * Pathways identification * Population genetics * Proteomics * Protein & RNA structure and function * Protein structure prediction and modeling * Recognition of genes and regulatory elements * Semantic technologies for life sciences * Sequence analysis & alignment * SNPs, mutations and haplotyping * Structural bioinformatics * Tool integration, web services and workflow systems Papers submission All 
submissions should represent original and previously unpublished works that are currently not under review in any conference or journal. Both basic and applied research papers are welcome. The author(s) name(s) and address(s) must NOT appear in the body of the submitted paper, and self-references should be in the third person. This is to facilitate blind review required by ACM. All submitted papers must include the paper identification number on the front page, above the title of the paper provided to you by the eCMS when you register your paper. All enquiries and questions should be directed to the Track Chairs. Additional details are available at the track home page at http://www.nrcbioinformatics.ca/acmsac2010/ . Important dates Paper submission: September 8, 2009 Notification of paper acceptance/rejection: October 19, 2009 Camera ready: November 2, 2009 Conference Paper Publication All papers will be fully refereed and undergo a blind review process by at least three referees. The conference proceedings will be published by ACM. Hence, all accepted papers should be submitted in ACM 2-column camera-ready format for publication in the symposium proceedings. The final version of the paper should not be more than 5 pages long. An additional 3 pages are allowed with a charge of 80USD per extra page. Final Camera-ready submissions must follow the template available at: http://www.acm.org/conferences/sac/sac2010/. Publication in Journal/Book Expanded versions of selected papers will be published as a special IGI Global book volume. Authors will be contacted after the presentation of these papers at the SAC Conference. Poster Publication of Selected Papers A set of selected papers will be accepted as poster papers by invitation only and will be published as short papers in the symposium proceedings. Track chairs Paola Lecca, Ph.D. Microsoft Research Center University of Trento, Italy. Email: lecca at cosbi.eu Kanagasabai Rajaraman, Ph.D. 
Institute for Infocomm Research, Singapore.
E-mail: kanagasa at i2r.a-star.edu.sg

Dan Tulpan, Ph.D.
Institute of Information Technology
National Research Council of Canada, Canada
E-mail: dan.tulpan at nrc-cnrc.gc.ca

From dan.bolser at gmail.com Tue Aug 25 02:32:32 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 25 Aug 2009 07:32:32 +0100
Subject: [BiO BB] Creating animation from data?
In-Reply-To: <25b698b90908222203q5868ff16p74d64354663d20c0@mail.gmail.com>
References: <25b698b90908222203q5868ff16p74d64354663d20c0@mail.gmail.com>
Message-ID: <2c8757af0908242332t2d7b2ca5wa52f88f714ed5239@mail.gmail.com>

2009/8/23 Rohan Sachdeva :
> Hi all,
> I have time series data. TRFLP from environmental bacterial samples to be
> exact. The outputs look like this
> http://fuhrmanlab.usc.edu:60000/images/5mtrflp.jpg
>
> I would like that animate the data from month but using the actual data and
> not just images so it looks like a smooth animation through time. Is there
> anything out there that can do this?

You can do this with imagemagick or ffmpeg (or both!)... Probably lots of other tools too!

From dan.bolser at gmail.com Tue Aug 25 02:34:50 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 25 Aug 2009 07:34:50 +0100
Subject: [BiO BB] how to find conserved genes among viral genomes?
In-Reply-To: <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com> <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
Message-ID: <2c8757af0908242334vb5a93cai29f84915b3a639ea@mail.gmail.com>

2009/8/25 BH :
> Hi,
>
> Does anyone know how to find the conserved genes among the genomes (virus or
> phage genomes in particular)? Are there Bioinformatic tools/methods
> available for this?

The very broad method that is directly applicable is simply 'sequence alignment'.

> Will appreciate your suggestions. Thanks.
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb
>

From marchywka at hotmail.com Tue Aug 25 07:40:25 2009
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 25 Aug 2009 07:40:25 -0400
Subject: [BiO BB] Creating animation from data?
In-Reply-To: <2c8757af0908242332t2d7b2ca5wa52f88f714ed5239@mail.gmail.com>
References: <25b698b90908222203q5868ff16p74d64354663d20c0@mail.gmail.com> <2c8757af0908242332t2d7b2ca5wa52f88f714ed5239@mail.gmail.com>
Message-ID:

----------------------------------------
> Date: Tue, 25 Aug 2009 07:32:32 +0100
> From:
> To: bbb at bioinformatics.org
> Subject: Re: [BiO BB] Creating animation from data?
>
> 2009/8/23 Rohan Sachdeva :
>> Hi all,
>> I have time series data. TRFLP from environmental bacterial samples to be
>> exact. The outputs look like this
>> http://fuhrmanlab.usc.edu:60000/images/5mtrflp.jpg
>>
>> I would like that animate the data from month but using the actual data and
>> not just images so it looks like a smooth animation through time. Is there
>> anything out there that can do this?
>
> You can do this with imagemagick or ffmpeg (or both!)... Probably lots
> of other tools too!

What do you mean by "using the data?" You want to end up with a video file and then you want some way to generate the intermediate frames?
You are looking for a tool to interpolate or annotate spectra? Certainly there are tools for composing video from snapshots but it isn't clear that is the problem. Do you really want some interactive data viewer or just a video?

From skhadar at gmail.com Tue Aug 25 01:00:24 2009
From: skhadar at gmail.com (Shameer Khadar)
Date: Tue, 25 Aug 2009 10:30:24 +0530
Subject: [BiO BB] how to find conserved genes among viral genomes?
In-Reply-To: <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com> <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
Message-ID:

Hello BH,

Can you explain the concept of conservation you are interested in? Is it conservation within a single genome or across all genomes?

Thanks,
K. Shameer
NCBS - TIFR

On Tue, Aug 25, 2009 at 4:40 AM, BH wrote:
> Hi,
>
> Does anyone know how to find the conserved genes among the genomes (virus or
> phage genomes in particular)? Are there Bioinformatic tools/methods
> available for this?
>
> Will appreciate your suggestions. Thanks.

From rsmsamal at gmail.com Tue Aug 25 01:01:07 2009
From: rsmsamal at gmail.com (rasmiprava samal)
Date: Tue, 25 Aug 2009 10:31:07 +0530
Subject: [BiO BB] how to find conserved genes among viral genomes?
In-Reply-To: <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com> <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
Message-ID:

The conserved patterns cannot be found in the genes.
Rather we can determine them in the corresponding protein.

The conserved patterns can be determined by BOXSHADE/TEXSHADE in CLC Biology workbench. BOXSHADE is for local and TEXSHADE for global alignment.

regs,
Rashmi.

On Tue, Aug 25, 2009 at 4:40 AM, BH wrote:
> Hi,
>
> Does anyone know how to find the conserved genes among the genomes (virus or
> phage genomes in particular)? Are there Bioinformatic tools/methods
> available for this?
>
> Will appreciate your suggestions. Thanks.

From liviu.vladutu at gmail.com Tue Aug 25 02:10:44 2009
From: liviu.vladutu at gmail.com (Liviu Vladutu)
Date: Tue, 25 Aug 2009 09:10:44 +0300
Subject: [BiO BB] Creating animation from data?
In-Reply-To: <25b698b90908222203q5868ff16p74d64354663d20c0@mail.gmail.com>
References: <25b698b90908222203q5868ff16p74d64354663d20c0@mail.gmail.com>
Message-ID: <7d76d3450908242310i29553d41m17412cfd8dbc8c0e@mail.gmail.com>

Hi all,

I have created a small script in Matlab (from Mathworks) that creates the movie from 10 images (.bmp) in this case. The image names are: 'ima1.bmp', ..., 'ima10.bmp'.
So you basically have to chop that chart (from http://fuhrmanlab.usc.edu:60000/images/5mtrflp.jpg) into smaller time series and plot each one (as I did, but in line 2 of CreateAviFromFrames replace the 'ima*.bmp' in the fuf command with the right image extension). The 2 necessary files are attached.

Hope that helps,
Liviu
===
Dr. Liviu Vladutu

On Sun, Aug 23, 2009 at 8:03 AM, Rohan Sachdeva wrote:
> Hi all,
> I have time series data. TRFLP from environmental bacterial samples to be
> exact. The outputs look like this
> http://fuhrmanlab.usc.edu:60000/images/5mtrflp.jpg
>
> I would like that animate the data from month but using the actual data and
> not just images so it looks like a smooth animation through time. Is there
> anything out there that can do this?
>
> Thanks,
> Rohan

--
Liviu Vladutu

From marty.gollery at gmail.com Tue Aug 25 09:37:16 2009
From: marty.gollery at gmail.com (Martin Gollery)
Date: Tue, 25 Aug 2009 06:37:16 -0700
Subject: [BiO BB] how to find conserved genes among viral genomes?
In-Reply-To: <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com> <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
Message-ID:

Try algorithms like BLAST, BLAT, Smith-Waterman, etc. etc.

Marty

On Mon, Aug 24, 2009 at 4:10 PM, BH wrote:
> Hi,
>
> Does anyone know how to find the conserved genes among the genomes (virus or
> phage genomes in particular)? Are there Bioinformatic tools/methods
> available for this?
>
> Will appreciate your suggestions. Thanks.

--
--
Martin Gollery
Senior Bioinformatics Scientist
Tahoe Informatics
www.bioinformaticist.biz
www.hiddenmarkovmodels.com

From marchywka at hotmail.com Tue Aug 25 13:37:14 2009
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 25 Aug 2009 13:37:14 -0400
Subject: [BiO BB] how to find conserved genes among viral genomes?
In-Reply-To:
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com> <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
Message-ID:

----------------------------------------
> Date: Tue, 25 Aug 2009 10:31:07 +0530
> From:
> To: bbb at bioinformatics.org
> Subject: Re: [BiO BB] how to find conserved genes among viral genomes?
>
> The conserved patterns cannt be found in the genes. Rather we can determine
> them in the corresponding protein.

Is this a sweeping statement on science or your technology?
That is, AFAIK there is know indication that all sinonimuous kodons are Handeled tHe same- between kemical interactions and interactions with rybOsum and nasscent protein, etc. Certainly you expect "synonymuous" codons to be generally more interchangeable than things that change the protein but I'm still not sure what your point is here. [ I don't have a thesaurus for cinnamons and it was easier to mis-type to make my point LOL] And all that intron and regulatory stuff, what about that? Is this a virus specific statement in sum weigh?

>
> The conserved patterns can be determined by BOXSHADE/TEXSHADE in CLC Biology
> workbench.BOXSHADE is for local and TEXSHADE for global alignment.
>
> regs,
> Rashmi.
>
> On Tue, Aug 25, 2009 at 4:40 AM, BH wrote:
>
>> Hi,
>>
>> Does anyone know how to find the conserved genes among the genomes (virus or
>> phage genomes in particular)? Are there Bioinformatic tools/methods
>> available for this?
>>
>> Will appreciate your suggestions. Thanks.

From mahef111 at link.net Tue Aug 25 18:51:00 2009
From: mahef111 at link.net (Mahmoud ElHefnawi)
Date: Wed, 26 Aug 2009 01:51:00 +0300
Subject: [BiO BB] how to find conserved genes among viral genomes?
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com> <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
Message-ID: <90441A9EAF294FFEBFD21DE65A867559@compaqf52e788f>

Hello,

I think u can also use motif prediction tools like the famous MEME, Weeder, etc. U can email me personally if u need more help. Attached is one of my works on Influenza for similar purposes. Comments would also be appreciated.

Best,
Mahmoud

----- Original Message -----
From: "Mike Marchywka"
To:
Sent: Tuesday, August 25, 2009 8:37 PM
Subject: Re: [BiO BB] how to find conserved genes among viral genomes?

----------------------------------------
> Date: Tue, 25 Aug 2009 10:31:07 +0530
> From:
> To: bbb at bioinformatics.org
> Subject: Re: [BiO BB] how to find conserved genes among viral genomes?
>
> The conserved patterns cannt be found in the genes. Rather we can determine
> them in the corresponding protein.

Is this a sweeping statement on science or your technology? That is, AFAIK there is know indication that all sinonimuous kodons are Handeled tHe same- between kemical interactions and interactions with rybOsum and nasscent protein, etc. Certainly you expect "synonymuous" codons to be generally more interchangeable than things that change the protein but I'm still not sure what your point is here. [ I don't have a thesaurus for cinnamons and it was easier to mis-type to make my point LOL] And all that intron and regulatory stuff, what about that? Is this a virus specific statement in sum weigh?

>
> The conserved patterns can be determined by BOXSHADE/TEXSHADE in CLC Biology
> workbench.BOXSHADE is for local and TEXSHADE for global alignment.
>
> regs,
> Rashmi.
>
> On Tue, Aug 25, 2009 at 4:40 AM, BH wrote:
>
>> Hi,
>>
>> Does anyone know how to find the conserved genes among the genomes (virus or
>> phage genomes in particular)? Are there Bioinformatic tools/methods
>> available for this?
>>
>> Will appreciate your suggestions. Thanks.