From Manjula.Thimma at KAUST.EDU.SA Sat May 1 02:36:21 2010
From: Manjula.Thimma at KAUST.EDU.SA (Manjula P. Thimma)
Date: Sat, 1 May 2010 09:36:21 +0300
Subject: [BiO BB] Programmatic access to pubmed
In-Reply-To:
Message-ID:

Dear Dr. Christoph,

Thanks for the message. I tried to access the application via the link you gave (I am not sure whether it is a downloadable Java app shipped as a jar or a web-based app). It invariably takes me to a 3-D protein structure viewer. Could you please let me know how I can get hold of this tool?

Best Regards
Manjula

On 4/29/10 11:47 AM, "Dr. Christoph Gille" wrote:

> In our team we reconstruct metabolic networks and therefore we need to screen
> hundreds of PubMed references to find evidence for certain
> metabolic reactions in the literature.
>
> We use this simple Java application:
> http://www.bioinformatics.org/strap/strap.php?pubmed=t
>
> We proceed as follows:
> We make a list of PMID numbers.
>
> We move the mouse over the list and then we go for a coffee to
> give the system sufficient time to cache the abstracts.
>
> Then we move the mouse over the list of PMID numbers again. This
> time we observe colored text highlighting (defined by
> Ctrl-F "Find") appear in the abstract panel.
>
> If the abstract is too big, the highlighting may appear in the
> vertical scroll bar of the text panel.
>
> The system might automatically identify full-text links by following the links
> provided by NCBI.
> In all cases a PDF can be manually associated.
>
> In this case this visual text mining can be performed on the full text.
>
> http://www.bioinformatics.org/strap/strap.php?pubmed=t
>
> Apart from this, professional literature managers such as JabRef or Mendeley
> are able to load lists of PubMed abstracts.
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb

From Sterten at aol.com Sat May 1 08:04:53 2010
From: Sterten at aol.com (Sterten at aol.com)
Date: Sat, 1 May 2010 08:04:53 EDT
Subject: [BiO BB] reading frame, putative protein
Message-ID: <17b0a.8492579.390d72e5@aol.com>

absence of stop codons

From ketil.malde at imr.no Mon May 3 03:04:17 2010
From: ketil.malde at imr.no (Ketil Malde)
Date: Mon, 03 May 2010 09:04:17 +0200
Subject: [BiO BB] reading frame, putative protein
In-Reply-To: <3a679ebae8d7dfb8909a367ea8db9d61.squirrel@webmail.charite.de> (Christoph Gille's message of "Fri, 30 Apr 2010 15:46:01 +0200")
References: <3a679ebae8d7dfb8909a367ea8db9d61.squirrel@webmail.charite.de>
Message-ID: <87ljc15uvi.fsf@malde.org>

"Dr. Christoph Gille" writes:

> I want to verify or reject the hypothesis that there is a yet unknown
> putative reading frame in a known coding viral gene.

One caveat: I haven't worked with viral sequences.

> The translated amino acid sequence does not yield any blast or prosite
> hits.

> Question: how can I verify by computational methods, that this could
> be indeed an additional coding reading frame, resulting in an amino
> acid sequence for which no similar sequence exists in today's databases.

> I would first check in related sequences that the reading frame is
> indeed open.

What related sequences do you have? If you have other strains of the virus, conservation of the amino acid sequence (Ka/Ks ratio) might indicate a translated sequence.

> Then I could look for irregularities in codon usage resulting from the
> two overlapping reading frames.

Is there enough codon bias that this gives useful evidence?

> Then I might observe absence of base wobbling in the third position
> due to the other reading frame.

> Can you recommend any search for functional sites?
> All methods based on sequence similarity will fail since there
> is no similar sequence so far.

I think that if you have conserved functional sites, they would show up as short BLAST hits as well. Perhaps HMM-based methods can be more sensitive.

I don't think any purely computational method is going to be a solid indicator for this. Is it an option to try to capture the putative transcript with PCR? Or use some other in vitro method?

-k
--
If I haven't seen further, it is by standing in the footprints of giants

From contactus at sooryakiran.com Mon May 3 07:31:45 2010
From: contactus at sooryakiran.com (SooryaKiran)
Date: Mon, 3 May 2010 04:31:45 -0700
Subject: [BiO BB] MSc Computational Biology @ University of Kerala
In-Reply-To:
References:
Message-ID:

*MSc Computational Biology*

Applications are invited for admission to the MSc Computational Biology programme of the Centre for Bioinformatics, University of Kerala. MSc Computational Biology at the Centre for Bioinformatics is a UGC-sponsored innovative programme, the first of its kind in India.

MSc Computational Biology aims at imparting theoretical and practical skills for the development and implementation of powerful computational algorithms for representing, analyzing and simulating life at the sub-cellular or molecular level. The ultimate product of Computational Biology is software tools. The course is appropriate for graduates of computer science/applications and those with a mathematical and technical background.

Postgraduates in computational biology are suitable for employment in IT industries that take up software development work in Bioinformatics. In addition, academic and research careers are other options. Research & development institutions in the life science area, all over the world, seek computational biologists.

Duration: Two years, four semesters (under the Credit & Semester System).

Eligibility: BSc Computer Science, BCA, BSc Information Technology, BSc. Electronics, BTech.
in any branch. (Life Science students are not eligible for admission.)

Admission will be based on marks scored in the qualifying examinations and in the entrance examination. The entrance examination will have 100 objective-type questions: 40 on computing, 10 on general biology, and 50 on logical and numerical reasoning.

The application form and details can be downloaded from http://cbi.keralauniversity.edu

The last date for submission of applications is May 10, 2010.

For more details contact admission.cbi at gmail.com

regards,
Srijith V. M.
Lecturer, Centre for Bioinformatics,
University of Kerala, Trivandrum 695581, INDIA
Tel (O) 471-2308759 (M) +91 9895357521, +91 9387774141
URL : http://cbi.keralauniversity.edu
http://srijith.skb.googlepages.com
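Editorial note on the "reading frame, putative protein" thread: the first check mentioned there, that the candidate frame is free of stop codons, can be sketched in a few lines. This is only a minimal sketch under stated assumptions: the function names `stop_codons_in_frame` and `open_frames` are hypothetical, the stop codon set is that of the standard genetic code, and only the forward strand is examined.

```python
# Stop codons of the standard genetic code (an assumption; adjust the
# set if the organism uses a different translation table).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codons_in_frame(dna, frame=0):
    """Return 0-based start positions of stop codons in one forward frame.

    `frame` is 0, 1 or 2. RNA input is accepted by mapping U to T.
    """
    seq = dna.upper().replace("U", "T")
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def open_frames(dna):
    """Return the forward frames (0, 1, 2) that contain no stop codon."""
    return [f for f in range(3) if not stop_codons_in_frame(dna, f)]
```

Running `open_frames` on the same region in each related strain shows whether the frame stays open everywhere. An open frame in a short region is weak evidence on its own, so this would complement rather than replace the conservation and codon-usage checks discussed in the thread.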
From tirza at biomodel.os.biu.ac.il Tue May 4 03:20:46 2010
From: tirza at biomodel.os.biu.ac.il (Tirza Doniger)
Date: Tue, 4 May 2010 10:20:46 +0300
Subject: [BiO BB] reading frame, putative protein
In-Reply-To: <3a679ebae8d7dfb8909a367ea8db9d61.squirrel@webmail.charite.de>
References: <3a679ebae8d7dfb8909a367ea8db9d61.squirrel@webmail.charite.de>
Message-ID:

Although this would not give you a conclusive answer, you could also check the amino acid composition of the resulting protein and see how it compares to the amino acid composition of the rest of the organism's proteome.

Best,
Tirza Doniger

On Fri, Apr 30, 2010 at 4:46 PM, Dr. Christoph Gille <christoph.gille at charite.de> wrote:

> Perhaps you could help with a reading frame problem:
>
> I want to verify or reject the hypothesis that there is a yet unknown
> putative reading frame in a known coding viral gene.
>
> The translated amino acid sequence does not yield any blast or prosite
> hits.
>
> Question: how can I verify by computational methods, that this could
> be indeed an additional coding reading frame, resulting in an amino
> acid sequence for which no similar sequence exists in today's databases.
>
> What I would do:
>
> I would first check in related sequences that the reading frame is
> indeed open.
>
> Then I could look for irregularities in codon usage resulting from the
> two overlapping reading frames.
>
> Then I might observe absence of base wobbling in the third position
> due to the other reading frame.
>
> What would you suggest?
> What else could I do?
>
> Can you recommend any search for functional sites?
>
> All methods based on sequence similarity will fail since there
> is no similar sequence so far.
>
> Many thanks
>
> Christoph

--
Tirza Doniger, M.Sc.
Home: +972 8 976 0280
Lab: +972 3 531 8124
Cell: +972 52 530 8192
U.S.
phone: +1 718 819 1446
======================
Life is like riding a bicycle. To keep your balance you must keep moving.
-----Albert Einstein

From bc2-conference at unibas.ch Tue May 4 08:42:24 2010
From: bc2-conference at unibas.ch ([BC]2 Conference Organization)
Date: Tue, 04 May 2010 14:42:24 +0200
Subject: [BiO BB] Registration reminder: [BC]2 Basel Computational Biology Conference: Regulation and Control in Biological Systems, June 24 & 25, 2010
Message-ID: <4BE01630.5030809@unibas.ch>

Dear colleagues,

registration for the

8th [BC]2 Basel Computational Biology Conference:
"Regulation and Control in Biological Systems"
June 24 & 25, 2010
at the Conference Center Basel, Switzerland

is open on the conference web site: http://www.bc2.ch/2010/

Invited speakers include:

* Marvin Cassman (San Francisco, CA, USA)
* Dalia Cohen (Rosetta Genomics, Rehovot, Israel & Philadelphia, PA, USA)
* Keith O. Elliston (Genstruct Inc., Cambridge, MA, USA)
* Ernst Hafen (Institute of Molecular Systems Biology, ETH Zürich)
* Vassily Hatzimanikatis (EPF Lausanne & SIB)
* Dagmar Iber (D-BSSE, ETH Zürich & SIB)
* Douglas A. Lauffenburger (MIT, Cambridge, MA, USA)
* Ivan Montoliu (Nestlé Research Center Lausanne)
* Erik van Nimwegen (Biozentrum University of Basel & SIB)
* Corrado Priami (Microsoft Research - University of Trento Centre for Computational and Systems Biology, Italy)
* Uwe Sauer (Institute of Molecular Systems Biology, ETH Zürich)
* Ehud Shapiro (Weizmann Institute of Science, Rehovot, Israel)
* Jörg Stelling (ETH Zürich & SIB)
* Mihaela Zavolan (Biozentrum University of Basel & SIB)
* Philip Zimmermann (ETH Zürich)
* Winners of the SIB young bioinformatician awards 2010

For abstracts, program details, and registration please visit the conference web site: http://www.bc2.ch/2010/

Looking forward to welcoming you in Basel!

Torsten Schwede & Manuel Peitsch

PS: Please register online before *May 21, 2010*.

----
[BC]2 Congress Organization
c/o Prof.
Torsten Schwede
Swiss Institute of Bioinformatics & Biozentrum, University of Basel
Klingelbergstr. 50-70
4056 Basel/Switzerland
http://www.bc2.ch/2010/

From pkhurana08 at gmail.com Wed May 5 05:16:28 2010
From: pkhurana08 at gmail.com (Pankaj Khurana)
Date: Wed, 5 May 2010 14:46:28 +0530
Subject: [BiO BB] program for sequence length
Message-ID:

Hi all,

I have a few thousand FASTA files. I would like to get a list showing the sequence names and their respective lengths.
Is there a program for this?
I can write one, but why reinvent the wheel?
Thanking all in advance

Regards,
Pankaj

From maria.mirto at unisalento.it Thu May 6 02:48:57 2010
From: maria.mirto at unisalento.it (Maria Mirto)
Date: Thu, 6 May 2010 08:48:57 +0200
Subject: [BiO BB] CFP: CBMS2010 - Special Track (ST-04) on HealthGrid & Cloud Computing
Message-ID: <1ABDE73D-89E5-4155-BC78-4EBDD28CA347@unisalento.it>

--- Apology for multiple posting ---

=====================================================================
23rd IEEE International Symposium on
COMPUTER-BASED MEDICAL SYSTEMS
Perth, Australia, 12-15 October 2010

4th Special Track on HealthGrid & Cloud Computing
http://sara.unisalento.it/cbms2010/
=====================================================================
* * * CALL FOR PAPERS - Deadline June 17, 2010 * * *
=====================================================================

One of the biggest challenges in HealthCare is the integration and analysis of disparate data coming, for instance, from genomics and proteomics experiments, as well as from clinical investigations (e.g. medical images and electronic patient records), in order to discover correlations among clinical data and genetic assessment. Many computer tools, methods and platforms for the seamless integration of biomedical data and bioinformatics tools are already available, and these need large computing power in areas such as:
* The medical image processing community, which is facing a growing need to analyze 2D, 3D and 4D images in order to realistically simulate medical treatments or surgery (radiotherapy, plastic surgery, etc.), and to develop computer-aided surgery;
* Integration of results and easy access by physicians to all of their patients' medical data anytime, anywhere.

There is tremendous potential for end users in many fields of science, such as Bioinformatics and Biomedicine, to routinely conduct large-scale computations on distributed resources by using a combination of the following technologies:

* Distributed middleware for connecting data/cluster computing centers: this includes Grid computing middleware for user authentication and accounting, remote job submission, resource scheduling/reservation, and data management;
* Virtualization technologies capable of providing on-demand application-specific execution environments: this involves a style of computing, Cloud Computing, in which on-demand resources are provided as a service over the Internet.

HealthGrid is an environment that allows sharing of resources, in which heterogeneous and dispersed health data as well as applications can be accessed by all users as a tailored information-providing system according to their authorization. However, several issues, such as security and the management of private data, are among the biggest obstacles to adopting Grid technology in health IT.

Cloud computing is emerging as a model for enabling convenient, on-demand network access to a shared pool of configurable resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. Cloud computing poses obvious challenges and opportunities for public health informatics. Among the challenges will be an escalating need for secure communications and storage, especially when public health data is collected and transmitted using non-healthcare infrastructure.
The main goal of the track is to exchange ideas and results related to ongoing grid and cloud computing research in HealthCare, with a look toward Biomedicine and Bioinformatics, focusing on different aspects of middleware, technologies and applications.

Topics of Interest

The topics of interest include, but are not limited to:

* Grid Computing Infrastructures, Middleware and Tools for Biomedicine and Bioinformatics;
* Cloud infrastructures: advances and evaluation of virtualization technologies;
* Medical data and metadata management;
* Security in HealthGrid and Cloud;
* Semantic aspects in Grid and Cloud for Biomedical Data;
* Grid/cloud applications: Service and/or algorithm design and implementation applicable to health applications;
* Best practices related to solving large-scale problems on grid/cloud infrastructures;
* Parallel applications on grid/cloud infrastructures: theory and practice, programming models, intercluster communications, remote execution;
* Workflow Management Systems targeting HealthGrid and Cloud applications;
* Scientific gateways and user environments targeting HealthGrid and Cloud applications;
* Grid-based Visualization of Biomedical Data;
* Integration of HealthGrid and Cloud Applications into Clinical Practice.

Important Dates

Submission deadline for regular papers: 17 Jun 2010
Notification of acceptance: 2 Aug 2010
Final camera ready due: 2 Sep 2010
Author registration: 2 Sep 2010

Workshop Chairs

* Maria Mirto - Euro-Mediterranean Centre for Climate Change & Univ. of Salento - Italy
* Giovanni Aloisio - Euro-Mediterranean Centre for Climate Change, Univ. of Salento & SPACI - Italy
* Almerico Murli, University of Naples, Italy
* Tony Solomonides, Univ. of the West of England, UK
* Alfredo Tirado-Ramos, Emory University, Atlanta, USA

Program Committee

* Robert G.
Belleman (University of Amsterdam, The Netherlands)
* Vincent Breton (CNRS/IN2P3, LPC Clermont-Ferrand, France)
* Marian Bubak (AGH Krakow PL/ UvA Amsterdam NL)
* Massimo Cafaro (University of Salento, Lecce, Italy)
* Mario Cannataro (University "Magna Græcia" of Catanzaro, Italy)
* Rita Casadio (Biocomputing Lab, University of Bologna, Italy)
* Henri Casanova (University of Hawaii, USA)
* Ewa Deelman (ISI/USC, USA)
* Jack Dongarra (University of Tennessee, USA)
* Geoffrey Fox (Indiana University, USA)
* Vicente Hernandez (Universidad Politecnica de Valencia)
* Dieter Kranzlmueller (Ludwig-Maximilian University Munich & Leibniz Supercomputing Centre Germany)
* Mary Kratz (University of Michigan Medical School Information Services, USA)
* Giuliano Laccetti (University of Naples "Federico II", Italy)
* Yannick Legre (CNRS/IN2P3 France)
* David Manset (University of Savoie, France, University West of England, UK, Maat G knowledge, Madrid, Spain)
* Johan Montagnat (CNRS (I3S laboratory) France)
* Silvia D. Olabarriaga (University of Amsterdam, The Netherlands)
* Cecilia Saccone (ITB/CNR Institute of Biomedical Technologies of Bari, Italy)
* Ashish Sharma (Emory University, Atlanta, USA)
* Jonathan Silverstein (Computation Institute of the University of Chicago, USA)
* Richard Sinnott (National e-Science Centre, Glasgow, UK)
* Albert Zomaya (University of Sydney, Australia)

Paper Submission and Publication

We invite original, previously unpublished contributions that are not submitted concurrently to a journal or another conference. Each paper must be prepared following the IEEE 2-column format, should not exceed 6 (six) letter-sized pages, and must be submitted electronically using the paper submission system prior to the submission deadline.

The CBMS 2010 submission web site is http://www.cbms2010.debii.curtin.edu.au

All submissions will be peer-reviewed by at least three reviewers. The proceedings will be published by the IEEE Computer Society Press.
At least one of the authors of each accepted paper is required to register and present the work at the conference; otherwise the paper will be removed from the digital library after the conference.

Please contact mariaDOTmirtoATunisalento.it for any question.

From marchywka at hotmail.com Thu May 6 21:07:22 2010
From: marchywka at hotmail.com (Mike Marchywka)
Date: Thu, 6 May 2010 21:07:22 -0400
Subject: [BiO BB] program for sequence length
In-Reply-To:
References:
Message-ID:

----------------------------------------
> Date: Wed, 5 May 2010 14:46:28 +0530
> From: pkhurana08 at gmail.com
> To: bbb at bioinformatics.org
> Subject: [BiO BB] program for sequence length
>
> Hi all,
>
> I have a few 1000 fasta files. I would like to get the list showing the
> sequence name and their respective lengths.
> Is there a program for this?

You could probably write a Perl or bash script to do it more quickly than you could find something, and depending on your overall objective, assuming you want to do more, it may help to have something in source code that you understand. I was doing a lot of FASTA manipulation and I ended up writing a C++ FASTA command-line utility since I needed speed, but I never documented it and keep forgetting how it works.

Consider just using sed to put the name and sequence into a single line per entry and then just look at lengths using awk or something. For things I don't do very often the learning curve can be a nuisance, and it is easier to "reinvent the wheel" with a short script than to relearn some special-purpose utility. These general-purpose text processing tools can be used anywhere.

> I can write one but why reinvent the wheel.
> Thanking all in advance
>
> Regards,
> Pankaj

From dan.bolser at gmail.com Fri May 7 03:44:54 2010
From: dan.bolser at gmail.com (Dan Bolser)
Date: Fri, 7 May 2010 08:44:54 +0100
Subject: [BiO BB] program for sequence length
In-Reply-To:
References:
Message-ID:

Here is a BioPerl script for processing one FASTA file [1]:

    #!/usr/bin/perl -w

    use strict;
    use Bio::SeqIO;

    # Open the FASTA file named by the first command-line argument
    my $seq_io_object = Bio::SeqIO->new(
        -file   => $ARGV[0],
        -format => 'fasta',
    );

    # Print "ID <tab> length" for every sequence in the file
    while (my $seq_object = $seq_io_object->next_seq) {
        print $seq_object->id, "\t";
        print $seq_object->length, "\n";
    }

    warn "OK\n";

Run that over your 1000 files using bash...

    for f in *.fasta; do
        echo "$f"
        my_bp_script.plx "$f"
    done

(Untested!)

HTH,
Dan.

[1] For more info see http://www.bioperl.org/wiki/HOWTO:SeqIO

On 5 May 2010 10:16, Pankaj Khurana wrote:
> Hi all,
>
> I have a few 1000 fasta files. I would like to get the list showing the
> sequence name and their respective lengths.
> Is there a program for this?
> I can write one but why reinvent the wheel.
> Thanking all in advance
>
> Regards,
> Pankaj

From larye at info-engineering-svc.com Thu May 6 21:23:14 2010
From: larye at info-engineering-svc.com (Larye D. Parkins)
Date: Thu, 6 May 2010 18:23:14 -0700
Subject: [BiO BB] program for sequence length
In-Reply-To:
References:
Message-ID: <1b2f7480a7e32714b027296b03cac2c4.squirrel@webmail.parkins.org>

On Thu, May 6, 2010 6:07 pm, Mike Marchywka wrote:
>
> ----------------------------------------
>> Date: Wed, 5 May 2010 14:46:28 +0530
>> From: pkhurana08 at gmail.com
>> To: bbb at bioinformatics.org
>> Subject: [BiO BB] program for sequence length
>>
>> Hi all,
>>
>> I have a few 1000 fasta files.
>> I would like to get the list showing the
>> sequence name and their respective lengths.
>> Is there a program for this?
>

infoseq, part of the EMBOSS suite, should do what you want.

> You could probably write a perl or bash script to do it more quickly
> than you could find something and depending on your overall objective,
> assuming you want to do more, it may help to have something
> in source code that you understand. I was doing a lot of fasta
> manipulation
> and I ended up writing a c++ fasta command line utility since I needed
> speed, but I never documented it and keep forgetting how it works.
>
> Consider just using sed to put the name and sequence into a single line
> per entry
> and then just look at lengths using awk or something. For things
> I don't do very often the learning curve can be a nuisance and it is
> easier
> to "Reinvent the wheel" with a short script rather than relearn some
> special purpose utility.
> These general-purpose text processing tools can be used anywhere.
>
>> I can write one but why reinvent the wheel.
>> Thanking all in advance
>>
>> Regards,
>> Pankaj

--
Larye D. Parkins
Information Engineering Services
600 Turner Ave.
Shelton, WA 98584
Office: 360 426 1718
Mobile: 360 350 9645
http://www.info-engineering-svc.com
"Making IT work since 1965."
Member of ACM, IEEE Computer Society, USENIX, SAGE, and LOPSA

From gary.bader at utoronto.ca Thu May 6 22:43:05 2010
From: gary.bader at utoronto.ca (Gary Bader)
Date: Thu, 6 May 2010 22:43:05 -0400
Subject: [BiO BB] Cytoscape Network Biology symposium and retreat announcement
Message-ID: <03B17498-F008-46BF-9889-9170EDC47F1C@utoronto.ca>

The Cytoscape Symposium/Retreat will be in Ann Arbor, Michigan on July 18-20, 2010. This year's theme is Network Biology and will feature keynotes by Dr. Leroy Hood and Dr. Stephen Friend. Developer events include tutorial sessions, Cytoscape Plugin demos, and a Hack-a-thon. Please mark your calendar and save the date -- more information will be coming soon!

http://cytoscape.wodaklab.org/wiki/CytoscapeRetreat2010

Thanks,
The Cytoscape Team

From maximilianh at gmail.com Thu May 6 22:44:48 2010
From: maximilianh at gmail.com (Maximilian Haussler)
Date: Thu, 6 May 2010 19:44:48 -0700
Subject: [BiO BB] program for sequence length
In-Reply-To:
References:
Message-ID:

The UCSC source tools include faSize: pure C, very, very fast. You would need to compile it, though.

Or use Biopython:

    from Bio import SeqIO

    # Print the ID and length of every record in the FASTA file
    for record in SeqIO.parse("cor6_6.fa", "fasta"):
        print(record.id, len(record.seq), sep="\t")

I remember that I spent several days on this problem many years ago by programming everything myself in pure Perl... like so many people who start in bioinformatics...

cheers
Max

On Wed, May 5, 2010 at 2:16 AM, Pankaj Khurana wrote:
> Hi all,
>
> I have a few 1000 fasta files. I would like to get the list showing the
> sequence name and their respective lengths.
> Is there a program for this?
> I can write one but why reinvent the wheel.
> Thanking all in advance
>
> Regards,
> Pankaj

From ketil.malde at imr.no Fri May 7 02:46:25 2010
From: ketil.malde at imr.no (Ketil Malde)
Date: Fri, 07 May 2010 08:46:25 +0200
Subject: [BiO BB] program for sequence length
In-Reply-To: (Pankaj Khurana's message of "Wed, 5 May 2010 11:16:28 +0200")
References:
Message-ID: <87hbmkus3i.fsf@malde.org>

Pankaj Khurana writes:

> I have a few 1000 fasta files. I would like to get the list showing the
> sequence name and their respective lengths.

If you don't mind using Haskell:

    import Bio.Sequence

    main = do
      ss <- readFasta "file.fasta"
      putStr (unlines [toStr (seqlabel s) ++ "\t" ++ show (seqlen s) | s <- ss])

You can probably do something similar using Bio{Perl,Python,Ruby,..}.

-k
--
If I haven't seen further, it is by standing in the footprints of giants

From pfern at igc.gulbenkian.pt Fri May 7 03:56:55 2010
From: pfern at igc.gulbenkian.pt (Pedro Fernandes)
Date: Fri, 07 May 2010 08:56:55 +0100
Subject: [BiO BB] program for sequence length
In-Reply-To:
References:
Message-ID: <1273219015.4be3c7c74dedb@webmail.igc.gulbenkian.pt>

Try INFOSEQ (Gary Williams), a program available with the EMBOSS package.

Command line:

    infoseq -sequence=your.fasta -only -length -auto

EMBOSS is installed in many places and is open source.

I agree, do not reinvent the wheel!

Pedro

--
Pedro Fernandes
Centro Português de Bioinformática
Instituto Gulbenkian de Ciência
Apartado 14
2781 OEIRAS
PORTUGAL

Quoting Pankaj Khurana :

> Hi all,
>
> I have a few 1000 fasta files. I would like to get the list showing the
> sequence name and their respective lengths.
> Is there a program for this?
> I can write one but why reinvent the wheel.
> Thanking all in advance
>
> Regards,
> Pankaj

From akarger at CGR.Harvard.edu Fri May 7 09:36:49 2010
From: akarger at CGR.Harvard.edu (Karger, Amir)
Date: Fri, 7 May 2010 09:36:49 -0400
Subject: [BiO BB] program for sequence length
In-Reply-To:
References:
Message-ID: <1B12003244CE894E85B472602363788821E865E6@FASXCH01.fasmail.priv>

Check out the Scriptome (yes, this is an advertisement) at http://sysbio.harvard.edu/csb/resources/computational/scriptome/ , which is a set of Perl one-liners you cut and paste onto your command line to do bio-y text-y things.

Use the change_fasta_to_tab tool to change your FASTA to a tab-delimited file with ID, description, sequence. Then use the calc_col_length tool on the result, which will add another column giving the length of the sequence column. You can throw that into Excel and hide the sequence column (or use choose_cols_to_delete to make a file without the sequences themselves) and then read through it at your leisure.

Feel free to contact me offline for details.

-Amir Karger

> -----Original Message-----
> From: bbb-bounces at bioinformatics.org [mailto:bbb-
> bounces at bioinformatics.org] On Behalf Of Pankaj Khurana
> Sent: Wednesday, May 05, 2010 5:16 AM
> To: bbb at bioinformatics.org
> Subject: [BiO BB] program for sequence length
>
> Hi all,
>
> I have a few 1000 fasta files. I would like to get the list showing
> the
> sequence name and their respective lengths.
> Is there a program for this?
> I can write one but why reinvent the wheel.
> Thanking all in advance
>
> Regards,
> Pankaj

From logan at cacs.louisiana.edu Fri May 7 13:45:11 2010
From: logan at cacs.louisiana.edu (Raja Loganantharaj)
Date: Fri, 07 May 2010 12:45:11 -0500
Subject: [BiO BB] CFP BIOT 2010
Message-ID: <4BE451A7.1010606@cacs.louisiana.edu>

---- Apology for Multiple Posting ----------

Dear Colleagues:

This is the call for papers for the 7th annual Biotechnology and Bioinformatics Symposium, to be held in the LITE auditorium at the University of Louisiana at Lafayette on October 14 and 15, 2010. Please post the attached flier in your lab or at your institution and encourage your postdocs and students to submit their contributions to this unique symposium, which encourages collaboration and interaction among professionals, postdocs and graduate students. Please circulate the flier or forward this e-mail to those who may be interested in the symposium.

We invite your original contributions in the areas of genome sequencing and annotation, functional and computational genomics, and transcriptomics. Contributions in other areas of Biotechnology and Bioinformatics are also welcome. Extended versions of the best full papers will be published in the International Journal of Bioinformatics Research and Applications. For details, view the symposium web site at http://www.biotconf.org/

Important Dates:

Submission Deadline: July 8, 2010
Acceptance Decision: Aug. 20, 2010
Poster Submission Deadline: Sept. 10, 2010
Final Papers due after submission: Sept. 10, 2010

The symposium will be held on the 14th and 15th of October 2010.
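Editorial note on the "program for sequence length" thread: for readers without EMBOSS, BioPerl or Biopython installed, the task can also be sketched with the Python standard library alone. This is a minimal sketch, not any poster's code: the names `fasta_lengths` and `report` are hypothetical, the record name is assumed to be the first word after `>`, and the files are assumed to be plain (uncompressed) FASTA.

```python
def fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            fields = line[1:].split()
            name = fields[0] if fields else ""
            length = 0
        elif name is not None:
            # Sequence line: count residues, ignoring surrounding whitespace.
            length += len(line)
    if name is not None:
        yield name, length


def report(paths):
    """Print 'file <tab> name <tab> length' for every record in every file."""
    for path in paths:
        with open(path) as handle:
            for name, length in fasta_lengths(handle):
                print(f"{path}\t{name}\t{length}")
```

Calling `report(sorted(glob.glob("*.fasta")))` after `import glob` would cover a directory of files in one run; the `*.fasta` pattern is an assumption about how the files are named. The tab-separated output can then be sorted or filtered on the length column with ordinary text tools.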
From marchywka at hotmail.com Fri May 7 20:12:42 2010
From: marchywka at hotmail.com (Mike Marchywka)
Date: Fri, 7 May 2010 20:12:42 -0400
Subject: [BiO BB] program for sequence length
In-Reply-To: <1B12003244CE894E85B472602363788821E865E6@FASXCH01.fasmail.priv>
References: , <1B12003244CE894E85B472602363788821E865E6@FASXCH01.fasmail.priv>
Message-ID:

----------------------------------------
> From: akarger at CGR.Harvard.edu
> To: bbb at bioinformatics.org
> Date: Fri, 7 May 2010 09:36:49 -0400
> Subject: Re: [BiO BB] program for sequence length
>
> Check out the Scriptome (yes, this is an advertisement.) at http://sysbio.harvard.edu/csb/resources/computational/scriptome/ , which is a set of Perl one-liners you cut and paste onto your command line to do bio-y text-y thigns.

I hadn't thought of this before, but it is a good idea if you can search it easily. I often use Google to find sed/awk one-liners for stuff like this, and it's a great way to learn the tools and get your work done. You seem to have a bit more than flat lists of one-liners, but offhand I'd think this would be a generally good idea.

Now to argue, "you should have done that in {perl,awk,sed,java,c++} instead of {perl, awk, sed, java, c++}" LOL

>
> Use the change_fasta_to_tab tool to change your fasta to a tab-delimited file with ID, description, sequence. Then use the calc_col_length tool on the result, which will add another column giving the length of the sequence column. You can throw that into excel and hide the sequence column (or use choose_cols_to_delete to make a file without the seqeuences themselves) and then read through it at your leisure.
>
> Feel free to contact me offline for details.
> > -Amir Karger > >> -----Original Message----- >> From: bbb-bounces at bioinformatics.org [mailto:bbb- >> bounces at bioinformatics.org] On Behalf Of Pankaj Khurana >> Sent: Wednesday, May 05, 2010 5:16 AM >> To: bbb at bioinformatics.org >> Subject: [BiO BB] program for sequence length >> >> Hi all, >> >> I have a few 1000 fasta files. I would like to get the list showing >> the >> sequence name and their respective lengths. >> Is there a program for this? >> I can write one but why reinvent the wheel. >> Thanking all in advance >> >> Regards, >> Pankaj >> _______________________________________________ >> BBB mailing list >> BBB at bioinformatics.org >> http://www.bioinformatics.org/mailman/listinfo/bbb > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb _________________________________________________________________ The New Busy think 9 to 5 is a cute idea. Combine multiple calendars with Hotmail. http://www.windowslive.com/campaign/thenewbusy?tile=multicalendar&ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_5 From marty.gollery at gmail.com Fri May 7 20:18:39 2010 From: marty.gollery at gmail.com (Martin Gollery) Date: Fri, 7 May 2010 17:18:39 -0700 Subject: [BiO BB] program for sequence length In-Reply-To: <1B12003244CE894E85B472602363788821E865E6@FASXCH01.fasmail.priv> References: <1B12003244CE894E85B472602363788821E865E6@FASXCH01.fasmail.priv> Message-ID: One nice thing about this approach is that you could then sort them by length, which might be very handy. You could find things like export all the sequences of length >x but wrote: > Check out the Scriptome (yes, this is an advertisement.) at > http://sysbio.harvard.edu/csb/resources/computational/scriptome/ , which > is a set of Perl one-liners you cut and paste onto your command line to do > bio-y text-y thigns. 
> > Use the change_fasta_to_tab tool to change your fasta to a tab-delimited > file with ID, description, sequence. Then use the calc_col_length tool on > the result, which will add another column giving the length of the sequence > column. You can throw that into excel and hide the sequence column (or use > choose_cols_to_delete to make a file without the seqeuences themselves) and > then read through it at your leisure. > > Feel free to contact me offline for details. > > -Amir Karger > > > -----Original Message----- > > From: bbb-bounces at bioinformatics.org [mailto:bbb- > > bounces at bioinformatics.org] On Behalf Of Pankaj Khurana > > Sent: Wednesday, May 05, 2010 5:16 AM > > To: bbb at bioinformatics.org > > Subject: [BiO BB] program for sequence length > > > > Hi all, > > > > I have a few 1000 fasta files. I would like to get the list showing > > the > > sequence name and their respective lengths. > > Is there a program for this? > > I can write one but why reinvent the wheel. > > Thanking all in advance > > > > Regards, > > Pankaj > > _______________________________________________ > > BBB mailing list > > BBB at bioinformatics.org > > http://www.bioinformatics.org/mailman/listinfo/bbb > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > -- -- Martin Gollery Senior Bioinformatics Scientist Tahoe Informatics www.bioinformaticist.biz www.hiddenmarkovmodels.com From vandhana0001 at gmail.com Sat May 8 21:27:39 2010 From: vandhana0001 at gmail.com (VANDHANA) Date: Sun, 9 May 2010 06:57:39 +0530 Subject: [BiO BB] Urgent help needed Message-ID: *Hello all, I am using tinker to run dynamics on my protein. The cycle files im getting are not in the pdb format ,so im unable to use them.Anyone who has used tinker kindly tell me as soon as possible how to convert these cycle files to .pdb format for evaluation. I'll be highly obliged. 
Thanks, Regards, Vandhana* From dan.bolser at gmail.com Mon May 10 01:37:39 2010 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 10 May 2010 06:37:39 +0100 Subject: [BiO BB] Fwd: [Sbforum-general] BioInformatics - Student Placement In-Reply-To: References: Message-ID: ---------- Forwarded message ---------- From: Sandra Borthwick Date: 7 May 2010 16:46 Subject: [Sbforum-general] BioInformatics - Student Placement To: SBForum Project spec: This placement is part of the Environmental Placement Programme; further details can be found online at www.thebep.org.uk/epp The project is to be funded by the Genecom Orphans project and is a collaboration between the Bioinformaticians at Moredun Research Institute and FIOS Genomics, a spin-out company from Edinburgh University. The idea of the project is to take the genomic sequences of bacterial DNA and use SNP and other bioinformatics techniques to analyse the differences between individual genomes, thus building a phylogenetic 'tree' of the bacterium species. This will enable ready diagnosis of the different strains of the bacteria, which is of commercial use to FIOS. The student will be jointly based at MRI and FIOS. The company are ideally looking for a Bioinformatics MSc student (possibly as part of a project dissertation). They would also consider a particularly strong undergraduate. Candidates will have experience of Bioinformatics techniques; programming experience in Perl is essential and experience of Java desirable. If anyone wants to apply, they can do so by sending their CV and a cover letter to Louise Evison at louise.evison at met.org.uk -- Sandra Borthwick, Executive Assistant Scottish Bioinformatics Forum The Royal Society of Edinburgh 22-26 George Street Edinburgh EH2 2PQ Tel: +44 (0)131 240 2783 Fax: +44 (0)131 240 2786 email: sandra.borthwick at sbforum.org www.sbforum.org 10th International Conference on Systems Biology Edinburgh, UK.
10th-15th October 2010 http://www.icsb-2010.net/ The SBF is a project of the RSE Scotland Foundation, Scottish Charity No. SC024636 The information contained in this e-mail is confidential, intended for the above named individual/s and may be legally privileged. This message may contain personal views which are not the views of the Foundation/Forum, unless specifically stated www.rsescotlandfoundation.org.uk ?www.sbforum.org _______________________________________________ Sbforum-general mailing list Sbforum-general at sbforum.org http://sbforum.org/mailman/listinfo/sbforum-general_sbforum.org From akarger at CGR.Harvard.edu Mon May 10 20:48:50 2010 From: akarger at CGR.Harvard.edu (Karger, Amir) Date: Mon, 10 May 2010 20:48:50 -0400 Subject: [BiO BB] program for sequence length In-Reply-To: References: , <1B12003244CE894E85B472602363788821E865E6@FASXCH01.fasmail.priv>, Message-ID: <1B12003244CE894E85B4726023637888215FCB6E@FASXCH01.fasmail.priv> Scriptome has several purposes: 1) Help experienced coders avoid reinventing the wheel and silly bugs. 2) Help non-programmers do simple munging using cut and paste, without learning how to program. (And by "simple" I mean somewhere between Notepad find/replace and real programming.) 3) Help novice programmers get some examples of Perl idioms, or starting points to work with. (I had this whole plan of commenting the tools, but never got the tuits.) Re #1, with the command-line Scriptome tool, so when I'm working with a client (or even by myself) I can just do: Scriptome -t change_fasta_to_tab blah.fasta > blah.tab And some of the merge and choose tools are great when exploring data. And no, arguing about languages is only slightly less stupid than fighting a land war in Asia. 
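For anyone who would rather script this step than cut and paste it, here is a minimal Python sketch of the same idea: read FASTA text and report each sequence ID with its length. It is not part of Scriptome; the function names read_fasta and seq_lengths are made up for illustration, and it assumes plain FASTA input in which every record starts with a '>' header line.

```python
import sys


def read_fasta(lines):
    """Yield (sequence_id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The ID is taken to be the first whitespace-delimited word after '>'.
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def seq_lengths(lines):
    """Return a list of (sequence_id, length) pairs, in file order."""
    return [(seq_id, len(seq)) for seq_id, seq in read_fasta(lines)]


if __name__ == "__main__":
    # Hypothetical usage: python fasta_lengths.py a.fasta b.fasta > lengths.tab
    for path in sys.argv[1:]:
        with open(path) as handle:
            for seq_id, length in seq_lengths(handle):
                print(f"{path}\t{seq_id}\t{length}")
```

The tab-delimited output can be sorted or filtered by the length column in a spreadsheet or with standard command-line tools, much like the Scriptome output described above.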
-Amir ________________________________________ From: bbb-bounces at bioinformatics.org [bbb-bounces at bioinformatics.org] On Behalf Of Mike Marchywka [marchywka at hotmail.com] Sent: Friday, May 07, 2010 20:12 To: bbb at bioinformatics.org Subject: Re: [BiO BB] program for sequence length ---------------------------------------- > From: akarger at CGR.Harvard.edu > To: bbb at bioinformatics.org > Date: Fri, 7 May 2010 09:36:49 -0400 > Subject: Re: [BiO BB] program for sequence length > > Check out the Scriptome (yes, this is an advertisement.) at http://sysbio.harvard.edu/csb/resources/computational/scriptome/ , which is a set of Perl one-liners you cut and paste onto your command line to do bio-y text-y thigns. I hadn't thought of this before but it is a good idea if you can search it easily, I often use google for sed/awk one liners for stuff like this and its a great way to learn the tools and get your work done. You seem to have a bit more than flat lists of one-liners but off hand I'd think this would be a generally good idea. Now to argue," you should have done that in {perl,awk,sed,java,c++} instead of {perl, awk, sed, java, c++}" LOL From akarger at CGR.Harvard.edu Mon May 10 20:49:20 2010 From: akarger at CGR.Harvard.edu (Karger, Amir) Date: Mon, 10 May 2010 20:49:20 -0400 Subject: [BiO BB] program for sequence length In-Reply-To: References: <1B12003244CE894E85B472602363788821E865E6@FASXCH01.fasmail.priv>, Message-ID: <1B12003244CE894E85B4726023637888215FCB6F@FASXCH01.fasmail.priv> 10 points! We do exactly that kind of thing on the Sequences page of the Protocols section. After you get all the sequences you like (those of a certain length, those that are unique, whatever), you can use the column choosing tool to get only the ID, Desc, sequence again, and then use change_tab_to_fasta to get back a FASTA file with just the sequences of interest. 
A piece of cake for a bioinformaticist, but literally impossible for a non-programmer without this or a similar tool. The coolest part was watching biologists start thinking a bit more like bioinformaticists once they realized the possibilities. My goal was to give non-programmers these tools, so that we coders would be free to work on more interesting, hard stuff. (I never quite got to the "Profit!" step.) _Amir ________________________________________ From: bbb-bounces at bioinformatics.org [bbb-bounces at bioinformatics.org] On Behalf Of Martin Gollery [marty.gollery at gmail.com] One nice thing about this approach is that you could then sort them by length, which might be very handy. You could find things like export all the sequences of length >x but wrote: > Check out the Scriptome (yes, this is an advertisement.) at > http://sysbio.harvard.edu/csb/resources/computational/scriptome/ , which > is a set of Perl one-liners you cut and paste onto your command line to do > bio-y text-y thigns. > > Use the change_fasta_to_tab tool to change your fasta to a tab-delimited > file with ID, description, sequence. Then use the calc_col_length tool on > the result, which will add another column giving the length of the sequence > column. You can throw that into excel and hide the sequence column (or use > choose_cols_to_delete to make a file without the seqeuences themselves) and > then read through it at your leisure. From chea2 at mail.nih.gov Thu May 13 16:02:05 2010 From: chea2 at mail.nih.gov (Che, Anney (NIH/NCI) [E]) Date: Thu, 13 May 2010 16:02:05 -0400 Subject: [BiO BB] How to create a Transcription binding site profile In-Reply-To: <42AF972B7253314D85D6114C0BF1DF3903B87CA8D1@NIHMLBX07.nih.gov> References: <42AF972B7253314D85D6114C0BF1DF3903B87CA8D1@NIHMLBX07.nih.gov> Message-ID: <42AF972B7253314D85D6114C0BF1DF3903B87CA8D2@NIHMLBX07.nih.gov> Hi everyone, I have questions regarding on how to create a transcription binding sites profile. 
Since I have the sequences of the regions that bind to the gene from ChIP-chip, I can align all the sequences and then use the resulting alignment to generate a profile. Since this is high-throughput, does anyone know of a tool that can align short sequences programmatically and then output an alignment file? Also, is there a program that generates a profile from an alignment? Thanks, Anney From dan.bolser at gmail.com Fri May 14 11:30:17 2010 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 14 May 2010 16:30:17 +0100 Subject: [BiO BB] How to create a Transcription binding site profile In-Reply-To: <42AF972B7253314D85D6114C0BF1DF3903B87CA8D2@NIHMLBX07.nih.gov> References: <42AF972B7253314D85D6114C0BF1DF3903B87CA8D1@NIHMLBX07.nih.gov> <42AF972B7253314D85D6114C0BF1DF3903B87CA8D2@NIHMLBX07.nih.gov> Message-ID: I don't know of anything specific, but you could try using 'Gibbs sampling' algorithms to automatically discover the motif in the sequence sets. On 13 May 2010 21:02, Che, Anney (NIH/NCI) [E] wrote: > > Hi everyone, > > I have questions regarding on how to create a transcription binding sites profile. > > Since I have the sequences of the areas that bind to the gene from Chip-ChIp , I can align all the sequences then with the alignment that created to generate a profile. > > Since this is high-throughput, does any one know any tool that can align short sequences programmatically and then output an alignment file? > > Also a program that generates an alignment profile from an alignment.
> > > Thanks, > > Anney > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > From marty.gollery at gmail.com Fri May 14 11:33:28 2010 From: marty.gollery at gmail.com (Martin Gollery) Date: Fri, 14 May 2010 08:33:28 -0700 Subject: [BiO BB] How to create a Transcription binding site profile In-Reply-To: <42AF972B7253314D85D6114C0BF1DF3903B87CA8D2@NIHMLBX07.nih.gov> References: <42AF972B7253314D85D6114C0BF1DF3903B87CA8D1@NIHMLBX07.nih.gov> <42AF972B7253314D85D6114C0BF1DF3903B87CA8D2@NIHMLBX07.nih.gov> Message-ID: Hi Anney, ClustalW will work fine programmatically with short sequences as long as you choose the 'slow, accurate' option. Best regards, Martin Gollery On Thu, May 13, 2010 at 1:02 PM, Che, Anney (NIH/NCI) [E] < chea2 at mail.nih.gov> wrote: > > Hi everyone, > > I have questions regarding on how to create a transcription binding sites > profile. > > Since I have the sequences of the areas that bind to the gene from > Chip-ChIp , I can align all the sequences then with the alignment that > created to generate a profile. > > Since this is high-throughput, does any one know any tool that can align > short sequences programmatically and then output an alignment file? > > Also a program that generates an alignment profile from an alignment. 
> > > Thanks, > > Anney > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > -- -- Martin Gollery Senior Bioinformatics Scientist Tahoe Informatics www.bioinformaticist.biz www.hiddenmarkovmodels.com From x.sole at iconcologia.net Fri May 14 11:33:11 2010 From: x.sole at iconcologia.net (Sole Acha, Xavi) Date: Fri, 14 May 2010 17:33:11 +0200 Subject: [BiO BB] How to create a Transcription binding site profile In-Reply-To: References: <42AF972B7253314D85D6114C0BF1DF3903B87CA8D1@NIHMLBX07.nih.gov><42AF972B7253314D85D6114C0BF1DF3903B87CA8D2@NIHMLBX07.nih.gov> Message-ID: <50805D6FBD91904D86AF433349315E4202A9313E@ICOSRVCORREO01.ICO.SCS.local> Clustal may work for you. http://www.clustal.org/ HTH, Xavi. ------ Xavier Solé Acha Unitat de Biomarcadors i Susceptibilitat Unit of Biomarkers and Susceptibility Institut Català d'Oncologia // Catalan Institute of Oncology Gran Via de L'Hospitalet 199-203 08907 L'Hospitalet de Llobregat, Barcelona, Spain. Phone: +34 93 260 71 86 / +34 93 335 90 11 (ext. 3194) Fax: +34 93 260 71 88 E-mail: x.sole (at) iconcologia.net -----Original Message----- From: bbb-bounces at bioinformatics.org [mailto:bbb-bounces at bioinformatics.org] On Behalf Of Dan Bolser Sent: Friday, 14 May 2010 17:30 To: General Forum at Bioinformatics.Org Subject: Re: [BiO BB] How to create a Transcription binding site profile I don't know anything specific, but you could try using 'Gibbs sampling' algorithms to automatically discover the motif in the sequence sets? On 13 May 2010 21:02, Che, Anney (NIH/NCI) [E] wrote: > > Hi everyone, > > I have questions regarding on how to create a transcription binding sites profile. > > Since I have the sequences of the areas that bind to the gene from Chip-ChIp , I can align all the sequences then with the alignment that created to generate a profile.
> > Since this is high-throughput, does any one know any tool that can align short sequences programmatically and then output an alignment file? > > Also a program that generates an alignment profile from an alignment. > > > Thanks, > > Anney > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > _______________________________________________ BBB mailing list BBB at bioinformatics.org http://www.bioinformatics.org/mailman/listinfo/bbb From harry.mangalam at uci.edu Fri May 14 11:56:19 2010 From: harry.mangalam at uci.edu (Harry Mangalam) Date: Fri, 14 May 2010 08:56:19 -0700 Subject: [BiO BB] How to create a Transcription binding site profile In-Reply-To: <42AF972B7253314D85D6114C0BF1DF3903B87CA8D2@NIHMLBX07.nih.gov> References: <42AF972B7253314D85D6114C0BF1DF3903B87CA8D1@NIHMLBX07.nih.gov> <42AF972B7253314D85D6114C0BF1DF3903B87CA8D2@NIHMLBX07.nih.gov> Message-ID: <201005140856.19643.harry.mangalam@uci.edu> Depends on your output and how you want to process it. If the output is short reads (10s of bases) as from an HTS technology (Illumina, 454, etc.), the output is essentially 'aligned' already - that is, the binding site is effectively encoded in the output. If the output is from longer sequences (100s of bases) and you want to be able to extract the binding sites from those longer sequences to increase your signal, there are a couple of ways to do it depending on prior knowledge. If you don't know the binding motif, you can use one of the de novo motif finders such as nmica to generate it from the seqs you have. If you have a good enough idea of what the sequence is and you can describe it in a regular expression or IUPAC coding with a few errors, you can extract the motifs so described plus whatever padding you want in fasta format using tacg or some other extractor and then align them using your favorite aligner (clustal, tcoffee, etc.).
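The IUPAC-to-regular-expression extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how tacg works: the names iupac_to_regex and extract_sites are hypothetical, only the forward strand is scanned, and matches are exact (no allowance for "a few errors"). The motif TGASTCA used in the test is just an example pattern.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif string into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def extract_sites(seq_id, sequence, motif, pad=5):
    """Return FASTA-formatted records: each motif hit plus up to `pad` flanking bases."""
    sequence = sequence.upper()
    # Wrapping the pattern in a lookahead lets overlapping hits be reported too.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    records = []
    for match in pattern.finditer(sequence):
        start, end = match.start(1), match.end(1)
        left = max(0, start - pad)
        right = min(len(sequence), end + pad)
        # Header carries the 1-based start and end of the motif hit itself.
        records.append(f">{seq_id}_{start + 1}_{end}\n{sequence[left:right]}")
    return records
```

Because every extracted record has the motif at the same offset (when the padding is not truncated by a sequence end), the records are already nearly aligned, which keeps the later multiple-alignment step easy.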
hjm On Thursday 13 May 2010 13:02:05 Che, Anney (NIH/NCI) [E] wrote: > Hi everyone, > > I have questions regarding on how to create a transcription binding > sites profile. > > Since I have the sequences of the areas that bind to the gene from > Chip-ChIp , I can align all the sequences then with the alignment > that created to generate a profile. > > Since this is high-throughput, does any one know any tool that can > align short sequences programmatically and then output an alignment > file? > > Also a program that generates an alignment profile from an > alignment. > > > Thanks, > > Anney > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb -- Harry Mangalam - Research Computing, NACS, Rm 225 MSTB, UC Irvine [ZOT 2225] / 92697 949 824-0084(o), 949 285-4487(c) MSTB=Bldg 415 (G-5 on --- Change is hard; disaster is easy. From oceanhu at 126.com Fri May 14 11:45:39 2010 From: oceanhu at 126.com (ocean) Date: Fri, 14 May 2010 23:45:39 +0800 (CST) Subject: [BiO BB] How to create a Transcription binding site profile In-Reply-To: References: <42AF972B7253314D85D6114C0BF1DF3903B87CA8D1@NIHMLBX07.nih.gov> <42AF972B7253314D85D6114C0BF1DF3903B87CA8D2@NIHMLBX07.nih.gov> Message-ID: <12614ff.11e96.128977e0a3f.Coremail.oceanhu@126.com> MEME could work. On 2010-05-14, "Martin Gollery" wrote: >Hi Anney, > >ClustalW will work fine programmatically with short sequences as long as you >choose the 'slow, accurate' option. > >Best regards, >Martin Gollery > >On Thu, May 13, 2010 at 1:02 PM, Che, Anney (NIH/NCI) [E] < >chea2 at mail.nih.gov> wrote: > >> >> Hi everyone, >> >> I have questions regarding on how to create a transcription binding sites >> profile. >> >> Since I have the sequences of the areas that bind to the gene from >> Chip-ChIp , I can align all the sequences then with the alignment that >> created to generate a profile.
>> >> Since this is high-throughput, does any one know any tool that can align >> short sequences programmatically and then output an alignment file? >> >> Also a program that generates an alignment profile from an alignment. >> >> >> Thanks, >> >> Anney >> >> _______________________________________________ >> BBB mailing list >> BBB at bioinformatics.org >> http://www.bioinformatics.org/mailman/listinfo/bbb >> > > > >-- >-- >Martin Gollery >Senior Bioinformatics Scientist >Tahoe Informatics >www.bioinformaticist.biz >www.hiddenmarkovmodels.com >_______________________________________________ >BBB mailing list >BBB at bioinformatics.org >http://www.bioinformatics.org/mailman/listinfo/bbb From federalhillrent at yahoo.com Fri May 14 11:42:55 2010 From: federalhillrent at yahoo.com (FederalHill) Date: Fri, 14 May 2010 08:42:55 -0700 (PDT) Subject: [BiO BB] How to create a Transcription binding site profile In-Reply-To: <50805D6FBD91904D86AF433349315E4202A9313E@ICOSRVCORREO01.ICO.SCS.local> Message-ID: <236552.27345.qm@web36303.mail.mud.yahoo.com> Gibbs sampling From Wikipedia, the free encyclopedia In mathematics and physics, Gibbs sampling or Gibbs sampler is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables. The purpose of such a sequence is to approximate the joint distribution, or to compute an integral (such as an expected value). Gibbs sampling is a special case of the Metropolis-Hastings algorithm, and thus an example of a Markov chain Monte Carlo algorithm. The algorithm is named after the physicist J. W. Gibbs, in reference to an analogy between the sampling algorithm and statistical physics. The algorithm was devised by brothers Stuart and Donald Geman, some eight decades after the passing of Gibbs.[1] Gibbs sampling is applicable when the joint distribution is not known explicitly, but the conditional distribution of each variable is known.
The Gibbs sampling algorithm generates an instance from the distribution of each variable in turn, conditional on the current values of the other variables. It can be shown (see, for example, Gelman et al. 1995) that the sequence of samples constitutes a Markov chain, and the stationary distribution of that Markov chain is just the sought-after joint distribution. Gibbs sampling is particularly well-adapted to sampling the posterior distribution of a Bayesian network, since Bayesian networks are typically specified as a collection of conditional distributions.

Background

Gibbs sampling is a special case of the Metropolis-Hastings algorithm. The point of Gibbs sampling is that given a multivariate distribution it is simpler to sample from a conditional distribution than to marginalize by integrating over a joint distribution. Suppose we want to obtain samples of $X = (x_1, \dots, x_n)$ from a joint distribution $p(x_1, \dots, x_n)$. We begin with a value of $X^{(0)}$ and sample $x_i$ by $x_i^{(k)} \sim p(x_i \mid x_1^{(k)}, \dots, x_{i-1}^{(k)}, x_{i+1}^{(k-1)}, \dots, x_n^{(k-1)})$. Once that value of $x_i^{(k)}$ is calculated, repeat by sampling for the next $i$: $x_{i+1}^{(k)} \sim p(x_{i+1} \mid x_1^{(k)}, \dots, x_i^{(k)}, x_{i+2}^{(k-1)}, \dots, x_n^{(k-1)})$.

Implementation

Suppose that a sample $X$ is taken from a distribution depending on a parameter vector $\theta \in \Theta$ of length $d$, with prior distribution $g(\theta_1, \dots, \theta_d)$. It may be that $d$ is very large and that numerical integration to find the marginal densities of the $\theta_i$ would be computationally expensive. Then an alternative method of calculating the marginal densities is to create a Markov chain on the space $\Theta$ by repeating these two steps:

1. Pick a random index $1 \le j \le d$.
2. Pick a new value for $\theta_j$ according to $g(\theta_1, \dots, \theta_{j-1}, \cdot, \theta_{j+1}, \dots, \theta_d)$.

These steps define a reversible Markov chain with the desired invariant distribution $g$. This can be proved as follows. Define $x \sim_j y$ if $x_i = y_i$ for all $i \ne j$ and let $p_{xy}$ denote the probability of a jump from $x \in \Theta$ to $y \in \Theta$. Then, the transition probabilities are

$$p_{xy} = \begin{cases} \dfrac{1}{d} \dfrac{g(y)}{\sum_{z \in \Theta : z \sim_j x} g(z)} & \text{if } x \sim_j y, \\ 0 & \text{otherwise.} \end{cases}$$

So

$$g(x)\, p_{xy} = \frac{1}{d} \frac{g(x)\, g(y)}{\sum_{z \in \Theta : z \sim_j x} g(z)} = \frac{1}{d} \frac{g(y)\, g(x)}{\sum_{z \in \Theta : z \sim_j y} g(z)} = g(y)\, p_{yx}$$

since $x \sim_j y$ is an equivalence relation. Thus the detailed balance equations are satisfied, implying the chain is reversible and it has invariant distribution $g$.
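To make the two steps above concrete, here is a toy random-scan Gibbs sampler in Python. It is only a sketch on an assumed target, a standard bivariate normal with correlation rho, chosen because both conditionals are known in closed form; it is not a motif finder, and the function names are made up for this example.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=1000, seed=0):
    """Random-scan Gibbs sampler for a standard bivariate normal with correlation rho.

    Each step picks one coordinate at random and redraws it from its
    conditional distribution given the other coordinate:
        x | y ~ Normal(rho * y, 1 - rho**2)   (and symmetrically for y | x)
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    state = [0.0, 0.0]
    samples = []
    for step in range(burn_in + n_samples):
        j = rng.randrange(2)                        # step 1: pick a random index
        other = state[1 - j]
        state[j] = rng.gauss(rho * other, cond_sd)  # step 2: redraw from the conditional
        if step >= burn_in:
            samples.append((state[0], state[1]))
    return samples


def correlation(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    return cov / math.sqrt(var_x * var_y)
```

With a long enough run, the sample means should settle near 0 and the sample correlation near rho, even though the sampler never evaluates the joint density directly. Successive samples are correlated, so the effective sample size is smaller than the number of steps.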
In practice, the suffix $j$ is not chosen at random, and the chain cycles through the suffixes in order. In general this gives a non-reversible chain, but it will still have the desired invariant distribution (as long as the chain can access all states under the fixed ordering).

Failure modes

There are two ways that Gibbs sampling can fail. The first is when there are islands of high-probability states, with no paths between them. For example, consider a probability distribution over 2-bit vectors, where the vectors (0,0) and (1,1) each have probability 1/2, but the other two vectors (0,1) and (1,0) have probability zero. Gibbs sampling will become trapped in one of the two high-probability vectors, and will never reach the other one. More generally, for any distribution over high-dimensional, real-valued vectors, if two particular elements of the vector are perfectly correlated (or perfectly anti-correlated), those two elements will become stuck, and Gibbs sampling will never be able to change them.

The second problem can happen even when all states have nonzero probability and there is only a single island of high-probability states. For example, consider a probability distribution over 100-bit vectors, where the all-zeros vector occurs with probability 1/2, and all other vectors are equally probable, and so have a probability of $\frac{1}{2(2^{100}-1)}$ each. If you want to estimate the probability of the zero vector, it would be sufficient to take 100 or 1000 samples from the true distribution. That would very likely give an answer very close to 1/2. But you would probably have to take more than $2^{100}$ samples from Gibbs sampling to get the same result. No computer could do this in a lifetime. This problem occurs no matter how long the burn-in period is. This is because in the true distribution, the zero vector occurs half the time, and those occurrences are randomly mixed in with the nonzero vectors. Even a small sample will see both zero and nonzero vectors.
But Gibbs sampling will alternate between returning only the zero vector for long periods (about $2^{99}$ in a row), then only nonzero vectors for long periods (about $2^{99}$ in a row). Thus convergence to the true distribution is extremely slow, requiring much more than $2^{99}$ steps; taking this many steps is not computationally feasible in a reasonable time period. The slow convergence here can be seen as a consequence of the curse of dimensionality.

Software

The WinBUGS software (the open source version is called OpenBUGS) does a Bayesian analysis of complex statistical models using Markov chain Monte Carlo. BUGS comes from Bayesian inference using Gibbs sampling. JAGS (Just another Gibbs sampler) is a GPL program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo.

--- On Fri, 5/14/10, Sole Acha, Xavi wrote: From: Sole Acha, Xavi Subject: Re: [BiO BB] How to create a Transcription binding site profile To: "General Forum at Bioinformatics.Org" Date: Friday, May 14, 2010, 11:33 AM Clustal may work for you. http://www.clustal.org/ HTH, Xavi. ------ Xavier Solé Acha Unitat de Biomarcadors i Susceptibilitat Unit of Biomarkers and Susceptibility Institut Català d'Oncologia // Catalan Institute of Oncology Gran Via de L'Hospitalet 199-203 08907 L'Hospitalet de Llobregat, Barcelona, Spain. Phone: +34 93 260 71 86 / +34 93 335 90 11 (ext. 3194) Fax: +34 93 260 71 88 E-mail: x.sole (at) iconcologia.net -----Original Message----- From: bbb-bounces at bioinformatics.org [mailto:bbb-bounces at bioinformatics.org] On Behalf Of Dan Bolser Sent: Friday, 14 May 2010 17:30 To: General Forum at Bioinformatics.Org Subject: Re: [BiO BB] How to create a Transcription binding site profile I don't know anything specific, but you could try using 'Gibbs sampling' algorithms to automatically discover the motif in the sequence sets?
On 13 May 2010 21:02, Che, Anney (NIH/NCI) [E] wrote: > > Hi everyone, > > I have questions regarding on how to create a transcription binding sites profile. > > Since I have the sequences of the areas that bind to the gene from Chip-ChIp , I can align all the sequences then with the alignment that created to generate a profile. > > Since this is high-throughput, does any one know any tool that can align short sequences programmatically and then output an alignment file? > > Also a program that generates an alignment profile from an alignment. > > > Thanks, > > Anney > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > _______________________________________________ BBB mailing list BBB at bioinformatics.org http://www.bioinformatics.org/mailman/listinfo/bbb _______________________________________________ BBB mailing list BBB at bioinformatics.org http://www.bioinformatics.org/mailman/listinfo/bbb From jeedward at yahoo.com Mon May 17 13:12:11 2010 From: jeedward at yahoo.com (John Edward) Date: Mon, 17 May 2010 10:12:11 -0700 (PDT) Subject: [BiO BB] Call for papers: BCBGC-10, USA, July 2010 Message-ID: <33528.44086.qm@web45910.mail.sp1.yahoo.com> It would be highly appreciated if you could share this announcement with your colleagues, students and individuals whose research is in bioinformatics, computational biology, genomics, data-mining, and related areas. Call for papers: BCBGC-10, USA, July 2010 The 2010 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10) (website: http://www.PromoteResearch.org ) will be held during 12-14 of July 2010 in Orlando, FL, USA. BCBGC is an important event in the areas of bioinformatics, computational biology, genomics and chemoinformatics and focuses on all areas related to the conference. 
The conference will be held at the same time and location where several other major international conferences will be taking place. The conference will be held as part of 2010 multi-conference (MULTICONF-10). MULTICONF-10 will be held during July 12-14, 2010 in Orlando, Florida, USA. The primary goal of MULTICONF is to promote research and developmental activities in computer science, information technology, control engineering, and related fields. Another goal is to promote the dissemination of research to a multidisciplinary audience and to facilitate communication among researchers, developers, practitioners in different fields. The following conferences are planned to be organized as part of MULTICONF-10.

- International Conference on Artificial Intelligence and Pattern Recognition (AIPR-10)
- International Conference on Automation, Robotics and Control Systems (ARCS-10)
- International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10)
- International Conference on Computer Communications and Networks (CCN-10)
- International Conference on Enterprise Information Systems and Web Technologies (EISWT-10)
- International Conference on High Performance Computing Systems (HPCS-10)
- International Conference on Information Security and Privacy (ISP-10)
- International Conference on Image and Video Processing and Computer Vision (IVPCV-10)
- International Conference on Software Engineering Theory and Practice (SETP-10)
- International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-10)

MULTICONF-10 will be held at Imperial Swan Hotel and Suites. It is a full-service resort that puts you in the middle of the fun! Located 1/2 block south of the famed International Drive, the hotel is just minutes from great entertainment like Walt Disney World Resort, Universal Studios and Sea World Orlando.
Guests can enjoy free scheduled transportation to these theme parks, as well as spacious accommodations, outdoor pools and on-site dining, all situated on 10 tropically landscaped acres. Here, guests can experience a full-service resort with discount hotel pricing in Orlando. We invite draft paper submissions. Please see the website http://www.PromoteResearch.org for more details. Sincerely John Edward From maximilianh at gmail.com Mon May 17 17:39:24 2010 From: maximilianh at gmail.com (Maximilian Haussler) Date: Mon, 17 May 2010 14:39:24 -0700 Subject: [BiO BB] How to create a Transcription binding site profile In-Reply-To: <201005140856.19643.harry.mangalam@uci.edu> References: <42AF972B7253314D85D6114C0BF1DF3903B87CA8D1@NIHMLBX07.nih.gov> <42AF972B7253314D85D6114C0BF1DF3903B87CA8D2@NIHMLBX07.nih.gov> <201005140856.19643.harry.mangalam@uci.edu> Message-ID: There are hundreds of algorithms to do this. Apart from nmica, you can try trawler: http://ani.embl.de/trawler/ On Fri, May 14, 2010 at 8:56 AM, Harry Mangalam wrote: > Depends on your output and how you want to process it. If the output > are short reads (10s of bases) as from a HTS technology (Ill, 454, > etc), the output is essentially 'aligned' already - that is the > binding site is effectively encoded in the output. > > If the output is from longer sequences (100s of bases) and you want to > be able to extract the binding sites from those longer sequences to > increase your signal, there are a couple of ways to do it depending > on prior knowledge. > > If you don't know the binding motif, you can use one of the de novo > motif finders such as nmica > to generate > it from the seqs you have.
>
> If you have a good enough idea of what the sequence is and you can
> describe it in a regular expression or IUPAC coding with a few
> errors, you can extract the motifs so described plus whatever padding
> you want in fasta format using tacg or some other extractor and then
> align them using your favorite aligner (clustal, tcoffee, etc).
>
> hjm
>
> On Thursday 13 May 2010 13:02:05 Che, Anney (NIH/NCI) [E] wrote:
> > Hi everyone,
> >
> > I have questions about how to create a transcription binding
> > site profile.
> >
> > Since I have the sequences of the areas that bind to the gene from
> > ChIP-chip, I can align all the sequences and then use the resulting
> > alignment to generate a profile.
> >
> > Since this is high-throughput, does anyone know of a tool that can
> > align short sequences programmatically and then output an alignment
> > file?
> >
> > Also, a program that generates an alignment profile from an
> > alignment.
> >
> >
> > Thanks,
> >
> > Anney
> >
> > _______________________________________________
> > BBB mailing list
> > BBB at bioinformatics.org
> > http://www.bioinformatics.org/mailman/listinfo/bbb
> >
>
> --
> Harry Mangalam - Research Computing, NACS, Rm 225 MSTB, UC Irvine
> [ZOT 2225] / 92697 949 824-0084(o), 949 285-4487(c)
> MSTB=Bldg 415 (G-5 on
> ---
> Change is hard; disaster is easy.
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb
>

From anthony.goldbloom at kaggle.com Mon May 17 22:24:19 2010
From: anthony.goldbloom at kaggle.com (Anthony Goldbloom)
Date: Tue, 18 May 2010 12:24:19 +1000
Subject: [BiO BB] Open science via bioinformatics competitions
Message-ID: <1274149459.2480.1536.camel@linux2>

For three weeks Kaggle, a platform for data prediction competitions, has been running a bioinformatics competition (http://kaggle.com/hivprogression).
The competition requires competitors to pick markers in HIV's genetic sequence that predict a change in viral load. The results have been better than we could have hoped for. Within a week and a half, the best submission had already outdone the best methods in the literature. (This is particularly surprising when you consider that the prize is just $500 and the opportunity to co-author a paper with the competition host.) There's a short post about the results so far on the Kaggle blog (http://kaggle.com/blog/2010/05/13/are-competitions-the-future-of-research/).

This early result suggests that Kaggle has hit on a great way to do open science. A contributing factor in the success of the Predict HIV Progression competition is the degree of cooperation on the competition's forum. Moreover, by hosting this competition, William has opened up his dataset to other scientists, giving them access to a problem they wouldn't otherwise know about.

Kaggle doesn't control the results - their fate is up to the competition host. William is planning on open sourcing the winning method of the Predict HIV Progression competition.

We want to repeat this feat, so we are looking for others to open up their problems. If you are interested, please get in touch (anthony.goldbloom at kaggle.com).

From Alex.Bossers at wur.nl Tue May 18 01:35:30 2010
From: Alex.Bossers at wur.nl (Bossers, Alex)
Date: Tue, 18 May 2010 07:35:30 +0200
Subject: [BiO BB] program for sequence length
In-Reply-To: <1B12003244CE894E85B4726023637888215FCB6E@FASXCH01.fasmail.priv>
References: , <1B12003244CE894E85B472602363788821E865E6@FASXCH01.fasmail.priv>, <1B12003244CE894E85B4726023637888215FCB6E@FASXCH01.fasmail.priv>
Message-ID: <47C53C762312C548BF23B6556377B3EB02AD250F@scomp0038.wurnet.nl>

For simply joining up manipulation and analysis tools into workflows (without programming experience), have a look at Galaxy, developed at Penn State University (http://galaxy.psu.edu/).
You can run a local instance, but there are also public instances available (all free): http://main.g2.bx.psu.edu/ (including demonstration video snippets).
{This is no advertisement.... I am not associated in any way with the group or its development :)}

Almost all command-line scripts/utilities/services can be added to the page quite easily with a simple config file.

Alex

-----Original Message-----
From: bbb-bounces at bioinformatics.org [mailto:bbb-bounces at bioinformatics.org] On Behalf Of Karger, Amir
Sent: Tuesday, May 11, 2010 2:49 AM
To: General Forum at Bioinformatics.Org
Subject: Re: [BiO BB] program for sequence length

Scriptome has several purposes:
1) Help experienced coders avoid reinventing the wheel and silly bugs.
2) Help non-programmers do simple munging using cut and paste, without learning how to program. (And by "simple" I mean somewhere between Notepad find/replace and real programming.)
3) Help novice programmers get some examples of Perl idioms, or starting points to work with. (I had this whole plan of commenting the tools, but never got the tuits.)

Re #1, there is the command-line Scriptome tool, so when I'm working with a client (or even by myself) I can just do:

Scriptome -t change_fasta_to_tab blah.fasta > blah.tab

And some of the merge and choose tools are great when exploring data.

And no, arguing about languages is only slightly less stupid than fighting a land war in Asia.

-Amir
________________________________________
From: bbb-bounces at bioinformatics.org [bbb-bounces at bioinformatics.org] On Behalf Of Mike Marchywka [marchywka at hotmail.com]
Sent: Friday, May 07, 2010 20:12
To: bbb at bioinformatics.org
Subject: Re: [BiO BB] program for sequence length

----------------------------------------
> From: akarger at CGR.Harvard.edu
> To: bbb at bioinformatics.org
> Date: Fri, 7 May 2010 09:36:49 -0400
> Subject: Re: [BiO BB] program for sequence length
>
> Check out the Scriptome (yes, this is an advertisement.) at http://sysbio.harvard.edu/csb/resources/computational/scriptome/ , which is a set of Perl one-liners you cut and paste onto your command line to do bio-y text-y things.

I hadn't thought of this before, but it is a good idea if you can search it easily. I often use Google to find sed/awk one-liners for stuff like this, and it's a great way to learn the tools and get your work done. You seem to have a bit more than flat lists of one-liners, but offhand I'd think this would be a generally good idea.

Now to argue, "you should have done that in {perl,awk,sed,java,c++} instead of {perl,awk,sed,java,c++}" LOL

_______________________________________________
BBB mailing list
BBB at bioinformatics.org
http://www.bioinformatics.org/mailman/listinfo/bbb

From dan.bolser at gmail.com Wed May 19 19:06:33 2010
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 20 May 2010 00:06:33 +0100
Subject: [BiO BB] Global food security?
Message-ID:

Dear colleagues,

If you are interested in global food security, consider attending the SOL 2010 conference here in sunny Dundee!

7th Solanaceae conference
Dundee, Scotland, 5-9 September 2010
http://www.sol2010.org/

The remarkable Solanaceae family includes not only tomato, pepper and eggplant, but also potato, the world's third most important food crop! The soon-to-be-released potato and tomato genome sequences look set to revolutionize breeding programs in these species, leading to 'the next generation' of food crops.

* http://www.potatogenome.net
* http://www.solgenomics.net

Please feel free to extend this invitation to any of your colleagues who you feel may be interested.

Yours Faithfully,
Dan Bolser.

From christoph.gille at charite.de Tue May 25 08:48:18 2010
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue, 25 May 2010 14:48:18 +0200
Subject: [BiO BB] Robustness of Java Webstart
Message-ID:

Dear Community,

In the last 6 months, two different users have reported a technical problem starting the STRAP viewer, which I develop, on a Mac OS X computer. Unfortunately, I cannot reproduce this problem. When I posted to the PDB_L forum, none of the 9 people who responded could reproduce the problem either.

It might be that there is a bug in STRAP that is tolerated in 98% of cases but occasionally shows up. It might also be that particular Java installations have a bug and do not cope with certain Web Start parameters.

My hope is to identify the problem with the help of this broader forum. This might be useful for other Java projects as well.

I would like you to try to start the STRAP program with one of the following links. This will open STRAP and display a PDB entry. It requires Java 1.5 or higher. If the browser asks what to do with jnlp files, tell it to open them with the ../bin/javaws program of the Java installation.

http://www.bioinformatics.org/strap/lite/strap.php?load=PDB:1gd2
http://www.bioinformatics.org/strap/lite/strap.php?load=PDB:1ryp
http://www.bioinformatics.org/strap/lite/strap.php?load=PDB:2wbs

The STRAP viewer is safe: it has a built-in security check and can only write or modify files in the directory $HOME/.StrapAlign, or C:\StrapAlign on Windows. No other files are modified. You do not need to worry about security.

Here are the official Java Web Start demos for testing whether Java is set up properly:
http://java.sun.com/javase/technologies/desktop/javawebstart/demos.html

Can you please drop a short message to christoph.gille (a) charite.de telling whether it worked or not, and what operating system and what kind of Java you have.
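
The details requested above (operating system and Java version) could be gathered with a small script along the following lines. This is only a sketch and not part of STRAP: it assumes Python 3 and a `java` executable on the PATH, and the helper names `parse_java_version` and `collect_report` are made up for illustration. It relies on `java -version` writing its banner to standard error.

```python
import platform
import re
import subprocess


def parse_java_version(banner):
    """Return the quoted version string from a `java -version` banner, or None."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None


def collect_report(java_cmd="java"):
    """Collect OS and Java details suitable for pasting into a reply (hypothetical helper)."""
    report = {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
    }
    try:
        # `java -version` prints its banner to stderr, so capture both streams.
        result = subprocess.run(
            [java_cmd, "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        banner = result.stderr or result.stdout
        report["java_version"] = parse_java_version(banner)
        report["java_banner"] = banner.strip()
    except OSError:
        # The java executable was not found or could not be started.
        report["java_version"] = None
        report["java_banner"] = "java not found"
    return report


if __name__ == "__main__":
    for key, value in collect_report().items():
        print("{}: {}".format(key, value))
```

Running the script prints one `key: value` line per item, which can be pasted into the reply together with a note on whether STRAP started.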
For those who are interested in more exciting examples of STRAP:
http://www.bioinformatics.org/strap/PDF/strapI.php?year=2003&document=PMID12595256&link=2
http://www.bioinformatics.org/strap/PDF/strapI.php?year=2003&document=PMID12595256&link=1

Many thanks
Christoph

From dan.bolser at gmail.com Wed May 26 05:40:57 2010
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 26 May 2010 10:40:57 +0100
Subject: [BiO BB] Robustness of Java Webstart
In-Reply-To:
References:
Message-ID:

Cheers Christoph,

On 25 May 2010 13:48, Dr. Christoph Gille wrote:
> Dear Community,
...
> Can you please drop a short message to christoph.gille (a) charite.de
> telling whether it worked or not and what operation system and what
> kind of Java you have.

What is the best way to collect that information?

Dan.

From christoph.gille at charite.de Wed May 26 07:18:33 2010
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Wed, 26 May 2010 13:18:33 +0200
Subject: [BiO BB] Robustness of Java Webstart
In-Reply-To:
References:
Message-ID: <5d58de88479523fff3c9f77c9ff113af.squirrel@webmail.charite.de>

>> telling whether it worked or not and what operation system and what
>> kind of Java you have.
>
> What is the best way to collect that information?

Type in a terminal:

java -version

Or activate and watch the Java console. The information is printed in the first lines of the Java console.
In MS Windows: Start button > Settings > Java, then visit the "Advanced" tab and switch on the Java console.

Or type in a terminal:

javaws -viewer

Then visit the "Advanced" tab and switch on the Java console.

Or on Mac, I think: Applications > Utilities/Dienstprogramme > Java

Cheers
Christoph

From hlapp at gmx.net Wed May 26 14:28:21 2010
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 26 May 2010 12:28:21 -0600
Subject: [BiO BB] Call for Software Bazaar entries open for Conference on Informatics for Phylogenetics, Evolution, and Biodiversity (iEvoBio)
Message-ID: <56314509-1AFD-41DA-A2CA-F748AB955665@gmx.net>

The Call for Software Bazaar entries is now open for the inaugural conference on Informatics for Phylogenetics, Evolution, and Biodiversity (iEvoBio), at http://ievobio.org/ocs/index.php/ievobio/2010. See below for instructions.

The Software Bazaar features presenters demonstrating their software live on a laptop. At iEvoBio, this session takes the place of a poster session, and will last 1.5 to 2 hours. Conference attendees will be able to walk from one demonstration to the next and open a conversation with the presenters. Please also see our FAQ for this information (http://ievobio.org/faq.html#software). The Software Bazaar is part of the interactive afternoon program on the first conference day.

Entries should be software aimed at advancing research in phylogenetics, evolution, and biodiversity, and can include interactive visualizations that have been pre-computed (such as SVGs, or Google Earth-compatible KML files).

Submissions consist of a title, which will typically be the name of the software (or visualization method) being presented, the URL of a website where more information about the software can be obtained, and the license under which the source code is available. The provided website must contain a link to where the source code (and possibly binaries) can be downloaded.
If it is not obvious from the provided website, the submission must describe what the software does. Reviewers will judge whether a submission is within scope of the conference (see above), and need to be able to verify whether the open-source requirement (*) is met.

Presenters are expected to bring their own laptops for presentation, and any auxiliary devices necessary (such as a mouse). Power will be available at the presentation tables (110V/60Hz, US-style plugs; international presenters need to bring a suitable adaptor). Please let the organizing committee know as far in advance as possible if you expect to have unusually high demands for wireless network bandwidth. Note that commercial marketing activities are not permitted - presenters wishing to promote commercial or proprietary services or products should contact the Evolution conference about exhibitor space.

Review and acceptance of Software Bazaar submissions will be on a rolling basis. The deadline for submission is the morning of the first day of the conference (June 29). As the number of Software Bazaar presentation slots is finite, we cannot guarantee the availability of slots up until the day of the conference. We cannot accept submissions until the open-source requirements are met.

We ask all submitters of Software Bazaar presentations to be willing to serve as reviewers of other submissions as well, as described above.

Software Bazaar demonstrations are only 1 of 5 kinds of contributed content that iEvoBio will feature. The other 4 are: 1) Full talks (closed), 2) Lightning talks, 3) Challenge entries, and 4) Birds-of-a-Feather gatherings. The Calls for Challenge entries (http://ievobio.org/challenge.html) and Lightning Talks (same submission URL as above) remain open, and information on the Birds-of-a-Feather session is forthcoming.

More details about the program and guidelines for contributing content are available at http://ievobio.org.
You can also find continuous updates on the conference's Twitter feed at http://twitter.com/iEvoBio.

iEvoBio is sponsored by the US National Evolutionary Synthesis Center (NESCent) in partnership with the Society of Systematic Biologists (SSB). Additional support has been provided by the Encyclopedia of Life (EOL).

The iEvoBio 2010 Organizing Committee:
Rod Page (University of Glasgow)
Cecile Ane (University of Wisconsin at Madison)
Rob Guralnick (University of Colorado at Boulder)
Hilmar Lapp (NESCent)
Cynthia Parr (Encyclopedia of Life)
Michael Sanderson (University of Arizona)

(*) iEvoBio and its sponsors are dedicated to promoting the practice and philosophy of Open Source software development (see http://www.opensource.org/docs/definition.php) and reuse within the research community. For this reason, software to be demonstrated to conference attendees must be licensed with a recognized Open Source License (see http://www.opensource.org/licenses/), and be available for download, including source code, by a tar/zip file accessed through ftp/http or through a widely used version control system like cvs, Subversion, git, Bazaar, or Mercurial. Authors are advised that non-compliant submissions must be revised to meet the requirement by June 27 at the latest, and in the event that presentation slots run out, precedence is established by the date they are first found in compliance, not the date of submission.