From idoerg at gmail.com Sun Mar 8 19:44:08 2009
From: idoerg at gmail.com (Iddo Friedberg)
Date: Sun, 08 Mar 2009 16:44:08 -0700
Subject: [BiO BB] Reminder: megagenomes, metdata metaanalysis.
Message-ID: <1236555848.31881.17.camel@lafa>

Call For Abstracts and Posters: Metagenomics, Metadata and Metaanalysis
An ISMB / ECCB 2009 SIG

Date: June 27, 2009
Location: Stockholm, Sweden
URL: http://gensc.org/gc_wiki/index.php/M3
Submission deadline: April 1

There are now thousands of genomes and metagenomes available for study. Interest in improved sampling of diverse environments (e.g. ocean, soil, sediment, and a range of hosts) combined with advances in the development and application of ultra-high throughput sequence methodologies are set to vastly accelerate the pace at which new metagenomes are generated. At the same time, we have entered the era of large-scale sequencing projects, which include funded projects like the Genomic Encyclopedia of Bacteria and Archaea (GEBA) project and the Human Microbiome Initiative, with many more visionary projects on the horizon.

The M3 SIG will explore the latest concepts, algorithms, tools, informatic pipelines, databases and standards that are being developed to cope with the analysis of vast quantities of metagenomic data. Through a series of invited and contributed talks, a panel discussion, and flash talks associated with a poster session, we aim to highlight scientific advances in the field and identify core computational challenges facing the wider community.

We invite you to submit extended abstracts to be considered as talks or posters for the M3 SIG. The abstracts will be reviewed by the SIG program committee for suitability and quality. Selected abstracts will be published in a special issue of the open access online journal Standards in Genomic Sciences.

Topics include, but are not limited to:

* Metagenome and microbiome studies of biological interest
* Metagenome annotation
* Consistent contextual (meta)data acquisition and storage in metagenomics
* Contextual (meta)data and sequence data correlation studies
* Computational infrastructures for metagenomic data processing
* Algorithms, data structures and database architectures

Important dates:

* April 1, 2009: Talk and poster abstracts due.
* April 20, 2009: notification of acceptance
* May 1, 2009: final abstracts due
* June 27th, 2009: M3 SIG alongside ISMB 2009 in Stockholm, Sweden

Invited speakers include:

* Owen White, Genome Sciences, UMB
* Nikos Kyrpides, Joint Genome Institute
* Michael Ashburner, University of Cambridge
* Susanna Sansone, EBI
* Peer Bork, EMBL

For more information including submission: http://gensc.org/gc_wiki/index.php/M3

--
Iddo Friedberg, Ph.D.
CALIT2 Atkinson Hall MC #0446
University of California San Diego
9500 Gilman Drive
La Jolla, CA 92093-0446 USA
+1 (858) 534-0570
http://iddo-friedberg.org

From idoerg at gmail.com Sun Mar 8 22:26:34 2009
From: idoerg at gmail.com (Iddo Friedberg)
Date: Sun, 08 Mar 2009 19:26:34 -0700
Subject: [BiO BB] operon prediction software?
Message-ID: <1236565594.31881.30.camel@lafa>

Hi all,

I need recommendations for operon, regulon and/or gene-cluster prediction software. Whatever material comes along, I shall compile and post back to this list.

Thanks in advance,

Iddo

--
Iddo Friedberg, Ph.D.
CALIT2 Atkinson Hall MC #0446
University of California San Diego
9500 Gilman Drive
La Jolla, CA 92093-0446 USA
+1 (858) 534-0570
http://iddo-friedberg.org

From barry.hardy at vtxmail.ch Tue Mar 10 07:54:49 2009
From: barry.hardy at vtxmail.ch (Barry Hardy)
Date: Tue, 10 Mar 2009 12:54:49 +0100
Subject: [BiO BB] CFP: eCheminfo Drug Discovery Meeting, Bryn Mawr October 2009
Message-ID: <49B65509.9000405@vtxmail.ch>

eCheminfo Community of Practice
Drug Discovery Meeting
Bryn Mawr College, Philadelphia, 13-16 October 2009

Call For Contributions

We invite contributed papers from members of academic, government research and commercial organizations on areas of new research and innovation involving drug discovery research informatics. The work presented should involve innovative new method development or application to drug discovery problems and involve methods from computational chemistry, computational biology, cheminformatics or bioinformatics. Studies including experimental work in medicinal chemistry, structural biology, screening, in vitro assay development, pre-clinical evaluation, lead optimisation and translational medicine are welcome.

Abstracts for talks (300-500 words) should be submitted to echeminfo -[at]- douglasconnect.com by 31 March 2009, and be accompanied by a short biography of the presenting author (300-500 words). Abstracts approved by the scientific organizing committee will be selected for scheduling on the conference program. Authors will be notified of acceptance as soon as a review of submitted materials takes place and at the latest by 15 April 2009. Abstracts for posters will continue to be accepted for review through 31 August 2009.

The following sessions are currently planned:

Forum on Collaboration in Drug Discovery & Development (to be published in Future Medicinal Chemistry) chaired by Barry Hardy (Douglas Connect)

Workshop on Drug Binding Affinities co-moderated by Scott Brown (Abbott Laboratories), Judith Lalonde (Bryn Mawr College) and Zheng Yang (GlaxoSmithKline)

Structure-Based Drug Design: The Roles of Conformation, Water and Hydrogen Bonds chaired by Alan Cheng (Amgen)

Macromolecular Interactions and Networks chaired by Emil Alexov (Clemson University)

Structure-Based Drug Design: Advanced Scoring Methods chaired by Natasja Brooijmans (Wyeth)

Data Analysis and Visualisation Applications in Chemical Biology chaired by Brian Marsden (Structural Genomics Consortium Oxford)

PDB Ligands chaired by John Westbrook (Rutgers University) and developing the work group initiative of Marc Nicklaus (National Institutes of Health)
http://echeminfo.com/COMTY_confprog08pdbligands

Predictive Toxicology co-chaired by Vladimir Poirikov (Russian Academy of Medical Sciences) and Richard Judson (US EPA)

Bursary Awards

Bursary Awards will be used to support the attendance of a selection of academic investigators at the meeting and workshops. To apply for the bursary please send an email with a) your abstract and biography (300-500 words each), b) your CV of 1-2 pages, c) a short description of your interests and career motivations related to drug discovery (300-500 words) to echeminfo -[at]- douglasconnect.com by 31 March 2009. The recipients of the bursary awards will be selected based on an evaluation of the quality and innovation of the described research and the potential positive impact of attendance at the meeting on their research and career progress.

Information on the program will be updated at:
Blog: http://barryhardy.blogs.com/cheminfostream/
Web: http://echeminfo.com/COMTY_conferences

best regards
Barry Hardy
eCheminfo Community of Practice
Douglas Connect GmbH
Switzerland
barry.hardy -[at]- douglasconnect.com
Tel: +41 61 851 0170

From dan.bolser at gmail.com Wed Mar 18 15:31:01 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 18 Mar 2009 19:31:01 +0000
Subject: [BiO BB] Question about the definition of 'gaps' in blast -m8 output...
Message-ID: <2c8757af0903181231m5e326f57x89e8462a429d9fc5@mail.gmail.com>

Hi,

I'm sure this question comes up again and again, but searching the BioPerl mailing list didn't turn up any answers (to the second question). Basically I want to manually merge HSPs into 'contiguous hits', and I want to look at the effect of various parameters on an algorithm to do this. This task prompted me to run a 'sanity check' on the blast data that I had, and I found that this check fails to fulfil my expectation of the data. This means that either I don't understand the data or the results are buggy.

Can someone clarify the definition of the 'gaps' column in the blast -m8 output format for me?

I thought that the column 'gaps' was basically the number of columns in the HSP that contain a gap character. To test this on my data, I checked the following equality:

  GAPS + 2 =
    LENGTH - abs(QUERY_END - QUERY_START) + LENGTH - abs(HIT_END - HIT_START)

This says that the number of GAPS should equal the sum, over the QUERY and the HIT, of the LENGTH of the alignment minus the distance between the START and END points (the +2 corrects the 'off by one' error introduced by each of the two END - START calculations).

e.g.

  10-> MMMMMMMM**MMMM*M <-22
       ||||  ||  | |  |
  20-> MMMM**MMMMM*M*MM <-31

where GAPS = 7, LENGTH = 16, QUERY_END - QUERY_START = 12, and HIT_END - HIT_START = 11.
The formula gives:

  7 + 2 (9) = 16 - 12 (4) + 16 - 11 (5)

The formula is correct for 11,282 out of 12,745 HSPs in my dataset (89%), however it fails for 1,463 cases (11%). Each of these cases has a value of GAPS smaller than calculated by the formula. The difference is usually 1 or 2, but is seen to go as high as 96, and scales roughly linearly with the size of GAPS.

Did I misunderstand what the value of GAPS is supposed to mean? How come it does apparently mean what I thought for so much of the data?

Thanks very much for any help on the above.

Dan.

From clements at nescent.org Mon Mar 16 16:43:50 2009
From: clements at nescent.org (Dave Clements)
Date: Mon, 16 Mar 2009 13:43:50 -0700
Subject: [BiO BB] 2009 GMOD Summer Schools - Americas & Europe
Message-ID:

We are now accepting applications for the 2009 GMOD Summer Schools:

Americas, 16-19 July
 - National Evolutionary Synthesis Center (NESCent), Durham, NC, USA
 - Student tuition is free, thanks to NIH grant 1R01HG004483-01.
 - http://gmod.org/wiki/2009_GMOD_Summer_School_-_Americas

Europe, 3-6 August
 - University of Oxford, Oxford, United Kingdom
 - Part of GMOD Europe 2009, which includes the next GMOD Meeting
 - Student tuition is £95
 - http://gmod.org/wiki/2009_GMOD_Summer_School_-_Europe

GMOD (http://gmod.org/) is a collection of interoperable open source software components for managing, visualizing, annotating and integrating biological, mostly genomic, data. GMOD is also a community of developers and users dealing with similar challenges. GMOD is used in diverse contexts, with both emerging and established model organisms.

GMOD Summer Schools (http://gmod.org/wiki/GMOD_Summer_School) introduce new GMOD users to the GMOD project and feature several days of hands-on training on how to install, configure and administer GMOD tools.
The courses include training on several GMOD components:

* GBrowse - the widely used Generic Genome Browser
* Chado - a modular and extensible database schema for biological data
* Apollo - genome annotation editor
* BioMart - biological data warehouse system
* GBrowse_syn - a GBrowse based synteny viewer
* JBrowse - a brand new Web 2.0 genome browser
* Artemis-Chado Integration (Europe only)
* MAKER - Genome annotation pipeline (Americas only)
* Tripal - Web front end for Chado (Americas only)

***Please submit an application by the end of 6 April 2009, if you are interested in attending. ***

Enrollment is limited to 25 students in each course. If applications exceed capacity (and we expect they will) then applicants will be picked based on the strength of their application. Applicants will be notified of their admission status in mid-April.

Thanks,

Dave Clements
GMOD Help Desk
help at gmod.org

http://gmod.org/wiki/2009_GMOD_Summer_School_-_Americas
http://gmod.org/wiki/2009_GMOD_Summer_School_-_Europe
http://gmod.org/wiki/GMOD_Europe_2009

From gary.bader at utoronto.ca Tue Mar 17 17:28:13 2009
From: gary.bader at utoronto.ca (Gary Bader)
Date: Tue, 17 Mar 2009 17:28:13 -0400
Subject: [BiO BB] Announcing a new Pathway Commons Release
Message-ID: <49C015ED.8090503@utoronto.ca>

A new Pathway Commons release is available at www.pathwaycommons.org

Pathway Commons is a collection of publicly available pathways from multiple organisms, currently containing 8 pathway and interaction databases.

New features include:

* BioGRID data set added to repository (January 28, 2009 Version 2.0.49).
* Latest Reactome data set (December 17, 2008 Version 27).
* Latest HumanCyc data set (October 15, 2008 Version 12.5).
* Graphical neighborhood 'mini-maps' added to protein pages.

Future releases will include:

-Batch download service for Pathway Commons
-Use of mini-maps on your own website

Send us your feedback at http://www.pathwaycommons.org/pc/get_feedback.do

Thanks,
The Pathway Commons Team

From hlapp at gmx.net Thu Mar 19 18:48:36 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 19 Mar 2009 18:48:36 -0400
Subject: [BiO BB] Summer of Code 2009
Message-ID: <90AE2D8F-AFD3-446E-887C-BC8BB4703779@gmx.net>

*** Please disseminate widely at your local institutions ***
*** so that we reach as many students as possible. ***

PHYLOINFORMATICS SUMMER OF CODE 2009
http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

The Phyloinformatics Summer of Code program provides a unique opportunity for undergraduate, masters, and PhD students to obtain hands-on experience writing and extending open-source software for evolutionary informatics under the mentorship of experienced developers from around the world. The program is the participation of the US National Evolutionary Synthesis Center (NESCent) as a mentoring organization in the Google Summer of Code(tm) (http://code.google.com/soc/ ).

Students in the program will receive a stipend from Google (and possibly more importantly, a T-shirt solely available to successful participants), and may work from their home, or home institution, for the duration of the 3 month program. Each student will have at least one dedicated mentor to show them the ropes and help them complete their project.

NESCent is particularly targeting students interested in both evolutionary biology and software development. Initial project ideas are listed on the website. These range from hardware acceleration for phylogenetic inference, to tree visualization within a wiki, to alignment of next-gen sequencing data, to development of a reusable ontology term markup module for biocuration. All project ideas are flexible and many can be adjusted in scope to match the skills of the student. We also welcome novel project ideas that dovetail with student interests.

TO APPLY: Apply online at the Google Summer of Code website (http://socghop.appspot.com/ ), where you will also find GSoC program rules and eligibility requirements. The 12-day application period for students opens on Monday March 23rd and runs through Friday, April 3rd, 2009.

INQUIRIES: phylosoc {at} nescent {dot} org. We strongly encourage all interested students to get in touch with us with their ideas as early on as possible.

2009 NESCent Phyloinformatics Summer of Code:
http://hackathon.nescent.net/Phyloinformatics_Summer_of_Code_2009

Google Summer of Code FAQ:
http://socghop.appspot.com/document/show/program/google/gsoc2009/faqs

Cyberinfrastructure Traineeships (managed separately from GSoC; postdocs also eligible):
http://hackathon.nescent.org/Cyberinfrastructure_Summer_Traineeships_2009

To sign up for quarterly NESCent newsletters:
http://www.nescent.org/about/contact.php

---------
Todd Vision and Hilmar Lapp
National Evolutionary Synthesis Center
http://nescent.org

From dan.bolser at gmail.com Fri Mar 20 13:13:59 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Fri, 20 Mar 2009 17:13:59 +0000
Subject: [BiO BB] Fwd: [blast-help] Fwd: Question about the definition of 'gaps' in blast -m8 output...
In-Reply-To: <3FFFC788-E95D-475E-B53D-8C86F2FA8F12@ncbi.nlm.nih.gov>
References: <20090319142301.46C44188039@mail2.ncbi.nlm.nih.gov> <8087412E-ABB0-455C-8B2B-2119488B1950@ncbi.nlm.nih.gov> <3FFFC788-E95D-475E-B53D-8C86F2FA8F12@ncbi.nlm.nih.gov>
Message-ID: <2c8757af0903201013t559b25e6iafb972d9e91f77b1@mail.gmail.com>

Here is what the man from the NCBI said:

---------- Forwarded message ----------
From: Peter Cooper
Date: 2009/3/20
Subject: Re: [blast-help] Fwd: Question about the definition of 'gaps' in blast -m8 output...
To: dan.bolser at gmail.com
Cc: blast-help at ncbi.nlm.nih.gov

Hello,

The number reported in the -m 8 output is the number of gap openings.
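
The difference between gap openings and gap characters can be checked on the example alignment from the original question. The following is only a minimal Python sketch under stated assumptions: the helper names gap_stats and gaps_from_coordinates are hypothetical (they are not part of BLAST or BioPerl), both '-' and '*' are treated as gap characters because the example above draws gaps with '*', and the coordinates are assumed to count one position per residue.

```python
import re


def gap_stats(aligned_query, aligned_hit, gap_chars="-*"):
    """Return (gap_openings, gap_characters) summed over both aligned rows.

    A gap opening is one uninterrupted run of gap characters; a run of
    length 3 is one opening but three gap characters.
    """
    # Character class matching one or more consecutive gap characters.
    run_pattern = re.compile("[" + re.escape(gap_chars) + "]+")
    openings = 0
    characters = 0
    for row in (aligned_query, aligned_hit):
        runs = run_pattern.findall(row)
        openings += len(runs)
        characters += sum(len(run) for run in runs)
    return openings, characters


def gaps_from_coordinates(length, q_start, q_end, h_start, h_end):
    """Gap characters implied by the equality in the original question.

    Rearranged from: GAPS + 2 = (LENGTH - |QE - QS|) + (LENGTH - |HE - HS|).
    Assumes query and hit coordinates advance one unit per residue.
    """
    return (length - abs(q_end - q_start)) + (length - abs(h_end - h_start)) - 2


if __name__ == "__main__":
    query = "MMMMMMMM**MMMM*M"  # query positions 10..22
    hit = "MMMM**MMMMM*M*MM"    # hit positions 20..31
    print(gap_stats(query, hit))                        # (5, 7)
    print(gaps_from_coordinates(16, 10, 22, 20, 31))    # 7
```

For this alignment the coordinate formula yields 7 gap characters, while only 5 gaps are opened, so a column that counts openings would fall short of the formula by 2. The two counts agree only when every gap has length 1.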
This will only equal the number of gap characters if the length of each gap is 1.

Peter
-------------------------------
Peter S. Cooper, Ph.D.
Public Services
The National Center for Biotechnology Information
301-435-5951

From idoerg at gmail.com Sun Mar 22 18:35:23 2009
From: idoerg at gmail.com (Iddo Friedberg)
Date: Sun, 22 Mar 2009 15:35:23 -0700
Subject: [BiO BB] Metagenomics, Metadata & Metaanalysis: One Week Left to Submit
Message-ID: <1237761323.30436.5.camel@lafa>

An ISMB 2009 Special Interest Group (SIG)

Date: June 27, 2009
Location: Stockholm, Sweden
URL: http://gensc.org/gc_wiki/index.php/M3
Talk abstract due: April 1, 2009. No foolin'.

There are now thousands of genomes and metagenomes available for study. Interest in improved sampling of diverse environments (e.g.
ocean, soil, sediment, and a range of hosts) combined with advances in the development and application of ultra-high throughput sequence methodologies are set to vastly accelerate the pace at which new metagenomes are generated. At the same time, we have entered the era of large-scale sequencing projects, which include funded projects like the Genomic Encyclopedia of Bacteria and Archaea (GEBA) project and the Human Microbiome Initiative, with many more visionary projects on the horizon.

The M3 SIG will explore the latest concepts, algorithms, tools, informatic pipelines, databases and standards that are being developed to cope with the analysis of vast quantities of metagenomic data. Through a series of invited and contributed talks, a panel discussion, and flash talks associated with a poster session, we aim to highlight scientific advances in the field and identify core computational challenges facing the wider community.

We invite you to submit extended abstracts to be considered as talks or posters for the M3 SIG. The abstracts will be reviewed by the SIG program committee for suitability and quality. Selected abstracts will be published in a special issue of the open access online journal Standards in Genomic Sciences http://standardsingenomics.org/

Topics include, but are not limited to:

Metagenome and microbiome studies of biological interest
Metagenome annotation
Consistent contextual (meta)data acquisition and storage in metagenomics
Contextual (meta)data and sequence data correlation studies
Computational infrastructures for metagenomic data processing
Algorithms, data structures and database architectures

Important dates:

April 1, 2009: Talk and poster abstracts due.
April 20, 2009: notification of acceptance
May 1, 2009: final abstracts due
June 27th, 2009: M3 SIG alongside ISMB 2009 in Stockholm, Sweden

Confirmed speakers include:

Owen White, Genome Sciences, UMB
Nikos Kyrpides, Joint Genome Institute
Michael Ashburner, University of Cambridge
Susanna Sansone, EBI
Peer Bork, EMBL

For more information including submission: http://gensc.org/gc_wiki/index.php/M4

--
Iddo Friedberg, Ph.D.
CALIT2 Atkinson Hall MC #0446
University of California San Diego
9500 Gilman Drive
La Jolla, CA 92093-0446 USA
+1 (858) 534-0570
http://iddo-friedberg.org

From cjfields at illinois.edu Fri Mar 20 15:06:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 20 Mar 2009 14:06:05 -0500
Subject: [BiO BB] [Bioperl-l] Fwd: [blast-help] Fwd: Question about the definition of 'gaps' in blast -m8 output...
In-Reply-To: <2c8757af0903201013t559b25e6iafb972d9e91f77b1@mail.gmail.com>
References: <20090319142301.46C44188039@mail2.ncbi.nlm.nih.gov> <8087412E-ABB0-455C-8B2B-2119488B1950@ncbi.nlm.nih.gov> <3FFFC788-E95D-475E-B53D-8C86F2FA8F12@ncbi.nlm.nih.gov> <2c8757af0903201013t559b25e6iafb972d9e91f77b1@mail.gmail.com>
Message-ID: <18EF821F-77A7-4E4F-AD2F-4B581D3B09DE@illinois.edu>

We have a way to check for both within HSP objects I believe.

1) gaps() is documented to return the number of gap characters within the query/hit/total

2) seq_inds('gap', 'hit/query') returns the number of gap positions, with the position repeated for every gap character I believe, so getting this in scalar context should be similar to gaps(). However, to reduce that down to just the gap positions (no repeats) use the collapse flag:

seq_inds('gap', 'hit/query', 1)

chris

From kiekyon.huang at gmail.com Mon Mar 23 23:21:50 2009
From: kiekyon.huang at gmail.com (Kie Kyon Huang)
Date: Tue, 24 Mar 2009 11:21:50 +0800
Subject: [BiO BB] Gene ID mapping
Message-ID:

Hi everybody,

I need a mapping file from Refseq ID/GI number (e.g. gi|17538053|ref|NP_495443.1|) from several model organisms to their respective database ID.

I need to find the corresponding ID in WormBase (e.g. WBGene00000001), FlyBase (e.g.
FBgn0043467), Ecoliwiki (e.g. 1-ACYLGLYCEROL-3-P-ACYLTRANSFER-MONOMER)

Can anyone direct me to the respective files? Or do I have to create the files myself?

Thanks a lot.

Huang kie Kyon

From logan at cacs.louisiana.edu Tue Mar 24 17:21:01 2009
From: logan at cacs.louisiana.edu (Raja Loganantharaj)
Date: Tue, 24 Mar 2009 16:21:01 -0500
Subject: [BiO BB] Gene ID mapping
In-Reply-To:
References:
Message-ID: <49C94EBD.3070905@cacs.louisiana.edu>

Look at the gene ID conversion in DAVID at http://david.abcc.ncifcrf.gov/conversion.jsp

Hope this will work for you.

Raja Loganantharaj
Bioinformatics Research Lab

From hlapp at gmx.net Sun Mar 29 14:38:55 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 29 Mar 2009 14:38:55 -0400
Subject: [BiO BB] Reminder: Student application deadline for Summer of Code 2009
Message-ID: <74CA841A-79C4-415D-923D-3DB16479382C@gmx.net>

*** Please disseminate widely to students at your institution.
***

PHYLOINFORMATICS SUMMER OF CODE 2009 - STUDENT APPLICATION DEADLINE IS APRIL 3
http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

The Phyloinformatics Summer of Code program provides a unique opportunity for undergraduate, masters, and PhD students to obtain hands-on experience writing and extending open-source software for evolutionary informatics under the mentorship of experienced developers from around the world. The program is the participation of the US National Evolutionary Synthesis Center (NESCent) as a mentoring organization in the Google Summer of Code(tm) (http://code.google.com/soc/ ).

Students in the program will receive a stipend from Google (and a T-shirt solely available to successful participants), and may work from their home, or home institution, for the duration of the 3 month program. Each student will have at least one dedicated mentor to show them the ropes and help them complete their project.

NESCent is particularly targeting students interested in both evolutionary biology and software development. Project ideas are listed on the website and range from hardware acceleration for phylogenetic inference, to support for phyloinformatics standards within the BioPerl and BioRuby toolkits, to alignment of next-gen sequencing data, to ontology term markup for biocuration, to semantic interoperability of web-services, to 3D-printing of phylogenies. All project ideas are flexible and many can be adjusted in scope to match the skills of the student. We also welcome novel project ideas that dovetail with student interests.

TO APPLY: Instructions are at the website (see "When you apply"). You can find GSoC program rules and eligibility requirements at http://socghop.appspot.com .

***The 12-day application period for students ends on Friday, April 3rd, 2009, at 19:00 UTC (3pm EDT, 12pm PDT).***

INQUIRIES: phylosoc {at} nescent {dot} org.
We strongly encourage all interested students to get in touch with us with their ideas as early as possible. 2009 NESCent Phyloinformatics Summer of Code: http://hackathon.nescent.net/Phyloinformatics_Summer_of_Code_2009 Google Summer of Code FAQ: http://socghop.appspot.com/document/show/program/google/gsoc2009/faqs Cyberinfrastructure Traineeships (managed separately from GSoC; postdocs also eligible): http://hackathon.nescent.org/Cyberinfrastructure_Summer_Traineeships_2009 To sign up for quarterly NESCent newsletters: http://www.nescent.org/about/contact.php --------- Todd Vision and Hilmar Lapp National Evolutionary Synthesis Center http://nescent.org From hec at ut.ee Tue Mar 24 19:56:53 2009 From: hec at ut.ee (Hedi Peterson) Date: Wed, 25 Mar 2009 01:56:53 +0200 Subject: [BiO BB] Gene ID mapping In-Reply-To: <49C94EBD.3070905@cacs.louisiana.edu> References: <49C94EBD.3070905@cacs.louisiana.edu> Message-ID: <49C97345.7060503@ut.ee> Raja Loganantharaj wrote: > Kie Kyon Huang wrote: >> Hi everybody, >> >> I need a mapping file from Refseq ID/GI number (e.g. >> gi|17538053|ref|NP_495443.1|) from several model organisms to their >> respective database ID. >> >> I need to find the corresponding ID in WormBase (e.g. WBGene00000001), >> FlyBase (e.g. FBgn0043467), Ecoliwiki (e.g. >> 1-ACYLGLYCEROL-3-P-ACYLTRANSFER-MONOMER) >> >> Can anyone direct me to the respective files? Or do i have to create the >> files myself? >> >> Thanks a lot. >> >> Huang kie Kyon >> _______________________________________________ >> BBB mailing list >> BBB at bioinformatics.org >> http://www.bioinformatics.org/mailman/listinfo/bbb >> > look at the gene id conversion in DAVID at > http://david.abcc.ncifcrf.gov/conversion.jsp > Hope this will work for you. 
>
> Raja Loganantharaj
> Bioinformatics Research Lab
>

Hi,

Try using g:convert for database identifier conversions within a
species and g:orth for mapping IDs between species, at
http://biit.cs.ut.ee/gprofiler/

One example:
http://biit.cs.ut.ee/gprofiler/gorth.cgi?organism=celegans&target=dmelanogaster&output=html&query=NP_495443

Cheers,
hedi.

From nir at rosettadesigngroup.com Wed Mar 25 12:43:32 2009
From: nir at rosettadesigngroup.com (Nir London)
Date: Wed, 25 Mar 2009 18:43:32 +0200
Subject: [BiO BB] Rosetta Academic Training Webinar
Message-ID: <819C1600-4AAF-4F40-8C50-F02EED410A26@rosettadesigngroup.com>

The Rosetta Design Group is proud to present the first webinar in the
Rosetta Academic Workshop Series. For the first webinar, we have chosen
to focus on Protein-Protein Docking, based on the answers to the
interest poll. We hope this will be the first in a line of helpful and
inspiring webinars to kick off our Rosetta Academic Workshop Series.

What: Protein-Protein Docking
When: May 4th 2009, 0800-1000 AM EST
Where: Your office!

For more details and registration:
http://rosettadesigngroup.com/RDGLS/index.php?sid=54479&lang=en

Please note: This is not a promotional webinar. Rosetta is open-source
and freeware for academic and non-profit organizations and can be
downloaded from University of Washington's TechTransfer Digital
Ventures. The majority of the webinar is concerned with Rosetta 2.3.0.
Rosetta 3.0 is still a beta version.

Hope to see you there,

Nir London.
Rosetta Design Group | http://rosettadesigngroup.com/

From clements at nescent.org Mon Mar 30 00:28:57 2009
From: clements at nescent.org (Dave Clements)
Date: Sun, 29 Mar 2009 21:28:57 -0700
Subject: [BiO BB] 2009 GMOD Summer Schools - Americas & Europe
In-Reply-To:
References:
Message-ID:

***The application deadline for both GMOD summer schools is April 6,
one week from now.***

GMOD Summer School - Americas will be held 16-19 July at the National
Evolutionary Synthesis Center (NESCent), in Durham, NC, USA. Student
tuition is free. See
http://gmod.org/wiki/2009_GMOD_Summer_School_-_Americas

GMOD Summer School - Europe will be held 3-6 August at the University
of Oxford, in Oxford, UK. This is a part of GMOD Europe 2009, which
includes the next GMOD Meeting. Student tuition is £95. See
http://gmod.org/wiki/2009_GMOD_Summer_School_-_Europe

Please contact the GMOD Help Desk (help at gmod.org) if you have
questions.

We hope to see you in Durham or Oxford,

Dave C.

On Mon, Mar 16, 2009 at 1:43 PM, Dave Clements wrote:
> We are now accepting applications for the 2009 GMOD Summer Schools:
>
> Americas, 16-19 July
>  - National Evolutionary Synthesis Center (NESCent), Durham, NC, USA
>  - Student tuition is free, thanks to NIH grant 1R01HG004483-01.
>  - http://gmod.org/wiki/2009_GMOD_Summer_School_-_Americas
>
> Europe, 3-6 August
>  - University of Oxford, Oxford, United Kingdom
>  - Part of GMOD Europe 2009, which includes the next GMOD Meeting
>  - Student tuition is £95
>  - http://gmod.org/wiki/2009_GMOD_Summer_School_-_Europe
>
> GMOD (http://gmod.org/) is a collection of interoperable open source
> software components for managing, visualizing, annotating and
> integrating biological, mostly genomic, data. GMOD is also a
> community of developers and users dealing with similar challenges.
> GMOD is used in diverse contexts, with both emerging and established
> model organisms.
>
> GMOD Summer Schools (http://gmod.org/wiki/GMOD_Summer_School)
> introduce new GMOD users to the GMOD project and feature several days
> of hands-on training on how to install, configure and administer GMOD
> tools.
>
> The courses include training on several GMOD components:
>  * GBrowse - the widely used Generic Genome Browser
>  * Chado - a modular and extensible database schema for biological data
>  * Apollo - genome annotation editor
>  * BioMart - biological data warehouse system
>  * GBrowse_syn - a GBrowse based synteny viewer
>  * JBrowse - a brand new Web 2.0 genome browser
>  * Artemis-Chado Integration (Europe only)
>  * MAKER - Genome annotation pipeline (Americas only)
>  * Tripal - Web front end for Chado (Americas only)
>
> ***Please submit an application by the end of 6 April 2009 if you are
> interested in attending.***
>
> Enrollment is limited to 25 students in each course. If applications
> exceed capacity (and we expect they will) then applicants will be
> picked based on the strength of their application. Applicants will be
> notified of their admission status in mid-April.
>
> Thanks,
>
> Dave Clements
> GMOD Help Desk
> help at gmod.org
>
> http://gmod.org/wiki/2009_GMOD_Summer_School_-_Americas
> http://gmod.org/wiki/2009_GMOD_Summer_School_-_Europe
> http://gmod.org/wiki/GMOD_Europe_2009
>

From lijo.skb at gmail.com Tue Mar 31 00:29:57 2009
From: lijo.skb at gmail.com (Lijo)
Date: Tue, 31 Mar 2009 09:59:57 +0530
Subject: [BiO BB] Rosetta Academic Training Webinar
In-Reply-To: <819C1600-4AAF-4F40-8C50-F02EED410A26@rosettadesigngroup.com>
References: <819C1600-4AAF-4F40-8C50-F02EED410A26@rosettadesigngroup.com>
Message-ID:

Dear Nir,

It says open source, but it is asking for money (maybe for its own
reasons) on its first page. I am really confused.
Regards,
lijo
Centre for Bioinformatics
UoK

On Wed, Mar 25, 2009 at 10:13 PM, Nir London wrote:
> The Rosetta Design Group is proud to present the first webinar in the
> Rosetta Academic Workshop Series. For the first webinar, we have chosen
> to focus on Protein-Protein Docking, based on the answers to the
> interest poll. We hope this will be the first in a line of helpful and
> inspiring webinars to kick off our Rosetta Academic Workshop Series.
>
> What: Protein-Protein Docking
> When: May 4th 2009, 0800-1000 AM EST
> Where: Your office!
>
> For more details and registration:
> http://rosettadesigngroup.com/RDGLS/index.php?sid=54479&lang=en
>
> Please note: This is not a promotional webinar. Rosetta is open-source
> and freeware for academic and non-profit organizations and can be
> downloaded from University of Washington's TechTransfer Digital
> Ventures. The majority of the webinar is concerned with Rosetta 2.3.0.
> Rosetta 3.0 is still a beta version.
>
> Hope to see you there,
>
> Nir London.
>
> Rosetta Design Group | http://rosettadesigngroup.com/
>

--
Centre for Bioinformatics,
University of Kerala, India.
+91 9446515705(res)