From dalesan at gmail.com Wed Apr 1 15:17:15 2009 From: dalesan at gmail.com (dale richardson) Date: Wed, 1 Apr 2009 20:17:15 +0100 Subject: [BiO BB] Efficient way to retrieve full length cDNA sequences from GenBank? In-Reply-To: References: <819C1600-4AAF-4F40-8C50-F02EED410A26@rosettadesigngroup.com> Message-ID: <41E72E38-197D-4DC2-90DB-19C253155677@gmail.com> Hello All, Please forgive me if this post comes off as inexperienced, but if any of you have the time I would like to hear your suggestions on the following problem. I've got a set of genomic DNA sequences for a number of species. What I want to do is to obtain only full-length cDNA matches to these genomic sequences from GenBank, excluding RefSeq sequences. What I've been doing so far is BLASTing these genomic sequences against the nr nucleotide database and manually evaluating which hits to keep or discard, depending on the coverage of the subject sequence by the query. While this method may be suitable for organisms with poorly characterized expression data, when trying to do this for mouse or human the task becomes entirely daunting. So my question is this: What is the most efficient way to obtain a set of cDNA sequences that match a set of genomic DNA sequences while excluding spurious hits, RefSeq sequences and "pseudo" full-length cDNAs? As you can imagine, I am interested in looking for alternative splice variants for a number of genes. Any information or help that you could graciously muster would be very much appreciated. with sincere regards, dale richardson From pfern at igc.gulbenkian.pt Thu Apr 2 11:41:51 2009 From: pfern at igc.gulbenkian.pt (Pedro Fernandes) Date: Thu, 02 Apr 2009 16:41:51 +0100 Subject: [BiO BB] Efficient way to retrieve full length cDNA sequences from GenBank? 
In-Reply-To: <41E72E38-197D-4DC2-90DB-19C253155677@gmail.com> References: <819C1600-4AAF-4F40-8C50-F02EED410A26@rosettadesigngroup.com> <41E72E38-197D-4DC2-90DB-19C253155677@gmail.com> Message-ID: <1238686911.49d4dcbf801e9@webmail.igc.gulbenkian.pt> Hi, I would do it programmatically. You do not even need to know much PERL to create your own simple scripts and use the ENSEMBL APIs. Go to http://www.ensembl.org and look for the APIs in the Docs & FAQ's section. It is full of instructions and examples. Good luck Pedro -- Pedro Fernandes Centro Português de Bioinformática Instituto Gulbenkian de Ciência Apartado 14 2781 OEIRAS PORTUGAL -------------------------------------- Quoting dale richardson : > Hello All, > > Please forgive me if this post comes off as inexperienced, but if any > of you have the time I would like to hear your suggestions on the > following problem. > > I've got a set of genomic DNA sequences for a number of species. What > I want to do is to obtain only full-length cDNA matches to these > genomic sequences from GenBank, excluding Refseq sequences. What I've > been doing so far is blasting these genomic sequences against the nr > nucleotide database and manually evaluating which hits to keep or > discard, depending on the coverage of the subject sequence to the > query. While this method may be suitable for organisms with poorly > characterized expression data, when trying to do this for mouse or > human the task becomes entirely daunting. > > So my question is this: > > What is the most efficient way to obtain a set of cDNA sequences that > match to a set of genomic DNA sequences while excluding spurious > hits , RefSeq sequences and "pseudo" full length cDNAs? > > As you can imagine, I am interesting in looking for alternative splice > variants for a number of genes. > > Any information or help that you could graciously muster would be very > much appreciated. 
> > with sincere regards, > > dale richardson > > > > > > > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > From marchywka at hotmail.com Thu Apr 2 12:09:55 2009 From: marchywka at hotmail.com (Mike Marchywka) Date: Thu, 2 Apr 2009 12:09:55 -0400 Subject: [BiO BB] Efficient way to retrieve full length cDNA sequences from GenBank? In-Reply-To: <1238686911.49d4dcbf801e9@webmail.igc.gulbenkian.pt> References: <819C1600-4AAF-4F40-8C50-F02EED410A26@rosettadesigngroup.com> <41E72E38-197D-4DC2-90DB-19C253155677@gmail.com> <1238686911.49d4dcbf801e9@webmail.igc.gulbenkian.pt> Message-ID: ---------------------------------------- > Date: Thu, 2 Apr 2009 16:41:51 +0100 > From: pfern at igc.gulbenkian.pt > To: bbb at bioinformatics.org > Subject: Re: [BiO BB] Efficient way to retrieve full length cDNA sequences from GenBank? > > Hi > > I would do it programmatically. You do not even need to know much of PERL to > create your own simple scripts and the ENSEMBL APIs. > I was using bash scripts with various things ( sed/awk) to parse blast output on short probe queries and then using wget or curl to request genome sequence near the hits ( alt, you can just download the complete genomes locally and use your favorite random access facility, perl would work, to get pieces you want). IIRC, I then used my own c++ code for various tests. For unrelated work on splicing, many arguable splicing cues could be formulated as regular expressions with reverse-complement matches. You can also set up your own local blast DB or get other patterns or rules against which to search. Not sure if there are canned tools but it isn't hard to do a lot of this locally once you get coarse hits for marginal candidates. > > Go to http://www.ensembl.org and look for the APIs in the Docs & FAQ's section. > It is full of instructions and examples. 
> > Good luck > Pedro > > -- > Pedro Fernandes > Centro Português de Bioinformática > Quoting dale richardson : > >> >> So my question is this: >> >> What is the most efficient way to obtain a set of cDNA sequences that >> match to a set of genomic DNA sequences while excluding spurious >> hits , RefSeq sequences and "pseudo" full length cDNAs? >> >> As you can imagine, I am interesting in looking for alternative splice >> variants for a number of genes. From mike.fursov at gmail.com Fri Apr 3 03:06:50 2009 From: mike.fursov at gmail.com (Mikhail Fursov) Date: Fri, 3 Apr 2009 14:06:50 +0700 Subject: [BiO BB] Efficient way to retrieve full length cDNA sequences from GenBank? In-Reply-To: References: <819C1600-4AAF-4F40-8C50-F02EED410A26@rosettadesigngroup.com> <41E72E38-197D-4DC2-90DB-19C253155677@gmail.com> <1238686911.49d4dcbf801e9@webmail.igc.gulbenkian.pt> Message-ID: What is the size of the genomes you are working with? Do you have them locally? If the genome size is smaller than the RAM on your computer, a simple approach could be: 1) Merge all your sequences into a single sequence with ~100 'N' characters between them 2) Merge all genomes 3) Find repeats (common hits) between the 2 resulting sequences On Thu, Apr 2, 2009 at 11:09 PM, Mike Marchywka wrote: > > ---------------------------------------- > > Date: Thu, 2 Apr 2009 16:41:51 +0100 > > From: pfern at igc.gulbenkian.pt > > To: bbb at bioinformatics.org > > Subject: Re: [BiO BB] Efficient way to retrieve full length cDNA > sequences from GenBank? > > > > Hi > > > > I would do it programmatically. You do not even need to know much of PERL > to > > create your own simple scripts and the ENSEMBL APIs. 
> > > > I was using bash scripts with various things ( sed/awk) to parse blast > output > on short probe queries and then using wget or curl to request > genome sequence near the hits ( alt, you can just download > the complete genomes locally and use your favorite random access > facility, perl would work, to get pieces you want). > IIRC, I then used my own c++ code for various tests. > > For unrelated work on splicing, many arguable splicing cues could be > formulated as regular expressions with reverse-complement matches. > You can also set up your own local blast DB or get other patterns > or rules against which to search. Not sure if there are canned > tools but it isn't hard to do a lot of this locally once you > get coarse hits for marginal candidates. > > > > > > > Go to http://www.ensembl.org and look for the APIs in the Docs & FAQ's > section. > > It is full of instructions and examples. > > > > Good luck > > Pedro > > > > -- > > Pedro Fernandes > > Centro Portugu?s de Bioinform?tica > > > Quoting dale richardson : > > > >> > >> So my question is this: > >> > >> What is the most efficient way to obtain a set of cDNA sequences that > >> match to a set of genomic DNA sequences while excluding spurious > >> hits , RefSeq sequences and "pseudo" full length cDNAs? > >> > >> As you can imagine, I am interesting in looking for alternative splice > >> variants for a number of genes. > > > _________________________________________________________________ > Rediscover Hotmail?: Get quick friend updates right in your inbox. 
> > http://windowslive.com/RediscoverHotmail?ocid=TXT_TAGLM_WL_HM_Rediscover_Updates1_042009 > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > -- Mikhail Fursov From paolo.romano at istge.it Fri Apr 3 11:14:56 2009 From: paolo.romano at istge.it (Paolo Romano) Date: Fri, 03 Apr 2009 17:14:56 +0200 Subject: [BiO BB] CFP: NETTAB 2009 on Collaborative Bioinformatics Research and Development (social networks, wiki, ...) Message-ID: <200904031515.n33FEvVs001864@clus2.istge.it> Apologies if you receive multiple copies. =========================== Announcement and Preliminary Call for Papers NETTAB 2009 Workshop on "Technologies, Tools and Applications for Collaborative and Social Bioinformatics Research and Development" with a Special Session on: "Methods and Tools for RNA Structure and Functional Analysis" June 10-13, 2009 Department of Computer Science, University of Catania, Italy http://www.nettab.org/2009/ Submission deadlines: - April 28, 2009: Oral communication submission - May 15, 2009: Poster submission Submissions must be short papers of around 3 pages or 12,000 characters. Special Issues in peer-reviewed journals on the workshop's topics are planned: a post-workshop ad hoc Call for Papers will be issued. RATIONALE The advent of Wide Area Networks (WAN) made distributed information available and prompted the need for searching and retrieving this data (Network Information Retrieval tools, NIR), as well as the development of unprecedented communication between users (Computer Mediated Communication tools, CMC). Initially, CMC was asynchronous and based on electronic mail and newsgroups. From email systems, mailing lists and newsletters were soon derived, while newsgroups gave rise, shortly after, to electronic fora. Synchronous communication was introduced with the advent of chat services. Along this line, current multimedia teleconference systems were then set up. 
Virtual reality was first introduced for educational purposes by means of MUD (Multi-User Domain) systems, and especially by means of MOO (MUD, Object-Oriented). This line produced current virtual reality environments, such as the emerging Second Life system. Life Sciences researchers have benefited greatly from CMC tools. The bionet newsgroups hierarchy remains one of the most famous and useful CMC systems supporting life science research. Many mailing lists that were born in that context are still used. The development of open source software was largely made possible by the ability to exchange knowledge, practices and skills effectively among researchers. Web sites of communities of scientists were set up and often constituted the basis for real collaborative development and research. The most recent developments in collaborative development tools are impressive. Researchers can now collaboratively develop software (open source systems), discuss and compare development strategies (social networks), write documents (google docs, wiki systems), and build knowledge bases. So, it may now be the time for presenting current technologies, tools and applications for collaborative work and for discussing perspectives on their use in support of Bioinformatics. For these reasons, NETTAB 2009 will be devoted to "Technologies, Tools and Applications for Collaborative and Social Bioinformatics Research and Development". Moreover, there will be a special session on "Methods and Tools for RNA Structure and Functional Analysis". The transcription of almost all genomes generates a great number of coding and non-coding RNAs (ncRNAs). Although RNA is central to the synthesis of proteins, it is not only a messenger of genetic information: many cellular functions depend on ncRNAs, which exert their functions by their sequence and structure. 
In particular, small silencing RNAs (miRNAs, siRNAs and piRNAs) play a crucial role in many physiological processes and their aberrant expression is a common feature of human diseases including cancer. Models and tools able to increase our understanding of RNA functions and their involvement in diseases may lead to the design of new RNA-based therapeutics. The RNA community is also taking advantage of collaborative research tools such as Wikis and other virtual environments. The RNA WikiProject now contains over 600 articles describing families of noncoding RNAs based on the Rfam database, and invites the community to update, edit, and correct those articles. Therefore, the NETTAB 2009 special session will focus on collaborative research projects, computational methods and tools for the analysis of RNA structures and functions, with a special emphasis on ncRNAs. KEYNOTE SPEAKERS # Alex Bateman Wellcome Trust Sanger Institute Hinxton, Cambridge, UK # Tim Clark Director of Informatics, MassGeneral Institute for Neurodegenerative Disease Neurology Research Department, Massachusetts General Hospital, Boston, USA # Duncan Hull School of Chemistry, University of Manchester, Manchester, UK # Michael Levitt Stanford University, USA # Debora Marks Systems Biology Department, Harvard Medical School Boston, USA TOPICS - Collaborative Web sites (bioinformatics.org, biojava, bioperl, ) - Communities of Practice (CoPs) Scientific practices in scientific communities Automatic detection / gathering / modelling of scientific practices Implementations of CoPs - Social networking (myExperiment, Annotea, myScience) Social Bookmarking Semantic Document Markup Relationships mining from literature - Open Source development Sharing of data models, libraries, interfaces - Social software for collaborative documentation development Wikis, blogs, google docs Knowledge Wikis Social-software-mediated collaborative scientific research Social-software-mediated collaborative tools' development 
Knowledge base collaborative development Ontologies collaborative development - Education and training tools E-learning Virtual environments Methods and Tools for RNA Structure and Functional Analysis - RNA structure prediction - Collaborative studies of RNAs - ncRNAs functional analysis and classification - miRNAs and networks - Genome-wide functional studies - Identification of ncRNAs - Databases of ncRNAs and miRNA targets - miRNA targets prediction - Synthetic miRNA and siRNA design - Gene expression analysis - Analysis of viral RNAs - RNAi therapeutics - Identification of ncRNAs biomarkers - RNA-protein interaction prediction DEADLINES Submissions for both oral communications and posters must be short papers of around THREE A4 pages or 12,000 characters. - April 28, 2009: Oral communication submission Notification of acceptance: May 12, 2009 - May 15, 2009: Poster submission - May 17, 2009: Early registration - June 10-13, 2009: Tutorials and Workshop Calls for SPECIAL ISSUES We plan to launch Calls for Special Issues on the themes of the workshop in peer-reviewed journals with an associated impact factor around July, for submission in September 2009. Best regards. Paolo Romano on behalf of NETTAB 2009 Chairs Paolo Romano (paolo.romano at istge.it) Bioinformatics National Cancer Research Institute (IST) Largo Rosanna Benzi, 10, I-16132, Genova, Italy Tel: +39-010-5737-288 Fax: +39-010-5737-295 HELP US TO HELP: Your 5 per mille in support of our RESEARCH. How to do it: In your next tax return, sign the dedicated 5 per mille box and also write the tax code of the Istituto Nazionale per la Ricerca sul Cancro di Genova : c.f. 80 100 850 108 Istituto Nazionale per la Ricerca sul Cancro L.go R. 
Benzi, 10 -16132 Genova http://www.istge.it From mmokrejs at ribosome.natur.cuni.cz Thu Apr 9 15:33:29 2009 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Thu, 09 Apr 2009 21:33:29 +0200 Subject: [BiO BB] Efficient way to retrieve full length cDNA sequences from GenBank? In-Reply-To: <41E72E38-197D-4DC2-90DB-19C253155677@gmail.com> References: <819C1600-4AAF-4F40-8C50-F02EED410A26@rosettadesigngroup.com> <41E72E38-197D-4DC2-90DB-19C253155677@gmail.com> Message-ID: <49DE4D89.1020607@ribosome.natur.cuni.cz> Hi Dale, forget about GenBank, go for http://h-invitational.jp/ which collected the full length cDNA data for you already. martin dale richardson wrote: > Hello All, > > Please forgive me if this post comes off as inexperienced, but if any of > you have the time I would like to hear your suggestions on the following > problem. > > I've got a set of genomic DNA sequences for a number of species. What I > want to do is to obtain only full-length cDNA matches to these genomic > sequences from GenBank, excluding Refseq sequences. What I've been doing > so far is blasting these genomic sequences against the nr nucleotide > database and manually evaluating which hits to keep or discard, > depending on the coverage of the subject sequence to the query. While > this method may be suitable for organisms with poorly characterized > expression data, when trying to do this for mouse or human the task > becomes entirely daunting. > > So my question is this: > > What is the most efficient way to obtain a set of cDNA sequences that > match to a set of genomic DNA sequences while excluding spurious hits , > RefSeq sequences and "pseudo" full length cDNAs? > > As you can imagine, I am interesting in looking for alternative splice > variants for a number of genes. > > Any information or help that you could graciously muster would be very > much appreciated. > > with sincere regards, > > dale richardson -- Dr. Martin Mokrejs Dept. 
of Genetics and Microbiology Faculty of Science, Charles University Vinicna 5, 128 43 Prague, Czech Republic tel: +420-2-2195 1716 http://www.iresite.org http://www.iresite.org/~mmokrejs From isbra-l at engr.uconn.edu Wed Apr 15 17:33:38 2009 From: isbra-l at engr.uconn.edu (ISBRA Symposium Announcements) Date: Wed, 15 Apr 2009 17:33:38 -0400 (EDT) Subject: [BiO BB] [ISBRA-L] ISBRA'09/CIGE'09 Call for Participation Message-ID: 5th International Workshop on Bioinformatics Research and Applications in conjunction with 2nd Workshop on Computational Issues in Genetic Epidemiology Nova Southeastern University, Ft. Lauderdale, Florida, USA May 13-May 16, 2009 The International Symposium on Bioinformatics Research and Applications (ISBRA) provides a forum for the exchange of ideas and results among researchers, developers, and practitioners working on all aspects of bioinformatics and computational biology and their applications. The fifth edition of the symposium will be held on May 13-May 16, 2009 at Nova Southeastern University, Ft. Lauderdale, Florida, USA in conjunction with the 2nd Workshop on Computational Issues in Genetic Epidemiology. 
The technical program features 6 invited keynote talks: * Web-based, participant-driven association studies: research at 23andMe Nick Eriksson (23andMe) * Evolution of Regulatory Systems in Bacteria Mikhail Gelfand (Russian Academy of Sciences) * Networks of Relatedness Within and Across Populations Itsik Pe'er (Columbia University) * Interpreting population sequencing data Shamil Sunyaev (Brigham and Women's Hospital/Harvard Medical School) * Bioinformatics Challenges in Translational Research Nicholas Tsinoremas (University of Miami) * Motif Construction from High-Throughput SELEX Data Esko Ukkonen (University of Helsinki) The program includes contributed papers and posters covering a broad range of topics in bioinformatics and computational genetic epidemiology, including gene expression analysis, biological networks, comparative genomics, phylogenetics, structure prediction, DNA self-assembly, sequence analysis, population genomics, and genome-wide association studies. For further details and travel information see the symposium website at http://www.cs.gsu.edu/isbra09/ _______________________________________________ ISBRA-L mailing list ISBRA-L at dna.engr.uconn.edu http://dna.engr.uconn.edu/mailman/listinfo/isbra-l From training at ebi.ac.uk Fri Apr 17 04:58:51 2009 From: training at ebi.ac.uk (European Bioinformatics Institute - EMBL) Date: Fri, 17 Apr 2009 10:58:51 +0200 Subject: [BiO BB] LAST CHANCE TO APPLY FOR THE EBI's HANDS ON COURSE: A WALK THROUGH THE Message-ID: LAST CHANCE TO APPLY FOR THE EBI's HANDS ON COURSE: A WALK THROUGH THE EBI'S DATA RESOURCES REGISTRATION DEADLINE EXTENDED TO 12:00 NOON BST ON TUESDAY 21 APRIL For full details of the course and to register, please visit: http://www.ebi.ac.uk/training/handson/course_090511_walkthrough.html Do you need access to information on biological sequences, structures, reactions or pathways as part of your research? 
Are you baffled by what's out there, unsure which data sets you're searching, or worried by strange and seemingly irrelevant search results? If so, this course is for you. The EBI's experts will walk you through Europe's main biological databases, explaining where the data come from, how we make sure that the databases remain up to date and comprehensive, and how to search them effectively. We'll give you plenty of opportunities to practice using them for yourself and to ask questions that will help you to solve your own biological data-mining problems. The course is especially intended for experimental biologists with no formal training in bioinformatics. Data resources for the following research areas will be covered: - genomics (Ensembl, sequence search tools) - transcriptomics (ArrayExpress) - proteomics (UniProt, InterPro, PRIDE, PICR) - protein structures (PDBe) - small molecules (ChEBI) - molecular interactions (IntAct) - pathways (Reactome) - searching and mining the scientific literature - RNA resources The course will combine lectures and hands-on tutorials to help you actively learn and resolve biologically relevant problems. It will take place in the EBI's IT training facility at The Wellcome Trust Genome Campus near Cambridge in the UK. The course has been significantly subsidised to keep costs as low as possible. Included in the registration fee are four nights' accommodation (11, 12, 13 & 14 May) and all meals and refreshments for the duration of the course. Accommodation is on site at The Wellcome Trust Campus, which is set in 55 acres of beautiful parkland a few miles south of Cambridge, UK. 
Single accommodation £400 Shared accommodation £270 THE REGISTRATION DEADLINE IS 12:00 NOON BST ON TUESDAY 21 APRIL FOR MORE INFORMATION ON THE EMBL-EBI'S HANDS ON PROGRAMME, PLEASE VISIT: http://www.ebi.ac.uk/training/handson/ SIGN UP AT http://www.ebi.ac.uk/support/traininglist.php TO RECEIVE UPDATES ON OUR COURSES OR GO TO http://www.ebi.ac.uk/Information/events/calendar/rss.php?events_subcategory_id=42 TO SUBSCRIBE TO OUR RSS FEED If you do not want to receive any further event announcements of the European Molecular Biology Laboratory, please reply to this email with the subject "unsubscribe". Thanks for your understanding. Yours sincerely, Cath Brooksbank EMBL Outstation - Hinxton European Bioinformatics Institute Wellcome Trust Genome Campus Hinxton Cambridge, CB10 1SD United Kingdom cath at ebi.ac.uk Phone: +44 (0)1223 492 525 Fax: +44 (0)1223 494 468 From dan.bolser at gmail.com Thu Apr 23 03:46:18 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 23 Apr 2009 08:46:18 +0100 Subject: [BiO BB] Bourne award... strange statement Message-ID: <2c8757af0904230046l73f4c76dl1f447a23ab36631c@mail.gmail.com> One of the statements about the award in a recent email reads: "As co-director of the PDB, Bourne has transformed an under-utilized database into a major international resource." I find that hard to believe. Does anyone have information about this statement? I think the PDB has always been a major international resource from the day it was created. I'd like to find out more, but as far as I know this statement is totally erroneous (or at least misleading). In particular, what happened at the PDB to turn it from 'under-utilized' into 'major international resource'? Cheers, Dan. From dan.bolser at gmail.com Thu Apr 23 11:42:42 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 23 Apr 2009 16:42:42 +0100 Subject: [BiO BB] Comparing sequences from GenBank and RefSeq... 
Message-ID: <2c8757af0904230842h7e2e3a12gc5ecd1ae1294a492@mail.gmail.com> Hi, I found that the potato chloroplast sequence from GenBank (DQ231562.1) has several differences (260 SNPs and 30 indels) relative to the same sequence in RefSeq (NC_008096.1). As far as I am aware this sequence has only been obtained once, why would the two differ? In general should I trust the refseq sequence? For your reference here is the output of dnadiff over the two files:

Reference/DQ231562.fasta Query/NC_008096.fasta
NUCMER

                               [REF]                [QRY]
[Sequences]
TotalSeqs                          1                    1
AlignedSeqs               1(100.00%)           1(100.00%)
UnalignedSeqs               0(0.00%)             0(0.00%)

[Bases]
TotalBases                    155312               155298
AlignedBases         155312(100.00%)      155298(100.00%)
UnalignedBases              0(0.00%)             0(0.00%)

[Alignments]
1-to-1                             1                    1
TotalLength                   155312               155298
AvgLength                  155312.00            155298.00
AvgIdentity                    99.81                99.81

M-to-M                             1                    1
TotalLength                   155312               155298
AvgLength                  155312.00            155298.00
AvgIdentity                    99.81                99.81

[Feature Estimates]
Breakpoints                        0                    0
Relocations                        0                    0
Translocations                     0                    0
Inversions                         0                    0

Insertions                         0                    0
InsertionSum                       0                    0
InsertionAvg                    0.00                 0.00

TandemIns                          0                    0
TandemInsSum                       0                    0
TandemInsAvg                    0.00                 0.00

[SNPs]
TotalSNPs                        260                  260
AC                         23(8.85%)            14(5.38%)
AG                         24(9.23%)           30(11.54%)
AT                         15(5.77%)            14(5.38%)
CA                         14(5.38%)            23(8.85%)
CG                         24(9.23%)            18(6.92%)
CT                        32(12.31%)            19(7.31%)
GA                        30(11.54%)            24(9.23%)
GC                         18(6.92%)            24(9.23%)
GT                         13(5.00%)           34(13.08%)
TA                         14(5.38%)            15(5.77%)
TC                         19(7.31%)           32(12.31%)
TG                        34(13.08%)            13(5.00%)

TotalGSNPs                       113                  113
AC                          9(7.96%)             8(7.08%)
AG                        17(15.04%)           17(15.04%)
AT                          5(4.42%)             3(2.65%)
CA                          8(7.08%)             9(7.96%)
CG                          6(5.31%)             7(6.19%)
CT                        15(13.27%)             8(7.08%)
GA                        17(15.04%)           17(15.04%)
GC                          7(6.19%)             6(5.31%)
GT                          6(5.31%)           12(10.62%)
TA                          3(2.65%)             5(4.42%)
TC                          8(7.08%)           15(13.27%)
TG                        12(10.62%)             6(5.31%)

TotalIndels                       30                   30
A.                        14(46.67%)            4(13.33%)
C.                          1(3.33%)             0(0.00%)
G.                          0(0.00%)             0(0.00%)
T.                         7(23.33%)            4(13.33%)

TotalGIndels                      24                   24
A.                        10(41.67%)            4(16.67%)
C.                          1(4.17%)             0(0.00%)
G.                          0(0.00%)             0(0.00%)
T.                         5(20.83%)            4(16.67%)

Thanks for any pointers, Dan. 
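For readers who want to sanity-check counts like the ones in the dnadiff report without MUMmer, here is a minimal sketch. It assumes the two records have already been aligned into equal-length gapped strings (for example by a global aligner); the function name `tally_differences` and the toy alignment are illustrative only, and this is not how dnadiff itself computes its report. Substitutions are keyed as reference base followed by query base, which is how the [REF] column of the SNP table reads (AC means A in the reference, C in the query).

```python
from collections import Counter


def tally_differences(ref_aln: str, qry_aln: str):
    """Count substitutions (keyed ref base + query base) and gap columns
    in a pairwise alignment given as two equal-length gapped strings."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    snps = Counter()
    indels = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == q:
            continue  # identical column, nothing to count
        if r == "-" or q == "-":
            indels += 1  # a base present on one side only
        else:
            snps[r + q] += 1  # e.g. "GC": G in reference, C in query
    return snps, indels


# Toy alignment (hypothetical data, not the potato chloroplast records).
ref = "ACGT-ACGTA"
qry = "ACCTTACG-A"
snps, indels = tally_differences(ref, qry)
print(dict(snps), indels)
```

Swapping the two arguments gives the [QRY] view of the same table: every XY count becomes a YX count, which is why the two SNP columns in the report mirror each other.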
From marty.gollery at gmail.com Thu Apr 23 08:58:53 2009 From: marty.gollery at gmail.com (Martin Gollery) Date: Thu, 23 Apr 2009 05:58:53 -0700 Subject: [BiO BB] Bourne award... strange statement In-Reply-To: <2c8757af0904230046l73f4c76dl1f447a23ab36631c@mail.gmail.com> References: <2c8757af0904230046l73f4c76dl1f447a23ab36631c@mail.gmail.com> Message-ID: I noticed that too, Dan, and I have no explanation for it. Perhaps this just means that more people use it now. Marty On Thu, Apr 23, 2009 at 12:46 AM, Dan Bolser wrote: > One of the statements about the award in a recent email reads: > > "As co-director of the PDB, Bourne has transformed an under-utilized > database into a major international resource." > > I find that hard to believe. Does anyone have information about this > statement? I think the PDB has always been a major international > resource from the day it was created. I'd like to find out more, but > as far as I know this statement is totally erroneous (or at least > misleading). In particular, what happened at the PDB to turn it from > 'under-utilized' into 'major international resource'? > > Cheers, > Dan. > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > -- -- Martin Gollery Senior Bioinformatics Scientist Tahoe Informatics www.bioinformaticist.biz www.hiddenmarkovmodels.com From ryan.raaum at gmail.com Thu Apr 23 12:15:34 2009 From: ryan.raaum at gmail.com (Ryan Raaum) Date: Thu, 23 Apr 2009 12:15:34 -0400 Subject: [BiO BB] Comparing sequences from GenBank and RefSeq... In-Reply-To: <2c8757af0904230842h7e2e3a12gc5ecd1ae1294a492@mail.gmail.com> References: <2c8757af0904230842h7e2e3a12gc5ecd1ae1294a492@mail.gmail.com> Message-ID: The refseq entry tells you which non-refseq entry/entries it was derived from. 
In this case it says DQ386163, which suggests there are at least 2 potato chloroplast sequences available - one by an Italian group and one by a Korean group. On Thu, Apr 23, 2009 at 11:42 AM, Dan Bolser wrote: > Hi, > > I found that the potato chloroplast sequence from GenBank (DQ231562.1) > has several differences (260 SNPs and 30 indels) relative to the same > sequence in RefSeq (NC_008096.1). As far as I am aware this sequence > has only been obtained once, why would the two differ? In general > should I trust the refseq sequence? > > > For your reference here is the output of dnadiff over the two files: > > Reference/DQ231562.fasta Query/NC_008096.fasta > NUCMER > > [REF] [QRY] > [Sequences] > TotalSeqs 1 1 > AlignedSeqs 1(100.00%) 1(100.00%) > UnalignedSeqs 0(0.00%) 0(0.00%) > > [Bases] > TotalBases 155312 155298 > AlignedBases 155312(100.00%) 155298(100.00%) > UnalignedBases 0(0.00%) 0(0.00%) > > [Alignments] > 1-to-1 1 1 > TotalLength 155312 155298 > AvgLength 155312.00 155298.00 > AvgIdentity 99.81 99.81 > > M-to-M 1 1 > TotalLength 155312 155298 > AvgLength 155312.00 155298.00 > AvgIdentity 99.81 99.81 > > [Feature Estimates] > Breakpoints 0 0 > Relocations 0 0 > Translocations 0 0 > Inversions 0 0 > > Insertions 0 0 > InsertionSum 0 0 > InsertionAvg 0.00 0.00 > > TandemIns 0 0 > TandemInsSum 0 0 > TandemInsAvg 0.00 0.00 > > [SNPs] > TotalSNPs 260 260 > AC 23(8.85%) 14(5.38%) > AG 24(9.23%) 30(11.54%) > AT 15(5.77%) 14(5.38%) > CA 14(5.38%) 23(8.85%) > CG 24(9.23%) 18(6.92%) > CT 32(12.31%) 19(7.31%) > GA 30(11.54%) 24(9.23%) > GC 18(6.92%) 24(9.23%) > GT 13(5.00%) 34(13.08%) > TA 14(5.38%) 15(5.77%) > TC 19(7.31%) 32(12.31%) > TG 34(13.08%) 13(5.00%) > > TotalGSNPs 113 113 > AC 9(7.96%) 8(7.08%) > AG 17(15.04%) 17(15.04%) > AT 5(4.42%) 3(2.65%) > CA 8(7.08%) 9(7.96%) > CG 6(5.31%) 7(6.19%) > CT 15(13.27%) 8(7.08%) > GA 17(15.04%) 17(15.04%) > GC 7(6.19%) 6(5.31%) > GT 6(5.31%) 12(10.62%) > TA 3(2.65%) 5(4.42%) > TC 8(7.08%) 15(13.27%) > TG 12(10.62%) 6(5.31%) > > TotalIndels 30 30 > A. 14(46.67%) 4(13.33%) > C. 1(3.33%) 0(0.00%) > G. 0(0.00%) 0(0.00%) > T. 7(23.33%) 4(13.33%) > > TotalGIndels 24 24 > A. 10(41.67%) 4(16.67%) > C. 1(4.17%) 0(0.00%) > G. 0(0.00%) 0(0.00%) > T. 5(20.83%) 4(16.67%) > > > Thanks for any pointers, > Dan. > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > -- Ryan Raaum Assistant Professor Department of Anthropology Lehman College, The City University of New York 250 Bedford Park Blvd. West Bronx, NY 10468 e: ryan.raaum at lehman.cuny.edu w: http://www.raaum.org o: (718) 960-8845 f: (718) 960-8406 From landman at scalableinformatics.com Thu Apr 23 12:31:46 2009 From: landman at scalableinformatics.com (Joe Landman) Date: Thu, 23 Apr 2009 12:31:46 -0400 Subject: [BiO BB] Bourne award... strange statement In-Reply-To: <2c8757af0904230046l73f4c76dl1f447a23ab36631c@mail.gmail.com> References: <2c8757af0904230046l73f4c76dl1f447a23ab36631c@mail.gmail.com> Message-ID: <49F097F2.3090502@scalableinformatics.com> Dan Bolser wrote: > One of the statements about the award in a recent email reads: > > "As co-director of the PDB, Bourne has transformed an under-utilized > database into a major international resource." > > I find that hard to believe. Does anyone have information about this > statement? I think the PDB has always been a major international > resource from the day it was created. I'd like to find out more, but > as far as I know this statement is totally erroneous (or at least > misleading). In particular, what happened at the PDB to turn it from > 'under-utilized' into 'major international resource'? Transfer from Brookhaven to UCSD? I had heard from some involved that it was ... er ... contentious. 
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

From jqb at Cs.Nott.AC.UK Fri Apr 24 08:28:31 2009
From: jqb at Cs.Nott.AC.UK (Jaume Bacardit)
Date: Fri, 24 Apr 2009 13:28:31 +0100
Subject: [BiO BB] [Call for Participation] Plant Bioinformatics, Systems and Synthetic Biology Summer School. 27-31 July 2009, Nottingham
Message-ID: <49F1B06F.9030000@cs.nott.ac.uk>

[Apologies if you receive this announcement multiple times]

We would like to invite participants, especially doctoral students, to
the Plant Bioinformatics, Systems and Synthetic Biology Summer School,
which will take place in Nottingham, UK, from the 27th to the 31st of
July, 2009. The summer school is funded by the European Science
Foundation (ESF). Funding for tuition, accommodation, meals & travel is
available for a limited number of students from ESF member countries.

Attached is a flyer for this event. We would be very grateful if you
could distribute this announcement to anybody who may have an interest
in attending the summer school.

Best regards,

Dr. Jaume Bacardit (on behalf of the summer school co-chairs)
Prof. Malcolm Bennett
Dr. Natalio Krasnogor

--
-------------------------------------------------------------------
Jaume Bacardit, PhD
Lecturer in Bioinformatics
University of Nottingham

Automated Scheduling, Planning and Optimisation research group,
School of Computer Science,
Jubilee Campus, Nottingham, NG8 1BB, UK

Multidisciplinary Centre for Integrative Biology,
School of Biosciences,
Sutton Bonington, LE12 5RD, UK

Tel: +441159516276
Fax: +44 1159516292
Email: jaume _dot_ bacardit _at_ nottingham _dot_ ac _dot_ uk
Web: http://www.cs.nott.ac.uk/~jqb
--------------------------------------------------------------------

From nir at rosettadesigngroup.com Sat Apr 25 11:13:37 2009
From: nir at rosettadesigngroup.com (Nir London)
Date: Sat, 25 Apr 2009 18:13:37 +0300
Subject: [BiO BB] What is your favorite journal in the field of computational structural biology?
Message-ID:

Following the success of our previous poll, which was followed by an
extensive and interesting discussion, and in order to better our
Bi(Tri,..)-Weekly Digests, we present a new poll which will determine
once and for all which is the best journal for the computational
structural biologist.

http://rosettadesigngroup.com/blog/353/what-is-your-favorite-journal-in-the-field-of-computational-structural-biology/

Of course we're all reading all of the journals mentioned, and
certainly each has its own benefits. There is also the question of
whether by "favorite" journal we mean the one where you would like to
publish, or the one you enjoy reading the most. We know the question
isn't adequately defined. Nonetheless, we urge you to pick one and try
to explain in the comments what makes that journal special.

Nir London.
http://rosettadesigngroup.com/blog/

From marchywka at hotmail.com Tue Apr 28 14:57:49 2009
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 28 Apr 2009 14:57:49 -0400
Subject: [BiO BB] Swine Flu genome link, if anyone is interested,
In-Reply-To:
References:
Message-ID:

I wasn't sure if anyone had any interest but
the NCBI has some influenza sequences all in one
place,

http://www.ncbi.nlm.nih.gov/genomes/FLU/SwineFlu.html

Link courtesy of this free mail list,

INFLUENZA A (H1N1) "SWINE FLU": WORLDWIDE (03)
**********************************************
A ProMED-mail post
http://www.promedmail.org
ProMED-mail is a program of the
International Society for Infectious Diseases
http://www.isid.org

In this update:
[1] Some questions
[2] New Zealand
[3] Israel
[4] Comment on seasonality
[The genome sequences of several US isolates are now available at
GenBank: see
. - Mod.CP]

******

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
415-264-8477 (w)<- use this
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: If I am asking for free stuff, I normally use for hobby/non-profit
information but may use in investment forums, public and private.
Please indicate any concerns if applicable.
Note: hotmail is getting cumbersome, try also marchywka at yahoo.com

From dan.bolser at gmail.com Tue Apr 28 08:50:32 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 28 Apr 2009 13:50:32 +0100
Subject: [BiO BB] Comparing sequences from GenBank and RefSeq...
In-Reply-To:
References: <2c8757af0904230842h7e2e3a12gc5ecd1ae1294a492@mail.gmail.com>
Message-ID: <2c8757af0904280550h697f0251t77ec53ed15b38796@mail.gmail.com>

2009/4/23 Ryan Raaum :
> The refseq entry tells you which non-refseq entry/entries it was
> derived from. In this case it says DQ386163, which suggests there are
> at least 2 potato chloroplast sequences available - one by an Italian
> group and one by a Korean group.

Right, I see. Any way to judge the quality of the two?

In the RefSeq record I read "PROVISIONAL REFSEQ: This record has not
yet been subject to final NCBI review." - Any way to kick them about
that? i.e. Dear RefSeq, I have DQ231562 and DQ386163, should they be
merged into NC_008096?

Thanks for the info,
Dan.

From paolo.romano at istge.it Tue Apr 28 09:04:24 2009
From: paolo.romano at istge.it (Paolo Romano)
Date: Tue, 28 Apr 2009 15:04:24 +0200
Subject: [BiO BB] NETTAB 2009: Deadline postponed to May 4, 2009, for Oral communications
Message-ID: <200904281318.n3SDIYxK036366@ibm43p.biotech.ist.unige.it>

Due to many requests for a new deadline for submission of contributions
for oral communications, the related deadline has been postponed to:
Monday May 4, 2009, at 12.00 (noon), EST (GMT+1).

=====

Last Call for Oral communications

NETTAB 2009 Workshop on
"Technologies, Tools and Applications for Collaborative and Social
Bioinformatics Research and Development"
with a Special Session on:
"Methods and Tools for RNA Structure and Functional Analysis"

June 10-13, 2009
Department of Computer Science, University of Catania, Italy
http://www.nettab.org/2009/

Deadline approaching:
May 4, 2009: Oral communication submission

Contributions must be short papers of around THREE A4 pages or 12.000
characters long. Submit through the EasyChair system at:
http://www.easychair.org/conferences/?conf=nettab2009 .
See web site for details.

Motivation

The most recent developments of collaborative development tools are
impressive. Researchers can now collaboratively develop software (open
source systems), discuss and compare development strategies (social
networks), write documents (google docs, wiki systems), and build
knowledge bases. So, it may now be the time for presenting current
technologies, tools and applications for collaborative work and for
discussing perspectives of their utilization in support of
Bioinformatics.
For these reasons, NETTAB 2009 will be devoted to "Technologies, Tools
and Applications for Collaborative and Social Bioinformatics Research
and Development".

The RNA community is also taking advantage of collaborative research
tools such as Wikis and other virtual environments. The RNA WikiProject
now contains over 600 articles describing families of noncoding RNAs
based on the Rfam database, and invites the community to update, edit,
and correct those articles. Therefore, the NETTAB 2009 special session
will focus on collaborative research projects, computational methods
and tools for the analysis of RNA structures and functions, with a
special emphasis on ncRNAs.

Invited Speakers (more to be announced)

# Alex Bateman
  Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
# Tim Clark
  Director of Informatics, MassGeneral Institute for Neurodegenerative
  Disease
  Neurology Research Department, Massachusetts General Hospital,
  Boston, USA
# Duncan Hull
  School of Chemistry, University of Manchester, Manchester, UK
# Gabriel Valiente
  Technical University of Catalonia, Department of Software,
  Barcelona, Spain
# Debora Marks
  Systems Biology Department, Harvard Medical School, Boston, USA

Topics

- Collaborative Web sites (bioinformatics.org, biojava, bioperl, )
- Communities of Practices (CoPs)
    Scientific practices in scientific communities
    Automatic detection / gathering / modelling of scientific practices
    Implementations of CoPs
- Social networking (myExperiment, Annotea, myScience)
    Social Bookmarking
    Semantic Document Markup
    Relationships mining from literature
- Open Source development
    Sharing of data models, libraries, interfaces
- Social software for collaborative documentation development
    Wikis, blogs, google docs
    Knowledge Wikis
    Social-software-mediated collaborative scientific research
    Social-software-mediated collaborative tools' development
    Knowledge base collaborative development
    Ontologies collaborative development
- Education and training tools
    E-learning
    Virtual environments

Methods and Tools for RNA Structure and Functional Analysis

- RNA structure prediction
- Collaborative studies of RNAs
- ncRNAs functional analysis and classification
- miRNAs and networks
- Genome-wide functional studies
- Identification of ncRNAs
- Databases of ncRNAs and miRNA targets
- miRNA targets prediction
- Synthetic miRNA and siRNA design
- Gene expression analysis
- Analysis of viral RNAs
- RNAi therapeutics
- Identification of ncRNAs biomarkers
- RNA-protein interaction prediction

Deadlines

Contributions for both oral communications and posters must be short
papers of around THREE A4 pages or 12.000 characters long. They must be
submitted through the EasyChair system at:
http://www.easychair.org/conferences/?conf=nettab2009 .

- May 4, 2009: Oral communication submission
- May 15, 2009: Posters submission
- May 17, 2009: Early registration
- June 10-13, 2009: Tutorials and Workshop

Calls for SPECIAL ISSUES

We plan to launch Calls for Special Issues on the themes of the
workshop in peer-reviewed journals with an associated impact factor,
around July, for submission in September 2009.

Best regards.
Paolo Romano
on behalf of NETTAB 2009 Chairs

NETTAB '09 - Ninth International Workshop on
Network Tools and Applications in Biology
10-13 June 2009, Catania, Italy
http://www.nettab.org/2009/

Paolo Romano (paolo.romano at istge.it)
Bioinformatics
National Cancer Research Institute (IST)

From Sterten at aol.com Tue Apr 28 15:18:56 2009
From: Sterten at aol.com (Sterten at aol.com)
Date: Tue, 28 Apr 2009 15:18:56 EDT
Subject: [BiO BB] Swine Flu genome link, if anyone is interested,
Message-ID:

all the sequences in one file in my format are here:
http://magictour.free.fr/panflu/swflb

analysis and discussion here:
http://www.flutrackers.com/forum/showthread.php?t=101112

From richard.squires at utsouthwestern.edu Tue Apr 28 15:23:47 2009
From: richard.squires at utsouthwestern.edu (Burke Squires)
Date: Tue, 28 Apr 2009 14:23:47 -0500
Subject: [BiO BB] Swine Flu genome link, if anyone is interested,
In-Reply-To:
References:
Message-ID: <2F3738BF-C55D-4607-9516-1EB2CEE4FF9B@utsouthwestern.edu>

Hi Mike,

Thanks! I would like to let everyone know as well that we have the
sequence in the BioHealthBase BRC at
http://www.biohealthbase.org/GSearch/home.do?decorator=influenza

Burke Squires

From kiekyon.huang at gmail.com Wed Apr 29 12:16:41 2009
From: kiekyon.huang at gmail.com (Kie Kyon Huang)
Date: Thu, 30 Apr 2009 00:16:41 +0800
Subject: [BiO BB] protein-protein interaction prediction
Message-ID:

Hi,

Is there any good software to predict protein-protein interactions at
the genome level?

I prefer methods that do not need to incorporate orthology, so that the
network can be expanded to more genes.

Thanks a lot

Kie Kyon

From EliDraizen at drewschool.org Wed Apr 29 15:42:22 2009
From: EliDraizen at drewschool.org (Eli Draizen)
Date: Wed, 29 Apr 2009 12:42:22 -0700
Subject: [BiO BB] college
Message-ID: <9BE4A105CF42C64A844E8077087BC21901E88141@mercury.drewschool.org>

Hello-

I am not sure if this is the right place to post this but I do not know
who else to ask. I am currently applying to college and want to study
bioinformatics. I have a few questions which my college advisors cannot
answer:

1) Which schools have the best programs and do not care about SATs?

2) Would it be better to double major in biology and computer science
and then be more focused in grad school?

3) What is the difference between the majors bioinformatics and
computational biology?

Thanks for your time,

Eli Draizen

From phoebe.chen at deakin.edu.au Thu Apr 30 00:13:16 2009
From: phoebe.chen at deakin.edu.au (Phoebe Chen)
Date: Thu, 30 Apr 2009 14:13:16 +1000
Subject: [BiO BB] APBC2010 First CFP - Bangalore, India, 18-21 January 2010
Message-ID: <096466DE3CBF11439436E7B189DC440908A28CA0@mirzam-1.du.deakin.edu.au>

===============================================================
First Call for Papers - APBC2010
The Eighth Asia-Pacific Bioinformatics Conference (APBC2010)
Bangalore, India, 18-21 January 2010
http://cs.nyu.edu/parida/APBC2010/index.html
===============================================================

The Asia Pacific Bioinformatics Conference (APBC) series, founded in
2003, is an annual international forum for exploring research,
development and applications of Bioinformatics and Computational
Biology.

Previous APBC meetings:
  APBC2009 (13-16 January 2009, Beijing, China)
  APBC2008 (14-17 January 2008, Kyoto, Japan)
  APBC2007 (14-17 January 2007, Hong Kong)
  APBC2006 (13-16 February 2006, Taipei, Taiwan)
  APBC2005 (17-21 January 2005, Singapore)
  APBC2004 (18-22 January 2004, Dunedin, New Zealand)
  APBC2003 (4-7 February 2003, Adelaide, Australia)

The Eighth Asia-Pacific Bioinformatics Conference, APBC2010, will be
held in Bangalore, India. The aim of the conference is to bring
together researchers, professionals, and industrial practitioners from
all over the world for interaction and exchange of knowledge and ideas
in all areas of bioinformatics and computational biology.

----------------------------------------------------------------
Important Dates

Paper submission deadline                             July 20, 2009
Paper acceptance decision                             Sep 14, 2009
Camera-ready copy of papers and Author registration   Oct 9, 2009
Poster submission open                                July 21, 2009
Poster submission deadline                            Sep 25, 2009
Poster acceptance decision                            Oct 9, 2009
Registration open                                     Sept 20, 2009
Early-bird registration                               Nov 20, 2009
Conference                                            Jan 18-21, 2010

From dan.bolser at gmail.com Thu Apr 30 10:26:33 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 30 Apr 2009 15:26:33 +0100
Subject: [BiO BB] protein-protein interaction prediction
In-Reply-To:
References:
Message-ID: <2c8757af0904300726p7a2541cx882d7daab0386530@mail.gmail.com>

You could try the PIPs server:

http://www.compbio.dundee.ac.uk/www-pips/

(made by some guys in my lab).

Or you can find a list here:

http://biodatabase.org/index.php/Category:Protein-protein_interactions

Dan.

From dan.bolser at gmail.com Thu Apr 30 10:47:19 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 30 Apr 2009 15:47:19 +0100
Subject: [BiO BB] college
In-Reply-To: <9BE4A105CF42C64A844E8077087BC21901E88141@mercury.drewschool.org>
References: <9BE4A105CF42C64A844E8077087BC21901E88141@mercury.drewschool.org>
Message-ID: <2c8757af0904300747i5396950cr2db51b07871ea740@mail.gmail.com>

Hi Eli,

I asked your question on the bioinformatics IRC channel:
irc://irc.freenode.net/#bioinformatics. It has generated a bit of
conversation, so you may like to go there and join in.

If you don't have an IRC client installed you can get to that channel
through Mibbit:

http://www.mibbit.com/chat/

See also:

http://irchelp.org/irchelp/irctutorial.html

Cheers,
Dan.

From keithcallenberg at gmail.com Thu Apr 30 05:07:37 2009
From: keithcallenberg at gmail.com (Keith Callenberg)
Date: Thu, 30 Apr 2009 05:07:37 -0400
Subject: [BiO BB] college
In-Reply-To: <9BE4A105CF42C64A844E8077087BC21901E88141@mercury.drewschool.org>
References: <9BE4A105CF42C64A844E8077087BC21901E88141@mercury.drewschool.org>
Message-ID:

Hey Eli,

I'm in a computational biology PhD program at CMU-Pitt and I was asking
your exact questions about 6 years ago. I ended up doing a computer
science undergrad, but I'm not sure that I'd recommend that.

I think it would be best to start with your 2nd and 3rd questions,
because I think the most important thing is what you picture yourself
doing. You've probably seen the Bioinformatics FAQ [1]. It has some
explanation of terms like computational biology, but you'll notice that
people disagree over these definitions.
At least here in Pittsburgh, "bioinformatics" signifies a focus on the
biological data and how to manage and extract information from it,
whereas "computational biology" is a broader term that signifies a
focus on the biological process that is creating the data [2]. I have
heard people consider the two fields the complete opposite of those
descriptions as well, but whatever you want to call them, the question
remains whether you'd like to target biological processes or the
practical problem of managing the information that comes from
biological processes.

For the latter, my opinion is that a computer science degree with some
good statistics and biology courses mixed in wouldn't be a bad setup.
For the former, I think it is a bit trickier and more based on the type
of biology you're interested in. For instance, if you are interested in
structural biology, a solid background in physics and biochemistry is
very helpful, whereas if you're more interested in genomics, machine
learning and statistics are essential.

My personal opinion is that unless you are an extraordinarily active
and energy-filled person (and I do know one such person who was able to
do this), it is difficult to get a strong background in all of the
fields that a bioinformatics undergrad degree dips into. You just
become spread too thin. A common perception that I agree with is that
it is best to get a fundamental degree like biology, math, computer
science, statistics, physics, biochem (if you can find one that has a
perspective you find natural) and then augment it with a few courses in
the stuff you're missing... at least if you're aiming for grad school.

For your first question, they will all care at least a bit about your
SATs. You might be able to find some indicators at www.review.com or
elsewhere about schools that tend to use SATs less, but I'm not sure
how valid those are. If you can afford it, I would just write some good
essays and apply to as many schools as you can to increase your
chances.

good luck!
Keith

refs:
[1] http://wiki.bioinformatics.org/Bioinformatics_FAQ
[2] http://www.compbio.cmu.edu/background.html

From jeff at bioinformatics.org Thu Apr 30 13:46:26 2009
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Thu, 30 Apr 2009 13:46:26 -0400
Subject: [BiO BB] college
In-Reply-To: <9BE4A105CF42C64A844E8077087BC21901E88141@mercury.drewschool.org>
References: <9BE4A105CF42C64A844E8077087BC21901E88141@mercury.drewschool.org>
Message-ID: <49F9E3F2.2040506@bioinformatics.org>

Hi Eli,

I agree with everything that Keith wrote. And please see our FAQ :-)

As for the SATs, some US colleges/universities don't require them for
transfer students. So, you could consider taking night classes in the
"continuing education" division of a university and then transferring
into a degree program there or elsewhere. It might also be possible to
avoid the GREs using the same strategy.

I would also emphasize that your undergraduate degree might only serve
to prepare you for your graduate studies and not for employment, unless
you're looking for a low-level technical position. Since bioinformatics
is a science, at least in part, most positions will require a graduate
degree.

This could be a simple rule (and it's probably controversial): If you'd
like to stop at a master's degree, choose CS as an undergraduate major.
But, if you'd like to get a doctorate, choose biology as a major. As
Keith indicated, the former path is usually considered a bioinformatics
education, while the latter path is usually considered a computational
biology education.

That's not to say there are no doctorate programs in bioinformatics.
It's just that the trend I've seen is that bioinformatics is being
increasingly associated with CS and not biology (look at the curricula
for the graduate programs out there), and people often advise against
"going the doctorate route" in CS (where you could find yourself
overqualified for most positions). For bioinformatics, that would mean
a master's degree should be the goal, should you be looking for the
largest number of positions available.

That's my 2 cents anyway. I'm sure others will disagree.

Cheers,
Jeff

--
J.W. Bizzaro
Bioinformatics Organization, Inc. (Bioinformatics.Org)
E-mail: jeff at bioinformatics.org
Phone: +1 978 562 4800
--