From p.molenaar at amc.uva.nl Fri Oct 5 05:37:54 2007 From: p.molenaar at amc.uva.nl (piet molenaar) Date: Fri, 5 Oct 2007 11:37:54 +0200 Subject: [BiO BB] Symposium: Integrative Bioinformatics; schedule and titles of talks definitive Message-ID: <554293e20710050237n316bc1aey65421de2c742834b@mail.gmail.com> One day symposium 8 November 2007 - Academic Medical Center - Amsterdam- Netherlands Integrative Bioinformatics: At the cutting edge of network analysis and biological data integration Each molecular biologist working with high throughput data is confronted with the need to reconstruct and validate the underlying complex regulatory networks. At this one-day symposium, some of the most prominent leaders in the field will present their approaches and views. This symposium offers a platform for a discussion of state of the art biological network analyses. Confirmed speakers and titles: - Leroy Hood: Biological networks and disease - Ruedi Aebersold: Protein-centered networks in systems biology: Analysis and visualization - Trey Ideker: Gaining power in gene association studies with Cytoscape - Andrew Hopkins: Network Pharmacology: chemical opportunities for systems biology - Peter Sorger: Modeling Mammalian Death and Survival Pathways - Ewan Birney: Reactome, Networks and Genomes - Rogier Versteeg: Oncogenic networks of cancer pathways in childhood cancer - Benno Schwikowski: Computational tools increasing sensitivity and reliability of mass spec-based proteomics - Chris Sander: Systems Biology of Cancer Pathways: from Molecular Perturbations to Cellular Phenotypes - Jean-Daniel Fekete: Visualizing Dense Networks with Enhanced and Hybrid Matrices Please distribute the attached poster (pdf) in your institute. The symposium is part of the 5th annual Cytoscape Public Symposium and Developers Retreat. For registration, Symposium programme and directions go to www.cytoscape.org/retreat2007 We look forward to welcoming you to Amsterdam! 
The Organizing Committee, 5th Cytoscape Retreat 2007 w: www.cytoscape.org/retreat2007 e: cytoretreat at cytoscape.org Department of Human Genetics - M1-131 Academic Medical Center - University of Amsterdam Meibergdreef 9 - 1100 DD - Amsterdam - the Netherlands From jeedward at gmail.com Fri Oct 5 20:24:33 2007 From: jeedward at gmail.com (John Edward) Date: Fri, 5 Oct 2007 20:24:33 -0400 Subject: [BiO BB] BCBGC-2008 Call for papers Message-ID: <000301c807af$475d0990$6401a8c0@cisnotebookbp> Call for papers The 2008 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-08) (website: www.PromoteResearch.org) will be held July 7-10, 2008 in Orlando, FL, USA. We invite draft paper submissions and session proposals. The conference will be held at the same time and place as several other major events. The website contains more details. Sincerely John Edward From christoph.gille at charite.de Tue Oct 9 12:09:19 2007 From: christoph.gille at charite.de (Dr. Christoph Gille) Date: Tue, 9 Oct 2007 18:09:19 +0200 (CEST) Subject: [BiO BB] genbank2swissprot ? Message-ID: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de> Is there a mapping from the identifiers of GenBank nucleotide sequences to Swiss-Prot (protein) identifiers? Using some tables in ftp://ftp.ncbi.nih.gov/refseq/ ftp://ftp.ncbi.nih.gov/gene/DATA there seems to be an indirect way via the proteinkb. But perhaps there is a more direct way? Many thanks Christoph From boris.steipe at utoronto.ca Tue Oct 9 14:17:49 2007 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Tue, 9 Oct 2007 14:17:49 -0400 Subject: [BiO BB] genbank2swissprot ? In-Reply-To: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de> References: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de> Message-ID: <3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca> Does the UniProt ID mapping service fit your requirements?
http://www.pir.uniprot.org/search/idmapping.shtml Boris On 9-Oct-07, at 12:09 PM, Dr. Christoph Gille wrote: > Is there a mapping of identifiers from genbank nt sequences > to identifiers of swissprot (protein) ? > Using some tables in > ftp://ftp.ncbi.nih.gov/refseq/ > ftp://ftp.ncbi.nih.gov/gene/DATA > there seems to be a way indirectly over the proteinkb. > But perhaps there is a more direct way? > Many thanks > > Christoph > > _______________________________________________ > General Forum at Bioinformatics.Org - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From christoph.gille at charite.de Tue Oct 9 16:56:53 2007 From: christoph.gille at charite.de (Dr. Christoph Gille) Date: Tue, 9 Oct 2007 22:56:53 +0200 (CEST) Subject: [BiO BB] genbank2swissprot ? In-Reply-To: <3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca> References: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de> <3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca> Message-ID: <61125.84.190.65.179.1191963413.squirrel@webmail.charite.de> Many thanks, Boris, this is quite good, though the table is incomplete. I think I once saw an assignment of an mRNA to each Swiss-Prot entry somewhere, but unfortunately I cannot find it any more. Christoph From Stan.Gaj at BIGCAT.unimaas.nl Wed Oct 10 07:58:55 2007 From: Stan.Gaj at BIGCAT.unimaas.nl (Gaj Stan (BIGCAT)) Date: Wed, 10 Oct 2007 13:58:55 +0200 Subject: [BiO BB] genbank2swissprot ?
In-Reply-To: <3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca> References: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de> <3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca> Message-ID: <1C6586068DF1054DA3899159C38B32C90B04ABD1@um-mail0136.unimaas.nl> Hi Christoph, There are two other possibilities: a) Use BioMart at www.ensembl.org to retrieve EnsEMBL gene IDs for your list of RefSeq IDs (you did mention you used NM_ IDs, so I assume you mean RefSeq IDs) and export this list together with the UniProt cross-links. A problem you will surely encounter with this approach is that sometimes more than one UniProt ID is associated with an EnsEMBL gene. The generated list contains this information, but on separate lines, so you will need to filter the list for it. b) The RefSeq group has recently announced that they have updated their databases with cross-references to UniProt (they collaborated closely on this). I can't find the archive of their Gene-Announce list, but here is the announcement: ======= Announcing the availability of RefSeq-UniProtKB cross-link data In collaboration with UniProtKB (http://www.pir.uniprot.org/), the RefSeq group is now reporting explicit cross-references to Swiss-Prot and TrEMBL proteins that correspond to a RefSeq protein. These correspondences are being calculated by the UniProtKB group, and will be updated every three weeks to correspond to UniProt's release cycle. The data are being made available from several sites within NCBI: 1. The full report from Entrez Gene, in the Reference Sequences section. For an example, go to the Full Report page for the sevenless gene of Drosophila melanogaster (http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=DetailsSearch&Term=32039%5Buid%5D) and click on the Reference Sequences section in the table of contents on the right.
You will see mRNA and Protein(s) NM_078559.2 → NP_511114.2 sevenless CG18085-PA [Drosophila melanogaster] UniProtKB/Swiss-Prot P13368 <--- new data 2. Links in NCBI's Protein database Explicit links between corresponding RefSeq and Swiss-Prot proteins are now provided within the NCBI Protein database. These links are available in the 'Links' menu located at the upper right of the protein display page. The link names are: Protein (RefSeq): provides a link from a Swiss-Prot record to the corresponding RefSeq record; Protein (UniProtKB): provides a link to the equivalent Swiss-Prot record. 3. Filter choices in NCBI's Protein database: protein protein refseq2uniprot (find RefSeq protein records with a link to a UniProtKB protein in NCBI's protein database); protein protein uniprot2refseq (find UniProtKB protein records with a link to a RefSeq protein in NCBI's protein database). 4. ftp sites A new file was added to the gene and refseq ftp sites to report the relationship between NCBI Reference Sequence protein accessions and UniProtKB protein accessions. The new gene_refseq_uniprotkb_collab.gz file specifies the corresponding pairs of NCBI and UniProtKB protein accessions. ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_refseq_uniprotkb_collab.gz or ftp://ftp.ncbi.nlm.nih.gov/refseq/uniprotkb/gene_refseq_uniprotkb_collab.gz The README file on the gene and refseq ftp sites has been updated to document this addition. See: ftp://ftp.ncbi.nlm.nih.gov/gene/README ftp://ftp.ncbi.nlm.nih.gov/refseq/README 5. The ASN.1 in Entrez Gene: new implementation of a gene-commentary. Each cross-reference will be reported in a gene-commentary of type other. Note: more than one cross-reference per RefSeq protein record is possible.
type other, source { { src { db "UniProtKB/Swiss-Prot", tag str "P23760" }, anchor "P23760" }, { src { db "UniProtKB/TrEMBL", tag str "O23760" }, anchor "O23760" } } ==== I haven't tested this one out myself, but I think it might do the trick for you (: Best wishes, -- Stan -----Original Message----- From: bio_bulletin_board-bounces+stan.gaj=bigcat.unimaas.nl at bioinformatics.org [mailto:bio_bulletin_board-bounces+stan.gaj=bigcat.unimaas.nl at bioinformatics.org] On Behalf Of Boris Steipe Sent: 09 October 2007 20:18 To: General Forum at Bioinformatics.Org Subject: Re: [BiO BB] genbank2swissprot ? Does the UniProt ID mapping service fit your requirements? http://www.pir.uniprot.org/search/idmapping.shtml Boris On 9-Oct-07, at 12:09 PM, Dr. Christoph Gille wrote: > Is there a mapping of identifiers from genbank nt sequences > to identifiers of swissprot (protein) ? > Using some tables in > ftp://ftp.ncbi.nih.gov/refseq/ > ftp://ftp.ncbi.nih.gov/gene/DATA > there seems to be a way indirectly over the proteinkb. > But perhaps there is a more direct way? > Many thanks > > Christoph From dan.bolser at gmail.com Wed Oct 10 05:26:07 2007 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 10 Oct 2007 11:26:07 +0200 Subject: [BiO BB] genbank2swissprot ?
In-Reply-To: <61125.84.190.65.179.1191963413.squirrel@webmail.charite.de> References: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de> <3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca> <61125.84.190.65.179.1191963413.squirrel@webmail.charite.de> Message-ID: <2c8757af0710100226r68e05c9elc1f03f636415036a@mail.gmail.com> From a recent 'refseq-announce' email: *Announcing the availability of RefSeq-UniProtKB cross-link data* In collaboration with UniProtKB (http://www.pir.uniprot.org/), the RefSeq group is now reporting explicit cross-references to Swiss-Prot and TrEMBL proteins that correspond to a RefSeq protein. These correspondences are being calculated by the UniProtKB group, and will be updated every three weeks to correspond to UniProt's release cycle. The data are being made available from several sites within NCBI: 2. Links in NCBI's Protein database Explicit links between corresponding RefSeq and Swiss-Prot proteins are now provided within the NCBI Protein database. These links are available in the 'Links' menu located at the upper right of the protein display page. The link names are: * Protein (RefSeq): provides a link from a Swiss-Prot record to the corresponding RefSeq record * Protein (UniProtKB): provides a link to the equivalent Swiss-Prot record 4. ftp sites A new file was added to the gene and refseq ftp sites to report the relationship between NCBI Reference Sequence protein accessions and UniProtKB protein accessions. The new gene_refseq_uniprotkb_collab.gz file specifies the corresponding pairs of NCBI and UniProtKB protein accessions. ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_refseq_uniprotkb_collab.gz or ftp://ftp.ncbi.nlm.nih.gov/refseq/uniprotkb/gene_refseq_uniprotkb_collab.gz The README file on the gene and refseq ftp sites has been updated to document this addition. See: ftp://ftp.ncbi.nlm.nih.gov/gene/README ftp://ftp.ncbi.nlm.nih.gov/refseq/README HTH, Dan. On 09/10/2007, Dr.
Christoph Gille wrote: > > Many thanks Boris, > > this is quite good, though the table is incomplete. > I think I somewhere saw > an assignment of mRNA to each swissprot entry. > Unfortunately I do not find it any more. > > Christoph > -- hello From yanz at email.unc.edu Mon Oct 22 18:38:49 2007 From: yanz at email.unc.edu (Yan Zhang) Date: Mon, 22 Oct 2007 18:38:49 -0400 Subject: [BiO BB] Call for Participation: NSF Biomedical Informatics Workshop, Dec. Message-ID: <20071022183849.mxbzw6t7kk4ksow8@webmail3.isis.unc.edu> Call for Participation: NSF Biomedical Informatics Workshop, Dec. 4-5, 2007, Portland, OR We are seeking US-based participants for the upcoming NSF-sponsored workshop on Biomedical Informatics. The core program of the workshop will feature invited speakers and panelists. However, we have reserved a few speaker and panel slots for participants who have not been invited but may qualify based on their current research and interests. All costs associated with the workshop will be covered for most participants selected through this CFP. If you currently have funding from a US Government agency to conduct research in an area related to the main themes of the workshop, it is highly likely you will be selected to participate, and you will be reimbursed for your costs from _our_ NSF workshop grant. You will be expected to give a panel presentation or a poster presentation at the workshop. Please contact Javed Mostafa (jm at unc.edu) if you are interested in participating. In your email to Javed, please include a paragraph describing your current research and/or point to current research materials, and describe any funded projects you are currently engaged in or have recently completed.
We will be able to respond to you fairly soon after we receive your interest statement. Below we provide a brief overview of the workshop. More information can be found at the workshop website: http://biomedweb.info. We are planning to explore research challenges and emerging solutions for handling health data from the point of origin (i.e., data collection) to the presentation and manipulation stages. We are also interested in exploring a wide spectrum of health data, from the genomic and cellular level to the patient level, and ultimately to the population level. Some specific topics that we will cover include: * Data Acquisition: Capturing data, conversion to appropriate structures/formats, and natural language processing * Data Standards: Limitations of current standards such as the HL7V3 standard, but also the potential utility of emerging standards * Semantic Interoperability: Vocabularies, ontologies, and techniques for semantic-level sharing of data * Data Management: Challenges associated with the scale, heterogeneity, and distributed and fragmentary nature of data * Data Presentation: Visual, adaptive, and optimal presentation of data for enhancing use and understanding * Data Services: Emerging applications for supporting research, quality and safety management, public health studies, etc. Beyond the areas above, we are also interested in the following topics: cyberinfrastructures for health information delivery, social/economic/political barriers to expanding access to health information, and privacy/security issues in health information delivery. We are planning to have panels focusing on these topics at the workshop. To encourage further research on improving health care information systems, NSF has provided funds to Indiana University and the University of North Carolina at Chapel Hill to conduct this workshop. Our aim is to bring together experts from the research, professional, and government sectors.
The workshop will dedicate two days to individual and panel presentations and birds-of-a-feather events to identify and prioritize key challenges. Current NSF and NIH grantees, academics with substantial track records in the area, industry experts, and program officers in key US agencies have been invited to participate. The workshop will be held on December 4th and 5th, 2007, in Portland, OR. Yan Zhang [Submitted on behalf of Dr. Javed Mostafa: jm at unc.edu] From ngadewal at yahoo.com Tue Oct 23 06:24:49 2007 From: ngadewal at yahoo.com (nikhil gadewal) Date: Tue, 23 Oct 2007 03:24:49 -0700 (PDT) Subject: [BiO BB] sequence analysis Message-ID: <838860.9612.qm@web51504.mail.re2.yahoo.com> Hello all, I am interested in finding human protein sequences whose carboxyl terminus matches a given tetrapeptide 100%. Is there any specific tool available to do so? Can BLAST or FASTA handle it by changing the parameters? Thank you in advance. Nikhil NIKHIL S. GADEWAL ACTREC, Tata Memorial Centre, Kharghar, Navi Mumbai, India Great minds discuss ideas; Average minds discuss events; Small minds discuss people. From marty.gollery at gmail.com Tue Oct 23 19:27:26 2007 From: marty.gollery at gmail.com (Martin Gollery) Date: Tue, 23 Oct 2007 16:27:26 -0700 Subject: [BiO BB] sequence analysis In-Reply-To: <838860.9612.qm@web51504.mail.re2.yahoo.com> References: <838860.9612.qm@web51504.mail.re2.yahoo.com> Message-ID: For an exact match you can simply use grep, or open the target in an editor such as TextPad and use the search function. Marty On 10/23/07, nikhil gadewal wrote: > Hello all > > I am interested to find tetramer peptide matching 100% to the carboxyl terminal of the protein sequences from human. > Is there any specific tool available to do so. > Can BLAST or FASTA handle it by changing the parameters.
> > Thankyou in advance. > > Nikhil > > > NIKHIL S. GADEWAL ACTREC, Tata Memorial Centre, Kharghar, Navi Mumbai, India Great minds discuss ideas; Average minds discuss events; Small minds discuss people. > -- -- Martin Gollery Senior Bioinformatics Scientist TimeLogic- a Division of Active Motif 775-833-9113 880 Northwood Blvd. Suite 7 Incline Village, NV 89451 From marty.gollery at gmail.com Tue Oct 23 19:54:02 2007 From: marty.gollery at gmail.com (Martin Gollery) Date: Tue, 23 Oct 2007 16:54:02 -0700 Subject: [BiO BB] sequence analysis In-Reply-To: <200710231843.55373.kanzure@gmail.com> References: <838860.9612.qm@web51504.mail.re2.yahoo.com> <200710231843.55373.kanzure@gmail.com> Message-ID: No, the proteins are not too big, only about 20 MB. Marty On 10/23/07, Bryan Bishop wrote: > > "Of the protein sequences from human." Martin, it sounds like what > Nikhil wants to do would require much more memory than textpad would be > allowed to allocate. Right? > > - Bryan > > On Tuesday 23 October 2007 18:27, Martin Gollery wrote: > > For an exact match you can simply use grep, or open the target in an > > editor such as textpad and use the search function. > > > > Marty > > > > On 10/23/07, nikhil gadewal wrote: > > > Hello all > > > > > > I am interested to find tetramer peptide matching 100% to the > > > carboxyl terminal of the protein sequences from human. Is there any > > > specific tool available to do so. > > > Can BLAST or FASTA handle it by changing the parameters. > > > > > > Thankyou in advance.
> > > > > > Nikhil From marchywka at hotmail.com Tue Oct 23 21:13:28 2007 From: marchywka at hotmail.com (Mike Marchywka) Date: Tue, 23 Oct 2007 21:13:28 -0400 Subject: [BiO BB] sequence analysis In-Reply-To: Message-ID: > > > > I am interested to find tetramer peptide matching 100% to the > > > > carboxyl terminal of the protein sequences from human. Is there any > > > > specific tool available to do so. > > > > Can BLAST or FASTA handle it by changing the parameters. I took a quick look at the BLAST documentation and didn't find anything. I also tried the standard Perl regex anchors $ and ^, but they didn't seem to work. What is your objective? I've had a heck of a time with downloaded PROSITE rules that have anchors, as I have to turn them off for translation products from non-edited transcripts. The reason I mention this is that you may in fact not want to limit yourself until you see what the other hits look like. With only a few amino acids you may have to sort through a lot of hits, but also see if the conserved domain tools offer any help, as apparently some of these can be position-sensitive. I've got my own code that does the opposite of what you want ( :) ). That is, given a few hundred sequences and some patterns in the form of Perl regexes, I can find each one in each sequence and use this information for alignment and clustering (at least that is the hope, and initial results don't look foolish). If there is a real need, I guess I could modify this to do the opposite. That is, if you get a few thousand proteins that contain your query anywhere, and you can't separate them with existing tools but you can phrase your query as a Perl regex, I may have something that helps.
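The anchored search discussed in this thread can be sketched in Python under a few assumptions. Once each FASTA record is joined onto a single string (the same preprocessing Christoph suggests later in the thread with tr and sed), the regex anchors `^` and `$` mark the N- and C-terminus. The helper names `prosite_to_regex`, `read_fasta` and `scan` are hypothetical. The converter covers only a subset of PROSITE syntax: `x`, `[..]`, `{..}`, repeat counts, and the `<`/`>` terminal anchors. It does not handle forms such as `[G>]`. KDEL is just an example of a C-terminal tetrapeptide, and the demo records in the tests are invented.

```python
import re

# Hypothetical helper: translates a SUBSET of PROSITE pattern syntax into a Python regex.
# Handled: x, single residues, [ALT], {EXCL}, repeats such as x(2) or x(2,4), and the
# '<' (N-terminal) / '>' (C-terminal) anchors at the ends. Not handled: forms like [G>].
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    p = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if p.startswith("<"):
        prefix, p = "^", p[1:]
    if p.endswith(">"):
        suffix, p = "$", p[:-1]
    parts = []
    for element in p.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return prefix + "".join(parts) + suffix


def read_fasta(lines):
    """Yield (header, sequence), joining wrapped sequence lines into one string."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append("".join(line.split()))
    if header is not None:
        yield header, "".join(chunks)


def scan(lines, pattern):
    """Yield (identifier, 1-based start, matched residues) for the first hit per record."""
    regex = re.compile(prosite_to_regex(pattern))
    for header, seq in read_fasta(lines):
        seq = seq.upper().rstrip("*")  # drop a trailing stop symbol, if present
        m = regex.search(seq)
        if m:
            identifier = (header.split() or [""])[0]
            yield identifier, m.start() + 1, m.group()
```

For the original question, the call might look like `scan(open("human_proteins.fasta"), "K-D-E-L>")`, with the file name assumed and KDEL replaced by the tetrapeptide of interest. Joining the lines first avoids the problem of matches that straddle line breaks.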
But sure, you can just download all the BLAST hits that contain your peptide anywhere and grep for the occurrences anchored to the end (this can be a hassle, as some straddle line breaks, etc.). Mike Marchywka 586 Saint James Walk Marietta GA 30067-7165 404-788-1216 (C)<- leave message 989-348-4796 (P)<- emergency only marchywka at hotmail.com Note: Hotmail is blocking my mom's entire ISP claiming it is to reduce spam but probably to force users to use hotmail. Please DON'T assume I am ignoring you and try me on marchywka at yahoo.com if no reply here. Thanks. From christoph.gille at charite.de Wed Oct 24 03:17:08 2007 From: christoph.gille at charite.de (Dr. Christoph Gille) Date: Wed, 24 Oct 2007 09:17:08 +0200 (CEST) Subject: [BiO BB] sequence analysis In-Reply-To: <838860.9612.qm@web51504.mail.re2.yahoo.com> References: <838860.9612.qm@web51504.mail.re2.yahoo.com> Message-ID: <37573.141.42.56.114.1193210228.squirrel@webmail.charite.de> You could use fgrep, which is faster than grep. First, though, you should transform the database file so that each sequence occupies a single line without blanks and without line breaks (using tr and sed). For historical reasons, database files are formatted for punch cards: lines are wrapped after at most 72 characters, which prevents the use of fgrep. From Sterten at aol.com Wed Oct 24 03:26:09 2007 From: Sterten at aol.com (Sterten at aol.com) Date: Wed, 24 Oct 2007 03:26:09 EDT Subject: [BiO BB] genbank orthography Message-ID: Names are not spelled uniformly (e.g. Viet Nam and Vietnam), and there are also many typos, which makes it very difficult to sort and analyse the entries by computer. I'm looking for a complete list of the different spellings (thousands of entries...) together with the suggested standard, so that we can correct and standardize them automatically.
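Here is a minimal sketch of the automatic correction requested above, assuming such a variant list existed. Each name is first reduced to a comparison key by removing accents, case, spaces and punctuation, so that "Viet Nam", "Vietnam" and "VIET-NAM" collapse together without any table entry. A lookup table then maps genuine misspellings to one chosen standard form. The table contents, including the choice of "Viet Nam" as the standard, are illustrative assumptions rather than an official list, and the function names are hypothetical.

```python
import re
import unicodedata

# Illustrative (hypothetical) table: chosen standard form -> known variant spellings.
# A real table would have to be curated, e.g. from a gazetteer.
VARIANTS = {
    "Viet Nam": ["Vietnam", "Veitnam"],
    "Cote d'Ivoire": ["Ivory Coast"],
}


def comparison_key(name):
    """Strip accents, case, whitespace and punctuation so trivial variants compare equal.

    Note: letters outside a-z (e.g. Cyrillic) are dropped entirely by this key.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]", "", no_accents.casefold())


def build_lookup(variants):
    """Map the comparison key of every spelling to its standard form."""
    lookup = {}
    for standard, alternatives in variants.items():
        for spelling in [standard, *alternatives]:
            lookup[comparison_key(spelling)] = standard
    return lookup


def normalize(name, lookup):
    """Return the standard spelling, or the name unchanged if it is not in the table."""
    return lookup.get(comparison_key(name), name)
```

Names that are not in the table pass through unchanged. This lets unknown spellings be collected and reviewed by hand before they are added to the table.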
From dan.bolser at gmail.com Wed Oct 24 04:41:05 2007 From: dan.bolser at gmail.com (Dan Bolser) Date: Wed, 24 Oct 2007 10:41:05 +0200 Subject: [BiO BB] genbank orthography In-Reply-To: References: Message-ID: <2c8757af0710240141l288546e0qf90cd627a746aa11@mail.gmail.com> On 24/10/2007, Sterten at aol.com wrote: > > names are not spelled uniformly, e.g. Viet Nam and Vietnam, > also many typos, this makes it very difficult to sort and analyse the entries > by computer. > I'm looking for a complete list of different spellings > (thousands of entries...) and the suggested standard so we can > correct/uniformify them automatically. Great idea. The PDB needs something similar also! -- hello From marchywka at hotmail.com Wed Oct 24 08:27:36 2007 From: marchywka at hotmail.com (Mike Marchywka) Date: Wed, 24 Oct 2007 08:27:36 -0400 Subject: [BiO BB] sequence analysis In-Reply-To: <37573.141.42.56.114.1193210228.squirrel@webmail.charite.de> Message-ID: >Before, you should transform the database file such that I've taken my local BLAST databases and used their FASTA form for "grepping" (using my own code that calls either the GRETA or Boost regex libraries) against various genome sequences. It turns out to be too slow for repeated use, but I would comment as follows. The patterns of biological interest tend to be subsets of regex, so you can implement special code that is a lot faster when your query isn't BLAST-friendly. For example, a "conserved" domain may look like "neutral"-many irrelevant-cysteine-X-cysteine-many irrelevant-H-etc. (I just made this up, but it is based on many things I've seen in the literature).
You may have a hard time BLASTing for this, but you can grep for it with something like [ANCQGILMFPSTWYV].{50,60}C.C.{10,100}H If you want a real-life example, here are some from PROSITE using my PROSITE-to-Perl translation scheme (I hate illustrating with real things that may not be right): [LIVM][VIC].[^H]G[DENQTA].[GAC][^L].[LIVMFY]{4}.{2}G >rule|16|PEPDTIDE Prosite CNMP_BINDING_1 [EQ][^LNYH].[ATV][FY][^LDAM][^T]W[^PG]N >rule|18|PEPDTIDE Prosite ACTININ_1 From what I've seen, this is too slow for grep against many genes (or pre-translated peptides), but you can compile the query and target for much faster searching (similar to a transient database index). Even literal string matching can be slow without doing this: I have 500k empirically discovered (highly redundant, lots of junk) repeats that I can now label against 100 sequences of 60 kb each in "reasonable" time, which I could not do before. This works fine for the 600 or so miRNA sequences I finally figured out how to download from Sanger too :) >From: "Dr. Christoph Gille" >Reply-To: "General Forum at Bioinformatics.Org" > >To: "General Forum at Bioinformatics.Org" > >Subject: Re: [BiO BB] sequence analysis >Date: Wed, 24 Oct 2007 09:17:08 +0200 (CEST) > >You could use fgrep. Fgrep is faster than grep. > >Before, you should transform the database file such that >each sequence takes one line without blank and without >line breaks (using tr and sed) > >Database files are optimized for hole cards for >historical reasons. Lines are wrapped after at least after 72 >characters, preventing the use of fgrep.
From Sterten at aol.com Wed Oct 24 09:38:00 2007 From: Sterten at aol.com (Sterten at aol.com) Date: Wed, 24 Oct 2007 09:38:00 EDT Subject: [BiO BB] genbank orthography Message-ID: I once made a list for influenza, but it is not nearly complete: just a few hundred of the most common (mis)spellings, i.e. with respect to the genes. In an email dated 24.10.2007 10:41:32 Western European Standard Time, dan.bolser at gmail.com writes: On 24/10/2007, Sterten at aol.com wrote: > > names are not spelled uniformly, e.g. Viet Nam and Vietnam, > also many typos, this makes it very difficult to sort and analyse the entries > by computer. > I'm looking for a complete list of different spellings > (thousands of entries...) and the suggested standard so we can > correct/uniformify them automatically. Great idea. The PDB needs something similar also! From marchywka at hotmail.com Wed Oct 24 10:25:25 2007 From: marchywka at hotmail.com (Mike Marchywka) Date: Wed, 24 Oct 2007 10:25:25 -0400 Subject: [BiO BB] genbank orthography In-Reply-To: Message-ID: I deleted most of the posts on this thread, but as with the other thread, if you can reduce the database to text, there are plenty of good tools for one-time text processing; this is easy with sed and Perl. There are indexing scripts on the web that are only 10-20 lines long. It isn't hard to find typos in such a list, with or without a spelling dictionary.
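The indexing approach suggested above can be sketched in a few lines of Python instead of sed and Perl. The sketch pulls every /country qualifier out of a GenBank flat file and counts the distinct spellings. It then lists pairs of spellings that are suspiciously similar so that a person can review them. Several details are assumptions: the qualifier regex, the convention that an optional ":" separates the country from a region, and the 0.85 cut-off. `difflib.SequenceMatcher` is a generic string-similarity measure, not a spelling dictionary. Comparing all pairs is quadratic, which should be tolerable for a few thousand distinct values. The function names are hypothetical.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

# Assumed qualifier layout: /country="Country" or /country="Country: region".
COUNTRY_RE = re.compile(r'/country="([^"]+)"')


def country_index(flatfile_text):
    """Count each distinct country spelling found in /country qualifiers."""
    counts = Counter()
    for value in COUNTRY_RE.findall(flatfile_text):
        value = " ".join(value.split())  # undo line wrapping inside the value
        counts[value.split(":")[0].strip()] += 1  # keep the part before an optional region
    return counts


def suspicious_pairs(counts, threshold=0.85):
    """Yield (a, b, similarity) for distinct spellings that look like variants of each other."""
    for a, b in combinations(sorted(counts), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            yield a, b, round(ratio, 2)
```

The flagged pairs, together with their counts, are the raw material for a variant table. The more frequent spelling is often, though not always, the one to keep.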
>From: Sterten at aol.com >Reply-To: "General Forum at Bioinformatics.Org" > >To: bio_bulletin_board at bioinformatics.org >Subject: [BiO BB] genbank orthography >Date: Wed, 24 Oct 2007 03:26:09 EDT > > >names are not spelled uniformly, e.g. Viet Nam and Vietnam, >also many typos, this makes it very difficult to sort and analyse the >entries >by computer. >I'm looking for a complete list of different spellings >(thousands of entries...) and the suggested standard so we can >correct/uniformify them automatically. From kanzure at gmail.com Tue Oct 23 19:43:55 2007 From: kanzure at gmail.com (Bryan Bishop) Date: Tue, 23 Oct 2007 18:43:55 -0500 Subject: [BiO BB] sequence analysis In-Reply-To: References: <838860.9612.qm@web51504.mail.re2.yahoo.com> Message-ID: <200710231843.55373.kanzure@gmail.com> "Of the protein sequences from human." Martin, it sounds like what Nikhil wants to do would require much more memory than TextPad would be allowed to allocate. Right? - Bryan On Tuesday 23 October 2007 18:27, Martin Gollery wrote: > For an exact match you can simply use grep, or open the target in an > editor such as textpad and use the search function. > > Marty > > On 10/23/07, nikhil gadewal wrote: > > Hello all > > > > I am interested to find tetramer peptide matching 100% to the > > carboxyl terminal of the protein sequences from human. Is there any > > specific tool available to do so. > > Can BLAST or FASTA handle it by changing the parameters. > > > > Thankyou in advance.
> > > > Nikhil From Lambert at Chatham.edu Tue Oct 23 20:46:51 2007 From: Lambert at Chatham.edu (Lambert, Lisa) Date: Tue, 23 Oct 2007 20:46:51 -0400 Subject: [BiO BB] sequence analysis References: <838860.9612.qm@web51504.mail.re2.yahoo.com> Message-ID: <7980CF1A43C6564184A9A92D278F839E06AFBDDD@hickory.chatham.local> I would suggest using PatScan: http://www-unix.mcs.anl.gov/compbio/PatScan/. Unlike a plain text search, it lets you specify that the pattern must be at the end or the beginning of a sequence. Lisa -----Original Message----- From: bio_bulletin_board-bounces+lambert=chatham.edu at bioinformatics.org on behalf of nikhil gadewal Sent: Tue 10/23/2007 6:24 AM To: bio_bulletin_board at bioinformatics.org Cc: Subject: [BiO BB] sequence analysis[Scanned] Hello all I am interested to find tetramer peptide matching 100% to the carboxyl terminal of the protein sequences from human. Is there any specific tool available to do so. Can BLAST or FASTA handle it by changing the parameters. Thankyou in advance. Nikhil NIKHIL S. GADEWAL ACTREC, Tata Memorial Centre, Kharghar, Navi Mumbai, India Great minds discuss ideas; Average minds discuss events; Small minds discuss people. From Richard.Squires at UTSouthwestern.edu Wed Oct 24 11:10:48 2007 From: Richard.Squires at UTSouthwestern.edu (Richard Squires) Date: Wed, 24 Oct 2007 10:10:48 -0500 Subject: [BiO BB] genbank orthography Message-ID: <471F1A28020000E50003C47E@swnw124.swmed.edu> For correct spellings you can use a gazetteer. I am not aware of any resource of misspellings.
Burke

---
Burke Squires
BioHealthBase BRC Influenza Bioinformaticist
University of Texas Southwestern Medical Center at Dallas
richard.squires at utsouthwestern.edu
(214) 648-4952

>>> 10/24/07 2:26 AM >>>
names are not spelled uniformly, e.g. Viet Nam and Vietnam,
also many typos, this makes it very difficult to sort and analyse the entries
by computer.
I'm looking for a complete list of different spellings
(thousands of entries...) and the suggested standard so we can
correct/uniformify them automatically.

From ma11 at gen.cam.ac.uk Wed Oct 24 12:58:38 2007
From: ma11 at gen.cam.ac.uk (Michael Ashburner)
Date: Wed, 24 Oct 2007 18:58:38 +0200
Subject: [BiO BB] genbank orthography
In-Reply-To: <2c8757af0710240141l288546e0qf90cd627a746aa11@mail.gmail.com>
References: <2c8757af0710240141l288546e0qf90cd627a746aa11@mail.gmail.com>
Message-ID:

I agree it is a terrible mess. Note the new gaz.obo project. This is an
attempt to build an artefact in OBO format for geographical locations.
The current version has about 20,000 locations. We have a parse of about
45,000 from Genbank, but it will take some time to check them and get them
into this file.

http://obo.cvs.sourceforge.net/obo/obo/ontology/environmental/gaz.obo?view=log

Michael Ashburner

The file is available from the OBO CVS site.

On 24 Oct 2007, at 10:41, Dan Bolser wrote:

> On 24/10/2007, Sterten at aol.com wrote:
>>
>> names are not spelled uniformly, e.g. Viet Nam and Vietnam,
>> also many typos, this makes it very difficult to sort and analyse
>> the entries
>> by computer.
>> I'm looking for a complete list of different spellings
>> (thousands of entries...) and the suggested standard so we can
>> correct/uniformify them automatically.
>
> Great idea. The PDB needs something similar also!
>
>
> --
> hello

From postelv at gmail.com Thu Oct 25 06:33:38 2007
From: postelv at gmail.com (vladi postal)
Date: Thu, 25 Oct 2007 12:33:38 +0200
Subject: [BiO BB] readseq(converting genbank to gcg format)
Message-ID: <81e4d3400710250333k75b9823fv80480ba8280a23e0@mail.gmail.com>

Hi,
I use the readseq program in Unix to convert GenBank files to GCG format.
The problem is that for some files the program doesn't work. For example,
gbpat2.seq, gbpat3.seq and gbpat8.seq don't work, but for gbpat1.seq,
gbpat4.seq, gbpat6.seq and gbpat7.seq it works fine.
Does anyone have the same problem? How can I solve it?
Any help will be appreciated.

vladi,

From jeff at bioinformatics.org Fri Oct 26 18:00:05 2007
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Fri, 26 Oct 2007 18:00:05 -0400
Subject: [BiO BB] Course: Mitochondriomics
Message-ID: <47226365.8030309@bioinformatics.org>

Mitochondriomics

Online at the Bioinformatics Organization
In collaboration with Roskilde University, Denmark

November 5-9, 2007

CONTENTS:

   1. BACKGROUND
   2. OBJECTIVE
   3. INSTRUCTORS
   4. COURSE OUTLINE
   5. ADDITIONAL INFORMATION

----------------------------------------
1. BACKGROUND
----------------------------------------

Mitochondria are semiautonomous organelles, presumed to be the evolutionary
product of a symbiosis between a eukaryote and a prokaryote. The organelle is
present in almost all eukaryotic cells, at 10^3-10^4 copies per cell. The main
functions of mitochondria are the production of ATP by oxidative
phosphorylation and involvement in apoptosis.
The organelles contain almost exclusively maternally inherited mtDNA, and
they have specific systems for transcription, translation and replication of
mtDNA. Mitochondrial dysfunction has been correlated with mitochondrial
diseases whose clinical pathologies are believed to include infertility,
diabetes, blindness, deafness, stroke, migraine, and heart, kidney and liver
diseases. Recently, cancer was added to this list when investigations of
human cancer cells from breast, bladder, neck and lung revealed a high
occurrence of mutations in mtDNA.

With the emerging understanding of the role of mitochondria in a vast array
of pathologies, research on mitochondria and mitochondrial dysfunction has in
the last decade yielded a huge amount of data in the form of publications and
databases. Nevertheless, the field of mitochondrial research is still far
from exhausted, with many unknown factors yet to be discovered.

----------------------------------------
2. OBJECTIVE
----------------------------------------

The purpose of this course is to introduce the student to the various
databases and wet-lab methods available. Furthermore, through selected
articles, the course will give an understanding of the pitfalls and
limitations of these databases and methods.

----------------------------------------
3. INSTRUCTORS
----------------------------------------

   * Claus Desler (cdesler at ruc.dk)
   * Prashanth Suravajhala (prash at ruc.dk)

----------------------------------------
4. COURSE OUTLINE
----------------------------------------

   * Day 1: Introduction to mitochondria and their pathways, genetics;
     proteomics of mitochondria
   * Day 2: Various assays used and advances in mitochondrial research
   * Day 3: Tools and databases used in mitochondrial research; exercises
   * Day 4: Exercises, report; review of literature continues
   * Day 5: Summary, questions and answers; evaluation

----------------------------------------
5. ADDITIONAL INFORMATION
----------------------------------------

Please visit:

   * http://wiki.bioinformatics.org/BI221A_Mitochondriomics

From harry.mangalam at uci.edu Fri Oct 26 20:07:11 2007
From: harry.mangalam at uci.edu (Harry Mangalam)
Date: Fri, 26 Oct 2007 17:07:11 -0700
Subject: [BiO BB] sequence analysis
In-Reply-To: <37573.141.42.56.114.1193210228.squirrel@webmail.charite.de>
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
    <37573.141.42.56.114.1193210228.squirrel@webmail.charite.de>
Message-ID: <200710261707.11239.harry.mangalam@uci.edu>

And not to start a 'grep' war, but nrgrep and agrep are also faster than
either fgrep or grep and allow searching with errors. nrgrep decays more
smoothly with more complex patterns, but agrep has more implementations.
Both allow searching across arbitrary boundaries. Both are free.

hjm

On Wednesday 24 October 2007, Dr. Christoph Gille wrote:
> You could use fgrep. Fgrep is faster than grep.
>
> Before that, you should transform the database file such that
> each sequence takes one line, without blanks and without
> line breaks (using tr and sed).
>
> Database files are optimized for punch cards for
> historical reasons. Lines are wrapped after at most 72
> characters, preventing the use of fgrep.

--
Harry Mangalam - Research Computing, NACS, E2148, Engineering Gateway, UC Irvine 92697
949 824 0084(o), 949 285 4487(c)
harry.mangalam at uci.edu

From me.lixue at gmail.com Sat Oct 27 00:14:06 2007
From: me.lixue at gmail.com (Xue Li)
Date: Fri, 26 Oct 2007 23:14:06 -0500
Subject: [BiO BB] need help on HMM(Hidden Markov Model) package
Message-ID: <62ed16460710262114u2fe5fb98ie8efcf03daef7947@mail.gmail.com>

Hello all,

Does anyone know of a good HMM (Hidden Markov Model) package?
It would be perfect if it is written in Perl, or can be called from Perl.

Thanks a lot!

--
Xue, Li
Bioinformatics and Computational Biology program @ ISU
Ames, IA 50010
515-450-7183

From landman at scalableinformatics.com Sat Oct 27 00:20:14 2007
From: landman at scalableinformatics.com (Joe Landman)
Date: Sat, 27 Oct 2007 00:20:14 -0400
Subject: [BiO BB] need help on HMM(Hidden Markov Model) package
In-Reply-To: <62ed16460710262114u2fe5fb98ie8efcf03daef7947@mail.gmail.com>
References: <62ed16460710262114u2fe5fb98ie8efcf03daef7947@mail.gmail.com>
Message-ID: <4722BC7E.8010207@scalableinformatics.com>

Xue Li wrote:
> Hello all,
>
> Does anyone know of a good HMM (Hidden Markov Model) package? It would be
> perfect if it is written in Perl, or can be called from Perl.

HMMer (http://hmmer.janelia.org/) is one of the standard tools, and it can
be used from within BioPerl (http://www.bioperl.org/wiki/Main_Page) fairly
easily (see
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SearchIO/hmmer.html
).

> Thanks a lot!

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615
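Editorial sketch (not taken from any of the posts above): the approach discussed in the "sequence analysis" thread, joining each wrapped sequence onto one line and then checking only its C-terminal end, can be illustrated in a few lines of Python. This is a minimal sketch assuming protein sequences in a FASTA file; it is not PatScan, grep or agrep, and the function names `read_fasta` and `c_terminal_hits` are hypothetical. The optional `max_mismatches` parameter loosely imitates the "searching with errors" that agrep offers, but counts substitutions only.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines into one string."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        yield header, "".join(chunks)


def c_terminal_hits(records: Iterable[Tuple[str, str]], query: str,
                    max_mismatches: int = 0) -> List[str]:
    """Return headers of sequences whose last len(query) residues match query.

    max_mismatches counts substitutions only (no insertions or deletions).
    """
    if not query:
        raise ValueError("query must be a non-empty peptide")
    query = query.upper()
    hits = []
    for header, seq in records:
        # Drop a trailing '*' stop symbol, if present, then take the C-terminal window.
        tail = seq.upper().rstrip("*")[-len(query):]
        if len(tail) < len(query):
            continue  # sequence is shorter than the query
        mismatches = sum(1 for a, b in zip(tail, query) if a != b)
        if mismatches <= max_mismatches:
            hits.append(header)
    return hits


# Hypothetical usage ("human_proteins.fasta" is a placeholder file name):
# with open("human_proteins.fasta") as handle:
#     print(c_terminal_hits(read_fasta(handle), "KDEL"))
```

Because records are read one at a time and only the last few residues are compared, memory use is bounded by the longest single sequence rather than the whole file, which sidesteps the editor-memory concern raised in the thread.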
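Editorial sketch for the "genbank orthography" thread (not from any of the posts): once a table of variant spellings and preferred forms exists, for example one derived from a gazetteer or from gaz.obo, automatic correction can be a normalized dictionary lookup with a fuzzy fallback for typos. This minimal Python sketch assumes such a table; the `PREFERRED` entries and the helper names `normalize_key` and `standardize` are hypothetical illustrations, not the contents or API of gaz.obo or any gazetteer, and `difflib` similarity is a stand-in for whatever matching a curated resource would support.

```python
import difflib
import re
from typing import Optional

# Hypothetical lookup table: normalized key -> preferred spelling.
# A real table would be generated from a curated gazetteer; these entries
# and the chosen preferred forms are for illustration only.
PREFERRED = {
    "vietnam": "Viet Nam",
    "unitedkingdom": "United Kingdom",
    "uk": "United Kingdom",
    "unitedstates": "USA",
    "unitedstatesofamerica": "USA",
    "usa": "USA",
}


def normalize_key(name: str) -> str:
    """Lower-case and keep only ASCII letters, so 'Viet Nam' and 'Vietnam' share a key.

    Non-ASCII letters are dropped as well, a known limitation of this sketch.
    """
    return re.sub(r"[^a-z]", "", name.lower())


def standardize(name: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the preferred spelling for name, or None if there is no confident match."""
    key = normalize_key(name)
    if key in PREFERRED:
        return PREFERRED[key]
    # Fuzzy fallback for simple typos (e.g. transposed letters); this uses
    # difflib's similarity ratio, not any matching rule from a gazetteer.
    close = difflib.get_close_matches(key, list(PREFERRED), n=1, cutoff=cutoff)
    return PREFERRED[close[0]] if close else None
```

Names that match nothing above the cutoff return `None`, so they can be queued for manual curation rather than silently "corrected"; the cutoff of 0.8 is an arbitrary starting point.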