From A.Bossers at id.dlo.nl Tue Apr 1 06:29:37 2003
From: A.Bossers at id.dlo.nl (Bossers, A.)
Date: Tue, 01 Apr 2003 13:29:37 +0200
Subject: [BiO BB] Clustering EST sequences
Message-ID:

Dear All,

I have a very basic problem and wonder how others have solved it.

I want to make a unigene collection from a large EST database. We have chromat files in ABI format, and I use Linux on the Intel platform.

I have phred and phrap running, but since phrap was originally designed for genomic sequences we get lots of misassemblies on poly-A or poly-T stretches.

Therefore I installed the TIGR TGICL package, which is designed for EST databases and also runs very well on multi-node machines.

However, it uses multi-FASTA files (and corresponding, optional, quality files) as input. I wanted to use the phred package to generate the required FASTA and qual files. This runs fine, but in the FASTA file the >name line carries additional info separated by spaces. These files are not accepted by TGICL.

Is there an easy Unix (Linux) utility to convert these multi-FASTA files and quality FASTA files into simple >name {CRT} seq files so they can be used as input for TGICL? Or is a conversion utility available to convert/extract phred's phd files into fasta-seq and fasta-qual?

Any help would be appreciated,

Alex

From shalpine at ecomplexsystems.com Tue Apr 1 15:13:37 2003
From: shalpine at ecomplexsystems.com (Scott A. Halpine)
Date: Tue, 1 Apr 2003 15:13:37 -0500
Subject: [BiO BB] Clustering EST sequences
References:
Message-ID: <002601c2f88b$2e56d3a0$1308a8c0@scott01>

I don't know of any conversion utilities, but you can certainly write a quick conversion in Perl. I'm not familiar with the specific layouts, but it sounds like you simply need to properly truncate each row of data. There shouldn't be a problem if your field partition is white space (or any other specific delimiter, for that matter).

If you don't get a better offer, send me a small data file of what you need converted, the field delimiter used, and an example of what it needs to be converted into. I should be able to write you a Perl routine and send it back to you.

Scott A. Halpine
Ecologic Complex Systems, LLC
4640 Forbes Blvd, Suite 200
Lanham, MD 20706-4885
Phone: 301-918-3283
Fax: 301-429-8762
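
The truncation discussed in this thread (keep only the name on each >name line, leave sequence and quality lines alone) can be sketched as follows. This is a minimal Python sketch rather than the Perl proposed above; the function name strip_fasta_descriptions is hypothetical, and it assumes the sequence name is the first whitespace-delimited token after the ">".

```python
def strip_fasta_descriptions(lines):
    """Yield FASTA (or qual) lines with each header cut at its first whitespace."""
    for line in lines:
        if line.startswith(">"):
            # Keep only the first token after ">"; drop the description.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            yield ">" + name + "\n"
        else:
            # Sequence or quality-value lines pass through unchanged.
            yield line


# Small demonstration on an in-memory record (made-up data).
records = [">seqname1 some text with spaces\n", "ACGTAGACTGACT\n"]
print("".join(strip_fasta_descriptions(records)), end="")
```

Because the function only touches lines starting with ">", the same sketch should work for both the sequence file and the matching quality file, for example by passing it an open file object and writing the result with writelines().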

From mgollery at unr.edu Tue Apr 1 17:58:43 2003
From: mgollery at unr.edu (Martin Gollery)
Date: Tue, 1 Apr 2003 14:58:43 -0800
Subject: [BiO BB] Clustering EST sequences
In-Reply-To: <002601c2f88b$2e56d3a0$1308a8c0@scott01>
References: <002601c2f88b$2e56d3a0$1308a8c0@scott01>
Message-ID: <1049237923.3e8a19a31ee27@webmail.unr.edu>

This is very strange: spaces are allowed in FASTA, at least in the description section. In the first part you may need to replace the spaces with | symbols, as follows.

Change

>gi 29125973 emb AJ550374.1 USO550374 Uncultured soil bacterium partial nosZ gene for putative nitrous oxide reductase, clone T8C23
GGCTGGGG...

to

>gi|29125973|emb|AJ550374.1|USO550374 Uncultured soil bacterium partial nosZ gene for putative nitrous oxide reductase, clone T8C23
GGCTGGGG...

Martin Gollery
Associate Director of Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS200
(775)784-6048
-------------------------------------------------
This mail sent through https://webmail.unr.edu

From gianluca.dellavedova at unimib.it Tue Apr 1 11:57:25 2003
From: gianluca.dellavedova at unimib.it (Gianluca Della Vedova)
Date: 01 Apr 2003 18:57:25 +0200
Subject: [BiO BB] JCST Special Issue on Bioinformatics
Message-ID: <1049216244.1174.21.camel@localhost>

****** Journal of Computer Science and Technology ******
****** Special Issue on Bioinformatics ******
****** Call for Papers ******

The Journal of Computer Science and Technology is inviting papers for a special issue devoted to Bioinformatics, scheduled to appear in late 2003.

The aim: The special issue aims at giving an up-to-date snapshot of the current trends of research in Bioinformatics.
Papers reporting on original research in all areas of bioinformatics and computational molecular biology are preferred, although surveys of particular relevance are also good candidates.

Topics: All aspects of bioinformatics, both theoretical and practical. A non-exclusive list of topics of interest includes:
Genomics,
Recognition of genes and regulatory elements,
Molecular evolution,
Phylogenetic inference,
Protein structure,
Gene expression,
Gene networks,
Genetic variation (SNPs, haplotypes, etc.),
Metabolic Pathways,
Combinatorial libraries and drug design,
Computational proteomics,
Data management methods and systems,
Sequence analysis, motifs, and pattern matching,
Comparative genomics and annotation.

Paper submission: Authors are invited to send one copy of a full-length paper to the email address jcst at bioinformatics.bio.disco.unimib.it. Electronic submissions via email, in the form of a PostScript or PDF file, are encouraged; alternatively, sending a hardcopy to the contact person is acceptable. Successful printing or reception of the paper will be acknowledged via email. Submissions must be received by May 26, 2003. Authors will be notified of acceptance by July 14, 2003.

Final version: The usual authors' instructions of Journal of Computer Science and Technology apply (available at http://www.ict.ac.cn/jcst/efile3.html).

Special Issue Editors:

Paola Bonizzoni
Dipartimento di Informatica, Sistemistica e Comunicazione
Università degli Studi di Milano-Bicocca
bonizzoni at disco.unimib.it

Gianluca Della Vedova
Dipartimento di Statistica
Università degli Studi di Milano-Bicocca
gianluca.dellavedova at unimib.it

Tao Jiang
Department of Computer Science
University of California at Riverside
jiang at cs.ucr.edu

Contact Person:
Paola Bonizzoni
DISCo
Università degli Studi di Milano-Bicocca
via Bicocca degli Arcimboldi 8
20126 - Milano (Italy)
bonizzoni at disco.unimib.it
tel: ++39-0264487814
fax: ++39-0264487839

Important Dates:
Submission deadline: May 26, 2003
Notification of acceptance: July 14, 2003
Final version: August 4, 2003

At the URL http://www.bio.disco.unimib.it/jcst you can download a printable version of the call for papers.

--
Gianluca Della Vedova
Dip. Statistica, Univ. Milano-Bicocca
http://www.statistica.unimib.it/utenti/dellavedova/

From A.Bossers at id.dlo.nl Wed Apr 2 01:31:52 2003
From: A.Bossers at id.dlo.nl (Bossers, A.)
Date: Wed, 02 Apr 2003 08:31:52 +0200
Subject: [BiO BB] FW/Re: Fasta convertion in large EST assemblies
Message-ID:

Dear all,

thanks for the quick replies and help with the FASTA conversion problem.

I had already started fiddling around in Perl to convert the FASTA files into files acceptable to TGICL for EST assembly. But Eitan provided the simplest solution: his one-line Perl 'script' did exactly what I needed. BIG THANKS. The script just gets rid of everything after the filename on the >name line (as long as the filename contains no spaces) and preserves all the sequence or quality info that follows it. His solution is below.

I still don't get why TGICL doesn't accept files in allowed FASTA format, but I am no longer bothered by it. My EST assembly is one step further.

Thanks again to all the people who sent me Perl solutions!

Alex

-----Original Message-----
From: Eitan Rubin [mailto:Eitan.Rubin at weizmann.ac.il]
Sent: Tuesday, 1 April 2003 20:28
To: A.Bossers at ID.DLO.NL
Subject: Fasta convertion

Hi,

If I am not mistaken, your question is "how do I convert format A below to format B". If this is indeed what you need, the following should do the trick:

perl -pe 's/^>(\S+).*/>$1/;' old_format_file > new_format_file

Format A:
==========
>seqname1 some text with spaces
ACGTAGACTGACT..
>seqname2 some other text etc
ACGATCGATAGCT

Format B:
========
>seqname1
ACGTAGACTGACT..
>seqname2
ACGATCGATAGCT

Eitan
------------------------------------------------------------------------
Eitan Rubin, PhD
Head of Bioinformatics and Biological Computing
Dept. Biological Services
Weizmann Institute of Science
Tel: +972-8-9343456 Fax: +972-8-9346006

From jeff at bioinformatics.org Wed Apr 2 10:27:44 2003
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Wed, 02 Apr 2003 10:27:44 -0500
Subject: [BiO BB] BioDarwin mailing list
Message-ID: <3E8B0170.7060300@bioinformatics.org>

Greetings.

With the help of Apple Computer, which is an organizational member of Bioinformatics.Org, we have set up a forum for open source bioinformatic developers to discuss development on the Mac OSX platform. For starters, there is a mailing list on which Apple Computer engineers will be subscribed and available for help.

To subscribe to the mailing list yourself, please go to the following URL:

https://bioinformatics.org/lists/biodarwin

"BioDarwin" will likely grow into an "open lab" (special interest group) at Bioinformatics.Org and include many resources for the developer and Mac user.

As a reminder of the resources available, Bioinformatics.Org has a G4 Server which can be used by Bioinformatics.Org members for software development and other projects related to bioinformatics.

There's also Apple's treasure trove of developer resources, documentation and help:

http://developer.apple.com/

Cheers.
Jeff
--
J.W. Bizzaro jeff at bioinformatics.org
President, Bioinformatics.Org http://bioinformatics.org/~jeff
"As we enjoy great advantages from the inventions of others, we should be glad of an opportunity to serve others by any invention of ours; and this we should do freely and generously."
-- Benjamin Franklin --

From crasmen at magic.fr Wed Apr 2 17:25:17 2003
From: crasmen at magic.fr (Corentin Cras-Méneur)
Date: Wed, 2 Apr 2003 16:25:17 -0600
Subject: [BiO BB] BioDarwin mailing list
In-Reply-To: <3E8B0170.7060300@bioinformatics.org>
References: <3E8B0170.7060300@bioinformatics.org>
Message-ID:

At 10:27 -0500 on 2/04/03, you wrote:
>Greetings.
>
>With the help of Apple Computer, which is an organizational member
>of Bioinformatics.Org, we have set up a forum for open source
>bioinformatic developers to discuss development on the Mac OSX
>platform. For starters, there is a mailing list on which Apple
>Computer engineers will be subscribed and available for help.
>
>To subscribe to the mailing list yourself, please go to the following URL:
>https://bioinformatics.org/lists/biodarwin

Thanks a lot for the information. I thought people on the Apple Scitech list could be interested as well, so I sent a copy of your announcement there. People in the MicroArray Yahoo Group could be interested too (but I haven't forwarded the message there yet).

Sincerely,
Corentin Cras-Méneur

From Ttlusa at aol.com Fri Apr 4 08:58:09 2003
From: Ttlusa at aol.com (Ttlusa at aol.com)
Date: Fri, 04 Apr 2003 08:58:09 -0500
Subject: [BiO BB] RE: Confirmation
Message-ID: <1B867DFC.1701E96D.00055F61@aol.com>

Gentlemen,

I would like to confirm my subscription to the mailing list.
Regards,
ttlclinical

From derek at biotechrecruiter.org Fri Apr 4 09:03:09 2003
From: derek at biotechrecruiter.org (Derek Pyper)
Date: Fri, 4 Apr 2003 06:03:09 -0800
Subject: [BiO BB] Dynamics Engineer Position
Message-ID:

Hi All,

I am working to fill the following position. If you know of anyone who may be qualified, or where I might look to find people with the skill set listed, please send me an e-mail.

Dynamics Engineer

Company, a growing, privately held company headquartered in California, is the commercial leader in predictive biosimulation for in silico drug discovery and development. Employing its patented technology, Company develops dynamic, large-scale, mathematical models of human disease, called "confidential" platforms, and utilizes them in all stages of drug discovery and development. We currently engage in in silico research in the areas of asthma, rheumatoid arthritis, obesity, and diabetes. Company collaborates with pharmaceutical and biotech companies to develop effective new treatments for disease and dramatically reduce the time and cost needed to develop them.

Company is seeking engineers and applied mathematicians for positions on our In Silico Research and Development staff. The Dynamics Engineer works with scientists and other engineers to perform in silico research and development. He or she provides mathematical modeling expertise, helps develop complex mathematical models that meet the needs of pharmaceutical researchers, uses these models in research projects, identifies novel experiments that can be done to address major knowledge gaps, writes project proposals and reports, and makes technical presentations of his or her work.

The ideal candidate would have:
* A Ph.D. in Chemical, Mechanical, or Aerospace Engineering, Applied Mathematics, Physics, or a closely related field, with a strong background in nonlinear dynamics and control theory.
* The ability to translate complex, real-world problems into compelling mathematical models.
* Experience with collaborative, multidisciplinary research projects.
* A strong interest in biology.
* Very strong communication skills, both oral and written, including technical writing and presentation.
* The ability to work independently in a diverse, cross-functional team environment.

Derek Pyper
Biotech Recruiters International
2640 Castro Way
Sacramento, Ca
Work 916-455-7091
Fax 916-455-7082
www.biotech-recruiters.com
Derek at biotechrecruiter.org

CONFIDENTIALITY STATEMENT: This electronic message contains privileged and confidential information from Biotech Recruiters International. This information is intended solely for the use of the individual(s) or entity(ies) named above. If you are not the intended recipient, be aware that any disclosure, copying, distribution, or use of the contents of this message is prohibited. If you have received this email in error, please notify us immediately by telephone at 916-455-7091 or by email reply. Thank you.

From Ttlusa at aol.com Fri Apr 4 11:48:37 2003
From: Ttlusa at aol.com (Ttlusa at aol.com)
Date: Fri, 04 Apr 2003 11:48:37 -0500
Subject: [BiO BB] RE: Confirmation
Message-ID: <63F99B6E.5D91DEF7.00055F61@aol.com>

Gentlemen,

I would like to confirm my subscription to the mailing list.
Regards,
ttlclinical
_______________________________________________
BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

From jeff at bioinformatics.org Fri Apr 4 12:02:35 2003
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Fri, 04 Apr 2003 12:02:35 -0500
Subject: [BiO BB] RE: Confirmation
References: <63F99B6E.5D91DEF7.00055F61@aol.com>
Message-ID: <3E8DBAAB.5020207@bioinformatics.org>

If you're able to post and you're reading this message, then you are subscribed :-)

Jeff

Ttlusa at aol.com wrote:
> Gentlemen,
>
> I would like to confirm my subscription to the mailing list.
> Regards,
> ttlclinical

--
J.W. Bizzaro jeff at bioinformatics.org
President, Bioinformatics.Org http://bioinformatics.org/~jeff
"As we enjoy great advantages from the inventions of others, we should be glad of an opportunity to serve others by any invention of ours; and this we should do freely and generously."
-- Benjamin Franklin --

From KarenD721 at aol.com Fri Apr 4 14:17:56 2003
From: KarenD721 at aol.com (KarenD721 at aol.com)
Date: Fri, 4 Apr 2003 14:17:56 EST
Subject: [BiO BB] (no subject)
Message-ID: <3a.36fda4e7.2bbf3464@aol.com>

From chrisg at sbs.bangor.ac.uk Tue Apr 8 07:13:29 2003
From: chrisg at sbs.bangor.ac.uk (Chris Gliddon)
Date: Tue, 08 Apr 2003 12:13:29 +0100
Subject: [BiO BB] Mascot alternatives
Message-ID: <3E92AED9.2080501@sbs.bangor.ac.uk>

Hi All,

I'm looking for alternatives to Mascot from Matrix Science, which uses mass spectrometry data to identify proteins from primary sequence databases. I would prefer open source software running on Linux.

Thanks for your help.

Chris
--
____________________________________________________
Dr. Chris Gliddon
School of Biological Sciences
University of Wales, Bangor
LL57 2UW United Kingdom

Tel: +44 (0)1248 382533
FAX: +44 (0)1248 382569
Mobile: +44 (0)7941 060423
____________________________________________________

From marchal at mediagen.fr Tue Apr 8 12:18:46 2003
From: marchal at mediagen.fr (Ingrid Marchal)
Date: Tue, 08 Apr 2003 18:18:46 +0200
Subject: [BiO BB] Re: Mascot alternatives
In-Reply-To: <20030408160103.8917BD2830@www.bioinformatics.org>
Message-ID: <5.2.0.9.0.20030408180926.00a92478@pop.mediagen.fr>

Hi,

emowse (http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/emowse.html) is a program from the EMBOSS package that does the same job. It is free and runs from the command line under Linux. I am not sure, but I think that Mascot was originally derived from mowse, which has now been replaced by emowse.

Regards,
Ingrid

Ingrid Marchal, Ph.D.
MEDIAGEN - bioinformatic solutions
www.mediagen.fr

From sasa at muh.biglobe.ne.jp Wed Apr 9 08:22:00 2003
From: sasa at muh.biglobe.ne.jp (Takeshi Sasayama)
Date: Wed, 9 Apr 2003 21:22:00 +0900
Subject: [BiO BB] Audio archives
Message-ID:

Hi all,

I am looking for a web site that has audio archives of biological or medical presentations or news in English. I hope the archives are MP3 files that can be downloaded. Does anyone know a good site?

Takeshi Sasayama

From mkaut at bu.edu Wed Apr 9 11:46:23 2003
From: mkaut at bu.edu (Maurya Kaut)
Date: Wed, 09 Apr 2003 11:46:23 -0400
Subject: [BiO BB] multinomial distribution and Ig light chains
Message-ID: <3E94404F.5020507@bu.edu>

Hello,

I'm attempting to replicate the analysis of immunoglobulin genes as in this article:

Lossos, et al. The Inference of Antigen Selection on Ig Genes.
The Journal of Immunology 165(9): 5122-5126 (2000)

The article is available here:
http://www.jimmunol.org/cgi/content/full/165/9/5122

The Java applet mentioned in the paper is here:
http://www-stat.stanford.edu/immunoglobin/

The contact information in the paper no longer appears to be valid. Basically, I'd like to understand the multinomial tail probability as they've applied it to heavy chains, so that I might apply it to light chains. The information I've gathered on statistics thus far only explains bits and pieces of the equations, and I'm having trouble putting it all together. I've written a Perl script that calculates expected replacement frequency for Ig light chain germline genes with some success, but the numbers I get for P values are two or three orders of magnitude off.

Firstly, I would just like to know if there is anyone who is familiar with this type of statistical work. Also, I've heard of the "S" engine, and its cousin "R", but I'm not quite sure if they are applicable here. Has anyone used them in conjunction with Perl/CGI? Any advice is appreciated.

Maurya Kaut
--
<><><><><><><><><><><><><><><><><><>
Maurya G. Kaut
Gerry Amyloid Research Laboratory
Boston University School of Medicine
715 Albany St. K-508
Boston, MA 02118
(617) 638-5389 - Voice
<><><><><><><><><><><><><><><><><><>

From cjh02 at liverpool.ac.uk Wed Apr 9 09:26:50 2003
From: cjh02 at liverpool.ac.uk (Chris Houseman)
Date: Wed, 09 Apr 2003 14:26:50 +0100
Subject: [BiO BB] Audio archives
In-Reply-To:
Message-ID: <200304091326.h39DQpX14917@webmail2.liv.ac.uk>

I don't know of any MP3s, but http://www.s-star.org has ~600 MB worth of video bioinformatics lectures.

regards,
CJH

--------------- reply ----------------
> Hi all,
>
> I am looking for a web site which has audio archives of
> biological or medical presentations or news in English. I
> hope the archives should be mp3 files and can be
> downloaded. Does anyone know a good site?
>
> Takeshi Sasayama

Chris Houseman
Research Fellow
University of Liverpool
University Department of Pathology
Duncan Building
Daulby Street
Liverpool
L69 3GA
0151 7064965

From lhewei at TripathImaging.com Wed Apr 9 12:04:59 2003
From: lhewei at TripathImaging.com (Li, Hewei)
Date: Wed, 9 Apr 2003 12:04:59 -0400
Subject: [BiO BB] Tools to search AG rich fragments around a long DNA sequence?
Message-ID: <235AEBD0012949478E0E2FB3449D7A7D014FF453@tpe-exch.tripathimaging.com>

Dear All,

I am looking for DNA sequence analysis tools that can locate purine-rich (i.e., A- or G-rich) fragments of around 20 bases in length within a given DNA sequence. Could someone help me out? Many thanks!

Hewei

From rossini at blindglobe.net Wed Apr 9 12:25:20 2003
From: rossini at blindglobe.net (A.J. Rossini)
Date: Wed, 09 Apr 2003 09:25:20 -0700
Subject: [BiO BB] multinomial distribution and Ig light chains
In-Reply-To: <3E94404F.5020507@bu.edu> (Maurya Kaut's message of "Wed, 09 Apr 2003 11:46:23 -0400")
References: <3E94404F.5020507@bu.edu>
Message-ID: <87of3fvdsf.fsf@jeeves.blindglobe.net>

Maurya Kaut writes:

> I'm attempting to replicate the analysis of immunoglobulin genes as in
> this article:
> Lossos, et al. The Inference of Antigen Selection on Ig Genes. The
> Journal of Immunology 165(9): 5122-5126 (2000)
> The article is available here:
> http://www.jimmunol.org/cgi/content/full/165/9/5122
> The Java applet mentioned in the paper is here:
> http://www-stat.stanford.edu/immunoglobin/
> The contact information in the paper no longer appears to be
> valid. Basically, I'd like to understand the multinomial tail

Both Rob and Naras are still at Stanford Stat.

> having trouble putting it all together. I've written a Perl script
> that calculates expected replacement frequency for Ig light chain
> germline genes with some success, but the numbers I get for P values
> are two or three orders of magnitude off.

Sounds like an implementation error -- I doubt that the distribution is pathological enough to cause round-off problems of that order of magnitude.

> Firstly, I would just like to know if there is anyone who is
> familiar with this type of statistical work. Also, I've heard of
> the "S" engine, and its cousin "R", but I'm not quite sure if they
> are applicable here. Has anyone used them in conjunction with
> Perl/CGI? Any advice is appreciated.

The S statistical programming language, as implemented by S (not generally available), S-PLUS (commercially available) and R (open-source), is a full-featured programming language, not unlike a functional, white-space-agnostic version of Python (in a sense).

I would use R for data analysis (and generally use only R), and it will make programming this problem much simpler (assuming that you both know how to program and have an intuition for statistical data analysis).

best,
-tony

--
A.J. Rossini
rossini at u.washington.edu http://software.biostat.washington.edu/
Biostatistics, U Washington and Fred Hutchinson Cancer Research Center
FHCRC:Tu: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW : Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX

CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be confidential and privileged. If you received this message in error, please destroy it and notify the sender. Thank you.

From jrg at world.std.com Wed Apr 9 12:59:18 2003
From: jrg at world.std.com (James Graham)
Date: Wed, 9 Apr 2003 12:59:18 -0400
Subject: [BiO BB] Tools to search AG rich fragments around a long DNA sequence?
In-Reply-To: <235AEBD0012949478E0E2FB3449D7A7D014FF453@tpe-exch.tripathimaging.com>
Message-ID: <99FF337A-6AAC-11D7-8169-000A956A62E4@world.std.com>

On Wednesday, April 9, 2003, at 12:04 PM, Li, Hewei wrote:

> I am looking for DNA sequence analysis tools which could locate DNA
> fragments around 20 bases in length with rich purines (i.e., A or G)
> over a
> given DNA sequence. Could someone help me out? Many thanks!

Can you define purine richness in terms of a percentage? It seems that writing such a tool as a Perl script would be very easy and very quick. Do you have access to a Unix/Linux machine, or are you looking more for an app with a GUI?

james

From cmdobson at ucalgary.ca Wed Apr 9 16:39:01 2003
From: cmdobson at ucalgary.ca (C. Melissa Dobson)
Date: Wed, 09 Apr 2003 14:39:01 -0600
Subject: [BiO BB] Transcription factor
Message-ID: <3E9484E5.4090802@ucalgary.ca>

Hello,

Can anyone suggest a good promoter binding site locator program for mouse genomic DNA sequences? I am aware of Transfac; are there any others that can be recommended?

Melissa Dobson

From mgollery at unr.edu Wed Apr 9 17:07:07 2003
From: mgollery at unr.edu (Martin Gollery)
Date: Wed, 9 Apr 2003 14:07:07 -0700
Subject: [BiO BB] Transcription factor
In-Reply-To: <3E9484E5.4090802@ucalgary.ca>
References: <3E9484E5.4090802@ucalgary.ca>
Message-ID: <1049922427.3e948b7b09b23@webmail.unr.edu>

Hi Melissa,

Try the Genomatix suite: the PromoterInspector program will predict mammalian promoter regions, and MatInspector looks for transcription factor binding sites.

Marty

Quoting "C. Melissa Dobson" :

>
> Hello,
> Can anyone suggest a good promoter binding site locator program for
> mouse genomic DNA sequences? I am aware of Transfac are there any
> others that can be recommended?
>
> Melissa Dobson

Martin Gollery
Associate Director of Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS200
(775)784-6048
-------------------------------------------------
This mail sent through https://webmail.unr.edu

From mkgovindis at yahoo.com Wed Apr 9 22:43:47 2003
From: mkgovindis at yahoo.com (govind mk)
Date: Wed, 9 Apr 2003 19:43:47 -0700 (PDT)
Subject: [BiO BB] Re: Extracting location from a genbank flatfile
In-Reply-To: <1049922427.3e948b7b09b23@webmail.unr.edu>
Message-ID: <20030410024347.78989.qmail@web40109.mail.yahoo.com>

Hi all,

I am stuck with a rather simple problem. I would like to extract the locations of specific features (e.g. CDS) from a GenBank flat file.

I tried using BioPerl but couldn't manage to get the exact locations for complicated location representations such as complement(join(295405..295443,295492..295529)), as the BioPerl modules return the minimum start and maximum stop.

Any suggestions?

-Govind

From daniel.ducat at metalife.de Thu Apr 10 03:44:38 2003
From: daniel.ducat at metalife.de (Daniel Ducat)
Date: Thu, 10 Apr 2003 10:44:38 +0300
Subject: [BiO BB] Re: Extracting location from a genbank flatfile
In-Reply-To: <20030410024347.78989.qmail@web40109.mail.yahoo.com>
Message-ID: <00d601c2ff35$0aab1910$3c01a8c0@metalife.bg>

Hello Govind,

We had the same problem with GenBank locations. What we did here was to write a C++ program that parses a GenBank entry file into flat files, ready to be imported into a relational DB. In the database (MSSQL) we wrote a stored procedure that parses every location, no matter how complicated it is, and breaks it into a set of smaller, simple ones. Note that a location can have a link to another entry, for example (100..130, A01234.12..15). It looks complicated, but in this way we get all we need.

For a simpler solution, write a program or Perl script (or bash script) that extracts the location of the feature and parses it.
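
A minimal sketch of such a location parser follows, written in Python rather than the Perl or bash mentioned above. It is not the C++ program or stored procedure described in this message; the names parse_location and split_top_level are hypothetical. It assumes locations use only complement(), join() and order(), plain start..end ranges or single positions, the < and > partial markers, and remote references written as ACCESSION:range; other location forms are rejected.

```python
import re

# One simple segment: optional "ACCESSION:" prefix, a start position and an
# optional "..end". The "<" and ">" partial markers are accepted and dropped.
_SEGMENT = re.compile(
    r"^(?:(?P<acc>[A-Za-z_0-9.]+):)?<?(?P<start>\d+)(?:\.\.>?(?P<end>\d+))?$"
)


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_location(loc, strand=1):
    """Return a list of (accession_or_None, start, end, strand) tuples."""
    loc = loc.replace(" ", "")
    for op in ("complement", "join", "order"):
        if loc.startswith(op + "(") and loc.endswith(")"):
            inner = loc[len(op) + 1:-1]
            if op == "complement":
                # Opposite strand: flip the sign and reverse the segment order.
                return parse_location(inner, -strand)[::-1]
            segments = []
            for part in split_top_level(inner):
                segments.extend(parse_location(part, strand))
            return segments
    match = _SEGMENT.match(loc)
    if match is None:
        raise ValueError("unsupported location: " + loc)
    start = int(match.group("start"))
    end = int(match.group("end") or start)
    return [(match.group("acc"), start, end, strand)]


# The location from the original question, broken into its two exons.
for segment in parse_location("complement(join(295405..295443,295492..295529))"):
    print(segment)
```

Returning one tuple per segment, instead of a single minimum start and maximum stop, keeps every exon boundary available to whatever step comes next.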
This task is not so difficult as it seems, since there are clear rules for feature table location positions in Genbank entry files. Regards Daniel Ducat Senior Database Developer Metalife AG e-mail: daniel.ducat at metalife.de Phone: +359 (02) 950-18-04 URL: http://www.metalife.de From pmr at ebi.ac.uk Thu Apr 10 05:50:04 2003 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 10 Apr 2003 10:50:04 +0100 Subject: [BiO BB] Re: Extracting location from a genbank flatfile References: <20030410024347.78989.qmail@web40109.mail.yahoo.com> Message-ID: <3E953E4C.9060800@ebi.ac.uk> govind mk wrote: > I am stuck with a rather simple problem. > I would like to extract locations of specific features > (Eg .CDS)from a Genbank flat file.
> > I tried using Bioperl but couldnt manage to get the > exact locations for complicated representations of > locations such as > complement(join(295405..295443,295492..295529)) > as Bioperl modules return the minimum start and > maximum stop. You can use EMBOSS (the European Molecular Biology Open Software Suite) http://www.uk.embnet.org/Software/EMBOSS/ EMBOSS is an open source (GPL/LGPL) package of sequence analysis libraries and programs. Among other features, EMBOSS can read EMBL/Genbank, SwissProt and PIR feature tables and convert to/from GFF without losing information (although this does require adding some extra GFF tags to retain information about complex feature locations). The internals are similar to the ARTEMIS feature table editor from the Sanger Institute. I am currently extending the feature table internals of EMBOSS for the next release, to allow deletion/insertion of sequence ranges, and would be interested in any feedback - especially things that are hard to do with existing tools. regards, Peter Rice European Bioinformatics Institute. From sasa at muh.biglobe.ne.jp Thu Apr 10 16:19:56 2003 From: sasa at muh.biglobe.ne.jp (Takeshi Sasayama) Date: Fri, 11 Apr 2003 05:19:56 +0900 Subject: [BiO BB] Transcription factor In-Reply-To: <3E9484E5.4090802@ucalgary.ca> Message-ID: Hello, There is a program called tfscan in EMBOSS package, which scans transcription factors in DNA sequences. 
See this page http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/tfscan.html Takeshi Sasayama From a.menze at kws.de Fri Apr 11 12:23:49 2003 From: a.menze at kws.de (a.menze at kws.de) Date: Fri, 11 Apr 2003 18:23:49 +0200 Subject: [BiO BB] (removed) Message-ID: (removed) From sasa at muh.biglobe.ne.jp Fri Apr 11 12:43:51 2003 From: sasa at muh.biglobe.ne.jp (Takeshi Sasayama) Date: Sat, 12 Apr 2003 01:43:51 +0900 Subject: [BiO BB] Audio archives In-Reply-To: <200304091326.h39DQpX14917@webmail2.liv.ac.uk> Message-ID: Hi Chris, I could download the lecture videos and saw some of them. This was better than I expected! Thanks for your valuable information. Takeshi Sasayama > -----Original Message----- > From: bio_bulletin_board-admin at bioinformatics.org > [mailto:bio_bulletin_board-admin at bioinformatics.org]On Behalf Of Chris > Houseman > Sent: Wednesday, April 09, 2003 10:27 PM > To: bio_bulletin_board at bioinformatics.org > Subject: Re: [BiO BB] Audio archives > > > Don't know of mp3's, but http://www.s-star.org has ~600mb worth of > video bioinformatics lectures > regards, > > CJH > > --------------- reply ---------------- > > Hi all, > > > > I am looking for a web site which has audio archives of > > biological or medical presentations or news in English. I > > hope the archives should be mp3 files and can be > > downloaded. Does anyone know a good site? 
> > > > Takeshi Sasayama > > > > _______________________________________________ > > BiO_Bulletin_Board maillist - > BiO_Bulletin_Board at bioinformatics.org > > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > Chris Houseman > Research Fellow > University of Liverpool > University Department of Pathology > Duncan Building > Daulby Street > Liverpool > L69 3GA > > 0151 7064965 > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > From a.menze at kws.de Sat Apr 12 12:19:49 2003 From: a.menze at kws.de (a.menze at kws.de) Date: Sat, 12 Apr 2003 18:19:49 +0200 Subject: [BiO BB] (removed) Message-ID: (removed) From dfield at ceh.ac.uk Sun Apr 13 18:18:38 2003 From: dfield at ceh.ac.uk (Dawn Field) Date: Sun, 13 Apr 2003 23:18:38 +0100 Subject: [BiO BB] PhD available in comparative genomics, Oxford, UK Message-ID: A NERC funded PhD position is available entitled: "Comparative Genomics, Phylogeny, Ecology and the study of Collections of Genomes" Supervisors: Dr Dawn Field, Prof Mark Bailey, Ed Feil (University Advisor, University of Bath) Primary Location: CEH Oxford, Oxford UK Approaches: Molecular Evolution and Bioinformatics, Large-scale comparative genomics Application Deadline: May 2 (Eligibility extends to UK residents. EU residents can apply but only for fee-based support). For more information please contact Dawn Field (dfield at ceh.ac.uk) or Ed Feil (e.feil at bath.ac.uk). To apply, please send a CV, brief statement of past research experience and future research goals, and the names of two academic referees to Angela Morrison (asmor at ceh.ac.uk, 0865-281630). Summary Whole genome sequencing is fuelling an information revolution that is changing the face of biology. 
The ability to determine the complete complement of DNA of a wide range of organisms is allowing us to ask in unprecedented detail fundamental questions about the molecular basis of life. There are now more than 1900 genomes from bacteria (Eubacteria and Archaea), plasmids, phage, viruses and organelles in public databases. These genomes are so numerous that they constitute 'collections' of genomes instead of small sets. These collections are rapidly growing and their evolutionary and ecological richness provides an unparalleled opportunity to investigate the molecular basis of ecological adaptation using computational approaches. We are in the process of establishing a database that combines complete genomes with evolutionary and ecological meta-data. Specific tasks to be carried out in this PhD include 1) writing programming code to detect and characterise core genomic features, 2) collecting a wider range of descriptive meta-data, and most importantly, 3) using this resource to test a range of hypotheses about the evolution and biological significance of shared and unshared genomic features. Key areas of research to be addressed using data primarily from bacterial genome sequences include, but are not limited to 1) testing for relationships between ecological features and genomic features, 2) examining the rate of evolution of features like G+C content and genome size by mapping traits onto 16S RNA phylogenies and trees based on whole proteome comparisons, 3) extracting 16S RNA operons from all bacterial genomes to study the evolution of 16S RNA operons and their flanking sequences, 4) detecting and studying the numbers and distributions of orphan genes (putative proteins with no known homologues), and 5) detecting and quantifying the number of conserved hypothetical proteins. From jeff at bioinformatics.org Fri Apr 18 18:57:36 2003 From: jeff at bioinformatics.org (J.W. 
Bizzaro) Date: Fri, 18 Apr 2003 18:57:36 -0400 Subject: [BiO BB] Volunteers for a quartly newsletter Message-ID: <3EA082E0.3060306@bioinformatics.org> Greetings. We're looking for some volunteers to help produce a quarterly newsletter for Bioinformatics.Org. Some of the topics that could be covered in the newsletter are as follows: - Hosted project spotlights: a short review of a project hosted at Bioinformatics.Org (we have 105!) - Articles about free/open source/access in bioinformatics, including projects outside of Bioinformatics.Org - Articles about events in which Bioinformatics.Org is involved (e.g., the Annual Meeting) - Notes about changes and developments in the Organization itself - Reviews of books related to bioinformatics - Listings of repository submissions (note that we are slowly developing a repository mechanism, which will include software, algorithms, data, publications/literature, and educational material--more on this later) We will need several people to be involved. Some of the work would require good English, writing, and desktop publishing skills, but other work wouldn't. (It would be nice to produce the newsletter in LaTeX, but other programs can be used.) You can even contribute cartoons if you'd like ;-) If you're interested (and note that this is a *volunteer* position, like everything else at Bioinformatics.Org for now), please contact me off the list. Cheers. Jeff -- J.W. Bizzaro jeff at bioinformatics.org President, Bioinformatics.Org http://bioinformatics.org/~jeff "As we enjoy great advantages from the inventions of others, we should be glad of an opportunity to serve others by any invention of ours; and this we should do freely and generously."
-- Benjamin Franklin -- From yhkhoo at wspc.com Tue Apr 29 00:01:26 2003 From: yhkhoo at wspc.com (Khoo Yee Hong) Date: Tue, 29 Apr 2003 12:01:26 +0800 Subject: [BiO BB] Journal of Bioinformatics and Computational Biology (JBCB) Message-ID: <3EADF916.4090901@wspc.com> Journal of Bioinformatics and Computational Biology (JBCB) The Journal of Bioinformatics and Computational Biology aims to publish high quality, original research articles, expository tutorial papers and review papers as well as short, critical comments on technical issues associated with the analysis of cellular information and the use of such information in biomedicine. The research papers will be technical presentations of new assertions, discoveries and tools, intended for a narrower specialist community. The tutorials, reviews and critical commentary will be targeted at a broader readership of biologists who are interested in using computers but are not knowledgeable about scientific computing, and equally, computer scientists who have an interest in biology but are not familiar with current thrusts nor the language of biology. Such carefully chosen tutorials and articles should greatly accelerate the rate of entry of these new creative scientists into the field. Researchers in Computer Science and Bioinformatics will find the journal a useful resource. Please go to: http://www.worldscinet.com/journals/jbcb/jbcb.shtml to find out how to submit papers, request for a complimentary copy or for a detailed description of the journal. The free electronic version of JBCB's inaugural issue can also be found on this web site. 
Warm Regards, WorldSciNet From landman at scalableinformatics.com Wed Apr 30 01:27:26 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: 30 Apr 2003 01:27:26 -0400 Subject: [BiO BB] Script available for running mpiBLAST using SGE mpich parallel environment Message-ID: <1051680446.16458.61.camel@protein.scalableinformatics.com> Hi folks: I had been working on integrating mpiBLAST into various environments for a number of customers recently. After completing this work, I began playing with the SGE mpich environment for another customer code. Some things clicked, and I developed a simple shell script to run mpiBLAST through the SGE mpich environment. You can find it at http://scalableinformatics.com/downloads/sge_mpiblast, and the writeup is at http://scalableinformatics.com/sge_mpiblast.html . This isn't terribly fancy, but it worked well for me. I have since integrated the basic ideas into other customer environments with success. Joe -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615