From budhi19 at bi.itb.ac.id Mon Sep 1 02:25:10 2003
From: budhi19 at bi.itb.ac.id (Narumi ayumu)
Date: Mon, 01 Sep 2003 13:25:10 +0700
Subject: [BiO BB] Bio modelling
In-Reply-To: <003901c36e49$29e09d40$1efea8c0@raptor>
Message-ID:

Dear All,

I want to do some biomodelling for biology applications,
such as the growth of an ecosystem, the growth of Rotifera, etc.
Is there any open-source software that can do that?
And where can I find papers or publications
about biomodelling?

Sincerely yours,

Budhi

From dmb at mrc-dunn.cam.ac.uk Mon Sep 1 05:45:30 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Mon, 1 Sep 2003 10:45:30 +0100 (BST)
Subject: [BiO BB] Bio modelling
In-Reply-To:
Message-ID:

Have you heard of Swarm?

I think that is OS.

http://www.linuxselfhelp.com/HOWTO/AI-Alife-HOWTO-5.html

Looks relevant.

Best of luck!

Dan.

On Mon, 1 Sep 2003, Narumi ayumu wrote:

> Dear All,
>
> I want to do some biomodelling for biology applications,
> such as the growth of an ecosystem, the growth of Rotifera, etc.
> Is there any open-source software that can do that?
> And where can I find papers or publications
> about biomodelling?
>
> Sincerely yours,
>
> Budhi

From rls at cin.ufpe.br Mon Sep 1 08:40:07 2003
From: rls at cin.ufpe.br (Rafael Luiz da Silva)
Date: Mon, 1 Sep 2003 09:40:07 -0300 (BRT)
Subject: [BiO BB] Re: Help
In-Reply-To:
Message-ID:

[]s Rafael Luiz
(www.cin.ufpe.br/~rls)

From rls at cin.ufpe.br Mon Sep 1 09:15:47 2003
From: rls at cin.ufpe.br (Rafael Luiz da Silva)
Date: Mon, 1 Sep 2003 10:15:47 -0300 (BRT)
Subject: [BiO BB] Re: Help
In-Reply-To:
Message-ID:

I need the complete protein sequences of Arabidopsis thaliana and
Plasmodium falciparum.

Can any of you help me (or tell me where I can get them easily)?

Thanks :)

[]s Rafael Luiz
(www.cin.ufpe.br/~rls)

From idoerg at burnham.org Mon Sep 1 16:03:18 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Mon, 01 Sep 2003 13:03:18 -0700
Subject: [BiO BB] Re: Help
In-Reply-To:
References:
Message-ID: <3F53A606.9080503@burnham.org>

NCBI:

http://www.ncbi.nlm.nih.gov/genomes/static/euk_g.html

./I

Rafael Luiz da Silva wrote:
> I need the complete protein sequences of Arabidopsis thaliana and
> Plasmodium falciparum.
>
> Can any of you help me (or tell me where I can get them easily)?
>
> Thanks :)
>
> []s Rafael Luiz
> (www.cin.ufpe.br/~rls)

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

From zfu at cs.ucr.edu Tue Sep 2 16:03:53 2003
From: zfu at cs.ucr.edu (Zheng Fu)
Date: Tue, 2 Sep 2003 13:03:53 -0700 (PDT)
Subject: [BiO BB] Bio modelling
In-Reply-To:
Message-ID:

I have used Swarm a little. It is not an OS; it is a Java/Objective-C
package. It provides a very useful library for complex-system simulation.

On Mon, 1 Sep 2003, Dan Bolser wrote:

>
> Have you heard of Swarm?
>
> I think that is OS.
>
> http://www.linuxselfhelp.com/HOWTO/AI-Alife-HOWTO-5.html
>
> Looks relevant.
>
> Best of luck!
>
> Dan.
>
> On Mon, 1 Sep 2003, Narumi ayumu wrote:
>
> > Dear All,
> >
> > I want to do some biomodelling for biology applications,
> > such as the growth of an ecosystem, the growth of Rotifera, etc.
> > Is there any open-source software that can do that?
> > And where can I find papers or publications
> > about biomodelling?
> >
> > Sincerely yours,
> >
> > Budhi

--
Love & Peace

From lucifer at chiark.greenend.org.uk Tue Sep 2 07:03:36 2003
From: lucifer at chiark.greenend.org.uk (Lucy McWilliam)
Date: Tue, 2 Sep 2003 12:03:36 +0100 (BST)
Subject: [BiO BB] Re: Help
In-Reply-To: <20030901160112.BB4BBD2865@www.bioinformatics.org>
Message-ID:

Rafael Luiz da Silva wrote:

> I need the complete protein sequences of Arabidopsis thaliana and
> Plasmodium falciparum.

ftp://ftp.arabidopsis.org/home/tair/ (currently refusing connections)
http://plasmodb.org/restricted/GridddPf.shtml

Lucy McWilliam
http://www.flychip.org.uk/

From dmb at mrc-dunn.cam.ac.uk Wed Sep 3 04:32:09 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Wed, 3 Sep 2003 09:32:09 +0100 (BST)
Subject: [BiO BB] Bio modelling
In-Reply-To:
Message-ID:

Sorry, bad abbreviation (I meant Open Source).

It looks like there is a lot of work in the field of population dynamics,
though; can you apply that to growth?

I know growth processes have been studied a lot in fractal geometry.

Cheers,

On Tue, 2 Sep 2003, Zheng Fu wrote:

> I have used Swarm a little. It is not an OS; it is a Java/Objective-C
> package. It provides a very useful library for complex-system simulation.
>
> On Mon, 1 Sep 2003, Dan Bolser wrote:
>
> >
> > Have you heard of Swarm?
> >
> > I think that is OS.
> >
> > http://www.linuxselfhelp.com/HOWTO/AI-Alife-HOWTO-5.html
> >
> > Looks relevant.
> >
> > Best of luck!
> >
> > Dan.
> >
> > On Mon, 1 Sep 2003, Narumi ayumu wrote:
> >
> > > Dear All,
> > >
> > > I want to do some biomodelling for biology applications,
> > > such as the growth of an ecosystem, the growth of Rotifera, etc.
> > > Is there any open-source software that can do that?
> > > And where can I find papers or publications
> > > about biomodelling?
> > >
> > > Sincerely yours,
> > >
> > > Budhi

From dmb at mrc-dunn.cam.ac.uk Wed Sep 3 05:37:29 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Wed, 3 Sep 2003 10:37:29 +0100 (BST)
Subject: [BiO BB] Clustering
In-Reply-To: <003901c36e49$29e09d40$1efea8c0@raptor>
Message-ID:

Hi,

What packages support clustering of points
with a similarity matrix?

How can I derive the similarity of two matrices?

Cheers,
Dan.

From a.bashir at bioc.cam.ac.uk Wed Sep 3 10:59:03 2003
From: a.bashir at bioc.cam.ac.uk (Asam Bashir)
Date: Wed, 3 Sep 2003 17:59:03 +0300
Subject: [BiO BB] Re: Looking for suggestions on where to hold next meeting
In-Reply-To: <1f1.efe98a4.2c8417d6@aol.com>
References: <1f1.efe98a4.2c8417d6@aol.com>
Message-ID:

>Me too (Boston)
>Mark

Fancy Athens, Greece?

http://bioinformatics.biol.uoa.gr/

From idoerg at burnham.org Wed Sep 3 12:19:08 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Wed, 03 Sep 2003 09:19:08 -0700
Subject: [BiO BB] Clustering
In-Reply-To:
References:
Message-ID: <3F56147C.4070600@burnham.org>

Dan Bolser wrote:
> Hi,
>
> What packages support clustering of points
> with a similarity matrix?

I don't think I quite understand the question; can you elaborate on that?

>
> How can I derive the similarity of two matrices?
>

If you mean that you would like to check how "close" two similarity
matrices (e.g. BLOSUM, PAM) are to each other, then one method is to
compare the amino-acid pair frequency distributions used to construct
these matrices. See the following paper (fig. 4, and the last
paragraph of the "Methods" section) for one example of how to do this,
although other methods of comparing distributions may be used just as
effectively:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=retrieve&db=pubmed&list_uids=11790845&dopt=Abstract

./I

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

From derek at biotechrecruiter.org Wed Sep 3 12:41:58 2003
From: derek at biotechrecruiter.org (Derek Pyper)
Date: Wed, 03 Sep 2003 09:41:58 -0700
Subject: [BiO BB] Computational Chemist/Biologist Positions
Message-ID:

Hi All,

I am working on positions for extraordinarily gifted computational chemists and other computational scientists who are sought to join a rapidly growing New York-based research group that is pursuing an ambitious, long-term strategy aimed at fundamentally transforming the process of drug discovery.

Candidates should have world-class credentials in computational chemistry, biology, or physics, or in a relevant area of computer science or applied mathematics, and must have unusually strong research and software engineering skills.
Relevant areas of experience might include the computation of protein-ligand binding free energies, molecular dynamics and/or Monte Carlo simulations of biomolecular systems, application of statistical mechanics to biomolecular systems, free energy perturbation methods, and methods for speeding up evaluation of electrostatic energies -- but specific knowledge of any of these areas is less critical than exceptional intellectual ability and a demonstrated track record of achievement.

Current areas of interest within the group include the prediction of protein structures and binding free energies, structure- and ligand-based drug design, de novo ligand design algorithms, and the development of special-purpose hardware to accelerate computational chemistry simulations.

This research effort is being financed by a confidential investment and technology development firm with approximately $5 billion in aggregate capital. The project was initiated by the firm's founder, and operates under his direct scientific leadership.

We are eager to add both senior- and junior-level members to our world-class team, and are prepared to offer above-market compensation to candidates of truly exceptional ability.

Please send your CV to derek at biotech-recruiters.com

Please send:
- your resume (including list of publications, thesis topic, and advisor, if applicable),
- history of academic performance (including GPAs as well as SAT, GRE, and other standardized test scores),
- salary requirement
- relocation considerations (if any)
- your work authorization (H-1B, etc.)

Warm Regards,

Derek Pyper
Biotech Recruiters International, Inc.
Principal Office 916-652-2186
Fax 916-652-2178
Email: Derek at biotech-recruiters.com
URL: www.biotech-recruiters.com

CONFIDENTIALITY STATEMENT: This electronic message contains privileged and confidential information from Biotech Recruiters International, Inc. This information is intended solely for the use of the individual(s) or entity(ies) named above.
If you are not the intended recipient, be aware that any disclosure, copying, distribution, or use of the contents of this message is prohibited. If you have received this email in error, please notify us immediately by telephone at 916-652-2186 or by email reply. Thank you.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Derek Pyper (derek at biotech-recruiters.com).vcf
Type: text/x-vcard
Size: 524 bytes
Desc: not available
URL:

From dmb at mrc-dunn.cam.ac.uk Wed Sep 3 12:46:56 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Wed, 3 Sep 2003 17:46:56 +0100 (BST)
Subject: [BiO BB] Clustering
In-Reply-To: <3F56147C.4070600@burnham.org>
Message-ID:

> > What packages support clustering of points
> > with a similarity matrix?
>
> I don't think I quite understand the question; can you elaborate on that?

Yup... I often find that I have some similarities between things,
and I would like to do a simple clustering of the points,
but I am not familiar with the algorithms, so I would just like to play
around a bit.

I know you can do phylogenetic analysis on any similarity matrix, but
I don't need the high resolution (many similar points closely linked to
one short branch). I would like to see, in general, what 'blobs' of data
I have without investing too much time in the analysis (or the
computation!).

For example, I might have the AA composition of 1000 sequences, and we
may suspect that the composition is biased across these sequences (not
uniform). So we think: maybe I should break them up by secondary structure,
maybe by family, maybe I should perform chi-squared tests between every
possible combination of groups of the 1000 to find subpopulations within
which the composition isn't biased...

If I take each protein and compare its composition to every other, I have
an N**2/2 similarity matrix, which I would like to cluster, just to see
if any protein families, structural classes or taxonomic groups have a
particular bias in terms of AA composition, but this is a long, complicated
analysis (I think to myself), so I don't bother.

Now I ask: I am sure there are thousands of clustering toolkits out there,
and I should just Google. Does anyone have any recommendations?

> > How can I derive the similarity of two matrices?
> >
>
> If you mean that you would like to check how "close" two similarity
> matrices (e.g. BLOSUM, PAM) are to each other, then one method is to
> compare the amino-acid pair frequency distributions used to construct
> these matrices.

You mean the similarity of two distributions? Sounds interesting...

> See the following paper (fig. 4, and the last
> paragraph of the "Methods" section) for one example of how to do this,
> although other methods of comparing distributions may be used just as
> effectively:
>
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=retrieve&db=pubmed&list_uids=11790845&dopt=Abstract

Thanks very much,
Dan.

> ./I

From mgollery at unr.edu Wed Sep 3 13:00:03 2003
From: mgollery at unr.edu (Martin Gollery)
Date: Wed, 3 Sep 2003 10:00:03 -0700
Subject: [BiO BB] InterProScan on SGE
Message-ID: <1062608403.3f561e13ab43c@webmail.unr.edu>

Has anyone gotten InterProScan to work on a cluster using SGE? I have
substituted qsub for bsub in the queueing configuration, to no avail.
Anybody who has gotten this to work, please contact me.

Martin Gollery
Associate Director of Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
New phone number! 775-784-7042

From derek at biotechrecruiter.org Wed Sep 3 13:34:49 2003
From: derek at biotechrecruiter.org (Derek Pyper)
Date: Wed, 03 Sep 2003 10:34:49 -0700
Subject: [BiO BB] Applications Scientist Position
Message-ID:

Hi,

Application Scientist / Collaborative eR&D Senior Consultant

1. Mission

* To deliver to our customers a solution which exceeds their expectations while remaining on time and within budget
* To ensure that each deployment of our software is completed in a professional, consistent and successful manner.
* To maximize customer retention and penetration by offering a world-class client support service

2. Key Tasks

2.1. Responsible for the successful deployment of our product within client organisations
2.2. Working with researchers within the client company to analyse the current research procedures and processes and to configure our product to replicate these
2.3. Write requirements specifications based on customer feedback and liaise with the product development group to ensure these customisations are delivered on time and to spec
2.4. The training of client staff in the use of the product
2.5. Working closely with sales and business development to ensure that each client's sales potential is fully exploited
2.6. To liaise with the client's IT dept to ensure that our product is installed and configured correctly
2.7. Once deployment has been completed, be the primary point of contact for client support issues
2.8. Be involved in designing customer-training materials and handling change management issues within our client sites.
2.9. Liaise with other departments within the client's organization to ensure that the usage of our product is as widespread as possible

3. Key Relationships

3.1. Reporting relationship - reporting to the VP of Products and Services
3.2. Liaise closely with the Product Development manager

4. Key Values

We are a global company headquartered in Cork dedicated to a clear mission. It is also a knowledge-based company which is concerned to harness fully the knowledge and skills of its staff. The Collaborative eR&D Senior Consultant is expected to adopt a professional style which embodies in practice certain key organisational values:

4.1. Skill in dealing with people issues.
4.2. Solidarity with other members of the team
4.3. Fairness in dealing with staff regardless of their background and position and in handling all personnel issues and situations.
4.4. Harnessing the talents, ideas and abilities of staff.
4.5. Proactive development of the attitudes, knowledge, and skill of people.
4.6. The creation and maintenance of standards of performance and work discipline.

5. Key Expectations

5.1. Mainly based on the west coast or east coast depending on the individual's preference.
5.2. Expected to travel.
5.3. In exceptional circumstances people will be expected to work outside of normal business hours.
5.4. Expected to remain on site for the duration of a deployment (can be up to 2 months)

6. Key Technical Skills

6.1. Worked as a researcher in an R&D organization
6.2. Familiar with general R&D processes and procedures, e.g. GLP
6.3. Knowledge of statistics, e.g. Design of Experiments
6.4. Intermediate IT knowledge
6.5. Requirements gathering, systems analysis experience (ideally)
6.6. Have been involved in a support role before (ideally)

7. Key People Skills

7.1. Excellent communication skills, both spoken and written.
7.2. Excellent interpersonal skills and a strong team player.
7.3. Negotiation, persuasion and presentation skills.
7.4. Ability to communicate to a wide variety of audiences - executive, management and researcher.

8. Key Business Skills

8.1. Managed the deployment of an enterprise IT system.
8.2. Project management
8.3. Account Management/Sales (ideally)

9. Educational/Professional Requirements

9.1. Science qualification (preferably to PhD level), with a strong statistical content.
9.2. Ideally would have a foreign language
9.3. You will come from a Proteomics or Biochemistry background
9.4. Superb interpersonal skills

Please email a copy of your CV to Derek at biotech-recruiters.com for further consideration.

Derek Pyper
Biotech Recruiters International, Inc.
Principal Office 916-652-2186
Fax 916-652-2178
Email: Derek at biotech-recruiters.com
URL: www.biotech-recruiters.com

CONFIDENTIALITY STATEMENT: This electronic message contains privileged and confidential information from Biotech Recruiters International, Inc. This information is intended solely for the use of the individual(s) or entity(ies) named above.
If you are not the intended recipient, be aware that any disclosure, copying, distribution, or use of the contents of this message is prohibited. If you have received this email in error, please notify us immediately by telephone at 916-652-2186 or by email reply. Thank you.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Derek Pyper (derek at biotech-recruiters.com).vcf
Type: text/x-vcard
Size: 524 bytes
Desc: not available
URL:

From thompson at wadsworth.org Wed Sep 3 13:40:41 2003
From: thompson at wadsworth.org (William Thompson)
Date: Wed, 3 Sep 2003 13:40:41 -0400 (EDT)
Subject: [BiO BB] Clustering
Message-ID: <200309031740.h83Hefk13720@csserv.wadsworth.org>

Dan

If all you are looking for is simple clustering, check out R:

http://www.r-project.org/

It has an extensive clustering package.

Bill

Bill Thompson, PhD
Center for Bioinformatics
Wadsworth Center
NY State Dept of Health
ESP C-644
P.O. Box 509
Albany, NY 12201-0509
phone: (518) 486-7882

> Date: Wed, 3 Sep 2003 17:46:56 +0100 (BST)
> From: Dan Bolser
> To: bio_bulletin_board at bioinformatics.org
> Subject: Re: [BiO BB] Clustering
> Reply-To: bio_bulletin_board at bioinformatics.org
>
> > > What packages support clustering of points
> > > with a similarity matrix?
> >
> > I don't think I quite understand the question; can you elaborate on that?
>
> Yup... I often find that I have some similarities between things,
> and I would like to do a simple clustering of the points,
> but I am not familiar with the algorithms, so I would just like to play
> around a bit.
>
> I know you can do phylogenetic analysis on any similarity matrix, but
> I don't need the high resolution (many similar points closely linked to
> one short branch). I would like to see, in general, what 'blobs' of data
> I have without investing too much time in the analysis (or the
> computation!).
>
> For example, I might have the AA composition of 1000 sequences, and we
> may suspect that the composition is biased across these sequences (not
> uniform). So we think: maybe I should break them up by secondary structure,
> maybe by family, maybe I should perform chi-squared tests between every
> possible combination of groups of the 1000 to find subpopulations within
> which the composition isn't biased...
>
> If I take each protein and compare its composition to every other, I have
> an N**2/2 similarity matrix, which I would like to cluster, just to see
> if any protein families, structural classes or taxonomic groups have a
> particular bias in terms of AA composition, but this is a long, complicated
> analysis (I think to myself), so I don't bother.
>
> Now I ask: I am sure there are thousands of clustering toolkits out there,
> and I should just Google. Does anyone have any recommendations?
>

From tfiedler at rsmas.miami.edu Wed Sep 3 15:09:34 2003
From: tfiedler at rsmas.miami.edu (Tristan Fiedler)
Date: Wed, 3 Sep 2003 15:09:34 -0400 (EDT)
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <20030903160114.A85CDD2853@www.bioinformatics.org>
References: <20030903160114.A85CDD2853@www.bioinformatics.org>
Message-ID: <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>

Dear Bio Gurus!

Two quick questions:

1. Could someone please assist me in writing a shell script (awk, sed,
etc.) which would use a loop to run through about 1000 files (filenames all
end in '.seq') and remove all occurrences of control-M, resulting in a file
containing the sequence on a single line?

Currently each file looks similar to:

% cat -v seq_018_G05.seq
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA^M
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGGGGG^M
TTTTTTTTTTTTTTTTCCCAAAAAAAAAAAAA^M

2. We are planning to buy a workstation for our local (~3 labs producing
sequences from an ABI sequencer) genomics needs (lots of BLAST runs,
database management, standard bioinformatics software), and were planning
on getting something like:

4 GB RAM (is this enough for doing local BLAST searches against GenBank?)
2 x 3 GHz Xeon processors (how about Mac OS X?)
400 GB storage

Thank you - and feel free to reply directly to me (so as not to waste BB resources).

Cheers!

--
Tristan J. Fiedler, Ph.D.
Postdoctoral Research Fellow
NIEHS Marine & Freshwater Biomedical Sciences Center
Rosenstiel School of Marine & Atmospheric Sciences
University of Miami

tfiedler at rsmas.miami.edu
t.fiedler at umiami.edu (alias)
305-361-4626

From idoerg at burnham.org Wed Sep 3 17:00:18 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Wed, 03 Sep 2003 14:00:18 -0700
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>
References: <20030903160114.A85CDD2853@www.bioinformatics.org> <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>
Message-ID: <3F565662.4020405@burnham.org>

Tristan Fiedler wrote:
> Dear Bio Gurus!
>
> Two quick questions:
>
> 1. Could someone please assist me in writing a shell script (awk, sed,
> etc.) which would use a loop to run through about 1000 files (filenames all
> end in '.seq') and remove all occurrences of control-M, resulting in a file
> containing the sequence on a single line?
>
> Currently each file looks similar to:
>
> % cat -v seq_018_G05.seq
> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA^M
> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGGGGG^M
> TTTTTTTTTTTTTTTTCCCAAAAAAAAAAAAA^M
>

Sounds like you need the dos2unix utility. It comes bundled with Linux;
if you are working on another OS, you can download it for free. Use
Google to find it.

>
> 2. We are planning to buy a workstation for our local (~3 labs producing
> sequences from an ABI sequencer) genomics needs (lots of BLAST runs,
> database management, standard bioinformatics software), and were planning
> on getting something like:
>
> 4 GB RAM (is this enough for doing local BLAST searches against GenBank?)

Definitely. That's what I have, and I haven't had any issues.
BLAST/PSI-BLAST is not that memory-intensive, actually.

> 2 x 3 GHz Xeon processors (how about Mac OS X?)

The more processors, the merrier. BLAST parallelizes nicely. Regarding
OS: I'm partial to Linux, but that's me.

> 400 GB storage
>

You can always add more, and 400 is ample for starters.

>
> Thank you - and feel free to reply directly to me (so as not to waste BB resources).
>
> Cheers!
>

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

From p.pagel at gsf.de Thu Sep 4 02:51:50 2003
From: p.pagel at gsf.de (Philipp Pagel)
Date: Thu, 4 Sep 2003 08:51:50 +0200
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>
References: <20030903160114.A85CDD2853@www.bioinformatics.org> <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>
Message-ID: <20030904065149.GB1960@porcupine.gsf.de>

> 1. Could someone please assist me in writing a shell script (awk, sed,
> etc.) which would use a loop to run through about 1000 files (filenames all
> end in '.seq') and remove all occurrences of control-M, resulting in a file
> containing the sequence on a single line?

No script required. Just a one-liner... cd into the folder with your
sequences and do this:

for f in *.seq; do tr -d '\r\n' < "$f" > tmp_sequence && mv tmp_sequence "$f"; done

> 2. We are planning to buy a workstation for our local (~3 labs producing
> sequences from an ABI sequencer) genomics needs (lots of BLAST runs,
> database management, standard bioinformatics software), and were planning
> on getting something like:
>
> 4 GB RAM (is this enough for doing local BLAST searches against GenBank?)
> 2 x 3 GHz Xeon processors (how about Mac OS X?)
> 400 GB storage

Sounds like a nice machine. Certainly big enough for BLAST.

cu
Philipp

--
Dr. Philipp Pagel                            Tel. +49-89-3187-3675
Institute for Bioinformatics / MIPS          Fax. +49-89-3187-3585
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1
85764 Neuherberg, Germany

From mkgovindis at yahoo.com Thu Sep 4 05:07:08 2003
From: mkgovindis at yahoo.com (govind mk)
Date: Thu, 4 Sep 2003 02:07:08 -0700 (PDT)
Subject: [BiO BB] Swissprot to Refseq entry mapping
In-Reply-To: <20030904065149.GB1960@porcupine.gsf.de>
Message-ID: <20030904090708.1449.qmail@web40104.mail.yahoo.com>

Hi all,

I would like to know if there is any file which maps human proteins in
Swiss-Prot to LocusLink or RefSeq entries at NCBI, or whether there is
any roundabout way to achieve that.

Thank you.

Regards,
M.K.Govind

From dmb at mrc-dunn.cam.ac.uk Thu Sep 4 06:23:54 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Thu, 4 Sep 2003 11:23:54 +0100 (BST)
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>
Message-ID:

> 2. We are planning to buy a workstation for our local (~3 labs producing
> sequences from an ABI sequencer) genomics needs (lots of BLAST runs,
> database management, standard bioinformatics software), and were planning
> on getting something like:
>
> 4 GB RAM (is this enough for doing local BLAST searches against GenBank?)
> 2 x 3 GHz Xeon processors (how about Mac OSX?) > 400 GB storage The new dual processor xeon with hyperthreading (gives 4 cpu!) are great! Get as much storage as you can! From idh at poulet.org Thu Sep 4 08:03:04 2003 From: idh at poulet.org (Yannick Wurm) Date: Thu, 4 Sep 2003 14:03:04 +0200 Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation In-Reply-To: <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu> Message-ID: Hi, I don't have any hands-on experience with large-volume blasting, but you might want to have a look at Apple's new G5 computers. The numbers shown in the "Performance Whitepaper" of june 2003 (you'll find a link at http://www.apple.com/lae/g5/ ) are quite impressive. Apple compared the performance of the dual 2GHz Power Mac G5 running Apple/Genentech BLAST with a 3GHz Pentium 4-based system and a dual 3.06GHz Xeon-based system, both running Red Hat Linux 9.0 and NCBI BLAST. A/G BLAST is an optimized version of NCBI BLAST developed by Apple in collaboration with Genentech. Optimized for dual PowerPC G5 processors, the Velocity Engine, and the symmetric multiprocessing capabilities of Mac OS X, A/G BLAST makes a wide variety of searches available at higher speeds. According to the graph they show, using a word length of 40, the Dual G5 ran 3 million nucleotides per second whereas the Linux boxes did only about 0.75. The same paper also states that HMMer runs 4 times faster on the dual g5 than on the dual xeon. And Microsoft Word runs on it too :) Are the results shown surprising? The code for A/G Blast and HMMer seem to be optimized for the G5, whereas the standard vanilla versions where used on the linux boxes. Could optimization reduce the bias against the xeon? 
Yannick Wurm \\\\\\\\\\\\\\\\\\\ \\ http://yannick.poulet.org icq: 22044361 \\ idh at poulet.org tel: ++33.6.16.41.71.92 From dmb at mrc-dunn.cam.ac.uk Thu Sep 4 08:13:32 2003 From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser) Date: Thu, 4 Sep 2003 13:13:32 +0100 (BST) Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation In-Reply-To: Message-ID: On Thu, 4 Sep 2003, Yannick Wurm wrote: > > And Microsoft Word runs on it too :) ;) > > Are the results shown surprising? The code for A/G Blast and HMMer seem > to be optimized for the G5, whereas the standard vanilla versions where > used on the linux boxes. Could optimization reduce the bias against the > xeon? I hope so! Did they run blast with the --number_of_cpus option set to 2 or 4? The kernel-level hyperthreading on the Xeon simulates 2n CPUs, but I am not sure if this 'really' works (i.e. 4*2 GHz from a 2*2 GHz board). Also, I found the --number_of_cpus option to be suboptimal in terms of CPU usage. I prefer to simply start 4 jobs at the same time, so maybe this would improve the benchmark considerably (I went from ~50% average usage on each CPU to ~100%). Did they write the job output to /dev/null? If not, the IO could be a (hidden?) factor. Cheers, > > Yannick Wurm > > \\\\\\\\\\\\\\\\\\\ > \\ http://yannick.poulet.org icq: 22044361 > \\ idh at poulet.org tel: ++33.6.16.41.71.92 > > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > From steletch at biomedicale.univ-paris5.fr Thu Sep 4 09:01:48 2003 From: steletch at biomedicale.univ-paris5.fr (Teletchéa Stéphane) Date: 04 Sep 2003 15:01:48 +0200 Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation In-Reply-To: References: Message-ID: <1062680508.11157.8.camel@pcumr70.biomedicale.univ-paris5.fr> On Thu 04/09/2003 at
14:03, Yannick Wurm wrote: > Hi, > I don't have any hands-on experience with large-volume blasting, but > you might want to have a look at Apple's new G5 computers. > The numbers shown in the "Performance Whitepaper" of june 2003 (you'll > find a link at http://www.apple.com/lae/g5/ ) are quite impressive. > Apple compared the performance of the dual 2GHz Power Mac G5 running > Apple/Genentech BLAST with a 3GHz Pentium 4-based system and a dual > 3.06GHz Xeon-based system, both running Red Hat Linux 9.0 and NCBI > BLAST. > > A/G BLAST is an optimized version of NCBI > BLAST developed by Apple in collaboration > with Genentech. Optimized for dual PowerPC > G5 processors, the Velocity Engine, and the > symmetric multiprocessing capabilities of > Mac OS X, A/G BLAST makes a wide variety > of searches available at higher speeds. > > According to the graph they show, using a word length of 40, the Dual > G5 ran 3 million nucleotides per second whereas the Linux boxes did > only about 0.75. > > The same paper also states that HMMer runs 4 times faster on the dual > g5 than on the dual xeon. > > And Microsoft Word runs on it too :) > > Are the results shown surprising? The code for A/G Blast and HMMer seem > to be optimized for the G5, whereas the standard vanilla versions where > used on the linux boxes. Could optimization reduce the bias against the > xeon? > > Yannick Wurm > Personally speaking, I would not so readily accept a benchmark without running it myself first... Second, why didn't they compare the dual G5 to its real counterparts (what you would get for the same price): Opteron or Itanium? I'll bet the figures would be dramatically different. The dual G5 is not a workstation but a server for the desktop, so let's consider the server solutions! See some comparisons at: http://www.alineos.com/Benchs/bench1.html Stef -- -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From mgollery at unr.edu Thu Sep 4 12:17:41 2003 From: mgollery at unr.edu (Martin Gollery) Date: Thu, 4 Sep 2003 09:17:41 -0700 Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation In-Reply-To: References: Message-ID: <1062692261.3f5765a546a62@webmail.unr.edu> The real key in this performance whitepaper may not be the processor or operating system. Note that the word size used here was 40, which will lead to dramatically fewer HSP extensions, and thus a much higher speed on any system. Of course, you might miss a lot of hits, depending on the data. Due to the larger word size, one might more accurately compare A/G blast with NCBI's megablast. On the other hand, if you would like speed with sensitivity check out PatternHunter from Bioinformatics Solutions, or if you have the money, TeraBlast from TimeLogic. Still, I think that Apple is becoming an attractive solution for bioinformatics due to optimizations with the Altivec. The HMMpfam info looks promising, although I have not tested them myself. Marty Quoting Yannick Wurm : > Hi, > I don't have any hands-on experience with large-volume blasting, but > you might want to have a look at Apple's new G5 computers. > The numbers shown in the "Performance Whitepaper" of june 2003 (you'll > find a link at http://www.apple.com/lae/g5/ ) are quite impressive. > Apple compared the performance of the dual 2GHz Power Mac G5 running > Apple/Genentech BLAST with a 3GHz Pentium 4-based system and a dual > 3.06GHz Xeon-based system, both running Red Hat Linux 9.0 and NCBI > BLAST. > > A/G BLAST is an optimized version of NCBI > BLAST developed by Apple in collaboration > with Genentech. Optimized for dual PowerPC > G5 processors, the Velocity Engine, and the > symmetric multiprocessing capabilities of > Mac OS X, A/G BLAST makes a wide variety > of searches available at higher speeds. 
> > According to the graph they show, using a word length of 40, the Dual > G5 ran 3 million nucleotides per second whereas the Linux boxes did > only about 0.75. > > The same paper also states that HMMer runs 4 times faster on the dual > g5 than on the dual xeon. > > And Microsoft Word runs on it too :) > > Are the results shown surprising? The code for A/G Blast and HMMer seem > to be optimized for the G5, whereas the standard vanilla versions where > used on the linux boxes. Could optimization reduce the bias against the > xeon? > > Yannick Wurm > > \\\\\\\\\\\\\\\\\\\ > \\ http://yannick.poulet.org icq: 22044361 > \\ idh at poulet.org tel: ++33.6.16.41.71.92 > > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > Martin Gollery Associate Director of Bioinformatics University of Nevada at Reno Dept. of Biochemistry / MS330 New phone number! 775-784-7042 ------------------------------------------------- This mail sent through https://webmail.unr.edu From luo_2005 at yahoo.com Thu Sep 4 19:20:29 2003 From: luo_2005 at yahoo.com (Phil Luo) Date: Thu, 4 Sep 2003 16:20:29 -0700 (PDT) Subject: [BiO BB] orthologs vs in-paralogs Message-ID: <20030904232029.18724.qmail@web20704.mail.yahoo.com> Dear all, As we know ,there are two kinds of homolog, ortholog and paralog. Genes in two species that have directly evolved from a single gene in the last common ancestor are called orthologs. A set of homologous genes that have diverged from each other as a consequence of genetic duplication are called paralogs. Sometime those paralogs which arose from a duplication after the speciation event are called in-paralogs. My question is how to distinguish the in-paralogs from orthologs. Which one is supposed to be more similar, in-paralogs or orthologs? Best regards, Phil --------------------------------- Do you Yahoo!? Yahoo! 
SiteBuilder - Free, easy-to-use web site design software -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmb at mrc-dunn.cam.ac.uk Fri Sep 5 02:38:56 2003 From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser) Date: Fri, 5 Sep 2003 07:38:56 +0100 (BST) Subject: [BiO BB] orthologs vs in-paralogs In-Reply-To: <20030904232029.18724.qmail@web20704.mail.yahoo.com> References: <20030904232029.18724.qmail@web20704.mail.yahoo.com> Message-ID: <33048.80.4.6.223.1062743936.squirrel@www.mrc-dunn.cam.ac.uk> Phil Luo said: > Dear all, > > As we know ,there are two kinds of homolog, ortholog and paralog. Genes in two > species that have directly evolved from a single gene in the last common ancestor > are called orthologs. A set of homologous genes that have diverged from each other > as a consequence of genetic duplication are called paralogs. Sometime those > paralogs which arose from a duplication after the speciation event are called > in-paralogs. > > My question is how to distinguish the in-paralogs from orthologs. Which one is > supposed to be more similar, in-paralogs or orthologs? Hi, Good question! Maby someone on the sequence searching mailing list can help answer, http://bioinformatics.org/mailman/listinfo/ssml-general I know of some work trying to uncover 'lineage specific gene expansion' by Eugene Koonin (sp?) at the NCBI. That sounds a bit like the in-paralogues you describe. Also he and coworkers define an algorithm for predicting orthologous pairs, simply 'best hits' between genome 1 and 2. Although I understand the definition of orthology and paralogy, I find the concepts a bit confusing. I don't know what information you loose by simply talking about gene families, and ignoring the within / between genome distinction. At some level does't ortholog mean 'same gene', and paralog mean 'copy'? Cheers, > Best regards, > Phil > > > --------------------------------- > Do you Yahoo!? > Yahoo! 
SiteBuilder - Free, easy-to-use web site design software From hz5 at njit.edu Fri Sep 5 08:31:05 2003 From: hz5 at njit.edu (hz5 at njit.edu) Date: Fri, 05 Sep 2003 08:31:05 -0400 (EDT) Subject: [BiO BB] orthologs vs in-paralogs In-Reply-To: <33048.80.4.6.223.1062743936.squirrel@www.mrc-dunn.cam.ac.uk> References: <20030904232029.18724.qmail@web20704.mail.yahoo.com> <33048.80.4.6.223.1062743936.squirrel@www.mrc-dunn.cam.ac.uk> Message-ID: <1062765065.3f5882091fc3a@webmail.njit.edu> This is a good question, please drop me a note if you guys get some answers. For all I know, besides the homolog part, the key difference is that ortholog is same function in different species, while paralog is different function in same species. Please keep me posted! Thanks! haibo //cheers Quoting Dan Bolser : > Phil Luo said: > > Dear all, > > > > As we know ,there are two kinds of homolog, ortholog and paralog. > Genes in two > > species that have directly evolved from a single gene in the last > common ancestor > > are called orthologs. A set of homologous genes that have diverged > from each other > > as a consequence of genetic duplication are called paralogs. Sometime > those > > paralogs which arose from a duplication after the speciation event are > called > > in-paralogs. > > > > My question is how to distinguish the in-paralogs from orthologs. > Which one is > > supposed to be more similar, in-paralogs or orthologs? > > Hi, > Good question! Maby someone on the sequence searching mailing list can > help answer, > > http://bioinformatics.org/mailman/listinfo/ssml-general > > > I know of some work trying to uncover 'lineage specific gene expansion' > by Eugene > Koonin (sp?) at the NCBI. That sounds a bit like the in-paralogues you > describe. > Also he and coworkers define an algorithm for predicting orthologous > pairs, simply > 'best hits' between genome 1 and 2. > > Although I understand the definition of orthology and paralogy, I find > the concepts > a bit confusing. 
I don't know what information you loose by simply > talking about > gene families, and ignoring the within / between genome distinction. > > At some level does't ortholog mean 'same gene', and paralog mean > 'copy'? > > Cheers, > > > Best regards, > > Phil > > > > > > --------------------------------- > > Do you Yahoo!? > > Yahoo! SiteBuilder - Free, easy-to-use web site design software > > > > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > ========================================================= Haibo Zhang, PhD student Computational Biology, NJIT & Rutgers University Center for Applied Genomics, PHRI http://afs13.njit.edu/~hz5 From boris.steipe at utoronto.ca Fri Sep 5 08:48:15 2003 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Fri, 05 Sep 2003 08:48:15 -0400 Subject: [BiO BB] orthologs vs in-paralogs References: <20030904232029.18724.qmail@web20704.mail.yahoo.com> Message-ID: <3F58860E.7E591024@utoronto.ca> Phil Luo wrote: > > Dear all, > > As we know ,there are two kinds of homolog, ortholog and paralog. Genes in two > species that have directly evolved from a single gene in the last common > ancestor are called orthologs. A set of homologous genes that have diverged > from each other as a consequence of genetic duplication are called paralogs. > Sometime those paralogs which arose from a duplication after the speciation > event are called in-paralogs. > > My question is how to distinguish the in-paralogs from orthologs. Paralogs have different functions. biochemistry, not bioinformatics. > Which one is > supposed to be more similar, in-paralogs or orthologs? To the degree that the evolutionary rates remain the same, the difference will be proportional to the time of separation from the common ancestor. For orthologs this is the speciation event. For in-paralogs this is the duplication event. 
Accordingly you would expect a situation in which, paradoxically, a protein and its paralog would be more similar than that protein and its ortholog in another species. That need not be universally true however, since the divergence of an in-paralog (post-duplication) may occur under reduced selective pressure since the original protein still fulfills its function. Thus the evolutionary rates need not be the same. Accordingly: if you find paradoxical similarities as above, your best explanation will be "in-paralogs". But if you have a duplication event in a species (possible evidence could come from comparative genomics) you cannot necessarily conclude that the proteins must be unusually similar. In fact, looking for such events systematically and analysing divergence rates would make an interesting project to quantify the evolutionary pressure on genes after duplication events. Best regards, Boris --- Boris Steipe University of Toronto Program in Proteomics & Bioinformatics Departments of Biochemistry & Molecular and Medical Genetics http://biochemistry.utoronto.ca/steipe From dmb at mrc-dunn.cam.ac.uk Fri Sep 5 08:07:11 2003 From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser) Date: Fri, 5 Sep 2003 13:07:11 +0100 (BST) Subject: [BiO BB] orthologs vs in-paralogs In-Reply-To: <1062765065.3f5882091fc3a@webmail.njit.edu> References: <20030904232029.18724.qmail@web20704.mail.yahoo.com> <33048.80.4.6.223.1062743936.squirrel@www.mrc-dunn.cam.ac.uk> <1062765065.3f5882091fc3a@webmail.njit.edu> Message-ID: <37350.80.4.6.223.1062763631.squirrel@www.mrc-dunn.cam.ac.uk> > This is a good question, please drop me a note if you guys get some answers. > > For all I know, besides the homolog part, the key difference is that ortholog is > same function in different species, while paralog is different function in same > species. Yup, I know this definition, but it doesn't address the idea of redundancy within a genome. This is (apparently) a big issue for functional genomics.
Cheers > Please keep me posted! > Thanks! > haibo > //cheers > > Quoting Dan Bolser : > >> Phil Luo said: >> > Dear all, >> > >> > As we know ,there are two kinds of homolog, ortholog and paralog. >> Genes in two >> > species that have directly evolved from a single gene in the last >> common ancestor >> > are called orthologs. A set of homologous genes that have diverged >> from each other >> > as a consequence of genetic duplication are called paralogs. Sometime >> those >> > paralogs which arose from a duplication after the speciation event are >> called >> > in-paralogs. >> > >> > My question is how to distinguish the in-paralogs from orthologs. >> Which one is >> > supposed to be more similar, in-paralogs or orthologs? >> >> Hi, >> Good question! Maby someone on the sequence searching mailing list can help >> answer, >> >> http://bioinformatics.org/mailman/listinfo/ssml-general >> >> >> I know of some work trying to uncover 'lineage specific gene expansion' by >> Eugene >> Koonin (sp?) at the NCBI. That sounds a bit like the in-paralogues you describe. >> Also he and coworkers define an algorithm for predicting orthologous pairs, >> simply >> 'best hits' between genome 1 and 2. >> >> Although I understand the definition of orthology and paralogy, I find the >> concepts >> a bit confusing. I don't know what information you loose by simply talking about >> gene families, and ignoring the within / between genome distinction. >> >> At some level does't ortholog mean 'same gene', and paralog mean 'copy'? >> >> Cheers, >> >> > Best regards, >> > Phil >> > >> > >> > --------------------------------- >> > Do you Yahoo!? >> > Yahoo! 
SiteBuilder - Free, easy-to-use web site design software >> >> >> >> _______________________________________________ >> BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board >> > > > > ========================================================= > Haibo Zhang, PhD student > Computational Biology, NJIT & Rutgers University > Center for Applied Genomics, PHRI > http://afs13.njit.edu/~hz5 From dmb at mrc-dunn.cam.ac.uk Fri Sep 5 08:15:18 2003 From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser) Date: Fri, 5 Sep 2003 13:15:18 +0100 (BST) Subject: [BiO BB] orthologs vs in-paralogs In-Reply-To: <3F58860E.7E591024@utoronto.ca> References: <20030904232029.18724.qmail@web20704.mail.yahoo.com> <3F58860E.7E591024@utoronto.ca> Message-ID: <37453.80.4.6.223.1062764118.squirrel@www.mrc-dunn.cam.ac.uk> Boris Steipe said: > Phil Luo wrote: >> >> Dear all, >> >> As we know ,there are two kinds of homolog, ortholog and paralog. Genes in two >> species that have directly evolved from a single gene in the last common >> ancestor are called orthologs. A set of homologous genes that have diverged from >> each other as a consequence of genetic duplication are called paralogs. Sometime >> those paralogs which arose from a duplication after the speciation event are >> called in-paralogs. >> >> My question is how to distinguish the in-paralogs from orthologs. > > Paralogs have different functions. biochemistry, not bioinformatics. How about redundant functions? >> Which one is >> supposed to be more similar, in-paralogs or orthologs? > > > To the degree that the evolutionary rates remain the same, the difference will be > proportional to the time of separation from the common ancestor. For orthologs > this is the speciation event. For in-paralogs this is the duplication event. 
> Accordingly you would expect a situation in which pradoxically a protein and it's > paralog would be more similar than that protein and its ortholog in another > species. I think this should be common in 'lineage specific gene expansion' re: Eugene Koonin. > That need not be universally true however, since the divergence of an in-paralog > (post-duplication) may occur under reduced selective pressure since the original > protein still fulfills its function. Thus the evolutionary rates need not be the > same. This has been argued to be one of the major driving forces for evolution (gene function innovation). I am sorry but I can't remember the reference for this concept. > Accordingly: if you find paradoxical similarities as above, your best explanation > will be "in-paralogs". But if you have a duplication event in a species (possible > evidence could come from comparative genomics) you cannot necessarily conclude > that the proteins must be unusually similar. In fact, looking for such events > systematically and analysing divergence rates, would make an interesting project > to quantify the evolutionary pressure on genes after duplication events. This has been the focus of study for researchers such as Andras Wagner, looking at the innovation of interaction partners after duplication events using the interactomes of different yeast strains and worm. Eugene Koonin et al have also looked at this systematically, but I think a general (conceptually clean) framework for this analysis would be a major step forward. Cheers. 
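For anyone who wants to play with the "best hits between genome 1 and 2" idea mentioned earlier in this thread, here is a minimal Python sketch of reciprocal best hits. It is only an illustration under stated assumptions: the gene names and scores are made up, the function names are hypothetical, and this is not claimed to be the exact procedure used by Koonin and coworkers.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score,
    e.g. a BLAST bit score. On ties the first pair seen wins.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores: b1 and b2 stand for a duplicated pair in genome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 180, ("a2", "b3"): 150}
b_vs_a = {("b1", "a1"): 200, ("b2", "a1"): 180, ("b3", "a2"): 150}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy data b2 also hits a1 best, but a1 prefers b1, so b2 is left unpaired. A gene left over like that is a candidate in-paralog of b1, although, as discussed above, unequal evolutionary rates can make similarity alone misleading.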
> > > > Best regards, > > > Boris > > --- > Boris Steipe > University of Toronto > Program in Proteomics & Bioinformatics > Departments of Biochemistry & Molecular and Medical Genetics > http://biochemistry.utoronto.ca/steipe > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From shenhav at wicc.weizmann.ac.il Thu Sep 4 06:07:04 2003 From: shenhav at wicc.weizmann.ac.il (Barak Shenhav) Date: Thu, 4 Sep 2003 12:07:04 +0200 Subject: [BiO BB] Swissprot to Refseq entry mapping Message-ID: <3F55F6AA@wiccweb> Try GeneCards from Weizmann Institute of Science (http://bioinformatics.weizmann.ac.il/genecards/) >===== Original Message From bio_bulletin_board at bioinformatics.org ===== >hi all > >I would like to know if there is any file which can >map human proteins of swissprot to Locuslink or Refseq >entries of NCBI or is there any round about way to >achieve that. > >Thank you > >Regards >M.K.Govind > >__________________________________ >Do you Yahoo!? >Yahoo! 
SiteBuilder - Free, easy-to-use web site design software >http://sitebuilder.yahoo.com >_______________________________________________ >BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/bio_bulletin_board ==================================== Barak Shenhav Department of Molecular Genetics Weizmann Institute of Science +972 8 9343098 (office) +972 8 9344487 (fax) +972 5 2955550 (cellular) From charles at moulinette.dyndns.org Thu Sep 4 09:31:53 2003 From: charles at moulinette.dyndns.org (Charles Plessy) Date: Thu, 4 Sep 2003 15:31:53 +0200 Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation In-Reply-To: References: <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu> Message-ID: <20030904133153.GC3999@plessy.org> On Thu, Sep 04, 2003 at 02:03:04PM +0200, Yannick Wurm wrote: > Hi, > According to the graph they show, using a word length of 40, the Dual > G5 ran 3 million nucleotides per second whereas the Linux boxes did > only about 0.75. It would be intersting in addition to compare OSX on G5 to Linux on G5... -- Charles Plessy From tfiedler at rsmas.miami.edu Thu Sep 4 17:12:39 2003 From: tfiedler at rsmas.miami.edu (Tristan Fiedler) Date: Thu, 4 Sep 2003 17:12:39 -0400 (EDT) Subject: [BiO BB] Unigene FormatDB error on Mac OS X In-Reply-To: <20030904160136.DEB43D284F@www.bioinformatics.org> References: <20030904160136.DEB43D284F@www.bioinformatics.org> Message-ID: <50701.129.171.111.5.1062709959.squirrel@domino.rsmas.miami.edu> Thanks all for the hardware tips! I am currently setting up a blast run of marine sequences against Ciona intestinalis from Unigene. A couple of bugs, which maybe someone has overcome : 1. In the file which I downloaded 'File: Cin.seq.uniq.Z', the various entries in this FASTA format file have headers such as : >gnl|UG|Cin#S6667694 Ciona intestinalis cDNA, clone:cits020m24, full insert sequence. 
/gb=AK117037 /gi=23589844 /ug=Cin.3 /len=12 14 ATCAGATTAAAACATCGTCCATCGTTAGAGTTTATAATTTACATGTTTGAAAAAGTTTAA AATGCCTTCAAATAAACCAATTGTTAAGGATATCCCAAGAAAATGTGGCGTTCCTAGAGA A How can I get a more descriptive/functional definition of the various unigene clusters? 2. I used the 'formatdb -i Cin.seq.uniq -p F -o T -n unigene_ciona' command. Is this correct? In the formatdb logfile, how can the following errors be corrected (if necessary) : /Users/tfiedler/Desktop/blast.darwin/UNIGENE_DOWNLOAD% more formatdb.log ========================[ Sep 4, 2003 4:53 PM ]======================== Version 2.2.6 [Apr-09-2003] Started database file "Cin.seq.uniq" NOTE: CoreLib [002.003] FileOpen("/Users/tfiedler/Library/Preferences/formatdb.cnf","r") failed NOTE: CoreLib [002.003] FileOpen("/Users/tfiedler/Desktop/blast.darwin/Resources/formatdb.cnf","r") failed NOTE: CoreLib [002.003] FileOpen(".formatdbrc","r") failed NOTE: CoreLib [002.003] FileOpen("/Users/tfiedler/.formatdbrc","r") failed NOTE: [000.000] No number of link bits used found in config file. Ignoring NOTE: [000.000] No number of membership bits used found in config file. Ignoring Formatted 13699 sequences in volume 0 Thank you all very much for the assistance!!! Cheers, Tristan Fiedler From lsrubin at wicc.weizmann.ac.il Sun Sep 7 20:19:57 2003 From: lsrubin at wicc.weizmann.ac.il (Eitan Rubin) Date: Mon, 8 Sep 2003 02:19:57 +0200 Subject: [BiO BB] Using blast to compare genomes: a warning Message-ID: <3F56C9C2@wiccweb> Hi, I used BLAST to compare genomes some time ago, and got lots of short, poor, statistiaclly significant similarities. I than met Altchul in a conferance, and asked him how to tell real similarities from chance ones. He said: "don't use BLAST for DNA.
We did a poor job on the statistics for DNA - who thought people will actually use it?". When I asked what to do he said "use FASTA - bill did a better job with DNA". Eitan >Message: 3 >Date: Thu, 4 Sep 2003 17:12:39 -0400 (EDT) >From: "Tristan Fiedler" >To: bio_bulletin_board at bioinformatics.org >Cc: bio_bulletin_board at bioinformatics.org >Subject: [BiO BB] Unigene FormatDB error on Mac OS X >Reply-To: bio_bulletin_board at bioinformatics.org > >Thanks all for the hardware tips! > >I am currently setting up a blast run of marine sequences against Ciona >intestinalis from Unigene. A couple of bugs, which maybe someone has >overcome : > > > >1. In the file which I downloaded 'File: Cin.seq.uniq.Z', the various >entries in this FASTA format file have headers such as : > >>gnl|UG|Cin#S6667694 Ciona intestinalis cDNA, clone:cits020m24, full >insert sequence. /gb=AK117037 /gi=23589844 /ug=Cin.3 /len=12 >14 >ATCAGATTAAAACATCGTCCATCGTTAGAGTTTATAATTTACATGTTTGAAAAAGTTTAA >AATGCCTTCAAATAAACCAATTGTTAAGGATATCCCAAGAAAATGTGGCGTTCCTAGAGA >A > > >How can I get a more descriptive/functional definition of the various >unigene clusters? > >2. I used the 'formatdb -i Cin.seq.uniq -p F -o T -n unigene_ciona' >command. Is this correct? In the formatdb logfile, how can the following >errors be corrected (if necessary) : > >/Users/tfiedler/Desktop/blast.darwin/UNIGENE_DOWNLOAD% more formatdb.log > >========================[ Sep 4, 2003 4:53 PM ]======================== >Version 2.2.6 [Apr-09-2003] >Started database file "Cin.seq.uniq" >NOTE: CoreLib [002.003] >FileOpen("/Users/tfiedler/Library/Preferences/formatdb.cnf","r") failed >NOTE: CoreLib [002.003] >FileOpen("/Users/tfiedler/Desktop/blast.darwin/Resources/formatdb.cnf","r") >failed >NOTE: CoreLib [002.003] FileOpen(".formatdbrc","r") failed >NOTE: CoreLib [002.003] FileOpen("/Users/tfiedler/.formatdbrc","r") failed >NOTE: [000.000] No number of link bits used found in config file. 
Ignoring >NOTE: [000.000] No number of membership bits used found in config file. >Ignoring >Formatted 13699 sequences in volume 0 > >Thank you all very much for the assistance!!! > >Cheers, > >Tristan Fiedler > >--__--__-- > >_______________________________________________ >BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > >End of BiO_Bulletin_Board Digest From psijyoti at yahoo.com Mon Sep 8 14:06:17 2003 From: psijyoti at yahoo.com (Jyoti Kapoor) Date: Mon, 8 Sep 2003 11:06:17 -0700 (PDT) Subject: [BiO BB] Queries regarding the certificate course in bioinformatics Message-ID: <20030908180618.48336.qmail@web13207.mail.yahoo.com> I am an Oracle database developer but have been looking for a job since quite some now but in vain. Hence, I have been thinking about changing my career-path. I am considering getting into bioinformatics or biotechnology and need some help with a few queries: 1. which is a better career option - bioinformatics or biotechnology? 2. what type of career options are available for each of the above fields? 3. what kind of companies recruit biotech. or bioinfo. professionals? 4.
considering the downward trend in the economy, what are the job opportunities in the Bay Area if I take either the bioinformatics or the biotechnology certificate course?
5. Will a certificate course with my background help me get a job (in the Bay Area)?
6. What is the salary range for an entry-level bioinformatics/biotech professional?

Thanks in advance for all your time.

Jyoti

From priyaa_b at yahoo.com Tue Sep 9 08:25:37 2003
From: priyaa_b at yahoo.com (Priya)
Date: Tue, 9 Sep 2003 05:25:37 -0700 (PDT)
Subject: [BiO BB] Re: [inbios] ?: paralogy group
In-Reply-To: <20030909101828.41441.qmail@web14901.mail.yahoo.com>
Message-ID: <20030909122537.18584.qmail@web41202.mail.yahoo.com>

Hi!

It would be helpful if someone could tell me the definition and meaning of "paralogy group" in the context of genes and proteins.

Thanks.

priya

From MEC at Stowers-Institute.org Tue Sep 9 10:16:31 2003
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Tue, 9 Sep 2003 09:16:31 -0500
Subject: [BiO BB] QBlast implementing
Message-ID:

The implementation of the NCBI QBlast server is NOT publicly available; however, the URL API which it implements is very well documented ( http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html )

I am considering implementing a blast server to support this API.
1) has anyone done this already who can share either code, tips, or recommend approaches 2) if there are other parties who might be interested in the results of the effort Malcolm Cook Database Applications Manager Stowers Institute for Medical Research 1000 E 50th Street Kansas City, MO 64110 tel: 816-926-4449 fax: (816) 926-2098 From landman at scalableinformatics.com Tue Sep 9 10:55:14 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 9 Sep 2003 10:55:14 -0400 (EDT) Subject: [BiO BB] QBlast implementing In-Reply-To: Message-ID: Hi Malcom: I am working on some similar things for one of my company's products. Contact me offlist if you would like to discuss. Joe On Tue, 9 Sep 2003, Cook, Malcolm wrote: > > Implementation of the NCBI QBlast server is NOT publicly available, however the URL API which it implements is very well documented ( http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html ) > > I am considering implementing a blast server to support this API. > > 1) has anyone done this already who can share either code, tips, or recommend approaches > > 2) if there are other parties who might be interested in the results of the effort > > Malcolm Cook > Database Applications Manager > Stowers Institute for Medical Research > 1000 E 50th Street > Kansas City, MO 64110 > tel: 816-926-4449 > fax: (816) 926-2098 > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > From hjm at tacgi.com Tue Sep 9 12:13:26 2003 From: hjm at tacgi.com (Harry Mangalam) Date: Tue, 09 Sep 2003 09:13:26 -0700 Subject: [BiO BB] BLASTing SCO (re: Linux IP, IBM suite, shred algorithm, etc) In-Reply-To: References: Message-ID: <3F5DFC26.4060703@tacgi.com> I've been watching the SCO vs IBM suit with some interest and this piqued my interest. 
Eric Raymond has apparently reworked some old 'shred' code which calculates MD5 hashes for long (from the molbio perspective) words (~3 lines at a time) and then sorts the hashes to identify sections of the Linux source code tree which are identical to those from the SCO-owned System V Unix code base. This sounds a bit like the initial pass for BLAT, which generates hashes for much smaller words and uses the hashes in comparisons.

http://www.eweek.com/article2/0,4149,1257617,00.asp

Could BLAST not be used to identify, faster and much more sensitively, not only identical but also similar sections of code? It would have to be modified to do an 'all against all' approach and would also have to take into account line numbers and file names, but here's a good undergrad programming project for someone, with the possibility of getting some good press and creating a tool that will undoubtedly be used again in litigation (read: it could be worth real money).

Then again, Raymond's shred code approach is probably good enough.

Comments?

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hjm at tacgi.com

From jhaveri at usc.edu Tue Sep 9 15:03:45 2003
From: jhaveri at usc.edu (jinal jhaveri)
Date: Tue, 09 Sep 2003 12:03:45 -0700
Subject: [BiO BB] NCBI Viewer
Message-ID: <819d6e818c44.818c44819d6e@usc.edu>

Hi there,

I am developing a zoom viewer for chromosomes. Can anyone give tips on available software to use for that? I want this zoom viewer to be online, the same as the one NCBI has (http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?org=arabid&chr=I), i.e. the Entrez one.
thank you --Jinal From tfiedler at rsmas.miami.edu Tue Sep 9 17:00:55 2003 From: tfiedler at rsmas.miami.edu (Tristan Fiedler) Date: Tue, 9 Sep 2003 17:00:55 -0400 (EDT) Subject: [BiO BB] Poly A tail length - script help please In-Reply-To: <20030904160136.DEB43D284F@www.bioinformatics.org> References: <20030904160136.DEB43D284F@www.bioinformatics.org> Message-ID: <52557.129.171.111.5.1063141255.squirrel@domino.rsmas.miami.edu> Thanks for the scripting tips! I have a 'counting' issue which I need to quickly resolve. A typical sequence input file (5 - 700 bases) looks like : AGTAGTCGATCATNATANCTANTACNACTACTAACTATGCTAGNNAATATAAAAAAAAANAAA I have over 500 files, named *.seq. I would like to create a script which : a. runs through all the files, b. counts the length of the 'poly A' tail (defined as the longest stretch of A or N) c. sends the output to a file, eg. 25 1.seq 87 2.seq 13 3.seq Example valid poly A tails : AAAANANANANAAANNAAAAAA AAAAAAAAAAAAAA NNNNNNNNNNNNN AAANNNNNNNNNNNAAAAAAAAA Thank you so much for your expertise! Tristan -- Tristan J. Fiedler, Ph.D. Postdoctoral Research Fellow NIEHS Marine & Freshwater Biomedical Sciences Center Rosenstiel School of Marine & Atmospheric Sciences University of Miami tfiedler at rsmas.miami.edu t.fiedler at umiami.edu (alias) 305-361-4626 From landman at scalableinformatics.com Tue Sep 9 19:57:34 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 09 Sep 2003 19:57:34 -0400 Subject: [BiO BB] Poly A tail length - script help please In-Reply-To: <52557.129.171.111.5.1063141255.squirrel@domino.rsmas.miami.edu> References: <20030904160136.DEB43D284F@www.bioinformatics.org> <52557.129.171.111.5.1063141255.squirrel@domino.rsmas.miami.edu> Message-ID: <1063151854.10843.144.camel@protein.scalableinformatics.com> First one is free ... 

#!/usr/bin/perl

use strict;

my ($directory,$directory_handle,$file,@files,$sequence);
my ($file_handle,$poly_a_tail,$rseq);

$directory = "./"; # directory to open
if (!(opendir $directory_handle,$directory))
   {
     die "FATAL ERROR: Unable to open directory = ".$directory."\n";
   }

# select only the .seq files
@files = grep { /\.seq$/ } readdir($directory_handle);

# loop over these selected files
foreach $file (@files)
   {
     # try to open the file
     if (!(open($file_handle,"< ".$file)))
        {
          # if we cannot open it, warn the user, and skip to the next file
          warn "Warning: unable to open file = ".$file.". Skipping.\n";
          next;
        }
       else
        {
          # assume one line per file, or we will have to modify this
          chomp($sequence=<$file_handle>);
          # now time to bring out the heavy artillery
          $rseq=reverse $sequence;   # poly-a is now at the head
          # match A's and/or N's at the front; [AN]* always matches, so a
          # sequence that is all A/N, or has no tail, still gets a correct count
          $rseq =~ /^([AN]*)/;
          $poly_a_tail = $1;         # return the match ...
          printf "%i %s\n",length($poly_a_tail),$file; # tell the world ...
          close($file_handle);
        }
   }

On Tue, 2003-09-09 at 17:00, Tristan Fiedler wrote:
> Thanks for the scripting tips! I have a 'counting' issue which I need to
> quickly resolve. A typical sequence input file (5 - 700 bases) looks like
> :
>
> AGTAGTCGATCATNATANCTANTACNACTACTAACTATGCTAGNNAATATAAAAAAAAANAAA
>
> I have over 500 files, named *.seq. I would like to create a script which :
>
> a. runs through all the files,
> b. counts the length of the 'poly A' tail (defined as the longest stretch
> of A or N)
> c. sends the output to a file, eg.
>
> 25 1.seq
> 87 2.seq
> 13 3.seq
>
> Example valid poly A tails :
>
> AAAANANANANAAANNAAAAAA
>
> AAAAAAAAAAAAAA
>
> NNNNNNNNNNNNN
>
> AAANNNNNNNNNNNAAAAAAAAA
>
> Thank you so much for your expertise!
> > Tristan -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 From MEC at Stowers-Institute.org Wed Sep 10 16:12:16 2003 From: MEC at Stowers-Institute.org (Cook, Malcolm) Date: Wed, 10 Sep 2003 15:12:16 -0500 Subject: [BiO BB] Poly A tail length - script help please Message-ID: But that does not compute the 'longest stretch'. The attached perl script does, and will allow you to write: > polyfind [-all] *.seq > polyfind.results Enjoy, Malcolm Cook > -----Original Message----- > From: Joseph Landman [mailto:landman at scalableinformatics.com] > Sent: Tuesday, September 09, 2003 6:58 PM > To: BiO BB > Cc: biodevelopers > Subject: Re: [BiO BB] Poly A tail length - script help please > > > First one is free ... > > #!/usr/bin/perl > > use strict; > > my ($directory,$directory_handle,$file, at files,$sequence); > my ($file_handle,$poly_a_tail,$rseq); > > $directory = "./"; # directory to open > if (!(opendir $directory_handle,$directory)) > { > die "FATAL ERROR: Unable to open directory = > ".$directory."\n"; > } > > # select only the .seq files > @files = grep { /\.seq$/ } readdir($directory_handle); > > # loop over these selected files > foreach $file (@files) > { > # try to open the file > if (!(open($file_handle,"< ".$file))) > { > # if we cannot open it, warn the user, and > skip to the next file > warn "Warning: unable to open file = > ".$file."\. Skipping\.\n"; > next; > } > else > { > # assume one line per file, or we will have > to modify this > chomp($sequence=<$file_handle>); > # now time to bring out the heavy artillery > $rseq=reverse $sequence; # poly-a is now > at the head > $rseq =~ /^([AN]+)\w+$/; # match A's > and/or N's at the front > $poly_a_tail = $1; # return the match ... > printf "%i %s\n",length($poly_a_tail),$file; > # tell the world ... 
> close($file_handle); > } > } > > > > On Tue, 2003-09-09 at 17:00, Tristan Fiedler wrote: > > Thanks for the scripting tips! I have a 'counting' issue > which I need to > > quickly resolve. A typical sequence input file (5 - 700 > bases) looks like > > : > > > > AGTAGTCGATCATNATANCTANTACNACTACTAACTATGCTAGNNAATATAAAAAAAAANAAA > > > > I have over 500 files, named *.seq. I would like to create > a script which : > > > > a. runs through all the files, > > b. counts the length of the 'poly A' tail (defined as the > longest stretch > > of A or N) > > c. sends the output to a file, eg. > > > > 25 1.seq > > 87 2.seq > > 13 3.seq > > > > Example valid poly A tails : > > > > AAAANANANANAAANNAAAAAA > > > > AAAAAAAAAAAAAA > > > > NNNNNNNNNNNNN > > > > AAANNNNNNNNNNNAAAAAAAAA > > > > Thank you so much for your expertise! > > > > Tristan > -- > Joseph Landman, Ph.D > Scalable Informatics LLC > email: landman at scalableinformatics.com > web: http://scalableinformatics.com > phone: +1 734 612 4615 > > > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -------------- next part -------------- A non-text attachment was scrubbed... Name: polyafind Type: application/octet-stream Size: 3438 bytes Desc: polyafind URL: From landman at scalableinformatics.com Wed Sep 10 04:23:51 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 10 Sep 2003 04:23:51 -0400 Subject: [Biodevelopers] RE: [BiO BB] Poly A tail length - script help please In-Reply-To: References: Message-ID: <1063182231.5267.1.camel@squash.scalableinformatics.com> Malcom Good catch. I paid attention to the tail part, not the longest sequence part. Should be easy to modify the regex, and generate an length sorted array of matches, but as you have already solved the (correct) problem ... 
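
For the 'longest stretch' reading of the problem, here is a minimal sketch of what "modify the regex" could look like: collect every unbroken run of A or N and keep the longest. It is written in Python rather than Perl, and it is NOT the polyafind attachment (that was scrubbed from the archive). The names longest_an_run and report are made up for illustration, and it assumes, as the first script does, one sequence on the first line of each file.

```python
import re
import sys
from pathlib import Path


def longest_an_run(sequence: str) -> int:
    """Length of the longest unbroken stretch of A or N anywhere in the sequence."""
    # Every maximal run of A/N characters; case is folded so 'a' and 'n' count too.
    runs = re.findall(r"[AN]+", sequence.upper())
    return max((len(run) for run in runs), default=0)


def report(paths):
    """Yield one '<length> <file name>' line per file (first line of each file only)."""
    for path in paths:
        with open(path) as handle:
            first_line = handle.readline().strip()
        yield f"{longest_an_run(first_line)} {Path(path).name}"


if __name__ == "__main__":
    # Assumption: with no arguments, use every *.seq file in the current directory.
    targets = sys.argv[1:] or sorted(Path(".").glob("*.seq"))
    for line in report(targets):
        print(line)
```

Saved under a hypothetical name such as longest_run.py, something like "python longest_run.py *.seq > results.txt" should give the "25 1.seq" style of output asked for. Note that this counts the longest run wherever it sits in the read, not only at the 3' end, so it can differ from the tail-only count.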
Joe On Wed, 2003-09-10 at 16:12, Cook, Malcolm wrote: > But that does not compute the 'longest stretch'. > > The attached perl script does, and will allow you to write: > > > polyfind [-all] *.seq > polyfind.results > > Enjoy, > > Malcolm Cook > > > -----Original Message----- > > From: Joseph Landman [mailto:landman at scalableinformatics.com] > > Sent: Tuesday, September 09, 2003 6:58 PM > > To: BiO BB > > Cc: biodevelopers > > Subject: Re: [BiO BB] Poly A tail length - script help please > > > > > > First one is free ... > > > > #!/usr/bin/perl > > > > use strict; > > > > my ($directory,$directory_handle,$file, at files,$sequence); > > my ($file_handle,$poly_a_tail,$rseq); > > > > $directory = "./"; # directory to open > > if (!(opendir $directory_handle,$directory)) > > { > > die "FATAL ERROR: Unable to open directory = > > ".$directory."\n"; > > } > > > > # select only the .seq files > > @files = grep { /\.seq$/ } readdir($directory_handle); > > > > # loop over these selected files > > foreach $file (@files) > > { > > # try to open the file > > if (!(open($file_handle,"< ".$file))) > > { > > # if we cannot open it, warn the user, and > > skip to the next file > > warn "Warning: unable to open file = > > ".$file."\. Skipping\.\n"; > > next; > > } > > else > > { > > # assume one line per file, or we will have > > to modify this > > chomp($sequence=<$file_handle>); > > # now time to bring out the heavy artillery > > $rseq=reverse $sequence; # poly-a is now > > at the head > > $rseq =~ /^([AN]+)\w+$/; # match A's > > and/or N's at the front > > $poly_a_tail = $1; # return the match ... > > printf "%i %s\n",length($poly_a_tail),$file; > > # tell the world ... > > close($file_handle); > > } > > } > > > > > > > > On Tue, 2003-09-09 at 17:00, Tristan Fiedler wrote: > > > Thanks for the scripting tips! I have a 'counting' issue > > which I need to > > > quickly resolve. 
A typical sequence input file (5 - 700 > > bases) looks like > > > : > > > > > > AGTAGTCGATCATNATANCTANTACNACTACTAACTATGCTAGNNAATATAAAAAAAAANAAA > > > > > > I have over 500 files, named *.seq. I would like to create > > a script which : > > > > > > a. runs through all the files, > > > b. counts the length of the 'poly A' tail (defined as the > > longest stretch > > > of A or N) > > > c. sends the output to a file, eg. > > > > > > 25 1.seq > > > 87 2.seq > > > 13 3.seq > > > > > > Example valid poly A tails : > > > > > > AAAANANANANAAANNAAAAAA > > > > > > AAAAAAAAAAAAAA > > > > > > NNNNNNNNNNNNN > > > > > > AAANNNNNNNNNNNAAAAAAAAA > > > > > > Thank you so much for your expertise! > > > > > > Tristan > > -- > > Joseph Landman, Ph.D > > Scalable Informatics LLC > > email: landman at scalableinformatics.com > > web: http://scalableinformatics.com > > phone: +1 734 612 4615 > > > > > > _______________________________________________ > > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 From luo_2005 at yahoo.com Thu Sep 11 17:09:55 2003 From: luo_2005 at yahoo.com (Phil Luo) Date: Thu, 11 Sep 2003 14:09:55 -0700 (PDT) Subject: [BiO BB] About Homolog Gene Database Message-ID: <20030911210955.64091.qmail@web60205.mail.yahoo.com> Dear all, I know there are some database of Homologous Gene,such as COG ,HOBACGEN etc. But they are all based on the computational method prediction. I was wondering where I can find the homolog information, whcih are more accurate and based on biological experiment. Because I wanna do homolog analysis, I need some real data, especiallly for those popular genome, like E.coli. You help is really appreciated. Best Regards, Phil --------------------------------- Do you Yahoo!? Yahoo! 
SiteBuilder - Free, easy-to-use web site design software -------------- next part -------------- An HTML attachment was scrubbed... URL: From mourad12345678 at yahoo.com Wed Sep 10 11:14:48 2003 From: mourad12345678 at yahoo.com (Mourad Elloumi) Date: Wed, 10 Sep 2003 08:14:48 -0700 (PDT) Subject: [BiO BB] 4th IEEE Symposium on Bioinformatics and Bioengineering Message-ID: <20030910151448.16753.qmail@web12301.mail.yahoo.com> CALL FOR PAPERS Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'04) May 19-21, 2004, Taichung, Taiwan, ROC http://bibe2004.ece.uci.edu/ Sponsored by the IEEE Computer Society Co-sponsored by IEEE Neural Network Society The BIBE Symposium provides a common platform for the cross fertilization of ideas, and to help shape knowledge and scientific achievements by bridging these two very important and complementary disciplines into an interactive and attractive forum. Keeping this objective in mind, BIBE solicits original contributions in the following non-exclusive list of areas: Biomedical Informatics and Computation Bio-molecular and Phylogenetic Databases, Query Languages, Interoperability, Bio-Ontology and Data Mining, System Biology, Identification and Classification of Genes, Sequence Search and Alignment, Protein Structure Prediction and Molecular Simulation, Molecular Evolution and Phylogeny, Functional Genomics, Proteomics, Drug Discovery Gene Expression Analysis, Biolanguages, Bioinformatics Engineering, Data Visualization, Signaling and Computation Biomedical Data Engineering, Medical Image Processing (Segmentation, Registration, Fusion), Telemedicine, Modeling and Simulation, Biomedical Imaging Bio-Engineering Biological Systems and Models, Engineering Models in Biomedicine, Biomedical Sensors, Computer Assisted Intervention Systems and Robotics, Bionic Human, Cell Engineering, Molecular and Cellular Systems, Body's and Cell's Bio-signatures, Tissue Engineering, Biomaterials Paper Submissions The written and spoken 
language of BIBE2004 is English. Authors should submit a full paper via electronic submission . Full papers must not exceed 20 pages printed using at least 11-point type and double spacing. All papers should in PDF or PostScript format. The paper should include a 200-word abstract, a list of keywords (research areas), and author's phone number and e-mail address. The Conference Proceedings will be published by the IEEE Computer Society Press. A number of the papers presented at the conference will be selected after review process for a possible publication in Int. Journal on Bioinformatics Engineering (IJBE) and Int. Journal on AI Tools (IJAIT) journals. Important Dates January 15, 2004 Submission deadline February 28, 2004 Notification of acceptance March 20, 2004 Camera-ready copy of accepted papers and author registration due General Co-Chairs Satoru Miyano, University of Tokyo, Jeffrey J. P. Tsai, University of Illinois, Chicago, C. Y. Kao, Natl. Taiwan Univ. Program Chair Phillip C.Y. Sheu, University of California, Irvine Program Co-Chairs & Committee Members Ruediger W. Brause, JW Goethe-Univ., Germany, Du Zhang, CSU, Sacramento, George Karypis, Univ. of Minnesota, Phoebe Chen Queensland Univ. Technology, Australia, Ming-Jing Hwang, Academia Sinica, Chuan-Yi Tang, Natl. Tsing Hua Univ., Taiwan C. S. Wang, TARI, Jorng-Tzong Horng, Natl. Central Univ., Fong-Rong Hsu, Taichung Healthcare and Management University, Taiwan Yuhwa Lo, UC San Diego, Limsoon Wong, Institute for Infocomm Research, Singapore, Cheng Li, Harvard. Metin Akay, Dartmouth Sang Yup Lee, KAIST, Korea, Arvind Bansal, Kent State Univ., Tao Jiang, UC Riverside, Philip E. Bourne, UC San Diego, Robert L. Sah, UC San Diego, Yuval Shahar, Ben-Gurion Univ. of the Negev and Stanford Univ., Kun-Mao Chao, Natl. Taiwan University, Wen-Lian Hsu Academia Sinica Arif Ghafoor, Purdue University, Huerta, Michael, Natl. Institutes of Health, James C. Gee, Univ. 
of Pennsylvania, Joerg Meyer, UC Irvine, Ghassan S. Kassab, UC Irvine, Jung-Hsin Lin, Natl. Taiwan University, Stefano Lonardi, UC Riverside, Shugo Nakamura, Univ. of Tokyo, Japan, Andrew F. Laine, Columbia, Huimin Zhao, Univ. of Illinois, Urbana Champaign, Ronney B Panerai, Leicester Royal Infirmary, UK, Jinn-Moon Yang, Natl. Chiao Tung Univ., Jan-Ming Ho, Academia Sinica, Maryellen L. Giger, Univ. of Chicago, Jim Brody, UC Irvine, Andrew McCulloch, UC San Diego, Yuh-Jyh Hu, Natl. Chiao Tung University, Taiwan, Mia K. Markey, Univ. of Texas at Austin, Chuan-Hsiung Chang, Natl. Yang-Ming University, Taiwan, Stanley M. Finkelstein, Univ. of Minnesota, Xia Shunren, Zhejiang Univ., China, Zina Ben Miled, IUPU Indianapolis, Vittorio Christini, UC Irvine, Wesley Chu, UCLA, Gary Huber, UC San Diego, Shankar Subramaniam, UC San Diego, William Tang, UC Irvine, Vasant Honavar, Iowa State University, Alfredo Colosimo, Univ. of Rome "La Sapienza", Italy, Victor Maojo, Univ. of Madrid, Spain, Gisbert Schneider, JW Goethe-Univ., Germany, Graham Kemp, Chalmers University of Technology, Sweden, Hakan Ferhatosmanoglu, Ohio State Univ., Tatsuya Akutsu, Kyoto Univ., Japan

Industrial Chair
Chung-Cheng Liu, Industrial Technology Research Institute, Taiwan

Publicity Chair
Stephen J.H. Yang, Natl. Central Univ., Taiwan

Publication Chair
Taehyung Wang, California State University, Northridge

Web Chair
Donghua Deng, University of California, Irvine

Financial Co-Chairs
Rong-Ming Chen and Shih-Nung Chen, Taichung Healthcare and Management University, Taiwan

Local Arrangement Co-Chairs
Han-Wen Hsiao and Anthony Y. H. Liao, Taichung Healthcare and Management University, Taiwan

Steering Committee Chair
Nikolaos Bourbakis, ITRI, Wright State University

From vkode78 at yahoo.com Fri Sep 12 01:07:05 2003
From: vkode78 at yahoo.com (Venu Kode)
Date: Thu, 11 Sep 2003 22:07:05 -0700 (PDT)
Subject: [BiO BB] Implementing HMM models in Hardware (FPGA)
Message-ID: <20030912050705.48283.qmail@web40508.mail.yahoo.com>

Hello everyone,

I am a graduate student in computer engineering doing my thesis on the following topic:

"Matching Protein sequences with HMM models in FPGAs (Field Programmable Gate Arrays) using Run Time Reconfiguration"

If you are familiar with the Decypher tool from TimeLogic, my work involves something similar to that. Decypher is a very comprehensive and expensive tool, and it certainly delivers excellent performance compared to software tools such as HMMER, SAM and others. My work falls in between the software solutions and Decypher.

Specifically, once developed, my tool would be considerably faster than the software tools currently available and would also be affordable at a fraction of the cost of using Decypher.

I have just started digging into bioinformatics and have read quite a number of papers, but I am still a little confused and would like any comments or suggestions from you:

1) Does my tool make any sense at all?
2) What is the current customer base like for this technology?
3) What sort of companies do the work of matching protein/DNA sequences with existing models?
4) Is there a need for a less comprehensive and less expensive tool than Decypher, for customers who want to get the job done a lot cheaper and wouldn't mind the performance penalty (which would of course still be much better than software searching)?
5) Any comments, questions, suggestions?
6) Any pointers for me in terms of websites or resources?

I would very much appreciate your comments.

TIA
Kode

---------------------------------
Do you Yahoo!? Yahoo!
SiteBuilder - Free, easy-to-use web site design software

From prathibha_562 at yahoo.co.in Fri Sep 12 09:51:22 2003
From: prathibha_562 at yahoo.co.in (prathibha bharathi)
Date: Fri, 12 Sep 2003 14:51:22 +0100 (BST)
Subject: [BiO BB] i am a graduate student requesting help for my project work on protein sequence analysis
In-Reply-To: <20030912050705.48283.qmail@web40508.mail.yahoo.com>
Message-ID: <20030912135122.24608.qmail@web8102.in.yahoo.com>

Hi all, and my sincere wishes to the seniors and experts in this very interesting field.

I am a graduate student doing my final year B.Tech in computer science at Sree Vidyaniketan Engg College. For my project work (in the final semester) I have chosen this field and selected "protein sequence analysis" as the title of my project.

In this project I am planning to build a simple prototype database of protein sequences, and I am writing my own tool for the sequence analysis (a simple but useful one). In this tool I am planning to use a dynamic programming method for finding the related proteins, and I will also plot a dot matrix showing the similarity, along with information about the proteins, for sequences having more than 20% sequence similarity (i.e. above the twilight zone).

After that I am planning to perform phylogenetic analysis on the result dataset of my own tool (for this I have downloaded PHYLIP), and then to do protein structure prediction, for which I am planning to use RASMOL. For converting one database format to another I am going to use READSEQ.

So I am requesting you all to please help me in this project work and send me your valuable suggestions and opinions about my project. I will be eagerly waiting for your replies, and I hope I can get a response to my message as soon as possible.
I have chosen Java for implementing all of these ideas. Please also suggest which database model (relational or object-oriented) I should use for building my simple database, or whether I can do all of this simply by using files.

I sincerely pray to God to help me and to be with me throughout this project work and forever.

Thank you, one and all,
prathibha

From mgollery at unr.edu Fri Sep 12 13:14:36 2003
From: mgollery at unr.edu (Martin Gollery)
Date: Fri, 12 Sep 2003 10:14:36 -0700
Subject: [BiO BB] Implementing HMM models in Hardware (FPGA)
In-Reply-To: <20030912050705.48283.qmail@web40508.mail.yahoo.com>
References: <20030912050705.48283.qmail@web40508.mail.yahoo.com>
Message-ID: <1063386876.3f61fefcc8c5a@webmail.unr.edu>

Hi Venu,

Yes, this idea makes sense. Anyone who uses InterProScan wishes it were faster! The current customer base includes all the big genome centers, universities, many biotechs and the larger pharmas. Other people have done projects in the field: Kestrel at UCSC has been around for years, though I don't know if it has been updated recently. Hokiegene out of Virginia is a newer system that shows great performance. There are several other commercial ventures that are in stealth mode right now.

HMMs really lend themselves well to acceleration of this type. Perhaps you could focus on accelerating SAM instead of HMMER, as many people prefer it and TimeLogic, Compugen and Paracel do not accelerate SAM. Also, nobody (to my knowledge) has accelerated ClustalW, so this might be something else to consider.

Best of luck! Sounds like an interesting project!

Marty

3.
Quoting Venu Kode : > Hello everyone, > > I am graduate student in computer engineering doing my thesis in the > following topic: > > "Matching Protein sequences with HMM models in FPGAs ( Field Programmable > Gate Arrays ) using Run Time Reconfiguration" > > If you are familiar with Decypher tool from TimeLogic, my work involves > something similar to that. Decypher is a very comprehensive and expensive > tool and sure it does deliver excellent performance compared to the software > tools such as HMMR, SAM or the other tools. My work falls in between the > software solution and that of Decypher. > > Specifically, when developed my tool would be considerably faster than that > of the software tools currently available and also be affordable at a > fraction of the cost of using Decypher. > > I have just started digging into Bioinformatics and have read quite a number > of papers and all, but I am still a little confused and would like any > comments or suggestions from you: > > 1) Does my tool make any sense at all? > 2) What is the current customer base like for this technology? > 3) What sort of companies do the work of matching protein/DNA sequences with > existing models? > 4) Is there a need for a less comprehensive and less expensive tool as > opposed to Decypher for customers who want to get it done a lot less cheaper > but wouldnt mind the extra penalty in performace ( ofcourse will be very much > better than that of the software searching) > 5) Any comments, questions, suggestions? > 6) Any pointers for me in terms of websites or resources. > > I would very much appreciate your comments > > TIA > Kode > > > > --------------------------------- > Do you Yahoo!? > Yahoo! SiteBuilder - Free, easy-to-use web site design software Martin Gollery Associate Director of Bioinformatics University of Nevada at Reno Dept. of Biochemistry / MS330 New phone number! 
775-784-7042
-------------------------------------------------
This mail sent through https://webmail.unr.edu

From Eitan.Rubin at weizmann.ac.il Sat Sep 13 18:23:21 2003
From: Eitan.Rubin at weizmann.ac.il (Eitan Rubin)
Date: Sun, 14 Sep 2003 00:23:21 +0200
Subject: [BiO BB] RE: Hardware implemntation of HMMs
References: <20030913160139.825B9D2610@www.bioinformatics.org>
Message-ID: <001101c37a45$a55d19a0$0101c80a@weizmannnhc4zn>

Hi,

As far as I know, Compugen sells hardware accelerators for HMMs (see http://www.cgen.com). Their website specifically mentions HMM acceleration.

Eitan
------------------------------------------------------------------------
Eitan Rubin, PhD
Head of Bioinformatics and Biological Computing
Dept. Biological Services
Weizmann Institute of Science
Tel: +972-8-9343456 Fax: +972-8-9346006

----- Original Message -----
Sent: Saturday, September 13, 2003 6:01 PM
Subject: BiO_Bulletin_Board digest, Vol 1 #521 - 1 msg

> Today's Topics:
>
> 1. Re: Implementing HMM models in Hardware (FPGA) (Martin Gollery)
> [...]

From johnjakson at yahoo.com Sun Sep 14 03:00:07 2003
From: johnjakson at yahoo.com (John Jakson)
Date: Sun, 14 Sep 2003 00:00:07 -0700 (PDT)
Subject: [BiO BB] Implementing HMM models in Hardware (FPGA)
In-Reply-To: <20030912050705.48283.qmail@web40508.mail.yahoo.com>
Message-ID: <20030914070007.4364.qmail@web13104.mail.yahoo.com>

Hi Venu

Interesting to see others interested in applying FPGAs to Bioinformatics. FPGAs don't get much mention here. I am not convinced the Bio industry really cares for EE solutions it doesn't understand. Linux clusters are bad enough but what the hell are FPGAs? As an EE VLSI/FPGA hardhat visitor at the BioWorld show, held here in Boston not that long ago, all I saw was disinterest and plenty of tower server racks. Not one HW company showed up with anything but Linux clusters or the SGI/IBM/HP/... equivalent. TimeLogic & the 1 or 2 other (defunct ?) accelerator companies were no-shows. On talking with the floor folks I found no interest or basic understanding of possible HW alternatives.

The issue comes down to how the problem is stated and how it can be implemented in a solution that most Bio SW types can understand.
That means whatever the engine is, it must just run C code, simple as that, preferably the free stuff from NCBI. That always leads to the same solution: clusters of ever faster & ever hotter farms of today's x86. Any rational computer scientist knows this is crazy, and that dedicated HW should be built. TimeLogic says it very well on their web site. In crypto, video or DSP processing, it is relatively easy to turn C code into HW since they are all math intensive and are likely created by the same EEs. It may come as a surprise to SW types, but HW is routinely modeled in C; that code is used only to double-check the design written in a decent HW description language like Verilog or VHDL, both of which are implicitly parallel languages. There is usually some formal mathematical model, often written in Matlab, for the really heavy stuff. It's also interesting that the Matlab code is usually floating-point intensive & the final ASIC/FPGA solutions are not expected to produce identical results, since HW is best built integer fashion. One might regard the current Bio C codes as just simulations of HW that hasn't been built yet, since few know how to recode them in a HW language. TimeLogic did a few, but not in a way that can be easily duplicated across the industry. To turn C code into really fast HW requires understanding what the C code is really doing and having permission to make subtle but harmless changes to it to allow the really big speed-ups. That means eliminating floating point. If the Bio author of such SW is also a HW expert (of which there are probably only a handful or even 0 in the whole world) then equivalent algorithms could be used that are relatively simple to map onto HW structures. I don't see the Bio world hiring too many HW EEs either; we are far too different culturally and we usually don't have PhDs, esp. not from the right schools. There are other ways to turn C code into HW, maybe use a C-based HW language such as HandelC, which is based on Occam & CSP.
And there's the clue. If the SW is broken up into the constituent parallel processes that are naturally there but impossible to describe in plain C, then it becomes almost trivial to map those parallel processes onto FPGA fabric or even something like a Transputer farm. The only difference is the granularity. FPGAs are hot today but can only readily be engineered by HW types because their most efficient use requires detailed understanding of pipelines, combinatorial logic and basic cpu design. Transputers, if they still existed, would be the natural way to go because they are amenable to both SW & HW engineers, but they still worked best when SW & HW were both understood. Occam was just a way to describe parallel processes that described HW in a funny syntax. Transputers only died out because the implementation fell far behind x86 performance and was single sourced & underfunded. Most Transputer projects & users ultimately switched to standard DSPs & FPGAs, leaving the SW user base behind. Another approach would be to use one of the cpu farms on a chip such as Clearspeed or PicoChip or BOPs (RIP), who have developed risc cpus that can be up to 420 instances on a chip running at 100MHz plus. Interesting to see if those devices can escape cell phone basestations. So I have taken my passive interest in this subject back to the drawing board to recreate a modern FPGA hosted Transputer that would naturally execute sequential C code, or parallel Occam code & even Verilog code. That means that if code can be partially migrated from seq C to par Occam style C (ie HandelC) then to Verilog (a C'ish HW language), the same code still runs on the same cpu (but a little slower perhaps). Extra process scheduling HW is needed to support very fine grained concurrency in a modern Transputer and also a logic simulator.
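The idea of splitting sequential code into communicating parallel processes (the Occam/CSP style described above) can be sketched in software. What follows is only a rough illustration in Python, using threads and bounded queues as stand-ins for processes and channels; the names `stage` and `run_pipeline` are hypothetical and come from neither Occam, HandelC nor any tool mentioned in this thread.

```python
import queue
import threading


def stage(worker, inbox, outbox):
    """One 'process': read a message, apply worker, send the result on."""
    while True:
        item = inbox.get()
        if item is None:            # sentinel: pass shutdown downstream
            outbox.put(None)
            return
        outbox.put(worker(item))


def run_pipeline(items, workers):
    """Chain workers as processes linked by one-slot channels.

    Items must not be None, since None is used as the shutdown sentinel.
    """
    channels = [queue.Queue(maxsize=1) for _ in range(len(workers) + 1)]
    threads = [
        threading.Thread(target=stage, args=(w, channels[i], channels[i + 1]))
        for i, w in enumerate(workers)
    ]
    results = []

    def collect():
        # Drain the last channel so the one-slot queues never deadlock.
        while True:
            result = channels[-1].get()
            if result is None:
                return
            results.append(result)

    collector = threading.Thread(target=collect)
    for t in threads + [collector]:
        t.start()
    for item in items:
        channels[0].put(item)
    channels[0].put(None)
    for t in threads + [collector]:
        t.join()
    return results


if __name__ == "__main__":
    # Two toy stages: each could become its own block of logic in hardware.
    print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2]))
```

Each stage only talks to its neighbours through a channel, which is the property that makes such a decomposition a candidate for mapping onto separate hardware blocks; the granularity on real FPGA fabric would of course be far finer than a Python thread.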
The big pay off is that properly parallelized code, once in Verilog form, still runs either as compiled source code on a farm of cpus using message passing and links, or it can be synthesized with industry standard HW tools back onto the FPGA fabric for the desired speed-ups. In effect, sequential procedures in C code can be morphed into on-chip HW coprocessors using the reconfigurable features of many FPGAs. Stable FPGA coprocessor engines can then be turned into much faster and cheaper ASICs in return for nasty upfront NRE. Such solutions could go much farther than current TimeLogic products for many industries besides Bio. Xilinx & Intel can give us a clue here. A cluster cpu node based on a P4 at say 3GHz might run to $2K per node depending on what's there, even though the fastest P4 chip is always say $600. An FPGA RISC cpu node based on MicroBlaze runs at maybe only 125MHz but will cost about $1.40 per node in volume plus extra support. Now if the cpu can be farmed by adding those Transputer extensions, the 24x clock difference doesn't look so bad compared to the est 400+ fold cpu cost difference. Also a lot of slower cpus each with local RLDRAM don't have the memory latency that P4s suffer from, ie 1 DRAM cycle is a few cpu cycles instead of hundreds, and distributed bandwidth is much easier to manage. It's also interesting to see the changes at TimeLogic, the departure of Jim and the merger with a company that I see has no obvious HW background.

Regards
John Jakson
sorry for long rant

-------------- next part --------------
An HTML attachment was scrubbed...
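John's P4 versus MicroBlaze comparison above can be checked with a few lines of arithmetic. This is only a sketch using the rough figures quoted in his message (3GHz and $600 per P4 chip, about $2K per cluster node, 125MHz and $1.40 per soft cpu); it assumes clock rate is a fair proxy for throughput, which it is not in general, and it leaves out the "extra support" cost he mentions. The variable names are made up for illustration.

```python
# Figures quoted in the message above (estimates, not measurements).
P4_CLOCK_MHZ = 3000.0
P4_CHIP_USD = 600.0
P4_NODE_USD = 2000.0
SOFT_CPU_CLOCK_MHZ = 125.0
SOFT_CPU_USD = 1.40

# How much faster one P4 is clocked than one soft cpu.
clock_ratio = P4_CLOCK_MHZ / SOFT_CPU_CLOCK_MHZ

# How many soft cpus one P4 chip (or one full node) would pay for.
chip_cost_ratio = P4_CHIP_USD / SOFT_CPU_USD
node_cost_ratio = P4_NODE_USD / SOFT_CPU_USD

# Clock cycles bought per dollar, soft cpu relative to a bare P4 chip.
cycles_per_dollar_gain = chip_cost_ratio / clock_ratio

if __name__ == "__main__":
    print(f"clock ratio:            {clock_ratio:.0f}x")
    print(f"chip cost ratio:        {chip_cost_ratio:.0f}x")
    print(f"node cost ratio:        {node_cost_ratio:.0f}x")
    print(f"cycles per dollar gain: {cycles_per_dollar_gain:.1f}x")
```

On these numbers the "400+ fold" figure corresponds to the bare chip price; comparing against the $2K node price instead would make the ratio larger still, provided the work really can be spread across that many slow cpus.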
URL: From alokkumar_ait at rediffmail.com Mon Sep 15 01:40:28 2003 From: alokkumar_ait at rediffmail.com (Alok Kumar) Date: 15 Sep 2003 05:40:28 -0000 Subject: [BiO BB] Re: BiO_Bulletin_Board digest, Vol 1 #518 - 5 msgs Message-ID: <20030915054028.15697.qmail@webmail18.rediffmail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From prathibha_562 at yahoo.co.in Mon Sep 15 09:30:15 2003 From: prathibha_562 at yahoo.co.in (=?iso-8859-1?q?prathibha=20bharathi?=) Date: Mon, 15 Sep 2003 14:30:15 +0100 (BST) Subject: [BiO BB] feeling discouraging after seeing this much poor response Message-ID: <20030915133015.92588.qmail@web8104.in.yahoo.com> i am feeling discouraging after seeing this much poor response.......... and i am also thinking about dropping my idea of doing my project on protein seq analysis..........................bye all............................. Yahoo! India Matrimony: Find your partner online.Post your profile. -------------- next part -------------- An HTML attachment was scrubbed... URL: From val at vtek.com Mon Sep 15 10:00:12 2003 From: val at vtek.com (val) Date: Mon, 15 Sep 2003 10:00:12 -0400 Subject: [BiO BB] feeling discouraging after seeing this much poor response References: <20030915133015.92588.qmail@web8104.in.yahoo.com> Message-ID: <266201c37b91$b1b09540$6400a8c0@vt1000> Yes, a good point. Drop it, and look around to find a really exciting idea to invest your time.. And keep on enjoying life.. my very best, val ----- Original Message ----- From: prathibha bharathi To: bio_bulletin_board at bioinformatics.org Sent: Monday, September 15, 2003 9:30 AM Subject: [BiO BB] feeling discouraging after seeing this much poor response i am feeling discouraging after seeing this much poor response.......... and i am also thinking about dropping my idea of doing my project on protein seq analysis..........................bye all............................. Yahoo! India Matrimony: Find your partner online. 
Post your profile.

From deletto at unisa.it Mon Sep 15 10:14:44 2003
From: deletto at unisa.it (deletto at unisa.it)
Date: Mon, 15 Sep 2003 16:14:44 +0200
Subject: [BiO BB] functional clustering among Affymetrix data
In-Reply-To: <266201c37b91$b1b09540$6400a8c0@vt1000>
References: <20030915133015.92588.qmail@web8104.in.yahoo.com> <266201c37b91$b1b09540$6400a8c0@vt1000>
Message-ID: <1063635284.3f65c954e8941@webmail.unisa.it>

Dear All,

I am sorry to disturb you (any of you!!!) with a very trivial question: I am wondering whether there is a simple online tool that would be useful for my purpose. I would like to cluster a data set (collected from Affymetrix GeneChip U34A, B & C) according to biological function. In other words, I'd like to draw a statistical graph (something like a pie chart) where I can put in my whole data set, selecting all genes with a clear and known function (biological, physiological and compartment localization).

Could anyone help me by showing how I can accomplish this task (hard, at least for me)? I would appreciate it very much :::)))

Thanks in advance, all the best
Davide (PhD student)

-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/

From boris.steipe at utoronto.ca Mon Sep 15 10:28:39 2003
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon, 15 Sep 2003 10:28:39 -0400
Subject: [BiO BB] functional clustering among Affymetrix data
References: <20030915133015.92588.qmail@web8104.in.yahoo.com> <266201c37b91$b1b09540$6400a8c0@vt1000> <1063635284.3f65c954e8941@webmail.unisa.it>
Message-ID: <3F65CC95.2ECEE923@utoronto.ca>

deletto at unisa.it wrote:
> [...]
> I would like to cluster a data set (collected from Affym.
> GeneChip U34A,B & C) regarding on the biological functions.
In other words, I'd
> like to draw a statistical graph (like a nice cake) where I can put in all my
> data set, selecting all genes with a clear and known function (biological,
> physiological and for compartment localization).
[...]

Have you tried the GenePublisher site?

Best regards,
Boris

---
Boris Steipe
University of Toronto
Program in Proteomics & Bioinformatics
Departments of Biochemistry & Molecular and Medical Genetics
http://biochemistry.utoronto.ca/steipe

From mgollery at unr.edu Mon Sep 15 13:12:04 2003
From: mgollery at unr.edu (Martin Gollery)
Date: Mon, 15 Sep 2003 10:12:04 -0700
Subject: [BiO BB] RE: Hardware implemntation of HMMs
In-Reply-To: <001101c37a45$a55d19a0$0101c80a@weizmannnhc4zn>
References: <20030913160139.825B9D2610@www.bioinformatics.org> <001101c37a45$a55d19a0$0101c80a@weizmannnhc4zn>
Message-ID: <1063645924.3f65f2e4ba007@webmail.unr.edu>

Yes, they do. They accelerate HMMer models, not SAM.

Marty

Quoting Eitan Rubin :
> Hi,
>
> As far as I know, Compugen sells hardware accelerators for HMMs (see
> http://www.cgen.com). Their website specifically mentions HMMs
> acceleration.
>
> Eitan
> [...]

Martin Gollery
Associate Director of Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
New phone number! 775-784-7042
-------------------------------------------------
This mail sent through https://webmail.unr.edu

From val at vtek.com Mon Sep 15 14:57:22 2003
From: val at vtek.com (val)
Date: Mon, 15 Sep 2003 14:57:22 -0400
Subject: [BiO BB] Implementing HMM models in Hardware (FPGA)
References: <20030914070007.4364.qmail@web13104.mail.yahoo.com>
Message-ID: <271401c37bbb$343ab7b0$6400a8c0@vt1000>

Hi John/All:

Thanx John for an interesting and refreshing post. Your points sound very reasonable to me, although this is the CS/cpu side of the story. What about the other side, the biochip side, a direction which might be taken more comfortably than HW accelerators? In other words, computational acceleration seems to be a good thing, but this is just a fragment of the whole cell analysis *pipeline*. Indeed, a final goal of bioinformatics and generally in-silico cell analysis is to understand cell mechanisms/processes and then based on that proceed to drug design/discovery activities. From that perspective, further evolution and advancements in biochip design and functionality would be a step in the right direction. And i mean silicon functionality, when talking about biochips, and related data.
Silicon designed for (floating point) computing, including multiprocessor and cluster options, is still very much silicon designed ~50 years ago, and has little to do with analyzing cell mechanisms, understanding the results and then using them for biomedical applications. So when designing silicon functionality, why not start right from using silicon to implement a whole cell analysis pipeline? Silicon, but not just computational silicon, rather *biotechnology* (bt) silicon. That is, silicon directly interfaced electrically with cells (in culture, 'a real sample'). The interface would include an "input plane" (sensor plane) and an "output plane" (driving plane), with recognition and storage logic in between. This is indeed the quite well known "system/lab-on-the-chip" approach, with the lab directly interfaced with a sample, including (in the following phases) electrical driving facilities designed to move and/or immobilize cells, perform transfection, electroporation and other cell modification operations. Of course, such an active biochip would be a massively parallel processor, and can be called a biotechnology (bt) processor (vs. a computational processor) since it directly implements a programmable cell analysis technology pipeline: input, processing and modification. Optical fluorescent binding patterns can also be measured with such a chip. Its obvious advantage is that dynamic analysis in time can be performed on the same chip, say, yeast life cycle dynamics with a fine time resolution (say, seconds and less instead of minutes). What seems to be really good news is that such silicon can and needs to be designed as an *array*, a massively parallel, fine-grain architecture with a relatively simple microcell (vs. spaghetti-like x86s).
If the total number of transistors on the "lab-on-the-chip" is ~10B (which is what is possible now), a grain (microprocessor) in the mega-array (1000x1000 microprocessors) may have up to 10K transistors, which is quite enough to implement the basic input/output and processing functionality at a grain level. For a ~1 sq.in. chip, a grain would be of 250 um size. The input plane of a grain might have up to 32x32 sensors, so that linear spatial resolution for cell analysis would be ~8 um, which is OK for mid-size and large cells (on average, an animal cell size is ~10um).

So, i guess my point is that it does not make a lot of sense to accelerate, optimize, etc. a fragment of the pipeline without looking at an integrated cell/tissue analysis pipeline and how/where silicon functionality can be applied.

cheers,
val

----- Original Message -----
From: John Jakson
To: bio_bulletin_board at bioinformatics.org
Sent: Sunday, September 14, 2003 3:00 AM
Subject: Re: [BiO BB] Implementing HMM models in Hardware (FPGA)

[...]

From vkode78 at yahoo.com Mon Sep 15 16:37:23 2003
From: vkode78 at yahoo.com (Venu Kode)
Date: Mon, 15 Sep 2003 13:37:23 -0700 (PDT)
Subject: [BiO BB] Implementing HMM models in Hardware (FPGA)
In-Reply-To: <271401c37bbb$343ab7b0$6400a8c0@vt1000>
Message-ID: <20030915203723.32578.qmail@web40512.mail.yahoo.com>

Mark, John, and Val,

Thank you all for your perspectives.
I couldn't have asked for anything better. I am sure one day (soon enough) there will be a happy family of EEs, CSs and bioinformaticians all on the same page. Compared to you guys, I am a novice both in hardware design and on the Bio side. It was just great to get all these perspectives from insiders, and I thank you all for it. But for what it is worth, here is my take on it.

Clusters seem to be the buzzword in the Bioinformatics world these days. They want to build bigger and better clusters, and I don't blame them. There are huge loads of data that need to be processed in these massively parallel implementations. Here is probably the cost involved in these clusters:

1) Each node ~ $600
2) Bottlenecks and overhead relating to the parallel implementation

This is almost analogous to brute force: multi-purpose processors and parallel implementations, all at huge cost. I am sure there can be a better solution to all this. Val mentioned biochips; that sure would be nice, but it certainly won't be a reality until a decade from now (ignorant guess, please pardon me). With the current status quo sticking to computational processors, one of the alternatives could be Taaddaaa FPGA!!!!

FPGAs as Hardware Accelerators:

1) Run Time Reconfigurable (RTR)
2) High-density devices that can implement highly parallel implementations

There will be a huge cost/performance difference between clusters and arrays of FPGAs. Computations like Viterbi decoding, the Forward & Backward algorithms, and log-odds scores can all be implemented very efficiently on FPGAs because of their look-up table architecture. That's just the beginning. And these devices are run-time reconfigurable. So I can have the Processing Engines (PEs), which I call fixed logic, working on the data (HMM profiles), which I call the Reconfigurable Logic (RC).
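To make the Viterbi point above a little more concrete, here is a minimal sketch of Viterbi decoding that uses only integer additions and comparisons on pre-scaled log scores, the kind of arithmetic that tends to map well onto hardware. It is a generic toy two-state HMM, not a profile HMM as used by HMMER or SAM, and the scale factor, state names and probabilities are all assumptions chosen purely for illustration.

```python
import math

SCALE = 1000  # assumed fixed-point scale: score = round(SCALE * log2(p))


def to_int_score(p):
    """Convert a probability into a scaled integer log2 score."""
    return int(round(SCALE * math.log2(p)))


def viterbi_int(obs, states, start, trans, emit):
    """Return (best_score, best_path) using integer add/compare only."""
    score = {s: start[s] + emit[s][obs[0]] for s in states}
    back = []
    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor of state s for this step.
            best = max(states, key=lambda r: prev[r] + trans[r][s])
            score[s] = prev[best] + trans[best][s] + emit[s][symbol]
            pointers[s] = best
        back.append(pointers)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


def toy_model():
    """Hypothetical two-state model over the alphabet {A, C}."""
    states = ["H", "L"]
    start = {s: to_int_score(0.5) for s in states}
    trans = {
        "H": {"H": to_int_score(0.9), "L": to_int_score(0.1)},
        "L": {"H": to_int_score(0.1), "L": to_int_score(0.9)},
    }
    emit = {
        "H": {"A": to_int_score(0.9), "C": to_int_score(0.1)},
        "L": {"A": to_int_score(0.1), "C": to_int_score(0.9)},
    }
    return states, start, trans, emit


if __name__ == "__main__":
    print(viterbi_int("AACC", *toy_model()))
```

The conversion to integers happens once, on the host; the inner loop then needs nothing beyond adders and comparators, so the score tables are the natural candidates for whatever gets reloaded per model.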
These PEs will be working on the RC data, and I can independently reconfigure just the RC to load new data (the next HMM profile) using RTR. Of course this is just one PE and RC; depending on the size of the FPGA device used, I can have more than one PE and RC all working in parallel. Having said that, the trick is to make sure that the PEs are always busy, and that the reconfiguration delay and the computing cost involved in the host processor all justify the cost/performance parameter. And since everything is generated on the fly, there has to be an optimal way of scheduling reconfiguration, and that can be hard as well. Of course this eliminates the need for dedicated on/off-chip memory, at the cost of the reconfiguration delay. The key to all this is the cost/performance factor.

Please feel free to throw in your comments, suggestions & questions. I sure would like to know if there are any issues that you see with the route I am taking. I hope to have a website soon where I can post my progress.

Thanks again,
Venu

val wrote:
Hi John/All: Thanx John for an interesting and refreshing post. Your points sound very reasonable to me, although this is a CS/cpu side of the story. What about other side, biochip side, a direction which might be taken more comfortable than HW accelerators. In other words, a computational acceleration seems to be a good thing, but this is just a fragment of the whole cell analysis *pipeline*. Indeed, a final goal of bioinformatics and generally in-silico cell analysis is to understand cell mechanisms/processes and then based on that proceeed to drug design/discovery activities. From that perspective, further evolution and advancements in biochip design and functionality would be a step in right direction. And i mean silicon functionality, when talking about biochips, and related data.
Silicon designed for (floating point) computing, including multiprocessor and cluster options, is still very much silicon designed ~50 years ago, and has little to do with analyzing cell mechanisms, understanding the results and then using them for biomedical applications. So when designing silicon functionality, why not start right from using silicon to implement a whole cell analysis pipeline? Silicon, but not just computational silicon; rather, *biotechnology* (bt) silicon. That is, silicon directly interfaced electrically with cells (in culture, 'a real sample'). The interface would include an "input plane" (sensor plane) and an "output plane" (driving plane), with recognition and storage logic in between. This is indeed the quite well-known "system/lab-on-the-chip" approach, with the lab directly interfaced with a sample, including (in the following phases) electrical driving facilities designed to move and/or immobilize cells, and to perform transfection, electroporation and other cell modification operations.

Of course, such an active biochip would be a massively parallel processor, and can be called a biotechnology (bt) processor (vs. a computational processor), since it directly implements a programmable cell analysis technology pipeline: input, processing and modification. Optical fluorescent binding patterns can also be measured with such a chip. Its obvious advantage is that dynamic analysis in time can be performed on the same chip, say, yeast life cycle dynamics with a fine time resolution (say, seconds and less instead of minutes).

What seems to be really good news is that such silicon can and needs to be designed as an *array*, a massively parallel, fine-grain architecture with a relatively simple microcell (vs. spaghetti-like x86s).
If the total number of transistors on the "lab-on-the-chip" is ~10B (which is what is possible now), a grain (microprocessor) in the mega-array (1000x1000 microprocessors) may have up to 10K transistors, which is quite enough to implement the basic input/output and processing functionality at the grain level. For a ~1 sq.in. chip, a grain would be of 250 um size. The input plane of a grain might have up to 32x32 sensors, so that linear spatial resolution for cell analysis would be ~8 um, which is OK for mid-size and large cells (on average, an animal cell size is ~10 um).

So, I guess my point is that it does not make a lot of sense to accelerate, optimize, etc. a fragment of the pipeline without looking at an integrated cell/tissue analysis pipeline and at how/where silicon functionality can be applied.

cheers,
val

----- Original Message -----
From: John Jakson
To: bio_bulletin_board at bioinformatics.org
Sent: Sunday, September 14, 2003 3:00 AM
Subject: Re: [BiO BB] Implementing HMM models in Hardware (FPGA)

Hi Venu

Interesting to see others interested in applying FPGAs to bioinformatics. FPGAs don't get much mention here. I am not convinced the bio industry really cares for EE solutions it doesn't understand. Linux clusters are bad enough, but what the hell are FPGAs? As an EE VLSI/FPGA hardhat visitor at the BioWorld show, held here in Boston not that long ago, all I saw was disinterest and plenty of tower server racks. Not one HW company showed up with anything but Linux clusters or the SGI/IBM/HP/... equivalent. TimeLogic and the 1 or 2 other (defunct?) accelerator companies were no-shows. On talking with the floor folks, I found no interest in or basic understanding of possible HW alternatives.

The issue comes down to how the problem is stated and how it can be implemented in a solution that most bio SW types can understand. That means whatever the engine is, it must just run C code, simple as that, preferably the free stuff from NCBI.
That always leads to the same solution: clusters of ever faster and ever hotter farms of today's x86. Any rational computer scientist knows this is crazy, and that dedicated HW should be built. TimeLogic says it very well on their web site.

In crypto, video or DSP processing, it is relatively easy to turn C code into HW, since they are all math intensive and are likely created by the same EEs. It may come as a surprise to SW types, but HW is routinely modeled in C; that code is used only to double-check the design written in a decent HW description language like Verilog or VHDL, both of which are implicitly parallel languages. There is usually some formal mathematical model, often written in Matlab, for the real heavy stuff. It's also interesting that the Matlab code is usually floating point intensive, and the final ASIC/FPGA solutions are not expected to produce identical results, since HW is best built integer fashion.

One might regard the current bio C codes as just simulations of HW that hasn't been built yet, since few know how to recode them in a HW language. TimeLogic did a few, but not in a way that can be easily duplicated across the industry.

To turn C code into really fast HW requires understanding what the C code is really doing and having permission to make subtle but harmless changes to it to allow the really big speed-ups. That means eliminating floating point. If the bio author of such SW is also a HW expert (of which there are probably only a handful or even 0 in the whole world), then equivalent algorithms could be used that are relatively simple to map onto HW structures. I don't see the bio world hiring too many HW EEs either; we are far too different culturally and we usually don't have PhDs, esp. not from the right schools.

There are other ways to turn C code into HW: maybe use a C-based HW language such as HandelC, which is based on Occam and CSP. And there's the clue.
If the SW is broken up into the constituent parallel processes that are naturally there but impossible to describe in plain C, then it becomes almost trivial to map those parallel processes onto FPGA fabric or even something like a Transputer farm. The only difference is the granularity. FPGAs are hot today but can only readily be engineered by HW types, because their most efficient use requires detailed understanding of pipelines, combinatorial logic and basic CPU design. Transputers, if they still existed, would be the natural way to go, because they are amenable to both SW and HW engineers, but they still worked best when SW and HW were both understood. Occam was just a way to describe parallel processes that described HW in a funny syntax. Transputers only died out because the implementation fell far behind x86 performance and was single-sourced and underfunded. Most Transputer projects and users ultimately switched to standard DSPs and FPGAs, leaving the SW user base behind.

Another approach would be to use one of the CPU farms on a chip, such as those from Clearspeed, PicoChip or BOPs (RIP), who have developed RISC CPUs with up to 420 instances on a chip running at 100 MHz plus. It will be interesting to see if those devices can escape cell phone basestations.

So I have taken my passive interest in this subject back to the drawing board to recreate a modern FPGA-hosted Transputer that would naturally execute sequential C code, parallel Occam code and even Verilog code. That means that if code can be partially migrated from sequential C to parallel Occam-style C (i.e. HandelC) and then to Verilog (a C-ish HW language), the same code still runs on the same CPU (but a little slower perhaps). Extra process-scheduling HW is needed to support very fine-grained concurrency in a modern Transputer, and also a logic simulator.
The big payoff is that properly parallelized code, once in Verilog form, still runs either as compiled source code on a farm of CPUs using message passing and links, or it can be synthesized with industry-standard HW tools back onto the FPGA fabric for the desired speed-ups. In effect, sequential procedures in C code can be morphed into on-chip HW coprocessors using the reconfigurable features of many FPGAs. Stable FPGA coprocessor engines can then be turned into much faster and cheaper ASICs in return for nasty upfront NRE. Such solutions could go much farther than current TimeLogic products for many industries besides bio.

Xilinx and Intel can give us a clue here. A cluster CPU node based on a P4 at, say, 3 GHz might run to $2K per node depending on what's there, even though the fastest P4 chip is always, say, $600. An FPGA RISC CPU node based on MicroBlaze runs at maybe only 125 MHz but will cost about $1.40 per node in volume, plus extra support. Now if the CPU can be farmed by adding those Transputer extensions, the 24x clock difference doesn't look so bad compared to the estimated 400+ fold CPU cost difference. Also, a lot of slower CPUs, each with local RLDRAM, don't have the memory latency that P4s suffer from, i.e. 1 DRAM cycle is a few CPU cycles instead of hundreds, and distributed bandwidth is much easier to manage.

It's also interesting to see the changes at TimeLogic: the departure of Jim and the merger with a company that I see has no obvious HW background.

Regards
John Jakson
sorry for long rant
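The clock and cost figures quoted in the message above can be checked with a few lines of arithmetic. The sketch below uses only the numbers from the post ($600 P4 chip at 3 GHz, $1.40 MicroBlaze node at 125 MHz). The last figure rests on an assumption made here, not in the post: that aggregate clock rate is a fair proxy for throughput, which ignores differences in work per clock, communication overhead and the "extra support" cost, so it is an optimistic upper bound at best.

```python
# Figures as quoted in the post.
p4_clock_mhz = 3000.0    # P4 at ~3 GHz
fpga_clock_mhz = 125.0   # MicroBlaze soft CPU at ~125 MHz
p4_chip_usd = 600.0      # fastest P4 chip
fpga_node_usd = 1.40     # MicroBlaze node in volume, excluding "extra support"

# How much faster a single P4 is clocked than a single soft CPU.
clock_ratio = p4_clock_mhz / fpga_clock_mhz

# How many soft-CPU nodes the price of one P4 chip would buy.
cost_ratio = p4_chip_usd / fpga_node_usd

# Assumption (not from the post): perfect scaling, equal work per clock.
# Under that assumption, this is the aggregate-clock advantage per dollar.
clock_per_dollar_gain = cost_ratio / clock_ratio

print(f"clock ratio:            {clock_ratio:.0f}x")
print(f"cost ratio:             {cost_ratio:.0f}x")
print(f"clock-per-dollar ratio: {clock_per_dollar_gain:.1f}x")
```

Using the $2K whole-node price instead of the $600 chip price would make the cost ratio larger still, but the post does not give a whole-node price for the FPGA side, so the comparison here stays chip to chip.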
From madhavi+bioml at cs.cmu.edu  Sat Sep 13 14:51:31 2003
From: madhavi+bioml at cs.cmu.edu (Madhavi Ganapathiraju)
Date: Sat, 13 Sep 2003 14:51:31 -0400
Subject: [BiO BB] BLC2003: Call for papers
Message-ID: <5.1.0.14.2.20030913143134.028e28b8@128.2.254.145>

This seemed an appropriate forum to post this information.

Madhavi

=========================================
Biological Language Conference: Call for papers

Scope: Integration of language technologies in bioinformatics/computational biology research

Protein sequences from different organisms may be viewed as texts written in different languages. The mapping of protein sequences to their structure, dynamics and function then becomes analogous to the mapping of words to meaning in natural languages. This analogy can be exploited by applying statistical language modeling and text classification techniques to biological sequences, thereby generating testable hypotheses regarding the fundamental building blocks of the "protein sequence language". The biology-language analogy enables novel applications of language technologies to the biology domain, but overlaps to a great extent with other existing computational biology/bioinformatics applications. The purpose of the Biological Language Conference is to facilitate scientific exchange between researchers using the language analogy approach directly and researchers using other approaches.
We invite papers in the following areas of interest:
------------------------------------------------------------
secondary structure prediction
tertiary structure prediction
repetitive fold prediction
membrane protein-specific prediction challenges
protein folding/misfolding
conformational changes
genome evolution/comparison
sequence alignment
protein family classification
immune system
protein-protein interactions
protein/gene networks and pathways

Because of the challenge in bioinformatics research of involving non-biology domain experts in biology research, we also encourage submission of papers describing new approaches to cross-disciplinary education.

Venue and dates: November 20-21, 2003 at Carnegie Mellon University in Pittsburgh, USA

The conference is organized by: Profs. Raj Reddy and Judith Klein-Seetharaman of the NSF-funded Center for Biological Language Modeling (http://www.cs.cmu.edu/~blmt).

Papers should be a maximum of 15 pages and will be peer-reviewed. All accepted papers will appear in the conference proceedings. Papers should be prepared according to the guidelines (http://flan.blm.cs.cmu.edu/meeting2003/template.doc) and submitted online here. Further information on the conference will be available at http://flan.blm.cs.cmu.edu/meeting2003/.

September 20th, 2003   (Optional) White-paper abstract and indication of intention to submit a paper, by email to judithks at cs.cmu.edu
October 20th, 2003     Deadline for electronic paper submission
November 1st, 2003     Notification of acceptance
November 10th, 2003    Final camera-ready manuscript due
November 10th, 2003    Registration deadline

Contact: Judith Klein-Seetharaman, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
email: judithks at cs.cmu.edu, phone: 412 383 7325, fax: 412 648 1945.
From umajumdar at locuz.com  Mon Sep 15 11:00:06 2003
From: umajumdar at locuz.com (Uttam Majumdar)
Date: Mon, 15 Sep 2003 20:30:06 +0530
Subject: [BiO BB] feeling discouraging after seeing this much poor response
References: <20030915133015.92588.qmail@web8104.in.yahoo.com>
Message-ID: <002501c37b9a$0dd579a0$3fc0c0c0@UMAJUMDAR>

PLS UNSUBSCRIBE...

----- Original Message -----
From: prathibha bharathi
To: bio_bulletin_board at bioinformatics.org
Sent: Monday, September 15, 2003 7:00 PM
Subject: [BiO BB] feeling discouraging after seeing this much poor response

I am feeling discouraged after seeing such a poor response, and I am also thinking about dropping my idea of doing my project on protein sequence analysis. Bye all.

From patents at patentworks.com  Mon Sep 15 11:11:53 2003
From: patents at patentworks.com (Goneau & Lemens)
Date: Mon, 15 Sep 2003 11:11:53 -0400
Subject: [BiO BB] To whom it may concern
Message-ID: <00e701c37b9b$b281bc60$6c00a8c0@denis1>

I am looking for GenBank accession number AB047240. I do not know where to go. Can somebody help me?

Thanks in advance,

Denis.

Goneau & Lemens
Intellectual Property Library
PatentWorks Group
Main Floor Reception
170 St-Joseph Boul.
Hull, Quebec J8Y 3W9

Phone (819)772-2770
Fax (819)772-0061

email: patents at patentworks.com
web: www.patentworks.com

____________________________
This message (including any attachments) contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, you should delete this message and are hereby notified that any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited.
From lejm3 at hermes.cam.ac.uk  Mon Sep 15 12:23:30 2003
From: lejm3 at hermes.cam.ac.uk (Lucy McWilliam)
Date: Mon, 15 Sep 2003 17:23:30 +0100 (BST)
Subject: [BiO BB] Re: functional clustering among Affymetrix data
Message-ID:

deletto at unisa.it wrote:

> I am wondering whether there is a simple online tool I could find
> useful for my purpose: I would like to cluster a data set (collected
> from Affym. GeneChip U34A, B & C) according to biological function.
> In other words, I'd like to draw a statistical graph (like a
> nice cake) where I can put in all my data set, selecting all genes with
> a clear and known function (biological, physiological and for
> compartment localization).

You can use the Gene Ontology mining tool in the analysis section of Affymetrix's website (requires free registration).

http://www.affymetrix.com/analysis/index.affx

--
Lucy McWilliam
FlyChip Microarray Facility
Department of Genetics
University of Cambridge
http://www.flychip.org.uk/

From idoerg at burnham.org  Mon Sep 15 17:45:34 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Mon, 15 Sep 2003 14:45:34 -0700
Subject: [BiO BB] To whom it may concern
In-Reply-To: <00e701c37b9b$b281bc60$6c00a8c0@denis1>
References: <00e701c37b9b$b281bc60$6c00a8c0@denis1>
Message-ID: <3F6632FE.3090503@burnham.org>

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AB047240

Goneau & Lemens wrote:

> I am looking for Genbank accession number AB047240
> Do not know where to go, Can somebody help me.
>
> Thanks in advance
>
> Denis.
>
> Goneau & Lemens
> Intellectual Property Library
> PatentWorks Group
> Main Floor Reception
> 170 St-Joseph Boul.
> Hull, Quebec J8Y 3W9
>
> Phone (819)772-2770
> Fax (819)772-0061
>
> email: patents at patentworks.com
> web: www.patentworks.com

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

From drjohn08318 at yahoo.com  Mon Sep 15 18:12:01 2003
From: drjohn08318 at yahoo.com (John G. Hoey, Ph.D.)
Date: Mon, 15 Sep 2003 15:12:01 -0700 (PDT)
Subject: [BiO BB] To whom it may concern
In-Reply-To: <3F6632FE.3090503@burnham.org>
Message-ID: <20030915221201.45455.qmail@web14405.mail.yahoo.com>

If you're looking for just the information related to this clone, go to ncbi.nlm.nih.gov and type in the accession number. If you want the actual gene itself, you can use a company such as Napro Genomics to isolate it for you. Incidentally, they do terrific work....highly recommended!

JGH

Iddo Friedberg wrote:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AB047240

Goneau & Lemens wrote:

> I am looking for Genbank accession number AB047240
> Do not know where to go, Can somebody help me.
>
> Thanks in advance
>
> Denis.
>
> Goneau & Lemens
> Intellectual Property Library
> PatentWorks Group
> Main Floor Reception
> 170 St-Joseph Boul.
> Hull, Quebec J8Y 3W9
>
> Phone (819)772-2770
> Fax (819)772-0061
>
> email: patents at patentworks.com
> web: www.patentworks.com

From drjohn08318 at yahoo.com  Mon Sep 15 18:15:46 2003
From: drjohn08318 at yahoo.com (John G. Hoey, Ph.D.)
Date: Mon, 15 Sep 2003 15:15:46 -0700 (PDT)
Subject: [BiO BB] To whom it may concern
In-Reply-To: <00e701c37b9b$b281bc60$6c00a8c0@denis1>
Message-ID: <20030915221546.37223.qmail@web14410.mail.yahoo.com>

I just went to the NCBI website and hit the Entrez tab. After typing in the accession #, this is the journal article I found.

1: Sugimoto J, Matsuura N, Kinjo Y, Takasu N, Oda T, Jinno Y.
Transcriptionally active HERV-K genes: identification, isolation, and chromosomal mapping.
Genomics. 2001 Mar 1;72(2):137-44.
PMID: 11401426 [PubMed - indexed for MEDLINE]

JGH

Goneau & Lemens wrote:

I am looking for Genbank accession number AB047240
Do not know where to go, Can somebody help me.

Thanks in advance

Denis.
Goneau & Lemens
Intellectual Property Library
PatentWorks Group
Main Floor Reception
170 St-Joseph Boul.
Hull, Quebec J8Y 3W9

Phone (819)772-2770
Fax (819)772-0061

email: patents at patentworks.com
web: www.patentworks.com

From drjohn08318 at yahoo.com  Mon Sep 15 18:18:44 2003
From: drjohn08318 at yahoo.com (John G. Hoey, Ph.D.)
Date: Mon, 15 Sep 2003 15:18:44 -0700 (PDT)
Subject: [BiO BB] To whom it may concern
In-Reply-To: <20030915221201.45455.qmail@web14405.mail.yahoo.com>
Message-ID: <20030915221844.6315.qmail@web14407.mail.yahoo.com>

For those who might be interested in their gene cloning/gene editing service, here is the link: www.Naprobio.com

"John G. Hoey, Ph.D." wrote:

If you're looking for just the information related to this clone, go to ncbi.nlm.nih.gov. Type in the accession number. If you want the actual gene itself, you can use a company such as Napro Genomics to isolate it for you. Incidentally, they do terrific work....highly recommended!

JGH

Iddo Friedberg wrote:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AB047240

Goneau & Lemens wrote:

> I am looking for Genbank accession number AB047240
> Do not know where to go, Can somebody help me.
>
> Thanks in advance
>
> Denis.
>
> Goneau & Lemens
> Intellectual Property Library
> PatentWorks Group
> Main Floor Reception
> 170 St-Joseph Boul.
> Hull, Quebec J8Y 3W9
>
> Phone (819)772-2770
> Fax (819)772-0061
>
> email: patents at patentworks.com
> web: www.patentworks.com

From pooja at igc.gulbenkian.pt  Tue Sep 16 07:53:34 2003
From: pooja at igc.gulbenkian.pt (pooja at igc.gulbenkian.pt)
Date: Tue, 16 Sep 2003 12:53:34 +0100 (WEST)
Subject: [BiO BB] To whom it may concern
In-Reply-To: <20030915221546.37223.qmail@web14410.mail.yahoo.com>
References: <00e701c37b9b$b281bc60$6c00a8c0@denis1> <20030915221546.37223.qmail@web14410.mail.yahoo.com>
Message-ID: <2787.193.126.26.74.1063713214.squirrel@webmail.igc.gulbenkian.pt>

Please change the search option from PubMed to Nucleotide and try searching again.

Hope this helps.

-Pooja

> I just went to the ncbi website, and hit the Entrez tab. After typing in
> the accession #, this is the journal article I found.
>
> 1: Sugimoto J, Matsuura N, Kinjo Y, Takasu N, Oda T, Jinno Y.
> Transcriptionally active HERV-K genes: identification, isolation, and
> chromosomal mapping.
> Genomics. 2001 Mar 1;72(2):137-44.
> PMID: 11401426 [PubMed - indexed for MEDLINE]
>
> JGH
>
> Goneau & Lemens wrote:
>
> I am looking for Genbank accession number AB047240
> Do not know where to go, Can somebody help me.
>
> Thanks in advance
>
> Denis.

From Ingrid.Marchal at univ-lille1.fr  Tue Sep 16 13:54:14 2003
From: Ingrid.Marchal at univ-lille1.fr (Ingrid.Marchal at univ-lille1.fr)
Date: Tue, 16 Sep 2003 19:54:14 +0200 (MET DST)
Subject: [BiO BB] Find a region of identity in a set of sequences
Message-ID: <200309161754.TAA30283@cri.univ-lille1.fr>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From moyc at comcast.net  Wed Sep 17 12:18:04 2003
From: moyc at comcast.net (Chris Moy)
Date: 17 Sep 2003 12:18:04 -0400
Subject: [BiO BB] Re: Find a region of identity in a set of sequences
Message-ID: <1063815484.2214.2.camel@laptop>

You may want to try PipMaker (Percent Identity Plot). It is located at http://bio.cse.psu.edu/pipmaker/.
I have not used it but it may be what you are looking for.

From drjohn08318 at yahoo.com  Wed Sep 17 14:37:32 2003
From: drjohn08318 at yahoo.com (John G. Hoey, Ph.D.)
Date: Wed, 17 Sep 2003 11:37:32 -0700 (PDT)
Subject: [BiO BB] Find a region of identity in a set of sequences
In-Reply-To: <200309161754.TAA30283@cri.univ-lille1.fr>
Message-ID: <20030917183732.69102.qmail@web14405.mail.yahoo.com>

Sequencher will do this for you. Also, if you have access to Vector NTI, you can do it with that program.

drjohn

Ingrid.Marchal at univ-lille1.fr wrote:

Hi,

I am looking for a program that takes a set of sequences and outputs the region(s) of identity (not similarity), if there are any. I am pretty sure that such a program exists, but I can't remember which. Does anyone have an idea? I would be very pleased if I didn't have to recode it!

Thanks a lot
Ingrid

From cat_worth at hotmail.com  Wed Sep 17 16:48:00 2003
From: cat_worth at hotmail.com (Catherine Worth)
Date: Wed, 17 Sep 2003 20:48:00 +0000
Subject: [BiO BB] Re: Find a region of identity in a set of sequences
Message-ID:

An HTML attachment was scrubbed...
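For the question in this thread (regions of exact identity, not similarity, shared by a whole set of sequences), a small brute-force sketch in Python may show what such a program has to do. Everything here is hypothetical and for illustration only: the function name `shared_identity_regions`, the `min_len` parameter and the toy sequences are not from any of the tools mentioned above. The sketch does position-independent exact substring matching, not alignment, and it is far too slow for long sequences.

```python
def shared_identity_regions(seqs, min_len=4):
    """Return substrings of length >= min_len that occur in every sequence
    and are not contained in a longer substring that also occurs in every
    sequence. Results are sorted longest first."""
    # Any region shared by all sequences must fit inside the shortest one.
    ref = min(seqs, key=len)
    hits = set()
    for i in range(len(ref)):
        j = i + min_len
        best = None
        # Extend to the right while the candidate is still found everywhere.
        while j <= len(ref) and all(ref[i:j] in s for s in seqs):
            best = ref[i:j]
            j += 1
        if best:
            hits.add(best)
    # Drop regions that are merely part of a longer shared region.
    maximal = [h for h in hits if not any(h != o and h in o for o in hits)]
    return sorted(maximal, key=lambda h: (-len(h), h))


if __name__ == "__main__":
    toy = ["TTGACCTGAAC", "GGACCTGATT", "CACCTGAGG"]  # made-up sequences
    print(shared_identity_regions(toy, min_len=4))
```

If the identical regions are expected at corresponding positions, a multiple alignment followed by a scan for fully conserved columns would be the more usual route; the sketch above deliberately ignores position.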
From hjm at tacgi.com  Mon Sep 22 16:01:54 2003
From: hjm at tacgi.com (Harry Mangalam)
Date: Mon, 22 Sep 2003 13:01:54 -0700
Subject: [BiO BB] HMM in Hardware (FPGA)
In-Reply-To: <20030914070007.4364.qmail@web13104.mail.yahoo.com>
References: <20030914070007.4364.qmail@web13104.mail.yahoo.com>
Message-ID: <3F6F5532.3040406@tacgi.com>

Re the previous thread about FPGAs, there's a nice intro to the process (and a bunch of good URLs) in the latest Linux Journal:

http://www.linuxjournal.com/article.php?sid=6857

Also a reference to Open Source tools and FPGAs here:

http://www.linuxdevices.com/articles/AT6411901280.html

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hjm at tacgi.com

From Reichelt at gbf.de  Thu Sep 25 03:18:35 2003
From: Reichelt at gbf.de (Joachim Reichelt)
Date: Thu, 25 Sep 2003 09:18:35 +0200
Subject: [BiO BB] Blast from C or C++
Message-ID: <3F7296CB.1020806@gbf.de>

Dear all,

We want to submit jobs to QBlast at NCBI from a C/C++ program without installing Perl. We tried it in Qt, but we cannot rely on this version: too often the job got lost.

Joachim

From lhaifeng at dso.org.sg  Thu Sep 25 22:50:30 2003
From: lhaifeng at dso.org.sg (Liu Haifeng)
Date: Fri, 26 Sep 2003 10:50:30 +0800
Subject: [BiO BB] protein families
References:
Message-ID: <000e01c383d8$f32dde70$706712ac@GENETHON>

Hi,

I am trying to find a collection of protein sequences which have been correctly assigned to different families. Can anybody suggest where to obtain this kind of data? PDB and Swiss-Prot seem to provide sequences, but without family information.

Would appreciate your help, thanks a lot!
Sincerely

Haifeng Liu

From idoerg at burnham.org  Fri Sep 26 02:51:18 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Thu, 25 Sep 2003 23:51:18 -0700
Subject: [BiO BB] protein families
In-Reply-To: <000e01c383d8$f32dde70$706712ac@GENETHON>
Message-ID:

Depends on what you mean by "family".

CATH and SCOP assign proteins in a hierarchical manner to classes, folds, superfamilies and families, based on sequence-based (SCOP) and structure-based (CATH) similarities. Both are manually or semi-manually curated. FSSP does the same task automatically. All three are in a high rate of agreement (75-80%) regarding their classifications. Of course, you are limited to the proteins in PDB only: the ones for which there are solved structures.

If you wish to consider more sequences, then other assignments are possible, depending on your purpose. Pfam is a good example.

I suggest you get a good bioinformatics textbook, such as David Mount's or Baxevanis's, look through the different classification schemes, and choose the one which suits your purpose best.

./I

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037, USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

On Fri, 26 Sep 2003, Liu Haifeng wrote:

> Hi,
>
> I am trying to find a collection of protein sequences which have been
> correctly assigned to different families. Anybody can suggest where to
> obtain such kind of data? PDB and Swiss-Prot seem to provide sequences but
> without family information.
>
> Would appreciate your help, thanks a lot!
>
> Sincerely
>
> Haifeng Liu

From pooja at igc.gulbenkian.pt  Fri Sep 26 05:57:17 2003
From: pooja at igc.gulbenkian.pt (pooja at igc.gulbenkian.pt)
Date: Fri, 26 Sep 2003 10:57:17 +0100 (WEST)
Subject: [BiO BB] protein families
In-Reply-To:
References: <000e01c383d8$f32dde70$706712ac@GENETHON>
Message-ID: <34487.194.117.22.137.1064570237.squirrel@webmail.igc.gulbenkian.pt>

Hi,

You can find proteins classified into superfamilies at either of the following resources:

1. SuperFamily - Classification is based on profile hidden Markov models that represent all proteins of known structure, based on SCOP (Structural Classification Of Proteins).

2. PIR SuperFamily (PIRSF) - The classification system is based on the evolutionary relationships of whole proteins.

If you are interested in identifying a possible family for an uncharacterized protein (as I presently am), you may want to try InterPro. InterPro is a joint effort of protein sequence databases like SWISS-PROT and TrEMBL; functional site, motif and domain databases such as PRINTS, Pfam and ProDom; and resources for protein families, like PIRSF and SUPERFAMILY. Features like functional sites, motifs and domains, which are extracted from known protein sequences and known protein families, are applied to the unknown protein sequence when making the prediction.

But for the sequences I am trying, InterPro always leaves me with the same family status I started from...... Unknown protein or Hypothetical protein!

So I am also looking for a good protein family prediction tool, or gene family prediction tool. I will be very grateful if someone from the list can give me some insight.

I will soon look into the suggested bioinformatics readings. They may be helpful for me as well.

Thank you.
Regards, -Pooja > > Depends on what you mean by "family". > > CATH and SCOP assign proteins in a hierarchical manner to classes, folds, > superfamilies and families, based on sequence-based (SCOP) and structure-based > (CATH) similarities. Both are manually or semi-manually curated. FSSP does > the same task automatically. All three agree to a high degree > (75-80%) in their classifications. Of course, you are limited to > the proteins in PDB only: the ones for which there are solved structures. > > If you wish to consider more sequences, then other assignments are > possible, depending on your purpose. Pfam is a good example. > > I suggest you get a good bioinformatics textbook, such as David Mount's > or Baxevanis's, look through the different classification schemes, and > choose the one which suits your purpose best. > > ./I > > -- > Iddo Friedberg, Ph.D. > The Burnham Institute > 10901 N. Torrey Pines Rd. > La Jolla, CA 92037, USA > Tel: +1 (858) 646 3100 x3516 > Fax: +1 (858) 646 3171 > http://ffas.ljcrf.edu/~iddo > > On Fri, 26 Sep 2003, Liu Haifeng wrote: > >> Hi, >> >> I am trying to find a collection of protein sequences which have been >> correctly assigned to different families. Can anybody suggest where to >> obtain this kind of data? PDB and Swiss-Prot seem to provide sequences but >> without family information. >> >> Would appreciate your help, thanks a lot!
>> >> Sincerely >> >> Haifeng Liu From hz5 at njit.edu Fri Sep 26 08:32:00 2003 From: hz5 at njit.edu (hz5 at njit.edu) Date: Fri, 26 Sep 2003 08:32:00 -0400 (EDT) Subject: [BiO BB] protein families In-Reply-To: <000e01c383d8$f32dde70$706712ac@GENETHON> References: <000e01c383d8$f32dde70$706712ac@GENETHON> Message-ID: <1064579520.3f7431c01d534@webmail.njit.edu> Try Pfam, SMART and ProDom; you can find links here: http://afs13.njit.edu/~hz5/biolink.html#pro Quoting Liu Haifeng : > Hi, > > I am trying to find a collection of protein sequences which have been > correctly assigned to different families. Can anybody suggest where to > obtain this kind of data? PDB and Swiss-Prot seem to provide sequences but > without family information. > > Would appreciate your help, thanks a lot! > > Sincerely > > Haifeng Liu ========================================================= Haibo Zhang, PhD student Computational Biology, NJIT & Rutgers University Center for Applied Genomics, PHRI http://afs13.njit.edu/~hz5 From gilbertd at bio.indiana.edu Mon Sep 29 12:41:50 2003 From: gilbertd at bio.indiana.edu (Don Gilbert) Date: Mon, 29 Sep 2003 11:41:50 -0500 (EST) Subject: [BiO BB] Bioinformatics software reviews for Briefings in Bioinformatics Message-ID: <200309291641.h8TGfo719895@cricket.bio.indiana.edu> The journal Briefings in Bioinformatics is seeking reviews of bioinformatics software.
If you have a favorite program, or have compared two or more programs and would like to write about their benefits and drawbacks, please contact me. The target audience includes a range of biologists and bioinformaticians from academia, government and industry. A review should cover practical aspects of the software and be helpful to this range of readers. One caveat is that reviewers should not have associations with the software reviewed, and should approach it impartially. Suggestions for software to be reviewed are also welcome. See more at http://www.henrystewart.com/journals/bib/ http://marmot.bio.indiana.edu/bibsoft/ -- software reviews Don Gilbert, software editor, BiB. gilbertd at indiana.edu bioinformatics, biology dept., indiana university, bloomington, in 47405 USA From pabloj79 at yahoo.com.ar Tue Sep 30 17:49:07 2003 From: pabloj79 at yahoo.com.ar (pablo gonzalez) Date: Tue, 30 Sep 2003 18:49:07 -0300 (ART) Subject: [BiO BB] searching information In-Reply-To: <3F6B1913.2080408@bioinformatics.org> Message-ID: <20030930214908.66913.qmail@web41905.mail.yahoo.com> Mr. Jeff W. Bizzaro: thank you for answering my letter. I'm working with Pseudomonas and I would like to ask how I can reduce the number of hypothetical proteins (if it's possible) and obtain more information about known proteins when I use the BLAST program. Yours sincerely Pablo J. Gonzalez Universidad Nacional de Rio Cuarto - Córdoba - ARGENTINA "J.W. Bizzaro" wrote: Hi Pablo. You're probably looking for sequence alignment tools such as BLAST or Clustal. You may want to post the question with more detail (what specifically do you want to do) to bio_bulletin_board at bioinformatics.org. Cheers.
Jeff Pablo J.Gonzalez wrote: > I am Argentinian and I am studying microbiology, and I want > information about how to use bioinformatics tools (to > search for homologies between microorganism sequences and > proteins). Because of the bad economic situation I can't buy a > book to learn the keys to good handling of the tools > that the internet offers, so I would be grateful for > information about free sites where I can find the information > I want. > Thank you for your time. > PABLO J. GONZALEZ -- J.W. Bizzaro jeff at bioinformatics.org President, Bioinformatics.Org http://bioinformatics.org/~jeff "As we enjoy great advantages from the inventions of others, we should be glad of an opportunity to serve others by any invention of ours; and this we should do freely and generously." -- Benjamin Franklin --
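One possible way to approach Pablo's question about hypothetical proteins in BLAST results is to post-filter the hit list, keeping for each query the best-scoring hit whose description is actually informative. The sketch below is only an illustration under stated assumptions, not a method anyone in the thread proposed: it assumes the hits were saved as tab-separated text with six columns (query id, subject id, percent identity, E-value, bit score, subject title), and the function names, the keyword list and the 1e-5 cutoff are all hypothetical choices. It does not annotate anything new; it only surfaces annotated hits that would otherwise be hidden behind "hypothetical protein" matches.

```python
import io

# Assumed keyword list: titles containing any of these are treated as uninformative.
UNINFORMATIVE = ("hypothetical", "unknown", "uncharacterized", "unnamed")


def is_informative(title):
    """Return True if the hit description carries some functional annotation."""
    lowered = title.lower()
    return not any(word in lowered for word in UNINFORMATIVE)


def best_informative_hits(handle, max_evalue=1e-5):
    """Map each query id to its lowest-E-value informative hit.

    Assumes tab-separated lines with the columns:
    query id, subject id, percent identity, E-value, bit score, subject title.
    Returns {query_id: (subject_id, evalue, title)}.
    """
    best = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        query_id, subject_id, _identity, evalue_text, _bits, title = fields[:6]
        evalue = float(evalue_text)
        if evalue > max_evalue or not is_informative(title):
            continue
        # Keep the hit with the smallest E-value seen so far for this query.
        if query_id not in best or evalue < best[query_id][1]:
            best[query_id] = (subject_id, evalue, title)
    return best
```

Usage would look like `best_informative_hits(open("hits.tsv"))`, where `hits.tsv` is a placeholder file name. Queries that come back with no entry at all are the ones for which only hypothetical or unknown proteins were found, and those are the candidates for domain-level searches such as InterPro or Pfam mentioned earlier in the thread.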