From B.A.T.Svensson at lumc.nl Fri Oct 1 02:16:31 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: Fri, 1 Oct 2004 08:16:31 +0200
Subject: [BiO BB] What?
Message-ID:

From: Dan Bolser
Subject: Re: [BiO BB] What?

> It takes me here whatever word I type in the URL.

Your browser most likely supports the
"take-me-to-my-predefine-page-if-you-cannot-lookup-the-funny-entry-I-gave-you"-feature.

From aleman at linuxmail.org Fri Oct 1 03:00:35 2004
From: aleman at linuxmail.org (Stephan Aleman)
Date: Thu, 30 Sep 2004 23:00:35 -0800
Subject: [BiO BB] What?
Message-ID: <20041001070035.3A0F6398198@ws5-1.us4.outblaze.com>

If you think your version has a bug, find the fix in Bugzilla at
https://bugzilla.mozilla.org/enter_bug.cgi?format=guided
or download a recent version that may already contain the fix.

Try to toggle your browser's location field search capability.

Stephan ~/~

From ryangolhar at hotmail.com Fri Oct 1 10:47:27 2004
From: ryangolhar at hotmail.com (Ryan Golhar)
Date: Fri, 1 Oct 2004 10:47:27 -0400
Subject: [BiO BB] What?
In-Reply-To: <20041001070035.3A0F6398198@ws5-1.us4.outblaze.com>
Message-ID: <001301c4a7c5$921c9880$0b43fea9@GOLHARMOBILE1>

And what does this URL thing have to do with Bio BB?

_______________________________________________
BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

From letondal at pasteur.fr Fri Oct 1 12:50:52 2004
From: letondal at pasteur.fr (Catherine Letondal)
Date: Fri, 01 Oct 2004 18:50:52 +0200
Subject: [Bio BB] Implementation or software packages of suffix tree
In-Reply-To: Your message of "Thu, 30 Sep 2004 16:00:32 +0800."
	<004701c4a6c3$92be8710$6300a8c0@emilysu>
Message-ID: <200410011650.i91GoqCp209544@electre.pasteur.fr>

"Chia-Yu Su" wrote:
>
> Hi,
>
> I want to construct suffix trees to extract repeats from sequences.
> I found several software packages on the web (by Dan Gusfield, Ting Chen,
> etc).
> Has anyone used any recommended suffix tree software?
> Are there any implementations of suffix trees that allow long sequence
> inputs and generate results in linear time?
> Please give me some suggestions about suffix tree software and where to
> download it on the web.

You can look at:
http://www-igm.univ-mlv.fr/~marsan/recherches.html
and http://www-igm.univ-mlv.fr/~marsan/smile.html

>
> Thanks a lot for your help!
>
> Best regards,
> Chia-Yu
> ------------
> Chia-Yu Su
> Bioinformatics Program of
> Taiwan International Graduate Program
> Academia Sinica, Taiwan
>
> E-mail: cysu at iis.sinica.edu.tw
> Phone: +886-2-27883799 ext. 2302
> Fax: +886-2-27824814

--
Catherine Letondal -- Pasteur Institute Computing Center

From operon at cbiot.ufrgs.br Fri Oct 1 13:03:55 2004
From: operon at cbiot.ufrgs.br (Marcos Oliveira de Carvalho)
Date: Fri, 01 Oct 2004 14:03:55 -0300
Subject: [Bio BB] Implementation or software packages of suffix tree
In-Reply-To: <200410011650.i91GoqCp209544@electre.pasteur.fr>
References: <200410011650.i91GoqCp209544@electre.pasteur.fr>
Message-ID:

Also:
http://www.techfak.uni-bielefeld.de/~kurtz/

From ryangolhar at hotmail.com Sun Oct 3 21:26:05 2004
From: ryangolhar at hotmail.com (Ryan Golhar)
Date: Sun, 3 Oct 2004 21:26:05 -0400
Subject: [BiO BB] PAM vs BLOSUM
Message-ID: <000d01c4a9b1$1e2abf40$9500a8c0@GOLHARMOBILE1>

I started teaching a class and am explaining how PAM and BLOSUM matrices
are derived and how a log-odds matrix works. I was wondering if anyone
could recommend possible examples to use as homework questions, or
hands-on exercises for students to become aware of the benefits and
differences in the matrices.

I found a Matlab demo that compares the scores of alignments using
different matrices, but it doesn't go to the extent that I was hoping.
If you have any recommendations, I'd love to hear them.

Thanks,

Ryan

From priyaa_b at yahoo.com Sun Oct 3 22:00:58 2004
From: priyaa_b at yahoo.com (Priya)
Date: Sun, 3 Oct 2004 19:00:58 -0700 (PDT)
Subject: [BiO BB] PAM vs BLOSUM
In-Reply-To: <000d01c4a9b1$1e2abf40$9500a8c0@GOLHARMOBILE1>
Message-ID: <20041004020058.63381.qmail@web41210.mail.yahoo.com>

Hi Ryan,

I guess this link would be pretty useful:
http://www.matfys.kvl.dk/bioinformatik/exercise-multiple.html

Cheers,
Priya
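
As one possible hands-on exercise for the log-odds question above, here is a minimal Python sketch of how a BLOSUM-style log-odds score can be computed from aligned-pair counts. It is not taken from any message in this thread: the function name `log_odds_scores`, the two-letter alphabet and the counts are made-up toy values for illustration, and the half-bit scaling (2 * log2) is assumed from the usual BLOSUM convention.

```python
import math
from itertools import combinations_with_replacement


def log_odds_scores(pair_counts):
    """Return rounded half-bit log-odds scores from unordered pair counts.

    pair_counts maps a tuple (a, b) with a <= b to the number of times the
    two residues were seen aligned in the same column.
    """
    total = sum(pair_counts.values())
    # Observed probability q_ab of each unordered pair.
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background probability p_a of each residue: an (a, a) pair counts
    # fully towards a, a mixed pair counts half towards each member.
    letters = sorted({x for pair in pair_counts for x in pair})
    p = {a: 0.0 for a in letters}
    for (a, b), q_ab in q.items():
        if a == b:
            p[a] += q_ab
        else:
            p[a] += q_ab / 2
            p[b] += q_ab / 2

    scores = {}
    for a, b in combinations_with_replacement(letters, 2):
        q_ab = q.get((a, b), 0.0)
        if q_ab == 0.0:
            continue  # unobserved pair: log-odds is undefined, so skip it
        # Expected pair probability if residues paired up by chance alone.
        expected = p[a] * p[a] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(q_ab / expected))
    return scores


if __name__ == "__main__":
    # Toy counts (hypothetical, for illustration only).
    toy_counts = {("A", "A"): 60, ("A", "S"): 20, ("S", "S"): 20}
    for pair, score in sorted(log_odds_scores(toy_counts).items()):
        print(pair, score)
```

With these toy counts the scores come out as A/A = 1, A/S = -2 and S/S = 2: pairs seen more often than chance predicts score positive, pairs seen less often score negative. Students could change the counts and watch how the scores shift.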

From ryangolhar at hotmail.com Tue Oct 5 15:45:14 2004
From: ryangolhar at hotmail.com (Ryan Golhar)
Date: Tue, 5 Oct 2004 15:45:14 -0400
Subject: [BiO BB] Random Sequence Generator
Message-ID: <005b01c4ab13$d53f4bc0$ac00a8c0@GOLHARMOBILE1>

Can anyone recommend a good random sequence generator program available
for Linux?

Ryan

From dmb at mrc-dunn.cam.ac.uk Tue Oct 5 16:18:01 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Tue, 5 Oct 2004 21:18:01 +0100 (BST)
Subject: [BiO BB] Random Sequence Generator
In-Reply-To: <005b01c4ab13$d53f4bc0$ac00a8c0@GOLHARMOBILE1>
Message-ID:

On Tue, 5 Oct 2004, Ryan Golhar wrote:

>Can anyone recommend a good random sequence generator program available
>for Linux?

perl -e '@x=qw(A T C G);for(1..10000){print $x[rand(@x)]}'

Other than that it kind of depends on what your requirements are (and
apparently the above random generator isn't too good).

perl -e '@x=qw(all work and no play);while(1){print $x[rand(@x)]}'

From idoerg at burnham.org Tue Oct 5 16:19:51 2004
From: idoerg at burnham.org (Iddo)
Date: Tue, 05 Oct 2004 13:19:51 -0700
Subject: [BiO BB] Random Sequence Generator
In-Reply-To: <005b01c4ab13$d53f4bc0$ac00a8c0@GOLHARMOBILE1>
References: <005b01c4ab13$d53f4bc0$ac00a8c0@GOLHARMOBILE1>
Message-ID: <416301E7.4070209@burnham.org>

Ryan Golhar wrote:

>Can anyone recommend a good random sequence generator program available
>for Linux?

EMBOSS has shuffleseq.

./I

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From landman at scalableinformatics.com Tue Oct 5 16:20:08 2004
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 05 Oct 2004 16:20:08 -0400
Subject: [BiO BB] Random Sequence Generator
In-Reply-To:
References:
Message-ID: <416301F8.5000106@scalableinformatics.com>

For starters... use a good (P)RNG (rand is not appropriate for many real
research cases). Have a look at the Mersenne Twister
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html and the Perl
module http://search.cpan.org/~ams/Math-Random-MT-1.03/MT.pm .

That said, you have to be careful about what random means in terms of
the specifics of the distribution. If you are looking at CG-rich
regions, you expect a good PRNG to give you effectively a 0.25
probability for each of the letters, so you will not get (unless you
bias the PRNG distribution) a CG-rich set of "random" sequence data.

Just some thoughts...

Joe

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 612 4615

From bader at cbio.mskcc.org Tue Oct 5 16:23:51 2004
From: bader at cbio.mskcc.org (Gary Bader)
Date: Tue, 5 Oct 2004 16:23:51 -0400
Subject: [BiO BB] Random Sequence Generator
In-Reply-To: <005b01c4ab13$d53f4bc0$ac00a8c0@GOLHARMOBILE1>
Message-ID: <00fc01c4ab19$3a463bf0$8349a8c0@cbio.mskcc.org>

Hi Ryan,

Yes, the shuffleseq tool from EMBOSS does a pretty good job.
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/shuffleseq.html

Also check out:
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/msbar.html

-Gary

From idoerg at burnham.org Tue Oct 5 16:32:08 2004
From: idoerg at burnham.org (Iddo)
Date: Tue, 05 Oct 2004 13:32:08 -0700
Subject: [BiO BB] Random Sequence Generator
In-Reply-To:
References:
Message-ID: <416304C8.2010803@burnham.org>

Dan Bolser wrote:

>perl -e '@x=qw(A T C G);for(1..10000){print $x[rand(@x)]}'
>
>Other than that it kind of depends on what your requirements are (and
>apparently the above random generator isn't too good).
>
>perl -e '@x=qw(all work and no play);while(1){print $x[rand(@x)]}'

Well, as long as we are in the
not-so-great-one-line-random-sequence-generators business:

python -c 'import random; z=list("ACGT"*10); random.shuffle(z); print("".join(z))'

For a shuffled sequence of length 40 (ten of each letter).

./I

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From dmb at mrc-dunn.cam.ac.uk Tue Oct 5 16:43:12 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Tue, 5 Oct 2004 21:43:12 +0100 (BST)
Subject: [BiO BB] Random Sequence Generator
In-Reply-To: <416301F8.5000106@scalableinformatics.com>
Message-ID:

On Tue, 5 Oct 2004, Joe Landman wrote:

>For starters...
>use a good (P)RNG (rand is not appropriate for many real
>research cases). Have a look at Mersenne Twister
>http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html and the Perl
>module http://search.cpan.org/~ams/Math-Random-MT-1.03/MT.pm .
>
>That said, you have to be careful about what random means in terms of
>the specifics of the distribution. If you are looking at CG rich
>regions, you expect a good PRNG to give you effectively .25...
>probability of any of the letters, so you will not get (unless you bias
>the PRNG distribution) a CG rich set of "random" sequence data.
>
>Just some thoughts...

That reminds me, you can get random sequences 'in the style of' a
particular HMM with HMMER (hmmemit). So you can build an HMM of a
particular family and then generate pseudo family members, or use
existing HMM models for Pfam / SCOP etc.

From B.A.T.Svensson at lumc.nl Wed Oct 6 02:17:51 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T.
	(HKG))
Date: Wed, 6 Oct 2004 08:17:51 +0200
Subject: [BiO BB] Random Sequence Generator
Message-ID:

There is an entropy daemon for Linux. I don't remember the name right
now, but it collects events from different parts of the system, like I/O
and the process list, and generates random numbers that can be polled.

I'll see if I can find out the package name for you.

From bioinformatics2005 at yahoo.com.cn Wed Oct 6 08:18:47 2004
From: bioinformatics2005 at yahoo.com.cn (Fei Li)
Date: Wed, 6 Oct 2004 20:18:47 +0800 (CST)
Subject: [BiO BB] Random Sequence Generator
In-Reply-To:
Message-ID: <20041006121847.56475.qmail@web15702.mail.cnb.yahoo.com>

I do not think we can find a general random sequence generator for all
cases. At the least, we need to consider GC content, and maybe even the
content of "GC", "GG", "GT", "GA"......"TC"....."TA", etc. A "real"
random sequence is most likely useless in a specific case.

Fei

-------------------------------------------------
Fei Li, PhD, Postdoc fellow
Institute of Bioinformatics
MOE key laboratory of Bioinformatics
Tsinghua University
Beijing, 100084
China
Tel:0086-10-62782877
Fax:0086-10-62786911
E-mail: flee at tsinghua.edu.cn
Homepages:http://166.111.30.65/member/~lifei/

From boris.steipe at utoronto.ca Wed Oct 6 09:45:53 2004
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Wed, 6 Oct 2004 09:45:53 -0400
Subject: [BiO BB] Random Sequence Generator
In-Reply-To: <20041006121847.56475.qmail@web15702.mail.cnb.yahoo.com>
Message-ID: <0AE099D2-179E-11D9-8370-000A9577512E@utoronto.ca>

On Wednesday, Oct 6, 2004, at 08:18 Canada/Eastern, Fei Li wrote:

> I do not think we can find a general random sequence generator for all
> the case. At least, we need consider GC content, and maybe even
> content of "GC", "GG", "GT", "GA"......"TC"....."TA", etc. A "real"
> random sequence is most likely useless in a specific case.
>
> Fei

Here is how this is done in principle: random characters with arbitrary,
predefined target frequencies. (The code is a teaching example for
non-Perl programmers, VERY explicit; one could of course do this MUCH
more concisely in "real" Perl, at the expense of legibility.) The target
frequencies do not need to sum to 1.0; they can be raw counts from an
example sequence. Changing this to amino acids instead of nucleotides
should be straightforward.

In this kind of simulation, you assume that all nucleotides are
independent; this does not conserve dinucleotide, trinucleotide
frequencies etc.
If higher order correlations may play a role, it would be more
appropriate to randomly sample from the original, rather than simulate
a sequence.

Boris

=======================================================================

#!/usr/bin/perl
use strict;
use warnings;

my $OutputLength = 100;

my @Character;   # Stores the emitted characters
my @Frequency;   # Stores the target frequency of each character
my @Interval;    # Stores the summed (cumulative) probability of each character

initializeArrays();

for (my $i=0; $i < $OutputLength; $i++) {
    print getCharacter(rand);
}
print "\n";

exit();

# ====== Subroutines =======================
sub initializeArrays {

    my $Sum = 0;

    # Initialize the Character array - we enter target frequencies explicitly
    # but this could also be the result of counting amino acids in a database,
    # or nucleotides in a genome etc.
    # In this example we generate a skewed nucleotide composition
    $Character[0] = 'A'; $Frequency[0] = 17;
    $Character[1] = 'G'; $Frequency[1] = 27;
    $Character[2] = 'C'; $Frequency[2] = 23;
    $Character[3] = 'T'; $Frequency[3] = 19;

    # Convert the values of the target frequencies array to the top boundaries
    # of intervals having relative widths proportional to their relative frequency
    # and ranging from 0 to 1.0
    for (my $i=0; $i <= $#Frequency; $i++) { $Sum += $Frequency[$i]; }

    for (my $i=0; $i <= $#Frequency; $i++) {
        $Interval[$i] = 0;
        for (my $j=0; $j <= $i; $j++) {
            $Interval[$i] += $Frequency[$j] / $Sum;
        }
    }

    return();
}

# ==========================================
sub getCharacter {

    my ($RandomNumber) = @_;
    my $i=0;

    # Which interval does the random number fall into ? This interval has the
    # index of the character we output.
    for ($i=0; $i <= $#Interval; $i++) {
        if ($Interval[$i] > $RandomNumber) { last; }
    }

    # Guard against floating-point rounding: if the last boundary summed to
    # slightly less than 1.0, fall back to the last character.
    if ($i > $#Interval) { $i = $#Interval; }

    return($Character[$i]);
}

From landman at scalableinformatics.com Wed Oct 6 10:23:54 2004
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 06 Oct 2004 10:23:54 -0400
Subject: [BiO BB] Random Sequence Generator
In-Reply-To: <0AE099D2-179E-11D9-8370-000A9577512E@utoronto.ca>
References: <0AE099D2-179E-11D9-8370-000A9577512E@utoronto.ca>
Message-ID: <4163FFFA.4010902@scalableinformatics.com>

Boris Steipe wrote:

> In this kind of simulation, you assume that all nucleotides are
> independent, this does not conserve dinucleotide, trinucleotide
> frequencies etc. If higher order correlations may play a role, it
> would be more appropriate to randomly sample from the original, rather
> than simulate a sequence.

It might be better (if you need multi-letter properties to match some
sequence library set) to sample the distribution of the multi-letters,
and pull randomly from there rather than from single letters. This way
you can (to an extent) preserve correlations at the di-/tri-/... higher
orders as required, though you will still miss higher order patterns
(and isn't that what some of the HMM tools are for anyway?) and still
"randomly" sample.

Though with all due respect, please don't use "rand" for random numbers.
The Mersenne Twister and other modern pseudo-random number generators
(PRNG) have superior properties, and decades of work on the part of
folks doing Monte Carlo work in physics and chemistry have indicated
that the quality of the PRNG is quite important.

So what I am saying is that if you need to emit "random patterns" with
similar di-nucleotide or tri-nucleotide frequencies, you should emit
di-nucleotides and tri-nucleotides rather than single nucleotides.
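
The suggestion above (emit di- or tri-nucleotides drawn from their observed distribution, rather than single letters) can be sketched roughly as follows. This is a minimal Python illustration, not code from the thread: the function names and the reference string are hypothetical, and Python is used here partly because its standard `random` module is itself based on the Mersenne Twister.

```python
import random
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers (k=2: dinucleotides) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def emit_from_kmers(seq, k, length, rng):
    """Build `length` letters by concatenating k-mers drawn with
    probability proportional to their counts in the reference `seq`."""
    counts = kmer_counts(seq, k)
    if not counts:
        raise ValueError("reference sequence is shorter than k")
    kmers = list(counts)
    weights = [counts[m] for m in kmers]
    n_draws = -(-length // k)  # ceiling division: enough k-mers to cover length
    drawn = rng.choices(kmers, weights=weights, k=n_draws)
    return "".join(drawn)[:length]


if __name__ == "__main__":
    # Hypothetical reference sequence, for illustration only.
    reference = "ACGCGCGTATATACGCGCGGCGCATATGCGC"
    rng = random.Random(42)  # seeded so that runs are repeatable
    print(emit_from_kmers(reference, 2, 60, rng))
```

Because the k-mers are drawn independently of each other, their frequencies are preserved only within each emitted block and not across the junctions between blocks, which is in line with the "to an extent" caveat above.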

Joe

[good/readable perl code removed: ]

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From boris.steipe at utoronto.ca Wed Oct 6 10:56:39 2004
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Wed, 6 Oct 2004 10:56:39 -0400
Subject: [BiO BB] Random Sequence Generator
In-Reply-To: <4163FFFA.4010902@scalableinformatics.com>
Message-ID:

Sorry, I was misleading: "randomly sample" should be "randomly sample
di-/tri-/... nucleotides from the original" ... This is to address the
problem that your original sequence may not be sufficiently long to get
meaningful frequencies. E.g. a hexamer randomly pulled from a 1 kb
promoter sequence implicitly represents that sequence's underlying
hexamer frequencies; but I could not compile frequencies of _all_ 4096
hexamers from 1 kb. Of course you can't use this to see whether such a
hexamer would be overrepresented - you need an independent random model
for that. But you can look at separations between patterns, clustering,
correlations and the like.

Be well,
Boris

(I concur on the PRNG issue. However, for applications where "really
random" is important, why not use true random numbers obtained from a
physical process? This is easy enough to do, e.g. see
http://www.lavarnd.org :-)

From skadener at brandeis.edu Thu Oct 7 16:08:06 2004
From: skadener at brandeis.edu (Sebastian Kadener)
Date: Thu, 7 Oct 2004 16:08:06 -0400
Subject: [BiO BB] Finding common TF binding sites in promoter regions of different insect species
Message-ID: <1097179686.4165a2260acf9@webmail.staff.brandeis.edu>

Dear list members,

I am trying to find TF binding sites (not ones that are currently known
and in databases) in the promoter regions of at least four species of
insects (the 3 sequenced Drosophila and Anopheles). I have used MEME for
this already but the results that I got weren't so good.

I was thinking of performing some kind of multiple local alignment to
find regulatory elements among the promoters but I wasn't sure how to go
about doing this.

If anyone has any suggestions on how to find TF binding sites in
promoters, I would be very interested in hearing them. Thank you in
advance.

Sebastian

--
Sebastian Kadener
Rosbash laboratory
Dept. of Biology, MS008
Brandeis University/HHMI       781 736 3163 tel
415 South St.                  781 736 3164 fax
Waltham, MA 02454              skadener at brandeis.edu

From hjm at tacgi.com Thu Oct 7 16:21:42 2004
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu, 7 Oct 2004 13:21:42 -0700
Subject: [BiO BB] Finding common TF binding sites in promoter regions of different insect species
In-Reply-To: <1097179686.4165a2260acf9@webmail.staff.brandeis.edu>
References: <1097179686.4165a2260acf9@webmail.staff.brandeis.edu>
Message-ID: <200410071321.42490.hjm@tacgi.com>

see: Nature. 2004 Sep 2;431(7004):99-104.
http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v431/n7004/abs/nature02800_fs.html&dynoptions=doi1097180708
and refs therein.

hjm

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com

From operon at cbiot.ufrgs.br Thu Oct 7 16:22:00 2004
From: operon at cbiot.ufrgs.br (Marcos Oliveira de Carvalho)
Date: Thu, 07 Oct 2004 17:22:00 -0300
Subject: [BiO BB] Finding common TF binding sites in promoter regions of different insect species
In-Reply-To: <1097179686.4165a2260acf9@webmail.staff.brandeis.edu>
References: <1097179686.4165a2260acf9@webmail.staff.brandeis.edu>
Message-ID:

Hi Sebastian,

Take a look at
http://www.esat.kuleuven.ac.be/~thijs/Work/MotifSampler.html

Cheers
Marcos

From ryangolhar at hotmail.com Thu Oct 7 22:09:22 2004
From: ryangolhar at hotmail.com (Ryan Golhar)
Date: Thu, 7 Oct 2004 22:09:22 -0400
Subject: [BiO BB] Finding common TF binding sites in promoter regions of different insect species
In-Reply-To: <1097179686.4165a2260acf9@webmail.staff.brandeis.edu>
Message-ID: <005301c4acdb$d3f09070$4900a8c0@GOLHARMOBILE1>

You should consider using FootPrinter for this...

Ryan

From thompson at wadsworth.org Fri Oct 8 13:11:34 2004
From: thompson at wadsworth.org (William Thompson)
Date: Fri, 8 Oct 2004 13:11:34 -0400 (EDT)
Subject: [BiO BB] Re: Finding common TF binding sites in promoter regions of different insect species
Message-ID: <200410081711.i98HBYj04762@csserv.wadsworth.org>

There are a number of good tools on the web for finding TFBS: MEME,
which you mentioned, and Gibbs sampling based tools
http://bayesweb.wadsworth.org/gibbs/gibbs.html and
http://www.esat.kuleuven.ac.be/~thijs/Work/MotifSampler.html.
They work best when the sequences that you are searching are regulatory regions from co-regulated genes in a single species or from orthologous genes from related species. It sounds like you already have regions from related species. You might want to check out http://bayesweb.wadsworth.org/web_help.PF.html and http://bayesweb.wadsworth.org/web_help_text.CE.html. There are some general guidelines there that might be helpful.

Bill Thompson, PhD
Bioinformatics Center
NYS Department of Health
Center for Medical Science, rm 2006
150 New Scotland Avenue
Albany, New York 12208
(518) 486-7882
thompson at wadsworth.org

> Dear list members,
>
> I am trying to find TF binding sites (not ones that are currently known and in
> databases) in the promoter regions of at least four species of insects (the 3
> sequenced drosophila and anopheles). I have used meme for this already but
> the results that I got weren't so good.
>
> I was thinking of performing some kind of multiple local alignment to find
> regulatory elements among the promoters but I wasn't sure how to go about
> doing this.
>
> If anyone has any suggestions on how to find TF binding sites in promoters, I
> would be very interested in hearing them. Thank you in advance.
>
> Sebastian
>
>
> --
> Sebastian Kadener
> Rosbash laboratory
> Dept. of Biology, MS008
> Brandeis University/HHMI 781 736 3163 tel
> 415 South St.
781 736 3164 fax
> Waltham, MA 02454 skadener at brandeis.edu

From Nadia.Bolshakova at cs.tcd.ie Tue Oct 12 06:17:27 2004
From: Nadia.Bolshakova at cs.tcd.ie (Nadia Bolshakova)
Date: Tue, 12 Oct 2004 11:17:27 +0100
Subject: [BiO BB] CALL FOR PAPERS: The 18th IEEE Symposium on Computer-Based Medical Systems
Message-ID: <00cf01c4b044$acc7b390$6e26e286@DBNJK90J>

IEEE CBMS 2005
The 18th IEEE Symposium on Computer-Based Medical Systems
http://conferences.computer.org/CBMS2005/index.html

--- PRELIMINARY CALL FOR PAPERS ---

The 18th IEEE Symposium on Computer-Based Medical Systems (CBMS 2005) will be held on June 23-24, 2005 at Trinity College Dublin, Ireland. CBMS 2005 is co-sponsored by the IEEE Computer Society (Technical Committee on Computational Medicine, TCCM), and Department of Computer Science, Trinity College Dublin.

The conference Web site is
http://www.cs.tcd.ie/research_groups/mlg/CBMS2005/index.html
http://conferences.computer.org/CBMS2005/index.html (mirror)

CBMS 2005 is intended to provide an international forum for discussing the latest results in the field of computational medicine. The symposium is dedicated to a broad arena of issues which relate computing to medicine, with a focus on bioinformatics. The symposium consists of regular sessions with technical contributions reviewed and selected by an international program committee, as well as of invited talks and tutorials given by leading scientists.

The symposium provides a mechanism for the exchange of ideas and technologies between academicians and industrial scientists who are developing Computer-Based Medical Systems. It is the premier symposium in its field, attracting a worldwide audience. This symposium draws together experts in many fields to discuss the latest advances in medical systems based upon computers.
Topics of interest include, but are not limited to:

* Software Systems in Medicine
* Computer-Aided Diagnosis
* Knowledge-based Systems & Data Mining
* Decision Support Systems
* Medical Devices with Embedded Computers
* Signal and Image Processing in Medicine
* Medical Image Segmentation & Compression
* Network and Telemedicine Systems
* Medical Databases & Information Systems
* Web-based Delivery of Medical Information
* Multimedia Biomedical Databases
* Content Analysis of Biomedical Image Data
* Hand-held Computing Applications in Medicine
* Bioinformatics in Medicine

IMPORTANT DATES

January 26, 2005   Submission of (3-page, maximum) paper summary
March 1, 2005      Notification of acceptance
March 24, 2005     Final camera-ready paper (6 pages, maximum) due
March 24, 2005     Pre-registration deadline

You must pre-register to have your paper published in the proceedings. If you only plan to attend and are not submitting a paper, pre-registration is still strongly encouraged. This conference is space-limited, and registration may not be available on-site.

SUBMISSION PROCEDURES FOR PAPER SUMMARIES

No hardcopy submissions are being accepted. Electronic submissions of original technical research papers will be accepted in PDF format only. File size is limited to 2 MB. Use a 12-point font size and single-spaced text on a maximum of three A4 pages, including figures and references. Include one cover sheet, stating the paper title, authors, technical area(s) covered in the article, and corresponding author's information (telephone, fax, mailing address, e-mail address). Author names should appear only on the cover sheet, not on the summary. Submit your manuscript no later than January 26, 2005. Authors will be notified of acceptance by March 1, 2005 after a review process by three independent experts.
Each accepted paper will be published in the conference proceedings by IEEE CS Press, conditional upon the author's advance registration.

INVITED SPEAKERS

Prof. Jane B. Grimson
Co-Chair, Centre for Health Informatics
Vice-Provost, Trinity College Dublin
Ireland

Prof. Jan Komorowski
Head, The Linnaeus Centre for Bioinformatics
Uppsala University
Sweden

Dr. R. Bharat Rao
Department Head, Clinical CAD & Data Mining
Computer Aided Diagnosis & Therapy
Siemens Medical Solutions, Inc.
USA

INTENDED AUDIENCE

Engineers, scientists, clinicians and managers involved in medical computing projects are encouraged to submit papers to the symposium and/or attend the symposium. The symposium provides its attendees with an opportunity to experience state-of-the-art research and development in a variety of topics directly and indirectly related to their own work. In addition to research papers, keynote speakers and tutorial sessions, it gives participants an opportunity to get up to date on important technological issues. The symposium encourages the participation of students engaged in research/development in computer-based medical systems. A prize will be awarded to the Best Student Paper submitted to (and presented at) the symposium.

HOTEL INFORMATION

A block of rooms will be reserved for symposium attendees within Trinity College Dublin (see http://www.tcd.ie/Conferences). The rate for these rooms is EUR 55-65 per night. There are a number of other hotels of different quality and price ranges within walking distance, just a few minutes away from the college. These include:

* Trinity Lodge Guesthouse, www.trinitylodge.com (rooms from EUR 95),
* Trinity Capital Hotel, www.trinitycapitalhotel.com (rooms from EUR 123), and
* Davenport Hotel, www.ocallaghanhotels.com/davenport/ (rooms from EUR 200).

VENUE

In comparison to other capitals, Dublin is a convivial city with a human scale and an extrovert populace.
At the heart of the city lies Trinity College with its magnificent buildings and beautiful campus spanning 35 acres. Trinity College has occupied this location for the past 400 years, and while none of the original buildings remain, the College boasts fine squares and gardens and a collection of magnificent buildings dating from the 17th to the 20th century. In contrast to the serene surroundings of the College landscape, the modern city of Dublin is just outside the main gate of the campus, with shops, theatres, cinemas and museums within walking distance.

Dublin Airport is served by most international airlines, and the low-fare airlines continue to expand their services into Ireland. London is an hour's flight away and the east coast of the United States only five and a half hours away by air. International carriers servicing Ireland include: Aer Lingus, Air France, Alitalia, Aeroflot, British Airways, British Midland, Continental Airlines, Delta Airlines, Finnair, Iberia, Lufthansa, Ryanair, Sabena, and SAS.

The history of the city of Dublin dates from 841 AD, when the Vikings created a settlement. They founded a city-state in 917. After the Anglo-Norman invasion of 1170, the Anglo-Normans erected defensive walls around the Castle. These have been partially reconstructed and can be seen at St. Audoen's Church.

Dublin is set in a wonderful part of the island. With the sea close by (a beach only as far away as Killiney), and countryside and rolling hills surrounding the city, there is plenty for you to do whatever your interests. The city itself is renowned for its writers, artists and musicians. There are many venues throughout the city where you can hear various types of music from classical to rock and pop. See more information at www.dublin.ie and www.entertainment.ie.
PREVIOUS CONFERENCES

Previous conferences were held in Bethesda MD, USA (2004 and 2001), New York City, USA (2003), Maribor, Slovenia (2002 and 1997), Houston TX, USA (2000), Stamford CT, USA (1999), Lubbock TX, USA (1998 and 1995), Ann Arbor MI, USA (1996 and 1993), Winston-Salem NC, USA (1994), Durham NC, USA (1992), Baltimore MD, USA (1991), Chapel Hill NC, USA (1990), Minneapolis MN, USA (1989 and 1988).

CBMS2004 photos from Bethesda, MD: http://archive.nlm.nih.gov/conf/cbms2004/index.htm

For further questions, please use the appropriate contact from the list below.

For general conference questions:
Prof. Padraig Cunningham, General Co-Chair, Padraig.Cunningham at cs.tcd.ie
Dr. Alexey Tsymbal, General Co-Chair, Alexey.Tsymbal at cs.tcd.ie

For paper submission questions:
Dr. Nadia Bolshakova, Publication Chair, Nadia.Bolshakova at cs.tcd.ie

Program Chairs
Prof. Dr. Peter Kokol, University of Maribor, Slovenia
Dr. Marina Krol, Mt Sinai School of Medicine, USA

Publicity Chair
Mykola Pechenizkiy, University of Jyvaskyla, Finland

Steering Committee
Sunanda Mitra, Texas Tech University, USA
Peter Kokol, University of Maribor, Slovenia
Ian Greenshields, University of Connecticut, USA
Nasser Kehtarnavaz, Texas A&M University, USA
Tim Kriewall, Medtronic Xomed, USA
Marina Krol, Mt Sinai School of Medicine, USA
Rodney Long, National Library of Medicine, USA
Margaret Peterson, Hospital for Special Surgery, USA
George Thoma, National Library of Medicine, USA
Richard E. Wendt III, MD Anderson Cancer Center, USA

Program Committee and Reviewers
Francisco Azuaje, University of Ulster, Northern Ireland
Virginia Gonzalez Velez, Universidad Autonoma Metropolitana-Azcapotzalco, Mexico
Vyacheslav Grebenyuk, National Tech University, Ukraine
Jane Grimson, Trinity College Dublin, Ireland
Jan Komorowski, The Linnaeus Centre for Bioinformatics, Uppsala University, Sweden
Oleksandr Logvynovskiy, London South Bank University, UK
Chris Nugent, University of Ulster, Northern Ireland
David W.
Patterson, Northern Ireland Knowledge Engineering Lab, University of Ulster, Northern Ireland
Seppo Puuronen, University of Jyvaskyla, Finland
Bharat Rao, Siemens Medical Solutions, USA
Niall Rooney, Northern Ireland Knowledge Engineering Lab, University of Ulster, Northern Ireland
Michael Shifrin, N.N. Burdenko Neurosurgery Institute, Russia
Vagan Terziyan, University of Jyvaskyla, Finland
Tatjana Welzer, University of Maribor, Slovenia

Building up the Program Committee is in progress...

From Nadia.Bolshakova at cs.tcd.ie Tue Oct 12 06:18:57 2004
From: Nadia.Bolshakova at cs.tcd.ie (Nadia Bolshakova)
Date: Tue, 12 Oct 2004 11:18:57 +0100
Subject: [BiO BB] CALL FOR SPECIAL TRACK PROPOSALS: The 18th IEEE Symposium on Computer-Based Medical Systems
Message-ID: <00ea01c4b045$372ca040$6e26e286@DBNJK90J>

IEEE CBMS 2005
The 18th IEEE Symposium on Computer-Based Medical Systems
http://conferences.computer.org/CBMS2005/index.html

--- CALL FOR SPECIAL TRACK PROPOSALS ---

Trinity College Dublin
Ireland
June 23-24, 2005

Deadline for submission of special track proposals is November 15, 2004

The IEEE CBMS 2005 Organising Committee invites proposals for Special Tracks for the 2005 IEEE Symposium on Computer-Based Medical Systems, to be held at Trinity College Dublin, Ireland, June 23-24, 2005. A special track usually consists of presentations of papers in a medical informatics subdiscipline or a special field, refereed by researchers and practitioners in the field. Unlike workshops, where position papers and reports on initial and intended work are appropriate, papers selected for a special track should report on significant unpublished work suitable for publication as a conference paper.

CBMS 2005 is co-sponsored by the IEEE Computer Society (Technical Committee on Computational Medicine, TCCM), and Department of Computer Science, Trinity College Dublin.
More information on IEEE CBMS 2005 may be found on the conference web site: http://conferences.computer.org/CBMS2005/index.html. If you are interested in proposing a special track, please send us a proposal as described below, by the deadline. The CBMS Organising Committee will respond to you on the acceptance of your proposal by December 15, 2004. We expect you to follow this timetable to provide sufficient time for publicity of the special tracks. Each Special Track organizer will be responsible for the quality of the Special Track papers and manage the review process accordingly, assigning a Program Committee for the Track. All accepted papers will be published in the IEEE CBMS 2005 Proceedings by the IEEE Computer Science Press. Papers for special tracks which were not accepted by the corresponding Program Committee can be considered for publication as regular submissions by the General Program Committee of IEEE CBMS 2005. If you know of some colleague who might be interested in proposing a track, please share this announcement with her/him or send us the e-mail address of the colleague. Please find below the details to be included in your proposal and important dates. Proposals should be submitted by e-mail to Dr. Nadia Bolshakova (Nadia.Bolshakova at cs.tcd.ie), Special Tracks Chair. All proposals will be carefully reviewed. The notification of acceptance will be sent out before December 15, 2004. Please contact us with any questions that are not answered here, if you would like to find out more about proposing a special track for IEEE CBMS 2005. = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = DETAILS TO BE INCLUDED IN YOUR PROPOSAL TO ORGANIZE A SPECIAL TRACK 1. Track title 2. Rough estimate of size (# of sessions / papers) Usually, 4-5 papers per session, 1-3 sessions per track 3. Track Program Committee with a brief biography of track organizer(s) 4. 
Draft of your Call for Papers

IMPORTANT DATES

Deadline for proposals                          November 15, 2004
Notification of acceptance                      December 15, 2004
Deadline for paper submissions (recommended)    February 1, 2005
Notification of acceptance                      March 1, 2005
Camera-ready copy / pre-registration deadline   March 24, 2005

From ngadewal at yahoo.com Tue Oct 12 07:33:18 2004
From: ngadewal at yahoo.com (nikhil gadewal)
Date: Tue, 12 Oct 2004 04:33:18 -0700 (PDT)
Subject: [BiO BB] interactome prediction
Message-ID: <20041012113318.18001.qmail@web40914.mail.yahoo.com>

Hi all Members,

I have around 135 genes involved in oral cancer, for which expression levels and detailed information such as 'genecards' have been mined from Medline and the internet. I want to study the interactions among all the genes to understand their involvement in tumor progression. Is there any tool that predicts a molecular interaction map from expression levels or any other data?

Thank you in advance.

regards,
Nikhil

=====
NIKHIL S. GADEWAL
Bioinformatics center,
ACTREC, Tata Memorial Centre,
Kharghar, Navi Mumbai, India
E-mail: ngadewal at actrec.res.in

From bioinformatics2005 at yahoo.com.cn Tue Oct 12 05:54:30 2004
From: bioinformatics2005 at yahoo.com.cn (Fei Li)
Date: Tue, 12 Oct 2004 17:54:30 +0800 (CST)
Subject: [BiO BB] Finding common TF binding sites in promoter regions of different insect species
In-Reply-To: <1097179686.4165a2260acf9@webmail.staff.brandeis.edu>
Message-ID: <20041012095430.13462.qmail@web15706.mail.cnb.yahoo.com>

Hi, Sebastian,

You can also check the webpages of Michael Q Zhang at CSHL, http://rulai.cshl.edu/ ; he has done a lot of work on TFBS. Alignment between the three drosophila species provides little useful information, because they are very close in phylogeny.
Maybe alignment between drosophila and anopheles will work better. I got this from Michael. Finding motifs by sequence alignment relies largely on the relationship between the species you select. For more information, please contact Michael directly; he is a very kind PI at CSHL.

Best
Fei

Sebastian Kadener wrote:

Dear list members,

I am trying to find TF binding sites (not ones that are currently known and in databases) in the promoter regions of at least four species of insects (the 3 sequenced drosophila and anopheles). I have used meme for this already but the results that I got weren't so good.

I was thinking of performing some kind of multiple local alignment to find regulatory elements among the promoters but I wasn't sure how to go about doing this.

If anyone has any suggestions on how to find TF binding sites in promoters, I would be very interested in hearing them. Thank you in advance.

Sebastian

--
Sebastian Kadener
Rosbash laboratory
Dept. of Biology, MS008
Brandeis University/HHMI 781 736 3163 tel
415 South St. 781 736 3164 fax
Waltham, MA 02454 skadener at brandeis.edu

-------------------------------------------------
Fei Li, PhD, Postdoc fellow
Institute of Bioinformatics
MOE key laboratory of Bioinformatics
Tsinghua University
Beijing, 100084 China
Tel: 0086-10-62782877
Fax: 0086-10-62786911
E-mail: flee at tsinghua.edu.cn
Homepages: http://166.111.30.65/member/~lifei/
From hchen at utmem.edu Tue Oct 12 10:13:01 2004
From: hchen at utmem.edu (Hao Chen)
Date: Tue, 12 Oct 2004 09:13:01 -0500
Subject: [BiO BB] interactome prediction
In-Reply-To: <20041012113318.18001.qmail@web40914.mail.yahoo.com>
References: <20041012113318.18001.qmail@web40914.mail.yahoo.com>
Message-ID: <20041012141301.GA2354@utmail.utmem.edu>

Hello Nikhil

On Tue, Oct 12, 2004 at 04:33:18AM -0700, nikhil gadewal wrote:
> Hi all Members,
>
> I have around 135 genes involved in oral cancer for
> which there expression level and every detail
> information like 'genecards' are mined from medline
> and internet.
> I want to study the interaction of all the genes
> to know there involvment in tumor progression. Is
> there any tool which predict the molecular interaction
> map using the information of expression level or any
> other data.
>

You might be interested in www.chilibot.net

This is literature mining software that lets you find, from the literature, known interactions among your genes of interest. It can also find whether they are known to be involved in "tumor" or "cancer" if you include these terms in the same query. A paper describing this software has just been accepted for publication at http://www.biomedcentral.com/1471-2105/5/147/abstract

Hao

> Thank you in advance.
>
> regards,
> Nikhil
>
> =====
> NIKHIL S. GADEWAL
> Bioinformatics center,
> ACTREC, Tata Memorial Centre,
> Kharghar, Navi Mumbai, India
> E-mail: ngadewal at actrec.res.in

--
-
: Hao Chen, Ph.D.
: Research Associate
: Department of Pharmacology
: University of Tennessee Health Science Center
: Memphis, TN 38163 USA
: Office: 901 448 3201
: Mobile: 901 826 1845
Mining PubMed: http://www.chilibot.net
-

From windyskyemail-open at yahoo.co.kr Fri Oct 1 08:53:42 2004
From: windyskyemail-open at yahoo.co.kr (Junguk Hur)
Date: Fri, 1 Oct 2004 07:53:42 -0500
Subject: [BiO BB] Getting promotor region sequences of Yeast
Message-ID: <20041001124741.F2C76D1F0E@www.bioinformatics.org>

Dear all,

I am trying to get intergenic sequences of yeast genes for promoter analyses.

I have a gene list of 7K but don't know how to get the upstream region sequences.

The gene list and relevant data were obtained from the following site, Professor Young's Lab:

http://web.wi.mit.edu/young/location/

Does anybody have an idea how to get those upstream region sequences?

Many thanks in advance,

Junguk

From KarnamPR at moffitt.usf.edu Fri Oct 1 14:40:50 2004
From: KarnamPR at moffitt.usf.edu (Karnam, Puru)
Date: Fri, 1 Oct 2004 14:40:50 -0400
Subject: [BiO BB] Entrez Batch
Message-ID: <74A93E0A840F7143BC98AFDD6FB5EEE1576B47@m-ex1.hlm.ad.moffitt.usf.edu>

Hi,

I can get the GI number corresponding to a RefSeq accession (e.g., NM_002265) using Entrez. However, if I want only the GI, I need to parse the returned file. I was wondering if there is a tool that returns just the GI numbers, given a file of RefSeq accessions.

Thanks
Puru

######################################################################
This transmission may be confidential or protected from disclosure and is only for review and use by the intended recipient. Access by anyone else is unauthorized. Any unauthorized reader is hereby notified that any review, use, dissemination, disclosure or copying of this information, or any act or omission taken in reliance on it, is prohibited and may be unlawful. If you received this transmission in error, please notify the sender immediately. Thank you.
###################################################################### From bcalves at briancalves.com Mon Oct 11 17:49:57 2004 From: bcalves at briancalves.com (Brian Calves) Date: Mon, 11 Oct 2004 17:49:57 -0400 Subject: [BiO BB] How do you evaluate new software? Message-ID: <7E5D61EC-1BCF-11D9-9FA9-0003936F3E32@briancalves.com> Hi, I've been enjoying the mailing list archives; although I haven't finished reviewing the full history, yet. I've followed a lot of the passing references to various software/databases. This has given rise to a curiosity: When you become aware of a new bioinformatics software tool, what criteria do you use in determining whether to accept or reject the new software/database? Best Regards, Brian From ryangolhar at hotmail.com Tue Oct 12 19:09:33 2004 From: ryangolhar at hotmail.com (Ryan Golhar) Date: Tue, 12 Oct 2004 19:09:33 -0400 Subject: [BiO BB] Getting promotor region sequences of Yeast In-Reply-To: <20041001124741.F2C76D1F0E@www.bioinformatics.org> Message-ID: <002c01c4b0b0$89410950$0200a8c0@GOLHARMOBILE1> I've done something like this before. I used the tool Spidey from NCBI to determine exon locations, then used a perl script to parse that information and grab 2500 bp upstream of the first exon. ----- Ryan Golhar Computational Biologist The Informatics Institute at The University of Medicine & Dentistry of NJ Phone: 973-972-5034 Fax: 973-972-7412 Email: golharam at umdnj.edu -----Original Message----- From: bio_bulletin_board-admin at bioinformatics.org [mailto:bio_bulletin_board-admin at bioinformatics.org] On Behalf Of Junguk Hur Sent: Friday, October 01, 2004 8:54 AM To: bio_bulletin_board at bioinformatics.org Subject: [BiO BB] Getting promotor region sequences of Yeast Dear all, I am trying to get intergenic sequences of yeast genes for promotor analyses. I have a gene list of 7K but don't know how to get the upstream region sequences. 
The gene list and relevant data was obtained from the following sites, Professor Young's Lab, http://web.wi.mit.edu/young/location/ Does anybody have idea how to get thoes upstream region sequences? Many thanks in advance, Junguk _______________________________________________ BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From stefanielager at fastmail.ca Wed Oct 13 01:08:18 2004 From: stefanielager at fastmail.ca (Stefanie Lager) Date: Wed, 13 Oct 2004 05:08:18 +0000 (UTC) Subject: [BiO BB] Entrez Batch In-Reply-To: <74A93E0A840F7143BC98AFDD6FB5EEE1576B47@m-ex1.hlm.ad.moffitt.usf.edu> Message-ID: <20041013050819.D660F8611D3@mail.interchange.ca> You can use SeqHound http://www.blueprint.org/seqhound/ . > Hi, > I can get the gi number corresponding to refseq(ex., NM_002265) using > the Entrez. Using Entrez, If I want to get only GI, I need to parse > the file. I was wondering if there is any tool just to obtain the GI > numbers if I have a file with Refseqs. Thanks Puru > ###################################################################### > This transmission may be confidential or protected from disclosure and > is only for review and use by the intended recipient. Access by anyone > else is unauthorized. Any unauthorized reader is hereby notified that > any review, use, dissemination, disclosure or copying of this > information, or any act or omission taken in reliance on it, is > prohibited and may be unlawful. If you received this transmission in > error, please notify the sender immediately. Thank you. 
> > ###################################################################### _________________________________________________________________ http://fastmail.ca/ - Fast Secure Web Email for Canadians From john_abraham_bio at yahoo.com Wed Oct 13 02:26:57 2004 From: john_abraham_bio at yahoo.com (John Abraham) Date: Tue, 12 Oct 2004 23:26:57 -0700 (PDT) Subject: [BiO BB] Getting promotor region sequences of Yeast In-Reply-To: <002c01c4b0b0$89410950$0200a8c0@GOLHARMOBILE1> Message-ID: <20041013062657.4346.qmail@web53904.mail.yahoo.com> You can also try SCPD( saccharomyces cerevisea Promotor Database Ryan Golhar wrote:I've done something like this before. I used the tool Spidey from NCBI to determine exon locations, then used a perl script to parse that information and grab 2500 bp upstream of the first exon. ----- Ryan Golhar Computational Biologist The Informatics Institute at The University of Medicine & Dentistry of NJ Phone: 973-972-5034 Fax: 973-972-7412 Email: golharam at umdnj.edu -----Original Message----- From: bio_bulletin_board-admin at bioinformatics.org [mailto:bio_bulletin_board-admin at bioinformatics.org] On Behalf Of Junguk Hur Sent: Friday, October 01, 2004 8:54 AM To: bio_bulletin_board at bioinformatics.org Subject: [BiO BB] Getting promotor region sequences of Yeast Dear all, I am trying to get intergenic sequences of yeast genes for promotor analyses. I have a gene list of 7K but don't know how to get the upstream region sequences. The gene list and relevant data was obtained from the following sites, Professor Young's Lab, http://web.wi.mit.edu/young/location/ Does anybody have idea how to get thoes upstream region sequences? 
Many thanks in advance,
Junguk

From schlitt at ebi.ac.uk Wed Oct 13 04:48:12 2004
From: schlitt at ebi.ac.uk (Thomas Schlitt)
Date: Wed, 13 Oct 2004 09:48:12 +0100 (BST)
Subject: [BiO BB] Getting promotor region sequences of Yeast
In-Reply-To: <20041001124741.F2C76D1F0E@www.bioinformatics.org>
Message-ID:

Dear Junguk,

The best tool I know for baker's yeast and pombe is part of Expression Profiler. Have a look at http://ep.ebi.ac.uk

The link "Genomes" leads to a tool developed by Jaak Vilo that lets you choose genes by chromosome location or gene name. On the next page, the option "Sequence extraction" lets you choose which part of the sequence you want for each gene, counting from the ATG start codon; for upstream sequences, choose a negative number as "from".

Alternatively, I think the SGD database has some files on its ftp server.

Cheers
Thomas

On Fri, 1 Oct 2004, Junguk Hur wrote:

>
> Dear all,
>
> I am trying to get intergenic sequences of yeast genes for promotor
> analyses.
>
> I have a gene list of 7K but don't know how to get the upstream region
> sequences.
>
> The gene list and relevant data was obtained from the following sites,
> Professor Young's Lab,
>
> http://web.wi.mit.edu/young/location/
>
> Does anybody have idea how to get thoes upstream region sequences?
> > Many thanks in advance, > > Junguk > > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > _____________________________________________________________ Thomas Schlitt, PhD British Antarctic Survey High Cross, Madingley Road Cambridge CB3 0ET, UK Tel. ++44-1223-221656 tsc at bas.ac.uk schlitt at ebi.ac.uk From o.medina at gmx.net Wed Oct 13 12:18:09 2004 From: o.medina at gmx.net (Oscar Medina) Date: Wed, 13 Oct 2004 10:18:09 -0600 Subject: [BiO BB] Promoter-Intron/Exon Analysis In-Reply-To: References: <20041001124741.F2C76D1F0E@www.bioinformatics.org> Message-ID: <6.1.2.0.2.20041013101448.01c0e070@pop.gmx.net> An HTML attachment was scrubbed... URL: From chea at niaid.nih.gov Wed Oct 13 10:16:56 2004 From: chea at niaid.nih.gov (Anney Che) Date: Wed, 13 Oct 2004 10:16:56 -0400 Subject: [BiO BB] Protein structure tool Message-ID: Hi everyone, I would like some suggestions on protein structure tools (visual and modeling) that works well on Mac OS X. Thanks From operon at cbiot.ufrgs.br Wed Oct 13 17:20:35 2004 From: operon at cbiot.ufrgs.br (Marcos Oliveira de Carvalho) Date: Wed, 13 Oct 2004 18:20:35 -0300 Subject: [BiO BB] Protein structure tool In-Reply-To: References: Message-ID: Hi Anney, try: http://www.pirx.com/iMol/ http://pymol.sourceforge.net/ http://us.expasy.org/spdbv/text/disclaim.htm http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml http://www.ks.uiuc.edu/Research/vmd/ http://sgce.cbse.uab.edu/ribbons/ Cheers Marcos On Wed, 13 Oct 2004 10:16:56 -0400, Anney Che wrote: > Hi everyone, > > I would like some suggestions on protein structure tools (visual and > modeling) that works well on Mac OS X. 
> > Thanks > > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From boris.steipe at utoronto.ca Wed Oct 13 17:34:19 2004 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Wed, 13 Oct 2004 17:34:19 -0400 Subject: [BiO BB] Protein structure tool In-Reply-To: Message-ID: .. and you might add RasMol http://www.openrasmol.org/ ... which runs just fine in the Classic environment. Boris ============ On Wednesday, Oct 13, 2004, at 17:20 Canada/Eastern, Marcos Oliveira de Carvalho wrote: > > Hi Anney, > > try: > http://www.pirx.com/iMol/ > http://pymol.sourceforge.net/ > http://us.expasy.org/spdbv/text/disclaim.htm > http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml > http://www.ks.uiuc.edu/Research/vmd/ > http://sgce.cbse.uab.edu/ribbons/ > > Cheers > Marcos > > > > On Wed, 13 Oct 2004 10:16:56 -0400, Anney Che > wrote: > >> Hi everyone, >> >> I would like some suggestions on protein structure tools (visual and >> modeling) that works well on Mac OS X. >> >> Thanks >> >> _______________________________________________ >> BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From katia at cin.ufpe.br Thu Oct 14 17:03:05 2004 From: katia at cin.ufpe.br (Katia Silva Guimaraes) Date: Thu, 14 Oct 2004 18:03:05 -0300 (EST) Subject: [BiO BB] CompBioNets 2004 - Registration Open Message-ID: The registration for CompBioNets 2004 (http://compbionets.cin.ufpe.br/) is open, and early registration prices are valid until October 30. Special rates are also secured until that date at the MARANTE PLAZA HOTEL for participants of CompBioNets 2004. 
TOPICS CompBioNets topics will be computational biology in general, and more specifically, but not exclusively, genetic regulation, motifs, metabolism, protein-protein interaction, biochemical networks, protein function and structure, genome dynamics, genome rearrangements, comparative genomics. INVITED SPEAKERS We will have the confirmed presence of Amihood Amir (Bar-Ilan University, Israel) and Gene Myers (U. of California, Berkeley, USA). Other invited speakers are expected to be added in the future. VENUE The event will take place at MARANTE PLAZA HOTEL, a nice four-star, oceanfront hotel, which is located at Av. Boa Viagem, 1070, Recife, Brazil. EVENT WEBPAGE http://compbionets.cin.ufpe.br/ Katia S. Guimaraes COMPBIONETS 2004 - Algorithms and Computational Methods for Biochemical and Evolutionary Networks From ulimard at yahoo.com.br Mon Oct 18 08:23:16 2004 From: ulimard at yahoo.com.br (Ulisses) Date: Mon, 18 Oct 2004 10:23:16 -0200 Subject: [BiO BB] Pos in Bioinformatics References: Message-ID: <021e01c4b50d$3ecdbf10$4402100a@CLIVRES02> Hi all, I am a student in my final year of Computer Science. For my final project I have chosen this field, and I am building a Formal Language for Proteins. However, this is not my question. I want to know which universities around the world offer bioinformatics programs where I could study after graduation. I would be glad if someone could help me. Regards, Ulisses Dias From hershel.safer at weizmann.ac.il Wed Oct 20 05:34:19 2004 From: hershel.safer at weizmann.ac.il (Hershel Safer) Date: Wed, 20 Oct 2004 11:34:19 +0200 Subject: [BiO BB] comparison of siRNA primer software?
In-Reply-To: <021e01c4b50d$3ecdbf10$4402100a@CLIVRES02> References: <021e01c4b50d$3ecdbf10$4402100a@CLIVRES02> Message-ID: <6.1.2.0.2.20041020113112.023cd0b0@wicc.weizmann.ac.il> Hi folks, We've been asked to recommend software for identifying good primers for siRNA knockdown experiments, and preferably also for predicting the effectiveness of primers chosen by other means. Even a quick search reveals many tools that supposedly do the former, but I need to suggest no more than two or three tools. Do you know of anybody who has compared some of these tools and has either published the results or would be willing to share them? Thanks, Hershel ---------- Hershel M. Safer, Ph.D. Head, Bioinformatics and Biological Computing Weizmann Institute of Science PO Box 26, Rehovot 76100, Israel tel: +972-8-934-3456 | fax: +972-8-934-6006 email: hershel.safer at weizmann.ac.il | hsafer at alum.mit.edu url: http://bioportal.weizmann.ac.il From dmnunif at charter.net Wed Oct 20 16:23:15 2004 From: dmnunif at charter.net (D. Norris) Date: Wed, 20 Oct 2004 15:23:15 -0500 Subject: [BiO BB] Bionformatics and System Biology References: <1095850493.415159fd8686c@utalca.cl> Message-ID: <003201c4b6e2$a1442b80$cfc6700a@VALUED4DA88152> Hi Jose: It is our opinion at Unifinium Ltd. that it is impossible to have bioinformation unless you obtained it through the "Systems Biology" investigative approach. dmn ----- Original Message ----- From: To: Sent: Wednesday, September 22, 2004 5:54 AM Subject: [BiO BB] Bionformatics and System Biology > > Hi, I am looking for information about the differences and similarities > between Bioinformatics and System Biology. > Do you know where I can find some papers, documents or other sources of > information about this? > > Thanks in advance for your comments. > > José
Antonio > > > ------------------------------------------------- > This message was sent by: http://utalca.cl > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From j.d.tucker at vla.defra.gsi.gov.uk Thu Oct 21 12:33:38 2004 From: j.d.tucker at vla.defra.gsi.gov.uk (Tucker, James) Date: Thu, 21 Oct 2004 17:33:38 +0100 Subject: [BiO BB] Phylip Message-ID: <558287EEA397EB41A7C26070EBB6DF1C90C2FC@VLA6> Hi All, I've just downloaded Phylip. I'm hoping to use it to create a maximum parsimony tree, with bootstrapping. I'm using a binary data set of gene presence. However, I don't know the correct input format for the program. Can anyone tell me the format and how to save the file in it, or alternatively suggest some other software? Thanks. James Veterinary Laboratories Agency (VLA) This email and any attachments are intended for the named recipient only. If you have received it in error you have no authority to use, disclose, store or copy any of its contents and you should destroy it and inform the sender. Whilst this email and associated attachments will have been checked for known viruses whilst within VLA systems we can accept no responsibility once it has left our systems. Communications on VLA's computer systems may be monitored and/or recorded to secure the effective operation of the system and for other lawful purposes. From maria.mirto at unile.it Wed Oct 20 17:11:31 2004 From: maria.mirto at unile.it (Maria Mirto) Date: Wed, 20 Oct 2004 23:11:31 +0200 (CEST) Subject: [BiO BB] REMINDER CFP: IEEE ITCC2005 Message-ID: <3152.151.28.168.23.1098306691.squirrel@webmail2.unile.it> Dear all, Please find attached the Call For Papers for: 6th IEEE International Conference on Information Technology: Coding and Computing (ITCC2005) - Track on Methodologies, Technologies and Applications in distributed and Grid systems.
Las Vegas, Nevada 11-13 April 2005 http://datadog.unile.it/itcc2005/cfp.htm http://www.itcc.info/ sponsored by IEEE Computer Society This Conference Track aims at offering a forum of discussion where young researchers and PhD students could present their research activities, either at an early or mature phase. Best regards, Maria Mirto. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Maria Mirto, CACT/ISUFI (Center for Advanced Computing Technology) Engineering Faculty, Department of Innovation Engineering University of Lecce, Via per Monteroni, 73100 Lecce, Italy phone: +39-0832-297304, fax: +39-0832-297279 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ******************************************************************************** We apologize if you received multiple copies of this Call for Papers. Please feel free to distribute it to those who might be interested. ******************************************************************************** -------------------------------------------------------------------------------- Track on Methodologies, Technologies and Applications in Distributed and Grid systems. ITCC 2005: IEEE International Conference on Information Technology: Coding and Computing Sponsored by the IEEE Computer Society April 11-13, 2005 The Orleans, Las Vegas, Nevada -------------------------------------------------------------------------------- Call for Papers http://datadog.unile.it/itcc2005/cfp.htm http://www.itcc.info ***************** Grids couple geographically distributed resources such as high performance computers, workstations, clusters, and scientific instruments. 
In particular, Computational Grids aggregate computing power, Data Grids manage and analyze shared large-scale data sets, and Service or Collaborative Grids have the potential to allow real-time processing of data streams from scientific instruments such as particle accelerators and telescopes in ways that are more flexible and powerful than are currently available in traditional systems. Recently, Grid technologies are being integrated with Web Services technologies to provide a framework for interoperable application-to-application interaction. Existing applications exploit these technologies in several domains. In health systems, the Grid offers the power and ubiquity needed to acquire biomedical data and to process and deliver biomedical images (CT, MRI, PET, SPECT, etc.) located in different hospitals within a wide area. So, the Grid acts as a Collaborative Working Environment: physicians often want to aggregate not only medical data, but also human expertise, and might want colleagues around the world to view the examinations in the same way and at the same time so that the group can discuss the diagnosis in real time. Analysis of the problems relevant to the use of the Grid in medical virtual environments will be appreciated. In Geographic Information Systems (GISs), the Grid exploits a dynamic infrastructure for retrieval and on-demand processing of remote sensing data. Grid Computing techniques can be used in industry, reducing the process time needed to improve the design of some components and, in general, supporting complex simulations. In virtual reality (VR), some applications in different areas such as entertainment, training, design evaluation, data visualization, etc., need computing power and resources for non-immersive and immersive environments. Finally, bioinformatics applications call for the ability to read large datasets (e.g. protein databases) and to create new datasets (e.g. mass spectrometry proteomics data).
These applications can also require the ability to update existing datasets; consequently a Data Grid, i.e. a distributed infrastructure for storing large datasets, is needed. In the bioinformatics field, a Data Grid could prove useful for building Electronic Patient Record systems (EPRs) for the management of patient information (data, metadata and images), for supporting data replication, which allows the integration and sharing of biological databases, and, generally, for the development of efficient bioinformatics (in particular proteomics and genomics) applications. The main goal of the Conference Track is to discuss well-known and emerging data-intensive applications in the context of distributed and Grid systems, and to analyze technologies and methodologies useful to develop such applications in those environments. In particular, this Conference Track aims at offering a forum of discussion where young researchers and PhD students could present their research activities, either at an early or mature phase.
Topics include, but are not limited to: Data intensive applications in distributed and Grid systems: - Grid for the following application areas: o Aeronautics o Astronomy o Astrophysics o Bioinformatics o Chemistry o Climatology o Cosmology o Earth observations o Earthquake studies o E-learning o Environment management o Fluid dynamics o Genomics o Geology o High energy physics o Industrial design o Medicine o Meteorology o Molecular engineering o Pharmacology o Proteomics Technologies and methodologies in distributed and grid-based applications: - Grid Portals; - Web and Grid Services; - Service Orchestration and WorkFlow; - Advanced Resource Reservation and Scheduling; - Databases and the grid; - Grid Information and Monitoring services and related (OO, Relational, XML) data models; - Grid Security; - Grid Workload and Data management services; - Agent architectures for grid environments; - Extracting knowledge from data grids; - Parallel and Distributed application (cluster and grid based); - Peer-to-Peer Systems for grid environments; - Simulation and Applications of Modeling; - Collaboration technologies. IMPORTANT DATES October 29, 2004 Paper Due December 19, 2004 Author Notification January 9, 2005 Camera-Ready Copy Note: The Proceedings will be published by IEEE Computer Society. A special issue of an international journal is being planned consisting of selected papers from this conference. Authors of selected papers will be invited to submit an extended version for the journal. SUBMISSION DETAILS Papers should be original and contain contributions of theoretical or experimental nature, or be unique experience reports. 
Interested authors should submit a 6-page summary of their original and unpublished work, including 5 keywords, before October 29, 2004, in the IEEE format to the track chair: Maria Mirto, CACT/ISUFI (Center for Advanced Computing Technology) & SPACI (Southern Partnership for Advanced Computational Infrastructures) Consortium, c/o Engineering Faculty, Department of Innovation Engineering, University of Lecce, Via per Monteroni, 73100 Lecce, Italy, Voice: +39-0832-297304, Fax: +39-0832-297279, Email: maria.mirto at unile.it Electronic submission (PostScript or PDF) is strongly encouraged. From ilya.venger at weizmann.ac.il Wed Oct 27 06:12:42 2004 From: ilya.venger at weizmann.ac.il (Ilya Venger) Date: Wed, 27 Oct 2004 12:12:42 +0200 Subject: [BiO BB] GeneExpressionOmnibus(GEO) question Message-ID: <417F749A.6010407@weizmann.ac.il> Hi, I am in the process of building a local expression database for S. cerevisiae. The best source of expression data I could find was GEO. They state that the data presented in GDS accessions are normalized, but they don't explain how. The GSMs of different contributors do not necessarily match, and they sometimes lack parts of the data. Also, if the raw data are submitted, how come they don't show the measurements for both repetitions of each gene in a cDNA array? Another problem is that although the GDS are claimed to be normalized, the logs for all genes on a chip don't sum to 0. So, if anybody has worked with GEO before and has experience with it, I would be glad to pose several more specific questions. Cheers, Ilya.
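The sum-to-zero expectation in the question above can be checked with a small Python sketch. It rests on an assumption: log ratios sum to zero only if the chip was centered on its mean, whereas median centering, another common choice, gives no such guarantee. The ratios below are made up, and nothing here describes how GEO actually normalizes GDS records.

```python
import math
from statistics import mean, median

# Hypothetical expression ratios for one chip (made-up numbers).
ratios = [0.5, 1.0, 2.0, 4.0, 16.0]
log_ratios = [math.log2(r) for r in ratios]

def center(values, by):
    """Subtract a location estimate (mean or median) from every value."""
    shift = by(values)
    return [v - shift for v in values]

mean_centered = center(log_ratios, mean)
median_centered = center(log_ratios, median)

print(sum(mean_centered))    # zero up to floating-point rounding
print(sum(median_centered))  # generally not zero
```

A non-zero sum of logs is therefore compatible with a chip that was normalized, just not by mean centering.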
From landman at scalableinformatics.com Thu Oct 28 11:46:24 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Thu, 28 Oct 2004 11:46:24 -0400 (EDT) Subject: [BiO BB] [blast-announce] [ BLAST_Announce #044] BLAST 2.2.10 released Message-ID: Binaries can be obtained from: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.10/ Source code can be obtained from: ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/old/20041020/ Additionally, NCBI now provides anoncvs access (http://www.ncbi.nlm.nih.gov/books/bv.fcgi?call=bv.View..ShowSection&rid=toolkit.section.cvs_external) to toolkit sources. A cvsweb source browser (http://www.ncbi.nlm.nih.gov/cvsweb/index.cgi/internal/c++/src/algo/blast/core/) and doxygen documentation (http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/group__AlgoBlast.html) are also available. Notes for the 2.2.10 release New engine We have been rewriting and restructuring the BLAST engine in order to make BLAST more modular and extensible. bl2seq and megablast currently support the new engine; it can be enabled with the -V F option. Using the new engine may result in significant performance improvements in some cases.
General changes -megablast now performs ungapped extensions in order to prevent suboptimal alignments -consolidated formatting code -removed fmerge.c -small fixes to sum statistics code -better error handling -fixed masking of translated queries -fixed several readdb threading bugs -improved protein neighbor generation -hsp sorting/inclusion fixes -many changes in HSP linking -several fixes for translated RPS blast BlastPGP -added code to spread out gap costs when extracting data from the sequence alignment to build PSSM -changed handling of all-zero columns of residue frequencies to use the underlying scoring matrix frequency ratios rather than scoring matrix's scores - disallowed an ungapped search if more than one iteration is specified scoremat.asn specification -added a new 'formatrpsdb' application; given a collection of Score-mat ASN.1 files, this application creates a database suitable for use with RPSBLAST -Simplified NCBI-ScoreMat specification to represent PSSMs instead of arbitrary scoring matrices. blastpgp and formatrpsdb can deal with this format. If you have any questions please write to blast-help at ncbi.nlm.nih.gov From moyc at mail.med.upenn.edu Thu Oct 28 12:33:31 2004 From: moyc at mail.med.upenn.edu (Chris Moy) Date: Thu, 28 Oct 2004 12:33:31 -0400 Subject: [BiO BB] Protein Design for Antibody Production Message-ID: <1AC04109-28FF-11D9-892A-000A9599E70C@mail.med.upenn.edu> Hello, We are looking to create an immune response with a specific protein sequence but we want to make sure we select the region which will give us the most specific response. Are there any tools out there to help out with this? I have only seen the ABIE tool at chang bioscience. Please comment freely about your experiences. 
Chris From landman at scalableinformatics.com Thu Oct 28 13:43:45 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Thu, 28 Oct 2004 13:43:45 -0400 Subject: [BiO BB] Updated NCBI rpms released for the 2.2.10 toolkit Message-ID: <41812FD1.6060508@scalableinformatics.com> Folks: We rebuilt the NCBI rpms for AMD64, i386, i586 (non-P4), athlon, and i686 (p4). Feel free to grab them from our site http://downloads.scalableinformatics.com/downloads/ncbi/ They are named NCBI-2.2.10-1.*.rpm, where * = {src,x86_64,i386,i586,i686}. They were built on RHEL/SuSE/Fedora Core2 machines. They should install without problems (use the source if you do have problems). Please note that if you have a non-pentium4/non-athlon machine (PIII) you want the i586 or i386 version. If you have a pentium4 based machine, you want the i686 version. AMD64 (and probably EM64T) will use the x86_64. Athlons will use the athlon version. Unless someone supplies me with G5 or Itanium2, I probably won't be able to do builds for those platforms. Enjoy, and as usual, bug reports/problems to us, not to NCBI. We built the RPMS, so if they are broken, we need to know. Joe -------- Original Message -------- Subject: [blast-announce] [ BLAST_Announce #044] BLAST 2.2.10 released Date: Thu, 28 Oct 2004 11:35:37 -0400 From: Mcginnis, Scott (NIH/NLM/NCBI) To: 'blast-announce at ncbi.nlm.nih.gov' Binaries can be obtained from: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.10/ Source code can be obtained from: ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/old/20041020/ Additionally, NCBI now provides anoncvs access (http://www.ncbi.nlm.nih.gov/books/bv.fcgi?call=bv.View..ShowSection&rid=toolkit.section.cvs_external) to toolkit sources. A cvsweb source browser (http://www.ncbi.nlm.nih.gov/cvsweb/index.cgi/internal/c++/src/algo/blast/core/) and doxygen documentation (http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/group__AlgoBlast.html) are also available.
Notes for the 2.2.10 release New engine We have been rewriting and restructuring the BLAST engine in order to make BLAST more modular and extensible. bl2seq and megablast currently support the new engine; it can be enabled with the -V F option. Using the new engine may result in significant performance improvements in some cases. General changes -megablast now performs ungapped extensions in order to prevent suboptimal alignments -consolidated formatting code -removed fmerge.c -small fixes to sum statistics code -better error handling -fixed masking of translated queries -fixed several readdb threading bugs -improved protein neighbor generation -hsp sorting/inclusion fixes -many changes in HSP linking -several fixes for translated RPS blast BlastPGP -added code to spread out gap costs when extracting data from the sequence alignment to build PSSM -changed handling of all-zero columns of residue frequencies to use the underlying scoring matrix frequency ratios rather than scoring matrix's scores - disallowed an ungapped search if more than one iteration is specified scoremat.asn specification -added a new 'formatrpsdb' application; given a collection of Score-mat ASN.1 files, this application creates a database suitable for use with RPSBLAST -Simplified NCBI-ScoreMat specification to represent PSSMs instead of arbitrary scoring matrices. blastpgp and formatrpsdb can deal with this format. 
If you have any questions please write to blast-help at ncbi.nlm.nih.gov -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 612 4615 From applmicro at formatex.org Thu Oct 28 13:54:51 2004 From: applmicro at formatex.org (BioMicroWorld 2005) Date: Thu, 28 Oct 2004 19:54:51 +0200 Subject: [BiO BB] Invitation to submit abstract(s) to Applied Microbiology 2005 References: <51D680A44284D2119F210008C756CAC801D5442F@bau01ex.bau.us.solvay.com> <001901c4968d$fd6ab340$0302a8c0@Wireless> Message-ID: <015e01c4bd17$3ebc45c0$06001aac@PUESTO2> 1st International Conference on Environmental, Industrial and Applied Microbiology (BioMicroWorld-2005) "Fostering Cross-disciplinary Applied Research in Microbiology and Microbial Biotechnology" Badajoz, Spain, March 15-18th, 2005 http://www.formatex.org/biomicroworld2005 Dear colleague, On behalf of the organizers, you are cordially invited to send abstracts of your best research for presentation at the forthcoming 1st International Conference on Environmental, Industrial and Applied Microbiology (BioMicroWorld-2005), that will take place in March 2005 in Badajoz (Spain). Modern microbiology includes a broad variety of scholarly approaches leading to a better understanding of all living things at the macroscopic, microscopic/single-cell and nanoscopic/molecular level, producing beneficial applications in medicine, agriculture, industry, and ecology. Therefore, the Conference will specially welcome papers reporting interdisciplinary researchers, relating Microbiology with other Sciences as Physico/chemistry, Materials Science, Polymer Science, Environmental Sciences, Genetics, Pharmacology, Microscopy/Imaging Science, Nanoscience and Nanotechnology, etc. 
In other words, we are especially (but not exclusively) interested in reports applying the techniques, the training, and the culture of Microbiology to research areas usually associated with other scientific and engineering disciplines, from an applied perspective. Main topics of the Conference are - Environmental Microbiology, Marine Microbiology, Water/Aquatic Microbiology, Geomicrobiology - Industrial Microbiology - Future Bioindustries - Food Microbiology - Cell Engineering - Pharmaceutical Microbiology - Agriculture, Soil, Forest Microbiology - Structure and Morphogenesis - Analytical Techniques, Imaging Techniques, Microscopy - Microbial Physiology, Metabolism and Gene Expression - Microbial Biotechnology - Aerospace Microbiology, Astro(micro)biology - Quantitative Models and Bioinformatics in Microbiology - Methods in Basic and Applied Microbiology - Medical Microbiology - Microbiology Education IMPORTANT DEADLINES Submission of abstracts: November 22nd, 2004 Submission of Full Papers for publication: On site CONFERENCE PUBLICATION - PROCEEDINGS 1. Proceedings Book A multi-volume book entitled "Recent Advances in Multidisciplinary Applied Microbiology. Biological, Physical, Chemical and Engineering Aspects" will be published, including full papers of the works (oral, posters) presented at the conference. The book will be published by an international publisher (now in negotiations with Elsevier Science), in order to give it a broad international distribution. 2. Abstracts Book A separate Abstracts Booklet will be published with abstracts of works presented at the Conference. It will be distributed at the beginning of the conference. 3. International Journals special issues Agreements have been reached with several international journals, in order to produce special issues based on very good papers presented at the Conference. Manuscripts must be delivered during the Conference, according to the instructions we will give in due course for each journal.
All papers will be strictly reviewed and treated as regular papers. Our goal is to produce high quality and high impact special issues. Please refer to the conference website for details. 4. 2005 Current Reviews on Applied Microbiology and Microbial Biotechnology A call for mini-reviews on topics covered by the conference will be made by the end of September. Accepted reviews will be collected in a comprehensive international publication. Authors will receive a free copy of the publication, and its content will be made freely available shortly after the conference. Proposals for mini-reviews are welcome from qualified researchers, irrespective of whether they attend the conference or not. SPECIAL SESSIONS - WORKSHOPS - Workshop on Modern Microscopy Techniques in Applied Microbiology - MICROFACTORIES - Microbial Production of Chemicals and Pharmaceuticals - Workshop on Biotechnologically Relevant Enzymes and Proteins - Workshop on Studies on Extracellular Matrix: Biology and Physico/chemistry - Workshop on Biosurfactants: Purification, Mass production, Applications - Workshop on Yeast and Bacterial Flocculation: Fundamentals and Industrial interest - Workshop on Microbial Biosensors - Methods in Cell, Proteins, Enzymes and other Biomolecules Immobilisation - Workshop on Biohydrogen - Hydrogen production by Microorganisms, as a Novel Source of Renewable Energy - Workshop on Bioremediation - Workshop on Microbial Biopolymers: Synthesis, Characterization and Applications - Methods in Cell Adhesion: Classical and Novel Methods, from Macroscopic to Nanometer scale, from Biochemistry to Nano(bio)technology PLENARY LECTURES Plenary talks include: Biomarkers to define interactions in the environment and health David C.
White, Director of the Center for Biomarker Analysis, University of Tennessee, USA The genetics and biochemistry of malonate and 2,4-D biodegradation by Burkholderia cepacia strain 2a Ian James Bruce, Nanobiotechnology Research Group, Istituto di Scienze Chimiche, Universita degli Studi di Urbino, ITALY / School of Science, University of Greenwich, UK Alexander Steinbuchel, Institut fur Molekulare Mikrobiologie und Biotechnologie, Munster, GERMANY [title to be announced] We hope you find it interesting to present your work(s) at the Conference, and we certainly hope to receive your abstract(s) by November 22nd! Best regards, Fatima Penya / Borja Gonzalez BioMicroWorld-2005 Secretariat Formatex Research Centre C/Zurbaran, 1 2? Office 1 06001 Badajoz, SPAIN Phone/Fax: +34 924258615 E-mail: applmicro at formatex.org http://www.formatex.org/biomicroworld2005 From shashikanthm17 at yahoo.co.in Thu Oct 28 14:00:42 2004 From: shashikanthm17 at yahoo.co.in (shashikanth marri) Date: Thu, 28 Oct 2004 19:00:42 +0100 (BST) Subject: [BiO BB] Intron/Exon Analysis Message-ID: <20041028180042.19463.qmail@web8506.mail.in.yahoo.com> Hi Oscar Medina, I have seen your question on the Bio Bulletin Board and would like to offer some suggestions. If it is a new sequence, we cannot say whether an mRNA is alternatively spliced just by looking at the mRNA sequence; we have to compare it with the genomic DNA. Comparing it with the genomic DNA lets us predict the exons. The best tools now available online are SIM4 (http://pbil.univ-lyon1.fr/sim4.php) and SPIDEY (http://www.ncbi.nih.gov/IEB/Research/Ostell/Spidey/). Whether it is alternatively spliced cannot be confirmed just by aligning. You need to know whether a related, previously described mRNA exists in that system (you can find one through a literature search or a BLAST search), one which is annotated.
Then align your mRNA and the annotated mRNA to the genomic sequence and compare the exon regions. If an exon in your mRNA is missing from the annotated one, you can say it is alternatively spliced. In other words, if part of your mRNA sequence lies in an intronic region of the annotated mRNA on the genomic DNA, you can call it a spliced form. If you want to do this for the whole mRNA, take EST sequences and compare them with your mRNA sequence on the genomic DNA. If any exon lies in an intron region, or there is any extension of a 5' exonic region, you can call it a spliced form. From shashikanth.marri at gmail.com Thu Oct 28 13:45:26 2004 From: shashikanth.marri at gmail.com (shashikanth marri) Date: Thu, 28 Oct 2004 23:15:26 +0530 Subject: [BiO BB] Intron/Exon Analysis Message-ID: <286bfc9c04102810453cb3a39a@mail.gmail.com> Hi Oscar Medina, I have seen your question on the Bio Bulletin Board and would like to offer some suggestions. If it is a new sequence, we cannot say whether an mRNA is alternatively spliced just by looking at the mRNA sequence; we have to compare it with the genomic DNA. Comparing it with the genomic DNA lets us predict the exons. The best tools now available online are SIM4 (http://pbil.univ-lyon1.fr/sim4.php) and SPIDEY (http://www.ncbi.nih.gov/IEB/Research/Ostell/Spidey/). Whether it is alternatively spliced cannot be confirmed just by aligning. You need to know whether a related, previously described mRNA exists in that system (you can find one through a literature search or a BLAST search), one which is annotated. Then align your mRNA and the annotated mRNA to the genomic sequence and compare the exon regions. If an exon in your mRNA is missing from the annotated one, you can say it is alternatively spliced.
In other words, if part of your mRNA sequence lies in an intronic region of the annotated mRNA on the genomic DNA, you can call it a spliced form. If you want to do this for the whole mRNA, take EST sequences and compare them with your mRNA sequence on the genomic DNA. If any exon lies in an intron region, or there is any extension of a 5' exonic region, you can call it a spliced form. From stefanielager at fastmail.ca Fri Oct 29 01:01:09 2004 From: stefanielager at fastmail.ca (Stefanie Lager) Date: Fri, 29 Oct 2004 05:01:09 +0000 (UTC) Subject: [BiO BB] Protein Design for Antibody Production In-Reply-To: <1AC04109-28FF-11D9-892A-000A9599E70C@mail.med.upenn.edu> Message-ID: <20041029050110.0499A8610E9@mail.interchange.ca> If you want a "specific response" you must choose a unique region. To find an antigenic region you can use the program "antigenic" in the EMBOSS package. http://emboss.sourceforge.net/apps/antigenic.html > Hello, > > We are looking to create an immune response with a specific protein > sequence but we want to make sure we select the region which will give > us the most specific response. Are there any tools out there to help > out with this? I have only seen the ABIE tool at chang bioscience. > Please comment freely about your experiences. > > Chris > From MAG at Stowers-Institute.org Fri Oct 29 13:39:49 2004 From: MAG at Stowers-Institute.org (Goel, Manisha) Date: Fri, 29 Oct 2004 12:39:49 -0500 Subject: [BiO BB] Substitution matrices vs HMM Message-ID: Hi All, I was trying to develop an algorithm for describing/predicting a pattern (e.g. transmembrane region, signal peptide etc) in protein sequences. I want to derive this pattern from the multiple sequence alignments. But I was wondering if I should use substitution matrices or HMMs to describe/represent these patterns. Are there any definite advantages of using one over the other?
Does the choice depend on what I am trying to define? Can somebody please direct me to relevant literature or suggest something from personal experience? Thanks in advance, Manisha Goel From idoerg at burnham.org Fri Oct 29 14:37:50 2004 From: idoerg at burnham.org (Iddo) Date: Fri, 29 Oct 2004 11:37:50 -0700 Subject: [BiO BB] Substitution matrices vs HMM In-Reply-To: References: Message-ID: <41828DFE.9010707@burnham.org> Goel, Manisha wrote: > Hi All, > > I was trying to develop an algorithm for describing/predicting a > pattern (e.g. transmembrane region, signal peptide etc) in protein > sequences. > > I want to derive this pattern from the multiple sequence alignments. > But I was wondering if I should use substitution matrices or HMMs to > describe/represent these patterns. > Are there any definite advantages of using one over the other? Does > the choice depend on what I am trying to define? > Can somebody please direct me to relevant literature or suggest > something from personal experience? > > > Thanks in advance, > Manisha Goel > Judging by the wording of your question, you should read up a bit more on sequence analysis before you go and try this. Substitution matrices are NxN matrices describing the probability of substitution of one alphabet letter by another (in the case of proteins, each letter normally represents an amino acid, hence N=20). They are not a tool for pattern detection; they are used for sequence alignment. You may want to think of position-specific scoring matrices (PSSMs). Those are LxN matrices generated from multiple alignments, where L is the length of the protein (or the part of the protein you wish to investigate), and N is the number of letters in your alphabet (again, with proteins N=20, usually). Each entry in the matrix is the probability of the amino acid appearing in that particular position in a multiple alignment.
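To make the LxN layout concrete, here is a minimal Python sketch that is not taken from any package: it counts the residues in each column of a toy alignment and turns the counts into per-position probabilities. The function name `build_pssm`, the pseudocount of 1, the toy sequences and the choice to skip gap characters are all assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # N = 20 letters

def build_pssm(alignment, pseudocount=1.0):
    """Return L rows, each a dict mapping residue -> probability.

    `alignment` is a list of equal-length aligned sequences. Gap
    characters ('-') are skipped. `pseudocount` is an assumed smoothing
    value so that unobserved residues do not get probability zero.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    pssm = []
    for pos in range(length):
        # Count only the standard residues seen in this column.
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return pssm

toy_alignment = ["ACDA", "ACEA", "GC-A"]  # made-up sequences
pssm = build_pssm(toy_alignment)
print(len(pssm), len(pssm[0]))  # L rows, N columns
print(round(pssm[1]["C"], 3))   # the fully conserved second column
```

Each row sums to 1, and the conserved column puts most of its weight on a single residue, which is the signal a profile built from such a matrix exploits.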
So given a set of known good multiple alignments, you can generate a PSSM for those. From the PSSM you can generate a *profile*, which is a matrix of the same LxN size, with each cell being some sort of transformation of the raw value in the PSSM. (I'm being a bit superficial here.) You can then use the profile to fish out new sequences from a database of sequences, or from a database of profiles.

As you pointed out, another way of doing this is using HMMs to describe the patterns. Without getting into that, I'll just say that profile HMMs are also profiles generated from PSSMs, but unlike the previous profile, a more robust probabilistic model is used to generate them.

Lots of work has been done on this. In order to avoid duplicating previous work, I suggest you jump-start your background research with the following review:

*Rost B, Liu J, Nair R, Wrzeszczynski KO, Ofran Y.* Cell Mol Life Sci. 2003 Dec;60(12):2637-50.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14685688

Table 1 lists many resources you should look into before you attempt something new. It might be a good idea to see who has done what there, and how they did it.

For general background, I recommend the following book:

*Biological Sequence Analysis : Probabilistic Models of Proteins and Nucleic Acids* by Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison
http://www.amazon.com/exec/obidos/tg/detail/-/0521629713/qid=1099074233/sr=1-1/ref=sr_1_1/002-6013332-8507249?v=glance&s=books

BTW, the fact that lots of work has been done already shouldn't discourage you from going in. The field of pattern detection is far from perfect. Putting it mildly...

Good luck,
Iddo

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From karplus at soe.ucsc.edu Fri Oct 29 15:10:50 2004
From: karplus at soe.ucsc.edu (Kevin Karplus)
Date: Fri, 29 Oct 2004 12:10:50 -0700
Subject: [BiO BB] Re: [ssml] Substitution matrices vs HMM
In-Reply-To: (MAG@Stowers-Institute.org)
References:
Message-ID: <200410291910.i9TJAo0N010031@cheep.cse.ucsc.edu>

Manisha Goel asked

> I was trying to develop an algorithm for describing/predicting a
> pattern (e.g. transmembrane region, signal peptide etc) in protein
> sequences. I want to derive this pattern from the multiple sequence
> alignments. But I was wondering if I should use substitution matrices
> or HMMs to describe/represent these patterns. Are there any definite
> advantages of using one over the other? Does the choice depend on what
> I am trying to define? Can somebody please direct me to relevant
> literature or suggest something from personal experience?

HMMs are currently the best method for representing patterns of the type you have in mind. Profile HMMs are the most popular, and are supported by two main packages, HMMer and SAM. Both packages are free to academics, non-profits, and government researchers, but the HMMer package is open-source and SAM is not (at least not yet; we're thinking of making it open-source but have not had the time or resources to clean up the source code enough to do that reasonably).

SAM and HMMer models are slightly different, but similar enough to be interconvertible with only fairly small losses (SAM models are slightly more general than HMMer models, and use a different calibration method). There is sam2hmmer and hmmer2sam software available on the web.

For developing new profile HMMs, SAM is a better choice, because there has been more development on the model-building code. (HMMer is more popular than SAM, largely because of the prebuilt PFAM resource, which is a very valuable database.)
See http://stash.mrc-lmb.cam.ac.uk/HMMER-SAM/ for information about a test comparing HMMER and SAM, by people who were not on the development team for either and were trying to decide which to use.

If you want to do non-profile HMMs (such as the transmembrane models of TMHMM), then you may have to build your own code. I've not seen general-purpose HMM code that was a good utility kit for building new HMMs. Of course, I haven't been looking for one, so I may have missed some major developments.

------------------------------------------------------------
Kevin Karplus karplus at soe.ucsc.edu http://www.soe.ucsc.edu/~karplus
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
Senior member, IEEE
Board of Directors, ISCB (starting Jan 2005)
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck (lapsed)
Affiliations for identification only.
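To give a feel for what "building your own code" for a non-profile HMM might involve, here is a minimal Python sketch of Viterbi decoding for a two-state model (membrane-like versus loop-like). This is not TMHMM and not part of SAM or HMMer: the state names, the hydrophobic residue set, and every probability below are made-up assumptions chosen only to keep the example small, and a real transmembrane model has many more states and trained parameters.

```python
import math

# Assumed (hypothetical) grouping of hydrophobic residues, not a published scale.
HYDROPHOBIC = set("AILMFVWC")

STATES = ("M", "L")  # M = membrane-like, L = loop-like

# All probabilities below are invented for illustration, not trained values.
START = {"M": 0.5, "L": 0.5}
TRANS = {"M": {"M": 0.9, "L": 0.1}, "L": {"M": 0.1, "L": 0.9}}
P_HYDROPHOBIC = {"M": 0.8, "L": 0.2}  # chance of emitting a hydrophobic residue


def emission(state, residue):
    """Probability that `state` emits a residue of this hydrophobicity class."""
    p = P_HYDROPHOBIC[state]
    return p if residue in HYDROPHOBIC else 1.0 - p


def viterbi(sequence):
    """Return the most probable state path for `sequence` as a string of M/L."""
    if not sequence:
        return ""
    # score[s] = best log-probability of any state path ending in state s.
    score = {s: math.log(START[s]) + math.log(emission(s, sequence[0])) for s in STATES}
    backpointers = []
    for residue in sequence[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            prev = max(STATES, key=lambda r: score[r] + math.log(TRANS[r][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(emission(s, residue)))
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Hypothetical test sequence: polar ends around a hydrophobic stretch.
path = viterbi("KDESLLVIAVLLFGRKDE")
print(path)
```

Working in log space avoids numerical underflow on long sequences, which is the usual reason hand-written HMM code is organised this way.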