From smagarwal at yahoo.com Mon Aug 1 01:26:48 2005
From: smagarwal at yahoo.com (Subhash Agarwal)
Date: Mon, 1 Aug 2005 06:26:48 +0100 (BST)
Subject: [BiO BB] Small molecule name
Message-ID: <20050801052649.62757.qmail@web31501.mail.mud.yahoo.com>

Hi everybody,

Is there a way to search a mol2 file to find out which structure it contains? For example, given the file below, I now want to know the name of the structure in it:

Thanks
Subhash Agarwal

@ATOM
1 O1 0.5363 1.3463 -0.0907 O.3 1 <1> 0.0000
2 C1 0.4843 -0.0055 -0.0781 C.ar 1 <1> 0.0000
3 C2 -0.7390 -0.6831 0.0511 C.ar 1 <1> 0.0000
4 C3 1.6513 -0.7775 -0.1995 C.ar 1 <1> 0.0000
5 C4 -2.0030 0.0440 0.1762 C.ar 1 <1> 0.0000
6 C5 -0.7531 -2.0954 0.0603 C.ar 1 <1> 0.0000
7 C6 2.9640 -0.1273 -0.3435 C.2 1 <1> 0.0000
8 C7 1.5494 -2.1758 -0.1812 C.ar 1 <1> 0.0000
9 C8 -3.1957 -0.7024 0.3021 C.ar 1 <1> 0.0000
10 N1 -2.0355 1.3765 0.1718 N.ar 1 <1> 0.0000
11 C9 -1.9818 -2.7798 0.1896 C.ar 1 <1> 0.0000
12 N2 0.3927 -2.7770 -0.0536 N.ar 1 <1> 0.0000
13 O2 3.0558 1.0817 -0.2598 O.co2 1 <1> 0.0000
14 O3 4.0642 -0.8726 -0.5684 O.co2 1 <1> 0.0000
15 C10 -4.4208 -0.0149 0.4235 C.ar 1 <1> 0.0000
16 C11 -3.1505 -2.1119 0.3048 C.ar 1 <1> 0.0000
17 C12 -3.1583 2.0446 0.2824 C.ar 1 <1> 0.0000
18 O4 -5.5886 -0.6864 0.5469 O.3 1 <1> 0.0000
19 C13 -4.3899 1.3872 0.4126 C.ar 1 <1> 0.0000
20 C14 -5.6370 2.1603 0.5362 C.2 1 <1> 0.0000
21 O5 -6.7085 1.5871 0.5492 O.co2 1 <1> 0.0000
22 O6 -5.5902 3.5036 0.6349 O.co2 1 <1> 0.0000
23 H1 -0.2691 1.8569 -0.0108 H 1 <1> 0.0000
24 H2 2.4303 -2.7609 -0.2729 H 1 <1> 0.0000
25 H3 -1.9871 -3.8413 0.1964 H 1 <1> 0.0000
26 H4 -4.0546 -2.6601 0.3994 H 1 <1> 0.0000
27 H5 -3.1344 3.1058 0.2732 H 1 <1> 0.0000
28 H6 -5.5920 -1.6434 0.5528 H 1 <1> 0.0000
29 **** 0.9611 1.6108 -0.9565 LP 1 <1> 0.0000
30 **** 1.1278 1.6195 0.6679 LP 1 <1> 0.0000
31 **** -1.1775 1.8821 0.0814 LP 1 <1> 0.0000
32 **** 0.3651 -3.7765 -0.0409 LP 1 <1> 0.0000
33 **** 3.9476 1.5234 -0.3578 LP 1 <1> 0.0000
34 **** 2.2395 1.6348 -0.0929 LP 1 <1> 0.0000
35 **** 4.9560 -0.4308 -0.6664 LP 1 <1> 0.0000
36 **** 3.9888 -1.8674 -0.6373 LP 1 <1> 0.0000
37 **** -6.1681 -0.4004 -0.2162 LP 1 <1> 0.0000
38 **** -6.0019 -0.3909 1.4082 LP 1 <1> 0.0000
39 **** -7.5554 2.1121 0.6331 LP 1 <1> 0.0000
40 **** -6.7433 0.5904 0.4760 LP 1 <1> 0.0000
41 **** -6.4372 4.0286 0.7188 LP 1 <1> 0.0000
42 **** -4.7085 3.9753 0.6242 LP 1 <1> 0.0000
@BOND
1 1 2 1
2 1 23 1
3 2 3 ar
4 2 4 ar
5 3 5 ar
6 3 6 ar
7 4 7 1
8 4 8 ar
9 5 9 ar
10 5 10 ar
11 6 11 ar
12 6 12 ar
13 7 13 ar
14 7 14 ar
15 8 12 ar
16 8 24 1
17 9 15 ar
18 9 16 ar
19 10 17 ar
20 11 16 ar
21 11 25 1
22 15 18 1
23 15 19 ar
24 16 26 1
25 17 19 ar
26 17 27 1
27 18 28 1
28 19 20 1
29 20 21 ar
30 20 22 ar
31 1 29 1
32 1 30 1
33 10 31 1
34 12 32 1
35 13 33 1
36 13 34 1
37 14 35 1
38 14 36 1
39 18 37 1
40 18 38 1
41 21 39 1
42 21 40 1
43 22 41 1
44 22 42 1
@SUBSTRUCTURE
1 **** 1
@SET
LONE_PAIRS STATIC ATOMS **** ""
14 29 30 31 32 33 34 35 36 37 38 39 40 41 42
DONOR_HYDROGENS STATIC ATOMS **** ""
2 23 28
ATOM$BLUE STATIC ATOMS COLORGROUP SYSTEM
14 29 30 31 32 33 34 35 36 37 38 39 40 41 42
ATOM$RED STATIC ATOMS COLORGROUP SYSTEM
2 23 28

From felipe.albrecht at gmail.com Mon Aug 1 08:41:33 2005
From: felipe.albrecht at gmail.com (Felipe Albrecht)
Date: Mon, 1 Aug 2005 09:41:33 -0300
Subject: [BiO BB] A problem about a subroutin in my code
In-Reply-To: <20050730003544.95514.qmail@web53503.mail.yahoo.com>
References: <20050730003544.95514.qmail@web53503.mail.yahoo.com>
Message-ID:

Just a single comment:

Why, in

    my @Motif = split(//,'$origin[$y]'); # This is a loop to get the motif template from origin8

is the variable $origin[$y] in quotes? Shouldn't it be

    my @Motif = split(//,$origin[$y]);

instead?

Felipe Albrecht

2005/7/29, Alex Zhang :
> Dear all,
>
> Sorry to bother you.
> I need some help on my code. I have an input file named "origin8.txt"
> which holds 200 short sequences of width 8. My code is to use each
> short sequence from "origin8.txt" as a template to generate 100 short
> sequences of the same width and store them in a txt file A.
>
> Then the code will read 100 short sequences from the txt file A and
> 100 long sequences of width 200 from a txt file B, and then replace a
> substring of each long sequence using each short sequence. This code
> will lead to two txt files C and D. File C will hold 100 replaced long
> sequences.
>
> In other words, I want to input "origin8.txt" to get 200 File D.
>
> My code can generate 200 File D but each of them holds nothing. So I
> guess the problem is caused by a failure of passing the data to a
> subroutine named "make_file".
>
> Can anybody suggest how to modify that? Thank you very much in
> advance!
>
> Sincerely,
>
> Alex
>
> My code:
>
> *******************************************************************
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> my (@origin, $y);
> my $N_Sequences = 100;
> my @Alphabet = split(//,'ACGT');
> my $P_Consensus = 0.85;   # This is the probability of dominant letter
> # ====== Globals ==========================
> my @Probabilities;        # Stores the probability of each character
>
> # ====== Program ==========================
>
> open (ORIGIN, "< origin8.txt");   # This file holds 200 sequences used for motif template
> chomp (@origin = <ORIGIN>);
> close ORIGIN;
>
> for ($y=0; $y<=$#origin; $y++) {
>
>     my @Motif = split(//,'$origin[$y]');   # This is a loop to get the motif template from origin8
>     open (OUT_NORM, ">short_sequences8_[$y].txt") or die "Unable to open file :$!";
>     for (my $i=0; $i < $N_Sequences; $i++) {
>         for (my $j=0; $j < scalar(@Motif); $j++) {
>             loadConsensusCharacter($Motif[$j]);
>             addNoiseToDistribution();
>             convertToIntervals();
>             print OUT_NORM (getRandomCharacter(rand(1.0)));
>         }
>         print OUT_NORM "\n";
>         make_files();
>     }
> }
>
> exit();
>
> # ====== Subroutines =======================
> #
> sub loadConsensusCharacter {
>     my ($char) = @_;
>     my $Found = 'FALSE';
>
>     for (my $i=0; $i < scalar(@Alphabet); $i++) {
>         if ( $char eq $Alphabet[$i]) {
>             $Probabilities[$i] = 1.0;
>             $Found = 'TRUE';
>         } else {
>             $Probabilities[$i] = 0.0;
>         }
>     }
>     if ($Found eq 'FALSE') {
>         die("Panic: Motif-Character\"$char\" was not found in Alphabet. Aborting.\n");
>     }
>
>     return();
> }
>
> # ==========================================
> sub addNoiseToDistribution {
>
>     my $P_NonConsensus = ( 1.0-$P_Consensus) / (scalar(@Alphabet) - 1);
>
>     for (my $i=0; $i < scalar(@Probabilities); $i++) {
>         if ( $Probabilities[$i] == 1.0 ) {
>             $Probabilities[$i] = $P_Consensus;
>         } else {
>             $Probabilities[$i] = $P_NonConsensus;
>         }
>     }
>
>     return();
> }
>
> # ==========================================
> sub convertToIntervals {
>
>     my $Sum = 0;
>
>     for (my $i=1; $i < scalar(@Probabilities); $i++) {
>         $Probabilities[$i] += $Probabilities[$i-1];
>     }
>
>     return();
> }
>
> # ==========================================
> sub getRandomCharacter {
>
>     my ($RandomNumber) = @_;
>     my $i=0;
>     for ($i=0; $i < scalar(@Probabilities); $i++) {
>         if ($Probabilities[$i] > $RandomNumber) { last; }
>     }
>
>     return($Alphabet[$i]);
> }
>
> # ==========================================
> sub make_files {
>     my (@short, @long,$x,$r, $output_norm);
>
>     open (SHORT, "< short_sequences8_[$y].txt");
>     chomp (@short = <SHORT>);
>     close SHORT;
>
>     open (LONG, "< long_sequences.txt");
>     chomp (@long = <LONG>);
>     close LONG;
>
>     open (OUT_INITIAL, "> output8_[$y]1.txt");
>     open (OUT_REPLACED, "> output8_[$y]2.txt");
>
>     for ($x=0; $x<=$#short; $x++) {
>         $r=2;
>         print OUT_INITIAL ">SeqName$x\n$long[$x]\n";
>         print OUT_REPLACED "SeqName$x\n" . substr($long[$x], $r, length $short[$x]) . "\n";}
>
>     close OUT_INITIAL;
>     close OUT_REPLACED;
>
> }
>
> *******************************************************************
>
> Input file "origin8.txt" holds 200 sequences as:
>
> TTTATAAT
> TGTCAATG
> CGTTGATG
> CGTCCTAG
> GGCTTCCA
> ATTAGCCT
> GTCCTGAT
> TGTAAATC
> CGCTTATT
> TTGACATA
> CCTGATAT
> ATGAATCG
> CGTCCGAT
> TGGCCCAT
> ATCCTGAT
> TGCCCATT
> CCCTAACT
> AAAAAAAA
> TTTTTTTT
> CCCCCCCC
> GGGGGGGG
> AAAAAAAT
> AAAAAAAG
> AAAAAAAC
> AAAAAACC
> AAAAAATT
> AAAAAAGG
> AAAAAACT
> AAAAAACG
> AAAAAACA
> AAAAACAA
> AAAACAAA
> AAACAAAA
> AACAAAAA
> ACAAAAAA
> CAAAAAAA
> AAAAAATA
> AAAAATAA
> AAAATAAA
> AAATAAAA
> AATAAAAA
> ATAAAAAA
> TAAAAAAA
> AAAAAAGA
> AAAAAGAA
> AAAAGAAA
> AAAGAAAA
> AAGAAAAA
> AGAAAAAA
> GAAAAAAA
> AAAACCAA
> AACCAAAA
> CCAAAAAA
> AAAATTAA
> AATTAAAA
> TTAAAAAA
> AAAAACCC
> AAAACCCA
> AAACCCAA
> AACCCAAA
> ACCCAAAA
> CCCAAAAA
> AAAAATTT
> AAAATTTA
> AAATTTAA
> AATTTAAA
> ATTTAAAA
> TTTAAAAA
> AAAAAGGG
> AAAAGGGA
> AAAGGGAA
> AAGGGAAA
> AGGGAAAA
> GGGAAAAA
> AAAACCCC
> AAACCCCA
> AACCCCAA
> ACCCCAAA
> CCCCAAAA
> AAAATTTT
> AAATTTTA
> AATTTTAA
> ATTTTAAA
> TTTTAAAA
> AAAAGGGG
> AAAGGGGA
> AAGGGGAA
> AGGGGAAA
> GGGGAAAA
> AAACCCCC
> AACCCCCA
> ACCCCCAA
> CCCCCAAA
> AAATTTTT
> AATTTTTA
> ATTTTTAA
> TTTTTAAA
> AAAGGGGG
> AAGGGGGA
> AGGGGGAA
> GGGGGAAA
> AAGGGGGG
> AGGGGGGA
> GGGGGGAA
> AACCCCCC
> ACCCCCCA
> CCCCCCAA
> AATTTTTT
> ATTTTTTA
> TTTTTTAA
> ATTTTTTT
> TTTTTTTA
> ACCCCCCC
> CCCCCCCA
> AGGGGGGG
> GGGGGGGA
> ATTTTTTT
> TTTTTTTA
> ATAAAATA
> AATAAATA
> AAATAATA
> AAAATATA
> ACAAAACA
> AACAAACA
> AAACAACA
> AAAACACA
> AGAAAAGA
> AAGAAAGA
> AAAGAAGA
> AAAAGAGA
> ATAAAAGA
> ATAAAACA
> AGAAAATA
> AGAAAACA
> ACAAAAGA
> ACAAAATA
> ATTAAATA
> AATTAATA
> AAATTATA
> ACCAAACA
> AACCAACA
> AAACCACA
> AGGAAAGA
> AAGGAAGA
> AAAGGAGA
> ATTTAATA
> AATTTATA
> ACCCAACA
> AACCCACA
> AGGGAAGA
> AAGGGAGA
> ATTTAACA
> ATTTAAGA
> AATTTACA
> AATTTAGA
> ACCCAATA
> ACCCAAGA
> AACCCATA
> AACCCAGA
> AGGGAACA
> AGGGAATA
> AAGGGATA
> AAGGGACA
> TTGGGACA
> CCGGGACA
> AGAAGGGA
> TGCCCATA
> TAAAAAAT
> TGCCTATA
> CCGTAGTC
> ACTTGACT
> CTGATCCC
> TGTGACTA
> CCTGATCC
> CCTGAACC
> TGATCACG
> GGGTAACC
> CTTTTGAA
> TTGTATGA
> CCTGATAA
> CTGGTTAG
> CCCCGACC
> TTGGGGAC
> GGTTTGAC
> GCTTAGAC
> GTTACACC
> TTGTACCA
> TGGTACCA
> CCGTACAT
> CCCTTGCC
> GTGTTGGT
> ATCGATCG
> ACGTACGT
> TCAGTCAG
> GCTATACG
> GTCCATAC
> CCGTCCGT
> ATATATCC
> GTGTCCCC
>
> _______________________________________________
> Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

From jeff at bioinformatics.org Mon Aug 1 09:06:40 2005
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Mon, 01 Aug 2005 09:06:40 -0400
Subject: [BiO BB] A problem about a subroutin in my code
In-Reply-To:
References: <20050730003544.95514.qmail@web53503.mail.yahoo.com>
Message-ID: <42EE1E60.7070302@bioinformatics.org>

Please try not to repost a long message or attachment when replying. Subscribers who get messages in "digest" format will end up with lots to scan through :-)

Thanks,
Jeff

Felipe Albrecht wrote:
> A just single comment,
>
> why in :
> my @Motif = split(//,'$origin[$y]'); # This is a loop to get the motif template from origin8
> the variable $origin[$y] is under quotes?
> The correct isnt : " my @Motif = split(//,$origin[$y]); " ?
>
> Felipe Albrecht

--
J.W. Bizzaro
Bioinformatics Organization, Inc.
(Bioinformatics.Org)
E-mail: jeff at bioinformatics.org
Phone: +1 508 890 8600
--

From bci2005 at dimi.uniud.it Wed Aug 3 06:11:09 2005
From: bci2005 at dimi.uniud.it (BCI2005 Committee)
Date: Wed, 3 Aug 2005 12:11:09 +0200
Subject: [BiO BB] BCI 2005 School: Last Call for Participation
Message-ID: <22A0EA65-2E56-4F58-96D4-2C0C0A7605D3@dimi.uniud.it>

------------------------------------------------------------
SECOND INTERNATIONAL SCHOOL ON
BIOLOGY, COMPUTATION AND INFORMATION (BCI 2005)

September 12-16, 2005, Dobbiaco (BZ), Italy
http://bioinf.dimi.uniud.it/bci2005

CALL FOR PARTICIPATION
------------------------------------------------------------

The second edition of the School on Biology, Computation and Information (BCI) aims at bringing together teachers and students from Biology, Mathematics and Computer Science. The main goal of the school is to give an updated overview of interdisciplinary techniques and problems to be studied on the boundaries among the three fields.

COURSES

Title: Biological Systems as Reactive Systems
Teacher: Prof. Luca Cardelli, Microsoft Research, Cambridge, UK

Title: Optimization Methods for Computational Biology
Teacher: Prof. Giuseppe Lancia, Dip. di Matematica e Informatica, Universita' degli Studi di Udine, Italy

Title: Algorithmic and Combinatorial Analysis of Genomes
Teacher: Prof. Giorgio Valle, Dip. di Biologia / CRIBI, Universita' degli Studi di Padova, Italy

STUDENT SESSION

We strongly encourage participants to sign up for the student session planned for the last day of the school. The student session is meant to be quite informal, giving participants the opportunity to share their research activity. More information is available on the website (see below).

REGISTRATION

Early registration deadline: July 8, 2005.
Late registration deadline: August 19, 2005.

We can provide accommodation for 40 participants, assigned on a first-come, first-served basis. To apply, use the online registration form available at

https://bioinf.dimi.uniud.it/bci_registr

Acceptance of more participants will be evaluated by the organizers.

Early registration fee: EUR 350.
Late registration fee: EUR 400.

The registration fee covers participation in all lectures, course materials and full-board accommodation.

ACCOMMODATION

Participants will be lodged at the Apparthotel Germania (http://www.apparthotel-germania.com/). Accommodation includes five nights with breakfasts, lunches and dinners, including the dinner on Sunday, September 11, 2005 at 20:00 and the lunch on Friday, September 16, 2005.

WEBSITE AND CONTACT

For all additional information, please visit the website

http://bioinf.dimi.uniud.it/bci2005

You can also contact the school organizers at

bci2005 at dimi.uniud.it

------------------------------------------------------------
ORGANIZING COMMITTEE

- Alberto Policriti, University of Udine (school director)
- Agostino Dovier, University of Udine (school co-director)
- Luca Bortolussi, University of Udine
- Alberto Casagrande, University of Udine
- Raffaella Gentilini, University of Udine
- Carla Piazza, University of Udine
- Nicola Vitacolonna, University of Udine
- Marco Zantoni, University of Udine

SPONSORS

- Dipartimento di Matematica e Informatica dell'Universita' di Udine.
- Dipartimento di Matematica e Informatica dell'Universita' di Trieste.
- INTERREG Project Formazione post-lauream e Aggiornamento dei Ricercatori sui Processi Innovativi del Calcolo Scientifico - Realizzazione di una Rete di Formazione e Ricerca fra le Universita' di Innsbruck, Trieste, Udine e Bolzano - Progetto cofinanziato nell'ambito dell'Unione Europea e del Ministero delle Infrastrutture e dei Trasporti.
- Istituto Nazionale di Alta Matematica - Gruppo Nazionale di Calcolo Scientifico.

The school is organized within the scope of the INdAM project "Metodi Matematici ed Algoritmici per l'Analisi di Sequenze di Nucleotidi e Amminoacidi" (Algorithmic and Mathematical Methods for the Analysis of Nucleotide and Amino Acid Sequences).

From bci2005 at dimi.uniud.it Wed Aug 3 06:06:58 2005
From: bci2005 at dimi.uniud.it (BCI2005 Committee)
Date: Wed, 3 Aug 2005 12:06:58 +0200
Subject: [BiO BB] BCI 2005: viaggio e materiale didattico
Message-ID: <68A56E72-FEF6-4D54-8796-501713E254EA@dimi.uniud.it>

Dear professors,

We are proceeding with the organization of the school. To that end, please let us know of any needs regarding transport to and from Dobbiaco: unless you explicitly request otherwise, we will assume that you will travel by your own means. In any case, we remain available for information on how to reach the school venue.

I also take this opportunity to invite you to send us, before the school begins, the teaching material (articles, slides, etc.) and the bibliographic references for the course you are teaching, so that we can add them to the school's web page.

Best regards,
Nicola Vitacolonna
BCI 2005 Organizing Committee

From mayagao1999 at yahoo.com Thu Aug 4 15:26:02 2005
From: mayagao1999 at yahoo.com (Alex Zhang)
Date: Thu, 4 Aug 2005 12:26:02 -0700 (PDT)
Subject: [BiO BB] A problem about a subroutin in my code
In-Reply-To:
Message-ID: <20050804192602.5667.qmail@web53510.mail.yahoo.com>

Dear Felipe,

Thank you very much! Yes, the quotes around '$origin[$y]' were an error. Besides that, I think there was another problem with my code: it was supposed to produce 200 outputs, but it produced only 100.

Regards,
Alex

--- Felipe Albrecht wrote:
> A just single comment,
>
> why in :
> my @Motif = split(//,'$origin[$y]'); # This is a loop to get the motif template from origin8
> the variable $origin[$y] is under quotes?
> The correct isnt : " my @Motif = split(//,$origin[$y]); " ?
>
> Felipe Albrecht

=== message truncated ===

From ngadewal at yahoo.com Mon Aug 8 02:25:19 2005
From: ngadewal at yahoo.com (nikhil gadewal)
Date: Sun, 7 Aug 2005 23:25:19 -0700 (PDT)
Subject: [BiO BB] pathway prediction
Message-ID: <20050808062519.70946.qmail@web33408.mail.mud.yahoo.com>

Dear Members,

Other than Chilibot, is there any tool freely available on the internet to predict a biological network or pathway from a list of genes by mining PubMed abstracts?

Thank you in advance,
Nikhil

NIKHIL S. GADEWAL
ACTREC, Tata Memorial Centre,
Kharghar, Navi Mumbai, India
Great minds discuss ideas; Average minds discuss events; Small minds discuss people.

From hchen at utmem.edu Mon Aug 8 08:05:53 2005
From: hchen at utmem.edu (Hao Chen)
Date: Mon, 8 Aug 2005 07:05:53 -0500
Subject: [BiO BB] pathway prediction
In-Reply-To: <20050808062519.70946.qmail@web33408.mail.mud.yahoo.com>
References: <20050808062519.70946.qmail@web33408.mail.mud.yahoo.com>
Message-ID: <20050808120553.GA6371@utmail.utmem.edu>

On Sun, Aug 07, 2005 at 11:25:19PM -0700, nikhil gadewal wrote:
> Dear Members,
>
> Other than Chilibot any tool available free on
> internet to predict biological network or pathway by
> mining pubmed abstracts from a list of genes.

http://www.pubgene.org
    old version of PubMed, fast, need to manually search PubMed for details.
http://pubmatrix.grc.nia.nih.gov/
    current version of PubMed, no graphs
http://bioinf.cs.ucl.ac.uk/biorat/
    Windows software
http://www.textpresso.org/
    C. elegans only, full text, ontology-based query

More at http://arrowsmith.psych.uic.edu/arrowsmith_uic/tools.html

> Thankyou in advance
>
> Nikhil

--
: Hao Chen, Ph.D.
: Research Associate
: Department of Pharmacology
: University of Tennessee Health Science Center
: Memphis, TN 38163 USA
: Office: 901 448 3201
: Mobile: 901 826 1845
Mining PubMed: http://www.chilibot.net

From joypaul.joy at gmail.com Tue Aug 9 03:50:08 2005
From: joypaul.joy at gmail.com (Joy Paul)
Date: Tue, 9 Aug 2005 00:50:08 -0700
Subject: [BiO BB] pathway prediction
In-Reply-To: <20050808120553.GA6371@utmail.utmem.edu>
References: <20050808062519.70946.qmail@web33408.mail.mud.yahoo.com> <20050808120553.GA6371@utmail.utmem.edu>
Message-ID: <33e5e0730508090050ce50899@mail.gmail.com>

There is the KEGG database, which will solve all your problems. This is the Kyoto Encyclopedia of Genes and Genomes. You will find links to all databases and to genomic and proteomic information. All that you need is there.

Joy Paul
Bioinformatics Centre
University of Pune
India

From val at vtek.com Tue Aug 9 10:46:21 2005
From: val at vtek.com (val)
Date: Tue, 9 Aug 2005 10:46:21 -0400
Subject: [BiO BB] pathway prediction
References: <20050808062519.70946.qmail@web33408.mail.mud.yahoo.com><20050808120553.GA6371@utmail.utmem.edu> <33e5e0730508090050ce50899@mail.gmail.com>
Message-ID: <276e01c59cf1$1c432120$c400a8c0@sony>

Pathway prediction? ..sounds too good. Be realistic...

val

----- Original Message -----
From: "Joy Paul"
To: "Hao Chen" ; "The general forum at Bioinformatics.Org"
Sent: Tuesday, August 09, 2005 3:50 AM
Subject: Re: [BiO BB] pathway prediction

> there is one kegg database , which will solve all your problems.
> this is kyoto encyclopedia of genes and genomes. you will find the
> link to all databases and genomic and proteomic information . all
> that you need is there.
>
> joy paul.
> bioinformatics centre.
> university of pune.
> india.

From bioinfosm at gmail.com Tue Aug 9 13:24:41 2005
From: bioinfosm at gmail.com (Samantha Fox)
Date: Tue, 9 Aug 2005 12:24:41 -0500
Subject: [BiO BB] Clustering small DNA sequences into groups
Message-ID: <726450810508091024495a4fd9@mail.gmail.com>

Hi,

I have a set of small DNA sequences (about 40), 6-10 bp long, and wish to group them into clusters based on sequence.

Any suggestions for doing that?

Thanks,

Samantha

From operon at cbiot.ufrgs.br Tue Aug 9 14:51:00 2005
From: operon at cbiot.ufrgs.br (Marcos Oliveira de Carvalho)
Date: Tue, 09 Aug 2005 15:51:00 -0300
Subject: [BiO BB] Clustering small DNA sequences into groups
In-Reply-To: <726450810508091024495a4fd9@mail.gmail.com>
References: <726450810508091024495a4fd9@mail.gmail.com>
Message-ID:

Hi Samantha,

BLASTCLUST can group DNA sequences. You may need to tweak the parameters (almost the same as for BLAST). You can get it from the NCBI ftp:
ftp://ftp.ncbi.nih.gov/blast/

cheers
Marcos

On Tue, 09 Aug 2005 14:24:41 -0300, Samantha Fox wrote:

> Hi,
>
> I have a set of small DNA sequences (about 40) 6-10 bp, and wish to
> group them into clusters based on sequence.
>
> Any suggestions for doing that ?
>
> Thanks,
>
> Samantha

From dmb at mrc-dunn.cam.ac.uk Tue Aug 9 14:53:45 2005
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Tue, 9 Aug 2005 19:53:45 +0100 (BST)
Subject: [BiO BB] Clustering small DNA sequences into groups
In-Reply-To: <726450810508091024495a4fd9@mail.gmail.com>
Message-ID:

On Tue, 9 Aug 2005, Samantha Fox wrote:

>Hi,
>
>I have a set of small DNA sequences (about 40) 6-10 bp, and wish to
>group them into clusters based on sequence.
>
>Any suggestions for doing that ?

I have never tried using CD-HIT to cluster DNA, but it should work (you will have to alter the 'throwaway' length to something like 4 to stop all your sequences being filtered out as too short).

I found that blastclust (which can be explicitly set to cluster DNA) automatically ignores any protein sequence of less than 30 residues. While it could cluster those together (100% identical, for example), it always seems to put any protein fragment of less than 30 residues into a new cluster.

Not sure if the behaviour is the same in DNA mode.

>Thanks,
>
>Samantha

From marty.gollery at gmail.com Tue Aug 9 15:24:18 2005
From: marty.gollery at gmail.com (Martin Gollery)
Date: Tue, 9 Aug 2005 12:24:18 -0700
Subject: [BiO BB] Clustering small DNA sequences into groups
In-Reply-To:
References: <726450810508091024495a4fd9@mail.gmail.com>
Message-ID:

I believe those sequences are too short for Blastclust. The default word size is 32.

Marty

On 8/9/05, Marcos Oliveira de Carvalho wrote:
> Hi Samantha,
>
> BLASTCLUST can group DNA sequences. Maybe you will need to tweak the
> parameters (almost the same for BLAST). You can get it at the NCBI ftp:
> ftp://ftp.ncbi.nih.gov/blast/
>
> cheers
> Marcos

--
Martin Gollery
Associate Director
Center For Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
775-784-7042
-----------

From bioinfosm at gmail.com Tue Aug 9 17:23:09 2005
From: bioinfosm at gmail.com (Samantha Fox)
Date: Tue, 9 Aug 2005 16:23:09 -0500
Subject: [BiO BB] Clustering small DNA sequences into groups
In-Reply-To:
References: <726450810508091024495a4fd9@mail.gmail.com>
Message-ID: <7264508105080914235e540d3@mail.gmail.com>

Thanks so much for your replies. However, it has not worked yet. cd-hit gave the error below, and blastclust is not usable for such small sequences!

Any suggestions?

> cat fasta
>one
tagcgc
>two
atcgtt
> ./cd-hit -i fasta -o www
total seq: 0
longest and shortest : 0 and 99999
Total letters: 0
terminate called after throwing an instance of 'std::bad_alloc'
  what(): St9bad_alloc
Abort (core dumped)
> ./cd-hit -i fasta -o www -l 5

Fatal Error
Too short -l, redefine it

Program halted !!

On 8/9/05, Martin Gollery wrote:
> I believe those sequences are too short for Blastclust. The default
> word size is 32.
>
> Marty
>
> On 8/9/05, Marcos Oliveira de Carvalho wrote:
> > Hi Samantha,
> >
> > BLASTCLUST can group DNA sequences.
> > Maybe you will need to tweak the parameters (almost the same for
> > BLAST). You can get it at the NCBI ftp:
> > ftp://ftp.ncbi.nih.gov/blast/
> >
> > cheers
> > Marcos

From idoerg at burnham.org Tue Aug 9 18:07:34 2005
From: idoerg at burnham.org (Iddo Friedberg)
Date: Tue, 09 Aug 2005 15:07:34 -0700
Subject: [BiO BB] Clustering small DNA sequences into groups
In-Reply-To:
References:
Message-ID: <42F92926.3060708@burnham.org>

CD-HIT does not work on DNA, or on short sequences for that matter.

Dan Bolser wrote:
>I never tried using CD-HIT to cluster DNA, but it should work (you will
>have to alter the 'throwaway' length to something like 4 to stop all your
>sequences being filtered as too short.
>
>I found blastclust (which can be explicitly set to cluster
>DNA) automatically ignores any protein sequence of less than 30
>residues. While it could cluster those together (100% identical for
>example) it always seems to put any protein fragment less than 30 residues
>into a new cluster.
>
>Not sure if the behaviour is the same in DNA mode.

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From idoerg at burnham.org Tue Aug 9 18:12:29 2005
From: idoerg at burnham.org (Iddo Friedberg)
Date: Tue, 09 Aug 2005 15:12:29 -0700
Subject: [BiO BB] Clustering small DNA sequences into groups
In-Reply-To: <7264508105080914235e540d3@mail.gmail.com>
References: <726450810508091024495a4fd9@mail.gmail.com> <7264508105080914235e540d3@mail.gmail.com>
Message-ID: <42F92A4D.9030301@burnham.org>

How about building a distance matrix of your own (based on %ID between fragments) and then using WEKA for the clustering?

./I

Samantha Fox wrote:
>Thanks so much for your replies. However, it did not work yet. cd-hit
>gave this error, and blastclust is not usable for such small sequences
>!
>
>Any suggestions ?
>
>> cat fasta
>>one
>tagcgc
>>two
>atcgtt
>> ./cd-hit -i fasta -o www
>total seq: 0
>longest and shortest : 0 and 99999
>Total letters: 0
>terminate called after throwing an instance of 'std::bad_alloc'
>  what(): St9bad_alloc
>Abort (core dumped)
>> ./cd-hit -i fasta -o www -l 5
>
>Fatal Error
>Too short -l, redefine it
>
>Program halted !!

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From bioinfosm at gmail.com Wed Aug 10 00:14:28 2005
From: bioinfosm at gmail.com (Samantha Fox)
Date: Tue, 9 Aug 2005 23:14:28 -0500
Subject: [BiO BB] Transfac matrices
Message-ID: <7264508105080921147d09276d@mail.gmail.com>

Hi,

I have a question about Transfac. I could not locate the collection of matrices as PWMs, or in some other format that I can use in my scripts.

Another thing is comparing a motif or PWM that I find against all Transfac matrices, to ascertain whether it is one of those or something totally unrelated.

I am sure something must be available; I just could not find it. Can someone help, please? Thanks.

Samantha

From bioinfosm at gmail.com Wed Aug 10 00:29:09 2005
From: bioinfosm at gmail.com (Samantha Fox)
Date: Tue, 9 Aug 2005 23:29:09 -0500
Subject: [BiO BB] Clustering small DNA sequences into groups
In-Reply-To: <7264508105080914235e540d3@mail.gmail.com>
References: <726450810508091024495a4fd9@mail.gmail.com> <7264508105080914235e540d3@mail.gmail.com>
Message-ID: <7264508105080921292bf1a82c@mail.gmail.com>

Well, I had thought that there would be some tools to do what I wanted. Maybe I will explain a bit more.
Say I have these DNA sequences: tataa, tattta, ttaata, taaaaa, tatata, aattaa, ataaa, tctttc, ttcatt, acttca. Now I would like some sort of grouping or clustering; in this small example the last 3 would fall into one group, somehow. Any clues? Samantha > > On 8/9/05, Martin Gollery wrote: > > I believe those sequences are too short for Blastclust. The default > > word size is 32. > > > > Marty > > > > On 8/9/05, Marcos Oliveira de Carvalho wrote: > > > > > > > > > Hi Samantha, > > > > > > BLASTCLUST can group DNA sequences. Maybe you will need to tweak the > > > parameters (almost the same for BLAST). You can get it at the NCBI ftp: > > > ftp://ftp.ncbi.nih.gov/blast/ > > > > > > cheers > > > Marcos > > > > > > > > > > > > On Tue, 09 Aug 2005 14:24:41 -0300, Samantha Fox > > > wrote: > > > > > > > Hi, > > > > > > > > I have a set of small DNA sequences (about 40) 6-10 bp, and wish to > > > > group them into clusters based on sequence. > > > > > > > > Any suggestions for doing that ? > > > > > > > > Thanks, > > > > > > > > Samantha > From idh at poulet.org Wed Aug 10 04:40:30 2005 From: idh at poulet.org (Yannick Wurm) Date: Wed, 10 Aug 2005 10:40:30 +0200 Subject: [BiO BB] Clustering small DNA sequences into groups In-Reply-To: <7264508105080921292bf1a82c@mail.gmail.com> References: <726450810508091024495a4fd9@mail.gmail.com> <7264508105080914235e540d3@mail.gmail.com> <7264508105080921292bf1a82c@mail.gmail.com> Message-ID: <7470016C-7D13-4351-83F7-5ADFC42D8DAE@poulet.org> Hi Sam, here's a far-fetched hack: if you don't find anything else, you may want to think along these lines: - replace every base with a tab-delimited number, e.g. acttca becomes 1 2 4 4 2 1 - use the resulting file as input for a clustering algorithm that clusters numeric data, such as gene expression data. Good luck, yannick On Aug 10, 2005, at 06:29, Samantha Fox wrote: > Well, I had thought that there will be some tools to do what I wanted. > Maybe I will explain a bit more.
Say I have these dna sequences > tataa, tattta, ttaata, taaaaa, tatata, aattaa,ataaa, tctttc, > ttcatt, acttca. > > Now maybe some sort of grouping or clustering.... in this small > example last 3 fall in one group ... somehow... > > Any clues ! > > Samantha > > >> >> On 8/9/05, Martin Gollery wrote: >> >>> I believe those sequences are too short for Blastclust. The default >>> word size is 32. >>> >>> Marty >>> >>> On 8/9/05, Marcos Oliveira de Carvalho >>> wrote: >>> >>>> >>>> >>>> Hi Samantha, >>>> >>>> BLASTCLUST can group DNA sequences. Maybe you will need to tweak >>>> the >>>> parameters (almost the same for BLAST). You can get it at the >>>> NCBI ftp: >>>> ftp://ftp.ncbi.nih.gov/blast/ >>>> >>>> cheers >>>> Marcos >>>> >>>> >>>> >>>> On Tue, 09 Aug 2005 14:24:41 -0300, Samantha Fox >>>> >>>> wrote: >>>> >>>> >>>>> Hi, >>>>> >>>>> I have a set of small DNA sequences (about 40) 6-10 bp, and >>>>> wish to >>>>> group them into clusters based on sequence. >>>>> >>>>> Any suggestions for doing that ? >>>>> >>>>> Thanks, >>>>> >>>>> Samantha >>>>> >> >> > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > From idoerg at burnham.org Wed Aug 10 12:29:16 2005 From: idoerg at burnham.org (Iddo Friedberg) Date: Wed, 10 Aug 2005 09:29:16 -0700 Subject: [BiO BB] Re: Where can I make Logo plot? In-Reply-To: <20050810131853.9CCCE19CC82@mail.tongji.edu.cn> References: <20050810131853.9CCCE19CC82@mail.tongji.edu.cn> Message-ID: <42FA2B5C.3090807@burnham.org> http://www.lecb.ncifcrf.gov/~toms/glossary.html#sequence_logo ekeen wrote: > Where can I make Logo plot like follow? > > Thanks. > > > -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. 
La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 713 9949 http://ffas.ljcrf.edu/~iddo From deepan_3356 at yahoo.co.in Wed Aug 10 16:01:37 2005 From: deepan_3356 at yahoo.co.in (deepan@bioinformatics.org) Date: Wed, 10 Aug 2005 21:01:37 +0100 (BST) Subject: [BiO BB] neural networks of macaque In-Reply-To: <42FA2B5C.3090807@burnham.org> Message-ID: <20050810200137.47713.qmail@web8409.mail.in.yahoo.com> Hi, I want to download the whole neural network of the macaque from www.cocomac.org. I don't know how to query that website to download the whole network. If someone is working on this, please guide me. Thanks Deepan Chakravarthy N my home page http://deepan.tk mail id : deepan at bioinformatics.org ----------------------------------------- deepan chakravarthy n 3rd year,(5th sem), b.tech(biotech), center for biotechnology, anna university , chennai. ph no: home:04287-241199,04287-244399, brother no:9840824399, address: ac tech hostel (jh 108), anna university, chennai-25. From Alex.Bossers at wur.nl Thu Aug 11 10:44:28 2005 From: Alex.Bossers at wur.nl (Bossers, Alex) Date: Thu, 11 Aug 2005 16:44:28 +0200 Subject: [BiO BB] Clustering small DNA sequences into groups Message-ID: <839C6D97DA4B564882F385F1DA46742C01DE6254@slely0004.wurnet.nl> Have you tried TGICL? I have no experience with small sequences, but it's quite configurable. No machine at hand to test it :(.
http://www.tigr.org/tdb/tgi/software/ Alex > -----Original Message----- > From: bio_bulletin_board-bounces+alex.bossers=wur.nl at bioinformatics.org > [mailto:bio_bulletin_board-bounces+alex.bossers=wur.nl at bioinformatics.org] > On Behalf Of Samantha Fox > Sent: Wednesday, August 10, 2005 6:29 > To: mgollery at unr.edu; The general forum at Bioinformatics.Org; dmb at mrc-dunn.cam.ac.uk; operon at cbiot.ufrgs.br; idoerg at burnham.org > Subject: Re: [BiO BB] Clustering small DNA sequences into groups > > Well, I had thought that there will be some tools to do what I wanted. > Maybe I will explain a bit more. Say I have these dna sequences > tataa, tattta, ttaata, taaaaa, tatata, aattaa,ataaa, tctttc, ttcatt, acttca. > > Now maybe some sort of grouping or clustering.... in this small > example last 3 fall in one group ... somehow... > > Any clues ! > > Samantha > > > > > On 8/9/05, Martin Gollery wrote: > > > I believe those sequences are too short for Blastclust. The default > > > word size is 32. > > > > > > Marty > > > > > > On 8/9/05, Marcos Oliveira de Carvalho wrote: > > > > > > > > > > > > Hi Samantha, > > > > > > > > BLASTCLUST can group DNA sequences. Maybe you will need to tweak the > > > > parameters (almost the same for BLAST). You can get it at the NCBI ftp: > > > > ftp://ftp.ncbi.nih.gov/blast/ > > > > > > > > cheers > > > > Marcos > > > > > > > > > > > > > > > > On Tue, 09 Aug 2005 14:24:41 -0300, Samantha Fox > > > > > wrote: > > > > > > > > > Hi, > > > > > > > > > > I have a set of small DNA sequences (about 40) 6-10 bp, and wish to > > > > > group them into clusters based on sequence. > > > > > > > > > > Any suggestions for doing that ?
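One way to act on the suggestion made earlier in this thread (build your own distance matrix from %ID between fragments, then cluster) is sketched below in Python. Several things here are assumptions rather than anything posted to the list: %ID is taken as the best ungapped overlap divided by the length of the longer fragment, a plain single-linkage grouping stands in for WEKA, and the function names and the 40.0 cutoff are made up for illustration.

```python
from itertools import combinations


def percent_identity(a, b):
    """Best ungapped %ID: slide the shorter fragment along the longer one.

    Assumption: identity is counted against the length of the longer fragment.
    """
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        matches = sum(s == l for s, l in zip(short, long_[offset:]))
        best = max(best, matches)
    return 100.0 * best / len(long_)


def distance_matrix(seqs):
    """Symmetric matrix of distances, where distance = 100 - %ID."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = 100.0 - percent_identity(seqs[i], seqs[j])
    return dist


def single_linkage(seqs, max_dist):
    """Group sequences that are linked by any chain of pairs within max_dist."""
    dist = distance_matrix(seqs)
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if dist[i][j] <= max_dist:
            parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


if __name__ == "__main__":
    # Example fragments from Samantha's message; the cutoff is a hypothetical value.
    fragments = ["tataa", "tattta", "ttaata", "taaaaa", "tatata",
                 "aattaa", "ataaa", "tctttc", "ttcatt", "acttca"]
    for cluster in single_linkage(fragments, max_dist=40.0):
        print(cluster)
```

With fragments this short, the grouping is very sensitive to the cutoff, so it is probably worth printing the distance matrix and trying a few values. The same matrix could also be exported for a dedicated clustering package.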
> > > > > > > > > > Thanks, > > > > > > > > > > Samantha > > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From bioinfosm at gmail.com Thu Aug 11 11:09:18 2005 From: bioinfosm at gmail.com (Samantha Fox) Date: Thu, 11 Aug 2005 10:09:18 -0500 Subject: [BiO BB] BLAST: locally installed databases Message-ID: <7264508105081108094a345417@mail.gmail.com> Hello, I wish to have regularly updated databases from NCBI to use with my blastall command. Is there an easy standard way to do so ? Thanks, Samantha From golharam at umdnj.edu Thu Aug 11 11:44:32 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 11 Aug 2005 11:44:32 -0400 Subject: [BiO BB] BLAST: locally installed databases In-Reply-To: <7264508105081108094a345417@mail.gmail.com> Message-ID: <002901c59e8b$91466680$e6028a0a@GOLHARMOBILE1> I have a perl script running as a cron job to download the databases nightly if there are new tarballs available.... I can email you the script... -----Original Message----- From: bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org [mailto:bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org ] On Behalf Of Samantha Fox Sent: Thursday, August 11, 2005 11:09 AM To: bio_bulletin_board at bioinformatics.org Subject: [BiO BB] BLAST: locally installed databases Hello, I wish to have regularly updated databases from NCBI to use with my blastall command. Is there an easy standard way to do so ? 
Thanks, Samantha _______________________________________________ Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From bioinfosm at gmail.com Thu Aug 11 11:50:33 2005 From: bioinfosm at gmail.com (Samantha Fox) Date: Thu, 11 Aug 2005 10:50:33 -0500 Subject: [BiO BB] BLAST: locally installed databases In-Reply-To: <002901c59e8b$91466680$e6028a0a@GOLHARMOBILE1> References: <7264508105081108094a345417@mail.gmail.com> <002901c59e8b$91466680$e6028a0a@GOLHARMOBILE1> Message-ID: <726450810508110850923f076@mail.gmail.com> That will help Ryan. Thanks. And these are pre-formatted databases? And my guess is they goto the data folder specified in .ncbirc ? Thanks again, Sumit On 8/11/05, Ryan Golhar wrote: > I have a perl script running as a cron job to download the databases > nightly if there are new tarballs available.... > > I can email you the script... > > > -----Original Message----- > From: bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org > [mailto:bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org > ] On Behalf Of Samantha Fox > Sent: Thursday, August 11, 2005 11:09 AM > To: bio_bulletin_board at bioinformatics.org > Subject: [BiO BB] BLAST: locally installed databases > > > Hello, > I wish to have regularly updated databases from NCBI to use with my > blastall command. > > Is there an easy standard way to do so ? 
> > Thanks, > > Samantha > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > From golharam at umdnj.edu Thu Aug 11 12:09:45 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 11 Aug 2005 12:09:45 -0400 Subject: [BiO BB] BLAST: locally installed databases In-Reply-To: <726450810508110850923f076@mail.gmail.com> Message-ID: <002d01c59e8f$16f42c60$e6028a0a@GOLHARMOBILE1> Attached is the file get_nr.pl which, run as root, downloads the nr and nt databases from NCBI to the directory /usr/local/ncbi/db on the local machine. It does this by comparing the .tar.gz file timestamp on NCBI with the one on your local machine. If the one on NCBI is newer, the file is downloaded. The script is pretty simple and should get you going. If you have any questions, let me know and I'll answer what I can. It was written for Solaris, so you might need to change some paths for other environments, as well as the download location... Good luck, Ryan -----Original Message----- From: Samantha Fox [mailto:bioinfosm at gmail.com] Sent: Thursday, August 11, 2005 11:51 AM To: golharam at umdnj.edu; The general forum at Bioinformatics.Org Subject: Re: [BiO BB] BLAST: locally installed databases That will help Ryan. Thanks. And these are pre-formatted databases? And my guess is they goto the data folder specified in .ncbirc ? Thanks again, Sumit On 8/11/05, Ryan Golhar wrote: > I have a perl script running as a cron job to download the databases > nightly if there are new tarballs available.... > > I can email you the script...
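The timestamp check Ryan describes for get_nr.pl above can be sketched roughly as follows. This is not his script (which is Perl and was sent as an attachment); it is a Python illustration using the standard ftplib module, and it assumes the FTP server answers the MDTM command. The function names, and the host and paths in the usage comment, are placeholders for illustration.

```python
import os
from datetime import datetime, timezone
from ftplib import FTP


def mdtm_to_epoch(response):
    """Convert an FTP MDTM reply such as '213 20050811120000' to epoch seconds (UTC)."""
    stamp = response.split()[-1][:14]  # drop the reply code and any fractional seconds
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def needs_download(remote_epoch, local_path):
    """True if there is no local copy yet, or the remote tarball is newer."""
    if not os.path.exists(local_path):
        return True
    return remote_epoch > os.path.getmtime(local_path)


def fetch_if_newer(host, remote_path, local_path):
    """Download remote_path over anonymous FTP only when it is newer than local_path."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        remote_epoch = mdtm_to_epoch(ftp.sendcmd("MDTM " + remote_path))
        if not needs_download(remote_epoch, local_path):
            return False
        tmp_path = local_path + ".part"  # avoid leaving a half-written tarball in place
        with open(tmp_path, "wb") as out:
            ftp.retrbinary("RETR " + remote_path, out.write)
    os.replace(tmp_path, local_path)
    # Stamp the local copy with the remote time so the next run compares like with like.
    os.utime(local_path, (remote_epoch, remote_epoch))
    return True


# Hypothetical usage from a nightly cron job (host and paths are assumptions):
# fetch_if_newer("ftp.ncbi.nih.gov", "/blast/db/nr.tar.gz", "/usr/local/ncbi/db/nr.tar.gz")
```

Unpacking the tarball after a successful download is left out; whether the files then need formatdb depends on whether the pre-formatted or the FASTA tarballs are fetched.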
> > > -----Original Message----- > From: bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org > [mailto:bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org] > On Behalf Of Samantha Fox > Sent: Thursday, August 11, 2005 11:09 AM > To: bio_bulletin_board at bioinformatics.org > Subject: [BiO BB] BLAST: locally installed databases > > > Hello, > I wish to have regularly updated databases from NCBI to use with my > blastall command. > > Is there an easy standard way to do so ? > > Thanks, > > Samantha > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -------------- next part -------------- A non-text attachment was scrubbed... Name: get_nr.pl Type: application/octet-stream Size: 1680 bytes Desc: not available URL: From Alex.Bossers at wur.nl Fri Aug 12 01:23:43 2005 From: Alex.Bossers at wur.nl (Bossers, Alex) Date: Fri, 12 Aug 2005 07:23:43 +0200 Subject: [BiO BB] BLAST: locally installed databases Message-ID: <839C6D97DA4B564882F385F1DA46742C037F8416@slely0004.wurnet.nl> What happened to the incremental updates to blastDb anyway? Alex > -----Original Message----- > From: bio_bulletin_board-bounces+alex.bossers=wur.nl at bioinformatics.org > [mailto:bio_bulletin_board-bounces+alex.bossers=wur.nl at bioinformatics.org] > On Behalf Of Ryan Golhar > Sent: Thursday, August 11, 2005 18:10 > To: 'Samantha Fox'; 'The general forum at Bioinformatics.Org' > Subject: RE: [BiO BB] BLAST: locally installed databases > > Attached is the file get_nr.pl which will download the nr and nt > database from NCBI to the local machine as root to the directory > /usr/local/ncbi/db.
> > It does this by comparing the .tar.gz file timestamp on NCBI with the > one on your local machine. If the one on NCBI is newer, the file is > downloaded. > > The script is pretty simply and should get you going. If you have any > questions, let me know and I'll answer what I can. It was written for > Solaris, so you might need to change paths to some stuff for other > environments as well as download location... > > Good luck, > > Ryan > > -----Original Message----- > From: Samantha Fox [mailto:bioinfosm at gmail.com] > Sent: Thursday, August 11, 2005 11:51 AM > To: golharam at umdnj.edu; The general forum at Bioinformatics.Org > Subject: Re: [BiO BB] BLAST: locally installed databases > > > That will help Ryan. Thanks. > And these are pre-formatted databases? And my guess is they goto the > data folder specified in .ncbirc ? > > Thanks again, > Sumit > > On 8/11/05, Ryan Golhar wrote: > > I have a perl script running as a cron job to download the databases > > nightly if there are new tarballs available.... > > > > I can email you the script... > > > > > > -----Original Message----- > > From: bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org > > [mailto:bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.o > > rg > > ] On Behalf Of Samantha Fox > > Sent: Thursday, August 11, 2005 11:09 AM > > To: bio_bulletin_board at bioinformatics.org > > Subject: [BiO BB] BLAST: locally installed databases > > > > > > Hello, > > I wish to have regularly updated databases from NCBI to use with my > > blastall command. > > > > Is there an easy standard way to do so ? 
> > > > Thanks, > > > > Samantha > > _______________________________________________ > > Bioinformatics.Org general forum - > > BiO_Bulletin_Board at bioinformatics.org > > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > > _______________________________________________ > > Bioinformatics.Org general forum - > > BiO_Bulletin_Board at bioinformatics.org > > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > From ekeen at tongji.edu.cn Wed Aug 10 09:11:25 2005 From: ekeen at tongji.edu.cn (ekeen) Date: Wed, 10 Aug 2005 21:11:25 +0800 Subject: [BiO BB] Where can I make Logo plot? In-Reply-To: <42F92A4D.9030301@burnham.org> Message-ID: <20050810131853.9CCCE19CC82@mail.tongji.edu.cn> Where can I make a Logo plot like the following? Thanks. -----Original Message----- From: bio_bulletin_board-bounces+ekeen=tongji.edu.cn at bioinformatics.org [mailto:bio_bulletin_board-bounces+ekeen=tongji.edu.cn at bioinformatics.org] On Behalf Of Iddo Friedberg Sent: August 10, 2005 6:12 To: The general forum at Bioinformatics.Org Subject: Re: [BiO BB] Clustering small DNA sequences into groups How about building a distance matrix of your own (based on %ID between fragments) and then use WEKA for the clustering? ./I Samantha Fox wrote: >Thanks so much for your replies. However, it did not work yet. cd-hit >gave this error, and blastclust is not usable for such small sequences >! > >Any suggestions ? > > > >>cat fasta >>one >> >> >tagcgc > > >>two >> >> >atcgtt > > >>./cd-hit -i fasta -o www >> >> >total seq: 0 >longest and shortest : 0 and 99999 >Total letters: 0 >terminate called after throwing an instance of 'std::bad_alloc' > what(): St9bad_alloc >Abort (core dumped) > > > >>./cd-hit -i fasta -o www -l 5 >> >> > >Fatal Error >Too short -l, redefine it > >Program halted !! > > > >On 8/9/05, Martin Gollery wrote: > > >>I believe those sequences are too short for Blastclust. The default >>word size is 32.
>> >>Marty >> >>On 8/9/05, Marcos Oliveira de Carvalho wrote: >> >> >>>Hi Samantha, >>> >>>BLASTCLUST can group DNA sequences. Maybe you will need to tweak the >>>parameters (almost the same for BLAST). You can get it at the NCBI ftp: >>>ftp://ftp.ncbi.nih.gov/blast/ >>> >>>cheers >>>Marcos >>> >>> >>> >>>On Tue, 09 Aug 2005 14:24:41 -0300, Samantha Fox >>>wrote: >>> >>> >>> >>>>Hi, >>>> >>>>I have a set of small DNA sequences (about 40) 6-10 bp, and wish to >>>>group them into clusters based on sequence. >>>> >>>>Any suggestions for doing that ? >>>> >>>>Thanks, >>>> >>>>Samantha >>>> >>>> >_______________________________________________ >Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > > -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037 Tel: (858) 646 3100 x3516 Fax: (858) 713 9930 http://ffas.ljcrf.edu/~iddo _______________________________________________ Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 33092 bytes Desc: not available URL: From imitra at myezconnect.com Sat Aug 13 06:40:55 2005 From: imitra at myezconnect.com (Indranil Mitra) Date: Sat, 13 Aug 2005 16:10:55 +0530 Subject: [BiO BB] RE: BiO_Bulletin_Board Digest, Vol 10, Issue 9 In-Reply-To: <20050812160214.44527D210B@www.bioinformatics.org> Message-ID: <20050813063231.GA3246@mail07b.vwh1.net> Thanks Ryan, I was going thru the email burst of BioBB and it got a good answer to an email thru your email. 
Rgds, Indranil -----Original Message----- From: bio_bulletin_board-bounces+imitra=myezconnect.com at bioinformatics.org [mailto:bio_bulletin_board-bounces+imitra=myezconnect.com at bioinformatics.org ] On Behalf Of bio_bulletin_board-request at bioinformatics.org Sent: Friday, August 12, 2005 9:32 PM To: bio_bulletin_board at bioinformatics.org Subject: BiO_Bulletin_Board Digest, Vol 10, Issue 9 Send BiO_Bulletin_Board mailing list submissions to bio_bulletin_board at bioinformatics.org To subscribe or unsubscribe via the World Wide Web, visit https://bioinformatics.org/mailman/listinfo/bio_bulletin_board or, via email, send a message with subject or body 'help' to bio_bulletin_board-request at bioinformatics.org You can reach the person managing the list at bio_bulletin_board-owner at bioinformatics.org When replying, please edit your Subject line so it is more specific than "Re: Contents of BiO_Bulletin_Board digest..." Today's Topics: 1. RE: BLAST: locally installed databases (Ryan Golhar) 2. RE: BLAST: locally installed databases (Bossers, Alex) ---------------------------------------------------------------------- Message: 1 Date: Thu, 11 Aug 2005 12:09:45 -0400 From: Ryan Golhar Subject: RE: [BiO BB] BLAST: locally installed databases To: 'Samantha Fox' , "'The general forum at Bioinformatics.Org'" Message-ID: <002d01c59e8f$16f42c60$e6028a0a at GOLHARMOBILE1> Content-Type: text/plain; charset="us-ascii" Attached is the file get_nr.pl which will download the nr and nt database from NCBI to the local machine as root to the directory /usr/local/ncbi/db. It does this by comparing the .tar.gz file timestamp on NCBI with the one on your local machine. If the one on NCBI is newer, the file is downloaded. The script is pretty simply and should get you going. If you have any questions, let me know and I'll answer what I can. It was written for Solaris, so you might need to change paths to some stuff for other environments as well as download location... 
Good luck, Ryan -----Original Message----- From: Samantha Fox [mailto:bioinfosm at gmail.com] Sent: Thursday, August 11, 2005 11:51 AM To: golharam at umdnj.edu; The general forum at Bioinformatics.Org Subject: Re: [BiO BB] BLAST: locally installed databases That will help Ryan. Thanks. And these are pre-formatted databases? And my guess is they goto the data folder specified in .ncbirc ? Thanks again, Sumit On 8/11/05, Ryan Golhar wrote: > I have a perl script running as a cron job to download the databases > nightly if there are new tarballs available.... > > I can email you the script... > > > -----Original Message----- > From: bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org > [mailto:bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.o > rg > ] On Behalf Of Samantha Fox > Sent: Thursday, August 11, 2005 11:09 AM > To: bio_bulletin_board at bioinformatics.org > Subject: [BiO BB] BLAST: locally installed databases > > > Hello, > I wish to have regularly updated databases from NCBI to use with my > blastall command. > > Is there an easy standard way to do so ? > > Thanks, > > Samantha > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: get_nr.pl Type: application/octet-stream Size: 1680 bytes Desc: not available Url : http://bioinformatics.org/pipermail/bio_bulletin_board/attachments/20050811/33b48702/get_nr-0001.obj ------------------------------ Message: 2 Date: Fri, 12 Aug 2005 07:23:43 +0200 From: "Bossers, Alex" Subject: RE: [BiO BB] BLAST: locally installed databases To: , "The general forum at Bioinformatics.Org" Message-ID: <839C6D97DA4B564882F385F1DA46742C037F8416 at slely0004.wurnet.nl> Content-Type: text/plain; charset="us-ascii" What happened to the incremental updates to blastDb anyway? Alex > -----Original Message----- > From: bio_bulletin_board-bounces+alex.bossers=wur.nl at bioinformatics.org > [mailto:bio_bulletin_board-bounces+alex.bossers=wur.nl at bioinformatics.org] > On Behalf Of Ryan Golhar > Sent: Thursday, August 11, 2005 18:10 > To: 'Samantha Fox'; 'The general forum at Bioinformatics.Org' > Subject: RE: [BiO BB] BLAST: locally installed databases > > Attached is the file get_nr.pl which will download the nr and nt > database from NCBI to the local machine as root to the directory > /usr/local/ncbi/db. > > It does this by comparing the .tar.gz file timestamp on NCBI with the > one on your local machine. If the one on NCBI is newer, the file is > downloaded. > > The script is pretty simply and should get you going. If you have any > questions, let me know and I'll answer what I can. It was written for > Solaris, so you might need to change paths to some stuff for other > environments as well as download location... > > Good luck, > > Ryan > > -----Original Message----- > From: Samantha Fox [mailto:bioinfosm at gmail.com] > Sent: Thursday, August 11, 2005 11:51 AM > To: golharam at umdnj.edu; The general forum at Bioinformatics.Org > Subject: Re: [BiO BB] BLAST: locally installed databases > > > That will help Ryan. Thanks. > And these are pre-formatted databases? And my guess is they goto the > data folder specified in .ncbirc ?
> > Thanks again, > Sumit > > On 8/11/05, Ryan Golhar wrote: > > I have a perl script running as a cron job to download the databases > > nightly if there are new tarballs available.... > > > > I can email you the script... > > > > > > -----Original Message----- > > From: bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org > > [mailto:bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.o > > rg > > ] On Behalf Of Samantha Fox > > Sent: Thursday, August 11, 2005 11:09 AM > > To: bio_bulletin_board at bioinformatics.org > > Subject: [BiO BB] BLAST: locally installed databases > > > > > > Hello, > > I wish to have regularly updated databases from NCBI to use with my > > blastall command. > > > > Is there an easy standard way to do so ? > > > > Thanks, > > > > Samantha > > _______________________________________________ > > Bioinformatics.Org general forum - > > BiO_Bulletin_Board at bioinformatics.org > > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > > _______________________________________________ > > Bioinformatics.Org general forum - > > BiO_Bulletin_Board at bioinformatics.org > > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > ------------------------------ _______________________________________________ Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board End of BiO_Bulletin_Board Digest, Vol 10, Issue 9 ************************************************* --- Incoming mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.859 / Virus Database: 585 - Release Date: 2/14/2005 --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). 
Version: 6.0.859 / Virus Database: 585 - Release Date: 2/14/2005 From bioinfosm at gmail.com Sun Aug 14 18:47:56 2005 From: bioinfosm at gmail.com (Samantha Fox) Date: Sun, 14 Aug 2005 17:47:56 -0500 Subject: [BiO BB] Where can I make Logo plot? In-Reply-To: <20050810131853.9CCCE19CC82@mail.tongji.edu.cn> References: <42F92A4D.9030301@burnham.org> <20050810131853.9CCCE19CC82@mail.tongji.edu.cn> Message-ID: <7264508105081415474baa4605@mail.gmail.com> Type seqlogo in the URL bar and hit enter; it's the Berkeley tool. Google for "sequence logo plot" and you can find other options as well. Sam On 8/10/05, ekeen wrote: > > > > Where can I make Logo plot like follow? > > Thanks. > > > > -----Original Message----- > From: > bio_bulletin_board-bounces+ekeen=tongji.edu.cn at bioinformatics.org > [mailto:bio_bulletin_board-bounces+ekeen=tongji.edu.cn at bioinformatics.org] > On Behalf Of Iddo Friedberg > Sent: August 10, 2005 6:12 > To: The general forum at Bioinformatics.Org > Subject: Re: [BiO BB] Clustering small DNA sequences into groups > > > > How about building a distance matrix of your own (based on %ID between > > fragments) and then use WEKA for the clustering? > > > > > > > > ./I > > > > Samantha Fox wrote: > > > > >Thanks so much for your replies. However, it did not work yet. cd-hit > > >gave this error, and blastclust is not usable for such small sequences > > >! > > > > > >Any suggestions ? > > > > > > > > > > > >>cat fasta > > >>one > > >> > > >> > > >tagcgc > > > > > > > > >>two > > >> > > >> > > >atcgtt > > > > > > > > >>./cd-hit -i fasta -o www > > >> > > >> > > >total seq: 0 > > >longest and shortest : 0 and 99999 > > >Total letters: 0 > > >terminate called after throwing an instance of 'std::bad_alloc' > > > what(): St9bad_alloc > > >Abort (core dumped) > > > > > > > > > > > >>./cd-hit -i fasta -o www -l 5 > > >> > > >> > > > > > >Fatal Error > > >Too short -l, redefine it > > > > > >Program halted !!
> > >
> > >On 8/9/05, Martin Gollery wrote:
> > >
> > >>I believe those sequences are too short for Blastclust. The default
> > >>word size is 32.
> > >>
> > >>Marty
> > >>
> > >>On 8/9/05, Marcos Oliveira de Carvalho wrote:
> > >>
> > >>>Hi Samantha,
> > >>>
> > >>>BLASTCLUST can group DNA sequences. Maybe you will need to tweak the
> > >>>parameters (almost the same for BLAST). You can get it at the NCBI ftp:
> > >>>ftp://ftp.ncbi.nih.gov/blast/
> > >>>
> > >>>cheers
> > >>>Marcos
> > >>>
> > >>>On Tue, 09 Aug 2005 14:24:41 -0300, Samantha Fox
> > >>>wrote:
> > >>>
> > >>>>Hi,
> > >>>>
> > >>>>I have a set of small DNA sequences (about 40) 6-10 bp, and wish to
> > >>>>group them into clusters based on sequence.
> > >>>>
> > >>>>Any suggestions for doing that ?
> > >>>>
> > >>>>Thanks,
> > >>>>
> > >>>>Samantha
> >
> > --
> >
> > Iddo Friedberg, Ph.D.
> > The Burnham Institute
> > 10901 N. Torrey Pines Rd.
> > La Jolla, CA 92037
> > Tel: (858) 646 3100 x3516
> > Fax: (858) 713 9930
> > http://ffas.ljcrf.edu/~iddo

From ekeen at tongji.edu.cn Mon Aug 15 09:21:24 2005
From: ekeen at tongji.edu.cn (ekeen)
Date: Mon, 15 Aug 2005 21:21:24 +0800
Subject: Re: [BiO BB] Where can I make Logo plot?
In-Reply-To: <7264508105081415474baa4605@mail.gmail.com>
Message-ID: <20050815132939.32C0B19CCA3@mail.tongji.edu.cn>

Thanks Sam et al. I have done it with your help. :)

From ekeen at tongji.edu.cn Mon Aug 15 09:27:12 2005
From: ekeen at tongji.edu.cn (ekeen)
Date: Mon, 15 Aug 2005 21:27:12 +0800
Subject: [BiO BB] How to use BLOSUM62 matrix?
Message-ID: <20050815133451.809F619CC94@mail.tongji.edu.cn>

I want to align sequences using the BLOSUM62 matrix. How do I get the
comparison score?

Is there a web service that does this? Is there an example I could follow?

Thanks.

Best regards,

Name: ekeen
Mail: ekeen at tongji.edu.cn
Tel: 021-65983987-8007
Address: Tongji University
Shanghai, 200092
P. R. China

From golharam at umdnj.edu Mon Aug 15 13:54:21 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 15 Aug 2005 13:54:21 -0400
Subject: [BiO BB] BLAST: locally installed databases
In-Reply-To: <839C6D97DA4B564882F385F1DA46742C037F8416@slely0004.wurnet.nl>
Message-ID: <013101c5a1c2$5dc535e0$2f01a8c0@GOLHARMOBILE1>

NCBI says to download the databases in whole. They don't do the
incremental updates anymore...not sure why...

-----Original Message-----
From: Bossers, Alex [mailto:Alex.Bossers at wur.nl]
Sent: Friday, August 12, 2005 1:24 AM
To: golharam at umdnj.edu; The general forum at Bioinformatics.Org
Subject: RE: [BiO BB] BLAST: locally installed databases

What happened to the incremental updates to blastDb anyway?
Alex

> -----Original Message-----
> From: bio_bulletin_board-bounces+alex.bossers=wur.nl at bioinformatics.org
> [mailto:bio_bulletin_board-bounces+alex.bossers=wur.nl at bioinformatics.org]
> On Behalf Of Ryan Golhar
> Sent: Thursday, August 11, 2005 18:10
> To: 'Samantha Fox'; 'The general forum at Bioinformatics.Org'
> Subject: RE: [BiO BB] BLAST: locally installed databases
>
> Attached is the file get_nr.pl which will download the nr and nt
> database from NCBI to the local machine as root to the directory
> /usr/local/ncbi/db.
>
> It does this by comparing the .tar.gz file timestamp on NCBI with the
> one on your local machine. If the one on NCBI is newer, the file is
> downloaded.
>
> The script is pretty simple and should get you going. If you have any
> questions, let me know and I'll answer what I can. It was written for
> Solaris, so you might need to change paths to some stuff for other
> environments as well as download location...
>
> Good luck,
>
> Ryan
>
> -----Original Message-----
> From: Samantha Fox [mailto:bioinfosm at gmail.com]
> Sent: Thursday, August 11, 2005 11:51 AM
> To: golharam at umdnj.edu; The general forum at Bioinformatics.Org
> Subject: Re: [BiO BB] BLAST: locally installed databases
>
> That will help Ryan. Thanks.
> And these are pre-formatted databases? And my guess is they go to the
> data folder specified in .ncbirc ?
>
> Thanks again,
> Sumit
>
> On 8/11/05, Ryan Golhar wrote:
> > I have a perl script running as a cron job to download the databases
> > nightly if there are new tarballs available....
> >
> > I can email you the script...
From mayagao1999 at yahoo.com Tue Aug 16 12:31:02 2005
From: mayagao1999 at yahoo.com (Alex Zhang)
Date: Tue, 16 Aug 2005 09:31:02 -0700 (PDT)
Subject: [BiO BB] A question about the perl code
Message-ID: <20050816163102.73365.qmail@web53506.mail.yahoo.com>

Dear All,

I have a group A containing all 16 combinations of two nucleotides:

AA,AC,AG,AT,
CA,CC,CG,CT,
GA,GC,GG,GT,
TA,TC,TG,TT

If I randomly pick a pair such as AC, I want to exclude
AC, AT, AG, AA, TC, CC, GC. In other words, I want to
exclude the pairs in group A that have the same
nucleotide as the randomly selected pair. Can
anybody suggest how to approach this in Perl?

Thanks!
Alex

From dmb at mrc-dunn.cam.ac.uk Tue Aug 16 13:29:56 2005
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Tue, 16 Aug 2005 18:29:56 +0100 (BST)
Subject: [BiO BB] A question about the perl code
In-Reply-To: <20050816163102.73365.qmail@web53506.mail.yahoo.com>

On Tue, 16 Aug 2005, Alex Zhang wrote:

>Dear All,
>
>I made a group A which includes 16 combinations of any
>two nucleotides like: AA,AC,AG,AT,
>CA,CC,CG,CT,
>GA,GC,GG,GT,
>TA,TC,TG,TT
>
>If I randomly got a pair like AC, I want to exclude
>AC, AT, AG, AA, TC, CC, GC. In other words, I want to
>exclude the pairs in group A which has the same
>nucleotide with the pair randomly selected. Can
>anybody suggest me how to approach this using Perl?

If you have time, try reading the chapter on sets in 'Mastering Algorithms
with Perl' (the wolf book). That should get you started.

If you are in a rush, use a regular expression matched over a list of your
pairs (for example with the perl grep function). Use your first random
choice to initialize a variable in the regexp pattern.

Dan.

>
>Thanks!
> Alex

From rmp at sanger.ac.uk Wed Aug 17 04:30:23 2005
From: rmp at sanger.ac.uk (Roger Pettett)
Date: Wed, 17 Aug 2005 09:30:23 +0100 (BST)
Subject: [BiO BB] A question about the perl code
In-Reply-To: <20050816163102.73365.qmail@web53506.mail.yahoo.com>
References: <20050816163102.73365.qmail@web53506.mail.yahoo.com>

On Tue, 16 Aug 2005, Alex Zhang wrote:

> If I randomly got a pair like AC, I want to exclude
> AC, AT, AG, AA, TC, CC, GC. In other words, I want to
> exclude the pairs in group A which has the same
> nucleotide with the pair randomly selected. Can
> anybody suggest me how to approach this using Perl?

How about something like this?

my @group = qw(AA AC AG AT
               CA CC CG CT
               GA GC GG GT
               TA TC TG TT);
# Pick one pair at random (the fractional index is truncated).
my $rand = $group[rand(scalar @group)];
# Drop every pair that contains either nucleotide of $rand, in any position.
my @filtered = grep { $_ !~ /[$rand]/ } @group;

print join("\n", @filtered);

--
Roger Pettett, rmp at sanger.ac.uk
Project Leader (Web Systems)
The Sanger Institute, WTGC, Cambridge CB10 1SA, UK
http://www.sanger.ac.uk/
http://decipher.sanger.ac.uk/
http://www.ensembl.org/

From mayagao1999 at yahoo.com Wed Aug 17 11:39:36 2005
From: mayagao1999 at yahoo.com (Alex Zhang)
Date: Wed, 17 Aug 2005 08:39:36 -0700 (PDT)
Subject: [BiO BB] A question about the perl code
Message-ID: <20050817153937.20319.qmail@web53505.mail.yahoo.com>

That helps! Thank you!
From blooms812000 at yahoo.co.in Thu Aug 18 04:44:38 2005
From: blooms812000 at yahoo.co.in (mamtha thomas)
Date: Thu, 18 Aug 2005 09:44:38 +0100 (BST)
Subject: [BiO BB] doubt!!!
Message-ID: <20050818084438.97833.qmail@web8508.mail.in.yahoo.com>

Hi all,

I would be grateful to learn how Linux is used in the bioinformatics field.

mamtha

From christoph.gille at charite.de Thu Aug 18 11:07:55 2005
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Thu, 18 Aug 2005 17:07:55 +0200 (CEST)
Subject: [BiO BB] doubt!!!
In-Reply-To: <20050818084438.97833.qmail@web8508.mail.in.yahoo.com>
References: <20050818084438.97833.qmail@web8508.mail.in.yahoo.com>
Message-ID: <39779.192.168.220.203.1124377675.squirrel@webmail.charite.de>

Most bioinformatics software is developed with GNU tools such as
GNU C++ and GCC.

If you usually work on Windows, I would not recommend switching
entirely to Linux; install Cygwin instead.

If you have a Mac, you can install Xcode.

Also have a look at http://www.charite.de/bioinf/strap/
which facilitates the installation and usage of many
bioinformatics tools under all operating systems, including Windows.

Also check out Bioknoppix.

Advantages of Linux: stability, multitasking, compatibility,
convenient software installation, and sufficient performance
even on less powerful computers.

Disadvantage: installing special hardware support, such as
3D acceleration, can be difficult.

From operon at cbiot.ufrgs.br Thu Aug 18 15:05:22 2005
From: operon at cbiot.ufrgs.br (Marcos Oliveira de Carvalho)
Date: Thu, 18 Aug 2005 16:05:22 -0300
Subject: [BiO BB] paper

Dear all,

While looking through some papers on phylogenomic methods, I found this
reference:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15166018&query_hl=48

However, the paragraph below confused me:

"More precisely, GBDP starts with an all-against-all pairwise
comparison of genomes using BLASTN (Altschul et al., 1990).
In a second step, a distance matrix is calculated from the
resulting HSPs **(High Speed Products)**.
Here, we studied a number of variants which are described in detail
below..."

High Speed Products?

I wonder whether they actually mean "high-scoring segment pairs" (HSSPs,
also abbreviated as HSPs).
cheers

Marcos

From marty.gollery at gmail.com Thu Aug 18 16:04:18 2005
From: marty.gollery at gmail.com (Martin Gollery)
Date: Thu, 18 Aug 2005 13:04:18 -0700
Subject: [BiO BB] paper

Yes, they mean 'High Scoring Pairs', not 'High Speed Products'.

Marty

--
Martin Gollery
Associate Director
Center For Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
775-784-7042

From ekeen at tongji.edu.cn Thu Aug 18 22:42:47 2005
From: ekeen at tongji.edu.cn (ekeen)
Date: Fri, 19 Aug 2005 10:42:47 +0800
Subject: [BiO BB] How to use BLOSUM62 matrix?
Message-ID: <20050819025038.118C819CC7E@mail.tongji.edu.cn>

I want to align sequences using the BLOSUM62 matrix. How do I get the
comparison score?

Is there a web service that does this? Is there an example I could follow?

Thanks.

Best regards,

Name: ekeen
Mail: ekeen at tongji.edu.cn
Tel: 021-65983987-8007
Address: Tongji University
Shanghai, 200092
P. R. China

From p.balaji at gmail.com Fri Aug 19 05:15:21 2005
From: p.balaji at gmail.com (Balaji)
Date: Fri, 19 Aug 2005 02:15:21 -0700
Subject: [BiO BB] Re: paper
Message-ID: <881c02220508190215ea8a630@mail.gmail.com>

Hey Marcos,

I am sure they meant high-scoring segment pairs; I have never heard of
"high speed products".

From marty.gollery at gmail.com Fri Aug 19 12:19:47 2005
From: marty.gollery at gmail.com (Martin Gollery)
Date: Fri, 19 Aug 2005 09:19:47 -0700
Subject: [BiO BB] How to use BLOSUM62 matrix?
In-Reply-To: <20050819025038.118C819CC7E@mail.tongji.edu.cn>
References: <20050819025038.118C819CC7E@mail.tongji.edu.cn>

There are several ways to do this; the most popular is the BLAST web
server at http://www.ncbi.nlm.nih.gov/BLAST/

Marty

--
Martin Gollery
Associate Director
Center For Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
775-784-7042

From tdhoufek at unity.ncsu.edu Fri Aug 19 21:49:41 2005
From: tdhoufek at unity.ncsu.edu (T.D. Houfek)
Date: Fri, 19 Aug 2005 21:49:41 -0400
Subject: [BiO BB] A question about the perl code
References: <20050816163102.73365.qmail@web53506.mail.yahoo.com>
Message-ID: <43068C35.8050103@unity.ncsu.edu>

Hi all,

The formulation of this problem was slightly ambiguous, but from Alex's
list of excluded pairs I guessed he only wanted to exclude pairs which
had an identical nucleotide in the same position as the sample.
Roger's solution seems appropriate if instead Alex wishes to exclude pairs
with any nucleotides in common, regardless of position. (Or, and I have
had this happen, some crucial punctuation marks were stripped when
cutting-and-pasting code into my mail client's window -- villainous HTML
emails ...)

Double-check me. It's Friday night, so this little program -- though based
on Roger's -- is the product of collaboration between me, Gin, and Tonic.
Which I suppose should be written 'collaboration among Gin, Tonic, and I',
but it just doesn't sound the same, does it?

Cheers,

T.D. Houfek
senior bioinformatics developer
Plant Nematode Genetics Group, North Carolina State University

============== sample code follows ============================

#!/usr/bin/perl
use strict;
use warnings;

my @group = qw(AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT);
my $rand = $group[rand(scalar @group)];

## Uncomment to force the 'AC' sample example:
## $rand = 'AC';

# Split the sampled pair so Method #1 follows $rand instead of a fixed 'AC'.
my ($first, $second) = split //, $rand;

print "\n\nRandom pair: $rand\n";

print "\n\nMethod #1: remove all pairs with at least one nucleotide in the same position as in the sample:\n";
# Match pairs whose first letter equals $first or whose second equals $second.
my @unfiltered = grep { $_ !~ /^$first|$second$/ } @group;
my @filtered   = grep { $_ =~ /^$first|$second$/ } @group;

print "Removed:\n";
print join("\t", @filtered);
print "\n";

print "Remaining:\n";
print join("\t", @unfiltered);
print "\n";

# The character class [$rand] matches either nucleotide in any position.
my @unfiltered2 = grep { $_ !~ /[$rand]/ } @group;
my @filtered2   = grep { $_ =~ /[$rand]/ } @group;

print "\nMethod #2: remove all pairs sharing at least one nucleotide with sample:\n";
print "Removed:\n";
print join("\t", @filtered2);
print "\n";

print "Remaining:\n";
print join("\t", @unfiltered2);
print "\n";

From ekeen at tongji.edu.cn Mon Aug 29 03:21:30 2005
From: ekeen at tongji.edu.cn (ekeen)
Date: Mon, 29 Aug 2005 15:21:30 +0800
Subject: [BiO BB] simple question about kinase classfication.
Message-ID: <20050829072655.02FB819CCFF@mail.tongji.edu.cn>

Dear professors,

I am a beginner and have the following questions:

1. What is the relationship between kinase groups and kinase families?

2. How can I find out which kinase families a kinase group contains?

For example, the AGC group contains GRK, PKA, PKB, PKC, PKG, RSK, SGK,
etc. How can I find this out?

Thanks

Best Regards,
ekeen

From MEC at Stowers-Institute.org Mon Aug 29 12:59:07 2005
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Mon, 29 Aug 2005 11:59:07 -0500
Subject: [BiO BB] extracting data from NCBI taxonomy
Message-ID: <20050829165911.E99DC207DC0@www.bioinformatics.org>

You will need to script eutils:
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

For starters:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=txid7711[Organism:exp]&rettype=count

tells you there are 902185 such proteins.

You can then get any subrange of those 902185 with URLs. For
instance, you can get the second 50 such gi numbers (in wordy xml) in
your browser using (retstart is zero-based, so the second 50 start at 50)

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=txid7711[Organism:exp]&retstart=50&retmax=50&rettype=uilist

however, "The maximum number of retrieved records is 10,000."
(http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esummary_help.html)

So you need a loop, so you need a script (which you'll need to parse the
wordy xml anyway), which means you will need to study eutils a little
more and learn how to use the web environment feature.

I don't think there is another way, but I hope I can be proved wrong.

Cheers,

Malcolm Cook

-----Original Message-----
From: bio_bulletin_board-bounces+mec=stowers-institute.org at bioinformatics.org
[mailto:bio_bulletin_board-bounces+mec=stowers-institute.org at bioinformatics.org]
On Behalf Of Lipika Ray
Sent: Friday, August 26, 2005 1:11 PM
To: marty.gollery at gmail.com
Cc: bio_bulletin_board at bioinformatics.org
Subject: [BiO BB] extracting data from NCBI taxonomy

Hi,

I don't want to pick it up through web browser. I want it through perl
program. For that I need a definite URL which will point to the text
version of gi-list in protein database for a txid7711[Organism:exp], say.

Lipika Ray
Postdoctoral Fellow
SUNY, Albany

>Hi,
>Go to NCBI and search for txid9606[Organism:exp]
>
>Then click the pulldown next to 'display' and pick 'GI list'. Pick
>'Send to: File' from the pulldown, and you can name the file whatever
>you want.
>Marty

On 8/26/05, Dan Bolser wrote:
> Lipika Ray wrote:
> > Yes, I know that grepping from file 'gi_taxid_prot.dmp.gz' is possible for
> > 9606. But if you want to know the gi list of taxid 7711, say, then you
> > can't get any gi list associated with that taxid in that flat file.
> > That's why I am searching for the link to definite URL where from I can
> > get the information directly.
>
> That sounds strange. Perhaps they forgot to update the file?
>
> > Lipika Ray
> > Postdoctoral Fellow
> > SUNY, Albany
> >
> >>Message: 8
> >>Date: Fri, 26 Aug 2005 15:51:55 +0100
> >>From: Dan Bolser
> >>Subject: Re: [BiO BB] extracting data from NCBI taxonomy
> >>To: "The general forum at Bioinformatics.Org"
> >>Message-ID: <430F2C8B.7030605 at mrc-dunn.cam.ac.uk>
> >>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >>
> >>Lipika Ray wrote:
> >>
> >>>Hi,
> >>>
> >>>I want to extract the full gi list of protein database in text format of
> >>>Homo sapiens with Taxid 9606. I want to do it through perl programming.
> >>>So I need the definite URL which will extract this information and store
> >>>into a file.
> >>>But I am seeing that there is some parameter, say query_key which is
> >>>related to the history page. I don't understand how to set this
> >>>parameter.
> >>>For example,
> >>>The link with a taxonomy id has a definite URL like:
> >>>
> >>>http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=7215
> >>>
> >>>I don't understand what should be the definite URL by which I can
> >>>extract the required information.
> >>>Please help me in this regard.
> >>>Thanking you,
> >>
> >>You could always just use this file...
> >>
> >>ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_prot.dmp.gz
> >>
> >>and grep for 9606

_______________________________________________
Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

From lray at albany.edu Mon Aug 29 15:10:10 2005
From: lray at albany.edu (Lipika Ray)
Date: Mon, 29 Aug 2005 15:10:10 -0400 (EDT)
Subject: [BiO BB] extracting data from NCBI taxonomy
Message-ID: <2073.169.226.137.221.1125342610.squirrel@169.226.137.221>

Hi Malcolm,

Thanks a lot! Now I get my required results.
Thanks again.
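
For anyone finding this thread in the archive, the loop Malcolm describes could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names (esearch_url, extract_ids, page_offsets), the page size of 500 and the canned XML reply are all made up here, the <Id> values are pulled out with a plain regex rather than a real XML parser, and the actual fetch is left as a comment so the sketch runs offline. It does not use the web environment feature Malcolm mentions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base URL as given in Malcolm's message.
my $BASE = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';

# Hypothetical helper: build the esearch URL for one page of results.
# retstart is zero-based, so the first page starts at record 0.
sub esearch_url {
    my ($db, $term, $retstart, $retmax) = @_;
    return "$BASE?db=$db&term=$term&retstart=$retstart&retmax=$retmax";
}

# Hypothetical helper: return every <Id>...</Id> value found in a reply.
# A regex is enough for this flat list; an XML parser would be more robust.
sub extract_ids {
    my ($xml) = @_;
    return $xml =~ m{<Id>(\d+)</Id>}g;
}

# Hypothetical helper: the retstart offsets needed to cover $total records.
sub page_offsets {
    my ($total, $page_size) = @_;
    my @offsets;
    for (my $start = 0; $start < $total; $start += $page_size) {
        push @offsets, $start;
    }
    return @offsets;
}

# Offline demonstration on a made-up reply (these gi numbers are invented).
my $canned = '<eSearchResult><Count>3</Count><IdList>'
           . '<Id>111</Id><Id>222</Id><Id>333</Id>'
           . '</IdList></eSearchResult>';
print join("\n", extract_ids($canned)), "\n";

# The real loop would fetch each page in turn, for example with LWP::Simple:
#   use LWP::Simple qw(get);
#   for my $start (page_offsets(902185, 500)) {
#       my $xml = get(esearch_url('protein', 'txid7711[Organism:exp]', $start, 500));
#       print "$_\n" for extract_ids($xml);
#   }
```

Keeping the URL building and the parsing in separate subs makes it easy to check the paging arithmetic without hitting NCBI; please also leave a pause between requests if you run the real loop.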
Lipika Ray Postdoctoral Fellow SUNY, UAlbany > >You will need to script eutils >http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html > >For starters: > >http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&ter >m=txid7711[Organism:exp]&rettype=count > >tells you there are 902185 such proteints > >You can then get any subrange of those 902185 of them with URLS. For >instance, you can get the second 50 such gi numbers (in wordy xml) in >your browser using > >http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&ter >m=txid7711[Organism:exp]&retstart=51&retmax=50&rettype=uilist > >however, "The maximum number of retrieved records is 10,000." >(http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esummary_help.html) > >So you need a loop, so you need a script (which you'll need to parse the >wordy xml anyway), which means you will need to study eutils a little >more and learn how to use the web environment feature. >I don't think there is another way, but I hope I can be proved wrong. > >Cheers, > >Malcolm Cook From c.klaassen at cwz.nl Wed Aug 31 03:04:51 2005 From: c.klaassen at cwz.nl (=?ISO-8859-1?Q?Corn=E9_HW_Klaassen?=) Date: Wed, 31 Aug 2005 09:04:51 +0200 Subject: [BiO BB] aligning character data other than DNA or protein Message-ID: <43155693.3080706@cwz.nl> Hi all, Does anybody know of a convenient tool to align stretches of character data other than DNA or protein (i.e. numbers: 01-02-03-05 and 01-02-04-05) that is also able to deal with insertions and deletions etc.? Thanks, Corn? -- ----------------------------------------------------------------------------------------- dr. Corn? H.W. Klaassen, Molecular Biologist Canisius Wilhelmina Hospital, Molecular Biology Unit, C60-C70 Weg door Jonkerbos 100, 6532 SZ Nijmegen, The Netherlands. 
tel: #31-24-3658670 (office) tel: #31-24-3658677 (direct) fax: #31-24-3658671 E-Mail: c.klaassen at cwz.nl From iain.m.wallace at gmail.com Wed Aug 31 04:24:25 2005 From: iain.m.wallace at gmail.com (Iain Wallace) Date: Wed, 31 Aug 2005 09:24:25 +0100 Subject: [BiO BB] aligning character data other than DNA or protein In-Reply-To: <43155693.3080706@cwz.nl> References: <43155693.3080706@cwz.nl> Message-ID: <8cff3eb8050831012457cdda37@mail.gmail.com> Hi, I think ClustalW can do that, as long as it is in a sequence format. Iain On 8/31/05, Corn? HW Klaassen wrote: > Hi all, > > Does anybody know of a convenient tool to align stretches of character > data other than DNA or protein (i.e. numbers: 01-02-03-05 and > 01-02-04-05) that is also able to deal with insertions and deletions etc.? > > Thanks, > > Corn? > > -- > ----------------------------------------------------------------------------------------- > dr. Corn? H.W. Klaassen, Molecular Biologist > Canisius Wilhelmina Hospital, Molecular Biology Unit, C60-C70 > Weg door Jonkerbos 100, 6532 SZ Nijmegen, The Netherlands. > > tel: #31-24-3658670 (office) > tel: #31-24-3658677 (direct) > fax: #31-24-3658671 > E-Mail: c.klaassen at cwz.nl > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > From c.klaassen at cwz.nl Wed Aug 31 05:35:34 2005 From: c.klaassen at cwz.nl (=?ISO-8859-1?Q?Corn=E9_HW_Klaassen?=) Date: Wed, 31 Aug 2005 11:35:34 +0200 Subject: [BiO BB] aligning character data other than DNA or protein In-Reply-To: <8cff3eb8050831012457cdda37@mail.gmail.com> References: <43155693.3080706@cwz.nl> <8cff3eb8050831012457cdda37@mail.gmail.com> Message-ID: <431579E6.9070201@cwz.nl> Iain Clustal only accepts 24 letters. Numbers and other characters are disregarded. So what if I have more than 24 different characters? Corn? 
----------------------------------------------------------------------------------------- dr. Corné H.W. Klaassen, Molecular Biologist Canisius Wilhelmina Hospital, Molecular Biology Unit, C60-C70 Weg door Jonkerbos 100, 6532 SZ Nijmegen, The Netherlands. tel: #31-24-3658670 (office) tel: #31-24-3658677 (direct) fax: #31-24-3658671 E-Mail: c.klaassen at cwz.nl Iain Wallace wrote: >Hi, > >I think ClustalW can do that, as long as it is in a sequence format. > >Iain > >On 8/31/05, Corné HW Klaassen wrote: > > >>Hi all, >> >>Does anybody know of a convenient tool to align stretches of character >>data other than DNA or protein (i.e. numbers: 01-02-03-05 and >>01-02-04-05) that is also able to deal with insertions and deletions etc.? >> >>Thanks, >> >>Corné >> >>-- >>----------------------------------------------------------------------------------------- >>dr. Corné H.W. Klaassen, Molecular Biologist >>Canisius Wilhelmina Hospital, Molecular Biology Unit, C60-C70 >>Weg door Jonkerbos 100, 6532 SZ Nijmegen, The Netherlands. >> >>tel: #31-24-3658670 (office) >>tel: #31-24-3658677 (direct) >>fax: #31-24-3658671 >>E-Mail: c.klaassen at cwz.nl >> >>_______________________________________________ >>Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org >>https://bioinformatics.org/mailman/listinfo/bio_bulletin_board >> >> >> >_______________________________________________ >Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > From gabraham at cs.rmit.edu.au Wed Aug 31 05:45:20 2005 From: gabraham at cs.rmit.edu.au (Gad Abraham) Date: Wed, 31 Aug 2005 19:45:20 +1000 Subject: [BiO BB] aligning character data other than DNA or protein In-Reply-To: <431579E6.9070201@cwz.nl> References: <43155693.3080706@cwz.nl> <8cff3eb8050831012457cdda37@mail.gmail.com> <431579E6.9070201@cwz.nl> Message-ID: <20050831094520.GA12159@cs.rmit.edu.au> > >On 8/31/05, Corné
HW Klaassen wrote: > > > > > >>Hi all, > >> > >>Does anybody know of a convenient tool to align stretches of character > >>data other than DNA or protein (i.e. numbers: 01-02-03-05 and > >>01-02-04-05) that is also able to deal with insertions and deletions etc.? Sorry for breaking the chain, I lost the previous email. What is your distance measure between the characters? +1 for match and -1 for mismatch? If you know how to program, you can do this in Perl or even C very easily (or adapt a script so it accepts arbitrary characters). Cheers, Gad -- +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ gabraham at cs.rmit.edu.au http://yallara.cs.rmit.edu.au/~gabraham +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ From daniel.lang at biologie.uni-freiburg.de Wed Aug 31 07:02:58 2005 From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang) Date: Wed, 31 Aug 2005 13:02:58 +0200 Subject: [BiO BB] aligning character data other than DNA or protein In-Reply-To: <431579E6.9070201@cwz.nl> References: <43155693.3080706@cwz.nl> <8cff3eb8050831012457cdda37@mail.gmail.com> <431579E6.9070201@cwz.nl> Message-ID: <43158E62.1040504@biologie.uni-freiburg.de> Hi, Are you sure it is really Clustal that restricts you to 24 characters, and not simply the underlying substitution matrix? If it is the matrix, you could get around the limit by writing your own matrix. Or you could try other tools in combination with an adapted substitution matrix... Daniel Corné HW Klaassen wrote: > Iain > > Clustal only accepts 24 letters. Numbers and other characters are > disregarded. So what if I have more than 24 different characters? > > Corné > > ----------------------------------------------------------------------------------------- > > dr. Corné H.W. Klaassen, Molecular Biologist > Canisius Wilhelmina Hospital, Molecular Biology Unit, C60-C70 > Weg door Jonkerbos 100, 6532 SZ Nijmegen, The Netherlands.
> > tel: #31-24-3658670 (office) > tel: #31-24-3658677 (direct) > fax: #31-24-3658671 > E-Mail: c.klaassen at cwz.nl > > > > Iain Wallace wrote: > >> Hi, >> >> I think ClustalW can do that, as long as it is in a sequence format. >> >> Iain >> On 8/31/05, Corné HW Klaassen wrote: >> >> >>> Hi all, >>> >>> Does anybody know of a convenient tool to align stretches of character >>> data other than DNA or protein (i.e. numbers: 01-02-03-05 and >>> 01-02-04-05) that is also able to deal with insertions and deletions >>> etc.? >>> >>> Thanks, >>> >>> Corné >>> >>> -- >>> ----------------------------------------------------------------------------------------- >>> >>> dr. Corné H.W. Klaassen, Molecular Biologist >>> Canisius Wilhelmina Hospital, Molecular Biology Unit, C60-C70 >>> Weg door Jonkerbos 100, 6532 SZ Nijmegen, The Netherlands. >>> >>> tel: #31-24-3658670 (office) >>> tel: #31-24-3658677 (direct) >>> fax: #31-24-3658671 >>> E-Mail: c.klaassen at cwz.nl >>> >>> _______________________________________________ >>> Bioinformatics.Org general forum - >>> BiO_Bulletin_Board at bioinformatics.org >>> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board >>> >>> >> >> _______________________________________________ >> Bioinformatics.Org general forum - >> BiO_Bulletin_Board at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board >> >> >> > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board -- Daniel Lang University of Freiburg, Plant Biotechnology Schaenzlestr. 1, D-79104 Freiburg fax: +49 761 203 6945 phone: +49 761 203 6974 homepage: http://www.plant-biotech.net/ e-mail: daniel.lang at biologie.uni-freiburg.de ################################################# My software never has bugs. It just develops random features.
################################################# From cupton at uvic.ca Wed Aug 31 12:16:39 2005 From: cupton at uvic.ca (Chris Upton) Date: Wed, 31 Aug 2005 09:16:39 -0700 Subject: [BiO BB] Re: BiO_Bulletin_Board Digest, Vol 10, Issue 20 In-Reply-To: <20050831160431.C2DE9103886@primary.bioinformatics.org> References: <20050831160431.C2DE9103886@primary.bioinformatics.org> Message-ID: Hi, We could probably modify our Base-By-Base program to do that if you want. See www.virology.ca under workbench. Do you just need an editor or something to do the alignment? Chris Chris Upton Ph.D. Associate Professor Biochemistry and Microbiology Tel. 250-721-6507 University of Victoria Fax 250-721-8855 P.O. Box 3055 STN CSC Victoria, BC V8W 3P6 Canada web.uvic.ca/~cupton www.virology.ca www.sarsresearch.ca -------------- next part -------------- An HTML attachment was scrubbed...
From felipe.albrecht at gmail.com Wed Aug 31 19:47:54 2005 From: felipe.albrecht at gmail.com (Felipe Albrecht) Date: Wed, 31 Aug 2005 20:47:54 -0300 Subject: [BiO BB] aligning character data other than DNA or protein In-Reply-To: <43158E62.1040504@biologie.uni-freiburg.de> References: <43155693.3080706@cwz.nl> <8cff3eb8050831012457cdda37@mail.gmail.com> <431579E6.9070201@cwz.nl> <43158E62.1040504@biologie.uni-freiburg.de> Message-ID: Why don't you use a simple dynamic programming algorithm? 2005/8/31, Daniel Lang : > Hi, > Are you sure it is really Clustal that restricts you to 24 characters, and not > simply the underlying substitution matrix? If it is the matrix, you > could get around the limit by writing your own matrix. > Or you could try other tools in combination with an adapted substitution matrix... > Daniel > > Corné HW Klaassen wrote: > > Iain > > > > Clustal only accepts 24 letters. Numbers and other characters are > > disregarded. So what if I have more than 24 different characters? > > > > Corné > > > > ----------------------------------------------------------------------------------------- > > > > dr. Corné H.W. Klaassen, Molecular Biologist > > Canisius Wilhelmina Hospital, Molecular Biology Unit, C60-C70 > > Weg door Jonkerbos 100, 6532 SZ Nijmegen, The Netherlands. > > > > tel: #31-24-3658670 (office) > > tel: #31-24-3658677 (direct) > > fax: #31-24-3658671 > > E-Mail: c.klaassen at cwz.nl > > > > > > > > Iain Wallace wrote: > > > >> Hi, > >> > >> I think ClustalW can do that, as long as it is in a sequence format. > >> > >> Iain > >> On 8/31/05, Corné HW Klaassen wrote: > >> > >> > >>> Hi all, > >>> > >>> Does anybody know of a convenient tool to align stretches of character > >>> data other than DNA or protein (i.e. numbers: 01-02-03-05 and > >>> 01-02-04-05) that is also able to deal with insertions and deletions > >>> etc.? > >>> > >>> Thanks, > >>> > >>> Corné
> >>> > >>> -- > >>> ----------------------------------------------------------------------------------------- > >>> > >>> dr. Corné H.W. Klaassen, Molecular Biologist > >>> Canisius Wilhelmina Hospital, Molecular Biology Unit, C60-C70 > >>> Weg door Jonkerbos 100, 6532 SZ Nijmegen, The Netherlands. > >>> > >>> tel: #31-24-3658670 (office) > >>> tel: #31-24-3658677 (direct) > >>> fax: #31-24-3658671 > >>> E-Mail: c.klaassen at cwz.nl > >>> > >>> _______________________________________________ > >>> Bioinformatics.Org general forum - > >>> BiO_Bulletin_Board at bioinformatics.org > >>> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > >>> > >>> > >> > >> _______________________________________________ > >> Bioinformatics.Org general forum - > >> BiO_Bulletin_Board at bioinformatics.org > >> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > >> > >> > >> > > _______________________________________________ > > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > -- > > Daniel Lang > University of Freiburg, Plant Biotechnology > Schaenzlestr. 1, D-79104 Freiburg > fax: +49 761 203 6945 > phone: +49 761 203 6974 > homepage: http://www.plant-biotech.net/ > e-mail: daniel.lang at biologie.uni-freiburg.de > > ################################################# > My software never has bugs. > It just develops random features. > ################################################# > > > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board >
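For anyone who wants to try the dynamic programming suggestion from this thread, here is a rough sketch of a global (Needleman-Wunsch style) alignment over arbitrary tokens, so the alphabet is not limited to 24 letters. It is only an illustration, not anything posted by the list members: the function name align_tokens, the "--" gap marker and the gap penalty of -1 are assumptions, and the +1/-1 match/mismatch scores simply follow the question Gad raised above. The example strings are split on "-" so that each two-digit code is treated as one character.

```python
GAP = "--"  # hypothetical gap marker, chosen to match the two-digit tokens


def align_tokens(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two token lists; return (score, aligned_a, aligned_b).

    Scores are assumptions for illustration: +1 match, -1 mismatch, -1 gap.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two tokens
                score[i - 1][j] + gap,            # gap in b (deletion)
                score[i][j - 1] + gap,            # gap in a (insertion)
            )

    # Trace back from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


if __name__ == "__main__":
    s, x, y = align_tokens("01-02-03-05".split("-"), "01-02-04-05".split("-"))
    print(s)
    print("-".join(x))
    print("-".join(y))
```

Because tokens are compared with plain equality, any hashable values work (numbers, strings, codes). A real substitution matrix, as Daniel suggests, could replace the match/mismatch pair by looking up the score for each token pair in a dictionary.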