<br>Thanks to everyone for the replies.<br>
I just wanted to share that I ended up using align0 from the FASTA
package to find the distance (or similarity) between
small DNA sequences.<br>
align0 does not penalize end gaps.<br>
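<br>For anyone who already has an aligned pair (for example from needle) and only needs the count, here is a minimal Python sketch of the same idea. It is not how align0 works internally: it simply skips the leading and trailing gap columns of an existing alignment and counts whatever differs in between. The function name count_differences is made up for illustration, and "-" is assumed to be the gap character.<br>

```python
def count_differences(aligned_a, aligned_b, gap="-"):
    """Count differing columns between two aligned sequences, ignoring end gaps.

    Internal gaps still count as differences; only leading and trailing
    gap columns (in either sequence) are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    def residue_span(aligned):
        # Indices of the first and last non-gap characters, or None if all gaps.
        idx = [i for i, ch in enumerate(aligned) if ch != gap]
        return (idx[0], idx[-1]) if idx else None

    span_a = residue_span(aligned_a)
    span_b = residue_span(aligned_b)
    if span_a is None or span_b is None:
        return 0

    # Keep only the region where both sequences have started and neither has ended.
    start = max(span_a[0], span_b[0])
    end = min(span_a[1], span_b[1])

    # The slices are empty when the two sequences do not overlap at all.
    pairs = zip(aligned_a[start:end + 1], aligned_b[start:end + 1])
    return sum(1 for x, y in pairs if x.lower() != y.lower())


print(count_differences("tcagtt-", "-cagttt"))  # end gaps only
print(count_differences("tcagtt", "gcagtt"))    # one real mismatch
```

With the two pairs from this thread, the first call should report 0 and the second 1, which is the behaviour I was after.<br>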
<br>
Samantha<br>
<br><div><span class="gmail_quote">On 8/27/05, <b class="gmail_sendername">Eric L. Cabot</b> <<a href="mailto:ecabot@yahoo.com">ecabot@yahoo.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Samantha,<br><br>I'm assuming that you aren't going to use mismatches determined this way for<br>a phylogenetic analysis! If you are, there are "better" ways to go about getting<br>distance measures.<br><br>That being said, if you are willing to look at your sequences to determine
<br>where the end gaps are, and you are buying into using EMBOSS, then you could<br>first filter your sequences with the EMBOSS program SeqRet, supplying the<br>start and stop positions of interest.<br><br> In your case, the region of interest spans positions 2 through 6.
<br>Here's a set of sample command lines:<br><br>c:> seqret needle.msf -sbegin=2 -send=6 -osformat=msf -outseq=trunc.msf<br><br>c:\Stuff>type trunc.msf<br>!!NA_MULTIPLE_ALIGNMENT 1.0<br><br> trunc.msf MSF: 5 Type: N 27/08/05 CompCheck: 2332 ..
<br><br> Name: one Len: 5 Check: 1166 Weight: 1.00<br> Name: two Len: 5 Check: 1166 Weight: 1.00<br><br>//<br><br> 1 5<br>one cagtt<br>two cagtt<br><br><br>c:\Stuff>infoalign trunc.msf -weight=n -change=n -description=n -auto<br>Warning: Sequence character string not found in ajSeqCvtKS<br>
# USA Name SeqLen AlignLen Gaps GapLen Ident Similar Differ<br>
msf::trunc.msf:one one 5 5 0 0 5 0 0<br>
msf::trunc.msf:two two 5 5 0 0 5 0 0<br><br><br><br>Of course, years back, when I was working in Technical Support at GCG, I<br>would have provided a GCG-centric solution, involving the program Reformat.<br><br><br>If you don't want to read the sequences, then I could probably whip-up a
<br>Perl script for a specific format of sequences to detect and/or remove<br>endgaps. But hopefully, SeqRet/InfoAlign will do.<br><br>Eric L. Cabot<br>Genome Center<br>University of Wisconsin<br><br>--- Samantha Fox <
<a href="mailto:bioinfosm@gmail.com">bioinfosm@gmail.com</a>> wrote:<br><br>> Thanks for your response. Here's some of the conversation and discussion we<br>> had, but I'm still looking for a solution.<br>><br>> ---------- Forwarded message ----------
<br>> From: Samantha Fox <<a href="mailto:bioinfosm@gmail.com">bioinfosm@gmail.com</a>><br>> Date: Aug 26, 2005 2:34 PM<br>> Subject: Re: [BiO BB] No. of mismatches in dna sequence alignment<br>> To: <a href="mailto:pfern@igc.gulbenkian.pt">
pfern@igc.gulbenkian.pt</a>, "The general forum at Bioinformatics.Org"<br>> <<a href="mailto:bio_bulletin_board@bioinformatics.org">bio_bulletin_board@bioinformatics.org</a>><br>><br>><br>> :) That's just it: it gives 6-5 = 1. But for the pair tcagtt and cagttt I want
<br>> a value of 0, as their alignment gives no mismatch, just end gaps.<br>><br>> tcagtt-<br>> -cagttt<br>> This alignment also gives one difference. This is not the same as<br>> the 0 mismatches that I expect!
<br>><br>> Is there something that gives the edit distance between DNA sequences?<br>><br>>
# USA Name SeqLen AlignLen Gaps GapLen Ident Similar Differ % Change Weight Description<br>>
msf::wf.needle:tcagtt tcagtt 6 6 0 0 5 0 1 16.666666 1.000000<br>>
msf::wf.needle:cagttt cagttt 6 6 0 0 5 0 1 16.666666 1.000000<br>><br>> Hope I clarified what I desire.<br>><br>> Basically the motivation is, I wish to use pair-wise distances to make<br>> groups of these small dna sequences. So cagttt should be in the same
<br>> group as tcagtt, as it's just a sort of extension.<br>><br>> Thanks.<br>><br>> On 8/26/05, Pedro Fernandes <<a href="mailto:pfern@igc.gulbenkian.pt">pfern@igc.gulbenkian.pt</a>> wrote:<br>> > Dear Samantha
<br>> ><br>> > If you subtract: Ident from AlignLen you get your result. Am I<br>> mistaken?<br>> Samantha<br>><br>> from your last example<br>><br>> ==================<br>><br>> >one
<br>> tcagtt<br>> >two<br>> gcagtt<br>><br>> ==================<br>><br>> Run EMBOSS NEEDLE and get<br>><br>><br>> ==================<br>><br>><br>> !!NA_MULTIPLE_ALIGNMENT 1.0<br>
><br>> outfile MSF: 6 Type: N 27/08/05 CompCheck: 3229 ..<br>><br>> Name: one Len: 6 Check: 1621 Weight: 1.00<br>> Name: two Len: 6 Check: 1608 Weight: 1.00<br>><br>> //<br>><br>
> 1 6<br>> one tcagtt<br>> two gcagtt<br>><br>> ==================<br>><br>> Then run EMBOSS INFOALIGN with this output and get<br>><br>> ==================<br>><br>
><br>> Name AlignLen Ident
Differ<br>> one
6 5 1<br>> two
6 5 1<br>><br>><br>> ==================<br>><br>> Use the Differ column directly or else just subtract: AlignLen - Ident<br>><br>> Is this what you need?<br>><br>> ><br>> > Hope this helps
<br>> > Pedro<br>> ><br>> ><br>> > Samantha Fox said:<br>> > > Pedro, thanks for taking time to run for my example. What field do<br>> you<br>> > > look at for the results ?<br>
> > ><br>> > ><br>> > > On 8/26/05, Pedro Fernandes <<a href="mailto:pfern@igc.gulbenkian.pt">pfern@igc.gulbenkian.pt</a>> wrote:<br>> > >> Hi<br>> > >><br>> > >> I tried INFOALIGN on your ALIGNED sequences and it does work!
<br>> > >> Maybe you are using the initial sequences not the aligned ones.<br>> > >><br>> > >><br>> > >> Pedro<br>> > >><br>> > >><br>> ><br>> >
<br>> > _______________________________________________<br>> > Bioinformatics.Org general forum -<br>> <a href="mailto:BiO_Bulletin_Board@bioinformatics.org">BiO_Bulletin_Board@bioinformatics.org</a><br>> >
<a href="https://bioinformatics.org/mailman/listinfo/bio_bulletin_board">https://bioinformatics.org/mailman/listinfo/bio_bulletin_board</a><br>> ><br>><br>
</blockquote></div><br>