<br>Thanks to all for their replies.<br>

I just wanted to share that I finally got to use align0 from the FASTA package to find the distance (or similarity) between small DNA sequences.<br>

align0 does not penalize end gaps.<br>
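To illustrate what "not penalizing end gaps" means for the examples in this thread, here is a minimal Python sketch. It is not align0's algorithm: align0 also allows internal gaps, while this sketch only slides one sequence along the other without gaps and counts mismatches in the overlapping region. The function name `end_gap_free_mismatches` and the `min_overlap` parameter are hypothetical, chosen for illustration. A minimum overlap is assumed because a one-base overlap would match almost any pair.

```python
def end_gap_free_mismatches(a, b, min_overlap=3):
    """Fewest mismatches over all ungapped overlaps of a and b.

    Overhanging ends (end gaps) cost nothing. Internal gaps are not
    modelled, so this is a simplification of a real aligner.
    min_overlap is an assumed cutoff, not a value from any tool.
    """
    a, b = a.lower(), b.lower()
    if min_overlap > min(len(a), len(b)):
        raise ValueError("sequences are shorter than min_overlap")
    scores = []
    # offset is where b starts relative to a; negative means b starts first.
    for offset in range(min_overlap - len(b), len(a) - min_overlap + 1):
        start = max(0, offset)
        stop = min(len(a), offset + len(b))
        # Count mismatches only inside the overlapping columns.
        scores.append(sum(a[i] != b[i - offset] for i in range(start, stop)))
    return min(scores)


print(end_gap_free_mismatches("tcagtt", "cagttt"))  # shifted copies
print(end_gap_free_mismatches("tcagtt", "gcagtt"))  # one substitution
```

With the sequences from this thread, the first pair scores 0 (only end gaps) and the second scores 1 (a single mismatch), which is the grouping behaviour asked for.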

<br>

Samantha<br>

<br><div><span class="gmail_quote">On 8/27/05, <b class="gmail_sendername">Eric L. Cabot</b> <<a href="mailto:ecabot@yahoo.com">ecabot@yahoo.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Samantha,<br><br>I'm assuming that you aren't going to use mismatches determined this way for<br>a phylogenetic analysis! If you are, there are "better" ways to go about getting<br>distance measures.<br><br>That being said, if you are willing to look at your sequences to determine where the end gaps are, and you are buying into using EMBOSS, then you could first filter your sequences with the EMBOSS program SeqRet, supplying the start and stop positions of interest. In your case, the region of interest spans positions 2 through 6.

<br><br>Here's a set of sample command lines:<br><br>
c:> seqret needle.msf -sbegin=2 -send=6 -osformat=msf -outseq=trunc.msf<br><br>
c:\Stuff>type trunc.msf<br>
!!NA_MULTIPLE_ALIGNMENT 1.0<br><br>
  trunc.msf MSF: 5 Type: N 27/08/05 CompCheck: 2332 ..<br><br>
  Name: one        Len: 5  Check: 1166 Weight: 1.00<br>
  Name: two        Len: 5  Check: 1166 Weight: 1.00<br><br>
//<br><br>
           1   5<br>
one        cagtt<br>
two        cagtt<br><br><br>
c:\Stuff>infoalign trunc.msf -weight=n -change=n -description=n -auto<br>
Warning: Sequence character string not found in ajSeqCvtKS<br>
# USA                   Name  SeqLen  AlignLen  Gaps  GapLen  Ident  Similar  Differ<br>
msf::trunc.msf:one      one   5       5         0     0       5      0        0<br>
msf::trunc.msf:two      two   5       5         0     0       5      0        0<br><br><br><br>
Of course, years back, when I was working in Technical Support at GCG, I<br>would have provided a GCG-centric solution, involving the program Reformat.<br><br><br>If you don't want to read the sequences, then I could probably whip-up a

<br>Perl script for a specific format of sequences to detect and/or remove<br>endgaps.  But hopefully, SeqRet/InfoAlign will do.<br><br>Eric L. Cabot<br>Genome Center<br>University of Wisconsin<br><br>--- Samantha Fox <

<a href="mailto:bioinfosm@gmail.com">bioinfosm@gmail.com</a>> wrote:<br><br>> Thanks for your response. Here's some conversation and discussion we<br>> had, but still looking for a solution.<br>><br>> ---------- Forwarded message ----------

<br>> From: Samantha Fox <<a href="mailto:bioinfosm@gmail.com">bioinfosm@gmail.com</a>><br>> Date: Aug 26, 2005 2:34 PM<br>> Subject: Re: [BiO BB] No. of mismatches in dna sequence alignment<br>> To: <a href="mailto:pfern@igc.gulbenkian.pt">

pfern@igc.gulbenkian.pt</a>, "The general forum at Bioinformatics.Org"<br>> <<a href="mailto:bio_bulletin_board@bioinformatics.org">bio_bulletin_board@bioinformatics.org</a>><br>><br>><br>> :) That's just it: it gives 6-5 = 1. But for the tcagtt and cagttt pair I want<br>> a value of 0, as their alignment gives no mismatch, just end gaps.<br>><br>> tcagtt-<br>> -cagttt<br>><br>> This alignment also gives one difference. This is not the same as the<br>> 0 mismatches that I expect!<br>><br>> Is there something that gives the edit distance between DNA sequences?<br>><br>>

# USA                    Name    SeqLen  AlignLen  Gaps  GapLen  Ident  Similar  Differ  % Change   Weight    Description<br>
> msf::wf.needle:tcagtt    tcagtt  6       6         0     0       5      0        1       16.666666  1.000000<br>
> msf::wf.needle:cagttt    cagttt  6       6         0     0       5      0        1       16.666666  1.000000<br>><br>> Hope I clarified what I desire.<br>><br>> Basically the motivation is, I wish to use pair-wise distances to make<br>> groups of these small dna sequences. So cagttt should be in the same<br>> group as tcagtt, as it's just a sort of extension.<br>><br>> Thanks.<br>><br>> On 8/26/05, Pedro Fernandes <<a href="mailto:pfern@igc.gulbenkian.pt">pfern@igc.gulbenkian.pt</a>> wrote:<br>> > Dear Samantha

<br>> ><br>> > If you subtract: Ident from AlignLen you get your result. Am I<br>> mistaken?<br>> Samantha<br>><br>> from your last example<br>><br>> ==================<br>><br>> >one

<br>> tcagtt<br>> >two<br>> gcagtt<br>><br>> ==================<br>><br>> Run EMBOSS NEEDLE and get<br>><br>><br>> ==================<br>><br>><br>> !!NA_MULTIPLE_ALIGNMENT 1.0<br>

><br>>  outfile MSF: 6 Type: N 27/08/05 CompCheck: 3229 ..<br>><br>>  Name: one        Len: 6  Check: 1621 Weight: 1.00<br>>  Name: two        Len: 6  Check: 1608 Weight: 1.00<br>><br>> //<br>><br>

>           1    6<br>> one        tcagtt<br>> two        gcagtt<br>><br>> ==================<br>><br>> Then run EMBOSS INFOALIGN with this output and get<br>><br>> ==================<br>><br>

><br>>          Name        AlignLen  Ident  Differ<br>
>          one         6         5      1<br>
>          two         6         5      1<br>><br>><br>> ==================<br>><br>> Use the Differ column directly or else just subtract: AlignLen-Ident<br>><br>> Is this what you need?<br>><br>> ><br>> > Hope this helps

<br>> > Pedro<br>> ><br>> ><br>> > Samantha Fox said:<br>> > > Pedro, thanks for taking time to run for my example. What field do<br>> you<br>> > > look at for the results ?<br>

> > ><br>> > ><br>> > > On 8/26/05, Pedro Fernandes <<a href="mailto:pfern@igc.gulbenkian.pt">pfern@igc.gulbenkian.pt</a>> wrote:<br>> > >> Hi<br>> > >><br>> > >> I tried INFOALIGN on your ALIGNED sequences and it does work!

<br>> > >> Maybe you are using the initial sequences not the aligned ones.<br>> > >><br>> > >><br>> > >> Pedro<br>> > >><br>> > >><br>> ><br>> >

<br>> > _______________________________________________<br>> > Bioinformatics.Org general forum  -<br>> <a href="mailto:BiO_Bulletin_Board@bioinformatics.org">BiO_Bulletin_Board@bioinformatics.org</a><br>> > 

<a href="https://bioinformatics.org/mailman/listinfo/bio_bulletin_board">https://bioinformatics.org/mailman/listinfo/bio_bulletin_board</a><br>> ><br>><br><br></blockquote></div><br>