Fwd: [BiO BB] No. of mismatches in dna sequence alignment

Samantha Fox bioinfosm at gmail.com
Fri Oct 28 17:22:43 EDT 2005


Thanks to all for your replies.
I just wanted to share that I ended up using align0 from the FASTA
package for my purpose of finding the distance (or similarity) between
small DNA sequences, since align0 does not penalize end gaps.

Samantha
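
The end-gap-free idea behind align0 can be sketched without the FASTA package. The Python below is a minimal, hypothetical example (not align0's code or scoring): a semi-global ("overlap") alignment in which overhangs at either end cost nothing, with the distance taken as mismatches plus internal gaps inside the overlapping region. The scores MATCH, MISMATCH and GAP and the function name overlap_distance are illustrative assumptions. For comparison, a plain edit (Levenshtein) distance still charges 2 for tcagtt vs cagttt: one deletion at the start and one insertion at the end.

```python
# Hypothetical scoring scheme for illustration only; align0 has its own defaults.
MATCH, MISMATCH, GAP = 1, -1, -2


def overlap_distance(a: str, b: str):
    """Align a and b with free end gaps (semi-global alignment).

    Returns (distance, overlap_length, aligned_a, aligned_b), where distance
    counts mismatches plus internal gaps inside the overlapping region only.
    """
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)
    # Row 0 and column 0 stay 0: a leading overhang of either sequence is free.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # A trailing overhang is free too: start from the best cell on the last row or column.
    ends = [(n, j) for j in range(m + 1)] + [(i, m) for i in range(n + 1)]
    i, j = max(ends, key=lambda cell: score[cell[0]][cell[1]])
    # One of the two trailing overhangs is always empty here.
    tail_a = a[i:] + "-" * (m - j)
    tail_b = "-" * (n - i) + b[j:]
    # Trace back through the scored (overlapping) region.
    cols_a, cols_b = [], []
    while i > 0 and j > 0:
        s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        if score[i][j] == score[i - 1][j - 1] + s:
            cols_a.append(a[i - 1])
            cols_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + GAP:
            cols_a.append(a[i - 1])
            cols_b.append("-")
            i -= 1
        else:
            cols_a.append("-")
            cols_b.append(b[j - 1])
            j -= 1
    # Whatever remains at the start is the free leading overhang.
    head_a = a[:i] + "-" * j
    head_b = "-" * i + b[:j]
    core_a = "".join(reversed(cols_a))
    core_b = "".join(reversed(cols_b))
    distance = sum(x != y for x, y in zip(core_a, core_b))
    return distance, len(core_a), head_a + core_a + tail_a, head_b + core_b + tail_b


print(overlap_distance("tcagtt", "cagttt"))  # (0, 5, 'TCAGTT-', '-CAGTTT')
print(overlap_distance("tcagtt", "gcagtt"))  # (1, 6, 'TCAGTT', 'GCAGTT')
```

One caveat for grouping: with free end gaps, two unrelated sequences can still come out at distance 0 with a tiny overlap, or none at all (overlap_distance("aaaa", "cccc") gives distance 0 with overlap 0), so it is probably worth also requiring a minimum overlap length.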

On 8/27/05, Eric L. Cabot <ecabot at yahoo.com> wrote:
>
> Samantha,
>
> I'm assuming that you aren't going to use mismatches determined this way
> for a phylogenetic analysis! If so, there are "better" ways to go about
> getting distance measures.
>
> That being said, if you are willing to look at your sequences to determine
> where the end gaps are, and you are buying into using EMBOSS, then you
> could first filter your sequences with the EMBOSS program SeqRet,
> supplying the start and stop positions of interest.
>
> In your case, the region of interest spans positions 2 through 6.
> Here's a set of sample command lines:
>
> c:> seqret needle.msf -sbegin=2 -send=6 -osformat=msf -outseq=trunc.msf
>
> c:\Stuff>type trunc.msf
> !!NA_MULTIPLE_ALIGNMENT 1.0
>
> trunc.msf MSF: 5 Type: N 27/08/05 CompCheck: 2332 ..
>
> Name: one Len: 5 Check: 1166 Weight: 1.00
> Name: two Len: 5 Check: 1166 Weight: 1.00
>
> //
>
> 1 5
> one cagtt
> two cagtt
>
>
> c:\Stuff>infoalign trunc.msf -weight=n -change=n -description=n -auto
> Warning: Sequence character string not found in ajSeqCvtKS
> # USA               Name  SeqLen  AlignLen  Gaps  GapLen  Ident  Similar  Differ
> msf::trunc.msf:one  one   5       5         0     0       5      0        0
> msf::trunc.msf:two  two   5       5         0     0       5      0        0
>
>
>
> Of course, years back, when I was working in Technical Support at GCG, I
> would have provided a GCG-centric solution, involving the program
> Reformat.
>
>
> If you don't want to inspect the sequences by eye, then I could probably
> whip up a Perl script for a specific sequence format to detect and/or
> remove end gaps. But hopefully, SeqRet/InfoAlign will do.
>
> Eric L. Cabot
> Genome Center
> University of Wisconsin
>
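
Eric's offer of a script to detect and remove end gaps can be sketched in a few lines. The Python below is a minimal, hypothetical example (not Eric's script and not part of EMBOSS). It assumes the two aligned rows have already been extracted as equal-length strings, for example from the needle MSF output, and that gaps are written as '-', '.' or '~'. It trims every leading and trailing column in which either row has a gap, then applies Pedro's arithmetic, Differ = AlignLen - Ident, to what is left.

```python
# Assumed gap symbols: '-' plus '.' and '~', which MSF-style files may use.
GAP_CHARS = "-.~"


def trim_end_gaps(top: str, bottom: str):
    """Remove leading and trailing alignment columns where either row has a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")

    def is_gap_column(k: int) -> bool:
        return top[k] in GAP_CHARS or bottom[k] in GAP_CHARS

    start, end = 0, len(top)
    while start < end and is_gap_column(start):
        start += 1
    while end > start and is_gap_column(end - 1):
        end -= 1
    return top[start:end], bottom[start:end]


def pairwise_counts(top: str, bottom: str):
    """Return (align_len, ident, differ) over the end-gap-trimmed region."""
    t, b = trim_end_gaps(top, bottom)
    # Identical, non-gap columns; internal gaps therefore count as differences.
    ident = sum(1 for x, y in zip(t, b)
                if x.upper() == y.upper() and x not in GAP_CHARS)
    return len(t), ident, len(t) - ident


print(pairwise_counts("tcagtt-", "-cagttt"))  # (5, 5, 0)
print(pairwise_counts("tcagtt", "gcagtt"))    # (6, 5, 1)
```

Trimming only the outermost gap columns keeps internal gaps in the count, so a true indel inside the overlap still registers as a difference.
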
> --- Samantha Fox <bioinfosm at gmail.com> wrote:
>
> > Thanks for your response. Here's some of the conversation and discussion
> > we had; I'm still looking for a solution.
> >
> > ---------- Forwarded message ----------
> > From: Samantha Fox <bioinfosm at gmail.com>
> > Date: Aug 26, 2005 2:34 PM
> > Subject: Re: [BiO BB] No. of mismatches in dna sequence alignment
> > To: pfern at igc.gulbenkian.pt, "The general forum at Bioinformatics.Org"
> > <bio_bulletin_board at bioinformatics.org>
> >
> >
> > :) That's what it gives: 6 - 5 = 1. But for the tcagtt and cagttt pair I
> > want a value of 0, as their alignment has no mismatches, just end gaps.
> >
> > tcagtt-
> > -cagttt
> > This alignment also gives one difference, which is not the 0 mismatches
> > that I expect!
> >
> > Is there something that gives the edit distance between DNA sequences?
> >
> > # USA                  Name    SeqLen  AlignLen  Gaps  GapLen  Ident  Similar  Differ  % Change   Weight    Description
> > msf::wf.needle:tcagtt  tcagtt  6       6         0     0       5      0        1       16.666666  1.000000
> > msf::wf.needle:cagttt  cagttt  6       6         0     0       5      0        1       16.666666  1.000000
> >
> > I hope that clarifies what I want.
> >
> > Basically, the motivation is that I want to use pairwise distances to
> > group these small DNA sequences. So cagttt should be in the same group
> > as tcagtt, as it's just a sort of extension.
> >
> > Thanks.
> >
> > On 8/26/05, Pedro Fernandes <pfern at igc.gulbenkian.pt> wrote:
> > > Dear Samantha
> > >
> > > If you subtract Ident from AlignLen, you get your result. Am I
> > > mistaken?
> > Samantha
> >
> > from your last example
> >
> > ==================
> >
> > >one
> > tcagtt
> > >two
> > gcagtt
> >
> > ==================
> >
> > Run EMBOSS NEEDLE and get
> >
> >
> > ==================
> >
> >
> > !!NA_MULTIPLE_ALIGNMENT 1.0
> >
> > outfile MSF: 6 Type: N 27/08/05 CompCheck: 3229 ..
> >
> > Name: one Len: 6 Check: 1621 Weight: 1.00
> > Name: two Len: 6 Check: 1608 Weight: 1.00
> >
> > //
> >
> > 1 6
> > one tcagtt
> > two gcagtt
> >
> > ==================
> >
> > Then run EMBOSS INFOALIGN with this output and get
> >
> > ==================
> >
> >
> > Name AlignLen Ident Differ
> > one 6 5 1
> > two 6 5 1
> >
> >
> > ==================
> >
> > Use the Differ column directly, or else just subtract: AlignLen - Ident.
> >
> > Is this what you need?
> >
> > >
> > > Hope this helps
> > > Pedro
> > >
> > >
> > > Samantha Fox said:
> > > > Pedro, thanks for taking the time to run my example. Which field do
> > > > you look at for the results?
> > > >
> > > >
> > > > On 8/26/05, Pedro Fernandes <pfern at igc.gulbenkian.pt> wrote:
> > > >> Hi
> > > >>
> > > >> I tried INFOALIGN on your ALIGNED sequences and it does work!
> > > >> Maybe you are using the initial sequences, not the aligned ones.
> > > >>
> > > >>
> > > >> Pedro
> > > >>
> > > >>
> > >
> > >
> > > _______________________________________________
> > > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org
> > > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
> > >
> >
>