[Biodevelopers] batch tool for finding mitochondrial DNA polymorphisms

Mon Apr 10 14:35:46 EDT 2006

Hi, 
    I don't know exactly what you are looking for, but if you assume all
polymorphisms are single base substitutions and that there are no
insertions or deletions (is this correct??), then the basic code is
pretty easy. Just look at each position in each sequence and see if it
matches the reference. If so, keep going. If not, record a polymorphism.
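
A minimal sketch of that position-by-position comparison, in Python. It assumes, as above, that the sample and the reference are the same length and already in register (substitutions only, no indels). The function name find_substitutions is hypothetical. Positions are reported 1-based in the "A750G" style used later in the thread.

```python
def find_substitutions(reference: str, sample: str) -> list[str]:
    """Return substitutions in 'A750G' notation (1-based positions).

    Assumes both sequences have the same length and no insertions or
    deletions, so position i in one corresponds to position i in the other.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length (no indels assumed)")
    calls = []
    for pos, (ref_base, alt_base) in enumerate(
            zip(reference.upper(), sample.upper()), start=1):
        if ref_base == alt_base:
            continue  # matches the reference: keep going
        # Mismatch: record <reference base><position><sample base>.
        calls.append(f"{ref_base}{pos}{alt_base}")
    return calls


if __name__ == "__main__":
    print(find_substitutions("GATTACA", "GACTACG"))  # ['T3C', 'A7G']
```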

    Allowing insertions or deletions is trickier because there is a
chance that your sequences will get out of alignment with each other, and
that would cause massive problems. You would probably have to check the
alignment at every position. I am not sure offhand what the best way to
do this would be, but I think it would not be too hard...
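
One standard way to keep sequences in register is to globally align each sample to the reference first, then walk the alignment column by column. The sketch below uses textbook Needleman-Wunsch with assumed scores (match +1, mismatch -1, gap -2). It writes insertions as "315.1C" and deletions as "A249d", loosely following a style seen in mtDNA tables; treat that notation as an assumption. All function names are hypothetical. The full score matrix is quadratic, so for 16,569-base genomes a banded alignment or an external aligner would be needed in practice.

```python
def align(ref: str, qry: str, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (aligned_ref, aligned_qry)."""
    n, m = len(ref), len(qry)
    sub = lambda a, b: match if a == b else mismatch
    # score[i][j] = best score for aligning ref[:i] with qry[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(ref[i - 1], qry[j - 1]),
                              score[i - 1][j] + gap,   # gap in query
                              score[i][j - 1] + gap)   # gap in reference
    # Trace back from the bottom-right cell to rebuild one optimal alignment.
    a_ref, a_qry, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(ref[i - 1], qry[j - 1]):
            a_ref.append(ref[i - 1]); a_qry.append(qry[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a_ref.append(ref[i - 1]); a_qry.append("-"); i -= 1
        else:
            a_ref.append("-"); a_qry.append(qry[j - 1]); j -= 1
    return "".join(reversed(a_ref)), "".join(reversed(a_qry))


def call_variants(ref: str, qry: str) -> list[str]:
    """Substitutions, insertions and deletions in reference coordinates."""
    a_ref, a_qry = align(ref.upper(), qry.upper())
    calls, pos, ins = [], 0, 0  # pos: last reference position seen (1-based)
    for r, q in zip(a_ref, a_qry):
        if r == "-":
            ins += 1  # extra base in the sample after reference position `pos`
            calls.append(f"{pos}.{ins}{q}")
            continue
        pos, ins = pos + 1, 0
        if q == "-":
            calls.append(f"{r}{pos}d")  # reference base missing in the sample
        elif r != q:
            calls.append(f"{r}{pos}{q}")
    return calls


if __name__ == "__main__":
    print(call_variants("ACGTACGT", "ACGACGT"))  # ['T4d']
```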
Ethan

Ethan Strauss Ph.D.
Bioinformatics Scientist
Promega Corporation
2800 Woods Hollow Rd.
Madison, WI 53711
608-274-4330
800-356-9526
ethan.strauss at promega.com

________________________________

From: biodevelopers-bounces+ethan.strauss=promega.com at bioinformatics.org
[mailto:biodevelopers-bounces+ethan.strauss=promega.com at bioinformatics.org] On Behalf Of David Whyte
Sent: Saturday, April 08, 2006 4:31 PM
To: biodevelopers at bioinformatics.org
Subject: [Biodevelopers] batch tool for finding mitochondrial DNA polymorphisms

Hi,

I have a bioinformatics project that involves finding polymorphisms in
mitochondrial DNA (mtDNA).  The polymorphisms are typically denoted as
"reference base/position/polymorphic base", as in A750G.  I'd like to
add a software tool to our company website where a visitor could paste
in a set of mitochondrial genomes, and a reference sequence, and get
back a list of polymorphisms.  Something like:

>Seq1

A458G, T4899A....

>SEQ2

T678C, G6789C....

etc.   
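
The "reference base/position/polymorphic base" notation is simple enough to parse back with a regular expression, which helps when reading call lists like the ones above or checking them against the reference. A sketch under these assumptions: substitutions only, 1-based positions, bases limited to A/C/G/T/N. The names parse_call, parse_line and apply_calls are hypothetical.

```python
import re

# <reference base><1-based position><polymorphic base>, e.g. "A750G".
NOTATION = re.compile(r"([ACGTN])(\d+)([ACGTN])")


def parse_call(call: str) -> tuple[str, int, str]:
    """Split one call such as 'A750G' into ('A', 750, 'G')."""
    m = NOTATION.fullmatch(call.strip().upper())
    if m is None:
        raise ValueError(f"not a substitution in A750G form: {call!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def parse_line(line: str) -> list[tuple[str, int, str]]:
    """Parse a comma-separated list such as 'A458G, T4899A'."""
    return [parse_call(c) for c in line.split(",") if c.strip()]


def apply_calls(reference: str, calls: list[str]) -> str:
    """Rebuild a sample sequence from the reference plus its call list."""
    seq = list(reference.upper())
    for call in calls:
        ref_base, pos, alt_base = parse_call(call)
        # Reject calls whose reference base disagrees with the reference.
        if not 1 <= pos <= len(seq) or seq[pos - 1] != ref_base:
            raise ValueError(f"{call} does not match the reference at position {pos}")
        seq[pos - 1] = alt_base
    return "".join(seq)
```

Rebuilding a sequence from its calls gives a cheap round-trip check on any tool that produces the lists.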

We sequence mitochondrial DNA for customers interested in learning about
their ancient ancestry.

The site will be freely available. It will be attached to our company
site, www.argusbio.com, which is still in development at LunarPages. The
author's name and an email link could be listed on the page.

A full-length genome is 16,569 bases long. Typically, two people will
have around 30 to 50 differences in their mtDNAs; more (but fewer than
100) if they have very different ancestry (African vs. European, for
example). These polymorphisms determine the person's mitochondrial
haplogroup.

It would be very helpful if the program were able to determine which
haplogroup the mtDNA belongs in based on the list of polymorphisms.  I
have tables of diagnostic polymorphisms used for classing mt genomes.
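
A very simple starting point for using such tables: score each haplogroup by the fraction of its diagnostic polymorphisms found in a sample's call list. This is a flat simplification. Real haplogroup assignment follows a hierarchical tree, which this sketch does not model. The function name classify and the table in the usage example are invented placeholders, not real haplogroup definitions.

```python
def classify(calls: set[str], table: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank haplogroups by the fraction of their diagnostic markers observed.

    `table` maps a haplogroup label to its set of diagnostic calls in
    'A750G' form (hypothetical format for the diagnostic tables).
    """
    ranked = []
    for haplogroup, markers in table.items():
        if not markers:
            continue  # nothing to score against
        hits = len(markers & calls)
        ranked.append((haplogroup, hits / len(markers)))
    # Best fraction first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Placeholder table for illustration only.
    demo_table = {"hapA": {"A750G", "T4899A"},
                  "hapB": {"G6789C", "T678C", "A458G"}}
    print(classify({"A458G", "T4899A"}, demo_table))
```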

It would also be very useful if there were an option to generate a FASTA
file consisting only of the polymorphic sites. So if someone put in 100
full-length genomes and a reference genome, the output would be FASTA
sequences containing only the positions at which at least one test
sequence differs from the reference. This output would be much easier to
align with CLUSTALW than the full-length sequences, which are typically
more than 99% invariant.
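
The reduced FASTA described above amounts to keeping only the columns where any sample differs from the reference. A sketch, assuming all sequences are already the same length and in register (no indels). The names polymorphic_sites and reduced_fasta, the "reference" header, and the 60-character line width are hypothetical choices. Including the reference as the first record is also a choice, made so the reduced file keeps a baseline row.

```python
def polymorphic_sites(reference: str, samples: list[tuple[str, str]]) -> list[int]:
    """0-based columns where at least one sample differs from the reference."""
    for name, seq in samples:
        if len(seq) != len(reference):
            raise ValueError(f"{name}: length differs from reference (no indels assumed)")
    return [i for i, ref_base in enumerate(reference)
            if any(seq[i] != ref_base for _, seq in samples)]


def reduced_fasta(reference: str, samples: list[tuple[str, str]], width: int = 60) -> str:
    """FASTA text containing only the polymorphic columns of every sequence."""
    sites = polymorphic_sites(reference, samples)
    out = []
    for name, seq in [("reference", reference)] + samples:
        reduced = "".join(seq[i] for i in sites)
        out.append(f">{name}")
        # Wrap the reduced sequence at `width` characters per line.
        out.extend(reduced[k:k + width] for k in range(0, len(reduced), width))
    return "\n".join(out)


if __name__ == "__main__":
    samples = [("s1", "ACGAACGT"), ("s2", "TCGTACGT")]
    print(reduced_fasta("ACGTACGT", samples))
    # Keep 1-based positions so reduced columns can be mapped back.
    print([i + 1 for i in polymorphic_sites("ACGTACGT", samples)])
```

Saving the list of retained positions alongside the reduced file lets results from the CLUSTALW alignment be mapped back to genome coordinates.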

I am looking for some ideas of how best to implement this web-based
tool.  
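
For the web side, one lightweight shape is a single page with two text areas (reference and pasted multi-FASTA) that posts back to itself. The sketch below uses only Python's standard WSGI tooling; the form field names, the helper names, and port 8000 are all assumptions. It reports substitutions only and flags samples whose length differs from the reference rather than trying to handle indels.

```python
import html
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

FORM = """<form method="post">
<p>Reference sequence:<br><textarea name="reference" rows="4" cols="80"></textarea></p>
<p>Sample sequences (FASTA):<br><textarea name="samples" rows="12" cols="80"></textarea></p>
<p><input type="submit" value="Find polymorphisms"></p>
</form>"""


def clean_sequence(text: str) -> str:
    """Drop FASTA header lines and whitespace from a pasted sequence."""
    return "".join(line.strip() for line in text.splitlines()
                   if not line.startswith(">")).upper()


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split pasted multi-FASTA text into (name, sequence) pairs."""
    records = []
    for block in text.split(">")[1:]:  # text before the first '>' is ignored
        header, _, body = block.partition("\n")
        records.append((header.strip(), clean_sequence(body)))
    return records


def substitutions(reference: str, seq: str) -> list[str]:
    """Substitution-only calls in A750G form (assumes equal length, no indels)."""
    return [f"{r}{i}{q}" for i, (r, q) in
            enumerate(zip(reference, seq), start=1) if r != q]


def app(environ, start_response):
    """Minimal WSGI app: GET shows the form, POST appends the results."""
    parts = [FORM]
    if environ.get("REQUEST_METHOD") == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        fields = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        reference = clean_sequence(fields.get("reference", [""])[0])
        lines = []
        for name, seq in parse_fasta(fields.get("samples", [""])[0]):
            if len(seq) != len(reference):
                lines.append(f">{name}\nlength differs from reference; indels not handled")
            else:
                lines.append(f">{name}\n" + ", ".join(substitutions(reference, seq)))
        # Escape user-supplied names so they cannot inject HTML.
        parts.append("<pre>" + html.escape("\n".join(lines)) + "</pre>")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return ["\n".join(parts).encode("utf-8")]


if __name__ == "__main__":
    # Serves http://localhost:8000/ for local testing only.
    make_server("localhost", 8000, app).serve_forever()
```

A shared host would typically run the same WSGI callable behind its own web server instead of the development server shown here.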

Thanks,

David B. Whyte, Ph.D.
Argus Biosciences, LLC
650-954-1055

dwhyte at argusbio.com
www.argusbio.com
