Hi,

I don't know exactly what you are looking for, but if you assume that all polymorphisms are single-base substitutions and that there are no insertions or deletions (is this correct?), then the basic code is pretty easy. Look at each position in each sequence and see whether it matches the reference. If so, keep going. If not, record a polymorphism.

Allowing insertions or deletions is trickier, because the sequences can then get out of alignment with each other, which would cause massive problems: most positions downstream of an indel would show up as spurious mismatches. You would probably have to check the alignment at every position. I am not sure offhand what the best way to do this would be, but I think it would not be too hard...

Ethan

Ethan Strauss Ph.D.
Bioinformatics Scientist
Promega Corporation
2800 Woods Hollow Rd.
Madison, WI 53711
608-274-4330
800-356-9526
ethan.strauss at promega.com

________________________________

From: biodevelopers-bounces+ethan.strauss=promega.com at bioinformatics.org [mailto:biodevelopers-bounces+ethan.strauss=promega.com at bioinformatics.org] On Behalf Of David Whyte
Sent: Saturday, April 08, 2006 4:31 PM
To: biodevelopers at bioinformatics.org
Subject: [Biodevelopers] batch tool for finding mitochondrial DNA polymorphisms

Hi,

I have a bioinformatics project that involves finding polymorphisms in mitochondrial DNA (mtDNA). The polymorphisms are typically denoted as "reference base/position/polymorphic base", as in A750G. I'd like to add a software tool to our company website where a visitor could paste in a set of mitochondrial genomes and a reference sequence, and get back a list of polymorphisms. Something like:

>Seq1 A458G, T4899A....
>SEQ2 T678C, G6789C....
etc.

We sequence mitochondrial DNA for customers interested in learning about their ancient ancestry. The site will be freely available. It will be attached to our company site, www.argusbio.com <http://www.argusbio.com/>, which is still in development at LunarPages.
The author's name and an email link could be listed on the page.

A full-length genome is 16,569 bases long. Typically two people will have around 30 to 50 differences in their mtDNAs - more (but fewer than 100) if they have very different ancestry (African vs European, for example). These polymorphisms determine the person's mitochondrial haplogroup. It would be very helpful if the program were able to determine which haplogroup the mtDNA belongs to, based on the list of polymorphisms. I have tables of diagnostic polymorphisms used for classifying mt genomes.

It would also be very useful if there were an option to generate a FASTA file that consists of just the polymorphic sites. So if someone put in 100 full-length genomes and a reference genome, the output would be FASTA sequences containing only the positions at which at least one test sequence differs from the reference. This output would be much easier to align with CLUSTALW than the full-length sequences, which are typically > 99% invariant.

I am looking for some ideas on how best to implement this web-based tool.

Thanks,

David B. Whyte, Ph.D.
Argus Biosciences, LLC
650-954-1055
dwhyte at argusbio.com <mailto:dwhyte at argusbio.com>
www.argusbio.com <http://www.argusbio.com/>
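
For illustration, here is a minimal Python sketch of the position-by-position comparison suggested in the reply at the top of this thread, together with the polymorphic-sites-only FASTA output requested in the original message. It is only a sketch under the substitution-only assumption: every sequence must already be the same length as the reference, insertions and deletions are rejected rather than handled, and ambiguity codes such as N are simply counted as differences. The function names (find_polymorphisms, variable_sites, polymorphic_fasta) and the toy sequences are hypothetical, not taken from any existing tool.

```python
def find_polymorphisms(reference, sequence):
    """Return substitutions as strings such as 'A750G' (1-based positions).

    Assumption: the two sequences are the same length and already in
    register, i.e. substitutions only, no insertions or deletions.
    """
    if len(reference) != len(sequence):
        raise ValueError("lengths differ; insertions/deletions are not handled")
    polymorphisms = []
    pairs = zip(reference.upper(), sequence.upper())
    for position, (ref_base, base) in enumerate(pairs, start=1):
        if ref_base != base:
            polymorphisms.append(f"{ref_base}{position}{base}")
    return polymorphisms


def variable_sites(reference, sequences):
    """Return sorted 0-based indices where any sequence differs from the reference."""
    ref = reference.upper()
    sites = set()
    for seq in sequences.values():
        if len(seq) != len(ref):
            raise ValueError("lengths differ; insertions/deletions are not handled")
        for index, (ref_base, base) in enumerate(zip(ref, seq.upper())):
            if ref_base != base:
                sites.add(index)
    return sorted(sites)


def polymorphic_fasta(reference, sequences):
    """Build FASTA text that keeps only the variable columns of each sequence."""
    sites = variable_sites(reference, sequences)
    lines = []
    for name, seq in sequences.items():
        lines.append(f">{name}")
        lines.append("".join(seq.upper()[i] for i in sites))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data, not real mtDNA.
    reference = "ACGTACGT"
    sequences = {"Seq1": "ACGTACGA", "SEQ2": "GCGTACGT"}
    for name, seq in sequences.items():
        print(f">{name} " + ", ".join(find_polymorphisms(reference, seq)))
    print(polymorphic_fasta(reference, sequences))
```

Handling insertions and deletions would need a real pairwise alignment against the reference before this comparison is run, and haplogroup assignment would need the tables of diagnostic polymorphisms mentioned above; neither is attempted here.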