Repetitive DNA sequence
From Bioinformatics.Org Wiki
Sequences with short repeating units
The two most common terms used for sequences containing short repeating units are simple sequence repeat (SSR) and microsatellite.
SSRs are composed of short (1 to 5 bp), tandemly repeating units that are exact in identity and repetition. Although the elongation of SSR tracts may be due to more than one mechanism, much of it is thought to result from slip-strand replication errors. During nascent strand formation, the strands can reanneal. When they contain repetitive elements, as in SSR tracts, the reannealing can be imperfect, leading to the addition of further copies of the same element. The errors become permanent when an additional round of replication occurs before repair enzymes detect them.
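The definition above (exact, tandem copies of a 1 to 5 bp unit) can be illustrated with a minimal Python sketch. The function name `find_ssrs` and the `min_repeats` threshold are assumptions made for this example; they are not taken from any of the tools listed on this page, and the regular-expression scan only reports non-overlapping, perfect repeats.

```python
import re

def find_ssrs(seq, max_unit=5, min_repeats=3):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    max_unit follows the 1 to 5 bp unit size given in the text;
    min_repeats is an arbitrary cutoff chosen for this illustration.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        # A unit of unit_len bases followed by at least min_repeats - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "AA" or "CACA"); those are reported at the shorter unit length.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)

print(find_ssrs("GGCACACACACTT"))  # [(2, 'CA', 4)]
```

Imperfect repeats, which several of the programs listed further down are designed to handle, are outside the scope of this sketch.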
Some biochemists use the descriptor polymeric and modifications thereon in order to describe a repeat more precisely, e.g.:
- homopolymer or heteropolymer
The most abundant SSR tracts are the homopolymer repeats poly(dA).poly(dT) and poly(dG).poly(dC). Long (> 9 bp) tracts of both types are found at higher than expected frequencies in the non-coding regions of eukaryote genomes. This is particularly true for poly(dA).poly(dT) tracts in AT-rich genomes.
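As a rough illustration of how long homopolymer tracts might be located in a single-stranded sequence string, the sketch below scans for runs longer than 9 bp. The function name `long_homopolymer_tracts` and the tuple layout are hypothetical. Because the tract names refer to double-stranded DNA, a run of A or a run of T on the given strand is reported as the same tract type.

```python
import re

# An A run on one strand is a T run on the complementary strand,
# so both bases map to the same double-stranded tract name.
TRACT_NAME = {
    "A": "poly(dA).poly(dT)", "T": "poly(dA).poly(dT)",
    "G": "poly(dG).poly(dC)", "C": "poly(dG).poly(dC)",
}

def long_homopolymer_tracts(seq, min_len=10):
    """Return (start, length, tract_name) for single-base runs of >= min_len bp.

    The default min_len=10 corresponds to the "> 9 bp" threshold in the text.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), len(m.group(0)), TRACT_NAME[m.group(1)])
            for m in pattern.finditer(seq.upper())]

example = "CG" + "A" * 12 + "GC" + "T" * 5
print(long_homopolymer_tracts(example))  # [(2, 12, 'poly(dA).poly(dT)')]
```

The 5 bp run of T at the end of the example is ignored because it falls below the length threshold.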
The biological importance of SSR tracts has been clearly delineated. Homopolymer tracts, for example, can serve as protein binding signals, particularly as upstream promoter elements. Also, long homopolymer tracts are spaced non-randomly in the genome of Dictyostelium discoideum, suggesting a preferential linker DNA location in the repeating nucleosome structure of this AT-rich organism. While this restricted localization may be thermodynamically determined, the suggestion is that these tracts may serve some function determined by their accessibility in the linker DNA region between nucleosomes.
The heteropolymer tracts are at least as important biologically. Dinucleotide repeats are associated with human diseases such as Norrie's disease, and the expansion of trinucleotide repeats is often associated with neurodegenerative disease and chromosomal fragility, such as Huntington's disease and fragile X syndrome, respectively. SSR tracts of many monomer lengths can play a role in sequence-specific DNA binding by proteins. In coding regions, homopolymer and dinucleotide tract elongation can lead to frame-shift errors, often resulting in cancers, whereas trinucleotide tract elongation can lead to tandem amino acid repeats.
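The difference between frame-shifting and frame-preserving elongation in a coding region comes down to whether the number of added base pairs is a multiple of three. A minimal sketch of that arithmetic follows; the function name `elongation_effect` and its return format are assumptions made for illustration only.

```python
def elongation_effect(unit, added_copies):
    """Classify the effect of adding copies of a repeat unit inside a coding region.

    Returns ("in-frame", n_codons_added) when the added length is a multiple of 3,
    otherwise ("frameshift", None).
    """
    added_bp = len(unit) * added_copies
    if added_bp % 3 == 0:
        return ("in-frame", added_bp // 3)
    return ("frameshift", None)

print(elongation_effect("A", 1))    # ('frameshift', None)
print(elongation_effect("CA", 1))   # ('frameshift', None)
print(elongation_effect("CAG", 2))  # ('in-frame', 2)
```

Adding any number of trinucleotide units always preserves the reading frame, which is why such expansions appear as tandem amino acid repeats. Homopolymer or dinucleotide additions preserve the frame only when the total added length happens to be a multiple of 3, for example `elongation_effect("A", 3)`.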
Larger repeats may be called oligomeric, referring to the oligomer that serves as the repeating unit, and may likewise be described with the following terms:
- identical or repetitive oligomers
- oligonucleotide repeat
Numerical prefixes such as mono-, di-, tri-, etc. may also be added to the root words mer and nucleotide to describe a simple repeat (e.g. dinucleotide, trinucleotide).
Sequences with large repeating units
Minisatellite (10-100 bp repeating unit) and macrosatellite (> 100 bp repeating unit) are the most common terms for sequences with large repeating units.
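The size classes named on this page can be collected into a small lookup, sketched below. The function name `classify_repeat_unit` is hypothetical. Note that the ranges as stated here (1 to 5 bp, 10 to 100 bp, more than 100 bp) leave unit lengths of 6 to 9 bp unassigned, so the sketch returns "unclassified" for those rather than guessing a class.

```python
def classify_repeat_unit(unit_len):
    """Map a repeating-unit length in bp to the term used on this page."""
    if unit_len < 1:
        raise ValueError("unit length must be at least 1 bp")
    if unit_len <= 5:
        return "microsatellite (SSR)"
    if 10 <= unit_len <= 100:
        return "minisatellite"
    if unit_len > 100:
        return "macrosatellite"
    # 6 to 9 bp: not covered by the ranges given in the text.
    return "unclassified"

for n in (2, 7, 33, 250):
    print(n, classify_repeat_unit(n))
```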
One common way to describe a repetitive sequence is to use symbolic notation, e.g. (N)n, where N is a repeating unit, and n is the number of repeats.
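For a perfect tandem repeat, the (N)n notation can be derived mechanically by finding the shortest unit that regenerates the whole tract. The following is a minimal sketch under that assumption; the function name `to_symbolic` is made up for this example, and a sequence with no internal repetition is simply reported with n = 1.

```python
def to_symbolic(tract):
    """Return '(N)n' for a tract made only of exact copies of its shortest unit N."""
    tract = tract.upper()
    if not tract:
        raise ValueError("empty sequence")
    for k in range(1, len(tract) + 1):
        # The unit length must divide the tract length evenly.
        if len(tract) % k == 0 and tract[:k] * (len(tract) // k) == tract:
            return "(%s)%d" % (tract[:k], len(tract) // k)

print(to_symbolic("CACACACA"))   # (CA)4
print(to_symbolic("CCGCCGCCG"))  # (CCG)3
print(to_symbolic("AAAAA"))      # (A)5
```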
Other general descriptors (adjectives) are used, e.g.:
Descriptors are usually used along with the words sequence, repeat, tract or run.
Names for the repeating unit
- IMEx - Imperfect Microsatellite Extractor
- JSTRING - Java Search for Tandem Repeats in genomes
- Microsatellite Repeats Finder
- MISA - MIcroSAtellite identification tool
- Phobos - a tandem repeat search tool for perfect and imperfect repeats -- the maximum pattern size depends only on computational power
- Poly - quantitative analysis of SSRs
- sputnik - a simple microsatellite search program
- Tandem Repeats Finder
- Toth, G., Gaspari, Z., Jurka, J. 2000. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res 10(7):967-981.
- Streisinger, G., Okada, Y., Emrich, J., Newton, J., Tsugita, A., Terzaghi, E., Inouye, M. 1966. Frameshift mutations and the genetic code. Cold Spring Harb Symp Quant Biol 31:77-84.
- Kunkel, T.A., Soni, A. 1988. Mutagenesis by transient misalignment. J Biol Chem 263(29):14784-14789.
- Marx, K.A., Hess, S.T., Blake, R.D. 1993. Characteristics of the large (dA).(dT) homopolymer tracts in D. discoideum gene flanking and intron sequences. J Biomol Struct Dyn 11(1):57-66.
- Struhl, K. 1985. Naturally occurring poly(dA-dT) sequences are upstream promoter elements for constitutive transcription in yeast. Proc Natl Acad Sci U S A 82(24):8419-8423.
- Marx, K.A., Hess, S.T., Blake, R.D. 1994. Alignment of (dA).(dT) homopolymer tracts in gene flanking sequences suggests nucleosomal periodicity in D. discoideum DNA. J Biomol Struct Dyn 12(1):235-246.
- Kenyon, J.R., Craig, I.W. 1999. Analysis of the 5' regulatory region of the human Norrie's disease gene: evidence that a non-translated CT dinucleotide repeat in exon one has a role in controlling expression. Gene 227(2):181-188.
- Ashley, C.T., Warren, S.T. 1995. Trinucleotide repeat expansion and human disease. Annu Rev Genet 29:703-728.
- Richards, R.I., Holman, K., Yu, S., Sutherland, G.R. 1993. Fragile X syndrome unstable element, p(CCG)n, and other simple tandem repeat sequences are binding sites for specific nuclear proteins. Hum Mol Genet 2(9):1429-1435.