YAPP Eukaryotic Core Promoter Predictor

The program searches a DNA sequence for the elements of canonical core promoters (TATA boxes, initiators, etc.) and for putative synergistic combinations of these elements, such as an Initiator (INR) motif at -2 in combination with a DPE motif at +28. The expectation is that synergistic combinations are less likely to occur by chance than single elements, and that two or more weak elements can combine to make a functional promoter. The approach is based on a statistical analysis of eukaryotic core promoter elements (Gershenzon 2005). The program is designed for scanning sequences of putative or experimentally determined promoter regions.

The application is not suitable as a predictive tool for promoters which do not contain these canonical core promoter motifs, such as CpG-rich promoter regions. For CpG Island detection try EMBOSS CpGPlot.

The algorithm uses a sliding window to search for matches to motifs represented as Position Weight Matrices (PWMs), and calculates a 'Conservation Index' and a Matrix Similarity Score (after Cartharius 2005) for each match in the sequence. The program accepts an optional parameter indicating the position of the transcription start site (TSS), in which case the search is restricted to elements that lie within the functional range of the specified TSS.
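The sliding-window scan described above can be sketched roughly as follows. This is an illustration only, not YAPP's actual code: it assumes a MatInspector-style conservation index and matrix similarity, the normalising constant (ln 4, for a four-letter alphabet) is an assumption, and the function names and the 0.85 default threshold are hypothetical. The matrix is the INR PWM listed later in this document.

```python
import math

BASES = "ACGT"

# INR PWM from this document (percentages; columns are A, C, G, T).
INR = [
    (0.0, 55.36, 0.0, 44.64),
    (0.0, 75.0, 0.0, 25.0),
    (100.0, 0.0, 0.0, 0.0),
    (23.21, 28.57, 26.79, 21.43),
    (28.57, 0.0, 0.0, 71.43),
    (16.07, 42.86, 0.0, 41.07),
    (0.0, 51.79, 16.07, 32.14),
]


def conservation_index(column):
    """Entropy-based index: 0 for a uniform column, 100 for a fully conserved one.

    Assumption: normalised with ln 4; the published definition may differ.
    """
    probs = [value / 100.0 for value in column]
    entropy_term = sum(p * math.log(p) for p in probs if p > 0)
    return 100.0 / math.log(4) * (entropy_term + math.log(4))


def matrix_similarity(pwm, site):
    """Ci-weighted score of `site`, relative to the best possible score (0..1)."""
    ci = [conservation_index(column) for column in pwm]
    observed = sum(c * column[BASES.index(base)]
                   for c, column, base in zip(ci, pwm, site))
    best = sum(c * max(column) for c, column in zip(ci, pwm))
    return observed / best


def scan(pwm, sequence, threshold=0.85):
    """Slide a window of the PWM's width along `sequence`; return (start, site, score)."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = matrix_similarity(pwm, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits
```

Calling `scan(INR, "GGGGCCACTCCGGGG", threshold=0.99)` reports a single hit at 0-based start 4, where the window `CCACTCC` carries the most frequent base at every position. Weighting each position by its conservation index means a mismatch at the invariant A costs far more than one at the nearly uniform fourth position.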

Position Weight Matrices

The TATA box PWM was taken from the EPD Promoter Elements Page, where it was HMM-trained from 600 unrelated vertebrate promoter sequences. The given values have been converted to percentages.

The PWM provided in the EPD Promoter Elements Page for the INR motif has the consensus sequence [TG]C[AT][GTC][TCA][CT][TCG][TC]. This is not consistent with the consensus sequence [CT][TC]A-[TA][CT][TC] given in more recent literature (e.g. Gershenzon 2005). I have therefore used a separate source for the INR motif (Chalkley and Verrijzer, 1999). The DPE, MTE and BRE element PWMs are from Jin et al. 2006.
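To show how a PWM relates to a bracketed consensus of the kind quoted above, the sketch below collapses the INR matrix used by YAPP into a consensus string. The `consensus` helper and its 20% cutoff are hypothetical choices made for illustration; they are not part of YAPP, and a different cutoff would give a different string.

```python
# INR PWM from this document (percentages; columns are A, C, G, T).
INR = [
    (0.0, 55.36, 0.0, 44.64),
    (0.0, 75.0, 0.0, 25.0),
    (100.0, 0.0, 0.0, 0.0),
    (23.21, 28.57, 26.79, 21.43),
    (28.57, 0.0, 0.0, 71.43),
    (16.07, 42.86, 0.0, 41.07),
    (0.0, 51.79, 16.07, 32.14),
]


def consensus(pwm, cutoff=20.0):
    """Keep, per position, every base whose percentage is at least `cutoff`."""
    parts = []
    for column in pwm:
        kept = "".join(base for base, value in zip("ACGT", column) if value >= cutoff)
        # A single surviving base is written bare; several are bracketed.
        parts.append(kept if len(kept) == 1 else "[" + kept + "]")
    return "".join(parts)


print(consensus(INR))  # [CT][CT]A[ACGT][AT][CT][CT]
```

With this cutoff the fourth position admits any base, and the other six positions agree with the literature consensus [CT][TC]A-[TA][CT][TC].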

If you find this program useful and have any comments or suggestions for improvement, please contact Chris Joyce.




Core Promoter Element PWMs used by YAPP

INR
Pos   A       C       G       T
1     0       55.36   0       44.64
2     0       75      0       25
3     100     0       0       0
4     23.21   28.57   26.79   21.43
5     28.57   0       0       71.43
6     16.07   42.86   0       41.07
7     0       51.79   16.07   32.14

TATA
Pos   A       C       G       T
1     17.7    21.1    29      32.2
2     19.3    36.1    36.4    8.2
3     6.6     14.8    6.8     71.8
4     83.4    0       0       16.6
5     0       0       0       100
6     95      0       0       5
7     72.3    0       0       27.7
8     94.2    0       5.8     0
9     53.3    0       20.1    26.6
10    29.3    9       51.2    10.5
11    17.7    32.5    37.7    12.1
12    22.7    33      33.2    11.1

MTE
Pos   A       C       G       T
1     3.4     34.5    60.3    1.7
2     24.1    41.4    31      3.4
3     87.9    3.4     8.6     0
4     8.6     5.2     74.1    12.1
5     1.7     94.8    0       3.4
6     1.7     41.4    53.4    3.4
7     10.3    44.8    44.8    0
8     43.1    0       56.9    0
9     12.1    8.6     67.2    12.1
10    5.2     86.2    3.4     5.2
11    1.7     5.2     89.7    3.4
12    17.2    34.5    46.6    1.7

DPE
Pos   A       C       G       T
1     51.7    0       48.3    0
2     0       0       100     0
3     58.8    0       0       41.2
4     0       55.2    0       44.8
5     21.5    30.5    48      0

BRE
Pos   A       C       G       T
1     0       68.9    31.1    0
2     0       67.6    32.4    0
3     35.1    0       64.9    0
4     0       100     0       0
5     0       0       100     0
6     0       100     0       0
7     0       100     0       0


Searched synergistic element combinations

Element 1   Element 2   Min. gap   Max. gap
MTE         DPE         10         10
BRE         INR         21         42
BRE         DPE         53         73
TATA        MTE         38         58
TATA        DPE         48         68
TATA        INR         16         37
BRE         TATA        5          5
INR         DPE         28         30
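
As a rough illustration of how the combinations listed above might be applied to individual motif hits, the sketch below pairs hits whose spacing falls inside the allowed range. It is not YAPP's actual code. It assumes the gap is measured from the start of element 1 to the start of element 2, which the table does not state explicitly, and the `find_combinations` name and the input layout are hypothetical.

```python
# (element 1, element 2, min gap, max gap), as listed in the table above.
COMBINATIONS = [
    ("MTE", "DPE", 10, 10),
    ("BRE", "INR", 21, 42),
    ("BRE", "DPE", 53, 73),
    ("TATA", "MTE", 38, 58),
    ("TATA", "DPE", 48, 68),
    ("TATA", "INR", 16, 37),
    ("BRE", "TATA", 5, 5),
    ("INR", "DPE", 28, 30),
]


def find_combinations(hits):
    """Return (element 1, start 1, element 2, start 2) for every allowed pairing.

    `hits` maps an element name to a list of start positions in the sequence.
    Assumption: gap = start of element 2 minus start of element 1.
    """
    found = []
    for first, second, min_gap, max_gap in COMBINATIONS:
        for start1 in hits.get(first, []):
            for start2 in hits.get(second, []):
                if min_gap <= start2 - start1 <= max_gap:
                    found.append((first, start1, second, start2))
    return found


example = {"INR": [100], "DPE": [129, 140]}
print(find_combinations(example))  # [('INR', 100, 'DPE', 129)]
```

Only the DPE hit at 129 is reported, because its distance of 29 from the INR hit lies within the 28 to 30 range, while the hit at 140 is too far downstream.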