The program YAPP searches a DNA sequence for core promoter elements and for putative synergistic combinations of these core elements, such as an Initiator (INR) motif at -2 in combination with a downstream promoter element (DPE) motif at +28. The expectation is that synergistic combinations are less likely to occur by chance than single elements, and that two or more weak elements can combine to make a functional promoter. The approach is based on a statistical analysis of eukaryotic core promoter elements (Gershenzon 2005). The program is designed for scanning short, previously identified putative or experimentally determined promoter regions. It is not a predictive tool for locating promoter regions in large genomic sequences.
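
As a minimal sketch of how such a pairwise combination could be checked, the Python below tests whether two hits are spaced as they would be at -2 and +28. It assumes that the individual INR and DPE matches have already been found and are given as 0-based start indices in the scanned sequence. Promoter coordinates have no position 0 (+1 is the TSS and -1 the base before it), so motifs starting at -2 and +28 are 29 bases apart rather than 30. The function names and the `tolerance` parameter are hypothetical and are not taken from YAPP.

```python
from typing import List, Tuple


def offset_from_tss(pos: int) -> int:
    """Convert a promoter coordinate (+1 = TSS, no position 0) to a 0-based offset from the TSS base."""
    if pos == 0:
        raise ValueError("promoter coordinates have no position 0")
    # +1 -> 0, +28 -> 27; negative coordinates are already offsets (-1 -> -1, -2 -> -2).
    return pos - 1 if pos > 0 else pos


def synergistic_pairs(
    first_hits: List[int],
    second_hits: List[int],
    first_pos: int = -2,   # INR start relative to the TSS
    second_pos: int = 28,  # DPE start relative to the TSS
    tolerance: int = 1,    # hypothetical slack, in bases
) -> List[Tuple[int, int]]:
    """Return (first, second) hit pairs whose spacing matches the expected offset within tolerance."""
    expected = offset_from_tss(second_pos) - offset_from_tss(first_pos)  # 27 - (-2) = 29
    return [
        (a, b)
        for a in first_hits
        for b in second_hits
        if abs((b - a) - expected) <= tolerance
    ]


# Hypothetical hit positions (0-based motif start indices).
print(synergistic_pairs([10, 40], [39, 75]))  # [(10, 39)]
```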

The algorithm is a basic, unoptimised sliding-window search for motifs represented as Position Weight Matrices (PWMs). It calculates a 'Conservation Index' and a Matrix Similarity Score (after Cartharius 2005) for each match in the sequence. The program accepts an optional parameter giving the position of the transcription start site (TSS). When this parameter is supplied, the search is restricted to elements that lie within the functional range of that TSS.
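
The sliding-window scan and the two scores can be sketched in Python as follows. This is an illustration under stated assumptions, not YAPP's code. Each PWM is assumed to be a list of per-position dictionaries of A/C/G/T weights (counts, fractions or percentages). The Conservation Index uses the information-content form Ci = 100/ln(4) * (sum of p*ln(p) + ln(4)), a 4-letter variant with no gap term. The Matrix Similarity Score is (current - min) / (max - min), where each term sums Ci-weighted base frequencies over the matrix positions. The `threshold` and `tss_range` defaults are hypothetical placeholders. The functional range actually used by YAPP is not stated here.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"
Column = Dict[str, float]


def normalise(column: Column) -> Column:
    """Scale one PWM column so its A/C/G/T values sum to 1."""
    total = sum(column[b] for b in BASES)
    return {b: column[b] / total for b in BASES}


def conservation_index(column: Column) -> float:
    """Ci in [0, 100]: 0 for a uniform column, 100 for a fully conserved one."""
    p = normalise(column)
    plogp = sum(p[b] * math.log(p[b]) for b in BASES if p[b] > 0)
    return 100.0 / math.log(4) * (plogp + math.log(4))


def matrix_similarity(pwm: List[Column], kmer: str) -> float:
    """MSS = (current - min) / (max - min), with each position weighted by its Ci."""
    cols = [normalise(c) for c in pwm]
    ci = [conservation_index(c) for c in cols]
    current = sum(w * c[b] for w, c, b in zip(ci, cols, kmer))
    low = sum(w * min(c.values()) for w, c in zip(ci, cols))
    high = sum(w * max(c.values()) for w, c in zip(ci, cols))
    return (current - low) / (high - low) if high > low else 0.0


def scan(
    sequence: str,
    pwm: List[Column],
    threshold: float = 0.8,                  # hypothetical MSS cut-off
    tss: Optional[int] = None,               # 0-based index of the TSS base, if known
    tss_range: Tuple[int, int] = (-50, 50),  # hypothetical allowed start offsets from the TSS
) -> List[Tuple[int, float]]:
    """Slide the PWM along the sequence and return (start, MSS) for every window at or above threshold."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        kmer = seq[start:start + width]
        if any(b not in BASES for b in kmer):
            continue  # skip windows containing N or other ambiguity codes
        if tss is not None and not (tss_range[0] <= start - tss <= tss_range[1]):
            continue  # outside the allowed window around the supplied TSS
        score = matrix_similarity(pwm, kmer)
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy 4-position matrix for "TATA", in percentages (illustrative values only).
tata = [
    {"A": 1, "C": 1, "G": 1, "T": 97},
    {"A": 97, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 97},
    {"A": 97, "C": 1, "G": 1, "T": 1},
]
print(scan("GGTATAGG", tata))
```

Every window is scored against every matrix position, and the Ci values are recomputed per window. The cost therefore grows with sequence length times motif width for each PWM, which is acceptable only for short promoter regions.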

Position Weight Matrices

The TATA box PWM was taken from the EPD Promoter Elements Page, where it was HMM-trained on 600 unrelated vertebrate promoter sequences. The values given there have been converted to percentages.
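
Converting a matrix to percentages only rescales each position so that its four values sum to 100. The relative weights within a position, and therefore any score computed from normalised frequencies, are unchanged apart from rounding. The minimal Python sketch below assumes each position is stored as a dictionary mapping base to count or probability. This representation is hypothetical and not necessarily the one YAPP uses.

```python
from typing import Dict, List


def to_percentages(matrix: List[Dict[str, float]], decimals: int = 1) -> List[Dict[str, float]]:
    """Rescale every position so its base values sum to (approximately) 100."""
    result = []
    for column in matrix:
        total = sum(column.values())
        result.append({base: round(100.0 * value / total, decimals) for base, value in column.items()})
    return result


# Hypothetical counts for a single position.
print(to_percentages([{"A": 5, "C": 15, "G": 30, "T": 50}]))
# [{'A': 5.0, 'C': 15.0, 'G': 30.0, 'T': 50.0}]
```

Because of rounding, a converted position may sum to 99.9 or 100.1 rather than exactly 100.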

The PWM provided on the EPD Promoter Elements Page for the INR motif has the consensus sequence [TG]C[AT][GTC][TCA][CT][TCG][TC]. This is not consistent with the consensus sequence [CT][TC]A-[TA][CT][TC] given in more recent literature (e.g. Gershenzon), for which no PWMs are available.
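
One way to read a bracketed consensus such as [TG]C[AT]... off a PWM is to keep, at each position, every base whose frequency reaches a cut-off. The Python sketch below does this. The 20% cut-off and the ordering of bases by decreasing frequency are assumptions for illustration. They are not the procedure used by EPD or in the cited literature.

```python
from typing import Dict, List


def consensus(pwm: List[Dict[str, float]], min_fraction: float = 0.2) -> str:
    """Bracketed consensus: each position lists the bases at or above min_fraction (hypothetical cut-off)."""
    parts = []
    for column in pwm:
        total = sum(column.values())
        # Keep qualifying bases, most frequent first (ties keep dictionary order).
        kept = sorted(
            (b for b, v in column.items() if v / total >= min_fraction),
            key=lambda b: -column[b],
        )
        if not kept:
            parts.append("N")  # no base reaches the cut-off
        elif len(kept) == 1:
            parts.append(kept[0])
        else:
            parts.append("[" + "".join(kept) + "]")
    return "".join(parts)


# Toy 3-position matrix (illustrative counts only).
toy = [
    {"A": 5, "C": 5, "G": 40, "T": 50},
    {"A": 2, "C": 90, "G": 4, "T": 4},
    {"A": 45, "C": 5, "G": 5, "T": 45},
]
print(consensus(toy))  # [TG]C[AT]
```

Raising `min_fraction` to 0.5 turns the same toy matrix into TCN. The same matrix can therefore be summarised quite differently depending on the cut-off chosen.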

I have therefore used separate sources for the INR motif (Chalkley and Verrijzer 1999) and for the DPE and BRE elements (Jin et al. 2006).

Because the search algorithm is not optimised, the program is not suitable (or intended) for searching sequences longer than 2 kb.

References