PeakAnalyzer comprises two main utilities: PeakSplitter and PeakAnnotator.
PeakSplitter subdivides experimentally derived peak regions that contain more than one site of signal enrichment, optionally retrieving the genomic DNA sequences corresponding to subpeak summit regions. Local maxima are identified in the peak region, heights of neighboring maxima are compared, and the lowest value is multiplied by a user-adjustable parameter to yield the minimum read depth required to separate peaks. This facilitates more detailed analysis of individual subpeaks, which is particularly useful for discerning individual binding sites that may be present in aggregate peak regions and for obtaining candidate sequences for motif analysis.
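The splitting rule described above can be illustrated with a minimal Python sketch. It is not PeakSplitter's actual code. The function names, the `valley_fraction` parameter (standing in for the user-adjustable parameter) and its default value are hypothetical, and the sketch assumes one reading of the rule: two neighboring maxima are kept as separate subpeaks when the read depth between them drops to or below the lower maximum multiplied by `valley_fraction`.

```python
from typing import List


def local_maxima(depth: List[int]) -> List[int]:
    """Return indices of local maxima; a plateau is reported at its first index."""
    maxima = []
    n = len(depth)
    i = 0
    while i < n:
        j = i
        # Extend j to the end of a run of equal values (a plateau).
        while j + 1 < n and depth[j + 1] == depth[i]:
            j += 1
        left_lower = i == 0 or depth[i - 1] < depth[i]
        right_lower = j == n - 1 or depth[j + 1] < depth[i]
        if left_lower and right_lower:
            maxima.append(i)
        i = j + 1
    return maxima


def split_summits(depth: List[int], valley_fraction: float = 0.6) -> List[int]:
    """Return summit indices of subpeaks (assumed rule, see the text above)."""
    summits: List[int] = []
    for idx in local_maxima(depth):
        if not summits:
            summits.append(idx)
            continue
        prev = summits[-1]
        # Cutoff = lower of the two neighboring maxima times the parameter.
        cutoff = min(depth[prev], depth[idx]) * valley_fraction
        valley = min(depth[prev:idx + 1])
        if valley <= cutoff:
            summits.append(idx)      # deep enough valley: separate subpeak
        elif depth[idx] > depth[prev]:
            summits[-1] = idx        # same subpeak, keep the taller summit
    return summits
```

With `depth = [1, 5, 9, 5, 2, 6, 10, 4, 1]` the valley of depth 2 lies below the cutoff of 0.6 × 9, so the sketch reports two summits at indices 2 and 6. With `[1, 5, 9, 8, 7, 8, 10, 4, 1]` the valley of depth 7 is too shallow and a single summit at index 6 is kept.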
PeakAnnotator scans the target genome to identify and report functional elements proximal to the input set of peak loci. This facilitates automated annotation of large-scale experimental data, and obviates the need to import numerous coordinate sets into a genome browser for manual visualization and assessment. PeakAnnotator contains three main subroutines: Nearest Downstream Gene (NDG), Transcription Start Site (TSS) and Overlap Data Sets (ODS).
NDG locates the nearest downstream genes on both strands and calculates their distances. If the peak region intersects a gene, the program determines if the overlap is within an exon, intron, 5′ UTR or 3′ UTR. Multiple transcripts or genes overlapping a given location are all reported, providing a means to identify putative bi-directional promoters where the peak is proximal to genes on both strands.
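The strand-aware search can be sketched in Python as follows. This is an illustration only, not PeakAnnotator's implementation. The `Gene` record and the `nearest_downstream` function are hypothetical, and the sketch assumes that "downstream" is measured from the peak position to the 5′ end of the gene: the start coordinate for a gene on the + strand, the end coordinate for a gene on the − strand. Classification of overlaps into exon, intron and UTR is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost genomic coordinate
    end: int     # rightmost genomic coordinate
    strand: str  # "+" or "-"


def nearest_downstream(pos: int, genes: List[Gene],
                       strand: str) -> Optional[Tuple[Gene, int]]:
    """Return (gene, distance) for the closest gene on `strand` whose
    5' end lies at or downstream of `pos`, or None if there is none."""
    best: Optional[Tuple[Gene, int]] = None
    for g in genes:
        if g.strand != strand:
            continue
        if strand == "+":
            distance = g.start - pos   # gene begins to the right of the peak
        else:
            distance = pos - g.end     # gene begins to the left of the peak
        if distance < 0:
            continue                   # 5' end is not downstream of the peak
        if best is None or distance < best[1]:
            best = (g, distance)
    return best


genes = [
    Gene("A", 1000, 2000, "+"),
    Gene("B", 5000, 6000, "+"),
    Gene("C", 100, 400, "-"),
    Gene("D", 600, 700, "-"),
]
plus_hit = nearest_downstream(800, genes, "+")    # gene A, 200 nt away
minus_hit = nearest_downstream(800, genes, "-")   # gene D, 100 nt away
```

A peak at position 800 is close to gene A on the + strand and gene D on the − strand, which is the situation the text describes as a putative bi-directional promoter.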
TSS locates the nearest transcriptional start site relative to each locus, scanning both downstream and upstream of the experimental peak to account for transcription initiation on either the sense or antisense strand.
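A two-sided nearest-site lookup of this kind can be sketched with a binary search over sorted start-site coordinates. The function name `nearest_tss` and the sign convention of the returned offset (positive when the start site lies at a higher coordinate than the peak) are assumptions made for illustration. The sketch handles a single chromosome and a non-empty list of start sites.

```python
from bisect import bisect_left
from typing import List, Tuple


def nearest_tss(pos: int, tss_sorted: List[int]) -> Tuple[int, int]:
    """Return (tss, offset) for the start site closest to `pos`.

    `tss_sorted` must be non-empty and sorted in ascending order.
    """
    i = bisect_left(tss_sorted, pos)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = tss_sorted[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda t: abs(t - pos))
    return best, best - pos


tss_sites = [100, 500, 900]
hit_right = nearest_tss(450, tss_sites)    # (500, 50): site at a higher coordinate
hit_left = nearest_tss(1000, tss_sites)    # (900, -100): site at a lower coordinate
```

Checking both neighbors of the insertion point is what makes the search two-sided. A scan in one direction only would miss the site at 900 for a peak at 1000.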
ODS calculates the overlap in positions/peaks between data sets, where loci that intersect by at least one nucleotide on either strand are reported and a P-value of overlap enrichment over random is calculated.
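The one-nucleotide overlap criterion can be sketched in Python as shown here. The text does not state how the P-value is computed, so the Monte Carlo shuffling in `empirical_p` is a stand-in chosen for illustration and should not be read as the method ODS uses. All names are hypothetical, intervals are treated as closed coordinates on a single chromosome, and strand is ignored.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval (start, end), start <= end


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one nucleotide."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_overlapping(query: List[Interval], reference: List[Interval]) -> int:
    """Number of query loci that intersect at least one reference locus."""
    return sum(any(overlaps(q, r) for r in reference) for q in query)


def empirical_p(query: List[Interval], reference: List[Interval],
                genome_length: int, n_iter: int = 1000,
                seed: int = 0) -> Tuple[int, float]:
    """Return (observed overlap count, empirical P-value).

    Assumed null model: query loci are placed uniformly at random,
    keeping their widths. This is an illustrative choice only.
    """
    rng = random.Random(seed)
    observed = count_overlapping(query, reference)
    at_least_as_extreme = 0
    for _ in range(n_iter):
        shuffled = []
        for start, end in query:
            width = end - start
            new_start = rng.randint(1, genome_length - width)
            shuffled.append((new_start, new_start + width))
        if count_overlapping(shuffled, reference) >= observed:
            at_least_as_extreme += 1
    # The +1 terms keep the estimate strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_iter + 1)


query = [(100, 200), (300, 400), (1000, 1100)]
reference = [(150, 160), (400, 500), (5000, 5100)]
observed, p_value = empirical_p(query, reference, genome_length=10000)
```

Here `(300, 400)` and `(400, 500)` share only position 400 and still count as overlapping, so the observed count is 2. The pairwise comparison is quadratic in the number of loci, so a sorted sweep or an interval tree would be a better fit for genome-scale data sets.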