================================================================================
    PeakSplitter Overview 
================================================================================

////////////////////////////////////////////////////////////////////////////////
INSTALLATION
////////////////////////////////////////////////////////////////////////////////

The source code and a Visual Studio 2005 solution (.sln) file are included.

PeakSplitter uses external libraries:
1. The iMatix SFL, Copyright © 1991-2000 iMatix Corporation <http://www.imatix.com>
2. zlib - Copyright (C) 1995-2005 Jean-loup Gailly and Mark Adler
3. tclap - Copyright © 2003,2004,2005,2006,2009 Michael E. Smoot


////////////////////////////////////////////////////////////////////////////////
USAGE
////////////////////////////////////////////////////////////////////////////////

Usage: PeakSplitter.exe <-p peakfile> <-w wig file/folder> <-o output folder> [options]

Options:

   --version
     Displays version information and exits.

   -h,  --help
     Displays usage information and exits.

   -p <string>,  --peakFile <string>
     (required)  input peak file

   -w <string>,  --wigFile <string>
     (required)  input wig file or folder

   -o <string>,  --outDir <string>
     (required)  output folder

   -x <string>,  --prefix <string>
     string to add to output file names

   -c <int>,  --cutoff <int>
     height cutoff (default 5)

   -v <float>,  --valley <float>
     float value to determine the valley depth required for peak separation
     (default 0.6)

   -f,  --fetch
     whether to fetch subpeak sequences or not (default true)

   -u <string>,  --url <string>
     DAS URL where to get sequences from (default is for human
     "http://www.ensembl.org/das/Homo_sapiens.GRCh37.reference")

   -n <int>,  --numSeq <int>
     number of best peak sequences to fetch (default 300)

   -l <int>,  --length <int>
     length of sequence to fetch (default 60)
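
As an illustration of how the options above combine, the following Python sketch
assembles a PeakSplitter command line. It is not part of PeakSplitter: the function
name build_command and the file names in the example are hypothetical, and only
options listed above are used.

```python
def build_command(peak_file, wig_path, out_dir, cutoff=5, valley=0.6):
    """Assemble a PeakSplitter argument list from the documented options."""
    return [
        "PeakSplitter.exe",
        "-p", peak_file,    # required: input peak file
        "-w", wig_path,     # required: wig file or folder
        "-o", out_dir,      # required: output folder
        "-c", str(cutoff),  # height cutoff (documented default 5)
        "-v", str(valley),  # valley depth (documented default 0.6)
    ]


if __name__ == "__main__":
    # Hypothetical file and folder names, for illustration only.
    print(" ".join(build_command("myPeaks.test", "wigFolder", "results")))
```

The list form can be passed to subprocess.run() as is; joining it with spaces gives
the line to type at a command prompt.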


*** -p/--peakFile
This is a REQUIRED parameter for PeakSplitter. 
The file lists the genomic coordinates output by a peak calling program 
(or obtained in some other way). The format should be tab- or space-delimited,
where each locus is described by its "chromosome", "start" and "end" location.
This file should be sorted by chromosome and start position.
PLEASE REMOVE ANY HEADER LINES FROM THE FILE IF ANY ARE PRESENT.
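
A peak file that still has header lines, or that is not sorted, can be cleaned up
before it is passed to -p. The Python sketch below shows one possible way to do so.
It is not part of PeakSplitter: the function name clean_peak_lines is hypothetical,
and the rule used to recognize a header line (its second and third fields are not
integers) is an assumption. Chromosome names are sorted as plain strings, since this
document does not state which chromosome order PeakSplitter expects, and any columns
after the third are dropped.

```python
def clean_peak_lines(lines):
    """Drop header/blank lines and sort loci by chromosome, then start."""
    loci = []
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if len(fields) < 3:
            continue  # blank or malformed line
        chrom, start, end = fields[0], fields[1], fields[2]
        if not (start.isdigit() and end.isdigit()):
            continue  # assumed to be a header line
        loci.append((chrom, int(start), int(end)))
    # Plain string order for chromosomes (an assumption), numeric order for starts.
    loci.sort(key=lambda locus: (locus[0], locus[1]))
    return ["%s\t%d\t%d" % locus for locus in loci]


if __name__ == "__main__":
    raw = [
        "chr\tstart\tend",
        "chr2\t500\t900",
        "chr1\t3000\t3400",
        "chr1 100 450",
    ]
    for row in clean_peak_lines(raw):
        print(row)
```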

*** -w/--wigFile
This is a REQUIRED parameter for PeakSplitter.
This can be a wig file OR a wig folder that contains one wig file for each chromosome.
The wig files describe the signal (usually the number of reads) along the genome and
are created by the peak-calling program that generated the peak file.
PeakSplitter supports wig files in variableStep or bedGraph format.
The wig header lines, "track type" and "variableStep" (when using the variableStep
format), are required.
The files can be zipped or gzipped, so it is not necessary to uncompress them.

Wig file names for each chromosome (inside the wig folder) should contain the string
"chr" followed by the chromosome number, for example "my.chr12.wig".

*** -o/--outDir
This is a REQUIRED parameter for PeakSplitter.
An output directory must be specified where PeakSplitter can write the result files. 

*** -x/--prefix
String to add to output file names, for example when the same peak file is analyzed
using different parameters.

*** -c/--cutoff
Height cutoff (default 5). Only subpeaks with at least this number of reads in 
their summit region will be reported.

*** -v/--valley 
Real value indicating the valley depth required for peak separation (default 0.6).
Local maxima are found within each peak, and the heights of neighboring local
maxima are compared. The lower of the two heights is multiplied by the valley value
to yield the threshold used to separate the two peaks.
For example, a value of 0.5 means that the height of the valley must be less than
half the height of the lower summit in order for the two summits to be separated.
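
The separation rule described above can be sketched in Python as follows. This is
one reading of the description, not PeakSplitter's actual implementation: the
function name should_split and its arguments are hypothetical, and the strict
"less than" comparison is taken from the wording of the example.

```python
def should_split(left_summit, right_summit, valley_height, valley=0.6):
    """Return True if two neighboring local maxima count as separate subpeaks.

    The lower summit height is scaled by the valley value; the valley between
    the summits must lie below that threshold for the peaks to be separated.
    """
    threshold = valley * min(left_summit, right_summit)
    return valley_height < threshold


if __name__ == "__main__":
    # Summits of 20 and 10 reads with -v 0.5 give a threshold of 5 reads.
    print(should_split(20, 10, 4, valley=0.5))  # True: valley of 4 is below 5
    print(should_split(20, 10, 7, valley=0.5))  # False: valley of 7 is not below 5
```

Under this reading, a larger valley value makes separation easier, because shallower
valleys are enough to split a peak.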

*** -f/--fetch
By default, PeakSplitter fetches the subpeak sequences near their summit region.
To turn this feature off, use:
-f false

*** -u/--url
The sequences are exported directly from the DAS Ensembl database.
The user has to specify the DAS URL for the organism of interest.
The DAS URL for the human database is "http://www.ensembl.org/das/Homo_sapiens.GRCh37.reference", 
and URLs for other organisms can be found at: "http://www.ensembl.org/das/dsn"

*** -n/--numSeq
Number of best subpeak sequences to fetch (those with the highest numbers of reads 
in their summit region). These sequences can be used as input for motif prediction 
tools such as MEME. 
The default number is 300. This is the maximum number of sequences the web-based 
version of MEME will accept (more sequences can be input when run locally).

*** -l/--length
Length of sequence to fetch (default 60).
The sequences are retrieved around the summit position.
If the length is 60, 30 bp upstream of the peak summit position and 30 bp downstream
are included. Counting the summit position itself, the total sequence length is 61.
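
The arithmetic above can be checked with a short Python sketch. The function name
summit_window is hypothetical, and the use of integer division for odd lengths is an
assumption, since this document only describes the case of length 60.

```python
def summit_window(summit, length=60):
    """Return the (first, last) positions of the sequence around a summit."""
    half = length // 2  # assumption: odd lengths are rounded down
    return summit - half, summit + half


if __name__ == "__main__":
    first, last = summit_window(1000, 60)
    # 30 bp on each side plus the summit position itself.
    print(first, last, last - first + 1)  # 970 1030 61
```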

////////////////////////////////////////////////////////////////////////////////
OUTPUT FILES
////////////////////////////////////////////////////////////////////////////////

If the -x parameter is specified, all output file names will start with the
string given after the -x flag.

1. peakFileName.subpeaks.inputFileNameSuffix
For example, if the input peak file is "myPeaks.test", 
the output file will be "myPeaks.subpeaks.test"

This is a tab-delimited file which contains information about subpeaks, including:
a. Chromosome name
b. Start position of the subpeak
c. End position of the subpeak 
d. Number of reads at the peak summit position
e. Subpeak summit position relative to the start position of the subpeak region.
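
A line of this file can be read with a few lines of Python, sketched below. The
names Subpeak, parse_subpeak_line and absolute_summit are hypothetical, and the
example row is made up. Two assumptions are made: the read count is parsed as a
float in case the wig signal is not an integer, and the absolute summit coordinate
is taken to be the start position plus the relative summit position.

```python
from collections import namedtuple

# Field order follows columns a-e listed above.
Subpeak = namedtuple("Subpeak", "chrom start end height summit_offset")


def parse_subpeak_line(line):
    """Parse one tab-delimited line of the subpeaks output file."""
    fields = line.rstrip("\n").split("\t")
    return Subpeak(
        fields[0],
        int(fields[1]),
        int(fields[2]),
        float(fields[3]),
        int(fields[4]),
    )


def absolute_summit(subpeak):
    """Genomic summit coordinate, assuming offset is counted from start."""
    return subpeak.start + subpeak.summit_offset


if __name__ == "__main__":
    row = parse_subpeak_line("chr1\t1000\t1400\t37\t150\n")  # made-up example row
    print(row.chrom, absolute_summit(row))
```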

2. peakFileName(without suffix).bestSubpeaks.fa
For example, if the input peak file is "myPeaks.test", the output file will be
"myPeaks.bestSubpeaks.fa".

This is a FASTA file containing the sequences of the best subpeaks
(those with the highest numbers of reads at their summit position).
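
The naming scheme of the two output files can be sketched in Python as follows. The
function name output_names is hypothetical. The sketch assumes that the suffix is
everything after the last dot of the input file name, and it does not model the -x
prefix.

```python
import os


def output_names(peak_file):
    """Return the (subpeaks file, FASTA file) names for an input peak file."""
    base, suffix = os.path.splitext(os.path.basename(peak_file))
    subpeaks_name = base + ".subpeaks" + suffix
    fasta_name = base + ".bestSubpeaks.fa"
    return subpeaks_name, fasta_name


if __name__ == "__main__":
    # Prints: ('myPeaks.subpeaks.test', 'myPeaks.bestSubpeaks.fa')
    print(output_names("myPeaks.test"))
```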


