The Disperse project aims to develop and distribute software and data for the design of selector assays for exon resequencing applications. The software consists of Java and Perl code integrated into a pipeline that performs all tasks required to transform a list of gene names into a set of selector probes targeting all exonic regions of those genes.
For descriptions of the selector technology and its applications, see the references page.
In brief, designing selector probes for a given target gene involves selecting a set of restriction enzymes that generate suitable restriction fragments, and then assembling sequences of selector probes that target the fragments containing the sequences of interest.
This task is divided into several stages, each of which consumes data from previous stages and produces output that is used in later stages or is useful for other purposes.
To perform a selector design using this software, a number of things are required:
A complete design job is divided into the following stages:
Given a set of gene names, Disperse will find the coordinates for all regions of coding sequence for each gene. First, the CCDS data is checked. If the gene is not found, Disperse will access the NCBI Gene database to extract the coding sequence regions. Overlapping and adjacent regions are merged.
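The merging of overlapping and adjacent regions can be illustrated with a minimal Python sketch. This is not Disperse's own code; the function name `merge_regions` and the use of 1-based inclusive coordinates are assumptions made for illustration.

```python
def merge_regions(regions):
    """Merge overlapping and adjacent (start, end) regions.

    Coordinates are assumed to be 1-based and inclusive, so (10, 20)
    and (21, 30) count as adjacent and become (10, 30).
    """
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical coding sequence regions for one gene.
cds = [(100, 200), (150, 250), (251, 300), (400, 500)]
print(merge_regions(cds))  # [(100, 300), (400, 500)]
```

Sorting first means each region only has to be compared with the most recently merged one.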
For each gene, a set of regions of interest (ROIs) is generated from the coding sequence coordinates by adding a number of flanking positions on either side of each sequence and merging any overlapping regions. The flank size is specified by the user, so that a desired number of bases on either side of each CDS is included in the regions targeted for selection.
The sequence of each ROI, together with an additional number of bases on each side, is extracted using the fastacmd program and a Blast database.
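As a rough illustration of how flanks turn coding sequence regions into ROIs, the following Python sketch extends each region and merges the results. The function name `make_rois`, the optional `seq_length` clamp, and the 1-based inclusive coordinates are assumptions, not part of Disperse.

```python
def make_rois(cds_regions, flank, seq_length=None):
    """Extend each (start, end) region by `flank` bases and merge overlaps.

    Coordinates are assumed to be 1-based and inclusive. If `seq_length`
    is given, extended regions are clamped to the end of the sequence.
    """
    extended = []
    for start, end in cds_regions:
        new_start = max(1, start - flank)
        new_end = end + flank
        if seq_length is not None:
            new_end = min(seq_length, new_end)
        extended.append((new_start, new_end))

    rois = []
    for start, end in sorted(extended):
        if rois and start <= rois[-1][1]:
            # Flanks made this region overlap the previous one: merge them.
            rois[-1] = (rois[-1][0], max(rois[-1][1], end))
        else:
            rois.append((start, end))
    return rois


# Two nearby exons merge once 10-base flanks are added; the third stays separate.
print(make_rois([(5, 50), (60, 100), (300, 400)], flank=10))
# [(1, 110), (290, 410)]
```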
If the SNP data file is present, all SNPs within the sequences are extracted from this file.
The extracted SNPs are added to the sequences to produce a set of sequences with SNP codes.
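One plausible way to encode SNPs in a sequence is with IUPAC ambiguity codes. The sketch below assumes that encoding; the exact SNP code format used by Disperse is not described here. The function name `apply_snps` and the 0-based position mapping are likewise hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the sorted allele string.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def apply_snps(sequence, snps):
    """Return `sequence` with each SNP position replaced by an IUPAC code.

    `snps` maps a 0-based position to the set of alternative alleles.
    The reference base is always counted among the alleles. The input
    sequence is assumed to contain only A, C, G and T.
    """
    bases = list(sequence.upper())
    for pos, alleles in snps.items():
        observed = {a.upper() for a in alleles} | {bases[pos]}
        bases[pos] = IUPAC_CODES["".join(sorted(observed))]
    return "".join(bases)


# C/T at position 1 becomes Y, A/G at position 4 becomes R.
print(apply_snps("ACGTACGT", {1: {"T"}, 4: {"G"}}))  # AYGTRCGT
```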
The PieceMaker software is used to find all restriction fragments that fulfill the specified design criteria, and to select a combination of restriction reactions that maximizes the portion of the ROIs that is included in selectable fragments.
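The idea of choosing a combination of restriction reactions that maximizes ROI coverage can be sketched in Python as follows. PieceMaker's actual algorithm is not described here; the exhaustive search over small combinations is a stand-in, and the function names, reaction names, and coordinates are all hypothetical.

```python
from itertools import combinations


def covered_positions(fragments, rois):
    """Return the ROI positions that fall inside at least one fragment."""
    roi_positions = set()
    for start, end in rois:
        roi_positions.update(range(start, end + 1))
    covered = set()
    for start, end in fragments:
        covered.update(p for p in range(start, end + 1) if p in roi_positions)
    return covered


def best_reactions(reactions, rois, max_reactions):
    """Pick at most `max_reactions` reactions that together cover the most ROI bases.

    `reactions` maps a reaction name to the (start, end) fragments it
    yields that already satisfy the design criteria.
    """
    best_names, best_cover = (), set()
    for k in range(1, max_reactions + 1):
        for names in combinations(sorted(reactions), k):
            cover = set()
            for name in names:
                cover |= covered_positions(reactions[name], rois)
            # Strict comparison keeps the smallest combination on ties.
            if len(cover) > len(best_cover):
                best_names, best_cover = names, cover
    return list(best_names), len(best_cover)


rois = [(10, 19), (40, 49)]  # 20 ROI bases in total
reactions = {
    "rxn1": [(5, 25)],
    "rxn2": [(35, 44)],
    "rxn3": [(12, 15), (45, 49)],
}
print(best_reactions(reactions, rois, max_reactions=2))  # (['rxn1', 'rxn2'], 15)
```

Exhaustive search is only practical for a small number of candidate reactions; a real design tool would likely need a heuristic.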
PieceMaker is now used to select a subset of the restriction fragments generated by the selected restriction reactions. This subset is selected to minimize the number of fragments required to achieve optimal coverage.
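Selecting a small subset of fragments that preserves coverage is a set cover problem. The greedy heuristic below is one common way to approach it, shown here as an assumption-laden sketch rather than PieceMaker's method. Greedy selection does not guarantee the true minimum, and `select_fragments` is a hypothetical name.

```python
def select_fragments(fragments, rois):
    """Greedily choose fragments until no fragment adds ROI coverage.

    `fragments` and `rois` are lists of 1-based inclusive (start, end) pairs.
    """
    uncovered = set()
    for start, end in rois:
        uncovered.update(range(start, end + 1))

    def gain(fragment):
        # ROI bases this fragment would newly cover.
        return uncovered.intersection(range(fragment[0], fragment[1] + 1))

    chosen = []
    remaining = list(fragments)
    while uncovered and remaining:
        best = max(remaining, key=lambda f: len(gain(f)))
        newly_covered = gain(best)
        if not newly_covered:
            break  # No remaining fragment helps.
        chosen.append(best)
        uncovered -= newly_covered
        remaining.remove(best)
    return chosen


# (15, 35) covers the most ROI bases; (5, 20) fills in the rest; (18, 22) is redundant.
print(select_fragments([(5, 20), (15, 35), (18, 22)], [(10, 30)]))
# [(15, 35), (5, 20)]
```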
This step creates a file with information about the amplicons that should be generated by the designed set of selector probes.
The sequences of the selector probes are assembled based on the target fragments and a general sequence motif. This step is carried out by the ProbeMaker software.
A number of the files created in the previous steps are consolidated into a set of output files providing an overview of the design.
Disperse has been tested on openSUSE Linux 10.3, Windows XP Professional, and Mac OS X 10.5.
The software can be operated in a pipeline mode or in a stagewise mode. Both modes are operated from the command line. The pipeline and some of the stages are Perl scripts, while some stages are executed as shell scripts invoking Java programs. There is also a basic graphical user interface for the pipeline.
Detailed documentation on how to run the pipeline is available under the 'Usage' menu.