From Bioinformatics.Org Wiki
Assume that a researcher has microarray expression values for two disease subtypes: A and B. Within these samples, some are known to respond to drug X, while the rest do not respond to any known drug. The names and functions of the gene products are known and in a database. Also, the gene expression data do not already include expression under any drug.
First, knowing the names and functions of the gene products can help validate the predictive capabilities described below.
The approach taken by Eisen et al. (1998) and Perou et al. (2000) can be followed. The program Cluster can be used to obtain a hierarchical clustering of both the genes and the samples, using average linkage clustering to get coarser clusters. The results of the clustering can then be examined using the companion program TreeView.
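As a rough illustration of what average linkage clustering does, here is a minimal pure-Python sketch. It is not the Cluster program itself: the distance measure (1 minus the Pearson correlation), the sample labels, and the expression values are all assumptions made up for this example.

```python
from itertools import combinations
from statistics import mean


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length profiles.

    Assumes neither profile is constant (otherwise the variance is zero).
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / (var_x * var_y) ** 0.5


def average_linkage(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    profiles maps a label to its expression profile (list of floats).
    The distance between two clusters is the mean of all pairwise
    distances between their members (average linkage).
    """
    clusters = [[label] for label in profiles]
    while len(clusters) > n_clusters:
        def linkage(pair):
            a, b = pair
            return mean(
                pearson_distance(profiles[p], profiles[q])
                for p in clusters[a]
                for q in clusters[b]
            )

        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Hypothetical log-ratio profiles for four samples over four genes.
samples = {
    "A1": [2.0, 1.8, -1.0, -1.2],
    "A2": [1.9, 2.1, -0.8, -1.1],
    "B1": [-1.0, -1.3, 2.2, 1.9],
    "B2": [-1.2, -0.9, 1.8, 2.0],
}
groups = average_linkage(samples, 2)
```

The same function can be applied to genes instead of samples by passing one profile per gene (its values across all samples).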
Samples of the same type (diseased subtype or normal) would have expression profiles that are characteristic of their type and would thus be clustered together by the clustering algorithm. As mentioned in the articles, this gives us the ability to classify the samples and predict the classification of new samples.
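One simple way to predict the class of a new sample can be sketched as follows. This is a nearest-centroid stand-in, not the procedure used in the cited articles; the class labels, profiles, and the choice of Pearson correlation as the similarity measure are assumptions for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def centroid(profiles):
    """Gene-wise mean over a list of expression profiles."""
    return [mean(values) for values in zip(*profiles)]


def classify(new_profile, labelled):
    """Return the label whose centroid correlates best with new_profile.

    labelled maps a class label to the list of profiles already
    assigned to that class (for example, by an earlier clustering).
    """
    centroids = {label: centroid(group) for label, group in labelled.items()}
    return max(centroids, key=lambda label: pearson(new_profile, centroids[label]))


# Hypothetical profiles grouped by disease subtype.
labelled = {
    "A": [[2.0, 1.8, -1.0, -1.2], [1.9, 2.1, -0.8, -1.1]],
    "B": [[-1.0, -1.3, 2.2, 1.9], [-1.2, -0.9, 1.8, 2.0]],
}
predicted = classify([1.7, 2.0, -0.9, -1.0], labelled)
```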
That diseased samples have characteristic expression profiles means that particular genes in those samples are expressed at characteristic levels. The program Cluster also clusters genes with similar expression profiles, thus revealing which clusters of genes, at which expression levels, are associated with which samples, whether those samples are simply "diseased" or belong to a particular disease subtype.
Another microarray experiment can then be performed in which the diseased samples are treated with drug X. Reclustering these new data together with those from the untreated samples will reveal which genes change their expression levels under treatment. Those genes are likely targets of drug X and, for a given patient, can be examined on their own (without any of the other genes) to see whether their expression levels shift toward those of the normal samples under the drug.
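Besides reclustering, a direct per-gene comparison of treated and untreated samples can flag the genes that change. The sketch below assumes log2-scale expression values and an arbitrary cutoff of 1.0 (a two-fold change); the gene names, values, and cutoff are hypothetical.

```python
from statistics import mean


def changed_genes(untreated, treated, min_shift=1.0):
    """Return genes whose mean log2 expression shifts under treatment.

    untreated and treated map a gene name to its log2 expression
    values across samples. A gene is reported when the absolute
    difference of the two means is at least min_shift.
    """
    shifts = {}
    for gene in untreated:
        shift = mean(treated[gene]) - mean(untreated[gene])
        if abs(shift) >= min_shift:
            shifts[gene] = shift
    return shifts


# Hypothetical measurements for three genes.
untreated = {
    "geneA": [2.1, 1.9, 2.0],
    "geneB": [0.5, 0.4, 0.6],
    "geneC": [-1.0, -1.2],
}
treated = {
    "geneA": [0.1, -0.1, 0.0],
    "geneB": [0.6, 0.5, 0.4],
    "geneC": [1.0, 0.9],
}
candidates = changed_genes(untreated, treated)
```

A negative shift means the gene is expressed at a lower level under the drug, a positive shift at a higher level. A fixed cutoff ignores sample-to-sample variability, so a real analysis would add a statistical test.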
As mentioned above, knowing the identities of the samples and genes can help validate predictive capabilities. If sample and gene identities are not known, validation can be done by performing a microarray experiment with one or more known samples or genes, and seeing how they cluster.
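It might also be desirable to know whether drug X is more effective on disease subtype A or B. This could be discerned from the drug-treatment experiment described above, by comparing how far the expression profiles of each subtype move toward those of the normal samples under the drug.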
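One possible way to put a number on this comparison is the fraction of the disease-to-normal expression gap that treatment closes, averaged over the genes of interest. The measure itself, the gene names, and all values below are assumptions for illustration, not part of the cited method.

```python
from statistics import mean


def recovery(normal, untreated, treated):
    """Average fraction of the untreated-to-normal gap closed by the drug.

    Each argument maps a gene name to its mean log2 expression level.
    1.0 means expression returned fully to the normal level, 0.0 means
    no change. Genes with no gap to close are skipped; at least one
    gene must differ from normal.
    """
    fractions = []
    for gene in normal:
        gap = untreated[gene] - normal[gene]
        if gap == 0:
            continue
        fractions.append((untreated[gene] - treated[gene]) / gap)
    return mean(fractions)


# Hypothetical mean log2 levels for two drug-responsive genes.
normal = {"g1": 0.0, "g2": 0.0}
recovery_a = recovery(
    normal,
    untreated={"g1": 2.0, "g2": -2.0},
    treated={"g1": 0.5, "g2": -0.5},
)
recovery_b = recovery(
    normal,
    untreated={"g1": 2.0, "g2": -2.0},
    treated={"g1": 1.5, "g2": -2.0},
)
```

With these made-up values, subtype A recovers more than subtype B, which would suggest that the drug is more effective on subtype A.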
- MATLAB Bioinformatics Toolbox - Provides access to genomic and proteomic data formats, analysis techniques, and specialized visualizations for genomic and proteomic sequence and microarray analysis.
- MAExplorer - The Microarray Explorer (MAExplorer) is a Java-based data-mining facility for microarray databases, run as a stand-alone program. It includes graphics, statistics, clustering, reports, and data filtering.
- SEPON - SEPON designs gene-specific oligonucleotides for microarray experiments and is able to use EST input from organisms in which the genome is not annotated for genes. SEPON implements a novel algorithm for reducing cross-hybridization by utilizing thermodyna
- Teiresias-based Gene expression analysis - Discover patterns in microarray data using the Teiresias algorithm. Allows discovery of inversely regulated genes.
- Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D. 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95:14863-14868.
- Perou, C.M., Sorlie, T., Eisen, M.B., van de Rijn, M., Jeffrey, S.S., Rees, C.A., Pollack, J.R., Ross, D.T., Johnsen, H., Akslen, L.A., Fluge, O., Pergamenschikov, A., Williams, C., Zhu, S.X., Lonning, P.E., Borresen-Dale, A.L., Brown, P.O., Botstein, D. 2000. Molecular portraits of human breast tumors. Nature 406(6797):747-752.