LEMMA {lemma} | R Documentation |
LEMMA fits a linear mixed model to normalized microarray data. The complete LEMMA paper, available on the LEMMA web site, describes the underlying model and the theory.
A short summary of the model is available at http://www.stat.cornell.edu/lemma/docs/LEMMAsummary.pdf.
This version supports two treatment groups and either a two-way classification (null and nonnull genes, as in the LEMMA paper), or a three-way classification: null genes, for which statistically there is no difference in expression between the two treatment groups; nonnull group 1 - genes that are significantly more expressed in treatment group 1 than in treatment group 2; and nonnull group 2 - genes that are significantly more expressed in treatment group 2 than in treatment group 1.
The program runs on both Windows and Linux.
The input should consist of a data frame with G rows, and have the following structure:
In this version n1 and n2 do not have to be equal, but every row in Y1 must have n1 elements and every row in Y2 must have n2 elements.
The program also uses the following variables when the user invokes the lemma function: outdir, locfdrcutoff, fdrcutoff, topgenes, titletext, mgq, tol, maxIts, modes, plots.
All of the parameter estimates, plots, and gene lists will be saved under the outdir directory. In particular, this directory will contain the following files:
log.txt - reports the total number of genes, the sample sizes, mean(d_g), sd(d_g), mean(m_g), sd(m_g), and estimates of the shape and scale parameters of the assumed inverse gamma prior for the error variance. It also contains the mean and variance of the fitted error variance distribution (they should be close to the sample mean and variance based on the observed m_g). Estimates for \tau, \psi, \sigma^2_\psi, p_1 and p_2 are also included in this log file, as well as the number of nonnull genes detected using the user-provided local fdr and FDR thresholds. Any convergence problems in the EM algorithm are reported in this file.
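The relationship between the shape and scale estimates and the reported mean and variance can be sketched as follows. This is only an illustration: it assumes the common shape/scale parameterization of the inverse gamma distribution (density proportional to x^(-alpha - 1) * exp(-beta / x)), which may differ from the one LEMMA uses internally. The function name `inv_gamma_moments` and the input values are hypothetical.

```r
# Mean and variance of an inverse gamma distribution with shape `alpha`
# and scale `beta`, under the parameterization
#   density(x) proportional to x^(-alpha - 1) * exp(-beta / x).
# ASSUMPTION: LEMMA may parameterize its prior differently.
inv_gamma_moments <- function(alpha, beta) {
  # The variance is finite only when alpha > 2.
  stopifnot(alpha > 2, beta > 0)
  m <- beta / (alpha - 1)
  v <- beta^2 / ((alpha - 1)^2 * (alpha - 2))
  c(mean = m, variance = v)
}

# Illustrative values only, not output from a real fit.
mom <- inv_gamma_moments(alpha = 4, beta = 3)
print(mom)
```

Under this assumption, the two numbers can be compared with mean(m_g) and the sample variance of m_g from the log file as a rough sanity check of the fit.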
resultsRR.txt - contains a list of genes sorted by their posterior null probability. This file also contains the estimated posterior probabilities for a gene being more expressed in treatment group 1 than in treatment group 2 (and vice versa). It also contains the gene effect (d_g - \tau).
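As a rough illustration of how such a table can be read, the sketch below ranks genes by posterior null probability and assigns each one to a class. The data frame, its column names, and the cutoff value are all made up for this example, and the decision rule shown (compare the posterior null probability with a cutoff, then pick the larger nonnull probability) is an assumption rather than a description of what LEMMA does internally.

```r
# Toy posterior probabilities for five hypothetical genes (made-up numbers).
# Column names are illustrative and need not match those in resultsRR.txt.
post <- data.frame(
  gene   = paste0("g", 1:5),
  p_null = c(0.90, 0.05, 0.40, 0.10, 0.75),
  p_grp1 = c(0.06, 0.93, 0.35, 0.02, 0.15),  # more expressed in group 1
  p_grp2 = c(0.04, 0.02, 0.25, 0.88, 0.10),  # more expressed in group 2
  stringsAsFactors = FALSE
)
cutoff <- 0.2  # stands in for a local fdr cutoff; chosen arbitrarily

# Sort by posterior null probability, smallest first.
ranked <- post[order(post$p_null), ]

# Call a gene nonnull when its posterior null probability is at most the
# cutoff, then assign it to the nonnull group with the larger probability.
ranked$call <- ifelse(
  ranked$p_null > cutoff, "null",
  ifelse(ranked$p_grp1 >= ranked$p_grp2, "nonnull1", "nonnull2")
)
print(ranked)
```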
resultsFDR.txt - contains a list of genes sorted by their BH-adjusted p-values. The file also contains the gene effect (d_g - \tau), and the sign of the gene effect, which can be used to determine if a (nonnull) gene is more expressed in treatment group 1 than in treatment group 2 (or vice versa).
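For readers unfamiliar with the BH adjustment, the sketch below shows the general idea using base R's `p.adjust`. The p-values, gene effects, and column names are made up, and the code illustrates the adjustment and sorting only; it is not taken from the LEMMA source.

```r
# Toy p-values and gene effects for five hypothetical genes (made-up numbers).
res <- data.frame(
  gene   = paste0("g", 1:5),
  p      = c(0.001, 0.04, 0.02, 0.5, 0.008),
  effect = c(1.8, -0.4, 0.6, 0.05, -1.2),  # stands in for d_g - tau
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment, as provided by base R.
res$p_bh <- p.adjust(res$p, method = "BH")

# The sign of the effect indicates the direction of the difference.
res$sign <- sign(res$effect)

# Sort by adjusted p-value, smallest first.
res <- res[order(res$p_bh), ]
print(res)
```

A positive sign here would correspond to a gene with a positive effect and a negative sign to a negative one; which treatment group that maps to depends on how the effect is defined in the model.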
AllData.RData - contains the following elements: dg, mg, n1, n2, f, G, RRfdr0, RRfdr1, RRfdr2, alpha_hat, beta_hat, sig2eb, tau, psi, sig2psi, p0, p1, p2, pBH0
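One way to inspect such a file without overwriting workspace variables that share these short names is to load it into its own environment. The sketch below is self-contained: it first writes a toy AllData.RData with made-up values for a few of the listed names into a temporary directory, which stands in for a real outdir.

```r
# Stand-in for a LEMMA output directory; a real run writes AllData.RData itself.
outdir <- tempfile("lemma_out_")
dir.create(outdir)
rdata_path <- file.path(outdir, "AllData.RData")

# Create a toy file holding a few of the documented names (made-up values).
toy <- list2env(list(G = 3L, n1 = 4L, n2 = 5L, dg = c(0.2, -1.1, 0.7)))
save(list = ls(toy), envir = toy, file = rdata_path)

# Load into a dedicated environment so that names such as G, tau or psi
# do not overwrite objects in the global workspace.
fit <- new.env()
loaded <- load(rdata_path, envir = fit)

print(loaded)   # names of the objects that were restored
print(fit$dg)   # access an element through the environment
```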
Note:
Bar, H.Y. hyb2@cornell.edu, Schifano, E.D. eds27@cornell.edu
Bar, H.Y., Booth, J.G., Schifano, E.D., Wells, M.T. (2009). Laplace approximated EM Microarray Analysis: an empirical Bayes approach for comparative microarray experiments.
http://www.stat.cornell.edu/lemma/docs/lemma.pdf
Read lemma to see how to execute the program.
Use lemmaPlots to produce diagnostic plots.
Use printTopGenes to produce a list of genes sorted by their adjusted p-values or by their posterior null probabilities.
## Not run:
lemma(apoai, titletext = "APO-AI, Callow et al (2000)", outdir = "OUT/apoai", plots = FALSE)
lemmaPlots("OUT/apoai", mgq = 0.99, titletext = "APO-AI (Callow et al., 2000)")
lemma(simdata, titletext = "Simulated data", outdir = "OUT/simdata")

# Similarly, if the user wants to use the 2-way classification:
lemma(apoai, titletext = "APO-AI, Callow et al (2000)", outdir = "OUT/apoai", modes = 2, plots = FALSE)
lemmaPlots("OUT/apoai", mgq = 0.99, titletext = "APO-AI (Callow et al., 2000)", modes = 2)
## End(Not run)