convexMLE {convexHaz} | R Documentation |
This function computes the nonparametric MLE of a convex hazard function.
convexMLE(x, M = 100, GRIDLESS = 0, ini.LSE = 0, tol = 1e-05, tol.SR = 1e-08, type = 1, max.loop = 100, max.inner.loop.SR = 250, max.outer.loop.SR = 50, range = c(0, max(x)), solve.tol = .Machine$double.eps)
x |
vector containing the data. |
M |
size of the grid used in the implementation of the (support reduction) algorithm. The default value is M=100 . |
GRIDLESS |
boolean variable. If GRIDLESS=1 , then a 'gridless' implementation of the algorithm is used. The default value is GRIDLESS=0 . |
ini.LSE |
boolean variable. If ini.LSE=1 then the initial value for each call of the support reduction algorithm for the MLE is taken from the least squares estimator. The default value is ini.LSE=0 . |
tol |
tolerance value of the bisection algorithm. The default value is tol=1e-05 . |
tol.SR |
tolerance value of the support reduction algorithm. The default value is tol.SR=1e-08 . |
type |
integer between 1 and 3 indicating the exit criterion function used by the bisection algorithm. The default is type=1 . |
max.loop |
maximum number of iterations in the bisection algorithm. The default value is max.loop=100 . To be used for troubleshooting. |
max.inner.loop.SR |
maximum number of iterations in the inner loop of the support reduction algorithm. The default value is max.inner.loop.SR=250 . To be used for troubleshooting. |
max.outer.loop.SR |
maximum number of iterations in the outer loop of the support reduction algorithm. The default value is max.outer.loop.SR=50 . To be used for troubleshooting. |
range |
vector of length two describing the range of antimodes over which the (bisection) algorithm searches. The default is range=c(0,max(x)) . The range should always be contained in [0,max(x)] . To be used in troubleshooting. |
solve.tol |
tolerance value used in solve() invoked by the algorithm. The default is solve.tol=.Machine$double.eps , which is also the default of the solve() function. To be used in troubleshooting. |
This function gives an iterative procedure to find the nonparametric MLE of a convex hazard function under IID sampling. The likelihood in this case is actually a modified likelihood, which excludes the term h(max(x)
). The full likelihood can be made arbitrarily large by increasing h(max(x)
), and therefore this term is omitted. The estimated MLE returned by the algorithm is thus the function on [0, max(x)
), with the assumption that the MLE is equal to infinity beyond max(x)
.
The function consists of an outer, bisection algorithm loop and an inner, support reduction algorithm loop which calls upon srMLE
. The goal is to maximize the likelihood over the space of positive convex hazards. This is done in two steps: the support reduction algorithm maximizes the log likelihood (or minimizes its negative, negllh
) over the space of positive convex hazard with minimum (antimode) at a
; the bisection algorithm searches over the possible values of a
in [0,max(x)]
.
The bisection algorithm ends once the exit criterion function is smaller than tol
. The exit criterion function may be specified via the option type
:
type=1
sum((cur.phi-min(old.phi))^2)
type=2
sqrt(sum((cur.phi - old.phi)^2))
type=3
sort(cur.phi)[2]+sort(cur.phi)[3]-2*sort(cur.phi)[1]
where cur.phi
is the vector of the least squares criterion function phi
along different values of antimode a
, and old.phi
is the same but in the previous iteration.
The support reduction algorithm was first described in Groeneboom, Jongbloed and Wellner (2007). It is an iterative optimization procedure which exits when a sufficient tolerance level tol.SR
is reached. The algorithm uses a gridded implementation (see Jankowski and Wellner (2008) for the details) described by the parameter M
. The idea is that any convex function may be written as a mixture of elbow functions, and the grid describes the possible support of the mixing distribution. (The documentation for the function h
provides a little more information; see also Jankowski and Wellner (2007, 2008).) The larger M
is the more accurate, and slower, the algorithm. Setting GRIDLESS=1
will also increase the accuracy, but this often results in non-convergence of the support reduction algorithm (see 'Troubleshooting').
mle |
list containing the support supp and mixing measure mix (further lists) for the
computed MLE. The function h can translate these into a
hazard function.
|
llh |
value of the (modified) log likelihood at the MLE. |
antimode |
value of the antimode a at which the MLE of the hazard has its minimum.
|
conv |
boolean indicating if the (bisection) algorithm has converged. |
iter |
number of iterations of the bisection algorithm. |
"convexMLE.out" |
The algorithm also creates a file called convexMLE.out in R's current working directory. The file contains three variables: (1) negllh the value of the smallest least squares criterion function at the antimode found by the support reduction algorithm, (2) antimode the value of the antimode, and (3) conv a boolean indicating if the support reduction algorithm converged. If convexMLE.out already exists, then the algorithm will overwrite it.
|
The file "convexMLE.out"
contains information which may be used to determine how well the algorithm has worked. Plotting negllh
against antimode
should result in a U-shaped (first decreasing, then increasing) function, with minimum lying at the antimode of the MLE. This is only true though for the values of negllh
(and antimode
) where the srMLE
algorithm converged (ie. conv=TRUE
). Plotting the output in "convexMLE.out"
can be used to check this (see 'Examples'), and to check where convergence of srMLE
failed.
If the bisection algorithm fails to converge, then max.loop
may be increased. Also, changing the range
may used temporarily to diagnose the issue.
Alternatively, the support reduction algorithm may fail to converge. The file
"convexMLE.out"
shows which of the calls of the support reduction algorithm by the bisection loop converged. If too many of these failed, then changing the values of
max.inner.loop.SR
, max.outer.loop.SR
, solve.tol
or ini.LSE
may fix this. Increasing max.inner.loop.SR
or max.outer.loop.SR
gives the algorithms more time to converge, though it is rare that this is the cause of the problem. Setting ini.LSE
changes the starting point of the support reduction algorithm, and may even increase its speed.
At each iteration, the support reduction algorithm does a finite dimensional quadratic optimization using the function solve()
. Occasionally, the matrix passed to solve()
is computationally singular. Several built-in catches exist in the code to handle this, but if the problem occurs too frequently, the algorithm will abort. In fact, this is by far the most common cause of nonconvergence. The larger the value of M
, the more likely the problem is, but this also depends on the data set. Also, choosing GRIDLESS=1
highly increases the probability that this type of problem will occur. Setting solve.tol
to a smaller value may fix the problem.
Hanna Jankowski: hkj@mathstat.yorku.ca, Ivy Wang, Hugh McCague, Jon A. Wellner
Groeneboom, Jongbloed and Wellner (2008). The support reduction algorithm for computing nonparametric function estimates in mixture models. Scan. J. Statist. 35, 385–399.
Jankowski and Wellner (2007). Nonparametric estimation of a convex bathtub-shaped hazard function. Technical Report 521, Department of Statistics, University of Washington.
Jankowski and Wellner (2008). Computation of nonparametric convex hazard estimators via profile methods. Technical Report 542, Department of Statistics, University of Washington.
# Generate sample data: set.seed(1111, kind="default") x <- rbeta(100, 0.5, 0.5) # Find the mle (this takes a few minutes) mle <- convexMLE(x, M=1000) # And the mle has minimum at mle$antimode # Check the bisection algorithm diag <- read.table("convexMLE.out", header=TRUE) diag plot(diag$antimode, diag$negllh, pch=16, col="blue", main="diagnostics", ylab="phi", xlab="antimode") # plot the true hazard function and the lse h.true <- function(x) dbeta(x, 0.5, 0.5)/(1-pbeta(x, 0.5, 0.5)) tt <- c(0,sort(mle$mle$supp$tau), sort(mle$mle$supp$eta), max(x)) yy <- sapply(tt, h, supp=mle$mle$supp, hpar=mle$mle$mix) plot(h.true, xlim=c(0, 0.95), lwd=2, col="blue", ylab="hazard", xlab="time", main="MLE of convex hazard") lines(tt,yy, lwd=2, col="red") legend("topleft", c("true","estimated"),col=c("blue", "red"), lwd=2, inset = .05)