convexMLE {convexHaz}R Documentation

Compute the nonparametric MLE of a convex hazard

Description

This function computes the nonparametric MLE of a convex hazard function.

Usage

convexMLE(x, M = 100, GRIDLESS = 0, ini.LSE = 0, tol = 1e-05, tol.SR = 1e-08, type = 1, 
max.loop = 100, max.inner.loop.SR = 250, max.outer.loop.SR = 50, range = c(0, max(x)), 
solve.tol = .Machine$double.eps)

Arguments

x vector containing the data.
M size of the grid used in the implementation of the (support reduction) algorithm. The default value is M=100.
GRIDLESS boolean variable. If GRIDLESS=1, then a 'gridless' implementation of the algorithm is used. The default value is GRIDLESS=0.
ini.LSE boolean variable. If ini.LSE=1 then the initial value for each call of the support reduction algorithm for the MLE is taken from the least squares estimator. The default value is ini.LSE=0.
tol tolerance value of the bisection algorithm. The default value is tol=1e-05.
tol.SR tolerance value of the support reduction algorithm. The default value is tol.SR=1e-08.
type integer between 1 and 3 indicating the exit criterion function used by the bisection algorithm. The default is type=1.
max.loop maximum number of iterations in the bisection algorithm. The default value is max.loop=100. To be used for troubleshooting.
max.inner.loop.SR maximum number of iterations in the inner loop of the support reduction algorithm. The default value is max.inner.loop.SR=250. To be used for troubleshooting.
max.outer.loop.SR maximum number of iterations in the outer loop of the support reduction algorithm. The default value is max.outer.loop.SR=50. To be used for troubleshooting.
range vector of length two describing the range of antimodes over which the (bisection) algorithm searches. The default is range=c(0,max(x)). The range should always be contained in [0,max(x)]. To be used in troubleshooting.
solve.tol tolerance value used in solve() invoked by the algorithm. The default is solve.tol=.Machine$double.eps, which is also the default of the solve() function. To be used in troubleshooting.

Details

This function gives an iterative procedure to find the nonparametric MLE of a convex hazard function under IID sampling. The likelihood in this case is actually a modified likelihood, which excludes the term h(max(x)). The full likelihood can be made arbitrarily large by increasing h(max(x)), and therefore this term is omitted. The estimated MLE returned by the algorithm is thus the function on [0, max(x)), with the assumption that the MLE is equal to infinity beyond max(x).

The function consists of an outer, bisection algorithm loop and an inner, support reduction algorithm loop which calls upon srMLE. The goal is to maximize the likelihood over the space of positive convex hazards. This is done in two steps: the support reduction algorithm maximizes the log likelihood (or minimizes its negative, negllh) over the space of positive convex hazard with minimum (antimode) at a; the bisection algorithm searches over the possible values of a in [0,max(x)].

The bisection algorithm ends once the exit criterion function is smaller than tol. The exit criterion function may be specified via the option type:

type=1 sum((cur.phi-min(old.phi))^2)

type=2 sqrt(sum((cur.phi - old.phi)^2))

type=3 sort(cur.phi)[2]+sort(cur.phi)[3]-2*sort(cur.phi)[1]

where cur.phi is the vector of the least squares criterion function phi along different values of antimode a, and old.phi is the same but in the previous iteration.

The support reduction algorithm was first described in Groeneboom, Jongbloed and Wellner (2007). It is an iterative optimization procedure which exits when a sufficient tolerance level tol.SR is reached. The algorithm uses a gridded implementation (see Jankowski and Wellner (2008) for the details) described by the parameter M. The idea is that any convex function may be written as a mixture of elbow functions, and the grid describes the possible support of the mixing distribution. (The documentation for the function h provides a little more information; see also Jankowski and Wellner (2007, 2008).) The larger M is the more accurate, and slower, the algorithm. Setting GRIDLESS=1 will also increase the accuracy, but this often results in non-convergence of the support reduction algorithm (see 'Troubleshooting').

Value

mle list containing the support supp and mixing measure mix (further lists) for the computed MLE. The function h can translate these into a hazard function.
llh value of the (modified) log likelihood at the MLE.
antimode value of the antimode a at which the MLE of the hazard has its minimum.
conv boolean indicating if the (bisection) algorithm has converged.
iter number of iterations of the bisection algorithm.
"convexMLE.out" The algorithm also creates a file called convexMLE.out in R's current working directory. The file contains three variables: (1) negllh the value of the smallest least squares criterion function at the antimode found by the support reduction algorithm, (2) antimode the value of the antimode, and (3) conv a boolean indicating if the support reduction algorithm converged. If convexMLE.out already exists, then the algorithm will overwrite it.

Diagnostics

The file "convexMLE.out" contains information which may be used to determine how well the algorithm has worked. Plotting negllh against antimode should result in a U-shaped (first decreasing, then increasing) function, with minimum lying at the antimode of the MLE. This is only true though for the values of negllh (and antimode) where the srMLE algorithm converged (ie. conv=TRUE). Plotting the output in "convexMLE.out" can be used to check this (see 'Examples'), and to check where convergence of srMLE failed.

Troubleshooting

If the bisection algorithm fails to converge, then max.loop may be increased. Also, changing the range may used temporarily to diagnose the issue.

Alternatively, the support reduction algorithm may fail to converge. The file
"convexMLE.out" shows which of the calls of the support reduction algorithm by the bisection loop converged. If too many of these failed, then changing the values of
max.inner.loop.SR, max.outer.loop.SR, solve.tol or ini.LSE may fix this. Increasing max.inner.loop.SR or max.outer.loop.SR gives the algorithms more time to converge, though it is rare that this is the cause of the problem. Setting ini.LSE changes the starting point of the support reduction algorithm, and may even increase its speed.

At each iteration, the support reduction algorithm does a finite dimensional quadratic optimization using the function solve(). Occasionally, the matrix passed to solve() is computationally singular. Several built-in catches exist in the code to handle this, but if the problem occurs too frequently, the algorithm will abort. In fact, this is by far the most common cause of nonconvergence. The larger the value of M, the more likely the problem is, but this also depends on the data set. Also, choosing GRIDLESS=1 highly increases the probability that this type of problem will occur. Setting solve.tol to a smaller value may fix the problem.

Author(s)

Hanna Jankowski: hkj@mathstat.yorku.ca, Ivy Wang, Hugh McCague, Jon A. Wellner

References

Groeneboom, Jongbloed and Wellner (2008). The support reduction algorithm for computing nonparametric function estimates in mixture models. Scan. J. Statist. 35, 385–399.

Jankowski and Wellner (2007). Nonparametric estimation of a convex bathtub-shaped hazard function. Technical Report 521, Department of Statistics, University of Washington.

Jankowski and Wellner (2008). Computation of nonparametric convex hazard estimators via profile methods. Technical Report 542, Department of Statistics, University of Washington.

See Also

convexLSE hazard srLSE srMLE

Examples

# Generate sample data:
set.seed(1111, kind="default")
x 	<- rbeta(100, 0.5, 0.5) 

# Find the mle (this takes a few minutes)	
mle 	<- convexMLE(x, M=1000)

# And the mle has minimum at
mle$antimode

# Check the bisection algorithm
diag	<- read.table("convexMLE.out", header=TRUE)
diag

plot(diag$antimode, diag$negllh, pch=16, col="blue", main="diagnostics", ylab="phi", xlab="antimode")


# plot the true hazard function and the lse
h.true 	<- function(x) dbeta(x, 0.5, 0.5)/(1-pbeta(x, 0.5, 0.5))
tt	<- c(0,sort(mle$mle$supp$tau), sort(mle$mle$supp$eta), max(x))
yy	<- sapply(tt, h, supp=mle$mle$supp, hpar=mle$mle$mix)


plot(h.true, xlim=c(0, 0.95), lwd=2, col="blue", ylab="hazard", xlab="time", main="MLE of convex hazard")
lines(tt,yy, lwd=2, col="red")
legend("topleft", c("true","estimated"),col=c("blue", "red"), lwd=2, inset = .05)














[Package convexHaz version 0.2 Index]