Firth’s penalized likelihood for proportional hazards regressions for complex surveys
Section 4. Summary
Firth’s penalized likelihood is useful for obtaining
maximum likelihood estimates from a monotone likelihood from proportional
hazards regression models. We proposed a weight scaling method and demonstrated
that Firth’s penalized likelihood using the scaled weights have some desirable
properties for complex surveys. A simulation study shows that estimated biases
in point estimates and standard errors using the scaled weights are lower than
estimated biases using the unscaled weights. Although Firth’s penalized
likelihood produces “good” estimates in most simulated data sets, there are a
few data sets in which the Firth penalty failed to produce “good” convergences.
The Firth penalized likelihood that uses scaled weights successfully corrected
for a monotone likelihood when we estimated hazard rates for heart attacks
using a data set from the NHEFS. Although the numeric results are quite
encouraging, further research is needed to derive asymptotic distributions of
the estimators obtained by using Firth’s penalized likelihood.
We recommend the unpenalized likelihood when convergence
is not an issue, but we recommend Firth’s penalized likelihood using the scaled
weights when a monotone likelihood is encountered in fitting proportional
hazards regression models for complex surveys.
Acknowledgements
I am grateful to Ying So, Randy Tobias, and Ed
Huddleston at SAS Institute Inc. for their valuable assistance in the
preparation of this article. I would also like to thank the two anonymous
referees and the associate editor for their constructive suggestions.
Appendix 1
Consistency of the Firth penalized likelihood estimator
The estimators in Section 2 are defined as the
solution to a system of equations that are constructed by using the score
functions from proportional hazards regression models. In this appendix, we
show that under certain regularity conditions these estimators are design
consistent. Properties of estimators that are solutions to a set of estimating
equations are well studied in the survey literature. For example, see Binder (1983), Godambe and Thompson (1986),
and Fuller (2009, Section 1.3.4).
However, the estimating equations for proportional
hazards regression models are more complex than the estimating equations for
generalized linear models because the score functions involve weighted sums
over the sampled units. Binder (1992)
and Lin (2000) showed that the estimators obtained by solving the
estimating equations for proportional hazards regression models are consistent.
In this appendix, we follow arguments similar to those of Lin (2000) and Andersen and Gill (1982).
Several technical assumptions are necessary to show that
the point estimates are consistent. We need assumptions about the estimating
equations, the finite population, and the sample design – to whit:
- The functions defining the estimating equations
should be smooth and convex.
- The finite population should be such that the
moments for population quantities that are used in defining the estimating
equations exists.
- The sample design should be such that the
Narain-Horvitz-Thompson (NHT) estimators (Rao, 2005) for the population totals
are well behaved.
All these assumptions are common in the sample survey
literature; for example, see Fuller (2009). The score functions for
proportional hazards regression models involve ratios of means of exponential
functions that are infinitely differentiable.
Let
and
denote, respectively, the index set and values
for the
finite population in a sequence of populations
indexed by
and let
be a sample of size
from
To study large sample properties for
sample-based estimators, we assume sequences of population and samples such
that
and
keeping the sampling fraction,
fixed.
Assume
is an independent random sample of size
from the joint distribution of
where
is the failure time or the censoring time,
whichever is less;
if the failure time is less than the censoring
time and 0 otherwise; and
is a vector of possibly time-varying
explanatory variables.
Let
be a set of regression parameters for the
superpopulation that is defined by the joint distribution of
Let
be a set of finite population parameters
obtained by solving the estimating equations when all
units in the population are observed, and let
be an estimator of
that is obtained by solving the weighted
estimating equations by using only the sampled units. Our objective is to show
that
approaches
and that they both approach
as the sample size and population size
increase.
Consider the estimating equations that correspond to
Firth’s penalized likelihood described in Section 2. For simplicity, we
write these equations when there are no tied events. To further simplify
notation, we write each component of the estimating equations separately. The
finite population parameters,
are a solution to the penalized partial
likelihood score function,
where
where
and where
denotes the trace of a matrix;
denotes the indicator function;
and
is the number of regression parameters. Note
that
and
depend on
although the notation does not reflect this
for reasons of simplicity.
In defining the score function for the penalized
likelihood, we assume that the information matrix for the finite population,
is always positive definite.
However, in any realistic situation, not all units in
the finite population are available. Let a sample
be selected by using a probability design that
assigns a nonzero selection probability,
to every unit in the population. Let
be the design weight. A sample-based
estimator,
is obtained by solving the estimated penalized
partial likelihood score equations. Assuming that
is known, a sample-based estimator for
is
where
are the NHT estimators for
and
respectively.
Because
and
use weighted sums over sampled units, we need
techniques defined in Lin (2000) to study large sample properties of these
estimators. Define
and
Then the finite population score functions can
be written using stochastic integration,
and the sample-based score functions are
Note that the quantities
and
are simply means over finite population
quantities. Define the limits of these means as follows:
Thus the finite population score function,
converges to the superpopulation score
function
where
Now assume that the population quantities,
that are used to define the score functions
have finite moments and the sequence of sample designs is such that any smooth
functions of NHT estimators are consistent. Because
is a smooth function of population totals, and
each total is estimated by using a NHT estimator,
is design-consistent for
That is,
Therefore, by using arguments similar to Lin
(2000) and Andersen and Gill (1982), it can be shown that
and
converge to the same limit.
Because
is fixed,
is the NHT estimator for
and
is a consistent estimator (not necessarily
unbiased) of 0, both
and
converge to the same limit with the same order
of convergence. It is straightforward to show that
and
the estimating equations that use the scaled
weights, have the same expectation.
Appendix 2
SAS program to obtain the Firth penalized likelihood estimates
The SAS statements at the end of this section fit a
proportional hazards regression model using the scaled weights in Firth’s
penalized likelihood. The PROC statement invokes the procedure, and the
VARMETHOD = JK option requests the jackknife variance estimation
method. You can also specify VARMETHOD = TAYLOR, VARMETHOD = BRR,
or VARMETHOD = BOOT to request the Taylor series linearized, balanced
repeated replication, or bootstrap replication variance estimation method,
respectively. The DETAILS sub-option of the VARMETHOD = JK option
prints estimates from each replicate sample along with the convergence status.
The WEIGHT statement specifies the sampling weights, the STRATA statement
specifies the strata, and the CLUSTER statement specifies the PSUs. The MODEL
statement specifies the analysis model. The FIRTH option in the MODEL statement
requests Firth’s penalized likelihood. The two HAZARDRATIO statements requests
hazard ratios for blood cholesterol and smoking, respectively. The ODS OUTPUT
statement stores replicate estimates and convergence status from each replicate
in the SAS data set RepEstimatesFirth. This data set is useful for checking the
convergence status of every replicate sample.
Start of text box
proc surveyphreg data = NHEFS varmethod=jk (details);
class
Gender HighBloodChol Race Smoker;
weight
ObservationWeight;
strata
Stratum;
cluster
model
EventTime*HeartAttack(2) = Income HighBloodChol Smoker Race Gender Race*Gender / firth;
hazardratio
HighBloodChol;
hazardratio
Smoker;
ods output
repestimates=RepEtimatesFirth;
run;
End of text box
References
Andersen, P.K., and Gill, R.D. (1982). Cox’s regression
model counting process: A large sample study. Annals of Statistics, 10, 1100-1120.
Bender, R., Augustin, T. and Blettner, M. (2005).
Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine, 24, 1713-1723.
Binder, D.A. (1983). On the variances of asymptotically
normal estimators from complex surveys. International
Statistical Review, 51, 279-292.
Binder, D.A. (1990). Fitting Cox’s proportional hazards
models from survey data. Proceedings of
the Survey Research Methods Section, American Statistical Association, 342-347.
Binder, D.A. (1992). Fitting Cox’s proportional hazards
models from survey data. Biometrika,
79, 139-147.
Binder, D.A., and Patak, Z. (1994). Use of estimating
functions for estimation from complex surveys. Journal of the American Statistical Association, 89(427),1035-1043.
Boudreau, C., and Lawless, J.F. (2006). Survival
analysis based on the proportional hazards model and survey data. Canadian Journal of Statistics, 34,
203-216.
Breslow, N.E. (1974). Covariance analysis of censored
survival data. Biometrics, 30, 89-99.
Chambers, R.L., and Skinner, C.J. (2003). Analysis of Survey Data. Chichester, UK:
John Wiley & Sons, Inc.
Chambless, L.E., and Boyle, K.E. (1985). Maximum likelihood
methods for complex sample data: Logistic regression and discrete proportional
hazards models. Communications in
Statistics - Theory and Methods, 14, 1377-1392.
Cox, D.R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society,
Series B, (with discussion), 34, 187-220.
Cox, D.R. (1975). Partial likelihood. Biometrika, 62, 269-276.
Firth, D. (1993). Bias reduction of maximum likelihood
estimates. Biometrika, 80, 27-38.
Fuller, W.A. (1975). Regression analysis for sample
survey. Sankhyā, Series C, 37, 117-132.
Fuller, W.A. (2009). Sampling
Statistics. Hoboken, NJ: John Wiley & Sons, Inc.
Godambe, V.P., and Thompson, M.E. (1986). Parameters of
superpopulation and survey population: Their relationships and estimation. International Statistical Review, 54,
127-138.
Heinze, G. (1999). The
Application of Firth’s Procedure to Cox and Logistic Regression. Technical
Report 10, updated January 2001, Department of Medical Computer Sciences,
Section of Clinical Biometrics, University of Vienna.
Heinze, G., and Schemper, M. (2001). A solution to the
problem of monotone likelihood in Cox regression. Biometrics, 51, 114-119.
Heinzel, G., Rüdiger, A. and Schilling, R. (2002). Spectrum and Spectral Density Estimation by
the Discrete Fourier Transform (DFT), Including a Comprehensive List of Window
Functions and Some New Flat-Top Windows. Tech. rep., Max-Planck-Institut
für Gravitationsphysik (Albert-Einstein-Institut), Teilinstitut Hannover.
Kish, L., and Frankel, M.R. (1974). Inference from
complex samples. Journal of the Royal
Statistical Society, Series B, 36, 1-37.
Korn, E.L., and Graubard, B.I. (1999). Analysis of Health Surveys. New York:
John Wiley & Sons, Inc.
Lee, E.W., Wei, L.J. and Amato, D.A. (1992). Cox-type regression
analysis for large numbers of small groups of correlated failure time
observations. Survival Analysis: State of
the Art, (Eds., J.P. Klein and P.K. Goel), Dordrecht,
Netherlands: Kluwer Academic, 237-247.
Lin, D.Y. (2000). On fitting Cox’s proportional hazards
models to survey data. Biometrika,
87, 37-47.
Lin, D.Y., and Wei, L.J. (1989). The robust inference
for the proportional hazards model. Journal
of the American Statistical Association, 84, 1074-1078.
Mukhopadhyay, P.K. (2010). Not hazardous to your health:
Proportional hazards modeling for survey data with the SURVEYPHREG procedure. Proceedings of the SAS Global Forum 2010
Conference. Cary, NC: SAS Institute Inc.
http://support.sas.com/resources/papers/proceedings10/254-2010.pdf.
Mukhopadhyay, P.K., An, A.B., Tobias, R.D. and Watts,
D.L. (2008). Try, try again: Replication-based variance estimation methods for
survey data analysis in SAS 9.2. Proceedings
of the SAS Global Forum 2008 Conference. Cary, NC: SAS Institute Inc.
http://www2.sas.com/proceedings/forum2008/367-2008.pdf.
Pfeffermann, D. (1993). The role of sampling weights
when modeling survey data. International
Statistical Review, 61, 317-337.
Rao, J.N.K. (2005). Interplay between sample survey
theory and practice: An appraisal. Survey
Methodology, 31, 2, 117-138. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2005002/article/9040-eng.pdf.
SAS Institute Inc. (2018). SAS/STAT 15.1 User’s Guide. Cary, NC: SAS Institute Inc.
http://go.documentation.sas.com/?docsetId=statug&docsetTarget=titlepage.htm&docsetVersion=15.1&locale=en.
Shao, J., and Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag.
Sitter, R.R. (1992). Comparing three bootstrap methods
for survey data. Canadian Journal of
Statistics, 20, 135-154.
Wolter, K.M. (2007). Introduction
to Variance Estimation. 2nd Ed. New York: Springer.