Firth’s penalized likelihood for proportional hazards regressions for complex surveys
Section 3. Applications in complex surveys
Data from complex surveys frequently contain unequal
weights, strata, and clusters. It is recommended that the weights and other
design features be used in the analysis stage. Weighted data provide a better
representation of the study population than unweighted data. In this section,
we compare the scaled and unscaled weights to estimate proportional hazards
regression coefficients through a simulation study and apply the Firth
penalized likelihood using the scaled weights to estimate survival times from a
data set from NHEFS.
3.1 A simulation study
We performed a small simulation study to compare biases
in parameter estimates and standard errors for scaled and unscaled weights
using Firth’s penalized likelihood. We used two sampling methods to select
samples from a fixed finite population: a simple random sample without
replacement (SRS) in which each observation unit gets an equal weight; and a
probability proportional to size (PPS) without replacement sample in which the
sampling weight for an observation unit depends on the value of a size measure
associated with the hazard function for the unit. For the purpose of finite
population inference, we treat the estimated proportional hazards regression
parameters in the finite population as the “true” parameter values. Biases are
measured from these true values.
Finite populations of size 10,000 are generated as
follows:
-
-
-
-
-
-
where
is the hazard function,
is the survival time,
is a censoring indicator, and
is a size measure for each unit. Six finite
populations are generated by using different censoring values
0.1, 0.3, 0.5, 0.7, 0.8, 0.9). See Bender, Augustin and Blettner (2005) for
methods of generating survival times. Ten regressors
are generated using Bernoulli distributions to
create monotone likelihoods, especially when the sample size is small and the
censoring rate is high.
Samples are selected from each finite population by
using two sampling methods: simple random samples without replacement; and
probability proportional to size samples without replacement, where the
variable
is used as the size measure. Four sample sizes
are used for each sampling method: 50, 100, 500, and 1,000. Sampling weights
for all units for SRS depend only on the sample size, but the sampling weight
for a unit for PPS depends on both the sample size and the observed value of
the variable
for that corresponding unit. To ensure the
same distribution of the censored observations in the sampled data as in the
population, samples are selected independently from censored and uncensored
units in the population.
Finally, the regression parameters from the proportional
hazards regression model
are estimated from each sampled data set, where
is the baseline hazard,
is the survival time, and
is the censored indicator. The regression
parameters are estimated by maximizing the weighted Firth penalized likelihood.
Note that the unpenalized likelihood does not converge in most cases because of
the monotonicity of the likelihood in the simulated data. When the likelihood
is not monotone, we found that the point estimates obtained by using the
penalized likelihood are very close to the point estimates obtained by using
the unpenalized likelihood. Heinze and
Schemper (2001) reported similar findings for unweighted data.
We compare relative biases in point estimates and
standard errors using the jackknife method for scaled and unscaled weights. The
relative biases (RBs) are defined below (Sitter, 1992).
Let
be the point estimate and
be the variance estimate for one component of
from data set
Define the following:
Relative bias for point estimates,
Relative bias for variance estimates,
where the true MSE is
and
is the “true” parameter value obtained by
fitting the proportional hazards regression model using all units in the finite
population. The ratio of RBs is defined as the ratio of the RB using the
unscaled weights to the RB using the scaled weights.
The median of ratios of RBs over 5,000 repetitions is
displayed in this section. We report the median because there are some “bad”
samples in which convergences are questionable even with the Firth correction.
These “bad” samples produce few estimates with very large biases. Because of
these large biases, the mean of the ratio of RBs is a more unstable statistic
than the median. Without the “bad” replicates, the mean and medians are very
close. We also noticed that the penalized log likelihood using the unscaled
weights produces more of these “false” convergences.
Results for all regressors
are similar. For simplicity, we display
results for only two regressors,
and
Ratios of RBs in parameter estimates for unscaled and
scaled weights for the variables
and
are displayed in Figures 3.1, 3.2, 3.3,
and 3.4. For small sample sizes and a large number of censored observations,
RBs using scaled weights are much smaller than RBs using unscaled weights. For
large sample sizes, RBs from both weights are similar primarily because the
Firth option is not necessary, since the convergence is not an issue with large
data sets.

Description for Figure 3.1
Graph presenting the ratio of
relative biases in parameter estimates for SRS samples for Z3. The ratio of RB
is on the y-axis, ranging from 0 to 6. The sample sizes 50, 100, 500, 1,000 are
on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented
with different symbols on the graph. For large sample sizes, RB ratios are
around one. For small sample sizes and a large number of censored observations,
the ratio is greater than 1.

Description for Figure 3.2
Graph presenting the ratio of
relative biases in parameter estimates for SRS samples for Z8. The ratio of RB
is on the y-axis, ranging from 0 to 6. The sample sizes 50, 100, 500, 1,000 are
on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented
with different symbols on the graph. For large sample sizes, RB ratios are
around one. For small sample sizes and a large number of censored observations,
the ratio is greater than 1.

Description for Figure 3.3
Graph presenting the ratio of
relative biases in parameter estimates for PPS samples for Z3. The ratio of RB
is on the y-axis, ranging from 0 to 6. The sample sizes 50, 100, 500, 1,000 are
on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented
with different symbols on the graph. For large sample sizes, RB ratios are
around one. For small sample sizes and a large number of censored observations,
the ratio is greater than 1.

Description for Figure 3.4
Graph presenting the ratio of relative
biases in parameter estimates for PPS samples for Z8. The ratio of RB is on the
y-axis, ranging from 0 to 6. The sample sizes 50, 100, 500, 1,000 are on the
x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented with
different symbols on the graph. For large sample sizes, RB ratios are around
one. For small sample sizes and a large number of censored observations, the
ratio is greater than 1.
Ratios of RBs in standard errors for unscaled and scaled
weights for the variables
and
are displayed in Figures 3.5, 3.6, 3.7, and
3.8. RBs for standard errors follow the same trend as RBs for point estimates.
However, RBs for standard errors are higher than RBs for point estimates. For
small sample sizes and a large number of censored observations, RBs using
scaled weights are much smaller than RBs using unscaled weights. For large
sample sizes, RBs from both scaled and unscaled weights are similar primarily
because the Firth option is not necessary, since the convergence is not an
issue with large data sets.

Description for Figure 3.5
Graph presenting the ratio of
relative biases in standard errors for SRS samples for Z3. The ratio of RB is
on the y-axis, ranging from 0 to 15. The sample sizes 50, 100, 500, 1,000 are
on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented
with different symbols on the graph. For large sample sizes, RB ratios are
around one. For small sample sizes and a large number of censored observations,
the ratio is way greater than 1.

Description for Figure 3.6
Graph presenting the ratio of
relative biases in standard errors for SRS samples for Z8. The ratio of RB is
on the y-axis, ranging from 0 to 15. The sample sizes 50, 100, 500, 1,000 are
on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented
with different symbols on the graph. For large sample sizes, RB ratios are
around one. For small sample sizes and a large number of censored observations,
the ratio is way greater than 1.

Description for Figure 3.7
Graph presenting the ratio of
relative biases in standard errors for PPS samples for Z3. The ratio of RB is
on the y-axis, ranging from 0 to 15. The sample sizes 50, 100, 500, 1,000 are
on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented
with different symbols on the graph. For large sample sizes, RB ratios are
around one. For small sample sizes and a large number of censored observations,
the ratio is way greater than 1.

Description for Figure 3.8
Graph presenting the ratio of
relative biases in standard errors for PPS samples for Z8. The ratio of RB is
on the y-axis, ranging from 0 to 15. The sample sizes 50, 100, 500, 1,000 are
on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented
with different symbols on the graph. For large sample sizes, RB ratios are
around one. For small sample sizes and a large number of censored observations,
the ratio is way greater than 1.
Table 3.1 displays the first quartile, median, and
third quartile for ratio of RBs in point estimates and standard errors for sample
size 50. Results for variable
for 10% and 90% censoring are reported in the
table. We observed that for all variables, the first and third quartiles for
ratio of RBs do not contain 1 when the sample size is small and the percentage
of censoring is high. However, as expected, for large samples and a small
number of censored observations, the difference in RBs between the scaled and
unscaled weights is small.
Table 3.1
Ratio of RBs in point estimates and standard errors for sample size 50 (variable )
Table summary
This table displays the results of Ratio of RBs in point estimates and standard errors for sample size 50 (variable
). The information is grouped by Design (appearing as row headers), Ratio of RBs in Point Estimates, Ratio of RBs in Standard Errors, 90% Censored and 10% Censored (appearing as column headers).
| Design |
Ratio of RBs in Point Estimates |
Ratio of RBs in Standard Errors |
| 90% Censored |
10% Censored |
90% Censored |
10% Censored |
| First Quartile |
Median |
Third Quartile |
First Quartile |
Median |
Third Quartile |
First Quartile |
Median |
Third Quartile |
First Quartile |
Median |
Third Quartile |
| SRS |
1.81 |
4.38 |
7.37 |
1.00 |
1.03 |
1.06 |
9.03 |
17.26 |
40.87 |
1.03 |
1.15 |
1.33 |
| PPS |
3.36 |
5.73 |
11.54 |
0.99 |
1.03 |
1.08 |
5.57 |
12.13 |
29.92 |
1.00 |
1.15 |
1.33 |
3.2 An application using NHEFS
We studied the time to get a heart attack and its
relation to blood cholesterol and smoking using a data set from NHEFS.
The NHEFS is a national longitudinal survey in the
United States that is used to determine the relationships between clinical, nutritional,
and behavioral factors; to determine hospital utilizations; and to monitor
changes in risk factors for an initial cohort that represents the NHANES I
population. A cohort of size 14,407 was selected for the NHEFS. Vital and
tracing status data, interview data, health care facility stay data, and
mortality data from 1987 are available for public use. For more information
about the survey and the data sets used in this section, see the Centers for
Disease Control and Prevention’s website (https://www.cdc.gov/).
We used 4,673 observations from 1987 NHEFS public-use
interview data to study the occurrence of first heart attack for the 1987
survey population and its relation to blood cholesterol and smoking. The
following variables are used:
- Stratum, the stratum identification.
- ObservationWeight, the sampling weight
associated with each observation unit.
- PSU, the primary sampling unit identification.
- Age, the event-time variable, defined as
follows:
- age
of the subject when the first heart attack was reported for subjects who reported
a heart attack,
- age
of the subject as reported in the interview for subjects who never reported a
heart attack.
- HeartAttack, the heart attack indicator (1 = heart
attack reported).
- Income, household income standardized to mean
zero.
- HighBloodChol, the indicator that a subject has
a high or low blood cholesterol level.
- Smoker, subject’s smoking habit (1 = current,
2 = former,
1 = non-smoker).
- Race, the race of the subject (1 = black,
2 = white, 3 = other).
- Gender, the gender of the subject.
The SURVEYPHREG procedure in SAS/STAT (Mukhopadhyay, 2010) is used to fit a
proportional hazards regression model for age on income, blood cholesterol,
smoking habit, race, gender, and race and gender interaction. Heart attack is
used as the censored indicator. Observation weights range from 1,164 to 121,040
with mean 16,036.51, median 12,321, and coefficient of variation 74.35. The
subjects are divided into 644 clusters and 35 strata.
PROC SURVEYPHREG is used in this section instead of PROC
PHREG because the NHEFS uses complex survey design involving stratification,
clustering, and unequal weights. PROC SURVEYPHREG supports STRATA, CLUSTER, and
WEIGHT statements to account for stratification, clustering, and unequal
weights, respectively. In addition, PROC SURVEYPHREG supports both Taylor
series linearization and jackknife variance estimation methods for survey data (Mukhopadhyay, 2010). We used the
jackknife variance estimation method for this study. SAS statements to fit this
model are given in Appendix 2.
The 4,673 subjects in the sample represent almost 74.9
million individuals in the 1987 study population. Among all the subjects, 213
subjects reported at least one heart attack, and the other 4,460 subjects are
considered to be censored. The 213 event observations in the sample represent
an estimated 3.2 million population units, and the 4,460 censored observations
in the sample represent an estimated 71.7 million population units. There are
95.44% observations in the sample that have not reported a heart attack which
estimates 95.68% individuals in the population (Table 3.2) without a heart
attack.
Table 3.2
Number of censored and uncensored observations and their sum of weights
Table summary
This table displays the results of Number of censored and uncensored observations and their sum of weights. The information is grouped by (appearing as row headers), Total, Event, Censored and Percent Censored (appearing as column headers).
|
Total |
Event |
Censored |
Percent Censored |
| Number of Observations |
4.673 |
213 |
4,460 |
95.44 |
| Sum of Weights |
74,938,614 |
3,239,653 |
71,698,961 |
95.68 |
Without the Firth penalty, the Newton-Raphson
optimization converges by satisfying the relative gradient convergence
criterion (GCONV = 1E
8), but
coefficients for the variables Smoker and Race do not converge. The
coefficients for Smoker = 2 are 7.47, 10.87, and 11.83; and the
coefficients for Race = 1 are 7.55, 10.95, and 11.17 in the last
three iterations, respectively. This phenomenon is very common when you have a
monotone likelihood (see Table 1.1). Among 644 replicate samples (= 644 PSUs),
monotone likelihood is observed in 542 replicates. Firth’s penalized likelihood
is a good alternative when you encounter monotone likelihoods.
We use the FIRTH option in PROC SURVEYPHREG (see “The
SURVEYPHREG Procedure” in SAS Institute Inc. (2018)) to maximize Firth’s
penalized likelihood. The FIRTH option in PROC SURVEYPHREG uses the scaled
weights. The penalized likelihood optimization converges with GCONV = 1E
8, along
with reasonable convergence in all coefficients. Convergence is also achieved
in all 644 replicate samples with the Firth penalty.
Table 3.3 displays the estimated hazards ratios
along with their 95% Wald confidence intervals for blood cholesterol levels and
smoking. In the 1987 study population, the estimated hazard of having a heart
attack for a subject with low blood cholesterol is 0.6 times the estimated
hazard of having a heart attack for a subject with high blood cholesterol.
Because the 95% confidence interval does not contain 1, it is reasonable to
conclude that the hazard of having a heart attack for a subject with low blood
cholesterol is significantly lower than the hazard of having a heart attack for
a subject with high blood cholesterol after adjusting for smoking, race, and
other regressors in the 1987 study population.
The estimated hazard ratios for nonsmokers, current
smokers, and former smokers are 0.59, 0.64, and 1.1, respectively. The
estimated hazard for nonsmokers to have a heart attack is lower than the
estimated hazard for current or former smokers. However, we do not have enough
evidence to conclude that hazard ratios for smoking are significantly different
at the 95% level after adjusting for blood cholesterol, race, and other
regressors in the 1987 study population.
Table 3.3
Hazard ratios for blood cholesterol and smoking, and their 95% Wald confidence intervals
Table summary
This table displays the results of Hazard ratios for blood cholesterol and smoking. The information is grouped by (appearing as row headers), Point Estimate and Confidence Limit (appearing as column headers).
|
Point Estimate |
Confidence Limit |
| Lower |
Upper |
| HighBloodChol 0 vs 1 |
0.643 |
0.469 |
0.882 |
| Smoker -1 vs 1 |
0.590 |
0.259 |
1.345 |
| Smoker -1 vs 2 |
0.641 |
0.361 |
1.140 |
| Smoker 1 vs 2 |
1.087 |
0.359 |
3.290 |