Firth’s penalized likelihood for proportional hazards regressions for complex surveys
Section 3. Applications in complex surveys

Data from complex surveys frequently contain unequal weights, strata, and clusters. It is recommended that the weights and other design features be used in the analysis stage. Weighted data provide a better representation of the study population than unweighted data. In this section, we compare the scaled and unscaled weights to estimate proportional hazards regression coefficients through a simulation study and apply the Firth penalized likelihood using the scaled weights to estimate survival times from a data set from NHEFS.

3.1  A simulation study

We performed a small simulation study to compare biases in parameter estimates and standard errors for scaled and unscaled weights using Firth’s penalized likelihood. We used two sampling methods to select samples from a fixed finite population: a simple random sample without replacement (SRS) in which each observation unit gets an equal weight; and a probability proportional to size (PPS) without replacement sample in which the sampling weight for an observation unit depends on the value of a size measure associated with the hazard function for the unit. For the purpose of finite population inference, we treat the estimated proportional hazards regression parameters in the finite population as the “true” parameter values. Biases are measured from these true values.

Finite populations of size 10,000 are generated as follows: 

where h MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWGObaaaa@3209@ is the hazard function, t MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWG0baaaa@3215@ is the survival time, c MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWGJbaaaa@3204@ is a censoring indicator, and m MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWGTbaaaa@320E@ is a size measure for each unit. Six finite populations are generated by using different censoring values ( ν = MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaGGOaGaeqyVd4MaaGjbVlaai2daaa a@35D4@ 0.1, 0.3, 0.5, 0.7, 0.8, 0.9). See Bender, Augustin and Blettner (2005) for methods of generating survival times. Ten regressors ( Z 1 , Z 2 , , Z 10 ) MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaadaqadeqaaiaabQfadaWgaaWcbaGaaG ymaaqabaGccaaISaGaaGjbVlaabQfadaWgaaWcbaGaaGOmaaqabaGc caaISaGaaGjbVlablAciljaaiYcacaaMe8UaaeOwamaaBaaaleaaca aIXaGaaGimaaqabaaakiaawIcacaGLPaaaaaa@40B6@ are generated using Bernoulli distributions to create monotone likelihoods, especially when the sample size is small and the censoring rate is high.

Samples are selected from each finite population by using two sampling methods: simple random samples without replacement; and probability proportional to size samples without replacement, where the variable m MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWGTbaaaa@320E@ is used as the size measure. Four sample sizes are used for each sampling method: 50, 100, 500, and 1,000. Sampling weights for all units for SRS depend only on the sample size, but the sampling weight for a unit for PPS depends on both the sample size and the observed value of the variable m MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWGTbaaaa@320E@ for that corresponding unit. To ensure the same distribution of the censored observations in the sampled data as in the population, samples are selected independently from censored and uncensored units in the population.

Finally, the regression parameters from the proportional hazards regression model

λ ( t , Z ) = λ 0 ( t ) exp ( β 1 Z 1 + β 2 Z 2 + + β 1 0 Z 1 0 ) MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacqaH7oaBdaqadeqaaiaadshacaaISa GaaGjbVlaahQfaaiaawIcacaGLPaaacaaMe8UaaGypaiaaysW7cqaH 7oaBdaWgaaWcbaGaaCimaaqabaGcdaqadeqaaiaayIW7caWH0bGaaG jcVdGaayjkaiaawMcaaiGacwgacaGG4bGaaiiCamaabmqabaGaeqOS di2aaSbaaSqaaiaahgdaaeqaaOGaaCOwamaaBaaaleaacaWHXaaabe aakiaaysW7cqGHRaWkcaaMe8UaeqOSdi2aaSbaaSqaaiaahkdaaeqa aOGaaCOwamaaBaaaleaacaWHYaaabeaakiaaysW7cqGHRaWkcaaMe8 UaeS47IWKaaGjbVlabgUcaRiaaysW7cqaHYoGydaWgaaWcbaGaaCym aiaahcdaaeqaaOGaaCOwamaaBaaaleaacaWHXaGaaCimaaqabaaaki aawIcacaGLPaaaaaa@658C@

are estimated from each sampled data set, where λ 0 ( t ) MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacqaH7oaBdaWgaaWcbaGaaGimaaqaba GcdaqadeqaaiaayIW7caWG0bGaaGjcVdGaayjkaiaawMcaaaaa@3965@ is the baseline hazard, t MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWG0baaaa@3215@ is the survival time, and c MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWGJbaaaa@3204@ is the censored indicator. The regression parameters are estimated by maximizing the weighted Firth penalized likelihood. Note that the unpenalized likelihood does not converge in most cases because of the monotonicity of the likelihood in the simulated data. When the likelihood is not monotone, we found that the point estimates obtained by using the penalized likelihood are very close to the point estimates obtained by using the unpenalized likelihood. Heinze and Schemper (2001) reported similar findings for unweighted data.

We compare relative biases in point estimates and standard errors using the jackknife method for scaled and unscaled weights. The relative biases (RBs) are defined below (Sitter, 1992).

Let β ^ s MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacuaHYoGygaqcamaaBaaaleaacaWGZb aabeaaaaa@33F1@ be the point estimate and v ^ s MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaaceWG2bGbaKaadaWgaaWcbaGaam4Caa qabaaaaa@334B@ be the variance estimate for one component of β MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWHYoaaaa@325A@ from data set s . MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWGZbGaaiOlaaaa@32C6@ Define the following:

Relative bias for point estimates, β ^ , MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacuaHYoGygaqcaiaacYcaaaa@337D@

RB ( β ^ ) = S 1 s = 1 S | ( β ^ s β T ) | | β T | . MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaaMi8UaaeOuaiaabkeacaaMc8UaaG jcVpaabmqabaGafqOSdiMbaKaaaiaawIcacaGLPaaacaaMe8UaaGyp aiaaysW7caWGtbWaaWbaaSqabeaacqGHsislcaaIXaaaaOWaaabCae qaleaacaWGZbGaaGypaiaaigdaaeaacaWGtbaaniabggHiLdGccaaM c8+aaSaaaeaadaabdeqaaiaayIW7daqadeqaaiqbek7aIzaajaWaaS baaSqaaiaadohaaeqaaOGaaGjbVlabgkHiTiaaysW7cqaHYoGydaWg aaWcbaGaamivaaqabaaakiaawIcacaGLPaaacaaMi8oacaGLhWUaay jcSdaabaWaaqWabeaacaaMi8UaeqOSdi2aaSbaaSqaaiaadsfaaeqa aOGaaGjcVdGaay5bSlaawIa7aaaacaGGUaaaaa@637B@

Relative bias for variance estimates, v ^ MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaaceWG2bGbaKaaaaa@3227@

RB ( v ^ ) = S 1 s = 1 S | ( v ^ s MSE T ) | MSE T MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaaMi8UaaeOuaiaabkeacaaMc8UaaG jcVpaabmqabaGaaGjcVlqadAhagaqcaiaayIW7aiaawIcacaGLPaaa caaMe8UaaGypaiaaysW7caWGtbWaaWbaaSqabeaacqGHsislcaaIXa aaaOWaaabCaeqaleaacaWGZbGaaGypaiaaigdaaeaacaWGtbaaniab ggHiLdGccaaMc8+aaSaaaeaadaabdeqaaiaayIW7daqadeqaaiqadA hagaqcamaaBaaaleaacaWGZbaabeaakiaaysW7cqGHsislcaaMe8Ua aeytaiaabofacaqGfbWaaSbaaSqaaiaadsfaaeqaaaGccaGLOaGaay zkaaGaaGjcVdGaay5bSlaawIa7aaqaaiaab2eacaqGtbGaaeyramaa BaaaleaacaWGubaabeaaaaaaaa@5FEA@

where the true MSE is

MSE T ( β ^ ) = S 1 s ( β ^ s β T ) 2 MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaaMi8UaaeytaiaabofacaqGfbWaaS baaSqaaiaadsfaaeqaaOWaaeWabeaacuaHYoGygaqcaaGaayjkaiaa wMcaaiaaysW7caaI9aGaaGjbVlaadofadaahaaWcbeqaaiabgkHiTi aaigdaaaGcdaaeqbqabSqaaiaadohaaeqaniabggHiLdGccaaMc8+a aeWabeaacuaHYoGygaqcamaaBaaaleaacaWGZbaabeaakiaaysW7cq GHsislcaaMe8UaeqOSdi2aaSbaaSqaaiaadsfaaeqaaaGccaGLOaGa ayzkaaWaaWbaaSqabeaacaaIYaaaaaaa@50B6@

and β T MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacqaHYoGydaWgaaWcbaGaamivaaqaba aaaa@33C2@ is the “true” parameter value obtained by fitting the proportional hazards regression model using all units in the finite population. The ratio of RBs is defined as the ratio of the RB using the unscaled weights to the RB using the scaled weights.

The median of ratios of RBs over 5,000 repetitions is displayed in this section. We report the median because there are some “bad” samples in which convergences are questionable even with the Firth correction. These “bad” samples produce few estimates with very large biases. Because of these large biases, the mean of the ratio of RBs is a more unstable statistic than the median. Without the “bad” replicates, the mean and medians are very close. We also noticed that the penalized log likelihood using the unscaled weights produces more of these “false” convergences.

Results for all regressors Z 1 , Z 2 , , Z 10 MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaqGAbWaaSbaaSqaaiaaigdaaeqaaO GaaGilaiaaysW7caqGAbWaaSbaaSqaaiaaikdaaeqaaOGaaGilaiaa ysW7cqWIMaYscaaISaGaaGjbVlaabQfadaWgaaWcbaGaaGymaiaaic daaeqaaaaa@3F22@ are similar. For simplicity, we display results for only two regressors, Z 3 MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaqGAbWaaSbaaSqaaiaaiodaaeqaaa aa@32E2@ and Z 8 . MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaqGAbWaaSbaaSqaaiaaiIdaaeqaaO GaaiOlaaaa@33A3@

Ratios of RBs in parameter estimates for unscaled and scaled weights for the variables Z 3 MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaqGAbWaaSbaaSqaaiaaiodaaeqaaa aa@32E2@ and Z 8 MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaqGAbWaaSbaaSqaaiaaiIdaaeqaaa aa@32E7@ are displayed in Figures 3.1, 3.2, 3.3, and 3.4. For small sample sizes and a large number of censored observations, RBs using scaled weights are much smaller than RBs using unscaled weights. For large sample sizes, RBs from both weights are similar primarily because the Firth option is not necessary, since the convergence is not an issue with large data sets.

Figure 3.1 Ratio of relative biases in parameter estimates for SRS samples for Z3

Description for Figure 3.1

Graph presenting the ratio of relative biases in parameter estimates for SRS samples for Z3. The ratio of RB is on the y-axis, ranging from 0 to 6. The sample sizes 50, 100, 500, 1,000 are on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented with different symbols on the graph. For large sample sizes, RB ratios are around one. For small sample sizes and a large number of censored observations, the ratio is greater than 1.

Figure 3.2 Ratio of relative biases in parameter estimates for SRS samples for Z8

Description for Figure 3.2

Graph presenting the ratio of relative biases in parameter estimates for SRS samples for Z8. The ratio of RB is on the y-axis, ranging from 0 to 6. The sample sizes 50, 100, 500, 1,000 are on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented with different symbols on the graph. For large sample sizes, RB ratios are around one. For small sample sizes and a large number of censored observations, the ratio is greater than 1.

Figure 3.3 Ratio of relative biases in parameter estimates for PPS samples for Z3

Description for Figure 3.3

Graph presenting the ratio of relative biases in parameter estimates for PPS samples for Z3. The ratio of RB is on the y-axis, ranging from 0 to 6. The sample sizes 50, 100, 500, 1,000 are on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented with different symbols on the graph. For large sample sizes, RB ratios are around one. For small sample sizes and a large number of censored observations, the ratio is greater than 1.

Figure 3.4 Ratio of relative biases in parameter estimates for PPS samples for Z8

Description for Figure 3.4

Graph presenting the ratio of relative biases in parameter estimates for PPS samples for Z8. The ratio of RB is on the y-axis, ranging from 0 to 6. The sample sizes 50, 100, 500, 1,000 are on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented with different symbols on the graph. For large sample sizes, RB ratios are around one. For small sample sizes and a large number of censored observations, the ratio is greater than 1.

Ratios of RBs in standard errors for unscaled and scaled weights for the variables Z 3 MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWGAbWaaSbaaSqaaiaaiodaaeqaaa aa@32E4@ and Z 8 MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWGAbWaaSbaaSqaaiaaiIdaaeqaaa aa@32E9@ are displayed in Figures 3.5, 3.6, 3.7, and 3.8. RBs for standard errors follow the same trend as RBs for point estimates. However, RBs for standard errors are higher than RBs for point estimates. For small sample sizes and a large number of censored observations, RBs using scaled weights are much smaller than RBs using unscaled weights. For large sample sizes, RBs from both scaled and unscaled weights are similar primarily because the Firth option is not necessary, since the convergence is not an issue with large data sets.

Figure 3.5 Ratio of relative biases in standard errors for SRS samples for Z3

Description for Figure 3.5

Graph presenting the ratio of relative biases in standard errors for SRS samples for Z3. The ratio of RB is on the y-axis, ranging from 0 to 15. The sample sizes 50, 100, 500, 1,000 are on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented with different symbols on the graph. For large sample sizes, RB ratios are around one. For small sample sizes and a large number of censored observations, the ratio is way greater than 1.

Figure 3.6 Ratio of relative biases in standard errors for SRS samples for Z8

Description for Figure 3.6

Graph presenting the ratio of relative biases in standard errors for SRS samples for Z8. The ratio of RB is on the y-axis, ranging from 0 to 15. The sample sizes 50, 100, 500, 1,000 are on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented with different symbols on the graph. For large sample sizes, RB ratios are around one. For small sample sizes and a large number of censored observations, the ratio is way greater than 1.

Figure 3.7 Ratio of relative biases in standard errors for PPS samples for Z3

Description for Figure 3.7

Graph presenting the ratio of relative biases in standard errors for PPS samples for Z3. The ratio of RB is on the y-axis, ranging from 0 to 15. The sample sizes 50, 100, 500, 1,000 are on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented with different symbols on the graph. For large sample sizes, RB ratios are around one. For small sample sizes and a large number of censored observations, the ratio is way greater than 1.

Figure 3.8 Ratio of relative biases in standard errors for PPS samples for Z8

Description for Figure 3.8

Graph presenting the ratio of relative biases in standard errors for PPS samples for Z8. The ratio of RB is on the y-axis, ranging from 0 to 15. The sample sizes 50, 100, 500, 1,000 are on the x-axis. The censoring values (0.1, 0.3, 0.5, 0.7, 0.9) are represented with different symbols on the graph. For large sample sizes, RB ratios are around one. For small sample sizes and a large number of censored observations, the ratio is way greater than 1.

Table 3.1 displays the first quartile, median, and third quartile for ratio of RBs in point estimates and standard errors for sample size 50. Results for variable Z 3 MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeaabaqaciGa caGaaeqabaGaaeaadaaakeaacaWGAbWaaSbaaSqaaiaaiodaaeqaaa aa@32E4@ for 10% and 90% censoring are reported in the table. We observed that for all variables, the first and third quartiles for ratio of RBs do not contain 1 when the sample size is small and the percentage of censoring is high. However, as expected, for large samples and a small number of censored observations, the difference in RBs between the scaled and unscaled weights is small.


Table 3.1
Ratio of RBs in point estimates and standard errors for sample size 50 (variable Z 3 MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8qrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeqabeqaciWa ceGabeqabeGabeqadeaakeaacaqGAbWaaSbaaSqaaiaaiodaaeqaaO Gaaiykaaaa@3391@ )
Table summary
This table displays the results of Ratio of RBs in point estimates and standard errors for sample size 50 (variable Z 3 ) MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8qrps0l bbf9q8WrFfeuY=Hhbbf9y8qrpq0dc9vqFj0db9qqvqFr0dXdHiVc=b YP0xH8peeu0xXdcrpe0db9Wqpepec9ar=xfr=xfr=tmeqabeqaciWa ceGabeqabeGabeqadeaakeaacaqGAbWaaSbaaSqaaiaaiodaaeqaaO Gaaiykaaaa@3391@ ). The information is grouped by Design (appearing as row headers), Ratio of RBs in Point Estimates, Ratio of RBs in Standard Errors, 90% Censored and 10% Censored (appearing as column headers).
Design Ratio of RBs in Point Estimates Ratio of RBs in Standard Errors
90% Censored 10% Censored 90% Censored 10% Censored
First Quartile Median Third Quartile First Quartile Median Third Quartile First Quartile Median Third Quartile First Quartile Median Third Quartile
SRS 1.81 4.38 7.37 1.00 1.03 1.06 9.03 17.26 40.87 1.03 1.15 1.33
PPS 3.36 5.73 11.54 0.99 1.03 1.08 5.57 12.13 29.92 1.00 1.15 1.33

3.2  An application using NHEFS

We studied the time to get a heart attack and its relation to blood cholesterol and smoking using a data set from NHEFS.

The NHEFS is a national longitudinal survey in the United States that is used to determine the relationships between clinical, nutritional, and behavioral factors; to determine hospital utilizations; and to monitor changes in risk factors for an initial cohort that represents the NHANES I population. A cohort of size 14,407 was selected for the NHEFS. Vital and tracing status data, interview data, health care facility stay data, and mortality data from 1987 are available for public use. For more information about the survey and the data sets used in this section, see the Centers for Disease Control and Prevention’s website (https://www.cdc.gov/).

We used 4,673 observations from 1987 NHEFS public-use interview data to study the occurrence of first heart attack for the 1987 survey population and its relation to blood cholesterol and smoking. The following variables are used: 

The SURVEYPHREG procedure in SAS/STAT (Mukhopadhyay, 2010) is used to fit a proportional hazards regression model for age on income, blood cholesterol, smoking habit, race, gender, and race and gender interaction. Heart attack is used as the censored indicator. Observation weights range from 1,164 to 121,040 with mean 16,036.51, median 12,321, and coefficient of variation 74.35. The subjects are divided into 644 clusters and 35 strata.

PROC SURVEYPHREG is used in this section instead of PROC PHREG because the NHEFS uses complex survey design involving stratification, clustering, and unequal weights. PROC SURVEYPHREG supports STRATA, CLUSTER, and WEIGHT statements to account for stratification, clustering, and unequal weights, respectively. In addition, PROC SURVEYPHREG supports both Taylor series linearization and jackknife variance estimation methods for survey data (Mukhopadhyay, 2010). We used the jackknife variance estimation method for this study. SAS statements to fit this model are given in Appendix 2.

The 4,673 subjects in the sample represent almost 74.9 million individuals in the 1987 study population. Among all the subjects, 213 subjects reported at least one heart attack, and the other 4,460 subjects are considered to be censored. The 213 event observations in the sample represent an estimated 3.2 million population units, and the 4,460 censored observations in the sample represent an estimated 71.7 million population units. There are 95.44% observations in the sample that have not reported a heart attack which estimates 95.68% individuals in the population (Table 3.2) without a heart attack.


Table 3.2
Number of censored and uncensored observations and their sum of weights
Table summary
This table displays the results of Number of censored and uncensored observations and their sum of weights. The information is grouped by (appearing as row headers), Total, Event, Censored and Percent Censored (appearing as column headers).
Total Event Censored Percent Censored
Number of Observations 4.673 213 4,460 95.44
Sum of Weights 74,938,614 3,239,653 71,698,961 95.68

Without the Firth penalty, the Newton-Raphson optimization converges by satisfying the relative gradient convergence criterion (GCONV = 1E MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9 vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=x fr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaacbaqcLbwaqa aaaaaaaaWdbiaa=nbiaaa@37A3@ 8), but coefficients for the variables Smoker and Race do not converge. The coefficients for Smoker = 2 are 7.47, 10.87, and 11.83; and the coefficients for Race = 1 are 7.55, 10.95, and 11.17 in the last three iterations, respectively. This phenomenon is very common when you have a monotone likelihood (see Table 1.1). Among 644 replicate samples (= 644 PSUs), monotone likelihood is observed in 542 replicates. Firth’s penalized likelihood is a good alternative when you encounter monotone likelihoods.

We use the FIRTH option in PROC SURVEYPHREG (see “The SURVEYPHREG Procedure” in SAS Institute Inc. (2018)) to maximize Firth’s penalized likelihood. The FIRTH option in PROC SURVEYPHREG uses the scaled weights. The penalized likelihood optimization converges with GCONV = 1E MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9 vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=x fr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaacbaqcLbwaqa aaaaaaaaWdbiaa=nbiaaa@37A3@ 8, along with reasonable convergence in all coefficients. Convergence is also achieved in all 644 replicate samples with the Firth penalty.

Table 3.3 displays the estimated hazards ratios along with their 95% Wald confidence intervals for blood cholesterol levels and smoking. In the 1987 study population, the estimated hazard of having a heart attack for a subject with low blood cholesterol is 0.6 times the estimated hazard of having a heart attack for a subject with high blood cholesterol. Because the 95% confidence interval does not contain 1, it is reasonable to conclude that the hazard of having a heart attack for a subject with low blood cholesterol is significantly lower than the hazard of having a heart attack for a subject with high blood cholesterol after adjusting for smoking, race, and other regressors in the 1987 study population.

The estimated hazard ratios for nonsmokers, current smokers, and former smokers are 0.59, 0.64, and 1.1, respectively. The estimated hazard for nonsmokers to have a heart attack is lower than the estimated hazard for current or former smokers. However, we do not have enough evidence to conclude that hazard ratios for smoking are significantly different at the 95% level after adjusting for blood cholesterol, race, and other regressors in the 1987 study population.


Table 3.3
Hazard ratios for blood cholesterol and smoking, and their 95% Wald confidence intervals
Table summary
This table displays the results of Hazard ratios for blood cholesterol and smoking. The information is grouped by (appearing as row headers), Point Estimate and Confidence Limit (appearing as column headers).
Point Estimate Confidence Limit
Lower Upper
HighBloodChol 0 vs 1 0.643 0.469 0.882
Smoker -1 vs 1 0.590 0.259 1.345
Smoker -1 vs 2 0.641 0.361 1.140
Smoker 1 vs 2 1.087 0.359 3.290

Date modified: