Weighted censored quantile regression

Section 1. Introduction


In quantile regression (Koenker, 2005), the conditional quantiles of the response variable given a set of predictor variables are modelled. The regression parameters are estimated by minimizing the check loss function at a specified quantile, $τ,$ instead of the squared loss function used in standard linear regression. A quantile regression model fitted at properly selected quantiles can provide a global assessment of the covariate effects on the response, which the standard linear regression model, targeting only the conditional mean, often misses. Censored quantile regression has recently been studied extensively. Powell (1984) introduced the least absolute deviation (LAD) estimator, also called the median regression estimator, for left censored data, using the censored (Tobit) regression model (Tobin, 1958). Powell (1986) generalized LAD estimation to any quantile.
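To see why the check loss targets a quantile, consider the simplest case with no covariates and no censoring: minimizing $\sum_{i} ρ_{τ}(y_{i} - b)$ over a constant $b$, with $ρ_{τ}(u) = u\{τ - I(u < 0)\}$, yields a sample $τ$-quantile. The Python sketch below (the function names are hypothetical, chosen for illustration) evaluates the objective only at the observations, which suffices because the objective is convex and piecewise linear in $b$ with kinks at the data points.

```python
def check_loss(u: float, tau: float) -> float:
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def constant_quantile(ys: list[float], tau: float) -> float:
    """Minimize sum_i rho_tau(y_i - b) over a constant b.

    The objective is convex and piecewise linear in b, with kinks only at
    the observations, so a minimizer can always be found among the data
    points. Ties are broken by returning the first minimizer in list order.
    """
    return min(ys, key=lambda b: sum(check_loss(y - b, tau) for y in ys))
```

For example, `constant_quantile([1, 2, 3, 4, 5], 0.25)` returns 2 and `constant_quantile([5, 1, 3, 2, 4], 0.5)` returns the median 3. Replacing the constant $b$ by a linear predictor $X_{i}^{⊤} β$ gives ordinary linear quantile regression.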

Portnoy (2003) introduced a censored quantile regression model under random censoring that generalizes the Kaplan-Meier estimator (Kaplan and Meier, 1958) through a recursive algorithm. Peng and Huang (2008) developed a censored quantile regression model based on the Nelson-Aalen estimator, using counting processes and martingale theory. In the survival analysis setup, for the $i^{th}$ $(i = 1, 2, \dots, n)$ subject, let $T_{i}$ be the logarithm of the failure time, $C_{i}$ the logarithm of the right censoring time, $X_{i}$ the $p$-vector of covariates, and $Y_{i} = \min (T_{i}, C_{i})$ the logarithm of the observed survival time. For a given quantile, $τ,$ the regression coefficients, $β (τ),$ can be estimated as

$\hat{β}(τ) = \underset{β \in ℜ^{p}}{\arg\min} \sum_{i=1}^{n} ρ_{τ}\left(Y_{i} - \min\{C_{i}, X_{i}^{⊤} β\}\right), \qquad (1.1)$

where $ρ_{τ}(u) = u\{τ - I(u < 0)\}$ is the check loss function and $I(\cdot)$ is the indicator function.
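A minimal Python sketch of the objective in (1.1) follows, assuming each covariate vector $X_{i}$ already contains a leading 1 for the intercept and that the censoring time $C_{i}$ is known for every subject, as (1.1) requires. Because $\min\{C_{i}, X_{i}^{⊤} β\}$ makes the objective non-convex in $β$, the search over a hypothetical list of candidate coefficient vectors is for illustration only, not a recommended estimation algorithm. The names `powell_objective` and `grid_argmin` are hypothetical.

```python
def check_loss(u: float, tau: float) -> float:
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def powell_objective(beta, ys, xs, cs, tau):
    """Objective (1.1): sum_i rho_tau(Y_i - min{C_i, X_i^T beta})."""
    total = 0.0
    for y, x, c in zip(ys, xs, cs):
        linear_pred = sum(xj * bj for xj, bj in zip(x, beta))
        # The fitted quantile cannot exceed the known censoring time C_i.
        total += check_loss(y - min(c, linear_pred), tau)
    return total


def grid_argmin(candidates, ys, xs, cs, tau):
    """Return the candidate beta with the smallest objective (illustration only)."""
    return min(candidates, key=lambda b: powell_objective(b, ys, xs, cs, tau))
```

With hypothetical data `ys = [1.0, 2.0, 2.5]`, `xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]` and `cs = [5.0, 5.0, 2.5]` (the third subject censored at 2.5), the candidate $β = (1, 1)$ gives objective 0 at $τ = 0.5$: its linear prediction 3 for the third subject is truncated to the censoring time 2.5.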

In many studies, information about the target population is available from previous studies. This is common in survey sampling, since surveys are carried out repeatedly with similar objectives; for example, the population mean and variance may be available from previous surveys or records. Information about the parameters, as well as the type of relationship between variables, distributional assumptions, etc., can also be treated as auxiliary information. Such auxiliary information can be used effectively to improve the efficiency of statistical inference (Kuk and Mak, 1989; Rao, Kovar and Mantel, 1990; Chen and Qin, 1993). The idea used in this paper can easily be extended to survey sampling to obtain efficient parameter estimates from the information available in previous surveys.

Consider a known relationship between the survival time, $Y$ (or the failure time, $T$), and a set of covariates, $X$, of the form $Y = f(X; θ)$, where $θ$ is the parameter of interest. Knowledge of this relationship can be treated as auxiliary information. More generally, the auxiliary information can be expressed as $E\{g(Z; θ)\} = 0$ for some $d$-dimensional parameter, $θ \in ℜ^{d}$, where $Z$ denotes the observed data from the present study and $g(Z; θ) \in ℜ^{q}$ is some function with $q \geq d$. The parameter $θ$ may be unknown, but it can be estimated using the information available from previous studies.
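As a concrete instance of $E\{g(Z; θ)\} = 0$, suppose, hypothetically, that previous surveys supply the population mean $μ$ and variance $σ^{2}$ of $Y$, as in the survey sampling example above. Then $Z = Y$, $θ = (μ, σ^{2})$, $d = q = 2$, and $g(Z; θ) = (Y - μ, (Y - μ)^{2} - σ^{2})^{⊤}$. The Python sketch below computes the sample analogue of $E\{g(Z; θ)\}$; the function name `moment_conditions` is hypothetical.

```python
def moment_conditions(ys: list[float], theta: tuple[float, float]) -> tuple[float, float]:
    """Sample average of g(Z; theta) = (Y - mu, (Y - mu)**2 - sigma2).

    If the auxiliary information E{g(Z; theta)} = 0 holds, both components
    should be close to zero for a large sample drawn from the population.
    """
    mu, sigma2 = theta
    n = len(ys)
    g1 = sum(y - mu for y in ys) / n
    g2 = sum((y - mu) ** 2 - sigma2 for y in ys) / n
    return g1, g2
```

For example, `moment_conditions([1.0, 2.0, 3.0], (2.0, 1.0))` returns `(0.0, -0.3333333333333333)`: the sample agrees with the assumed mean but has a smaller average squared deviation than the assumed variance. Empirical likelihood reweights the observations so that such weighted averages are exactly zero.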

Chen and Qin (1993) introduced the use of auxiliary information to improve the efficiency of estimators in survey sampling using empirical likelihood (Owen, 1988, 2001). Li and Wang (2003) incorporated auxiliary information into the censored linear regression model through empirical likelihood by defining a synthetic variable (Koul, Susarla and Van Ryzin, 1981). Fang, Li, Lu and Qin (2013) proposed an effective use of auxiliary information in the linear regression model with right censored data, using empirical likelihood with the Buckley-James (Buckley and James, 1979) estimating equation. Tang and Leng (2012) introduced an empirical likelihood based linear quantile regression model that uses auxiliary information.

In this paper, we propose an empirical likelihood (EL) based approach to incorporate auxiliary information into censored quantile regression. EL is a nonparametric likelihood approach proposed by Owen (1988, 2001) that shares many properties of parametric likelihood. We use the estimating function, $g(Z; θ)$, to obtain EL based data driven probabilities and incorporate these probabilities as weights in the censored quantile regression model. The resulting weighted censored quantile regression parameter, $β(τ)$, can be estimated as

$\hat{β}(τ) = \underset{β \in ℜ^{p}}{\arg\min} \sum_{i=1}^{n} ω_{i} ρ_{τ}\left(Y_{i} - \min\{C_{i}, X_{i}^{⊤} β\}\right), \qquad (1.2)$

where the $ω_{i}$'s are the weights, which we take to be the EL based data driven probabilities. Our simulation results show that the EL based weighted censored quantile regression is more efficient than the standard linear censored quantile regression.
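The sketch below illustrates (1.2) under simplifying assumptions that are ours and not necessarily those of the estimation procedure described in Section 2: the auxiliary information is a single scalar moment condition ($q = 1$) with $θ$ treated as known, and the weights are Owen's standard EL probabilities, which maximize $\prod_{i} p_{i}$ subject to $\sum_{i} p_{i} = 1$ and $\sum_{i} p_{i} g(Z_{i}; θ) = 0$. These have the closed form $p_{i} = 1/[n\{1 + λ g(Z_{i}; θ)\}]$, where the Lagrange multiplier $λ$ solves $\sum_{i} g(Z_{i}; θ)/\{1 + λ g(Z_{i}; θ)\} = 0$. All function names are hypothetical.

```python
def check_loss(u: float, tau: float) -> float:
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def el_weights(g: list[float], tol: float = 1e-12, max_iter: int = 200) -> list[float]:
    """EL probabilities for one scalar moment constraint sum_i p_i * g_i = 0.

    Solution: p_i = 1 / (n * (1 + lam * g_i)), where lam solves
    h(lam) = sum_i g_i / (1 + lam * g_i) = 0. On the interval where every
    1 + lam * g_i > 0, h is strictly decreasing, so bisection finds lam.
    """
    n = len(g)
    g_min, g_max = min(g), max(g)
    if not g_min < 0.0 < g_max:
        raise ValueError("0 must lie strictly inside the range of the g_i")
    # Outside (lo, hi) some 1 + lam * g_i would be non-positive.
    lo, hi = -1.0 / g_max, -1.0 / g_min
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        if sum(gi / (1.0 + lam * gi) for gi in g) > 0.0:
            lo = lam  # h(lam) > 0: the root lies to the right
        else:
            hi = lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return [1.0 / (n * (1.0 + lam * gi)) for gi in g]


def weighted_powell_objective(beta, ys, xs, cs, weights, tau):
    """Objective (1.2): sum_i w_i * rho_tau(Y_i - min{C_i, X_i^T beta})."""
    total = 0.0
    for y, x, c, w in zip(ys, xs, cs, weights):
        linear_pred = sum(xj * bj for xj, bj in zip(x, beta))
        total += w * check_loss(y - min(c, linear_pred), tau)
    return total
```

For instance, with the hypothetical auxiliary information that the mean of $Y$ equals 3, i.e. $g(Z; θ) = Y - 3$, and observations 1, 2, 3, 4, 10, `el_weights([y - 3.0 for y in ys])` returns positive weights summing to 1 that downweight the large observation 10, so that the weighted mean equals 3. Setting every weight to 1 recovers the unweighted objective (1.1).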

The rest of the paper is organized as follows. In Section 2, we present the estimation procedure for the EL based data driven probabilities. In Section 3, we introduce the EL based weighted censored quantile regression and investigate the asymptotic properties of the estimators. In Section 4, the performance of the proposed method is evaluated through simulations, and an application to the North Central Cancer Treatment Group lung cancer data is presented as an illustration. Our conclusions are given in Section 5.

ISSN: 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word, to the Editor (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2019-05-07
