Analytical Studies: Methods and References
Mapping the Washington Group on Disability Statistics Disability Measure to the Health Utilities Index Mark 3: Development and Testing of Qualitative and Predictive Multivariable Models in a General Population Sample

by Thomas Charters, Dafna Kohen, and Julie Bernier

Release date: February 3, 2025

Abstract

Background: Statistics Canada routinely collects information on functional health and related concepts. Recently, the Washington Group (WG) measure of disability was introduced to the Canadian Community Health Survey (CCHS). The WG measure is used as a tool for developing internationally comparable data on disability. In alternate cycles of the CCHS, it replaces the Health Utilities Index Mark 3 (HUI3), a generic preference-based measure of health-related quality of life. The HUI3 is used to derive evaluative health measures common in population health and economic evaluations. Because the WG measure is not preference-based, it cannot be used to derive these measures. To address these resulting data gaps, this study mapped the health state utility values of the HUI3 score from the WG measure.

Data and methods: Qualitative and empirical mapping used a “head-to-head” subsample of the 2017 CCHS, where WG and HUI3 measures were collected from the same respondents aged 40 years and older. Qualitative mapping relied on expert judgment to link attributes between measures. Empirical mapping used regression models to estimate the statistical relationship between WG and HUI3 measures, in addition to health and demographic variables. Out-of-sample predictive performance was assessed through descriptive statistics, measures of predictive accuracy such as mean absolute error and testing model calibration.

Results: Predictive performance was assessed across qualitative and quantitative methods. The preferred estimation strategy used quantitative mapping and resulted in reasonably precise estimates of the HUI3 score and reflected the distributional properties of the HUI3 score. Inclusion of different components of the WG measure influenced predictive accuracy.

1 Introduction

Population health surveys commonly collect information on health status as represented by functional abilities. Questions assess the ability levels of respondents carrying out various tasks or activities, in addition to health states that may impede this functioning. Disability is a related concept, involving interactions between these elements of functional status and environmental factors that limit or restrict participation in society (Cieza & Stucki, 2008; Madans et al., 2011). The Washington Group on Disability Statistics (WG) measure of disability was developed by an international consortium and sponsored by the United Nations Statistical Commission. The WG aimed to develop an internationally comparable population-based measure of disability to be used in censuses or national surveys through measurement of functional limitations across domains closely associated with social participation (Madans et al., 2011). To facilitate comparability across different countries and cultural contexts, the WG measure assesses functional health through difficulties in universal basic activities. While the WG measure was developed within the International Classification of Functioning, Disability and Health framework (Washington Group on Disability Statistics, 2009; Washington Group on Disability Statistics, Budapest Initiative, & United Nations Economic and Social Commission for Asia and the Pacific, 2016), it does not include social or environmental factors implicit to this framework for reasons of brevity and comparability. The WG measure is intended for use in conjunction with other information sources to highlight inequalities between limitations in health, functioning and social inclusion, and thereby identify targets for intervention as per the United Nations 2030 Sustainable Development Goals (Mont, 2019; United Nations, 2015; Washington Group on Disability Statistics, 2020). 
The validity and reliability of the WG measure have been demonstrated in international contexts (Miller et al., 2011; Washington Group on Disability Statistics et al., 2016), and the WG measure has been adopted in censuses or surveys in over 80 countries (Washington Group on Disability Statistics, 2020).

Historically, Statistics Canada has used several surveys and measures to estimate levels of disability through measures of impairment, functional health or activity limitations. Among these, the Health Utilities Index® Mark 3 (HUI3) (Feeny et al., 2002) has been incorporated into several health and social surveys for several decades (Health Utilities Inc., 2015). The Health Utilities Index (HUI) system was developed to provide a standardized measure to assess and compare health and health-related quality of life (HRQoL) in patient groups and the general population, and to evaluate health interventions (Horsman, Furlong, Feeny, & Torrance, 2003). Further, the HUI3 has been used to derive evaluative health measures such as health-adjusted life expectancy (HALE) and quality-adjusted life years (QALYs), commonly used in population health and economic evaluations (Bushnik, Tjepkema, & Martel, 2018; Heintz, Wiréhn, Peebo, Rosenqvist, & Levin, 2012). The validity, reliability and responsiveness of the HUI3 system are well established in clinical and population health settings (Boyle, Furlong, Feeny, Torrance, & Hatcher, 1995; Feeny et al., 2002; Feng, Bernier, McIntosh, & Orpana, 2009; Kopec & Willison, 2003).

Since 2000, the Canadian Community Health Survey (CCHS) has been administered by Statistics Canada to provide comprehensive health information on the Canadian population (Béland, Dale, Dufour, & Hamel, 2005). The HUI3 instrument had been included in the CCHS since its inception. In 2015, the CCHS underwent a major redesign, which saw updates to its content, sampling methods and administration (Statistics Canada, 2015). Following the redesign, the HUI3 and the WG Short Set on Functioning (WG-SS) questions were included as part of two-year theme content, being collected in alternate cycles to optimize data collection. The inclusion of the WG-SS measure meets commitments for collection of internationally integrated data on disability (Washington Group on Disability Statistics, 2020). Both measures describe functional capacities (what you can do) intrinsic to the person (“within or near the skin”) rather than performance (what you do) to avoid influence by context-dependent environmental factors (Furlong et al., 2001; Madans & Loeb, 2013). Unlike the HUI3, the WG measure cannot generate health state utility values since these are derived from a preference-based scoring function (Brazier, Yang, Tsuchiya, & Rowen, 2010; Drummond, Sculpher, Claxton, Stoddart, & Torrance, 2015). As such, the WG does not permit calculation of HALE or QALYs.

Collection of the HUI3 in alternating years will lead to data gaps. While the WG and HUI3 measures play complementary roles in the measurement of functional health, the WG measure is not suitable for use in the calculation of important health measures used in population health and program evaluations. Mapping provides a potential solution for estimating health state utility values from the WG measure. Mapping involves the estimation of a relationship between a target measure (HUI3) and a source measure (WG) (Brazier et al., 2010). The relationship may be estimated by using a statistical model or algorithm, or through equating or linking equivalent values between instruments (Fayers & Hays, 2014; Longworth & Rowen, 2013). Importantly, the validity and feasibility of mapping rely on sufficient conceptual overlap between the measures (Brazier et al., 2010; Round & Hawton, 2017; Wailoo et al., 2017). Mapping studies are common (Brazier et al., 2010; Mukuria et al., 2019) and include several examples of successful mapping of the HUI from other measures (Bartman et al., 1997; Franks, Lubetkin, Gold, & Tancredi, 2003; Grootendorst et al., 2007; Marshall et al., 2008; Nichol, Sengupta, & Globe, 2001; Sengupta, Nichol, Wu, & Globe, 2004). Estimation of the HUI3 health state utility score would alleviate data gaps in years when the HUI3 is not collected and may optimize resources used in data collection. The purpose of this study is to map the health state utility values of the HUI3 score from the WG measure by estimating the relationship between the measures both qualitatively and statistically. This report builds on previous research, which has established necessary levels of conceptual overlap between these two measures (Asakawa et al., 2017; available upon request).

2 Materials and methods

2.1 Data

The 2017 CCHS annual component was used in this study. The CCHS is a cross-sectional representative survey covering a range of topics relevant to the health status, health behaviours and demographic profiles of the Canadian population aged 12 years and older living in private dwellings. People living on Indian reserves, on Crown lands, in institutions and in remote regions or serving in the Canadian Forces are excluded from the sample. Additionally, individuals residing in the territories are excluded from the one-year sample files. Approximately 98% of Canadians aged 12 and older are represented in the CCHS (Statistics Canada, 2018). Health, demographic and socioeconomic variables collected from the 2017 CCHS included respondents’ sex and age, highest educational attainment, marital status, self-perceived general and mental health, presence of chronic conditions, and the WG-SS questions.

Mapping used a unique “head-to-head” subsample of the CCHS containing both WG and HUI3 measures from the same respondents aged 40 years and older. The subsample contained three additional variables from the WG Extended Set on Functioning (WG ES-F) in addition to the multi-attribute health status classification system questionnaire, and derived attribute-specific and overall scores for the HUI3. The rapid-response file included 2,837 respondents who were provided modules for both the HUI3 and WG ES-F, with 2,597 having non-missing responses to the HUI3 target measure.

2.2 Washington Group measures

All domains from the Washington Group Short Set on Functioning (WG-SS) were included in the core content of the 2017 CCHS annual component. The WG-SS consists of six domains or attributes, including vision (“Do you have difficulty seeing, even if wearing glasses?”), hearing (“Do you have difficulty hearing, even if using a hearing aid?”), mobility (“Do you have difficulty walking or climbing steps?”), cognition (“Do you have difficulty remembering or concentrating?”), self-care (“Do you have difficulty with self-care, such as washing all over or dressing?”) and communication (“Using your usual language, do you have difficulty communicating, for example understanding or being understood?”). Each attribute is assessed by a single question containing four response options: “No difficulty,” “Some difficulty,” “A lot of difficulty” and “Cannot do at all” (Washington Group on Disability Statistics, 2009). An additional category of response, “Missing,” combined the response categories of “Don’t know” and “Refusal.”

In addition to the WG-SS questions, the rapid-response subsample of the 2017 CCHS contains three attributes from the WG ES-F related to pain, anxiety and depression. Indicators for each attribute were derived by one question measuring the frequency of the attribute and a second question on its intensity. The first question for the pain attribute asked, “In the past three months, how often did you have pain?” with response options including “Never,” “Some days,” “Most days” and “Every day.” The second question asked, “Thinking about the last time you had pain, how much pain did you have?” with response options for those who indicated that they experienced pain at least some days, including “A little,” “A lot” and “Somewhere in between a little and a lot.” Questions also asked about respondents’ emotional state, or affect, for anxiety (“How often do you feel worried, nervous or anxious?”) and depression (“How often do you feel depressed?”) with response categories for both questions including “Daily,” “Weekly,” “Monthly,” “A few times a year” and “Never.” Questions on the intensity of anxiety and depression asked whether the level of the feelings, the last time they were experienced, was “A little,” “A lot” or “Somewhere in between a little and a lot” (Washington Group on Disability Statistics et al., 2016). Categorical measures containing information on both the frequency and the intensity of each attribute were derived in four categories, plus a “Missing” category combining responses of “Don’t know” and “Refusal.” Appendix 1 shows the derivation of categories for the WG ES-F measures.

2.3 The Health Utilities Index Mark 3

HUI3 overall scores, attribute-specific responses and questions were included in the 2017 CCHS rapid-response survey. The HUI3 multi-attribute classification system consists of eight attributes (vision, hearing, speech, ambulation, dexterity, emotion, cognition and pain) with five or six response options per attribute (Feeny et al., 2002; Horsman et al., 2003). Preference-based scoring functions convert this descriptive information into utilities, cardinal scores describing preferences over various health states. The multi-attribute scoring function generates overall HUI3 utility scores through multiplicative models and describes up to 972,000 different health states. These scores range from -0.36 (implying a state worse than death) to 1.00 (perfect functional health), with 0.00 representing the state of death. The HUI3 system can also be used to generate single-attribute utility scores representative of mean preferences for health states ranging from 0.00 (most impaired) to 1.00 (no impairment) for each attribute. Overall HUI3 multi-attribute scoring functions for each response level are shown in Appendix 2. Indicators for moderate to severe functional difficulties were derived from the HUI3 disability categories (Feeny & Furlong, 1997) by combining “moderate” scores from 0.70 to 0.88 with “severe” scores below 0.70 into one category, compared with a combined indicator for “mild” scores from 0.89 to 0.99 and no functional difficulties (score of 1.00).
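The count of health states and the multiplicative form of the overall score can be illustrated with a minimal sketch. The number of levels per attribute and the constants 1.371 and 0.371 are assumptions taken from the published HUI3 scoring function (Feeny et al., 2002), not from this report; Appendix 2 remains the authoritative source for the multipliers. The vision multiplier of 0.98 is the level 2 value quoted in the worked example in Section 2.4.

```python
from math import prod

# Response levels per HUI3 attribute (five or six each).
# Assumption: per-attribute counts follow the published HUI3 classification.
LEVELS = {
    "vision": 6, "hearing": 6, "speech": 5, "ambulation": 6,
    "dexterity": 6, "emotion": 5, "cognition": 6, "pain": 5,
}


def count_health_states(levels=LEVELS):
    """Number of distinct health states the classification can describe."""
    return prod(levels.values())


def hui3_overall(multipliers):
    """Multiplicative overall score: u = 1.371 * (b1 * ... * b8) - 0.371.

    `multipliers` holds one attribute-level multiplier per attribute.
    Assumption: constants as published in Feeny et al. (2002).
    """
    return 1.371 * prod(multipliers) - 0.371


n_states = count_health_states()

# Perfect functioning on all eight attributes.
perfect = hui3_overall([1.00] * 8)

# Vision at level 2 (sees well with glasses), all other attributes at level 1.
glasses_only = hui3_overall([0.98] + [1.00] * 7)

print(n_states, round(perfect, 3), round(glasses_only, 3))
```

Under these assumptions, the sketch reproduces the 972,000 health states and yields a score of about 0.973 for a respondent whose only departure from perfect functioning is the use of glasses.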

2.4 Mapping part 1: Qualitative mapping

Conceptual mapping of the WG-SS and WG ES-F was completed and validated in phases 1 and 2 of this research project but was not validated in respondents receiving both the WG and HUI3 instruments (Asakawa et al., 2017; Asakawa et al., 2018). Briefly, conceptual mapping first involved assessing the conceptual overlap between attributes defined in the WG measure and those defined in the HUI3 system (e.g., dexterity in HUI3 and self-care in WG). When the attribute types were matched, mapping was done at the level of functional difficulty. As the WG attribute-specific measures were specified through four levels of functional health and the HUI3 through five or six levels, a given WG level was sometimes mapped to two or more HUI3 levels. The initial mapping was performed by a single analyst, reviewed and revised by two additional analysts, and then validated through a review by two experts knowledgeable in measures of HRQoL. When mapping the HUI3 attribute for emotion, two measures of affect collected in the WG ES-F were potential matches: anxiety and depression. These attributes, reflecting both the degree and frequency of affect, were considered to reflect the emotional state of the respondent and were included in alternate models. Appendix 3 shows conceptual mapping between the WG and HUI3 attributes.

Following the conceptual mapping of WG attributes and levels of functional health, qualitative mapping was conducted using two different methods. Missing categories of the WG were not eligible for mapping and were excluded from the head-to-head subsample, resulting in a subsample of N=2,537. Method 1 first mapped the WG responses to the HUI3 attribute levels and then assigned the corresponding HUI3 scoring function to those levels. In instances where WG categories mapped to more than one HUI3 level, attribute level assignment was randomly allocated based on the weighted proportion of those HUI3 levels conceptually mapped to the WG category. This permitted derivation of scoring functions using the WG measure aligned with those used in the HUI3 system. Method 2 directly mapped HUI3 scoring functions (Appendix 2) to the WG levels. When WG categories conceptually mapped to more than one HUI3 attribute level, the weighted average of the HUI3 scoring function was assigned.

For example, the WG-SS vision category 1, “No difficulty seeing with or without glasses,” was conceptually mapped to HUI3 attribute levels 1, “Able to see well enough to read ordinary newsprint and recognize a friend on the other side of the street, without glasses or contact lenses,” and 2, “Able to see well enough to read ordinary newsprint and recognize a friend on the other side of the street, but with glasses.” Among those with a WG score equal to 1, 40.5% had HUI3 attribute level 1 and 59.5% had HUI3 attribute level 2. In Method 1, 40.5% of respondents with WG-SS vision=1 were randomly assigned to a mapped HUI3 attribute level 1 and 59.5% to level 2, where HUI3 scoring functions of 1.00 and 0.98 were assigned, respectively. In Method 2, the weighted average of the HUI3 scoring function was taken as 0.988=(1.00*0.405 + 0.98*0.595) and assigned to those with WG-SS vision=1.
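The vision example can be sketched in a few lines to contrast the two methods. The proportions and scoring functions come from the example above; the function names, the seed and the use of independent random draws per respondent are illustrative assumptions, and the report's own allocation procedure may differ in detail.

```python
import random

# Values from the worked example: HUI3 vision levels mapped to WG-SS vision = 1.
shares = {1: 0.405, 2: 0.595}   # weighted proportion observed at each HUI3 level
scoring = {1: 1.00, 2: 0.98}    # HUI3 scoring function for each level


def method1_assign(shares, rng):
    """Method 1 (sketch): randomly allocate one HUI3 level in proportion to shares."""
    levels = list(shares)
    weights = [shares[level] for level in levels]
    return rng.choices(levels, weights=weights, k=1)[0]


def method2_average(shares, scoring):
    """Method 2 (sketch): weighted average of the scoring function across levels."""
    return sum(shares[level] * scoring[level] for level in shares)


rng = random.Random(2017)  # arbitrary seed, for reproducibility only
draws = [method1_assign(shares, rng) for _ in range(10_000)]
share_level2 = draws.count(2) / len(draws)

averaged = method2_average(shares, scoring)
print(round(averaged, 3))  # 0.988
```

Method 1 preserves the discrete HUI3 levels, so a mapped respondent can still receive the scoring function of 1.00, whereas Method 2 assigns every respondent in the category the same averaged value.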

2.5 Mapping part 2: Empirical mapping

Empirical mapping exploits information collected in the “head-to-head” comparison by using predictive modelling to estimate the statistical relationship between source and target measures. Predictive modelling involves applying a statistical model or data mining algorithm to data to predict new or future observations (Shmueli, 2010). The continuous HUI3 overall score (target measure) was modelled on categorical WG variables, in addition to demographic and health variables of interest (the source measures). Respondent health and demographic characteristics were included because they were anticipated to have associations with the HUI3 independent of those of the WG measures, and because their inclusion had the potential to improve overall predictive accuracy. Demographic and health characteristics were selected based on their inclusion in the annual CCHS core content, their independent associations with the HUI3 score and their comparable use in other mapping studies (Bartman et al., 1997; Franks et al., 2003; Grootendorst et al., 2007; Marshall et al., 2008; Nichol et al., 2001; Sengupta et al., 2004). Sets of predictive models were also tested in the absence of demographic and health characteristics. This was to provide predictive equations for instances when mapped health state utility values are to be used in demographic comparisons. Missing data in categorical independent variables were permitted (Shmueli, 2010) and treated as distinct item scores.

Responses to the WG questions were entered into the model as discrete dummy variables with the response “No difficulty” as the reference category. Use of item scores in regression models was chosen, given the potential to improve model flexibility (Brazier et al., 2010). Three different sets of WG items were tested together: the WG-SS only (25 potential coefficients), and two sets with both the WG-SS and WG ES-F, including measures of pain in both and, alternately, anxiety or depression for affect (each: 33 potential coefficients). Age, sex, marital status (married or common law, single, widowed, separated, divorced, or missing status), and self-rated general health and mental health (each: poor, fair, good, very good, excellent, or not stated) were included in succession in 12 sets of prediction models. Self-rated health has been demonstrated to be a reliable measure (Lundberg & Manderbacka, 1996) positively correlated to physicians’ health ratings (Maddox & Douglass, 1973), chronic disease incidence (Kaplan et al., 1996) and mortality (Idler & Benyamini, 1997). Self-rated health was included, given the commonality of variables representative of respondents’ health states in mapping studies (Brazier et al., 2010; Mukuria et al., 2019). Age was centred at 62 years, the unweighted mean age, to improve the interpretability of coefficients. Consistent with previous studies (Bernier, Feng, & Asakawa, 2011; Grootendorst et al., 2007; Orpana et al., 2009), age was entered in linear, quadratic and cubic forms, accounting for declines in functional health at advanced ages, assuming a nonlinear functional relationship. All other variables were entered into the model as categorical dummies.
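As an illustration of the covariate coding described above, the sketch below builds one row of a design matrix for the WG-SS with age terms. The variable names are hypothetical, and counting an intercept plus four non-reference categories for each of six domains is only one reading of how the 25 potential coefficients arise.

```python
# Response categories for each WG-SS domain, with "Missing" kept as its own level.
WG_LEVELS = [
    "No difficulty", "Some difficulty", "A lot of difficulty",
    "Cannot do at all", "Missing",
]
REFERENCE = "No difficulty"
WG_SS = ["vision", "hearing", "mobility", "cognition", "self_care", "communication"]


def dummy_code(domain, response):
    """One 0/1 indicator per non-reference category of a WG domain."""
    return {
        f"{domain}={level}": int(response == level)
        for level in WG_LEVELS
        if level != REFERENCE
    }


def age_terms(age, centre=62):
    """Linear, quadratic and cubic terms of age centred at 62 years."""
    centred = age - centre
    return {"age": centred, "age2": centred ** 2, "age3": centred ** 3}


def design_row(responses, age):
    """Intercept, WG-SS dummies and centred age polynomial for one respondent."""
    row = {"intercept": 1}
    for domain in WG_SS:
        row.update(dummy_code(domain, responses[domain]))
    row.update(age_terms(age))
    return row


# Hypothetical respondent: some difficulty seeing, no other difficulty, aged 72.
responses = {domain: "No difficulty" for domain in WG_SS}
responses["vision"] = "Some difficulty"
row = design_row(responses, age=72)
n_wg_terms = sum(1 for name in row if "=" in name) + 1  # dummies plus intercept
print(n_wg_terms, row["age"], row["age2"], row["age3"])  # 25 10 100 1000
```

Centring age before raising it to higher powers keeps the intercept interpretable as the predicted value for a 62-year-old in all reference categories.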

Characteristics of the HUI3 score presented challenges to empirical mapping. The distribution of the multi-attribute overall HUI3 utility score in a general health population sample is known to be highly skewed, with most respondents at excellent or near-excellent scores in functional health (Bernier et al., 2011; Feng et al., 2009; Van Doorslaer & Jones, 2003). Because of the skewed nature of the HUI3 distribution, the normality of residuals assumption may be violated during regression modelling. Further, because the HUI3 score is calibrated from -0.36 to 1.00 (Feeny et al., 2002; Horsman et al., 2003), modelling must ensure that predicted results fall within these theoretical bounds to be interpretable. Several different regression methods and outcome transformations were explored to improve predictive accuracy, attain a more normal distribution of residuals and better replicate the distributional properties of the overall HUI3 score.

Regression strategies included (1) linear regression on the untransformed HUI3 score, (2) linear regression on the HUI3 score with arcsine transformation, (3) generalized linear models (GLMs) fit to the HUI3 score following linear transformation and (4) Poisson regression on the HUI3 score following inversion and linear transformation. Quantile regression on the untransformed HUI3 score was attempted, but it frequently failed to converge and was omitted from the report. The arcsine transformation of the HUI3 overall score took the form $\arcsin\left[2\left(\frac{\mathrm{HUI3} + 0.36}{1 + 0.36}\right) - 1\right]$. This method has been validated in previous studies to improve the distribution of residuals in regression modelling and to enable prediction of scores within the theoretical bounds of the HUI3 score (Bernier et al., 2011). Prior to the arcsine transformation, the HUI3 score was first rescaled to the [-1, 1] interval through linear transformation, since the arcsine function is defined only on that interval.
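The arcsine transformation and its inverse can be sketched as follows. The back-transformation is not spelled out in the text, so the inverse shown here, including the clamping of predictions to the range of the arcsine function, is an assumption about how predicted values would be returned to the HUI3 scale.

```python
import math

LO, HI = -0.36, 1.00  # theoretical bounds of the overall HUI3 score


def arcsine_transform(hui3):
    """Rescale HUI3 to [-1, 1], then apply the arcsine."""
    scaled = 2 * (hui3 - LO) / (HI - LO) - 1
    scaled = max(-1.0, min(1.0, scaled))  # guard against floating-point overshoot
    return math.asin(scaled)


def arcsine_back(z):
    """Return a prediction on the arcsine scale to the HUI3 scale.

    Assumption: predictions outside [-pi/2, pi/2] are clamped so that the
    inverse stays monotone; the sine then guarantees a value within [LO, HI].
    """
    z = max(-math.pi / 2, min(math.pi / 2, z))
    return LO + (math.sin(z) + 1) / 2 * (HI - LO)


z = arcsine_transform(0.973)
recovered = arcsine_back(z)
print(round(recovered, 3))  # 0.973
```

Because the sine of any real number lies in [-1, 1], every back-transformed prediction falls between -0.36 and 1.00, which is the property the transformation is used for.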

GLMs were fit to an HUI3 score transformed as a proportion. That is, the overall HUI3 score of a given respondent was represented as the proportion or rank of their score over the theoretical range of possible scores, thereby restricting the range of transformed scores to between 0 and 1. The transformation took the form $\frac{\mathrm{HUI3} + 0.36}{1 + 0.36}$. GLM regression with logit link and binomial family was used to estimate scores with this transformation (Baum, 2008) to ensure that projected scores, after being back-transformed, would fall within appropriate theoretical bounds. HUI3 scores were also transformed into an inverted form where, through inversion and linear transformation, the top bound of the HUI3 score at 1 would become the lower bound of the transformed score at 0. As the distribution of the transformed HUI3 score was constrained to non-negative values with the lower bound at 0, Poisson regression was used to model this transformed form. The transformation took the form $\left(\frac{\mathrm{HUI3} + 0.36}{1 + 0.36}\right)^{-1} - 1$.
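A short sketch of the proportion and inverted transformations, with back-transformations, shows why both keep predictions within the HUI3 bounds. The inverse functions and the helper names are assumptions; `eta` stands for a fitted linear predictor, and a log link is assumed for the Poisson model.

```python
import math

LO, HI = -0.36, 1.00  # theoretical bounds of the overall HUI3 score


def to_proportion(hui3):
    """Position of the score within its theoretical range, between 0 and 1."""
    return (hui3 - LO) / (HI - LO)


def from_proportion(p):
    """Inverse of to_proportion."""
    return LO + p * (HI - LO)


def to_inverted(hui3):
    """Inverted form: HUI3 = 1.00 maps to 0 (undefined at HUI3 = -0.36)."""
    return 1 / to_proportion(hui3) - 1


def from_inverted(y):
    """Inverse of to_inverted for y >= 0."""
    return from_proportion(1 / (1 + y))


def predict_logit_glm(eta):
    """Back-transform a logit-link prediction to the HUI3 scale."""
    return from_proportion(1 / (1 + math.exp(-eta)))


def predict_poisson(eta):
    """Back-transform a log-link Poisson prediction to the HUI3 scale."""
    return from_inverted(math.exp(eta))


print(to_inverted(1.0))                     # 0.0
print(round(predict_logit_glm(3.0), 3))     # close to the upper bound
print(round(predict_poisson(-2.0), 3))      # small inverted value, high HUI3
```

The inverse logit lies strictly between 0 and 1 and the exponential is strictly positive, so neither model can produce a back-transformed score outside the interval from -0.36 to 1.00.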

Two-step estimation methods were also used to explicitly model the distributional features of the HUI3 score found in a general population sample. A series of two-step models was explored to impute or predict the highly prevalent discrete HUI3 scores (HUI3=1.00 and HUI3=0.973), and to separately model, with regression techniques, the distribution of scores anticipated to be less than 0.973.

First, a direct imputation method was tested. The value of 1.00 was imputed for respondents if they had functional health at the level of “No difficulty” in all WG-SS attributes and, similarly, if they had scores of “No difficulty” in all attributes of the WG-SS and WG ES-F (using anxiety as the measure of affect, given that perfect scores were less common). The HUI3 score was otherwise predicted using linear regression with the untransformed or arcsine transformed HUI3 score.

Second, ordinal logistic regression and multinomial logistic regression were both assessed for their ability to predict a three-category variable denoting whether the overall HUI3 score fell into discrete scores of 1.0, 0.973 or any score below 0.973. Assessments were performed on 12 sets of source variables, as in the prior regression models. Predicted categories of the discrete HUI3 score were assigned based on the highest predicted probability of a respondent having a score in a given category. In the second estimation step, linear regression on the HUI3 score (with arcsine transformation) predicted scores for respondents who were estimated to have HUI3 scores below 0.973. Scores of 1.00 or 0.973 were otherwise directly imputed based on results from the first estimation step.
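The logic of combining the two estimation steps can be sketched as below. The category labels and function name are hypothetical, and the predicted probabilities and regression prediction are treated as inputs from models fitted elsewhere.

```python
# Discrete scores imputed directly when they are the most likely category.
IMPUTED_SCORE = {"perfect": 1.00, "near_perfect": 0.973}


def two_step_predict(probabilities, regression_score):
    """Combine a first-step category prediction with a second-step regression.

    probabilities: predicted probability of each category from the first step
        ("perfect", "near_perfect" or "below").
    regression_score: back-transformed prediction from the second-step linear
        regression, used only when "below" is the most likely category.
    """
    most_likely = max(probabilities, key=probabilities.get)
    if most_likely == "below":
        return regression_score
    return IMPUTED_SCORE[most_likely]


# Hypothetical respondents.
a = two_step_predict({"perfect": 0.55, "near_perfect": 0.30, "below": 0.15}, 0.91)
b = two_step_predict({"perfect": 0.10, "near_perfect": 0.25, "below": 0.65}, 0.62)
print(a, b)  # 1.0 0.62
```

Assigning the category with the highest predicted probability means that respondents whose probabilities are nearly tied can switch category with small changes in the fitted model, which is worth keeping in mind when comparing two-step and single-step results.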

Models fit on one set of data, particularly with the use of many covariates, may predict characteristics that are randomly unique to that dataset and may not necessarily show comparable predictive validity when replicated in other data sources. The head-to-head dataset was randomly partitioned into an “analytical” dataset that made up two-thirds of the subsample (N=1,731), used to statistically predict HUI3 scores, and a “hold-out” dataset with one-third of the data (N=866), used to assess predictive accuracy. Models that accurately predict HUI3 levels in the hold-out sample are expected to perform well in routine use and avoid problems inherent to overfitting (Shmueli, 2010).
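A random two-thirds and one-third partition of this kind can be sketched as follows. The seed and the function name are arbitrary assumptions; only the sample sizes come from the text.

```python
import random


def partition(ids, holdout_share=1 / 3, seed=12345):
    """Randomly split respondent identifiers into analytical and hold-out sets."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_share)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# 2,597 respondents with a non-missing HUI3 score in the head-to-head subsample.
analytical, holdout = partition(range(2597))
print(len(analytical), len(holdout))  # 1731 866
```

Fixing the seed matters here because, as noted later in the text, forecast statistics can vary with the random selection of the subsample used for estimation.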

In the hold-out dataset, predictive accuracy was assessed through descriptive statistics of the predicted health state utility scores (mean, median, interquartile range, minimum, maximum), in addition to several forecast statistics. No prespecified criteria determined model success. The difference between the actual HUI3 scores observed in the data and the predicted HUI3 scores in the hold-out sample respondents provided the prediction error. The mean absolute error (MAE), the mean of the absolute difference between observed and predicted HUI3 scores, was selected as an interpretable measure of the predictive accuracy of models: $\frac{1}{n} \sum_{i=1}^{n} |\mathrm{PE}_{i}|$, where $\mathrm{PE}_{i}$ is the prediction error for respondent $i$ and $n$ is the number of respondents. As the MAE may be influenced by factors including the selection of the head-to-head sample, the random selection of the analytical (“training”) subsample, survey sampling error, and natural variation in the HUI3 and WG measures, the precision of empirical mapping was informed by taking the weighted population estimate and 95% confidence intervals (CIs) using bootstrap standard errors (Grootendorst et al., 2007). Predictive accuracy was also assessed using the root mean squared error (RMSE), calculated as the square root of the mean squared prediction error: $\sqrt{\frac{1}{n} \sum_{i=1}^{n} \mathrm{PE}_{i}^{2}}$. The RMSE weighs larger errors more heavily and may favour models that are modestly less accurate in total over more accurate models estimated with some larger errors (Grootendorst et al., 2007). Kendall’s rank correlation coefficient was also selected to assess the degree of correlation between the HUI3 and mapped measures. A ranked coefficient was selected over alternatives (e.g., Pearson’s r), given the non-normal distribution of the HUI3 score (Conover, 1999).
Model R² was also explored where applicable, representing the proportion of variance in the dependent variable explained by the source measures, and was the only measure of accuracy taken from the analytical dataset. Proportions of predicted scores that differed from observed values by +/-0.03 units or more (the smallest change in HUI3 considered clinically important [Horsman et al., 2003]) were calculated.
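The forecast statistics can be illustrated with a minimal unweighted sketch. The report's estimates are weighted and use bootstrap standard errors, which are not reproduced here; the tau-b variant of Kendall's coefficient is an assumption, since the text does not state which variant was used, and the example scores are made up.

```python
import math


def prediction_errors(observed, predicted):
    """Observed minus predicted HUI3 score for each respondent."""
    return [o - p for o, p in zip(observed, predicted)]


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def pct_beyond(errors, threshold=0.03):
    """Percentage of predictions off by the threshold or more."""
    return 100 * sum(abs(e) >= threshold for e in errors) / len(errors)


def kendall_tau_b(x, y):
    """Kendall's rank correlation with the tau-b correction for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both measures: excluded from both terms
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_x = concordant + discordant + tied_y_only
    untied_y = concordant + discordant + tied_x_only
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


# Made-up scores for four hold-out respondents.
observed = [1.000, 0.973, 0.919, 0.744]
predicted = [0.980, 0.950, 0.930, 0.600]
errors = prediction_errors(observed, predicted)
print(round(mae(errors), 4), round(rmse(errors), 4),
      pct_beyond(errors), kendall_tau_b(observed, predicted))
```

In this example a single large error pulls the RMSE well above the MAE, which is the behaviour described in the text when comparing the two measures.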

3 Results

3.1 Qualitative mapping

Table 1 shows descriptive statistics of the overall HUI3 score in the sample, in addition to projected HUI3 scores and forecast statistics. Qualitative mapping was not estimated from the data, so these estimates are calculated using the full rapid-response subsample, excluding missing responses for the WG measures (N=2,537). HUI3 scores had a mean of 0.858 (95% CI=0.846, 0.870), a median and interquartile range of 0.919 (0.788 to 0.973), and a range of -0.180 to 1.00. Very little difference was observed in the predictive accuracy of mapped HUI3 estimates comparing assignment of attribute levels (Method 1) or assignment of the multi-attribute scoring function (Method 2), although only Method 1 permitted the scores to reach their theoretical maximum of 1.00. Mapped scores routinely underestimated the true HUI3 values. Under Method 1, qualitative mapping using anxiety (WG ES-F) as the affect measure had a mean and median of 0.730 and 0.801, while mapping using depression (WG ES-F) had a mean and median of 0.777 and 0.838. MAE tended to favour qualitative mapping using depression for affect, having an MAE of 0.133, compared with 0.168 when using anxiety for affect. This was also found in other measures, including the RMSE, Kendall’s correlation coefficient and the percentage of predicted scores differing from observed values by 0.03 units or more.

Table 1
Model performance of qualitative mapping
Mapping method and WG ES-F affect measure	Mean: Score	Mean: Lower 95% confidence interval	Mean: Upper 95% confidence interval	Minimum	25th percentile	50th percentile	75th percentile	Maximum	MAE	MAE: Lower 95% confidence interval	MAE: Upper 95% confidence interval	Root mean squared error	Kendall's rank coefficient	Percentage difference +/- 0.03 units or more
HUI3	0.858	0.846	0.870	-0.180	0.788	0.919	0.973	1.000	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable
Mapped
Method 1
Anxiety	0.730	0.713	0.748	-0.305	0.563	0.801	0.931	1.000	0.168	0.157	0.179	0.245	0.443	72.8
Depression	0.777	0.761	0.793	-0.323	0.646	0.838	0.973	1.000	0.133	0.124	0.143	0.201	0.460	65.2
Method 2
Anxiety	0.731	0.714	0.748	-0.291	0.561	0.793	0.903	0.982	0.166	0.155	0.177	0.242	0.464	72.7
Depression	0.777	0.761	0.793	-0.314	0.655	0.848	0.982	0.982	0.131	0.122	0.141	0.198	0.485	66.0
Note: ... not applicable
Notes: WG ES-F = Washington Group Extended Set on Functioning; HUI3 = Health Utilities Index Mark 3; and MAE = mean absolute error. Method 1 maps HUI3 attribute levels to Washington Group Short Set on Functioning (WG-SS) and WG ES-F response categories with random allocation if there is one-to-many attribute correspondence, and assigns HUI3 scoring functions. Method 2 maps HUI3 scoring functions to WG-SS and WG ES-F response categories with the weighted average taken if there is one-to-many attribute correspondence.
Source: 2017 Canadian Community Health Survey rapid-response subsample.

3.2 Empirical mapping 1: Single-step models

Empirical mapping used two-thirds of the head-to-head dataset (N=1,731) to statistically estimate the HUI3 score from source measures, using different regression modelling strategies and utility score transformations and 12 predictive covariate sets. Tables 2 to 4 show descriptive and forecast statistics calculated in the hold-out dataset (N=866). HUI3 scores had a mean of 0.848 (95% CI=0.828, 0.869), a median and interquartile range of 0.919 (0.744 to 0.973), and a range of -0.16 to 1.00. Table 2 shows results from linear regression models, Table 3 from GLMs and Table 4 from Poisson models. Across all empirical mapping specifications, predictive covariate sets that included measures from the WG ES-F (pain, with either anxiety or depression for affect) performed better than those with the WG-SS alone. Overall, linear regression on the HUI3 score with arcsine transformation showed the greatest predictive accuracy and adherence to the theoretical bounds of the HUI3 score, having an MAE as low as 0.086 (Table 2). Linear regression on the HUI3 score without transformation, while more closely predicting the overall mean value, had slightly higher levels of MAE; underestimated values at the upper range of the HUI3 distribution; and tended to predict results greater than 1.00, the upper bound of the HUI3 distribution. Results predicted from linear-transformed HUI3 scores using either GLMs or Poisson regression tended to have higher overall levels of MAE and to underestimate scores, particularly in the upper range of the mapped distribution.

Table 2
Model performance empirical mapping of Washington Group to Health Utilities Index Mark 3 (HUI3), linear regression on HUI3 score and arcsine-transformed HUI3 score
Model	Mean: Score	Mean: Lower 95% confidence interval	Mean: Upper 95% confidence interval	Minimum	25th percentile	50th percentile	75th percentile	Maximum	MAE	MAE: Lower 95% confidence interval	MAE: Upper 95% confidence interval	Root mean squared error	Kendall's rank coefficient	Percentage difference +/- 0.03 units or more	Model R²
Note ... not applicable Model 1: Washington Group Short Set on Functioning (WG-SS) Model 2: WG-SS, Washington Group Extended Set on Functioning (WG ES-F) (pain, anxiety) Model 3: WG-SS, WG ES-F (pain, depression) Model 4: WG-SS, sex, age, age², age³ Model 5: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³ Model 6: WG-SS, WG ES-F (pain, depression), sex, age, age², age³ Model 7: WG-SS, sex, age, age², age³, marital status Model 8: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status Model 9: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status Model 10: WG-SS, sex, age, age², age³, marital status, general health, mental health Model 11: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status, general health, mental health Model 12: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status, general health, mental health Notes: HUI3 = Health Utilities Index Mark 3; and MAE = mean absolute error. Source: 2017 Canadian Community Health Survey rapid-response subsample.
HUI3	0.848	0.828	0.869	-0.160	0.744	0.919	0.973	1.000	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable
Mapped
No transformation
1	0.848	0.833	0.863	-0.233	0.787	0.915	0.932	0.972	0.105	0.096	0.115	0.171	0.437	85.5	0.530
2	0.850	0.833	0.867	-0.218	0.790	0.892	0.951	1.002	0.091	0.082	0.100	0.158	0.466	72.6	0.624
3	0.848	0.830	0.866	-0.342	0.791	0.902	0.965	1.012	0.092	0.082	0.101	0.156	0.477	73.5	0.634
4	0.847	0.832	0.861	-0.220	0.792	0.917	0.929	0.978	0.105	0.095	0.115	0.170	0.379	82.0	0.533
5	0.850	0.833	0.867	-0.219	0.784	0.896	0.954	1.020	0.091	0.082	0.100	0.157	0.462	68.5	0.626
6	0.849	0.831	0.867	-0.348	0.789	0.901	0.961	1.026	0.091	0.082	0.101	0.156	0.463	70.9	0.636
7	0.847	0.833	0.862	-0.216	0.793	0.916	0.929	0.968	0.105	0.095	0.114	0.170	0.383	82.5	0.533
8	0.851	0.834	0.868	-0.223	0.781	0.897	0.956	1.008	0.091	0.082	0.100	0.157	0.464	69.3	0.626
9	0.849	0.832	0.867	-0.352	0.788	0.901	0.960	1.017	0.091	0.082	0.101	0.155	0.465	66.7	0.636
10	0.849	0.833	0.864	-0.327	0.785	0.901	0.950	1.069	0.099	0.090	0.109	0.160	0.434	72.7	0.578
11	0.852	0.835	0.869	-0.259	0.782	0.897	0.959	1.005	0.090	0.081	0.099	0.153	0.480	65.5	0.648
12	0.851	0.833	0.869	-0.326	0.785	0.897	0.961	1.022	0.090	0.080	0.099	0.151	0.479	64.2	0.653
Arcsine transformation
1	0.878	0.862	0.893	-0.231	0.848	0.942	0.957	0.987	0.098	0.088	0.109	0.177	0.438	70.5	0.471
2	0.873	0.856	0.891	-0.201	0.838	0.925	0.966	0.991	0.087	0.078	0.097	0.164	0.467	66.7	0.571
3	0.871	0.852	0.889	-0.290	0.837	0.937	0.975	0.994	0.087	0.077	0.097	0.162	0.477	61.6	0.581
4	0.877	0.863	0.892	-0.217	0.843	0.946	0.954	0.997	0.098	0.088	0.108	0.176	0.400	68.8	0.476
5	0.874	0.857	0.891	-0.200	0.831	0.929	0.972	1.000	0.086	0.077	0.096	0.163	0.473	62.4	0.579
6	0.871	0.853	0.890	-0.298	0.834	0.931	0.971	1.000	0.086	0.076	0.096	0.161	0.465	61.5	0.586
7	0.878	0.863	0.893	-0.213	0.843	0.945	0.954	0.995	0.098	0.088	0.108	0.176	0.400	70.6	0.476
8	0.875	0.857	0.892	-0.204	0.829	0.929	0.971	0.999	0.086	0.077	0.096	0.163	0.473	62.3	0.579
9	0.872	0.854	0.890	-0.300	0.836	0.931	0.970	1.000	0.086	0.076	0.096	0.161	0.467	61.5	0.587
10	0.875	0.860	0.891	-0.284	0.826	0.932	0.967	0.993	0.094	0.084	0.105	0.165	0.444	65.1	0.533
11	0.874	0.857	0.891	-0.217	0.822	0.928	0.970	0.997	0.087	0.077	0.097	0.159	0.487	60.5	0.606
12	0.873	0.855	0.890	-0.277	0.826	0.930	0.974	0.996	0.087	0.077	0.096	0.157	0.484	59.7	0.610
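The forecast statistics reported in these tables can be illustrated with a short Python sketch. It computes the mean absolute error, the root mean squared error and the percentage of respondents whose predicted score differs from the observed score by 0.03 units or more. The function name, dictionary keys and example values are hypothetical, and the study's own calculations (for example, any survey weighting) may differ.

```python
import math


def forecast_stats(observed, predicted, threshold=0.03):
    """Return MAE, RMSE and the percentage of absolute differences >= threshold."""
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    pct_large = 100.0 * sum(abs(e) >= threshold for e in errors) / n
    return {"mae": mae, "rmse": rmse, "pct_diff": pct_large}


# Made-up scores, used only to show the call.
observed = [1.0, 0.9, 0.5, 0.7]
predicted = [0.98, 0.95, 0.5, 0.6]
print(forecast_stats(observed, predicted))
```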

Table 3
Model performance empirical mapping of Washington Group to Health Utilities Index Mark 3 (HUI3), generalized linear models of linear-transformed HUI3 score
Model	Mean: Score	Mean: Lower 95% confidence interval	Mean: Upper 95% confidence interval	Minimum	25th percentile	50th percentile	75th percentile	Maximum	MAE	MAE: Lower 95% confidence interval	MAE: Upper 95% confidence interval	Root mean squared error	Kendall's rank coefficient	Percentage difference +/- 0.03 units or more
Note ... not applicable Model 1: Washington Group Short Set on Functioning (WG-SS) Model 2: WG-SS, Washington Group Extended Set on Functioning (WG ES-F) (pain, anxiety) Model 3: WG-SS, WG ES-F (pain, depression) Model 4: WG-SS, sex, age, age², age³ Model 5: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³ Model 6: WG-SS, WG ES-F (pain, depression), sex, age, age², age³ Model 7: WG-SS, sex, age, age², age³, marital status Model 8: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status Model 9: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status Model 10: WG-SS, sex, age, age², age³, marital status, general health, mental health Model 11: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status, general health, mental health Model 12: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status, general health, mental health Notes: HUI3 = Health Utilities Index Mark 3; and MAE = mean absolute error. Source: 2017 Canadian Community Health Survey rapid-response subsample.
HUI3	0.848	0.828	0.869	-0.160	0.744	0.919	0.973	1.000	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable
Mapped
1	0.847	0.832	0.862	-0.208	0.807	0.912	0.923	0.946	0.106	0.096	0.116	0.177	0.433	82.3
2	0.848	0.830	0.865	-0.184	0.821	0.909	0.940	0.964	0.095	0.086	0.104	0.167	0.455	73.2
3	0.847	0.828	0.865	-0.245	0.820	0.904	0.946	0.965	0.095	0.086	0.104	0.165	0.466	69.7
4	0.847	0.832	0.862	-0.198	0.820	0.913	0.921	0.950	0.105	0.096	0.115	0.176	0.376	81.5
5	0.849	0.832	0.866	-0.214	0.815	0.906	0.943	0.972	0.094	0.085	0.103	0.165	0.454	72.8
6	0.848	0.830	0.866	-0.252	0.821	0.904	0.945	0.972	0.094	0.085	0.104	0.163	0.453	70.7
7	0.848	0.833	0.862	-0.194	0.818	0.913	0.921	0.946	0.106	0.096	0.115	0.176	0.379	82.4
8	0.850	0.833	0.868	-0.212	0.811	0.906	0.944	0.968	0.094	0.085	0.103	0.165	0.456	72.1
9	0.849	0.831	0.867	-0.255	0.819	0.907	0.944	0.969	0.095	0.085	0.104	0.163	0.455	72.2
10	0.850	0.835	0.865	-0.219	0.804	0.904	0.939	0.960	0.103	0.093	0.114	0.165	0.423	78.1
11	0.853	0.836	0.870	-0.233	0.813	0.908	0.948	0.970	0.095	0.085	0.105	0.161	0.469	73.5
12	0.852	0.835	0.870	-0.211	0.820	0.908	0.949	0.968	0.094	0.085	0.104	0.159	0.469	73.8

Table 4
Model performance empirical mapping of Washington Group to Health Utilities Index Mark 3 (HUI3), Poisson regression of linear-transformed HUI3 score
Model	Mean: Score	Mean: Lower 95% confidence interval	Mean: Upper 95% confidence interval	Minimum	25th percentile	50th percentile	75th percentile	Maximum	MAE	MAE: Lower 95% confidence interval	MAE: Upper 95% confidence interval	Root mean squared error	Kendall's rank coefficient	Percentage difference +/- 0.03 units or more
Note ... not applicable Model 1: Washington Group Short Set on Functioning (WG-SS) Model 2: WG-SS, Washington Group Extended Set on Functioning (WG ES-F) (pain, anxiety) Model 3: WG-SS, WG ES-F (pain, depression) Model 4: WG-SS, sex, age, age², age³ Model 5: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³ Model 6: WG-SS, WG ES-F (pain, depression), sex, age, age², age³ Model 7: WG-SS, sex, age, age², age³, marital status Model 8: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status Model 9: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status Model 10: WG-SS, sex, age, age², age³, marital status, general health, mental health Model 11: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status, general health, mental health Model 12: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status, general health, mental health Notes: HUI3 = Health Utilities Index Mark 3; and MAE = mean absolute error. Source: 2017 Canadian Community Health Survey rapid-response subsample.
HUI3	0.848	0.828	0.869	-0.160	0.744	0.919	0.973	1.000	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable
Mapped
1	0.821	0.806	0.836	-0.186	0.788	0.898	0.904	0.949	0.118	0.108	0.128	0.185	0.418	83.3
2	0.827	0.809	0.845	-0.128	0.785	0.885	0.933	0.961	0.103	0.093	0.112	0.170	0.444	85.2
3	0.828	0.809	0.846	-0.125	0.789	0.881	0.937	0.952	0.103	0.093	0.113	0.169	0.453	84.7
4	0.821	0.806	0.835	-0.213	0.774	0.893	0.902	0.953	0.118	0.108	0.127	0.182	0.349	86.9
5	0.829	0.811	0.846	-0.224	0.779	0.887	0.932	0.965	0.102	0.092	0.111	0.168	0.441	82.7
6	0.830	0.812	0.847	-0.160	0.783	0.883	0.938	0.958	0.102	0.092	0.112	0.167	0.444	84.5
7	0.822	0.807	0.836	-0.207	0.780	0.892	0.903	0.953	0.118	0.108	0.128	0.183	0.355	86.3
8	0.830	0.813	0.848	-0.221	0.774	0.888	0.933	0.961	0.101	0.092	0.111	0.168	0.441	78.7
9	0.831	0.813	0.849	-0.158	0.785	0.886	0.935	0.954	0.102	0.092	0.112	0.167	0.443	86.0
10	0.826	0.811	0.842	-0.194	0.773	0.887	0.926	0.948	0.113	0.102	0.123	0.171	0.408	85.3
11	0.834	0.816	0.852	-0.225	0.789	0.894	0.941	0.958	0.102	0.092	0.111	0.162	0.453	75.3
12	0.835	0.817	0.853	-0.200	0.782	0.896	0.938	0.957	0.101	0.091	0.110	0.160	0.459	74.2

3.3 Empirical mapping 2: Direct imputation of perfect functional health scores

Two-step models imputed the value of 1.00 for respondents who reported “no difficulty” in all WG-SS attributes, or in all WG-SS and WG ES-F attributes (using the anxiety measure for affect). Scores for all other respondents were estimated using linear regression (Table 5) or linear regression on the arcsine-transformed HUI3 score (Table 6). About half (49.5%) of the hold-out sample was classified as having perfect functional health based on the WG-SS alone, and 22.5% using the WG-SS and WG ES-F. Models using the WG-SS alone to impute perfect health state scores overestimated the proportion of these scores. As in the single-step models, mapping to untransformed HUI3 scores led to predictions beyond the upper bound of the HUI3 distribution. The lowest MAE, 0.089, was found in scores projected from arcsine-transformed models using WG ES-F items.

Table 5
Model performance empirical mapping of Washington Group to Health Utilities Index Mark 3, linear regression with direct imputation of "perfect" functional health scores
Direct imputation method and model	Mean: Score	Mean: Lower 95% confidence interval	Mean: Upper 95% confidence interval	Minimum	25th percentile	50th percentile	75th percentile	Maximum	MAE	MAE: Lower 95% confidence interval	MAE: Upper 95% confidence interval	Root mean squared error	Kendall's rank coefficient	Percentage difference +/- 0.03 units or more	Model R²
Note ... not applicable Model 1: Washington Group Short Set on Functioning (WG-SS) Model 2: WG-SS, Washington Group Extended Set on Functioning (WG ES-F) (pain, anxiety) Model 3: WG-SS, WG ES-F (pain, depression) Model 4: WG-SS, sex, age, age², age³ Model 5: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³ Model 6: WG-SS, WG ES-F (pain, depression), sex, age, age², age³ Model 7: WG-SS, sex, age, age², age³, marital status Model 8: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status Model 9: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status Model 10: WG-SS, sex, age, age², age³, marital status, general health, mental health Model 11: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status, general health, mental health Model 12: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status, general health, mental health Notes: WG-SS = Washington Group Short Set on Functioning; WG ES-F = Washington Group Extended Set on Functioning; HUI3 = Health Utilities Index Mark 3; MAE = mean absolute error. The direct imputation method imputes the value of 1.00 when scores of functional health are "perfect" across all attributes of WG-SS, or WG-SS and WG ES-F. Source: 2017 Canadian Community Health Survey rapid-response subsample.
HUI3	0.848	0.828	0.869	-0.160	0.744	0.919	0.973	1.000	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable
Mapped
WG-SS
1	0.884	0.866	0.901	-0.233	0.778	0.905	1.000	1.000	0.107	0.096	0.119	0.179	0.441	62.2	0.471
2	0.886	0.867	0.905	-0.218	0.794	0.958	1.000	1.000	0.097	0.086	0.108	0.169	0.463	59.1	0.582
3	0.886	0.866	0.906	-0.330	0.805	0.951	1.000	1.009	0.096	0.085	0.107	0.168	0.464	57.7	0.596
4	0.884	0.866	0.901	-0.226	0.792	0.933	1.000	1.000	0.106	0.095	0.118	0.178	0.437	60.3	0.476
5	0.886	0.867	0.905	-0.222	0.794	0.970	1.000	1.010	0.097	0.087	0.108	0.169	0.462	59.0	0.585
6	0.886	0.866	0.906	-0.340	0.803	0.962	1.000	1.025	0.096	0.085	0.107	0.167	0.464	57.7	0.598
7	0.884	0.867	0.902	-0.233	0.791	0.934	1.000	1.000	0.106	0.095	0.118	0.178	0.437	60.4	0.476
8	0.887	0.868	0.906	-0.229	0.793	0.972	1.000	1.000	0.097	0.086	0.108	0.169	0.464	59.1	0.586
9	0.887	0.867	0.906	-0.347	0.800	0.967	1.000	1.006	0.096	0.084	0.107	0.167	0.465	56.9	0.599
10	0.883	0.866	0.901	-0.307	0.768	0.974	1.000	1.000	0.107	0.095	0.118	0.171	0.450	61.0	0.530
11	0.887	0.869	0.906	-0.229	0.794	0.996	1.000	1.000	0.100	0.089	0.111	0.166	0.467	58.3	0.613
12	0.887	0.868	0.906	-0.300	0.798	0.998	1.000	1.006	0.098	0.087	0.109	0.165	0.471	57.7	0.620
WG-SS, WG ES-F
1	0.858	0.842	0.874	-0.233	0.780	0.906	0.914	1.000	0.104	0.094	0.114	0.171	0.441	69.2	0.500
2	0.860	0.842	0.878	-0.211	0.788	0.899	0.964	1.009	0.093	0.084	0.102	0.160	0.466	64.4	0.600
3	0.859	0.840	0.877	-0.339	0.790	0.907	0.970	1.016	0.093	0.084	0.103	0.158	0.472	66.6	0.610
4	0.858	0.842	0.874	-0.223	0.786	0.902	0.931	1.000	0.103	0.093	0.113	0.170	0.424	66.4	0.503
5	0.860	0.842	0.878	-0.211	0.782	0.901	0.971	1.033	0.093	0.084	0.102	0.159	0.470	65.1	0.603
6	0.859	0.840	0.878	-0.347	0.788	0.905	0.973	1.032	0.093	0.083	0.103	0.157	0.472	63.1	0.612
7	0.858	0.843	0.874	-0.228	0.788	0.902	0.931	1.000	0.103	0.093	0.113	0.170	0.426	66.8	0.503
8	0.861	0.843	0.878	-0.215	0.782	0.901	0.971	1.021	0.093	0.084	0.102	0.159	0.470	64.3	0.604
9	0.860	0.841	0.878	-0.350	0.787	0.904	0.975	1.023	0.093	0.083	0.103	0.157	0.472	63.3	0.612
10	0.859	0.842	0.875	-0.322	0.777	0.901	0.965	1.048	0.101	0.090	0.111	0.161	0.451	67.5	0.555
11	0.861	0.843	0.879	-0.238	0.776	0.902	0.986	1.008	0.092	0.083	0.102	0.155	0.482	62.3	0.630
12	0.860	0.842	0.879	-0.313	0.781	0.901	0.992	1.013	0.092	0.082	0.102	0.153	0.482	61.8	0.634

Table 6
Model performance empirical mapping of Washington Group to Health Utilities Index Mark 3 (HUI3), direct imputation of "perfect" functional health scores and linear regression of arcsine-transformed HUI3
Direct imputation method and model	Mean: Score	Mean: Lower 95% confidence interval	Mean: Upper 95% confidence interval	Minimum	25th percentile	50th percentile	75th percentile	Maximum	MAE	MAE: Lower 95% confidence interval	MAE: Upper 95% confidence interval	Root mean squared error	Kendall's rank coefficient	Percentage difference +/- 0.03 units or more	Model R²
Note ... not applicable Model 1: Washington Group Short Set on Functioning (WG-SS) Model 2: WG-SS, Washington Group Extended Set on Functioning (WG ES-F) (pain, anxiety) Model 3: WG-SS, WG ES-F (pain, depression) Model 4: WG-SS, sex, age, age², age³ Model 5: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³ Model 6: WG-SS, WG ES-F (pain, depression), sex, age, age², age³ Model 7: WG-SS, sex, age, age², age³, marital status Model 8: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status Model 9: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status Model 10: WG-SS, sex, age, age², age³, marital status, general health, mental health Model 11: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status, general health, mental health Model 12: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status, general health, mental health Notes: WG-SS = Washington Group Short Set on Functioning; WG ES-F = Washington Group Extended Set on Functioning; HUI3 = Health Utilities Index Mark 3; MAE = mean absolute error. The direct imputation method imputes the value of 1.00 when scores of functional health are "perfect" across all attributes of WG-SS, or WG-SS and WG ES-F. Source: 2017 Canadian Community Health Survey rapid-response subsample.
HUI3	0.848	0.828	0.869	-0.160	0.744	0.919	0.973	1.000	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable
Mapped
WG-SS
1	0.899	0.882	0.915	-0.232	0.830	0.928	1.000	1.000	0.103	0.092	0.115	0.183	0.442	59.4	0.430
2	0.897	0.879	0.916	-0.219	0.837	0.960	1.000	1.000	0.095	0.084	0.106	0.174	0.462	58.0	0.546
3	0.896	0.877	0.916	-0.284	0.847	0.958	1.000	1.000	0.094	0.083	0.105	0.172	0.467	56.3	0.561
4	0.899	0.883	0.915	-0.226	0.837	0.943	1.000	1.000	0.103	0.091	0.114	0.182	0.439	61.6	0.434
5	0.898	0.879	0.916	-0.220	0.830	0.970	1.000	1.000	0.095	0.084	0.106	0.173	0.464	57.5	0.551
6	0.896	0.877	0.916	-0.293	0.846	0.971	1.000	1.000	0.094	0.083	0.105	0.171	0.467	56.3	0.565
7	0.899	0.883	0.916	-0.230	0.835	0.944	1.000	1.000	0.103	0.091	0.114	0.182	0.439	61.4	0.434
8	0.898	0.880	0.916	-0.224	0.833	0.970	1.000	1.000	0.095	0.084	0.106	0.173	0.464	57.0	0.552
9	0.897	0.878	0.916	-0.296	0.846	0.970	1.000	1.000	0.094	0.083	0.105	0.171	0.467	57.9	0.566
10	0.897	0.881	0.913	-0.274	0.813	0.969	1.000	1.000	0.104	0.092	0.116	0.173	0.449	59.1	0.502
11	0.898	0.880	0.916	-0.215	0.837	0.981	1.000	1.000	0.098	0.087	0.110	0.170	0.466	57.6	0.589
12	0.897	0.879	0.916	-0.261	0.837	0.982	1.000	1.000	0.097	0.085	0.108	0.169	0.467	56.5	0.596
WG-SS, WG ES-F
1	0.882	0.867	0.898	-0.231	0.837	0.934	0.941	1.000	0.099	0.088	0.109	0.177	0.442	68.6	0.445
2	0.879	0.861	0.897	-0.200	0.838	0.925	0.966	1.000	0.090	0.081	0.100	0.166	0.468	67.3	0.549
3	0.877	0.858	0.895	-0.292	0.839	0.935	0.972	1.000	0.090	0.080	0.100	0.163	0.473	61.1	0.560
4	0.883	0.867	0.898	-0.224	0.834	0.932	0.960	1.000	0.098	0.088	0.108	0.176	0.435	67.3	0.450
5	0.880	0.862	0.897	-0.198	0.829	0.930	0.977	1.000	0.089	0.080	0.099	0.164	0.474	62.2	0.558
6	0.877	0.859	0.896	-0.302	0.834	0.929	0.980	1.000	0.089	0.079	0.099	0.162	0.475	62.3	0.567
7	0.883	0.868	0.898	-0.222	0.835	0.932	0.961	1.000	0.098	0.088	0.108	0.176	0.436	67.4	0.450
8	0.880	0.863	0.898	-0.201	0.826	0.931	0.978	1.000	0.089	0.080	0.099	0.164	0.474	62.3	0.559
9	0.878	0.860	0.896	-0.304	0.835	0.929	0.980	1.000	0.089	0.079	0.099	0.162	0.476	62.8	0.567
10	0.880	0.864	0.896	-0.284	0.819	0.928	0.977	1.000	0.097	0.086	0.108	0.165	0.454	61.7	0.517
11	0.879	0.862	0.897	-0.217	0.820	0.931	0.983	1.000	0.090	0.080	0.100	0.160	0.481	59.1	0.595
12	0.878	0.860	0.896	-0.273	0.827	0.930	0.985	1.000	0.089	0.080	0.099	0.158	0.481	59.0	0.599

3.4 Empirical mapping 3: Estimation and imputation of perfect and near-perfect functional health scores

Direct imputation did not improve overall predictive accuracy. Additional steps were therefore taken to better estimate highly prevalent discrete scores in the upper range of the HUI3 distribution. First, ordinal or multinomial logistic regression was used to regress a three-category variable, representing HUI3 scores of 1.00, scores of 0.973 and all scores below 0.973, on source measures. After predicted probabilities were derived for each of the three categories, each respondent was assigned the category with the highest predicted probability. Kappa scores, which measure the agreement between two ratings beyond what would be expected by chance, were used to assess agreement between observed and predicted HUI3 categories. Table 7 shows that 16.1% of the sample had a HUI3 score of 1.00, 24.5% had a score of 0.973 and 59.5% had a score below 0.973. The most accurate prediction of these categories came from multinomial logistic regression on the WG-SS, the WG ES-F (pain and anxiety), age, age², age³, sex and marital status (Model 8), which had the highest agreement between categories (66%, kappa=0.374). Among models not using the WG ES-F, the highest agreement was found in Model 10 (64%, kappa=0.328), which additionally included self-rated general and mental health.
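The category assignment and agreement measure described above can be sketched as follows. The sketch assumes an unweighted Cohen's kappa, which the text does not specify, and the category labels, function names and example values are hypothetical.

```python
def assign_category(probabilities):
    """Return the category with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


def cohen_kappa(observed, predicted):
    """Unweighted Cohen's kappa: agreement beyond what chance alone would produce."""
    n = len(observed)
    categories = set(observed) | set(predicted)
    # Observed agreement: share of respondents whose two categories match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Expected agreement by chance, from the marginal shares of each category.
    p_e = sum((observed.count(c) / n) * (predicted.count(c) / n) for c in categories)
    # Undefined (division by zero) if both ratings use a single identical category.
    return (p_o - p_e) / (1 - p_e)


# Hypothetical predicted probabilities for one respondent.
probs = {"below_0.973": 0.50, "0.973": 0.30, "1.00": 0.20}
print(assign_category(probs))

# Made-up observed and predicted categories for four respondents.
observed = ["below_0.973", "below_0.973", "0.973", "1.00"]
predicted = ["below_0.973", "below_0.973", "0.973", "0.973"]
print(cohen_kappa(observed, predicted))
```

A model that assigns nearly everyone to a single category can show high observed agreement yet a kappa near zero, which is the pattern seen for Model 1 in Table 7.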

Table 7
Measure of agreement in observed and predicted categories of Health Utilities Index Mark 3 score (1, 0.973, less than 0.973)
Table summary
This table displays the results of measure of agreement in observed and predicted categories of Health Utilities Index Mark 3 score (1, 0.973, less than 0.973). The information is grouped by Model (appearing as row headers), Ordered logistic regression and Multinomial logistic regression (appearing as column headers).
Model	Ordered logistic regression					Multinomial logistic regression
Model	Percentage HUI3<0.973	Percentage HUI3= 0.973	Percentage HUI3= 1.00	Kappa	Percentage observed agreement	Percentage HUI3<0.973	Percentage HUI3= 0.973	Percentage HUI3= 1.00	Kappa	Percentage observed agreement
HUI3	59.5	24.5	16.1	Note ...: not applicable	Note ...: not applicable	59.5	24.5	16.1	Note ...: not applicable	Note ...: not applicable
Mapped
1	99.8	0.2	0.0	-0.003	59.2	99.7	0.2	0.1	-0.005	59.1
2	64.8	35.1	0.1	0.323	64.2	62.7	37.0	0.3	0.322	63.6
3	63.5	36.4	0.1	0.305	62.9	63.2	36.8	0.0	0.308	63.0
4	82.2	12.6	5.2	0.124	58.7	65.2	25.5	9.2	0.253	60.0
5	63.9	29.7	6.5	0.353	65.2	62.8	29.0	8.2	0.361	65.4
6	62.8	30.0	7.2	0.332	63.9	62.2	28.8	9.0	0.354	64.8
7	82.6	11.9	5.5	0.137	59.4	64.0	27.4	8.7	0.252	59.7
8	64.1	29.3	6.6	0.352	65.2	62.8	28.6	8.5	0.374	66.1
9	62.8	30.3	6.9	0.332	63.9	61.8	29.0	9.2	0.354	64.7
10	62.9	30.6	6.5	0.327	63.6	63.3	28.2	8.5	0.328	63.6
11	63.4	28.9	7.7	0.318	63.2	61.8	29.7	8.5	0.353	64.7
12	64.8	27.1	8.1	0.316	63.4	61.9	30.1	8.0	0.327	63.3
... not applicable Note: WG-SS = Washington Group Short-Set questions; WG ES-F = Washington Group Extended Set of Functioning; HUI3=Health Utilities Index Mark 3 Model 1: WG-SS Model 2: WG-SS, WG ES-F (pain, anxiety) Model 3: WG-SS, WG ES-F (pain, depression) Model 4: WG-SS, sex, age, age², age³ Model 5: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³ Model 6: WG-SS, WG ES-F (pain, depression), sex, age, age², age³ Model 7: WG-SS, sex, age, age², age³, marital status Model 8: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status Model 9: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status Model 10: WG-SS, sex, age, age², age³, marital status, general health, mental health Model 11: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status, general health, mental health Model 12: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status, general health, mental health Source: 2017 Canadian Community Health Survey Rapid Response subsample.

The next step regressed an arcsine-transformed HUI3 score on source measures for the 63% of the hold-out sample projected to have HUI3 scores below 0.973. Mapped scores were derived by reverse-transforming predicted scores for these respondents and otherwise imputing discrete scores of 1.00 or 0.973 based on projected categories from the first estimation step. Table 8 shows descriptive and forecast statistics for this two-step approach, whereby discrete scores of 1.00 and 0.973 were derived from Model 8 (Table 7) of the first estimation step. The HUI3 score in the hold-out sample had a mean of 0.848 (95% CI=0.828, 0.869), a median and interquartile range of 0.919 and 0.744 to 0.973, respectively, and a range of -0.16 to 1.00. Models that included the WG ES-F in the second estimation step consistently performed better than those that did not, although there was little difference based on the choice of anxiety or depression for affect or from the inclusion of other non-WG predictors. The lowest MAE was 0.086, with slight improvements in forecast statistics favouring the inclusion of depression. Mean predicted scores were generally higher than observed scores, although by less than the clinically important difference of 0.03. Predicted scores were constrained to the bounds of the HUI3 and aligned with the observed median and 75th percentile, but they routinely overestimated the 25th percentile. To investigate the predictive accuracy of models not using WG ES-F measures, HUI3 categories were derived using Model 10 covariates (Table 2). The lowest MAE was 0.094, with little variation in predictive performance across models (Table 9).
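How the two estimation steps combine into one mapped score can be sketched as below. The category labels, the function names and the placeholder back-transform are assumptions for illustration. The study's actual coefficients and transformation are described in its appendix and are not reproduced here.

```python
import math


def two_step_score(category, transformed_prediction, back_transform):
    """Combine the two estimation steps into a single mapped score."""
    # Step 1: discrete scores come directly from the predicted category.
    if category == "1.00":
        return 1.00
    if category == "0.973":
        return 0.973
    # Step 2: any other category uses the back-transformed regression prediction.
    return back_transform(transformed_prediction)


def sin_squared(value):
    """Placeholder back-transform for a score already scaled to [0, 1] (an assumption)."""
    return math.sin(value) ** 2


print(two_step_score("0.973", 1.2, sin_squared))
print(two_step_score("below_0.973", math.pi / 4, sin_squared))
```

Passing the back-transform as an argument keeps the combination rule separate from whichever transformation is applied in the second step.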

Table 8
Model performance of two-step empirical mapping to HUI3 with arcsine transformation and imputation of discrete Health Utilities Index Mark 3 scores using Washington Group Extended Set on Functioning
Table summary
This table displays the results of model performance of two-step empirical mapping to HUI3 with arcsine transformation and imputation of discrete Health Utilities Index Mark 3 scores. The information is grouped by Model (appearing as row headers), Mean: Score, Mean: Lower 95% confidence interval, Mean: Upper 95% confidence interval, Minimum, 25th percentile, 50th percentile, 75th percentile, Maximum, Mean absolute error, Root mean squared error, Kendall's rank coefficient, Percentage difference +/- 0.03 units or more and Model R² (appearing as column headers).
Model	Mean: Score	Mean: Lower 95% confidence interval	Mean: Upper 95% confidence interval	Minimum	25th percentile	50th percentile	75th percentile	Maximum	Mean absolute error	Root mean squared error	Kendall's rank coefficient	Percentage difference +/- 0.03 units or more	Model R²
HUI3	0.848	0.828	0.869	-0.160	0.744	0.919	0.973	1.000	Note ...: not applicable	Note ...: not applicable	Note ...: not applicable	Note ...: not applicable	Note ...: not applicable
Mapped
1	0.878	0.862	0.894	-0.223	0.816	0.922	0.973	1.000	0.095	0.172	0.459	60.8	0.423
2	0.875	0.857	0.893	-0.202	0.837	0.925	0.973	1.000	0.087	0.163	0.476	60.6	0.526
3	0.873	0.854	0.892	-0.291	0.839	0.934	0.973	1.000	0.086	0.161	0.484	57.9	0.539
4	0.878	0.863	0.894	-0.208	0.823	0.919	0.973	1.000	0.095	0.170	0.452	61.5	0.425
5	0.876	0.858	0.893	-0.194	0.825	0.930	0.973	1.000	0.086	0.163	0.479	59.4	0.533
6	0.874	0.855	0.892	-0.296	0.834	0.929	0.973	1.000	0.086	0.160	0.486	57.6	0.543
7	0.878	0.862	0.894	-0.208	0.823	0.919	0.973	1.000	0.095	0.170	0.452	61.4	0.425
8	0.876	0.859	0.894	-0.198	0.825	0.931	0.973	1.000	0.086	0.163	0.479	59.6	0.533
9	0.874	0.856	0.892	-0.298	0.835	0.929	0.973	1.000	0.086	0.160	0.486	57.8	0.543
10	0.879	0.863	0.895	-0.267	0.811	0.936	0.973	1.000	0.092	0.163	0.473	59.5	0.493
11	0.878	0.860	0.895	-0.215	0.824	0.943	0.973	1.000	0.087	0.160	0.487	57.2	0.577
12	0.876	0.859	0.894	-0.255	0.829	0.940	0.973	1.000	0.086	0.158	0.486	57.8	0.581
... not applicable Note: WG-SS = Washington Group Short-Set on Functioning; WG ES-F = Washington Group Extended Set on Functioning; HUI3=Health Utilities Index Mark 3; CI= confidence interval; MAE=mean absolute error; Two-step method (1) imputes discrete HUI3 scores (1.00, 0.973) using predictions from multinomial model regressing on WG-SS, WG ES-F (pain and anxiety), age, age², age³, sex, marital status (2) linear regression on arcsine transformed HUI3 score Model 1: WG-SS Model 2: WG-SS, WG ES-F (pain, anxiety) Model 3: WG-SS, WG ES-F (pain, depression) Model 4: WG-SS, sex, age, age², age³ Model 5: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³ Model 6: WG-SS, WG ES-F (pain, depression), sex, age, age², age³ Model 7: WG-SS, sex, age, age², age³, marital status Model 8: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status Model 9: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status Model 10: WG-SS, sex, age, age², age³, marital status, general health, mental health Model 11: WG-SS, WG ES-F (pain, anxiety), sex, age, age², age³, marital status, general health, mental health Model 12: WG-SS, WG ES-F (pain, depression), sex, age, age², age³, marital status, general health, mental health Source: 2017 Canadian Community Health Survey Rapid Response subsample.

Table 9
Model performance of two-step empirical mapping to HUI3 with arcsine transformation and imputation of discrete HUI3 scores not using Washington Group Extended Set on Functioning
Table summary
This table displays the results of model performance of two-step empirical mapping to HUI3 with arcsine transformation and imputation of discrete HUI3 scores. The information is grouped by Model (appearing as row headers), Mean: score, Mean: Lower 95% confidence interval, Mean: Upper 95% confidence interval, Minimum, 25th percentile, 50th percentile, 75th percentile, Maximum, Mean absolute error, Root mean squared error, Kendall's rank coefficient, Percentage difference +/- 0.03 units or more and Model R² (appearing as column headers).
Model	Mean: score	Mean: Lower 95% confidence interval	Mean: Upper 95% confidence interval	Minimum	25th percentile	50th percentile	75th percentile	Maximum	Mean absolute error	Root mean squared error	Kendall's rank coefficient	Percentage difference +/- 0.03 units or more	Model R²
HUI3	0.848	0.828	0.869	-0.160	0.744	0.919	0.973	1.000	Note ...: not applicable	Note ...: not applicable	Note ...: not applicable	Note ...: not applicable	Note ...: not applicable
Mapped
1	0.882	0.866	0.897	-0.220	0.829	0.933	0.973	1.000	0.095	0.176	0.443	59.8	0.438
4	0.882	0.867	0.897	-0.208	0.834	0.928	0.973	1.000	0.094	0.174	0.434	60.1	0.441
7	0.883	0.867	0.898	-0.214	0.836	0.927	0.973	1.000	0.094	0.174	0.435	60.5	0.442
10	0.881	0.866	0.897	-0.274	0.825	0.936	0.973	1.000	0.096	0.169	0.439	61.9	0.500
... not applicable Note: WG-SS = Washington Group Short-Set questions; WG ES-F = Washington Group Extended Set on Functioning; HUI3=Health Utilities Index Mark 3; CI= confidence interval; MAE= mean absolute error; Two-step method (1) imputes discrete HUI3 scores (1.00, 0.973) using predictions from multinomial model regressing on WG-SS, age, age², age³, sex, marital status, self-rated general health, self-rated mental health (2) linear regression on arcsine transformed HUI3 score Model 1: WG-SS Model 4: WG-SS, sex, age, age², age³ Model 7: WG-SS, sex, age, age², age³, marital status Model 10: WG-SS, sex, age, age², age³, marital status, general health, mental health Source: 2017 Canadian Community Health Survey Rapid Response subsample.

Candidate models were selected based on prediction with and without the extended set of WG variables. Candidate model 1 (Table 8, Model 6) included predictive coefficients in the first estimation step for WG-SS, WG ES-F (pain and anxiety), age, age², age³, sex and marital status, and in the second estimation step for WG-SS, WG ES-F (pain and depression), age, age², age³, and sex. Candidate model 2 (Table 9, Model 4) included predictive coefficients in the first estimation step for WG-SS, age, age², age³, sex, marital status, general health and mental health, and in the second step for WG-SS, age, age², age³, and sex. Candidates were selected based on predictive performance and the exclusion of variables that did not improve predictive accuracy.

Conceptual limitations may arise if mapped scores are used in research comparisons across demographic characteristics that were also included in the prediction equation. To address this possibility, restricted versions of the prediction equation were calculated using only (1) the WG-SS or (2) the WG-SS and WG ES-F (including pain and either anxiety or depression for affect). These models followed the two-step procedure described above. Table 10 shows descriptive and forecast statistics for these estimation models. Models using only the WG-SS projected nearly all respondents to have health state utility values less than 0.973 (Table 7) and provided results equivalent to those found in the single-step regression (Table 2, with arcsine transformation). Models that combined the WG-SS and WG ES-F in the two estimation steps performed better overall, with only marginally less predictive accuracy than models with demographic predictors. The preferred model included the WG ES-F with anxiety in the first estimation step (which was better at replicating the range of the HUI3 score) and depression in the second estimation step (Table 10, Model 4). This model had an MAE of 0.087. Appendix 4 outlines regression coefficients and methods to map the HUI3 score for both candidate models.

Table 10
Model performance of two-step empirical mapping to Health Utilities Index Mark 3 (HUI3) with arcsine transformation and imputation of discrete HUI3 scores (0.973, 1.00) excluding demographic covariates Table summary
The information is grouped by Model (appearing as row headers).
Model	Mean: Score	Mean: Lower 95% confidence interval	Mean: Upper 95% confidence interval	Minimum	25th percentile	50th percentile	75th percentile	Maximum	MAE	MAE: Lower 95% confidence interval	MAE: Upper 95% confidence interval	Root mean squared error	Kendall's rank coefficient	Percentage difference +/- 0.03 units or more	Model R²
Note ... not applicable Model 1: Step 1 and step 2: WG-SS. Model 2: Step 1: WG-SS, WG ES-F (pain, anxiety); step 2: WG-SS. Model 3: Step 1 and step 2: WG-SS, WG ES-F (pain, anxiety). Model 4: Step 1: WG-SS, WG ES-F (pain, anxiety); step 2: WG-SS, WG ES-F (pain, depression). Model 5: Step 1: WG-SS, WG ES-F (pain, depression); step 2: WG-SS. Model 6: Step 1: WG-SS, WG ES-F (pain, depression); step 2: WG-SS, WG ES-F (pain, anxiety). Model 7: Step 1 and step 2: WG-SS, WG ES-F (pain, depression). Notes: WG-SS = Washington Group Short Set on Functioning; WG ES-F = Washington Group Extended Set on Functioning; HUI3 = Health Utilities Index Mark 3; and MAE = mean absolute error. Source: 2017 Canadian Community Health Survey rapid-response subsample.
HUI3	0.848	0.828	0.869	-0.160	0.744	0.919	0.973	1.000	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable	... not applicable
Mapped
1	0.878	0.863	0.893	-0.223	0.848	0.949	0.957	1.000	0.098	0.088	0.109	0.177	0.432	70.9	0.471
2	0.876	0.861	0.891	-0.222	0.815	0.920	0.973	1.000	0.096	0.085	0.107	0.173	0.453	61.5	0.425
3	0.874	0.857	0.891	-0.197	0.838	0.933	0.973	1.000	0.087	0.078	0.097	0.166	0.472	59.5	0.525
4	0.872	0.853	0.890	-0.291	0.840	0.936	0.973	1.000	0.087	0.077	0.097	0.163	0.480	58.3	0.537
5	0.876	0.860	0.892	-0.227	0.829	0.923	0.973	0.973	0.095	0.085	0.105	0.172	0.460	60.5	0.414
6	0.873	0.855	0.891	-0.190	0.837	0.925	0.973	0.974	0.086	0.077	0.095	0.163	0.477	60.4	0.521
7	0.870	0.852	0.889	-0.287	0.841	0.937	0.973	0.973	0.086	0.076	0.095	0.161	0.484	58.8	0.532

4 Discussion

Information on health status and quality of life obtained from the HUI3 system plays an important role in economic, clinical and population health analysis in Canada. While the adoption of the WG measure as part of two-year theme content in the CCHS permits collection of a validated and internationally comparable measure of functional capacity, the lack of preference-based scoring functions makes it unsuited to HRQoL measurement. This data gap was addressed through qualitative and empirical mapping of the overall HUI3 score from the WG measure in a CCHS subsample of the Canadian general population aged 40 years and older. This head-to-head subsample provided a comparatively large and detailed dataset representative of the non-institutionalized Canadian population.

Qualitative mapping, where conceptual overlap between attributes and attribute levels was determined through expert judgment, resulted in overall underestimation of the HUI3 score and comparatively high levels of prediction error. Conceptual mapping from Phase 1 of this research project (Asakawa et al., 2017) found that, while attributes were similar between the WG and HUI3, the phrasing of questions and response category options varied considerably between the measures, including differences concerning the use of functional aids. Consistent with prior work using different cycles of the CCHS (Asakawa et al., 2018), higher levels of functional health were found in the HUI3 system than with the WG measure. For example, while 84% of the rapid-response subsample had WG vision attribute level 1, this mapped to HUI3 vision levels 1 and 2, which made up 97% of the subsample. Further, in the ambulation attribute, the highest attribute level was 78% for the WG attribute and 92% for the corresponding HUI3 attribute. Efforts to address systemic underestimation included stratifying qualitative mapping by categories of age or sex, although these did little to improve predictive accuracy and were hampered by insufficient overlap between measures at lower attribute levels (not shown). Such differences in the phrasing of questions and response options plausibly resulted in underestimation of the mapped HUI3 score and could not be adequately addressed through qualitative mapping methods.

Empirical mapping tested several regression-based strategies to estimate the statistical relationship between the HUI3 score and the WG measure, in addition to other CCHS covariates. Generally, regression methods showed strong improvements in the overall predictive accuracy of mapped scores, compared with qualitative mapping. Linear regression on the HUI3 score with arcsine transformation on both the WG-SS and WG ES-F showed the greatest predictive accuracy and adherence to the theoretical bounds of the HUI3 score. Arcsine transformation has been shown to improve the distribution of residuals in regression modelling of the HUI3 score and to restrict predicted scores within its theoretical bounds of -0.36 and 1.00 (Bernier et al., 2011). The improved accuracy of statistical prediction of the HUI3 scores when using the WG ES-F measures may highlight the importance of conceptual overlap in empirical mapping (Brazier et al., 2010; Round & Hawton, 2017; Wailoo et al., 2017). The WG ES-F attributes of pain (conceptually mapped to the HUI3 attribute of pain) and anxiety or depression (conceptually mapped alternately to the HUI3 attribute of emotion) correspond to areas that are important contributors to the HUI3 score. First, these are among the attributes with lower levels of functional health in the subsample: 79% of the subsample had full functional health for emotion and 71% for pain, according to the HUI3, relative to over 90% for the attributes of hearing, speech, mobility and dexterity. Second, HUI3 scoring functions for emotion (Appendix 2) tend to show comparatively lower utility at lower attribute levels. This implies that pain and emotion contribute more than other attributes toward variability in the overall HUI3 score and that the absence of corresponding measures (and inadequacies of related health data routinely collected in the CCHS, such as self-rated general and mental health) places important limitations on the predictive accuracy of the statistical model.
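
To illustrate why an arcsine transformation keeps predictions inside the theoretical bounds, the following is a minimal sketch. It assumes one common form of the transformation, the arcsine of the square root of the score rescaled to the interval from 0 to 1. The exact transformation used in the study (Bernier et al., 2011) is not given in this section, so the rescaling and the function names are assumptions made for illustration only.

```python
import math

HUI3_MIN = -0.36  # theoretical lower bound stated in the text
HUI3_MAX = 1.00   # theoretical upper bound stated in the text


def arcsine_transform(score: float) -> float:
    """Assumed form: arcsin(sqrt(rescaled score)); maps [-0.36, 1.00] to [0, pi/2]."""
    rescaled = (score - HUI3_MIN) / (HUI3_MAX - HUI3_MIN)
    rescaled = min(max(rescaled, 0.0), 1.0)  # guard against rounding outside [0, 1]
    return math.asin(math.sqrt(rescaled))


def inverse_arcsine_transform(z: float) -> float:
    """Back-transform a prediction made on the transformed scale.

    Clamping z to [0, pi/2] means the returned score can never fall
    outside the theoretical bounds of the HUI3 score.
    """
    z = min(max(z, 0.0), math.pi / 2)
    return HUI3_MIN + (HUI3_MAX - HUI3_MIN) * math.sin(z) ** 2
```

Under this assumed form, any value predicted by a linear model on the transformed scale, however extreme, back-transforms to a score between -0.36 and 1.00.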

Regression methods led to mapped estimates of reduced variability (Fayers & Hays, 2014), which resulted in scores that did not adequately correspond to the properties of the HUI3 distribution. Direct imputation methods produced a score of 1.00 if respondents had functional health at the level of “no difficulty” in all WG-SS attributes or all WG-SS and WG ES-F attributes, otherwise mapping the HUI3 score using linear regression. This method did not increase predictive accuracy, potentially because of differences in the questions and response categories of the two instruments, as discussed above. The preferred estimation strategy included two estimation steps: a first step using multinomial logistic regression to predict whether the HUI3 score falls in one of three categories, defined as the highly prevalent scores of 1.00 and 0.973 and scores below 0.973, and a second step predicting the HUI3 score through linear regression of the arcsine-transformed score for respondents projected to have scores below 0.973. Scores of 1.00 or 0.973 were imputed based on results from the first estimation step.
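
The logic of the two estimation steps can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that "below 0.973" is the reference category of the multinomial model, that each respondent is assigned to the most probable category, and that the second-step prediction has already been back-transformed to the HUI3 scale. The function and argument names are hypothetical.

```python
import math


def multinomial_probabilities(eta_a: float, eta_b: float) -> tuple[float, float, float]:
    """Three-category multinomial logit with the third category as reference.

    eta_a and eta_b are the linear predictors of the two non-reference
    categories. Returns (p_a, p_b, p_reference).
    """
    m = max(eta_a, eta_b, 0.0)  # subtract the maximum for numerical stability
    ea, eb, e0 = math.exp(eta_a - m), math.exp(eta_b - m), math.exp(-m)
    total = ea + eb + e0
    return ea / total, eb / total, e0 / total


def two_step_score(eta_full: float, eta_973: float, step2_score: float) -> float:
    """Combine the two steps into one mapped score.

    Assumption: the discrete scores 1.00 and 0.973 are imputed when their
    category is the most probable one; otherwise the second-step
    regression prediction is used.
    """
    p_full, p_973, p_below = multinomial_probabilities(eta_full, eta_973)
    best = max(p_full, p_973, p_below)
    if best == p_full:
        return 1.00
    if best == p_973:
        return 0.973
    return step2_score
```

Other assignment rules, such as drawing the category at random from the predicted probabilities, would fit the same structure; the rule shown here was chosen only for simplicity.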

The preferred estimation strategy was able to predict reasonably precise health state utility scores in a general population sample and more accurately predict perfect scores of functional health without the conceptual limitations of direct imputation. It was also able to reflect the distributional properties of the skewed HUI3 score and retain prediction to its theoretical bounds. Mapping using the WG-SS and WG ES-F resulted in a mean absolute error in the candidate model of 0.086, about 6.3% of the total range of the HUI3 health state utility score (1.36), which exceeded the predictive accuracy of many mapping studies (Brazier et al., 2010). Despite this group-level predictive accuracy, about 60% of the sample had mapped scores that differed from their observed HUI3 scores by at least the minimum clinically important difference of 0.03, implying difficulties in mapping at the individual level. Empirical mapping using only the WG-SS measure also generated reasonable measures in this population group, though with less predictive accuracy (MAE = 0.094, or 6.9% of the overall HUI3 range). The inclusion of sociodemographic covariates provided minor improvements to model predictive performance but was less important than the inclusion of the WG ES-F attributes overall. Predicted scores from the candidate models are validated in a companion report by the same authors (forthcoming).
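
The forecast statistics discussed above can be computed along the following lines. This is a minimal, unweighted sketch; the study presumably applied survey weights and also reported Kendall's rank coefficient, neither of which is reproduced here. The function name and the dictionary keys are hypothetical.

```python
import math


def forecast_statistics(observed, mapped, threshold=0.03, score_range=1.36):
    """Summarize agreement between observed and mapped HUI3 scores.

    threshold: minimum clinically important difference (0.03 in the text).
    score_range: width of the HUI3 scale, 1.00 - (-0.36) = 1.36.
    """
    if len(observed) != len(mapped) or not observed:
        raise ValueError("observed and mapped must be non-empty and equal in length")
    errors = [m - o for o, m in zip(observed, mapped)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    pct_large = 100.0 * sum(abs(e) >= threshold for e in errors) / n
    return {
        "mae": mae,
        "rmse": rmse,
        "mae_pct_of_range": 100.0 * mae / score_range,
        "pct_diff_ge_threshold": pct_large,
    }
```

As a check on the arithmetic in the text, an MAE of 0.086 corresponds to 100 × 0.086 / 1.36, or about 6.3% of the range, and an MAE of 0.094 to about 6.9%.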

Limitations of this study should be noted. Empirical mapping was tested and validated on a head-to-head sample of the Canadian population aged 40 years and older and is not generalizable to younger ages. The household population younger than 40 generally has higher levels of functional health, as measured by the HUI3, and additional methods may be required to map the score for these groups (Bushnik et al., 2018). Further, some applicable categories of the WG measure were absent in the head-to-head sample, which may reduce reliability and replicability. In addition, mapped health state utility scores were overestimated in the hold-out sample and across demographic categories. Generally, levels of error were greater at lower HUI3 health state utility scores, a similar finding to other mapping studies (Gray, Rivero-Arias, & Clarke, 2006). Finally, mapping functions were generated on a non-institutionalized general population sample and may not be appropriate for use with other population groups, such as samples including patient data or respondents from institutional settings. Further work may incorporate methods to adjust population-level mapped health state utility scores for institutionalized populations (Bushnik et al., 2018).

This study offers a potential method, through empirical mapping, to estimate health state utility scores from the WG measure and, as such, addresses data gaps in HRQoL measurement in the CCHS. Mapped health state utility values may be used in future population studies of health-adjusted life expectancy (HALE) and quality-adjusted life years (QALYs), although further validation specific to these uses is required. Future work may extend the mapping to the population younger than 40 years.

Appendix 1

Appendix 1 table Washington Group Extended Set on Functioning: Pain
Table summary
This table displays the results of Washington Group Extended Set on Functioning: Pain. The information is grouped by How much pain you had last time you had pain (appearing as row headers), Frequency of pain in past three months (appearing as column headers).
How much pain you had last time you had pain	Frequency of pain in past three months
How much pain you had last time you had pain	Never	Some days	Most days	Every day	Don’t know
Note ... not applicable (1) Never had pain OR had a little pain some days (2) Had pain every day (a little) OR had pain most days (a little OR in between) OR had pain some days (in between OR a lot) (3) Had pain every day (in between) OR had pain most days (a lot) (4) Had pain every day (a lot) (5) Not stated
Not asked	(1)	Note ...: not applicable	Note ...: not applicable	Note ...: not applicable	(5)
A little	Note ...: not applicable	(1)	(2)	(2)	(5)
In between	Note ...: not applicable	(2)	(2)	(3)	(5)
A lot	Note ...: not applicable	(2)	(3)	(4)	(5)
Don’t know	Note ...: not applicable	(5)	(5)	(5)	(5)
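
The pain classification in the table above can be expressed as a small lookup. This is an illustrative sketch only: the function name, the lower-case response labels and the handling of unrecognized responses (treated as level 5, "Not stated") are assumptions.

```python
# Levels follow the table: keys are (frequency, intensity) pairs.
PAIN_LEVEL = {
    ("some days", "a little"): 1,
    ("some days", "in between"): 2,
    ("some days", "a lot"): 2,
    ("most days", "a little"): 2,
    ("most days", "in between"): 2,
    ("most days", "a lot"): 3,
    ("every day", "a little"): 2,
    ("every day", "in between"): 3,
    ("every day", "a lot"): 4,
}


def wg_pain_level(frequency, intensity=None):
    """Return the derived WG ES-F pain level (1 to 4, or 5 for not stated)."""
    freq = frequency.strip().lower()
    if freq == "never":
        return 1  # the intensity question is not asked
    if intensity is None:
        return 5
    # Any response not in the lookup (for example "don't know") is not stated.
    return PAIN_LEVEL.get((freq, intensity.strip().lower()), 5)
```

For example, `wg_pain_level("Most days", "A lot")` returns 3, matching the corresponding cell of the table.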

Washington Group Extended Set on Functioning: Anxiety
Table summary
This table displays the results of Washington Group Extended Set on Functioning: Anxiety. The information is grouped by Levels of feelings last time felt worried, nervous, or anxious (appearing as row headers), How often feel worried, nervous, or anxious? (appearing as column headers).
Levels of feelings last time felt worried, nervous, or anxious	How often feel worried, nervous, or anxious?
Levels of feelings last time felt worried, nervous, or anxious	Daily	Weekly	Monthly	A few times a year	Never	Don’t know
Note ... not applicable (1) Never feel worried, nervous or anxious OR feel worried, nervous or anxious a few times a year (2) Feel worried, nervous or anxious monthly OR feel worried, nervous or anxious weekly (a little OR in between) OR feel worried, nervous or anxious daily (a little) (3) Feel worried, nervous or anxious weekly (a lot) OR feel worried, nervous or anxious daily (in between) (4) Feel worried, nervous or anxious daily (a lot) (5) Not stated
Not asked	Note ...: not applicable	Note ...: not applicable	Note ...: not applicable	Note ...: not applicable	(1)	(5)
A little	(2)	(2)	(2)	(1)	(1)	(5)
In between	(3)	(2)	(2)	(1)	(1)	(5)
A lot	(4)	(3)	(2)	(1)	(1)	(5)
Don’t know	(5)	(5)	(5)	(5)	(5)	(5)

Washington Group Extended Set on Functioning: Depression
Table summary
This table displays the results of Washington Group Extended Set on Functioning: Depression . The information is grouped by Level of feelings last time felt depressed (appearing as row headers), How often do you feel depressed? (appearing as column headers).
Level of feelings last time felt depressed	How often do you feel depressed?
Level of feelings last time felt depressed	Daily	Weekly	Monthly	A few times a year	Never	Don’t know
Note ... not applicable (1) Never feel depressed OR feel depressed a few times a year (2) Feel depressed monthly OR feel depressed weekly (a little OR in between) OR feel depressed daily (a little) (3) Feel depressed weekly (a lot) OR feel depressed daily (in between) (4) Feel depressed daily (a lot) (5) Not stated
Not asked	Note ...: not applicable	Note ...: not applicable	Note ...: not applicable	Note ...: not applicable	(1)	(5)
A little	(2)	(2)	(2)	(1)	(1)	(5)
In between	(3)	(2)	(2)	(1)	(1)	(5)
A lot	(4)	(3)	(2)	(1)	(1)	(5)
Don’t know	(5)	(5)	(5)	(5)	(5)	(5)
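
The anxiety and depression tables above share the same structure, so a single lookup can represent both. The sketch below is illustrative; the function name, the lower-case response labels and the treatment of unrecognized responses as level 5 ("Not stated") are assumptions.

```python
# Levels by frequency, then by intensity, following the two tables above.
AFFECT_LEVEL = {
    "daily": {"a little": 2, "in between": 3, "a lot": 4},
    "weekly": {"a little": 2, "in between": 2, "a lot": 3},
    "monthly": {"a little": 2, "in between": 2, "a lot": 2},
    "a few times a year": {"a little": 1, "in between": 1, "a lot": 1},
    "never": {"a little": 1, "in between": 1, "a lot": 1},
}


def wg_affect_level(frequency, intensity=None):
    """Return the derived WG ES-F anxiety or depression level (1 to 4, or 5)."""
    freq = frequency.strip().lower()
    by_intensity = AFFECT_LEVEL.get(freq)
    if by_intensity is None:
        return 5  # frequency not stated
    if intensity is None:
        # Intensity is not asked when the feeling never occurs.
        return 1 if freq == "never" else 5
    return by_intensity.get(intensity.strip().lower(), 5)
```

For example, `wg_affect_level("Weekly", "A lot")` returns 3 and `wg_affect_level("Daily", "A lot")` returns 4, as in the tables.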

Appendix 2

Appendix 2
Health Utilities Index Mark 3 multi-attribute utility function: Simplified format of dead=0.00 / perfect health=1.00 scale Table summary
The information is grouped by Level (appearing as row headers), Vision b₁, Hearing b₂, Speech b₃, Ambulation b₄, Dexterity b₅, Emotion b₆, Cognition* b₇ and Pain b₈, calculated using score units of measure (appearing as column headers).
Level	Vision b₁	Hearing b₂	Speech b₃	Ambulation b₄	Dexterity b₅	Emotion b₆	Cognition ^{Appendix 2 Note *} b₇	Pain b₈
Level	score
Note ... not applicable Note * The single-attribute utility score for level 3 cognition is greater than the single-attribute utility score for level 2 cognition. Notes: Chronic states and the perfect health state are here defined as lasting for a lifetime. Death is defined as immediate. Formula (dead to perfect health scale): $u^{*} = 1.371 (b_{1} \times b_{2} \times b_{3} \times b_{4} \times b_{5} \times b_{6} \times b_{7} \times b_{8}) - 0.371$, where $u^{*}$ is the utility of a chronic health state on a utility scale where dead has a utility of 0.00 and healthy has a utility of 1.00. Source: Adapted from Feeny et al. (2002).
1	1	1	1	1	1	1	1	1
2	0.98	0.95	0.94	0.93	0.95	0.95	0.92	0.96
3	0.89	0.89	0.89	0.86	0.88	0.85	0.95	0.9
4	0.84	0.8	0.81	0.73	0.76	0.64	0.83	0.77
5	0.75	0.74	0.68	0.65	0.65	0.46	0.6	0.55
6	0.61	0.61	... not applicable	0.58	0.56	... not applicable	0.42	... not applicable
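
The multi-attribute utility function in the table note can be written out as a short calculation. The sketch below uses the rounded single-attribute scores from the table above; the function name, the attribute keys and the choice to treat unspecified attributes as level 1 are assumptions made for illustration.

```python
# Single-attribute scores by level (index 0 = level 1), from the table above.
SINGLE_ATTRIBUTE_SCORES = {
    "vision":     [1.00, 0.98, 0.89, 0.84, 0.75, 0.61],
    "hearing":    [1.00, 0.95, 0.89, 0.80, 0.74, 0.61],
    "speech":     [1.00, 0.94, 0.89, 0.81, 0.68],
    "ambulation": [1.00, 0.93, 0.86, 0.73, 0.65, 0.58],
    "dexterity":  [1.00, 0.95, 0.88, 0.76, 0.65, 0.56],
    "emotion":    [1.00, 0.95, 0.85, 0.64, 0.46],
    "cognition":  [1.00, 0.92, 0.95, 0.83, 0.60, 0.42],
    "pain":       [1.00, 0.96, 0.90, 0.77, 0.55],
}


def hui3_utility(levels):
    """Compute u* = 1.371 * (b1 * ... * b8) - 0.371.

    levels maps attribute names to attribute levels; attributes that are
    not supplied are assumed to be at level 1 (illustrative choice).
    """
    product = 1.0
    for attribute, scores in SINGLE_ATTRIBUTE_SCORES.items():
        level = levels.get(attribute, 1)
        if not 1 <= level <= len(scores):
            raise ValueError(f"invalid level {level} for {attribute}")
        product *= scores[level - 1]
    return 1.371 * product - 0.371
```

With these table values, level 1 on every attribute gives a utility of 1.00, level 2 vision with level 1 on all other attributes gives about 0.973, and the lowest level on every attribute gives about -0.36, the lower bound of the scale.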

Appendix 3

Appendix 3 table Conceptual mapping of attribute levels between the Health Utilities Index Mark 3 and the Washington Group disability measure Table summary
The information is grouped by Health Utilities Index Mark 3 attribute (appearing as row headers).
Health Utilities Index Mark 3 attribute	Washington Group attribute	Washington Group attribute levels	Health Utilities Index Mark 3 attribute levels
Note 1 Based on the question “Using your usual language, do you have difficulty communicating, for example understanding or being understood?” Note 2 Washington Group measures for affect (anxiety and depression) tested in alternate qualitative mapping specifications. Note 3 Washington Group attributes from the Extended Set on Functioning; all other attributes from the Short Set. Note: Conceptual mapping based on project phase 1 (Asakawa et al., 2017). Source: Authors' tabulations.
Vision	Vision	1. No difficulty seeing with or without glasses	1. Able to see well enough to read ordinary newsprint and recognize a friend on the other side of the street, without glasses or contact lenses. 2. Able to see well enough to read ordinary newsprint and recognize a friend on the other side of the street, but with glasses. 3. Able to read ordinary newsprint with or without glasses but unable to recognize a friend on the other side of the street, even with glasses.
	Vision	2. Some difficulty seeing with or without glasses	4. Able to recognize a friend on the other side of the street with or without glasses but unable to read ordinary newsprint, even with glasses.
	Vision	3. A lot of difficulty seeing with or without glasses	5. Unable to read ordinary newsprint and unable to recognize a friend on the other side of the street, even with glasses
	Vision	4. Cannot see at all	6. Unable to see at all
Hearing	Hearing	1. No difficulty hearing even when using a hearing aid	1. Able to hear what is said in a group conversation with at least three other people, without a hearing aid. 2. Able to hear what is said in a conversation with one other person in a quiet room without a hearing aid, but requires a hearing aid to hear what is said in a group conversation with at least three other people.
	Hearing	2. Some difficulty hearing even when using a hearing aid	3. Able to hear what is said in a conversation with one other person in a quiet room with a hearing aid, and able to hear what is said in a group conversation with at least three other people, with a hearing aid. 4. Able to hear what is said in a conversation with one other person in a quiet room, without a hearing aid, but unable to hear what is said in a group conversation with at least three other people, even with a hearing aid.
	Hearing	3. A lot of difficulty hearing even when using a hearing aid.	5. Able to hear what is said in a conversation with one other person in a quiet room with a hearing aid, but unable to hear what is said in a group conversation with at least three other people, even with a hearing aid
	Hearing	4. Unable to hear at all.	6. Unable to hear at all.
Speech	Communication ^{Appendix 3 Note 1}	1. No difficulty communicating.	1. Able to be understood completely when speaking with strangers or friends. 2. Able to be understood partially when speaking with strangers but able to be understood completely when speaking with people who know me well.
	Communication ^{Appendix 3 Note 1}	2. Some difficulty communicating.	3. Able to be understood partially when speaking with strangers or people who know me well.
	Communication ^{Appendix 3 Note 1}	3. A lot of difficulty communicating.	4. Unable to be understood when speaking with strangers but able to be understood partially by people who know me well.
	Communication ^{Appendix 3 Note 1}	4. Cannot communicate at all.	5. Unable to be understood when speaking to other people (or unable to speak at all).
Ambulation	Mobility	1. No difficulty walking or climbing steps	1. Able to walk around the neighbourhood without difficulty, and without walking equipment. 2. Able to walk around the neighbourhood with difficulty, but does not require walking equipment or the help of another person.
	Mobility	2. Some difficulty walking or climbing steps	3. Able to walk around the neighbourhood with walking equipment, but without the help of another person. 4. Able to walk only short distances with walking equipment, and requires a wheelchair to get around the neighbourhood.
	Mobility	3. A lot of difficulty walking or climbing steps.	5. Unable to walk alone, even with walking equipment. Able to walk short distances with the help of another person, and requires a wheelchair to get around the neighbourhood.
	Mobility	4. Cannot walk at all.	6. Cannot walk at all.
Dexterity	Self-care	1. No difficulty with self-care, such as washing all over or dressing.	1. Full use of two hands and ten fingers. 2. Limitations in the use of hands or fingers, but does not require special tools or help of another person.
	Self-care	2. Some difficulty with self-care, such as washing all over or dressing.	3. Limitations in the use of hands or fingers, is independent with use of special tools (does not require the help of another person). 4. Limitations in the use of hands or fingers, requires the help of another person for some tasks (not independent even with use of special tools)
	Self-care	3. A lot of difficulty with self-care, such as washing all over or dressing.	5. Limitations in use of hands or fingers, requires the help of another person for most tasks (not independent even with use of special tools).
	Self-care	4. Cannot do at all.	6. Limitations in use of hands or fingers, requires the help of another person for all tasks (not independent even with use of special tools).
Emotion ^{Appendix 3 Note 2}	Affect ^{Appendix 3 Note 3} (depression)	1. Never feel depressed, or feel depressed a few times a year.	1. Happy and interested in life.
	Affect ^{Appendix 3 Note 3} (depression)	2. Feel depressed monthly or feel depressed weekly (a little or in between a little and a lot) or feel depressed daily (a little).	2. Somewhat happy.
	Affect ^{Appendix 3 Note 3} (depression)	3. Feel depressed daily (in between a little and a lot) or feel depressed weekly (a lot).	3. Somewhat unhappy. 4. Very unhappy.
	Affect ^{Appendix 3 Note 3} (depression)	4. Feel depressed daily (a lot).	5. So unhappy that life is not worthwhile.
Emotion ^{Appendix 3 Note 2}	Affect ^{Appendix 3 Note 3} (anxiety)	1. Never feel worried, nervous or anxious or feel worried, nervous or anxious a few times a year.	1. Happy and interested in life.
	Affect ^{Appendix 3 Note 3} (anxiety)	2. Feel worried, nervous or anxious monthly or feel worried, nervous or anxious weekly (a little or in between a little and a lot) or feel worried, nervous or anxious daily (a little)	2. Somewhat happy. 3. Somewhat unhappy.
	Affect ^{Appendix 3 Note 3} (anxiety)	3. Feel worried, nervous or anxious daily (in between a little and a lot) or feel worried, nervous or anxious weekly (a lot)	4. Very unhappy
	Affect ^{Appendix 3 Note 3} (anxiety)	4. Feel worried, nervous or anxious daily (a lot)	5. So unhappy that life is not worthwhile
Cognition	Cognition	1. No difficulty remembering or concentrating	1. Able to remember most things, think clearly and solve day-to-day problems. 2. Able to remember most things, but have a little difficulty when trying to think and solve day-to-day problems.
	Cognition	2. Some difficulty remembering or concentrating	3. Somewhat forgetful, but able to think clearly and solve day-to-day problems
	Cognition	3. A lot of difficulty remembering or concentrating	4. Somewhat forgetful, and have a little difficulty when trying to think or solve day-to-day problems
	Cognition	4. Cannot do at all	5. Very forgetful, and have great difficulty when trying to think or solve day-to-day problems. 6. Unable to remember anything at all, and unable to think or solve day-to-day problems.
Pain	Pain ^{Appendix 3 Note 3}	1. Never had pain or had pain some days (a little)	1. Free of pain and discomfort. 2. Mild to moderate pain that prevents no activities
	Pain ^{Appendix 3 Note 3}	2. Had pain every day (a little) or had pain most days (a little or somewhere in between a little and a lot) or had pain some days (in between a little and a lot or a lot).	3. Moderate pain that prevents a few activities.
	Pain ^{Appendix 3 Note 3}	3. Had pain every day (in between a little and a lot) or had pain most days (a lot).	4. Moderate to severe pain that prevents some activities
	Pain ^{Appendix 3 Note 3}	4. Had pain every day (a lot)	5. Severe pain that prevents most activities

Appendix 4

Appendix 4 table
Candidate model 1 Table summary
The information is grouped by Variable description (appearing as row headers), First-step coefficients, Second-step coefficients, β, γ and δ, calculated using units of measure (appearing as column headers).
Variable description	First-step coefficients		Second-step coefficients
Variable description	β	γ	δ
Note ... not applicable Notes: WG-SS = Washington Group Short Set on Functioning; and WG ES-F = Washington Group Extended Set on Functioning. Source: 2017 Canadian Community Health Survey rapid-response subsample.
WG-SS vision
No difficulty	0	0	0
Some difficulty	-0.5138	-0.69896	-0.0463124
A lot of difficulty	-0.58369	-16.7024	-0.0806459
Unable to do	0.851381	-16.5612	-0.0700441
Not stated	18.54257	18.99978	-0.6048409
WG-SS hearing
No difficulty	0	0	0
Some difficulty	-0.35549	-0.40425	-0.0253059
A lot of difficulty	-1.88281	-17.5026	-0.3140145
Unable to do	-11.2348	18.89366	-0.3513188
Not stated	-33.2204	-34.8649	0
WG-SS mobility
No difficulty	0	0	0
Some difficulty	-0.5592	-0.73422	-0.2015701
A lot of difficulty	-2.89708	-16.5066	-0.4700025
Unable to do	-17.1235	-16.742	-0.5949685
Not stated	-19.3796	-18.8958	-0.5706593
WG-SS cognition
No difficulty	0	0	0
Some difficulty	-1.07527	-1.90556	-0.1667633
A lot of difficulty	-16.7047	-16.0525	-0.4495562
Unable to do	2.448472	14.38663	-0.2082227
Not stated	18.69593	0.866978	0.0657736
WG-SS self-care
No difficulty	0	0	0
Some difficulty	-0.73333	-15.6698	-0.1676797
A lot of difficulty	-15.6507	-14.1999	-0.5187314
Unable to do	-11.1156	3.405978	-0.5250122
Not stated	-0.53158	-0.62957	0.1093124
WG-SS communication
No difficulty	0	0	0
Some difficulty	-1.09505	-0.80345	-0.1651021
A lot of difficulty	3.739904	-10.3529	-0.0016886
Unable to do	... not applicable	... not applicable	... not applicable
Not stated	-30.9669	-16.7047	-0.2340906
WG ES-F pain
Never had pain or had pain some days (a little)	0	0	0
Had pain every day (a little) or had pain most days (a little or in between a little and a lot) or had pain some days (in between a little and a lot or a lot)	-0.96752	-0.96959	-0.1372369
Had pain every day (in between a little and a lot) or had pain most days (a lot)	-2.27801	-2.09324	-0.3591698
Had pain every day (a lot)	-2.30331	-2.08809	-0.5353398
Not stated	... not applicable	... not applicable	... not applicable
WG ES-F anxiety
Never feel worried, nervous or anxious or feel worried, nervous or anxious a few times a year	0	0	0
Feel worried, nervous or anxious monthly or feel worried, nervous or anxious weekly (a little or in between a little and a lot) or feel worried, nervous or anxious daily (a little)	-0.53158	-0.62957	... not applicable
Feel worried, nervous or anxious weekly (a lot) or feel worried, nervous or anxious daily (between a little and a lot)	-1.13668	-2.04243	... not applicable
Feel worried, nervous or anxious daily (a lot)	-1.7661	-1.18641	... not applicable
Not stated	-1.36895	0.016042	... not applicable
WG ES-F depression
Never feel depressed or feel depressed a few times a year	... not applicable	... not applicable	0
Feel depressed monthly or feel depressed weekly (a little or between a little and a lot) or feel depressed daily (a little)	... not applicable	... not applicable	-0.1741433
Feel depressed weekly (a lot) or feel depressed daily (between a little and a lot)	... not applicable	... not applicable	-0.2838989
Feel depressed daily (a lot)	... not applicable	... not applicable	-0.4179881
Not stated	... not applicable	... not applicable	-0.2063329
Age (years, centred at 62)
Age	-0.01504	-0.02237	-0.0015583
Age²	-0.00125	0.001793	4.43E-06
Age³	5.85E-05	-3.70E-05	-2.95E-06
Sex
Male	0	0	0
Female	0.436339	-0.00818	0.0151686
Marital status
Married or common law	0	0	... not applicable
Widowed; separated; divorced; or single, never married	-0.2377	0.030347	... not applicable
Not stated	-0.89605	-17.6181	... not applicable
Constant	0.625385	-0.06105	1.2501338
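
As a partial illustration of how the second-step coefficients (δ) in the table above combine, the sketch below evaluates the linear predictor for a small subset of predictors. It assumes that the second and third age rows are the squared and cubed terms of age centred at 62, consistent with the age² and age³ terms named in the model descriptions. Only mobility, pain, sex and age are included, so this is not the full prediction equation, and the result is on the transformed scale rather than the HUI3 scale. The function and key names are hypothetical.

```python
AGE_CENTRE = 62  # the table states that age is centred at 62

# Second-step (delta) coefficients copied from the table above (subset only).
DELTA = {
    "constant": 1.2501338,
    "age": -0.0015583,
    "age2": 4.43e-06,
    "age3": -2.95e-06,
    "female": 0.0151686,
    "mobility": {
        "no difficulty": 0.0,
        "some difficulty": -0.2015701,
        "a lot of difficulty": -0.4700025,
    },
    "pain": {1: 0.0, 2: -0.1372369, 3: -0.3591698, 4: -0.5353398},
}


def second_step_linear_predictor(age, female, mobility, pain_level):
    """Sum the constant and the selected delta coefficients for one respondent."""
    a = age - AGE_CENTRE
    return (
        DELTA["constant"]
        + DELTA["age"] * a
        + DELTA["age2"] * a ** 2
        + DELTA["age3"] * a ** 3
        + (DELTA["female"] if female else 0.0)
        + DELTA["mobility"][mobility]
        + DELTA["pain"][pain_level]
    )
```

For a 62-year-old man with no mobility difficulty and pain level 1, every term other than the constant is zero, so the predictor equals 1.2501338. A complete calculation would add the remaining WG-SS attributes and the depression coefficients and then back-transform the result.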

Appendix 4 table
Candidate model 2 Table summary
The information is grouped by Variable description (appearing as row headers), First-step coefficients, Second-step coefficients, β, γ and δ, calculated using units of measure (appearing as column headers).
Variable description	First-step coefficients		Second-step coefficients
Variable description	β	γ	δ
Note ... not applicable Note: WG-SS = Washington Group Short Set on Functioning. Source: 2017 Canadian Community Health Survey rapid-response subsample.
WG-SS vision
No difficulty	0	0	0
Some difficulty	-0.4427192	-0.622193	-0.0342789
A lot of difficulty	-0.58153122	-16.993984	-0.15771655
Unable to do	1.2729682	-16.476993	-0.06623129
Not stated	18.700756	19.271366	-1.1416135
WG-SS hearing
No difficulty	0	0	0
Some difficulty	-0.38875093	-0.44841755	-0.03267161
A lot of difficulty	-1.897773	-17.463538	-0.29629429
Unable to do	-13.133998	18.051263	-0.54203977
Not stated	-34.77817	-35.736802	0
WG-SS mobility
No difficulty	0	0	0
Some difficulty	-0.81222301	-0.96174524	-0.27593801
A lot of difficulty	-3.2810702	-17.149309	-0.69897206
Unable to do	-16.847342	-16.271138	-0.73437133
Not stated	-18.989008	-18.591085	-0.49377781
WG-SS cognition
No difficulty	0	0	0
Some difficulty	-0.83183092	-1.6617851	-0.19371209
A lot of difficulty	-17.038335	-16.768265	-0.61849676
Unable to do	1.0537305	12.539575	-0.30501034
Not stated	19.115092	1.3470731	-0.37191609
WG-SS self-care
No difficulty	0	0	0
Some difficulty	-0.7143783	-15.766142	-0.23042272
A lot of difficulty	-14.405838	-12.887276	-0.68072406
Unable to do	-11.89192	2.17125	-0.67758229
Not stated	0.90863264	0.59832036	0.10931236
WG-SS communication
No difficulty	0	0	0
Some difficulty	-1.016783	-0.63671982	-0.19149598
A lot of difficulty	4.5300097	-10.420647	0.00564958
Unable to do	... not applicable	... not applicable	... not applicable
Not stated	-33.052687	-17.591905	-0.18038516
Age (years, centred at 62)
Age	-0.00242904	-0.01161276	0.0037914
Age²	-0.00092281	0.00195602	0.00008082
Age³	0.00005345	-0.00003192	-0.000009276
Sex
Male	0	0	0
Female	0.18365673	-0.26527423	-0.02518461
Marital status
Married or common law	0	0	... not applicable
Widowed; separated; divorced; or single, never married	-0.19089891	0.03987409	... not applicable
Not stated	-0.52903053	-17.207523	... not applicable
Self-rated health
Poor	-1.0378036	-0.55357327	... not applicable
Fair	-0.59380978	-0.89205454	... not applicable
Good	-0.8547589	-0.76134274	... not applicable
Very good	-0.0067599	-0.12386246	... not applicable
Excellent	0	0	... not applicable
Not stated	-16.716375	-2.7547615	... not applicable
Self-rated mental health
Poor	-17.455517	-17.043811	... not applicable
Fair	-1.6257774	-1.8077786	... not applicable
Good	-1.1140278	-0.99820412	... not applicable
Very good	-0.20539518	-0.39988375	... not applicable
Excellent	0	0	... not applicable
Not stated	-1.027783	-0.90652266	... not applicable
Constant	0.6743065	0.09994616	1.1266818

Appendix 4 table
Restricted version model 1
Washington Group Short Set on Functioning and Extended Set on Functioning only
Variable description	First-step coefficients		Second-step coefficients
Variable description	β	γ	δ
Notes: ... = not applicable; .. = not available for a specific reference period. WG-SS = Washington Group Short Set on Functioning; WG ES-F = Washington Group Extended Set on Functioning. Source: 2017 Canadian Community Health Survey rapid-response subsample.
WG-SS vision
No difficulty	0	0	0
Some difficulty	-0.50488416	-0.64743747	-0.0424813
A lot of difficulty	-0.65292308	-17.16756	-0.08458749
Unable to do	0.63021012	-16.972111	-0.12158619
Not stated	19.194261	19.644893	-0.66279188
WG-SS hearing
No difficulty	0	0	0
Some difficulty	-0.3882268	-0.58278391	-0.03490877
A lot of difficulty	-2.0619157	-17.989298	-0.33191818
Unable to do	-11.641732	19.066766	-0.35663148
Not stated	-33.967604	-36.354004	0
WG-SS mobility
No difficulty	0	0	0
Some difficulty	-0.57753757	-1.0070102	-0.22814933
A lot of difficulty	-2.9376888	-17.15007	-0.49101296
Unable to do	-17.397093	-17.24778	-0.61464683
Not stated	-19.676444	-19.871707	-0.61481335
WG-SS cognition
No difficulty	0	0	0
Some difficulty	-1.0831727	-1.901896	-0.17041293
A lot of difficulty	-17.055188	-16.671694	-0.45362024
Unable to do	2.3698603	14.763712	-0.16957424
Not stated	19.650777	1.0112447	-0.00625451
WG-SS self-care
No difficulty	0	0	0
Some difficulty	-0.79455132	-16.203229	-0.18367618
A lot of difficulty	-15.689927	-14.680991	-0.53164723
Unable to do	-11.043761	3.0457288	-0.53446868
Not stated	-0.47979583	-0.38689063	0.10931236
WG-SS communication
No difficulty	0	0	0
Some difficulty	-1.1411	-0.77782122	-0.17078836
A lot of difficulty	3.7885564	-11.839069	-0.00521095
Unable to do	... not applicable	... not applicable	.. not available for a specific reference period
Not stated	-32.210625	-17.400008	-0.22434215
WG ES-F pain
Never had pain or had pain some days (a little)	0	0	0
Had pain every day (a little) or had pain most days (a little or in between a little and a lot) or had pain some days (in between a little and a lot or a lot)	-0.9044098	-0.93074869	-0.12206499
Had pain every day (in between a little and a lot) or had pain most days (a lot)	-2.1948775	-2.0763137	-0.34196623
Had pain every day (a lot)	-2.2775882	-2.0265376	-0.51893602
Not stated	... not applicable	... not applicable	... not applicable
WG ES-F anxiety
Never feel worried, nervous or anxious or feel worried, nervous or anxious a few times a year	0	0	... not applicable
Feel worried, nervous or anxious monthly or feel worried, nervous or anxious weekly (a little or in between a little and a lot) or feel worried, nervous or anxious daily (a little)	-0.47979583	-0.38689063	... not applicable
Feel worried, nervous or anxious weekly (a lot) or feel worried, nervous or anxious daily (between a little and a lot)	-1.0935341	-1.5592427	... not applicable
Feel worried, nervous or anxious daily (a lot)	-1.6458458	-1.029524	... not applicable
Not stated	-1.3866703	0.26188224	... not applicable
WG ES-F depression
Never feel depressed or feel depressed a few times a year	... not applicable	... not applicable	0
Feel depressed monthly or feel depressed weekly (a little or between a little and a lot) or feel depressed daily (a little)	... not applicable	... not applicable	-0.1686219
Feel depressed weekly (a lot) or feel depressed daily (between a little and a lot)	... not applicable	... not applicable	-0.27205564
Feel depressed daily (a lot)	... not applicable	... not applicable	-0.37610348
Not stated	... not applicable	... not applicable	-0.14422467
Constant	0.58475774	0.24551477	1.2564775

Appendix 4 table
Restricted version model 2
Washington Group Short Set on Functioning only
Variable description	Estimation coefficients
Variable description	δ
Notes: ... = not applicable. WG-SS = Washington Group Short Set on Functioning. Source: 2017 Canadian Community Health Survey rapid-response subsample.
WG-SS vision
No difficulty	0
Some difficulty	-0.07201914
A lot of difficulty	-0.18244427
Unable to do	0.15636221
Not stated	0.2448262
WG-SS hearing
No difficulty	0
Some difficulty	-0.05769038
A lot of difficulty	-0.33345162
Unable to do	-0.48977881
Not stated	-1.4092114
WG-SS mobility
No difficulty	0
Some difficulty	-0.32392693
A lot of difficulty	-0.75338387
Unable to do	-0.77896701
Not stated	-0.57178032
WG-SS cognition
No difficulty	0
Some difficulty	-0.24802674
A lot of difficulty	-0.65790998
Unable to do	-0.38682362
Not stated	0.1631044
WG-SS self-care
No difficulty	0
Some difficulty	-0.24926101
A lot of difficulty	-0.67379512
Unable to do	-0.64435499
Not stated	0.10931236
WG-SS communication
No difficulty	0
Some difficulty	-0.19354506
A lot of difficulty	0.07877046
Unable to do	... not applicable
Not stated	-0.51602516
Constant	1.2134445

Before running the first mapping step, the user may derive a three-category variable that represents the prominent values of the Health Utilities Index Mark 3 (HUI3) score. The new variable, H3, takes the form H3=1 if HUI3<0.973, H3=2 if HUI3=0.973, and H3=3 if HUI3=1.00. The predicted probability of each level of H3 can then be obtained for each respondent, based on their observed characteristics, through the following equations:

$\Pr(H3 = 3 \mid X = x) = \frac{\exp(γ_{0} + γ_{1} X_{1} + γ_{2} X_{2} + \dots + γ_{n} X_{n})}{1 + \exp(β_{0} + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{n} X_{n}) + \exp(γ_{0} + γ_{1} X_{1} + γ_{2} X_{2} + \dots + γ_{n} X_{n})} \qquad (1)$

$\Pr(H3 = 2 \mid X = x) = \frac{\exp(β_{0} + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{n} X_{n})}{1 + \exp(β_{0} + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{n} X_{n}) + \exp(γ_{0} + γ_{1} X_{1} + γ_{2} X_{2} + \dots + γ_{n} X_{n})} \qquad (2)$

$\Pr(H3 = 1 \mid X = x) = 1 - [\Pr(H3 = 3 \mid X = x) + \Pr(H3 = 2 \mid X = x)] \qquad (3)$
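Equations (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `derive_h3` and `h3_probabilities` are hypothetical, the linear predictors are assumed to have been computed already, and the tolerance used to compare floating-point HUI3 scores is an assumption.

```python
import math


def derive_h3(hui3: float, tol: float = 1e-9) -> int:
    """Recode an HUI3 score into the three-category variable H3."""
    if hui3 >= 1.0 - tol:
        return 3  # HUI3 = 1.00
    if abs(hui3 - 0.973) <= tol:
        return 2  # HUI3 = 0.973
    if hui3 < 0.973:
        return 1  # HUI3 < 0.973
    # Assumption: scores strictly between 0.973 and 1.00 are not expected.
    raise ValueError(f"unexpected HUI3 score: {hui3}")


def h3_probabilities(eta_beta: float, eta_gamma: float) -> dict:
    """Apply equations (1)-(3).

    eta_beta  = β0 + β1*X1 + ... + βn*Xn  (predictor for H3 = 2)
    eta_gamma = γ0 + γ1*X1 + ... + γn*Xn  (predictor for H3 = 3)
    """
    denom = 1.0 + math.exp(eta_beta) + math.exp(eta_gamma)
    p3 = math.exp(eta_gamma) / denom  # equation (1)
    p2 = math.exp(eta_beta) / denom   # equation (2)
    p1 = 1.0 - (p3 + p2)              # equation (3)
    return {1: p1, 2: p2, 3: p3}
```

H3=1 acts as the base outcome here: its linear predictor is implicitly zero, which is why a bare 1 appears in the denominator.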

Work-through using candidate model 1: A vector of 36 coefficients $β_{n = 1 \dots 36}$ plus the constant $β_{0}$ is used to predict, for each individual respondent, the probability that H3=2 (HUI3=0.973), and a vector of 36 coefficients $γ_{n = 1 \dots 36}$ plus the constant $γ_{0}$ is used to predict the probability that H3=3 (HUI3=1). The probability that the HUI3 score is less than 0.973 (H3=1) can be derived from equation (3). The 36 coefficients relate to item scores for the Washington Group (WG) Short Set and WG Extended Set on Functioning (with pain and anxiety), age (centred at 62 and entered in linear, quadratic and cubic forms), sex, and marital status. In some instances, applicable categories of the WG measure are missing coefficients because these categories were not represented in the analytical dataset.
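The linear predictors that enter equations (1) and (2) are the constant plus the coefficient of each response category the respondent reports. The sketch below illustrates that summation with first-step coefficients copied from the "Restricted version model 1" table above (not candidate model 1). The dictionary layout, the lowercase category labels and the function name `linear_predictor` are hypothetical, only two WG-SS items are included, and every other predictor is assumed to sit at its reference category (coefficient 0).

```python
# First-step coefficients copied from the "Restricted version model 1" table;
# only two WG-SS items are included to keep the sketch short.
FIRST_STEP = {
    "beta": {
        "constant": 0.58475774,
        "vision": {"no difficulty": 0.0, "some difficulty": -0.50488416},
        "mobility": {"no difficulty": 0.0, "some difficulty": -0.57753757},
    },
    "gamma": {
        "constant": 0.24551477,
        "vision": {"no difficulty": 0.0, "some difficulty": -0.64743747},
        "mobility": {"no difficulty": 0.0, "some difficulty": -1.0070102},
    },
}


def linear_predictor(coefs: dict, responses: dict) -> float:
    """Constant plus the coefficient of each observed response category."""
    eta = coefs["constant"]
    for item, level in responses.items():
        # A KeyError here flags a category that has no coefficient in the table.
        eta += coefs[item][level]
    return eta


# Hypothetical respondent: some difficulty with mobility, no vision difficulty.
respondent = {"vision": "no difficulty", "mobility": "some difficulty"}
eta_beta = linear_predictor(FIRST_STEP["beta"], respondent)    # feeds equation (2)
eta_gamma = linear_predictor(FIRST_STEP["gamma"], respondent)  # feeds equation (1)
```

Raising an error on an unknown category, rather than silently adding zero, makes the "missing coefficient" cases noted above visible to the user.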

Each respondent in the dataset now has a predicted probability for each value of H3, and is assigned the H3 category with the highest predicted probability. Where that category is H3=3 or H3=2, the corresponding value (HUI3=1 or HUI3=0.973) is imputed directly onto a new mapped health state utility score named HUI3map. Records for which H3=1 has the highest predicted probability undergo an additional step. Because the coefficients for this step were estimated on an arcsine transformation of a linearly rescaled HUI3 score, of the form $\arcsin\left[2 \left(\frac{HUI3 + 0.36}{1 + 0.36}\right) - 1\right]$ (Mapping part 2: Empirical mapping), the predicted values must be reverse-transformed:

$HUI3map = \left[\frac{(\sin(δ_{0} + δ_{1} X_{1} + δ_{2} X_{2} + \dots + δ_{n} X_{n}) + 1)(1 + 0.36)}{2}\right] - 0.36 \qquad (4)$

where a vector of 36 coefficients $δ_{n = 1 \dots 36}$ plus the constant $δ_{0}$ is used to predict, for each respondent, the arcsine- and linear-transformed HUI3 score. Because this predicted value is on the transformed scale and is not directly interpretable, the reverse transformation in equation (4) is necessary.
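The assignment rule and the reverse transformation can be sketched together as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the first-step probabilities and the second-step linear predictor are taken as given, and ties between categories are not handled specially.

```python
import math


def forward_transform(hui3: float) -> float:
    """Rescale HUI3 from [-0.36, 1] to [-1, 1], then take the arcsine."""
    return math.asin(2.0 * (hui3 + 0.36) / (1.0 + 0.36) - 1.0)


def back_transform(eta_delta: float) -> float:
    """Equation (4): return a second-step prediction to the HUI3 scale."""
    return (math.sin(eta_delta) + 1.0) * (1.0 + 0.36) / 2.0 - 0.36


def map_hui3(probs: dict, eta_delta: float) -> float:
    """Two-step mapping.

    probs     : predicted probabilities keyed by H3 category (1, 2, 3)
    eta_delta : δ0 + δ1*X1 + ... + δn*Xn from the second-step model
    """
    h3 = max(probs, key=probs.get)  # category with the highest probability
    if h3 == 3:
        return 1.0                  # impute HUI3 = 1 directly
    if h3 == 2:
        return 0.973                # impute HUI3 = 0.973 directly
    return back_transform(eta_delta)


# Example call with made-up inputs.
hui3map = map_hui3({1: 0.6, 2: 0.3, 3: 0.1}, eta_delta=0.0)
```

Because the sine function is periodic, a linear predictor outside the interval from -π/2 to π/2 would not map back monotonically; the sketch does not guard against that case.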

Both candidate models, plus the restricted version of Model 1, follow equations (1) through (4). For the restricted version of Model 2, the first estimation step did not improve predictive performance, so mapping requires only equation (4).
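For the restricted version of Model 2, the mapping therefore reduces to summing the relevant δ coefficients and applying equation (4). The following sketch uses a subset of the δ coefficients from the "Restricted version model 2" table above; the dictionary layout, the lowercase category labels and the function name `predict_hui3map` are hypothetical, and items not listed are assumed to be at "No difficulty" (coefficient 0).

```python
import math

# Subset of δ coefficients copied from the "Restricted version model 2" table.
DELTA = {
    "constant": 1.2134445,
    "vision": {
        "no difficulty": 0.0,
        "some difficulty": -0.07201914,
        "a lot of difficulty": -0.18244427,
    },
    "mobility": {
        "no difficulty": 0.0,
        "some difficulty": -0.32392693,
        "a lot of difficulty": -0.75338387,
    },
}


def predict_hui3map(responses: dict) -> float:
    """Sum the δ coefficients, then apply equation (4)."""
    eta = DELTA["constant"] + sum(
        DELTA[item][level] for item, level in responses.items()
    )
    return (math.sin(eta) + 1.0) * (1.0 + 0.36) / 2.0 - 0.36


# Hypothetical respondents.
no_difficulty = predict_hui3map(
    {"vision": "no difficulty", "mobility": "no difficulty"}
)
some_mobility = predict_hui3map(
    {"vision": "no difficulty", "mobility": "some difficulty"}
)
```

Under this single-step version, a respondent reporting no difficulty on any item receives the score implied by the constant alone, so values of exactly 0.973 or 1.00 are never imputed directly.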

References

Asakawa, K., Bernier, J., & Kohen, D. (2017). Conceptual mapping of the Washington Group (WG) measure into the Health Utilities Mark 3 (HUI3) [Internal Working Document]. Ottawa: Statistics Canada.

Asakawa, K., Bernier, J., & Kohen, D. (2018). Mapping of the Washington Group (WG) disability measure into the Health Utilities Index Mark 3 (HUI3) [Internal Document]. Ottawa: Statistics Canada.

Bartman, B. A., Rosen, M. J., Bradham, D. D., Weissman, J., Hochberg, M., & Revicki, D. A. (1997). Relationship between health status and utility measures in older claudicants. Quality of Life Research, 7(1), 67-73.

Baum, C. F. (2008). Stata tip 63: Modeling proportions. The Stata Journal, 8(2), 299-303.

Béland, Y., Dale, V., Dufour, J., & Hamel, M. (2005). The Canadian Community Health Survey: Building on the success from the past. Paper presented at the Proceedings of the American Statistical Association Joint Statistical Meetings. Minneapolis: American Statistical Association.

Bernier, J., Feng, Y., & Asakawa, K. (2011). Strategies for handling normality assumptions in multi-level modeling: A case study estimating trajectories of Health Utilities Index Mark 3 scores. Health Reports, 22(4), 45-51.

Boyle, M., Furlong, W., Feeny, D., Torrance, G. W., & Hatcher, J. (1995). Reliability of the Health Utilities Index—Mark III used in the 1991 cycle 6 Canadian General Social Survey Health Questionnaire. Quality of Life Research, 4, 249-257.

Brazier, J. E., Yang, Y., Tsuchiya, A., & Rowen, D. L. (2010). A review of studies mapping (or cross walking) non-preference based measures of health to generic preference-based measures. The European Journal of Health Economics, 11(2), 215-225.

Bushnik, T., Tjepkema, M., & Martel, L. (2018). Health-adjusted life expectancy in Canada. Health Reports, 29(4), 14-22.

Cieza, A., & Stucki, G. (2008). The International Classification of Functioning Disability and Health: Its development process and content validity. European Journal of Physical and Rehabilitation Medicine, 44(3), 303-313.

Conover, W. J. (1999). Practical Nonparametric Statistics (Vol. 350). John Wiley & Sons.

Drummond, M. F., Sculpher, M. J., Claxton, K., Stoddart, G. L., & Torrance, G. W. (2015). Methods for the Economic Evaluation of Health Care Programmes. Oxford: Oxford University Press.

Fayers, P. M., & Hays, R. D. (2014). Should linking replace regression when mapping from profile-based measures to preference-based measures? Value in Health, 17(2), 261-265.

Feeny, D., & Furlong, W. (1997). Classification of Levels in Health Utilities Index Mark 2 System (HUI2) and Mark 3 System (HUI3) into Disability Categories: None, Mild, Moderate, and Severe. [Unpublished document]. Hamilton, ON: McMaster University.

Feeny, D., Furlong, W., Torrance, G. W., Goldsmith, C. H., Zhu, Z., DePauw, S., Denton, M., & Boyle, M. (2002). Multiattribute and single-attribute utility functions for the Health Utilities Index Mark 3 system. Medical Care, 40(2), 113-128.

Feng, Y., Bernier, J., McIntosh, C., & Orpana, H. (2009). Validation of disability categories derived from Health Utilities Index Mark 3 scores. Health Reports, 20(2), 43-50.

Franks, P., Lubetkin, E. I., Gold, M. R., & Tancredi, D. J. (2003). Mapping the SF-12 to preference-based instruments: Convergent validity in a low-income, minority population. Medical Care, 41(11), 1277-1283.

Furlong, W. J., Feeny, D. H., Torrance, G. W., & Barr, R. D. (2001). The Health Utilities Index (HUI®) system for assessing health-related quality of life in clinical studies. Annals of Medicine, 33(5), 375-384.

Gray, A. M., Rivero-Arias, O., & Clarke, P. M. (2006). Estimating the association between SF-12 responses and EQ-5D utility values by response mapping. Medical Decision Making, 26(1), 18-29.

Grootendorst, P., Marshall, D., Pericak, D., Bellamy, N., Feeny, D., & Torrance, G. W. (2007). A model to estimate Health Utilities Index mark 3 utility scores from WOMAC index scores in patients with osteoarthritis of the knee. The Journal of Rheumatology, 34(3), 534-542.

Health Utilities Inc. (2015). Sources of Information on the Health Utilities Index (HUI®). Retrieved from http://healthutilities.com/

Heintz, E., Wiréhn, A.-B., Peebo, B. B., Rosenqvist, U., & Levin, L.-Å. (2012). QALY weights for diabetic retinopathy—a comparison of health state valuations with HUI-3, EQ-5D, EQ-VAS, and TTO. Value in Health, 15(3), 475-484.

Horsman, J., Furlong, W., Feeny, D., & Torrance, G. (2003). The Health Utilities Index (HUI®): Concepts, measurement properties and applications. Health and Quality of Life Outcomes, 1(1), 1-13.

Idler, E. L., & Benyamini, Y. (1997). Self-rated health and mortality: A review of twenty-seven community studies. Journal of Health and Social Behavior, 38(1), 21-37.

Kaplan, G. A., Goldberg, D. E., Everson, S. A., Cohen, R. D., Salonen, R., Tuomilehto, J., & Salonen, J. (1996). Perceived health status and morbidity and mortality: Evidence from the Kuopio Ischaemic Heart Disease Risk Factor Study. International Journal of Epidemiology, 25(2), 259-265.

Kopec, J. A., & Willison, K. D. (2003). A comparative review of four preference-weighted measures of health-related quality of life. Journal of Clinical Epidemiology, 56(4), 317-325.

Longworth, L., & Rowen, D. (2013). Mapping to obtain EQ-5D utility values for use in NICE health technology assessments. Value in Health, 16(1), 202-210.

Lundberg, O., & Manderbacka, K. (1996). Assessing reliability of a measure of self-rated health. Scandinavian Journal of Social Medicine, 24(3), 218-224.

Madans, J. H., & Loeb, M. (2013). Methods to improve international comparability of census and survey measures of disability. Disability and Rehabilitation, 35(13), 1070-1073.

Madans, J. H., Loeb, M. E., & Altman, B. M. (2011). Measuring disability and monitoring the UN Convention on the Rights of Persons with Disabilities: The work of the Washington Group on Disability Statistics. BMC Public Health, 11(Suppl 4), S4.

Maddox, G. L., & Douglass, E. B. (1973). Self-assessment of health: A longitudinal study of elderly subjects. Journal of Health and Social Behavior, 87-93.

Marshall, D., Pericak, D., Grootendorst, P., Gooch, K., Faris, P., Frank, C., Bellamy, N., Torrance, G., & Feeny, D. (2008). Validation of a prediction model to estimate Health Utilities Index Mark 3 utility scores from WOMAC index scores in patients with osteoarthritis of the hip. Value in Health, 11(3), 470-477.

Miller, K., Mont, D., Maitland, A., Altman, B., & Madans, J. (2011). Results of a cross-national structured cognitive interviewing protocol to test measures of disability. Quality & Quantity, 45(4), 801-815.

Mont, D. (2019). How Are the Washington Group Questions Consistent with the Social Model of Disability? Retrieved from: https://www.washingtongroup-disability.com/wg-blog/how-are-the-washington-group-questions-consistent-with-the-social-model-of-disability-65/

Mukuria, C., Rowen, D., Harnan, S., Rawdin, A., Wong, R., Ara, R., & Brazier, J. (2019). An updated systematic review of studies mapping (or cross-walking) measures of health-related quality of life to generic preference-based measures to generate utility values. Applied Health Economics and Health Policy, 17(3), 295-313.

Nichol, M. B., Sengupta, N., & Globe, D. R. (2001). Evaluating quality-adjusted life years: Estimation of the Health Utility Index (HUI2) from the SF-36. Medical Decision Making, 21(2), 105-112.

Orpana, H. M., Ross, N., Feeny, D., McFarland, B., Bernier, J., & Kaplan, M. (2009). The natural history of health-related quality of life: A 10-year cohort study. Health Reports, 20(1), 29-35.

Round, J., & Hawton, A. (2017). Statistical alchemy: Conceptual validity and mapping to generate health state utility values. PharmacoEconomics – Open, 1(4), 233-239.

Sengupta, N., Nichol, M. B., Wu, J., & Globe, D. (2004). Mapping the SF-12 to the HUI3 and VAS in a managed care population. Medical Care, 42(9), 927-937.

Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289-310.

Statistics Canada. (2015). Canadian Community Health Survey - Annual Component (CCHS): Detailed Information for 2015. Ottawa: Statistics Canada. Retrieved from: https://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&Id=238854

Statistics Canada. (2018). Canadian Community Health Survey (CCHS) Annual component: User guide 2017 Microdata file. Ottawa: Statistics Canada.

United Nations. (2015). Sustainable Development Goals. Retrieved from https://www.un.org/sustainabledevelopment/

Van Doorslaer, E., & Jones, A. M. (2003). Inequalities in self-reported health: Validation of a new approach to measurement. Journal of Health Economics, 22(1), 61-87.

Wailoo, A. J., Hernandez-Alava, M., Manca, A., Mejia, A., Ray, J., Crawford, B., Botteman, M., & Busschbach, J. (2017). Mapping to estimate health-state utility from non-preference-based outcome measures: An ISPOR Good Practices for Outcomes Research Task Force report. Value in Health, 20(1), 18-27.

Washington Group on Disability Statistics. (2009). Understanding and Interpreting Disability as Measured Using the WG Short Set of Questions. Hyattsville, MD: Washington Group on Disability Statistics. Retrieved from https://www.cdc.gov/nchs/data/washington_group/meeting8/interpreting_disability.pdf

Washington Group on Disability Statistics. (2020). An Introduction to the Washington Group on Disability Statistics Question Sets. Hyattsville, MD: Washington Group on Disability Statistics. Retrieved from https://www.washingtongroup-disability.com/resources/an-introduction-to-the-washington-group-on-disability-statistics-question-sets-459/

Washington Group on Disability Statistics, Budapest Initiative, & United Nations Economic and Social Commission for Asia and the Pacific. (2016). Development of Disability Measures for Surveys: The Extended Set on Functioning. Retrieved from: https://www.cdc.gov/nchs/data/washington_group/development_of_disability_measures_for_surveys_the_extended_set_on_functioning.pdf

Date modified: 2025-08-08
