An alternative way of estimating a cumulative logistic model with complex survey data
Section 2. A simple example
The National Survey on Drug Use and Health (NSDUH) is an annual survey of the civilian, noninstitutionalized population aged 12 or older living in the United States. Using NSDUH data from 2006 to 2010, we focus on a survey question given to adolescents (12-17) who received depression treatment in the past year: “During the past 12 months, how much has treatment or counseling helped you?” The viable responses were: Not at all (1); A little (2); Some (3); A lot (4); or Extremely (5). We discarded missing and invalid responses both to this question and to the question of whether the respondent received depression treatment in the past year. We will return to this practice in the discussion section.
Using SAS, we estimated the following simple cumulative logistic model:

$$\Pr(y_k \le c) = \frac{\exp(\alpha_c + \beta x_k)}{1 + \exp(\alpha_c + \beta x_k)} \quad \text{for } c = 1, \ldots, 4, \qquad (2.1)$$

where $x_k = 1$ when respondent $k$ was taking medication for depression (0 otherwise), with both pseudo-maximum-likelihood and the design-sensitive technique. For pseudo-maximum-likelihood estimation, we reversed the order of the responses, with $y_k \le 1$ when $k$ responded that treatment (or counseling) helped extremely, $y_k \le 2$ when $k$ responded that treatment helped extremely or a lot, $y_k \le 3$ when $k$ responded that treatment helped more than a little, and $y_k \le 4$ when $k$ responded that treatment helped at least a little. Finally, $y_k = 5$ when $k$ responded that treatment did not help at all. In SAS, this meant the dependent variable $y_k$ was set equal to 1 when treatment helped extremely, to 2 when treatment helped a lot, and so on, down to 5 when treatment didn’t help at all.
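One way to read the model is on the log-odds scale. The restatement below is a sketch with assumed symbols: $\alpha_c$ for the four level-specific intercepts, $\beta$ for the medication coefficient, $x_k$ for the medication indicator, and $y_k$ for the reordered response of respondent $k$.

```latex
\[
\log\frac{\Pr(y_k \le c)}{1 - \Pr(y_k \le c)} = \alpha_c + \beta x_k ,
\qquad c = 1, \ldots, 4,
\]
\[
\frac{\Pr(y_k \le c \mid x_k = 1) \,/\, \Pr(y_k > c \mid x_k = 1)}
     {\Pr(y_k \le c \mid x_k = 0) \,/\, \Pr(y_k > c \mid x_k = 0)}
= \exp(\beta) \quad \text{for every } c .
\]
```

Because $\beta$ carries no subscript $c$, the four log-odds lines are parallel in $x_k$. This is the parallel-lines assumption examined later in this section.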
For the design-sensitive technique, we created four observations from each respondent $k$ in a new data set. In the observation labeled $C = c$ in SAS, where C is a class (categorical) variable added to the model statement, we created a dependent variable (D) equal to 1 when the respondent's reordered response was at or below level $c$ (0 otherwise), which is the event modeled in equation (2.1). We needed to add EVENT = “1” after D in the model statement because we were modeling when D = 1. SAS code for both estimation techniques is in the appendix. The NSDUH data set we used had 60 variance strata with two variance primary sampling units (PSUs) in each and analysis weights based on the probabilities of selection and unit response.
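The restructuring of the data for the design-sensitive technique can be sketched in a few lines. The sketch uses Python rather than the SAS code in the appendix; the function name, the field names, and the three example respondents are hypothetical. It also assumes that each new observation simply inherits the respondent's weight and medication indicator.

```python
def expand_respondent(resp_id, y, meds, weight, n_levels=5):
    """Create one record per cutpoint c = 1, ..., n_levels - 1.

    y is the reordered response (1 = helped extremely, ..., 5 = not at all).
    D is 1 when y <= c and 0 otherwise; C labels the cutpoint.
    """
    return [
        {"id": resp_id, "C": c, "D": int(y <= c), "meds": meds, "weight": weight}
        for c in range(1, n_levels)
    ]


# Hypothetical respondents: (id, reordered response, meds indicator, weight).
respondents = [(1, 1, 1, 250.0), (2, 3, 0, 410.5), (3, 5, 1, 120.0)]

expanded = [row for r in respondents for row in expand_respondent(*r)]

for row in expanded:
    print(row)
```

A respondent who answered “Some” (reordered value 3) contributes D = 0 for C = 1 and C = 2, and D = 1 for C = 3 and C = 4.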
The parameter estimates from our pseudo-maximum-likelihood and design-sensitive SAS runs are displayed in Tables 2.1 and 2.2, respectively. In Table 2.1, Intercept $c$ is the pseudo-maximum-likelihood estimate of the level-$c$ intercept $\alpha_c$ in equation (2.1). The sum of the Intercept and C $c$ in Table 2.2 is the design-sensitive estimate for $\alpha_c$ when $c = 1$, 2, or 3, while the design-sensitive estimate for $\alpha_4$ is the Intercept in Table 2.2 minus the sum C 1 + C 2 + C 3. Finally (and more simply), meds in both tables estimates the medication coefficient $\beta$.

In all cases, estimates of the same parameter from the two tables are close. The percent increase in the odds of every level of satisfaction with treatment due to having taken drugs for depression (the estimate for $\beta$) is roughly 45% (in our discussion of the results of the logistic regressions, we treat differences of the log odds as equal to percent differences in the odds, even though this is only approximately true). That near equality suggests that the parallel-lines assumption is not violated by our NSDUH data.
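The size of the approximation between log-odds differences and percent differences in the odds can be checked directly. The short Python sketch below (variable names are illustrative) converts the two meds estimates, 0.4516 from Table 2.1 and 0.4498 from Table 2.2, into exact percent changes in the odds using exp(estimate) - 1.

```python
import math

# meds estimates copied from Tables 2.1 and 2.2.
meds_estimates = {
    "pseudo-maximum-likelihood": 0.4516,
    "design-sensitive": 0.4498,
}

exact_pct = {}
for label, log_odds_diff in meds_estimates.items():
    approx_pct = 100 * log_odds_diff                      # log-odds difference read as a percent
    exact_pct[label] = 100 * (math.exp(log_odds_diff) - 1)  # exact percent change in the odds
    print(f"{label}: approximate {approx_pct:.1f}%, exact {exact_pct[label]:.1f}%")
```

Both exact figures come out at roughly 57%, somewhat above the 45% obtained by reading the log-odds difference as a percent, which is the sense in which the equality is only approximate.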
Table 2.1
Pseudo-maximum-likelihood estimates for the simple cumulative logistic model
| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
| --- | --- | --- | --- | --- |
| Intercept 1 | -2.2917 | 0.0913 | -25.10 | < 0.0001 |
| Intercept 2 | -0.7617 | 0.0685 | -11.11 | < 0.0001 |
| Intercept 3 | 0.2511 | 0.0624 | 4.02 | 0.0002 |
| Intercept 4 | 1.3695 | 0.0739 | 18.53 | < 0.0001 |
| meds | 0.4516 | 0.0965 | 4.68 | < 0.0001 |
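To make the estimates in Table 2.1 concrete, the following Python sketch plugs them into the cumulative logistic form to obtain fitted probabilities of responding at level $c$ or better, with and without medication. The names are illustrative, and the response is assumed to be coded in the reversed order (1 = helped extremely, 5 = not at all).

```python
import math

# Estimates copied from Table 2.1.
intercepts = {1: -2.2917, 2: -0.7617, 3: 0.2511, 4: 1.3695}
meds_coef = 0.4516


def cumulative_prob(c, meds):
    """Fitted Pr(y <= c) for a respondent with the given meds indicator (0 or 1)."""
    eta = intercepts[c] + meds_coef * meds
    return 1.0 / (1.0 + math.exp(-eta))


fitted = {
    meds: [cumulative_prob(c, meds) for c in sorted(intercepts)]
    for meds in (0, 1)
}

for meds, probs in fitted.items():
    print(f"meds = {meds}:", [round(p, 3) for p in probs])
```

The fitted probabilities rise with $c$ within each group, and each one is higher for respondents taking medication, which is what a positive meds coefficient implies.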
Table 2.2
Design-sensitive estimates for the simple cumulative logistic model
| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
| --- | --- | --- | --- | --- |
| Intercept | -0.3591 | 0.0583 | -6.16 | < 0.0001 |
| C 1 | -1.9329 | 0.0592 | -32.63 | < 0.0001 |
| C 2 | -0.4039 | 0.0356 | -11.33 | < 0.0001 |
| C 3 | 0.6087 | 0.0392 | 15.52 | < 0.0001 |
| meds | 0.4498 | 0.0955 | 4.71 | < 0.0001 |
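The mapping between the two tables described in the text can be verified with a few lines of arithmetic. The Python sketch below (variable names are illustrative) rebuilds the four design-sensitive intercepts from Table 2.2 and sets them beside the Table 2.1 values.

```python
# Estimates copied from Table 2.2 (design-sensitive) and Table 2.1 (pseudo-maximum-likelihood).
ds_intercept = -0.3591
ds_c = {1: -1.9329, 2: -0.4039, 3: 0.6087}
pml_intercepts = {1: -2.2917, 2: -0.7617, 3: 0.2511, 4: 1.3695}

# Levels 1 to 3: Intercept + C c.  Level 4: Intercept minus (C 1 + C 2 + C 3).
ds_alpha = {c: ds_intercept + effect for c, effect in ds_c.items()}
ds_alpha[4] = ds_intercept - sum(ds_c.values())

gaps = {c: ds_alpha[c] - pml_intercepts[c] for c in pml_intercepts}
for c in sorted(gaps):
    print(
        f"level {c}: design-sensitive {ds_alpha[c]:.4f}, "
        f"pseudo-ML {pml_intercepts[c]:.4f}, gap {gaps[c]:+.4f}"
    )
```

Every gap is below 0.002 in absolute value, which matches the statement that estimates of the same parameter from the two tables are close.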
The parallel-lines assumption can be tested
directly by adding a class variable M to the design-sensitive data set with
When added to the model statement in SAS,
the class variable M captures the differing impacts of taking medication for
depression in the previous year on the levels of satisfaction with treatment.
For example, the estimated percent increase in the odds of being extremely pleased by treatment due to having taken drugs for depression during the year is, according to Table 2.3, 0.3816 (from Meds) plus 0.0717 (from M = 1), or 45.33%. The other percent increases are lower, but none are significantly different from the others. We see that from the extremely low F value for M in Table 2.4. In addition, none of the $p$-values for an M in Table 2.3 is significant even at the 0.5 level (10 times larger than the standard 0.05 level).
Table 2.3
Estimating the general cumulative logistic model
| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
| --- | --- | --- | --- | --- |
| Intercept | -0.2919 | 0.1270 | -2.30 | 0.0251 |
| C 1 | -1.9636 | 0.0806 | -24.37 | < 0.0001 |
| C 2 | -0.4104 | 0.0440 | -9.33 | < 0.0001 |
| C 3 | 0.6202 | 0.0490 | 12.66 | < 0.0001 |
| Meds | 0.3816 | 0.1452 | 2.63 | 0.0109 |
| M 1 | 0.0717 | 0.1273 | 0.56 | 0.5754 |
| M 2 | 0.0234 | 0.0652 | 0.36 | 0.7215 |
| M 3 | -0.0236 | 0.0719 | -0.33 | 0.7439 |
Table 2.4
F tests for the general cumulative logistic model
| Effect | F Value | Num DF | Den DF | Pr > F |
| --- | --- | --- | --- | --- |
| C | 280.39 | 3 | 58 | < 0.0001 |
| Meds | 6.91 | 1 | 60 | 0.0109 |
| M | 0.16 | 3 | 58 | 0.9239 |
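The level-specific medication effects implied by Table 2.3 can be tabulated in the same way as the worked example for M = 1. In the Python sketch below, the values for levels 1 to 3 come straight from the table. The value for level 4 rests on an assumption that is not stated in this section: that M is effect-coded like C, so its fourth term is minus the sum of the other three. Variable names are illustrative.

```python
# Estimates copied from Table 2.3.
meds_main = 0.3816
m_effects = {1: 0.0717, 2: 0.0234, 3: -0.0236}

# Assumption: M is effect-coded like C, so the level-4 term is minus the sum of the others.
m_effects_assumed = dict(m_effects)
m_effects_assumed[4] = -sum(m_effects.values())

# Log-odds difference due to medication at each level: Meds + M c.
level_effects = {c: meds_main + m for c, m in m_effects_assumed.items()}

for c in sorted(level_effects):
    print(f"level {c}: log-odds difference {level_effects[c]:.4f}")
```

Under that assumption the level-1 effect (0.4533) is the largest of the four and the effects average to the Meds estimate, in line with the remark that the other percent increases are lower.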
ISSN : 1492-0921
Editorial policy
Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© Her Majesty the Queen in Right of Canada as represented by the Minister of Industry, 2019
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa