4. Computing response propensities using latent trait models

Alina Matei and M. Giovanna Ranalli

The variable $θ_{k}$ can be computed using a latent trait model. In general, latent variable models are multivariate regression models that link continuous or categorical responses to unobserved covariates. A latent trait model is essentially a factor analysis model for binary data (see Bartholomew, Steele, Moustaki and Galbraith 2002; Skrondal and Rabe-Hesketh 2007).

We start by creating the matrix with elements $\{x_{k ℓ}\}_{k \in s; ℓ = 1, \dots, m}$. Figure 4.1 shows a schematic of the indicators $x_{k ℓ}$ for respondents and nonrespondents. Then, we assume that the factors that drive unit response are the same as those that drive item response on selected variables of interest. In other words, item nonresponse is assumed nonignorable.

Figure 4.1 Schematic representing variables $x_{k ℓ}$ for the sets $r$ and $\bar{r}$

Let $q_{k ℓ}$ be the probability of response of unit $k$ for item $ℓ,$ for all $ℓ = 1, \dots, m$ and $k \in r .$ As in the case of unit nonresponse, $q_{k ℓ}$ is modelled as a function of the variable of interest using logistic regression as follows

$q_{k ℓ} = P (x_{k ℓ} = 1 | y_{k ℓ}, θ_{k}, R_{k} = 1) = \frac{1}{1 + \exp (- (β_{ℓ 0} + β_{ℓ 1} θ_{k} + β_{ℓ 2} y_{k ℓ}))},$ (4.1)

for $ℓ = 1, \dots, m,$ and $k \in r,$ where $β_{ℓ 0}, β_{ℓ 1}$ and $β_{ℓ 2}$ are parameters. Since $y_{k ℓ}$ is known only for units with $x_{k ℓ} = 1, k \in r,$ Model (4.1) cannot be estimated. As in the case of unit nonresponse, we propose to estimate $q_{k ℓ}$ as a function of an auxiliary variable related to the variable of interest, that is $θ_{k} .$ Model (4.1) is rewritten

$q_{k ℓ} = P (x_{k ℓ} = 1 | θ_{k}, R_{k} = 1) = \frac{1}{1 + \exp (- (β_{ℓ 0} + β_{ℓ 1} θ_{k}))},$ (4.2)

for $ℓ = 1, \dots, m,$ and $k \in r$. Model (4.2) is not an ordinary logistic regression model, because the values $θ_{k}$ are unobservable: they are taken by a latent variable. Latent trait models can be used in this case to estimate $q_{k ℓ}$, $θ_{k}$ and the model parameters. Note that in the area of educational testing and psychological measurement, latent trait modelling is termed Item Response Theory.

The Rasch model (Rasch 1960) is a simple latent trait model, well known in the psychometric literature, that is used to analyze assessment data in order to measure variables such as abilities and attitudes. It takes the following form

$q_{k ℓ} = \frac{1}{1 + \exp (- (β_{ℓ 0} + β_{1} θ_{k}))}$ for $ℓ = 1, \dots, m$ and $k \in r$. (4.3)

The parameters $β_{ℓ 0}$ are estimated for each item $ℓ$ and reflect the extremeness (easiness) of item $ℓ$: larger values correspond to a larger probability of a positive response at all points in the latent space. The parameter $β_{1}$ is known as the ‘discrimination’ parameter and can be fixed to some arbitrary value without affecting the likelihood as long as the scale of the individuals’ propensities is allowed to be free. In many situations the assumption that item discriminations are constant across items is too restrictive. The two-parameter logistic (2PL) model generalizes the Rasch model by allowing the slopes to vary. Specifically, the 2PL model assumes the form given in Equation (4.2). The parameters $β_{ℓ 1}$ are now estimated for each item $ℓ$ and provide a measure of how much information an item provides about the latent variable $θ_{k}$. To achieve identifiability of Model (4.2), we can fix the value of one or more of the parameters $β_{ℓ 0}$ and $β_{ℓ 1}$ in the estimation process. Moran (1986) showed that in the 2PL model, all the parameters are identifiable under wide conditions, provided the number of items exceeds two and all the slopes are assumed to be strictly positive. A further generalization of Model (4.2) considered in the literature, the 3PL model, includes another parameter, the guessing parameter, to model the probability that a subject with a latent variable tending to $- \infty$ responds to an item. Such an extension does not seem necessary in the context at hand and will not be considered further.
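As an illustration of Expressions (4.2) and (4.3), the following minimal sketch evaluates the item response probabilities $q_{k ℓ}$ for given values of the latent variable and of the item parameters. The function name `response_prob` and the numerical values are hypothetical and chosen only for illustration; passing a scalar slope reproduces the Rasch form, while a vector of slopes gives the 2PL form.

```python
import numpy as np

def response_prob(theta, beta0, beta1):
    """Return the matrix q[k, l] = 1 / (1 + exp(-(beta0[l] + beta1[l] * theta[k]))).

    theta : latent values, one per unit (length n)
    beta0 : item intercepts (length m)
    beta1 : item slopes (length m for 2PL, or a scalar for the Rasch form)
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))[:, None]   # (n, 1)
    beta0 = np.atleast_1d(np.asarray(beta0, dtype=float))[None, :]   # (1, m)
    beta1 = np.atleast_1d(np.asarray(beta1, dtype=float))[None, :]   # (1, m) or (1, 1)
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * theta)))

# Illustrative (made-up) parameter values for m = 3 items and n = 2 units.
theta = [-1.0, 1.0]
beta0 = [0.5, -0.2, 1.0]
q_rasch = response_prob(theta, beta0, 1.0)              # common slope, as in (4.3)
q_2pl = response_prob(theta, beta0, [1.0, 1.5, 0.8])    # item-specific slopes, as in (4.2)
```

With positive slopes, each column of the returned matrix increases with `theta`, which is the monotonicity property discussed in Section 4.1.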

4.1 Assumptions in latent trait models

Latent trait models typically rely on the following assumptions. The first one is the so-called conditional independence assumption, which postulates that item responses are independent given the latent variable (i.e., the latent variable accounts for all association among the observed variables $x_{k ℓ}$). Consequently, given $θ_{k}$, the conditional probability of $x_{k}$ is

$P (x_{k} | θ_{k}) = \prod_{ℓ = 1}^{m} P (x_{k ℓ} | θ_{k}) .$

Following Bartholomew et al. (2002, page 181) ‘the assumption of conditional independence can only be tested indirectly by checking whether the model fits the data. A latent variable model is accepted as a good fit when the latent variables account for most of the association among the observed responses.’

A second assumption of Models (4.2) and (4.3) is that of monotonicity: as the latent variable $θ_{k}$ increases, the probability of response to an item increases or stays the same across intervals of $θ_{k}$. In other words, for two values of $θ_{k}$, say $a$ and $b$, with $a < b$, monotonicity implies that $P (x_{k ℓ} = 1 | θ_{k} = a) \leq P (x_{k ℓ} = 1 | θ_{k} = b)$ for $ℓ = 1, \dots, m$. Larger values of $θ_{k}$ are associated with a greater chance of a response to each item.

Finally, the third, and possibly strongest, assumption of Models (4.2) and (4.3) is that of unidimensionality, implying that a single latent variable fully explains the willingness of unit $k$ to answer the questionnaire. All these basic assumptions imply that the dependence between the items $x_{k ℓ}$ may be explained by the latent variable $θ_{k}$ which represents the units’ willingness and that the probability that a unit $k$ responds to a given variable increases with $θ_{k} .$

4.2 Estimation of the model

In what follows we focus on the two-parameter logistic (2PL) model given in (4.2). Let $β_{ℓ} = {(β_{ℓ 0}, β_{ℓ 1})}^{'}$ and $β = \{β_{ℓ}, ℓ = 1, \dots, m\}$. Model (4.2) can be fitted using maximum likelihood or Bayesian methods. We focus here on the former. Under the maximum likelihood approach, three major methods have been developed: joint, conditional and marginal maximum likelihood. Here, we concentrate on marginal maximum likelihood, which can be applied to fit the 2PL model. This method is also used in the simulation studies of Section 6. It consists of maximizing the likelihood of the model after the $θ_{k}$ are integrated out on the basis of a common distribution assumed for these parameters. In particular, it is assumed that $θ_{k}$ is a random variable following a distribution with density function $h (\cdot)$; typically $θ_{k} \sim N (0,1)$. It is also assumed that the response vectors $x_{k}$ are independent of one another and that the conditional independence assumption holds.

For a set of $n_{r}$ respondents having the response vectors $x_{k}, k = 1, \dots, n_{r},$ the marginal likelihood can be expressed as

$L (β; x_{1}, \dots, x_{n_{r}}) = \prod_{k = 1}^{n_{r}} f (x_{k} | β),$

where $f (x_{k} | β) = \int_{- \infty}^{\infty} g (x_{k} | θ_{k}, β) h (θ_{k}) d θ_{k},$

$g (x_{k} | θ_{k}, β) = \prod_{ℓ = 1}^{m} q_{k ℓ}^{x_{k ℓ}} {(1 - q_{k ℓ})}^{1 - x_{k ℓ}} = \prod_{ℓ = 1}^{m} \frac{\exp (x_{k ℓ} (β_{ℓ 0} + β_{ℓ 1} θ_{k}))}{1 + \exp (β_{ℓ 0} + β_{ℓ 1} θ_{k})},$

and $h$ now denotes the density of the $N (0, 1)$ distribution. The method consists in maximizing the corresponding log-likelihood, given by

$\log L (β; x_{1}, \dots, x_{n_{r}}) = \sum_{k = 1}^{n_{r}} \log (f (x_{k} | β)),$

with respect to $β$ using, for example, the EM algorithm. Estimates of $β_{ℓ 0}$ and $β_{ℓ 1}, ℓ = 1, \dots, m$ are thus provided. Afterwards, $θ_{k}$ is estimated using the empirical Bayes method by maximizing the posterior density

$h (θ_{k} | x_{k}) = \frac{g (x_{k} | θ_{k}, β) h (θ_{k})}{f (x_{k} | β)} \propto g (x_{k} | θ_{k}, β) h (θ_{k}),$

with respect to $θ_{k}$ and keeping item parameters and observations fixed. Estimates of $q_{k ℓ}$ are obtained using Expression (4.2), where $β_{ℓ 0}, β_{ℓ 1}$ and $θ_{k}$ are replaced with their estimates.
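To make the two steps above concrete, the following sketch evaluates the marginal log-likelihood for given item parameters and then computes the posterior mode of $θ_{k}$ for a given response pattern. It is only an illustration under stated assumptions: the integral over $θ_{k}$ is approximated by Gauss-Hermite quadrature and the posterior mode is found by bisection on the derivative of the log posterior, both of which are our choices for this example rather than the procedure used in the paper, and the maximization over $β$ (e.g., by the EM algorithm) is not shown. The function names and the numerical values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def log_item_probs(theta, beta0, beta1):
    """Return log q and log(1 - q) under (4.2), each of shape (len(theta), m)."""
    eta = beta0[None, :] + beta1[None, :] * theta[:, None]
    return -np.logaddexp(0.0, -eta), -np.logaddexp(0.0, eta)

def marginal_loglik(x, beta0, beta1, n_nodes=21):
    """Sum over units of log f(x_k | beta), with theta_k ~ N(0, 1) integrated out."""
    nodes, weights = hermegauss(n_nodes)          # weight function exp(-t^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)      # rescale so the weights sum to one
    log_q, log_1mq = log_item_probs(nodes, beta0, beta1)
    # log g(x_k | theta, beta) at every quadrature node: shape (n_units, n_nodes)
    log_g = x @ log_q.T + (1.0 - x) @ log_1mq.T
    f = np.exp(log_g) @ weights                   # approximate f(x_k | beta)
    return float(np.sum(np.log(f)))

def posterior_mode(x_k, beta0, beta1, tol=1e-10):
    """Maximize g(x_k | theta, beta) h(theta) over theta for fixed item parameters."""
    def score(theta):
        # Derivative of the log posterior; it is strictly decreasing in theta.
        q = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * theta)))
        return np.sum(beta1 * (x_k - q)) - theta
    bound = np.sum(np.abs(beta1)) + 1.0           # score(-bound) > 0 > score(bound)
    lo, hi = -bound, bound
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative (made-up) item parameters and response patterns for m = 3 items.
beta0 = np.array([0.5, -0.2, 1.0])
beta1 = np.array([1.0, 1.5, 0.8])
x = np.array([[1, 1, 1], [1, 0, 1], [0, 0, 0]], dtype=float)
loglik = marginal_loglik(x, beta0, beta1)
theta_hat = np.array([posterior_mode(row, beta0, beta1) for row in x])
```

Estimates of $q_{k ℓ}$ then follow by inserting `theta_hat` and the item parameters into Expression (4.2).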

4.3 Goodness-of-fit measures of the model

Different goodness-of-fit measures have been proposed in the literature to test whether the model given in (4.2) adequately fits the data (see e.g., Bartholomew et al. 2002). One approach uses the two-way and three-way margins of the response items. Discrepancies between the expected $(E)$ and observed $(O)$ counts in these tables are measured using the statistic $R = {(O - E)}^{2} / E$. Large values of $R$ for the second-order or third-order margins identify sets of items for which the model does not fit well. Note that the residuals ${(O - E)}^{2} / E$ are not independent, so they cannot be summed to give an overall test statistic distributed as a chi-squared (see Bartholomew et al. 2002, page 186).

Item fit indexes (Bond and Fox 2007) can be used to this end as well. On the basis of estimated latent variables and item parameters, the expected response of a unit to an item can be computed. The similarity between the observed and expected responses to any item can be assessed through two fit mean-square statistics: the outlier-sensitive fit statistic (item outfit) and the information-weighted fit statistic (item infit). The item outfit is relatively more affected by unexpected responses far from a person’s measure, i.e., it is more sensitive to unexpected observations by units on items that are relatively very easy or very hard for them to answer. The item infit weights each observation by its information and is, by contrast, relatively more affected by unexpected responses closer to a person’s measure, i.e., it is more sensitive to unexpected patterns of observations by units on items that are roughly targeted on them according to their latent variable value. The expected value of both statistics is one. Infit and outfit values greater (less) than one indicate more (less) variation between the observed and the predicted response patterns; a range of 0.5 to 1.5 is generally considered acceptable (Bond and Fox 2007).
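The item infit and outfit statistics can be sketched as follows, assuming the usual mean-square definitions: the outfit of item $ℓ$ is the average over units of the squared standardized residuals ${(x_{k ℓ} - {\hat{q}}_{k ℓ})}^{2} / ({\hat{q}}_{k ℓ} (1 - {\hat{q}}_{k ℓ}))$, and the infit is the ratio of the summed squared residuals to the summed variances. The function name `item_fit` is hypothetical, and the inputs are the matrix of indicators and the matrix of fitted probabilities.

```python
import numpy as np

def item_fit(x, q):
    """Return (infit, outfit) mean-square statistics, one value per item.

    x : (n, m) array of 0/1 item response indicators
    q : (n, m) array of fitted response probabilities, strictly between 0 and 1
    """
    x = np.asarray(x, dtype=float)
    q = np.asarray(q, dtype=float)
    sq_resid = (x - q) ** 2          # squared raw residuals
    var = q * (1.0 - q)              # Bernoulli variance of each response
    outfit = np.mean(sq_resid / var, axis=0)            # unweighted mean square
    infit = sq_resid.sum(axis=0) / var.sum(axis=0)      # information-weighted mean square
    return infit, outfit

# Toy check: when every fitted probability is 0.5, both statistics equal one.
x_toy = np.array([[1, 0], [0, 0], [1, 1], [0, 1]])
infit, outfit = item_fit(x_toy, np.full((4, 2), 0.5))
```

Because the outfit divides each squared residual by its own variance, a single unexpected response with a fitted probability close to zero or one can dominate it, which is the outlier sensitivity described above.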

In addition, point-measure correlations (Olsson, Drasgow and Dorans 1982) can be used to estimate the correlation between the latent variable and the single item response. Items for which such measures take negative or zero values should be removed from the analysis or may be evidence that the latent construct is not unidimensional. Unidimensionality can be tested by running a Principal Components Analysis (PCA) of the standardized residuals for the items (Wright 1996). In this way the first component (dimension) has already been removed, and it is possible to look at secondary dimensions, components or contrasts. Unidimensionality is supported by observing that the eigenvalue of the first PCA component in the correlation matrix of the residuals is small (usually less than 2.0). If not, the loadings on the first contrast indicate that there are contrasting patterns in the residuals.
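A minimal sketch of this residual-based check is given below, assuming that the standardized residuals are defined as $(x_{k ℓ} - {\hat{q}}_{k ℓ}) / \sqrt{{\hat{q}}_{k ℓ} (1 - {\hat{q}}_{k ℓ})}$ and that the PCA is carried out on their correlation matrix. The function name `residual_eigenvalues` and the simulated inputs are hypothetical and serve only to show the computation.

```python
import numpy as np

def residual_eigenvalues(x, q):
    """Eigenvalues (largest first) of the correlation matrix of standardized residuals.

    x : (n, m) array of 0/1 item response indicators
    q : (n, m) array of fitted response probabilities, strictly between 0 and 1
    """
    x = np.asarray(x, dtype=float)
    q = np.asarray(q, dtype=float)
    z = (x - q) / np.sqrt(q * (1.0 - q))     # standardized residuals
    corr = np.corrcoef(z, rowvar=False)      # (m, m) correlation matrix across items
    eigvals = np.linalg.eigvalsh(corr)       # returned in ascending order
    return eigvals[::-1]

# Simulated example: responses generated independently given the fitted probabilities.
rng = np.random.default_rng(0)
q_sim = rng.uniform(0.2, 0.8, size=(500, 6))
x_sim = (rng.uniform(size=q_sim.shape) < q_sim).astype(float)
eigvals = residual_eigenvalues(x_sim, q_sim)
first_contrast = eigvals[0]   # to be compared with the rule of thumb quoted above
```

The eigenvalues sum to the number of items, so the first one can be read as the number of items' worth of residual variance carried by the first contrast.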

Finally, when items are used to form a scale, they need to have internal consistency. Cronbach’s alpha can be used to assess whether the items are reliable in this sense: if they all measure the same thing, then they should be correlated with one another.
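For reference, a short sketch of Cronbach’s alpha is given below, using the standard formula $α = \frac{m}{m - 1} (1 - \sum_{ℓ = 1}^{m} s_{ℓ}^{2} / s_{T}^{2})$, where $s_{ℓ}^{2}$ is the sample variance of item $ℓ$ and $s_{T}^{2}$ is the sample variance of the raw score. The function name `cronbach_alpha` and the toy data are hypothetical.

```python
import numpy as np

def cronbach_alpha(x):
    """Cronbach's alpha for an (n, m) matrix of item indicators, with m >= 2."""
    x = np.asarray(x, dtype=float)
    m = x.shape[1]
    item_var = x.var(axis=0, ddof=1)            # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)       # sample variance of the raw score
    return m / (m - 1.0) * (1.0 - item_var.sum() / total_var)

# Toy data: three items that always agree, so alpha equals one.
x_toy = np.array([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]])
alpha = cronbach_alpha(x_toy)
```

The statistic is undefined when the raw score has zero variance, a case this sketch does not handle.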

4.4 Estimation of $p_{k}$

Two solutions are shown here to estimate $p_{k}$ using information from the latent trait model. The first solution is a two-stage approach that uses logistic regression to estimate $p_{k}$ for all $k \in s$.

Stage 1: First, an estimate ${\hat{θ}}_{k}$ of $θ_{k}$ is provided. To compute a value ${\hat{θ}}_{k}$ for $k \in \bar{r}$, we assume again that unit nonresponse is just an extreme form of item nonresponse. Thus, a nonrespondent does not answer any item $ℓ$, so that $x_{k ℓ} = 0$ for all $ℓ = 1, \dots, m$. The computation of ${\hat{θ}}_{k}$ for $k \in \bar{r}$ is handled as follows: we add to the set $r$ a phantom respondent unit $\tilde{k}$ having $x_{\tilde{k} ℓ}$ equal to 0, for all $ℓ = 1, \dots, m$. We denote this new set by $\tilde{r} = r \cup \{\tilde{k}\}$. We estimate the parameters of Model (4.2) using all units $k \in \tilde{r}$, and compute the values ${\hat{θ}}_{k}$, $k \in \tilde{r}$. In particular, unit $\tilde{k}$ has an estimated value ${\hat{θ}}_{\tilde{k}}$. We assign to all units $k \in \bar{r}$ an estimate ${\hat{θ}}_{k}$ equal to ${\hat{θ}}_{\tilde{k}}$, so that the same value of ${\hat{θ}}_{k}$ is provided for all $k \in \bar{r}$. Using this method, each unit $k \in s$ has an associated estimate ${\hat{θ}}_{k}$. This is the key feature for the estimation of the response probabilities $p_{k}$ provided in the next stage.

Stage 2: We use the estimate ${\hat{θ}}_{k},$ for $k \in s,$ provided in the first stage as a covariate in Model (3.4) instead of the unknown value of $θ_{k};$ in particular

$p_{k} = P (R_{k} = 1 | {\hat{θ}}_{k}) = \frac{1}{1 + \exp (- (α_{0} + α_{1} {\hat{θ}}_{k}))},$ for all $k \in s$. (4.4)

Model (4.4) provides estimates ${\hat{p}}_{k}$ of $p_{k},$ for all $k \in s .$
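The bookkeeping of Stage 1 can be sketched as follows. The sketch only shows how the phantom unit is appended to the respondents' indicator matrix and how its estimate is then assigned to all nonrespondents; the fit of Model (4.2) on $\tilde{r}$ is assumed to be carried out separately, and the function names and numerical values are hypothetical.

```python
import numpy as np

def add_phantom_unit(x_r):
    """Append a row of zeros (the phantom respondent) to the respondents' (n_r, m) matrix."""
    x_r = np.asarray(x_r)
    return np.vstack([x_r, np.zeros((1, x_r.shape[1]), dtype=x_r.dtype)])

def assign_theta_to_sample(R, theta_hat_r_tilde):
    """Return theta-hat for every k in s.

    R                 : 0/1 unit response indicators R_k for the n units in s
    theta_hat_r_tilde : estimates for the n_r respondents (in sample order),
                        followed by the estimate for the phantom unit
    """
    R = np.asarray(R).astype(bool)
    est = np.asarray(theta_hat_r_tilde, dtype=float)
    if est.size != R.sum() + 1:
        raise ValueError("expected one estimate per respondent plus the phantom unit")
    theta_s = np.empty(R.size)
    theta_s[R] = est[:-1]     # respondents keep their own estimates
    theta_s[~R] = est[-1]     # every nonrespondent receives the phantom unit's estimate
    return theta_s

# Toy example: 5 sampled units, 3 respondents, m = 2 items, made-up estimates.
x_tilde = add_phantom_unit(np.array([[1, 1], [1, 0], [0, 1]]))
theta_s = assign_theta_to_sample([1, 0, 1, 1, 0], [0.9, 0.1, -0.2, -1.3])
```

The resulting vector `theta_s` plays the role of the covariate ${\hat{θ}}_{k}$, $k \in s$, in Model (4.4).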

The second solution was suggested by one of the referees. Let $S_{k} = \sum_{ℓ = 1}^{m} x_{k ℓ}$ be the raw score for unit $k$, i.e., the number of items unit $k$ has responded to: if $k \in \bar{r}$, then $S_{k} = 0$; if $k \in r$, then $S_{k} > 0$. Then $p_{k}$ can be estimated by modelling $P (S_{k} > 0 | θ_{k})$. By the conditional independence assumption we have

$\begin{array}{lcl} p_{k} & = & P (S_{k} > 0 | θ_{k}) = 1 - P (S_{k} = 0 | θ_{k}) = 1 - P (\cap_{ℓ = 1}^{m} \{x_{k ℓ} = 0\} | θ_{k}) \\ & = & 1 - \prod_{ℓ = 1}^{m} (1 - P (x_{k ℓ} = 1 | θ_{k})) . \end{array}$

We have $P (x_{k ℓ} = 1 | θ_{k}) = P (R_{k} = 1 | θ_{k}) P (x_{k ℓ} = 1 | θ_{k}, R_{k} = 1) + P (R_{k} = 0 | θ_{k}) P (x_{k ℓ} = 1 | θ_{k}, R_{k} = 0) = p_{k} q_{k ℓ}$, because $P (x_{k ℓ} = 1 | θ_{k}, R_{k} = 0) = 0$. As a result, we obtain

$p_{k} = 1 - \prod_{ℓ = 1}^{m} (1 - p_{k} q_{k ℓ}), k \in r .$

The estimated response probability ${\hat{p}}_{k}, k \in r$ is obtained as a solution to the polynomial equation

${\hat{p}}_{k} = 1 - \prod_{ℓ = 1}^{m} (1 - {\hat{p}}_{k} {\hat{q}}_{k ℓ}) .$

This solution, although very elegant, has two drawbacks. If $m$ is large, the above polynomial equation is difficult or even impossible to solve. Even when the polynomial equation can be solved for moderate $m$, the real solutions are not necessarily in (0, 1). This solution is not considered further here.
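The behaviour of this equation can be explored numerically with the sketch below, which is an illustration and not part of the method retained in the paper. It relies on two observations that follow directly from the equation: ${\hat{p}}_{k} = 0$ always satisfies it, and the function $g (p) = 1 - \prod_{ℓ = 1}^{m} (1 - p {\hat{q}}_{k ℓ}) - p$ is concave on $[0, 1]$ with slope $\sum_{ℓ = 1}^{m} {\hat{q}}_{k ℓ} - 1$ at zero, so a nonzero root in $(0, 1]$ exists only when $\sum_{ℓ = 1}^{m} {\hat{q}}_{k ℓ} > 1$. The function name `solve_response_prob` and the use of bisection are our assumptions.

```python
import numpy as np

def solve_response_prob(q_hat, tol=1e-12):
    """Nonzero root in (0, 1] of p = 1 - prod(1 - p * q_hat), or None if there is none."""
    q = np.asarray(q_hat, dtype=float)

    def g(p):
        return 1.0 - np.prod(1.0 - p * q) - p

    # g(0) = 0 and g is concave on [0, 1]; a positive root requires g'(0) = sum(q) - 1 > 0.
    if q.sum() <= 1.0:
        return None
    if g(1.0) >= 0.0:            # happens only when some q_hat equals one
        return 1.0
    lo, hi = 0.5, 1.0
    while g(lo) <= 0.0:          # move lo towards zero until g(lo) > 0
        lo /= 2.0
        if lo < tol:
            return None          # root numerically indistinguishable from zero
    while hi - lo > tol:         # bisection with g(lo) > 0 > g(hi)
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For m = 2 the equation reduces to p = (q1 + q2 - 1) / (q1 * q2).
p_two_items = solve_response_prob([0.8, 0.8])
p_none = solve_response_prob([0.3, 0.3])    # sum of q_hat below one: only p = 0 remains
```

In the second call only the trivial solution remains in $[0, 1]$, which illustrates the second drawback noted above.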

Date modified: 2015-11-27

Survey Methodology