Appendix I: Multivariate logistic regression

For the binary logistic regression, goodness of fit is measured by a transformation of the maximized likelihood, that is, the value of the likelihood function at the maximum likelihood estimate (MLE), such that:

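(A1) Goodness of fit = $-2 \log(L)$, where the likelihood is $L = \prod_{i:\, y_i = 1} p_i \prod_{i:\, y_i = 0} (1 - p_i)$.

Here $p_i$ is the fitted probability of the event for observation $i$, and $y_i$ equals 1 if the event occurred for that observation and 0 otherwise, so the first product runs over the observations with the event and the second over those without it.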

The reduction in -2log(L) obtained by adding variables to a model is approximately distributed as a χ² with degrees of freedom equal to the number of variables added. Since the likelihood is a product of probabilities and thus cannot exceed 1, its log cannot be positive. As the fit improves, the likelihood increases toward 1, its log increases toward 0 (while remaining negative) and -2log(L) becomes smaller. An additional variable improves the model fit if it reduces -2log(L) (Table I).

The initial 'control' model included social and demographic characteristics (x₁, x₂, and x₄ to x₆), such as age, income and location, found to be of importance to Internet use. The 'full' model includes online behaviours (depth, breadth and experience: x₃, x₇ and x₈) as well as credit card concern (x₉). The 'final' model treats experience as a categorical variable; all variables remain significant while the overall fit of the model is improved (Table I).
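The relationship between fitted probabilities and -2log(L) can be illustrated with a minimal Python sketch. The outcomes and probabilities below are hypothetical values chosen for illustration; they are not taken from the survey data or from Table I, and the function name is an assumption of this example.

```python
import math

def neg2_log_likelihood(outcomes, probs):
    """Return -2 log(L) for binary outcomes (1 = event, 0 = no event)
    and the fitted probabilities of the event."""
    log_l = 0.0
    for y, p in zip(outcomes, probs):
        # An event contributes log(p); a non-event contributes log(1 - p).
        log_l += math.log(p) if y == 1 else math.log(1.0 - p)
    return -2.0 * log_l

# Hypothetical data: five observations, three of which are events.
outcomes = [1, 0, 1, 1, 0]

# Constant-only (null) model: every observation gets the overall event rate.
null_probs = [0.6] * 5

# Hypothetical model with predictors: probabilities closer to the outcomes.
better_probs = [0.8, 0.3, 0.7, 0.9, 0.2]

dev_null = neg2_log_likelihood(outcomes, null_probs)
dev_better = neg2_log_likelihood(outcomes, better_probs)

# The reduction in -2 log(L) is the quantity compared against a chi-square
# distribution with degrees of freedom equal to the number of added variables.
improvement = dev_null - dev_better
print(dev_null, dev_better, improvement)
```

In this sketch the model whose probabilities track the outcomes more closely has the smaller -2log(L), which is the sense in which a lower value indicates a better fit.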

Text table I

Model for propensity to purchase online

The Nagelkerke statistic is a pseudo R² that attempts to provide a logistic analogue to the R² in ordinary least squares (OLS) regression. Although the Nagelkerke statistic varies from 0 to 1, as does R² in OLS, it does not indicate the proportion of variance explained by the predictors (UCLA, 2004). Rather, it indicates the proportion of unaccounted variance that is reduced by adding variables to the model, compared to the null model (i.e. the model with just the constant). The Nagelkerke value increased from 0.102 in the control model to 0.300 in the final model.
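As a rough illustration, the Nagelkerke statistic can be computed from the log-likelihoods of the null and fitted models. The sketch below uses the commonly cited definition (the Cox and Snell pseudo R² divided by its maximum attainable value); the source does not state the formula used, so this is an assumption, and the input values are hypothetical rather than taken from Table I.

```python
import math

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo R-squared from two log-likelihoods.

    ll_null  : log-likelihood of the constant-only model
    ll_model : log-likelihood of the model with predictors
    n        : number of observations
    """
    # Cox and Snell pseudo R-squared: 1 - (L_null / L_model)^(2/n).
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Its upper bound, reached when the fitted model has likelihood 1.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    # Rescale so that the statistic ranges from 0 to 1.
    return cox_snell / max_cox_snell

# Hypothetical log-likelihoods for a sample of five observations.
r2 = nagelkerke_r2(ll_null=-3.365, ll_model=-1.265, n=5)
print(r2)
```

A model no better than the null model gives 0, and a model that predicts every outcome perfectly (log-likelihood of 0) gives 1.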

For continuous variables, the interpretation of slope coefficients is similar to that of OLS regression. For the discrete predictor variables, the regression coefficient (B) is equal to the log odds ratio of the event for the use or non-use of the Internet by an individual. Odds are defined as p/q, or p/(1 - p), where p is the probability of the event and q = 1 - p. A log odds ratio is defined as:

(A2) $\ln\left[ \dfrac{p_1/(1 - p_1)}{p_0/(1 - p_0)} \right]$.
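A short worked example may help in reading (A2). The probabilities below are hypothetical and the function names are assumptions of this sketch; p1 and p0 stand for the probability of the event in the two groups being compared.

```python
import math

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

def log_odds_ratio(p1, p0):
    """Log odds ratio as in (A2): ln of odds(p1) divided by odds(p0)."""
    return math.log(odds(p1) / odds(p0))

# Hypothetical probabilities of the event in two groups.
p1, p0 = 0.5, 0.2

b = log_odds_ratio(p1, p0)   # odds are 1.0 and 0.25, so b = ln(4)
odds_ratio = math.exp(b)     # exponentiating B recovers the odds ratio
print(b, odds_ratio)
```

Here the odds of the event are four times higher in the first group, so the corresponding coefficient would be ln(4), roughly 1.39. A coefficient of 0 corresponds to equal odds in the two groups.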