Assessment of the quality of the childhood physical abuse measure in the National Population Health Survey

by Margot Shields, Wendy Hovdestad and Lil Tonmyr

Numerous studies have documented associations between childhood physical abuse and subsequent mental and physical disorders.Note 1-3 The measure of abuse in these analyses is typically based on retrospective reports from adults. In order to best interpret studies that link childhood physical abuse to health problems in later life, it is important to understand the validity and reliability of such measures.

Validity, the degree to which retrospective responses reflect events that meet objective criteria for physical abuse, is particularly difficult to assess. Poor memory of childhood maltreatment has been documentedNote 4,Note 5 and may result in inaccurate reporting. Because the topic is sensitive, respondents may not report that they experienced abuse. As well, items that are behaviourally specific have higher validity than those that are self-defined or interpretative.Note 6-9 A review of the literature on the validity of retrospective reporting of adverse childhood experiences indicates that the rate of false negatives is substantial, and that false positives are rare.Note 7 Research is needed to understand biases in reporting.

Reliability (or consistency) of reports of abuse over time is more easily assessed. As expected, studies requiring shorter-term recall have higher reliability than those with a longer follow-up period.Note 10-15 As with validity, reliability is higher for questions that are behaviourally specific.Note 7

Concerns about potential participant upset and consequent item non-response have been noted.Note 16 If item non-response is an actual problem and if non-respondents and respondents differ in their experiences of abuse, survey results could potentially be biased.

A retrospective question about childhood physical abuse was included in the National Population Health Survey (NPHS), and was asked in cycles 1 (1994/1995), 7 (2006/2007), and 8 (2008/2009). The purposes of this study are to:

  1. report item non-response to the NPHS childhood physical abuse question;
  2. assess the reliability of the physical abuse item over three periods: 1994/1995 to 2006/2007, 1994/1995 to 2008/2009, and 2006/2007 to 2008/2009;
  3. investigate the extent to which demographic factors are related to the reliability of reports of childhood physical abuse;
  4. compare the reliability of the childhood physical abuse question with other childhood stressor questions in the NPHS; and
  5. examine associations between response patterns to the childhood physical abuse item and depression, fair or poor self-perceived health, disability, migraine and heart disease; that is, to answer the question, “Were consistent affirmers and inconsistent responders to the childhood physical abuse item more likely than consistent deniers to report these conditions?”


Data source

The analysis is based on longitudinal data from nine cycles (1994/1995 through 2010/2011) of the NPHS. The target population of the NPHS was household residents in the 10 provinces in 1994/1995, excluding residents of Indian Reserves, institutions, Canadian Forces bases, and some remote areas.

In 1994/1995, 20,095 households were selected for the longitudinal panel. One person in each household was selected at random; of these, 86% (17,276) completed the General component of the questionnaire in 1994/1995. Every two years since then, attempts were made to re-interview these individuals. Data collection ended in 2010/2011, providing 16 years of follow-up. The NPHS “square” file contains one record for each of the 17,276 respondents with data on a comprehensive set of health-related questions asked in each cycle. In cycle 1, 75% of interviews were conducted in person. In later cycles, almost all interviews were by telephone (more than 99% in cycles 7 and 8). Detailed descriptions of the NPHS design, sample, and interview procedures are available in published reports.Note 17-19

NPHS respondents were asked for permission to share the information they provided with provincial ministries of health, Health Canada, and the Public Health Agency of Canada. Most respondents (15,631; 90%) agreed to share. This article is based on data from the “square share” file.

The questions on childhood stressors (including physical abuse) were asked in cycles 1 (1994/1995), 7 (2006/2007), and 8 (2008/2009) of all respondents aged 18 or older. The questions were also asked of respondents who were younger than 18 in cycle 1 who had turned 18 by cycle 4 (1,034). For the present analyses, cycle 1 responses were updated to reflect the cycle 4 responses for these cases. This article is based on 15,027 records for which the questions on childhood stressors were applicable in at least one cycle. Because there was no restriction on age when the original sample was selected in 1994/1995, it was possible to refresh the study sample for cycles 7 and 8 to include respondents who had turned 18 by this time. However, people who immigrated to Canada after 1994/1995 are excluded from this study.

For much of the NPHS, proxy responses (usually from a family member) were permitted if the selected respondent was incapacitated by a health problem. Because of their personal nature, this did not apply to the childhood stressor questions. Responses were set to “not stated” for respondents whose information was provided by a proxy reporter.


Childhood stressors

The question on childhood physical abuse was one of seven items in the childhood stressor module. The module opened with the preamble, “The next few questions ask about some things that may have happened to you while you were a child or a teenager, before you moved out of the house.” The physical abuse item was, “Were you ever physically abused by someone close to you?” Table 4 contains the wording of the other six items.

Health conditions

Responses to the childhood physical abuse question were examined in relation to five health conditions shown in the literature to be related to childhood physical abuse.Note 1-3 Questions about these five health conditions were asked in all nine NPHS cycles; respondents were categorized as having the condition if it was reported in one or more cycles.

Depression was measured using a subset of questions from the Composite International Diagnostic Interview, according to the method of Kessler et al.Note 20

Self-perceived health was assessed with the question, “In general, would you say your health is: excellent? very good? good? fair? poor?”

Disability was defined based on an affirmative response to any of the following questions: “Because of a long-term physical or mental condition or a health problem, are you limited in the kind or amount of activity you can do: at home? at school? at work? in other activities?” “Do you have any long-term disabilities or handicaps?”

The presence of chronic conditions was established by asking respondents if a health professional had diagnosed them as having condition(s) that had lasted, or were expected to last, at least six months. Two chronic conditions were considered in this study: migraine and heart disease.


To examine the reliability of the childhood stressor questions across NPHS cycles, two measures of agreement were calculated: concordance rates (percentage agreement) and Cohen’s kappa statistic. Concordance rates are easy to interpret, but a disadvantage is the possibility of artificially inflated agreement rates for variables with a low prevalence because most values fall into one category.Note 21

Cohen’s kappa statistic corrects for the percentage of agreement that could be expected to occur by chance and is useful when concordance rates may be artificially high due to low prevalence.Note 21 The major disadvantage of the kappa statistic is that it is difficult to interpret. Landis and KochNote 22 proposed the following interpretation, which is commonly used in reliability studies: 0.01-0.20 slight agreement; 0.21-0.40 fair agreement; 0.41-0.60 moderate agreement; 0.61-0.80 substantial agreement; and 0.81-0.99 almost perfect agreement.
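The two agreement measures can be illustrated with a minimal sketch. The counts below are hypothetical and unweighted (they are not NPHS data), and the function names are illustrative only. The example shows how a low-prevalence item can have a high concordance rate while kappa falls only in the “moderate” band. Assigning values that fall between the quoted bands (for example, 0.205) to the band below is an assumption of the sketch.

```python
def agreement_stats(both_yes, yes_no, no_yes, both_no):
    """Return (concordance, kappa) for paired yes/no answers from two cycles.

    yes_no = "yes" in the first cycle and "no" in the second, and so on.
    """
    n = both_yes + yes_no + no_yes + both_no
    if n == 0:
        raise ValueError("the table is empty")
    # Observed agreement: share of respondents giving the same answer twice.
    p_observed = (both_yes + both_no) / n
    # Marginal proportion answering "yes" in each cycle.
    p_yes_first = (both_yes + yes_no) / n
    p_yes_second = (both_yes + no_yes) / n
    # Agreement expected by chance if the two answers were independent.
    p_chance = (p_yes_first * p_yes_second
                + (1 - p_yes_first) * (1 - p_yes_second))
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return p_observed, kappa


def landis_koch_label(kappa):
    """Map kappa to the Landis and Koch bands quoted in the text."""
    if kappa < 0.01:
        return "below the quoted bands"
    if kappa >= 1.0:
        return "perfect agreement"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical counts: 8% say "yes" in the first cycle, 9% in the second.
concordance, kappa = agreement_stats(both_yes=50, yes_no=30,
                                     no_yes=40, both_no=880)
print(f"concordance = {concordance:.2f}, kappa = {kappa:.2f}, "
      f"band = {landis_koch_label(kappa)}")
```

With these made-up counts, 93% of respondents give the same answer in both cycles, yet most of that agreement comes from the large “no/no” cell, which is why kappa is much lower than the concordance rate.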

To investigate the extent to which demographic factors are related to the reliability of retrospective reports of childhood physical abuse, estimates of kappa were compared by sex, age group (younger than 25, 25 to 39, 40 to 64, 65 or older), marital status (married/living common-law; yes/no), household income (lowest household income quintile; yes/no), and race (white/non-white). As well, these factors were examined in a logistic regression model with concordance as the dependent variable. A limitation of this approach is that if prevalence is low in a demographic group (for example, married), concordance may be artificially high, and therefore, associations with concordance may be due to low prevalence rather than consistent responses. Consequently, an approach recommended by McKinneyNote 23 was used, where demographic comparisons were made only for respondents with at least one report of abuse (that is, demographic factors were examined for those who consistently affirmed abuse versus those who gave inconsistent responses).

Logistic regression was also used to examine associations between response patterns to the childhood physical abuse item and health conditions.

Analyses were conducted using SAS Enterprise Guide 5.1. All estimates are based on weighted data. Weights were created at Statistics Canada so that the data would be representative of the Canadian population living in the 10 provinces in 1994/1995, and were inflated to compensate for non-response to the NPHS in 1994/1995. Variance estimates and 95% confidence intervals (CIs) were calculated using the bootstrap technique (with the SAS “proc survey” procedures) to account for the complex survey design.Note 24 Because SAS does not provide a procedure for estimating the variance of the kappa statistic using the bootstrap technique, CIs for kappa were estimated using a design effect of two to account for the NPHS sample design.
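As a rough illustration of the design-effect adjustment, the sketch below inflates a simple-random-sampling variance for kappa by a design effect of two. The source does not state which variance formula was used, so the simple large-sample approximation here is an assumption, as are the function name and the inputs. The sketch also ignores survey weights.

```python
import math


def kappa_ci(p_observed, p_chance, n, deff=2.0, z=1.96):
    """Approximate CI for kappa, with variance inflated by a design effect.

    Assumes the simple large-sample standard error
    sqrt(p_o * (1 - p_o) / (n * (1 - p_e)**2)); this is an illustrative
    choice, not necessarily the formula used in the study.
    """
    if n <= 0 or p_chance >= 1:
        raise ValueError("need n > 0 and chance agreement below 1")
    kappa = (p_observed - p_chance) / (1 - p_chance)
    se_srs = math.sqrt(p_observed * (1 - p_observed)
                       / (n * (1 - p_chance) ** 2))
    # Multiplying the variance by deff multiplies the SE by sqrt(deff).
    se_design = se_srs * math.sqrt(deff)
    return kappa, kappa - z * se_design, kappa + z * se_design


# Hypothetical inputs: 93% observed agreement, 84.44% chance agreement.
kappa, lower, upper = kappa_ci(p_observed=0.93, p_chance=0.8444, n=1000)
print(f"kappa = {kappa:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

A design effect of two widens the interval by a factor of about 1.41 (the square root of two) relative to the interval that would be obtained under simple random sampling.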


Table 1 shows responses to the childhood physical abuse question across NPHS cycles. The main reasons for non-response were total non-response to the cycle, proxy reporting, and partial non-response to the cycle (interview terminated before childhood stressor questions were asked). Across all cycles, very few respondents refused to answer the childhood physical abuse question or answered “don’t know.” Percentages responding “don’t know” in cycles 1 and 8 and refusing to respond in cycle 7 were too low (fewer than 20 cases) to report based on NPHS release guidelines.

About 8% of respondents reported childhood physical abuse in each NPHS cycle. Based on a report of abuse in any cycle, the prevalence rose to 11%. Women were more likely than men to report physical abuse; reporting physical abuse was less likely among older respondents (Appendix Table A).

The majority of respondents answered the childhood physical abuse item in two or more cycles: 40% provided a response in three cycles; 18% in two cycles; 35% in one cycle; and 8% had no responses to the question (data not shown). In cases where answers were inconsistent, transitions from “no” to “yes” were slightly more common than transitions from “yes” to “no” (Table 2), but only between cycles 1 and 8 was the difference between these two transitions (4.0% versus 2.9%) statistically significant.

Concordance rates and kappa statistics were calculated for respondents who provided two or more responses to the physical abuse item (Table 3).  Based on the guidelines of Landis and Koch,Note 22 the kappa statistic indicates a “moderate” level of agreement in responses between cycles 1 and 7 (12-year interval) and cycles 1 and 8 (14-year interval). Between cycles 7 and 8 (2-year interval), agreement is “substantial.” Likewise, the percentage of respondents providing consistent answers was slightly higher for the 2-year interval between cycles 7 and 8 than for the longer periods involving comparisons with cycle 1. Among respondents who provided answers in all three cycles, more than 90% were consistent across all three.

Comparisons of reliability estimates were made by sex, age group, marital status, household income, and race (data not shown). Based on kappa, reliability estimates were similar for all covariates. Similarly, among respondents with at least one report of childhood physical abuse, logistic regression results indicated that the likelihood of consistently affirming abuse was not related to demographic factors (data not shown).

Table 4 shows reliability estimates for the other items in the childhood stressor module. Based on kappa, reliability estimates were higher for the items on parental divorce and parental alcohol/drug problems, and lower for the items on father/mother not having a job when they wanted to be working, a frightening experience, and being sent away for doing something wrong.

Associations between response patterns to the physical abuse item and health conditions that have been shown to be related to childhood physical abuse are presented in Table 5. In all cases, respondents who provided inconsistent responses had higher odds of reporting health conditions than those who consistently denied that physical abuse occurred. Compared with people who consistently responded “yes,” those who said “no” in two cycles and “yes” in one cycle had somewhat lower odds of reporting health conditions.
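The response-pattern groups compared in this analysis can be derived with a short sketch such as the one below. The function name, the string coding of answers, and the rule of requiring at least two valid answers are assumptions made for illustration. The exact categories used in Table 5 are not reproduced here.

```python
def classify_pattern(responses):
    """Classify one respondent's answers to the abuse item across cycles.

    `responses` holds "yes", "no", or None (not stated) for each cycle
    in which the item was asked (hypothetical coding).
    """
    valid = [r for r in responses if r in ("yes", "no")]
    if len(valid) < 2:
        return None  # consistency cannot be judged from fewer than two answers
    if all(r == "yes" for r in valid):
        return "consistent yes"
    if all(r == "no" for r in valid):
        return "consistent no"
    return "inconsistent"


# Hypothetical respondents: answers for cycles 1, 7 and 8.
print(classify_pattern(["yes", "yes", "yes"]))  # consistent yes
print(classify_pattern(["no", None, "no"]))     # consistent no
print(classify_pattern(["no", "yes", "no"]))    # inconsistent
```

The resulting group could then serve as the explanatory variable in a logistic regression for each health condition, with consistent deniers as the reference category.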


This study, based on three cycles of data collected between 1994/1995 and 2008/2009 by the National Population Health Survey, indicates that 91% of respondents provided consistent answers about childhood physical abuse across all cycles; very few refused to answer or replied “don’t know.”

Reliability, as measured by Cohen’s kappa statistic, was “substantial” for the two-year interval between cycles 7 and 8 and “moderate” for the 12- and 14-year intervals from cycle 1. Compared with consistent deniers, consistent affirmers and inconsistent responders to the childhood physical abuse item had increased odds of reporting depression, fair or poor self-perceived health, disability, migraine, and heart disease.

The percentage reporting that they had experienced childhood physical abuse was similar in each cycle (8%), with 11% reporting physical abuse in at least one cycle. A 2012 cross-sectional survey of the Canadian populationNote 1 yielded a substantially higher estimate of retrospectively reported childhood physical abuse (26%), which was similar to estimates from a 1990 Ontario survey.Note 25 The phrase “someone close to you” in the NPHS question may have resulted in the exclusion of incidents where the perpetrator was a teacher or other individual to whom the respondent did not feel “close.” It would also exclude physical assaults committed by strangers. As well, unlike the NPHS, the other surveys asked a series of questions on specific events. NPHS estimates are more in line with estimates from the 2012 and 1990 surveys based on questions that used terms indicative of severe abuse (for example, “kick,” “punch,” “choke” or “burn”). A study that used data from two large, community-based surveys conducted in the United States in 1997 and 2003 found that 8% of respondents labelled themselves as being “physically abused,” while 16% reported at least one behaviourally defined physical abuse event.Note 9

Generally, retrospective questions that focus on behaviourally specific events using multiple items yield higher prevalence estimates with higher validityNote 6-9,Note 26,Note 27 than does the broad NPHS question, which required respondents to subjectively define physical abuse and then decide if past experiences constituted abuse. For example, the NPHS question on parental divorce (a specific event) had higher reliability, while more subjective items such as, “Did your father or mother not have a job for a long time when they wanted to be working?” yielded lower kappa estimates.

In the NPHS, women were more likely than men to report childhood physical abuse. This is contrary to findings from other Canadian studies,Note 1,Note 25,Note 28 although in two of these studies, estimates for severe physical abuse were similar between the sexes.Note 25,Note 28 A global meta-analysis involving close to 10 million participants concluded that there were no gender differences in the prevalence of physical abuse.Note 29 In the NPHS, the self-defined nature of the item and the specification of “someone close to you” may have resulted in the relatively lower estimate among men.

The relationship between childhood physical abuse and premature mortalityNote 30,Note 31 may have contributed to the lower estimates for older respondents.

Similar to the NPHS results, other retrospective studies using a variety of items or groups of items to measure childhood physical abuse have yielded kappa values in the “moderate” range.Note 23,Note 32-34 Higher kappa values (at or approaching “substantial”) have been observed for studies based on clinical samplesNote 6 and for those with relatively short intervals (less than two years) between measurements,Note 10,Note 11,Note 13,Note 14 consistent with the higher kappa value (0.69) for the two-year interval between cycles 7 and 8 of the NPHS.

Earlier studies have also found that the reliability of childhood physical abuse estimates was not related to demographic variables.Note 11,Note 13 However, contrary to NPHS results, McKinney et al.Note 23 reported that among respondents with at least one report of abuse, low education and younger age were associated with inconsistent responses.

Although assessing the reliability of the NPHS question on childhood physical abuse is important, the validity (or the extent to which responses reflect events that meet objective criteria for physical abuse) is equally or more important. When responses are inconsistent, how should respondents be classified?

The validity of retrospective reports of childhood experiences has been questioned, particularly for people with mood or anxiety disorders.Note 35-38 However, reviews of the literature support the accuracy of early memories, especially for events such as childhood maltreatment.Note 7,Note 10 Little or no evidence supports a relationship between anxiety and depression and short- or long-term memory deficits or distortions.Note 7,Note 10 Furthermore, validity studies have found that false positive reports of abuse are rare, whereas false negatives are fairly common. In well-documented cases of adults who experienced serious childhood abuse, about one-third did not report it.Note 7

In light of these studies, it would seem well-advised to classify NPHS respondents with inconsistent reports as having experienced childhood physical abuse. The elevated odds ratios for health conditions known to be associated with physical abuse among those who provided inconsistent responses versus those who consistently responded “no” support this classification. Other researchers have noted that the estimated prevalence of childhood physical abuse increases considerably when subjects are asked about it more than once; they suggest that negative responses among inconsistent reporters may be less accurate than positive responses.Note 12,Note 23 Given that false positives are rare and that inconsistent responses are relatively common, in longitudinal surveys such as the NPHS it may be beneficial to re-ask childhood maltreatment questions to increase the accuracy of estimates.

Compared with those who consistently responded “yes” to the abuse question, those who said “no” in two cycles and “yes” in one cycle had somewhat lower odds of reporting health conditions. It has been suggested that individuals who experienced childhood abuse, but who function well in adulthood, may be more likely to forget or fail to report maltreatment,Note 7 which may explain the more favourable results for these individuals (although the odds are higher than for those who consistently reported “no”). Severity of abuse may also play a role. Those who experienced more severe abuse may be more likely to consistently recall and report it, and associations between maltreatment and health conditions tend to be stronger with more severe abuse.Note 1 Failure to treat inconsistent responses as “yes” may alter associations between physical abuse and health outcomes if the inconsistent responses represent the occurrence of less severe abuse.

Mode of data collection can also influence the validity of childhood physical abuse estimates. Some studies suggest that in-person interviews heighten rapport and trust with respondents, thereby increasing the likelihood of disclosure.Note 39-41 Other studies have found that estimates based on self-administered questionnaires tend to be higher, which suggests that the anonymity and privacy associated with this mode of inquiry may facilitate disclosure.Note 26,Note 42,Note 43 The majority of NPHS interviews in cycle 1 were conducted in person, whereas nearly all cycle 7 and 8 interviews were conducted by telephone. The similarity of estimates for abuse across cycles suggests that disclosure was not influenced by whether interviews were conducted in person or by telephone. However, the extent to which estimates would be higher if self-administered questionnaires had been used is unknown.


Although the validity of the NPHS childhood physical abuse question was partially examined by comparing consistency/inconsistency in responses to questions about health conditions, a more thorough assessment would involve comparisons with a gold standard, that is, independent corroboration with external information known to represent whether or not respondents’ experiences meet recognized criteria for childhood physical abuse. The findings of this study suggest that NPHS respondents with inconsistent reports should be classified as having experienced childhood physical abuse, but the extent to which this would improve the accuracy of estimates is unknown. While false positives are rare,Note 7 they do occur. As well, it is likely that some respondents who consistently denied that abuse took place had in fact experienced abuse. Another limitation of the study is non-response, both to the NPHS and to the childhood physical abuse question (including non-response due to proxy reporting).


A major advantage of this study is that it is based on a large sample representative of the general population, whereas most reliability and validity studies have used small clinical or convenience samples. The comprehensive set of demographic and health-related questions in the NPHS made it possible to examine reliability across an array of covariates and to examine consistency/inconsistency of responses to childhood physical abuse in relation to several health conditions that have been shown to be related to childhood maltreatment.

Although the NPHS question on childhood physical abuse has limitations, this study suggests that it is a reliable measure over time and that inconsistent responses likely indicate that abuse did occur. The 16-year follow-up period offers the potential to investigate associations between childhood physical abuse and mortality and morbidity. This information can, in turn, assist in the development of child maltreatment and chronic disease prevention activities.


The assistance of Dr. Patrick Boily of the Centre for Quantitative Analysis and Decision Support is gratefully acknowledged, as is the assistance of Caroline Wallace with the literature review.
