Statistics Canada - Government of Canada


Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please "contact us" to request a format other than those available.


Appendix C: Data analysis

Coefficient of variation. The coefficient of variation (CV) is a relative measure of variability that can be used to compare the quality of estimates. It is calculated by dividing the square root of the variance of the estimate by the estimate itself; the square root of the variance is also known as the standard error. Statistics Canada considers estimates with CVs of 16.5% or lower to be of acceptable quality, and these can be released without a warning. Estimates with CVs from 16.6% to 33.3% are of marginal quality and should be accompanied by a warning about their relatively high levels of error. Estimates with CVs above 33.3% are considered by Statistics Canada to be of unacceptable quality. Almost all CVs in the present report were in the acceptable range; the small number of estimates in the marginal range are flagged in the tables.
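As an illustration, the CV calculation and the release categories above can be sketched in Python. The function names are hypothetical, and because the published ranges leave a gap between 16.5% and 16.6%, the sketch assumes that any CV above 16.5% and up to 33.3% counts as marginal.

```python
import math


def coefficient_of_variation(estimate: float, variance: float) -> float:
    """Return the CV in percent: standard error divided by the estimate."""
    if estimate == 0:
        raise ValueError("the CV is undefined for an estimate of zero")
    standard_error = math.sqrt(variance)  # square root of the variance
    # abs() keeps the CV positive for negative estimates (an assumption).
    return 100.0 * standard_error / abs(estimate)


def quality_category(cv_percent: float) -> str:
    """Classify a CV using the cut-offs described in the text."""
    if cv_percent <= 16.5:
        return "acceptable"    # may be released without a warning
    if cv_percent <= 33.3:
        return "marginal"      # release with a warning about high error
    return "unacceptable"


# Hypothetical example: an estimate of 10.0 with a variance of 4.0.
cv = coefficient_of_variation(10.0, 4.0)
print(f"{cv:.1f}% -> {quality_category(cv)}")  # 20.0% -> marginal
```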

Bootstrap weights for variance estimation. The following information was taken from the Microdata User Guide of the NLSCY for cycle 5 (Statistics Canada, n.d.b).

It would be difficult to derive an exact formula to calculate the sampling variance for the NLSCY due to the complex sample design, non-response adjustments, treatment of out-of-scope units, and the post-stratification. A very good way to approximate the sampling variance is to use the Bootstrap method. The idea behind the Bootstrap method is to select random sub-samples from the full sample in such a way that each of the sub-samples (or replicates) follows the same design as the full sample. The final weights for units in each replicate are recalculated following the same weighting steps used for the full sample…. These Bootstrap weights are used to calculate a population estimate for each replicate. The variance among the replicate estimates for a given characteristic is an estimate of the sampling variance of the full sample population estimate. For the NLSCY, a set of 1,000 Bootstrap weights is available. The sampling variance calculation using these 1,000 Bootstrap weights involves calculating the estimates with each of these 1,000 weights and then calculating the variance of these 1,000 estimates (p.166).
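The replicate-based calculation described in the quotation can be sketched as follows. The statistic here is a weighted mean, the data layout (one list of weights per replicate) is an assumption, and dividing by the number of replicates B while centring on the mean of the replicate estimates is one common convention; some implementations divide by B - 1 or centre on the full-sample estimate instead. With the NLSCY, B would be 1,000.

```python
import math
from typing import List, Sequence, Tuple


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of the values under one set of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_variance(values: Sequence[float],
                       full_weights: Sequence[float],
                       replicate_weights: List[Sequence[float]]
                       ) -> Tuple[float, float, float]:
    """Return (full-sample estimate, bootstrap variance, standard error)."""
    # Point estimate with the final full-sample weights.
    estimate = weighted_mean(values, full_weights)
    # Recompute the same statistic once per set of bootstrap weights.
    replicates = [weighted_mean(values, w) for w in replicate_weights]
    b = len(replicates)
    centre = sum(replicates) / b
    # Variance among the replicate estimates (divisor B, an assumed convention).
    variance = sum((e - centre) ** 2 for e in replicates) / b
    return estimate, variance, math.sqrt(variance)


# Hypothetical toy data: three children and three replicate weight sets
# (the NLSCY files provide 1,000 sets).
est, var, se = bootstrap_variance(
    [1.0, 2.0, 3.0],
    [1.0, 1.0, 1.0],
    [[1.0, 1.0, 1.0], [2.0, 1.0, 1.0], [1.0, 1.0, 2.0]],
)
print(f"{est:.2f} {se:.4f}")  # 2.00 0.2041
```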

The variances and standard errors of all estimates in the present study were calculated using the bootstrap weights that were developed by Statistics Canada for the 2002/2003 cross-sectional sample. Cross-sectional weights were used for the longitudinal analysis because the sample being studied was 5-year-old children in 2002/2003, and the analysis involved looking back at their status in 2000/2001 when they were 3-year-olds.

Statistical and substantive significance. Because of the large size of the sample under study, many statistics were statistically significant even though the effects were small. Unless noted otherwise, only effects that were both statistically and substantively significant as defined below are reported as significant in this paper. Standards of substantive significance were derived from those established by Cohen (1988).

Substantive significance. Unless noted otherwise, substantive effects were defined as:

  1. percentage differences of 5 points or more
  2. mean differences of 0.25 of a standard deviation or more
  3. correlation coefficients of r = 0.22 or greater (r² ≈ 0.05)
  4. incremental R² of 0.01 (1%) or greater.
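The four criteria can be expressed as simple threshold checks. This is a sketch with hypothetical names; the text does not say which standard deviation scales a mean difference, so the caller is assumed to supply it (for example, the standard deviation of the measure in the full sample).

```python
# Thresholds from the list above (hypothetical key names).
THRESHOLDS = {
    "percentage_points": 5.0,    # percentage difference, in points
    "mean_difference_sd": 0.25,  # mean difference in standard-deviation units
    "correlation": 0.22,         # Pearson r (0.22 ** 2 is about 0.05)
    "incremental_r2": 0.01,      # increase in R-squared (1%)
}


def standardized_mean_difference(mean_a: float, mean_b: float, sd: float) -> float:
    """Express a mean difference in units of the supplied standard deviation."""
    return (mean_a - mean_b) / sd


def is_substantive(kind: str, value: float) -> bool:
    """True when the effect meets the substantive-significance threshold."""
    if kind not in THRESHOLDS:
        raise ValueError(f"unknown effect type: {kind}")
    # Only the size of the effect matters here, not its direction.
    return abs(value) >= THRESHOLDS[kind]


# Hypothetical example: group means of 105 and 100 on a measure with SD 15.
d = standardized_mean_difference(105.0, 100.0, 15.0)
print(round(d, 2), is_substantive("mean_difference_sd", d))  # 0.33 True
```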

Statistical significance. Where multiple comparisons were made within a particular predictor variable (e.g., household income level), the nominal significance level of p=0.05 was adjusted for the number of comparisons. Where single comparisons were made, a significance level of p=0.01 was used.
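The text says only that the 0.05 level was adjusted for the number of comparisons. The sketch below assumes a Bonferroni adjustment (dividing 0.05 by the number of comparisons), which may not be the exact method used; the function names are hypothetical.

```python
def comparison_alpha(n_comparisons: int,
                     nominal_alpha: float = 0.05,
                     single_alpha: float = 0.01) -> float:
    """Significance level for one test within a set of comparisons."""
    if n_comparisons < 1:
        raise ValueError("need at least one comparison")
    if n_comparisons == 1:
        return single_alpha  # single comparisons use p = 0.01
    # Assumed Bonferroni adjustment of the nominal 0.05 level.
    return nominal_alpha / n_comparisons


def is_significant(p_value: float, n_comparisons: int) -> bool:
    """True when a p-value falls below the adjusted level."""
    return p_value < comparison_alpha(n_comparisons)


# Hypothetical example: three pairwise comparisons among income levels.
print(round(comparison_alpha(3), 4))  # 0.0167
print(is_significant(0.02, 3))        # False
```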

Descriptive statistics. In this report, descriptive statistics are presented on basic demographic variables for the sub-group under study. Intercorrelations among child and family characteristics, among readiness to learn measures, and among home environment measures were calculated using the cross-sectional sample design weights, and the statistical significance of each correlation coefficient was assessed with reference to the size of the design effects.
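The text does not give a formula for taking design effects into account. One common approach, assumed here, is to divide the sample size by the design effect (deff) to obtain an effective sample size and then apply the usual t-test for a correlation. The sketch computes a weighted Pearson correlation and a two-sided p-value using a normal approximation to the t distribution, which is reasonable for large samples; the function names and the deff value are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence


def weighted_correlation(x: Sequence[float], y: Sequence[float],
                         w: Sequence[float]) -> float:
    """Pearson correlation of x and y using survey weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(var_x * var_y)


def correlation_p_value(r: float, n: int, design_effect: float) -> float:
    """Two-sided p-value for r after shrinking n by the design effect."""
    n_eff = n / design_effect  # effective sample size (assumed adjustment)
    if n_eff <= 3:
        raise ValueError("effective sample size is too small")
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n_eff - 2) / (1 - r * r))
    # Normal approximation to the t distribution with n_eff - 2 df.
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical example: r = 0.05 from 2,000 children with a design effect of 2.
print(correlation_p_value(0.05, 2000, 2.0) < 0.05)  # False (t is about 1.58)
```

Under these assumptions, the same r = 0.05 would reach p < 0.05 with 2,000 cases and no design effect (t is about 2.24), which shows why ignoring the design effect overstates significance.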

Readiness to learn, home environment, and child and family characteristics. Means and percentages were reported for readiness to learn measures and home environment variables in 2002/2003 (and, where applicable, 2000/2001), by sex of the child, household income status, parent education level, family structure, country of birth of parent, kindergarten attendance, community size, and province of residence. Estimates of means and percentages were calculated using the 2002/2003 cross-sectional sample design weights, and the statistical significance of differences was tested using t-tests.
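A t-test on the difference between two subgroup estimates (for example, the mean scores of boys and girls) needs a standard error for that difference. The sketch below uses the bootstrap approach described earlier: it computes the difference within each replicate, so that any covariance between the two subgroup estimates is reflected, and takes the variance of those replicate differences. The inputs are assumed to be replicate estimates produced with the same replicate weights in the same order; the function name is hypothetical, and the p-value uses a large-sample normal approximation.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def difference_test(est_a: float, est_b: float,
                    reps_a: Sequence[float],
                    reps_b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (difference, t statistic, two-sided p) for est_a - est_b."""
    if len(reps_a) != len(reps_b) or not reps_a:
        raise ValueError("need matching, non-empty replicate estimates")
    # Difference within each bootstrap replicate.
    diffs = [a - b for a, b in zip(reps_a, reps_b)]
    centre = sum(diffs) / len(diffs)
    variance = sum((d - centre) ** 2 for d in diffs) / len(diffs)
    se = math.sqrt(variance)
    diff = est_a - est_b
    t = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return diff, t, p


# Hypothetical subgroup means and four replicate estimates for each group.
diff, t, p = difference_test(10.0, 8.0,
                             [10.5, 9.5, 10.0, 10.0],
                             [8.0, 8.0, 8.5, 7.5])
print(diff, round(t, 2), p < 0.01)  # 2.0 4.0 True
```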

Readiness to learn and home environment. To determine whether there were important readiness to learn differences linked to home environment, means of the continuous readiness to learn measures were compared across the levels of each of the seven home environment variables, and the categorical readiness to learn measures were cross-tabulated with the home environment variables. Estimates of means and percentages were calculated using the cross-sectional sample design weights, and the statistical significance of differences was tested using t-tests.
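The weighted cross-tabulation can be sketched as row percentages: within each level of a home environment variable, the weighted share of children in each category of a categorical readiness to learn measure. The function name and the category labels in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def weighted_crosstab(rows: Sequence[Hashable], cols: Sequence[Hashable],
                      weights: Sequence[float]
                      ) -> Dict[Tuple[Hashable, Hashable], float]:
    """Weighted row percentages keyed by (row level, column category)."""
    row_totals: Dict[Hashable, float] = defaultdict(float)
    cell_totals: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    for r, c, w in zip(rows, cols, weights):
        row_totals[r] += w
        cell_totals[(r, c)] += w
    return {(r, c): 100.0 * cw / row_totals[r]
            for (r, c), cw in cell_totals.items()}


# Hypothetical data: organized sports (yes/no) by a two-category outcome.
table = weighted_crosstab(
    ["sports", "sports", "no sports", "no sports"],
    ["ready", "not ready", "ready", "not ready"],
    [3.0, 1.0, 1.0, 1.0],
)
print(table[("sports", "ready")], table[("no sports", "ready")])  # 75.0 50.0
```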

Readiness to learn and home environment: interactions with household income level. To determine whether the demographic variable household income level had an indirect statistical effect on a continuous readiness to learn measure by way of a home environment variable, in addition to its direct statistical effect, linear regression analyses analogous to path analyses were performed. The purpose was to determine whether the home environment variable explained part of the difference in readiness to learn between lower- and higher-income children. A two-stage approach was used. First, a linear regression established whether the home environment variable accounted for at least 1% of the variance in the readiness to learn measure (i.e., R² ≥ 0.01). If so, a second linear regression, with the demographic predictor and the home environment predictor entered together, was used to determine whether each had an effect on the readiness to learn measure. If both regression coefficients were significant, it was possible that the demographic variable, in addition to its direct effect, had an indirect effect on the readiness to learn measure through its effect on the home environment variable. If previous results showed that the demographic variable was significantly linked with the home environment variable, this link, coupled with the regression results, would imply that the demographic variable had an indirect effect on the outcome variable through the home environment variable. The size of this indirect effect could not be estimated using this technique, but its direction would be known. This approach assumes that the effect runs in one direction only, from the demographic variable to the home environment variable. An example of this analytical approach follows.
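The two-stage screening can be sketched with ordinary least squares. This is a simplified, hypothetical implementation: it fits unweighted regressions with model-based standard errors, whereas the study used survey weights and bootstrap variance estimates, and it treats |t| ≥ 2.576 (roughly p < 0.01, two-sided, in large samples) as significant. It checks only the two regression conditions; the separate link between the demographic and home environment variables comes from earlier results.

```python
import math
from typing import Sequence, Tuple


def _mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


def simple_r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R-squared of a one-predictor linear regression of y on x."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def two_predictor_ols(x1: Sequence[float], x2: Sequence[float],
                      y: Sequence[float]
                      ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Slopes and t statistics for y = b0 + b1*x1 + b2*x2 + error."""
    n = len(y)
    m1, m2, my = _mean(x1), _mean(x2), _mean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [b - my for b in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # zero if the predictors are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    sse = sum((e - b1 * a - b2 * b) ** 2 for a, b, e in zip(c1, c2, cy))
    sigma2 = sse / (n - 3)  # residual variance with 3 estimated parameters
    t1 = b1 / math.sqrt(sigma2 * s22 / det)
    t2 = b2 / math.sqrt(sigma2 * s11 / det)
    return (b1, b2), (t1, t2)


def two_stage_screen(income, home_env, outcome, critical_t: float = 2.576):
    """Return (stage-1 R-squared, stage 1 passed, both coefficients significant)."""
    r2 = simple_r_squared(home_env, outcome)
    if r2 < 0.01:  # home environment explains less than 1% of the variance
        return r2, False, False
    _, (t_income, t_home) = two_predictor_ols(income, home_env, outcome)
    both = abs(t_income) >= critical_t and abs(t_home) >= critical_t
    return r2, True, both


# Hypothetical data: 1 = higher income / participates in organized sports.
income = [0, 1, 0, 1, 0, 1, 0, 1]
sports = [0, 0, 1, 1, 0, 0, 1, 1]
noise = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]
score = [2 * i + 3 * s + e for i, s, e in zip(income, sports, noise)]
r2, passed, both = two_stage_screen(income, sports, score)
print(round(r2, 2), passed, both)  # 0.69 True True
```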

Example. Children from lower-income households and from more affluent households differed significantly in communication skill score, one of the readiness to learn measures. To determine whether household income level had an indirect effect on communication skill score by way of participation in organized sports, a linear regression analysis was performed. Household income level and participation in organized sports were entered into a linear regression equation predicting communication skill score. The results of this analysis appear in Table D-37. The regression coefficients for participation in organized sports and household income level were both statistically significant, indicating that income had a direct effect on communication skill score and possibly also an indirect effect by way of participation in organized sports. An examination of earlier results (Table D-29) indicated a significant link between income and participation in organized sports, implying that household income level may have influenced communication skill score through this home environment variable.


Date modified: 2006-11-27