Appendix C: Data analysis

Coefficient of variation. The coefficient of variation (CV) is a relative measure of variability that can be used to compare the quality of estimates. It is calculated by dividing the square root of the variance of the estimate (also known as the standard error) by the estimate itself. Estimates with CVs of 16.5% or lower are considered by Statistics Canada to be of acceptable quality and can be released without warning. Estimates with CVs in the range of 16.6% to 33.3% are of marginal quality and should be accompanied by a warning about the relatively high levels of error. Estimates with CVs in excess of 33.3% are considered by Statistics Canada to be of unacceptable quality. Almost all CVs in the present report were in the acceptable range. The small number of estimates in the marginal range have been flagged in the tables.
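The CV calculation and the three quality bands described above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the treatment of values that fall between the published cut-offs (for example, 16.55%) is an assumption, since the text states the bands to one decimal place.

```python
import math


def coefficient_of_variation(estimate, variance):
    """Return the CV as a percentage: standard error divided by the estimate."""
    standard_error = math.sqrt(variance)
    return 100.0 * standard_error / abs(estimate)


def quality_category(cv_percent):
    """Classify a CV (in percent) into the three quality bands.

    ASSUMPTION: values between the published cut-offs (16.5 < CV < 16.6)
    are treated as marginal; the source does not say how they are handled.
    """
    if cv_percent <= 16.5:
        return "acceptable"
    if cv_percent <= 33.3:
        return "marginal"
    return "unacceptable"
```

For example, an estimate of 200 with a variance of 400 has a standard error of 20 and a CV of 10%, which falls in the acceptable band.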

Bootstrap weights for variance estimation. The following information was taken from the Microdata User Guide of the NLSCY for cycle 5 (Statistics Canada, n.d.b).

It would be difficult to derive an exact formula to calculate the sampling variance for the NLSCY due to the complex sample design, non-response adjustments, treatment of out-of-scope units, and the post-stratification. A very good way to approximate the sampling variance is to use the Bootstrap method. The idea behind the Bootstrap method is to select random sub-samples from the full sample in such a way that each of the sub-samples (or replicates) follows the same design as the full sample. The final weights for units in each replicate are recalculated following the same weighting steps used for the full sample…. These Bootstrap weights are used to calculate a population estimate for each replicate. The variance among the replicate estimates for a given characteristic is an estimate of the sampling variance of the full sample population estimate. For the NLSCY, a set of 1,000 Bootstrap weights is available. The sampling variance calculation using these 1,000 Bootstrap weights involves calculating the estimates with each of these 1,000 weights and then calculating the variance of these 1,000 estimates (p.166).
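The replicate-based calculation described in the passage above can be sketched as follows for a weighted mean. This is an illustration under stated assumptions, not the Statistics Canada implementation: the function names and the toy data layout (one list of weights per replicate) are hypothetical, and the variance is taken as the population variance of the replicate estimates (dividing by the number of replicates), which is one common convention. With 1,000 replicates, the choice of divisor makes little practical difference.

```python
import statistics


def weighted_mean(values, weights):
    """Weighted mean of values using one set of survey weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def bootstrap_variance(values, replicate_weights):
    """Estimate the sampling variance of a weighted mean.

    replicate_weights is a list of weight vectors, one per bootstrap
    replicate, each the same length as values.
    ASSUMPTION: the variance is computed as the population variance
    (divisor B) of the replicate estimates.
    """
    replicate_estimates = [weighted_mean(values, w) for w in replicate_weights]
    return statistics.pvariance(replicate_estimates)


# Toy data (hypothetical): four respondents and four replicates.
values = [1, 2, 3, 4]
full_weights = [10, 10, 10, 10]
replicates = [
    [20, 0, 10, 10],
    [0, 20, 10, 10],
    [10, 10, 20, 0],
    [10, 10, 0, 20],
]
estimate = weighted_mean(values, full_weights)
variance = bootstrap_variance(values, replicates)
```

The square root of the resulting variance is the standard error that enters the CV calculation.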

The variances and standard errors of all estimates in the present study were calculated using the bootstrap weights that were developed by Statistics Canada for the 2002/2003 cross-sectional sample. Cross-sectional weights were used for the longitudinal analysis because the sample being studied was 5-year-old children in 2002/2003, and the analysis involved looking back at their status in 2000/2001 when they were 3-year-olds.

Statistical and substantive significance. Because of the large size of the sample under study, many statistics were statistically significant even though the effects were small. Unless noted otherwise, only effects that were both statistically and substantively significant as defined below are reported as significant in this paper. Standards of substantive significance were derived from those established by Cohen (1988).

Substantive significance. Unless noted otherwise, substantive effects were defined as:

1. percentage differences of 5 points or more
2. mean differences of 0.25 of a standard deviation or more
3. correlation coefficients of r = 0.22 or greater (r² = 0.05)
4. incremental R² of 0.01 (1%) or greater.
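The four thresholds listed above can be expressed as a simple lookup, sketched below. The function and key names are hypothetical. The choice of which standard deviation to divide by when standardizing a mean difference is also an assumption, as the text does not specify it.

```python
# Minimum absolute effect sizes treated as substantive, taken from the list above.
SUBSTANTIVE_THRESHOLDS = {
    "percentage_difference": 5.0,           # percentage points
    "standardized_mean_difference": 0.25,   # standard deviation units
    "correlation": 0.22,                    # r
    "incremental_r2": 0.01,                 # change in R-squared
}


def standardized_mean_difference(mean_a, mean_b, standard_deviation):
    """Express a mean difference in standard deviation units.

    ASSUMPTION: the caller supplies the relevant standard deviation;
    the source does not say which one was used.
    """
    return (mean_a - mean_b) / standard_deviation


def is_substantive(kind, value):
    """Return True if the absolute effect meets the threshold for its kind."""
    return abs(value) >= SUBSTANTIVE_THRESHOLDS[kind]
```

For instance, a difference of 2 points on a measure with a standard deviation of 10 is 0.2 of a standard deviation and would not meet the second criterion.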

Statistical significance. Where multiple comparisons were made within a particular predictor variable (e.g., household income level), the nominal significance level of p=0.05 was adjusted for the number of comparisons. Where single comparisons were made, a significance level of p=0.01 was used.
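The text does not name the adjustment method used for multiple comparisons. As one possible illustration, the sketch below assumes a Bonferroni-style adjustment, in which the nominal level is divided by the number of comparisons. Both function names are hypothetical.

```python
def n_pairwise_comparisons(n_groups):
    """Number of distinct pairwise comparisons among n_groups categories."""
    return n_groups * (n_groups - 1) // 2


def adjusted_alpha(n_comparisons, nominal_alpha=0.05):
    """Per-comparison significance level.

    ASSUMPTION: a Bonferroni-style adjustment; the source states only that
    the nominal level was adjusted for the number of comparisons.
    """
    return nominal_alpha / n_comparisons
```

Under this assumption, a predictor with four categories yields six pairwise comparisons, each tested at 0.05 / 6.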

Descriptive statistics. In this report, descriptive statistics were presented on basic demographic variables for the sub-group under study. Intercorrelations among child and family characteristics, among readiness to learn measures, and among home environment measures were calculated using the cross-sectional sample design weights, with statistical significance of correlation coefficients being assessed with reference to the size of design effects.
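One way to take design effects into account when testing a correlation coefficient is to deflate the sample size by the design effect before computing the test statistic. The sketch below illustrates that approach. It is an assumption rather than the procedure documented in this report, and the function name, sample size and design effect in the example are hypothetical.

```python
import math


def correlation_test(r, n, design_effect):
    """Test a correlation using an effective sample size.

    ASSUMPTION: the effective sample size is n / design_effect, and the
    p-value uses a normal approximation to the t distribution, which is
    reasonable only when the effective sample size is large.
    """
    effective_n = n / design_effect
    t_stat = r * math.sqrt((effective_n - 2) / (1 - r * r))
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return effective_n, t_stat, p_value


# Hypothetical example: r = 0.22 from 4,000 cases with a design effect of 2.
effective_n, t_stat, p_value = correlation_test(0.22, 4000, 2.0)
```

A design effect greater than 1 reduces the effective sample size, so the same correlation yields a smaller test statistic than it would under simple random sampling.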

Readiness to learn, home environment, and child and family characteristics. Means and percentages were reported for readiness to learn measures and home environment variables in 2002/2003 (and, where applicable, 2000/2001), by sex of the child, household income status, parent education level, family structure, country of birth of parent, kindergarten attendance, community size, and province of residence. Estimates of means and percentages were calculated using the 2002/2003 cross-sectional sample design weights, and the statistical significance of differences was tested using t-tests.

Readiness to learn and home environment. To determine whether there were important readiness to learn differences linked to home environment, means of the continuous readiness to learn measures were compared for the seven home environment variables, and the categorical readiness to learn measures were cross-tabulated with the home environment variables. Estimates of means and percentages were calculated using the cross-sectional sample design weights, and the statistical significance of differences was tested using t-tests.
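A comparison of two weighted means with a replicate-based standard error can be sketched as follows. This is an illustration under assumptions: the difference is recomputed for each bootstrap replicate, its variance is taken across replicates (divisor B), and the ratio of the full-sample difference to the resulting standard error is treated as the test statistic. The function names and toy data are hypothetical, and the report does not describe the exact form of its t-tests.

```python
import math
import statistics


def weighted_mean(values, weights):
    """Weighted mean of values using one set of survey weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def difference_test(values_a, weights_a, reps_a, values_b, weights_b, reps_b):
    """Return (difference, standard error, t statistic) for two groups.

    reps_a and reps_b hold one weight vector per bootstrap replicate and
    must be in the same replicate order for both groups.
    """
    difference = weighted_mean(values_a, weights_a) - weighted_mean(values_b, weights_b)
    replicate_differences = [
        weighted_mean(values_a, wa) - weighted_mean(values_b, wb)
        for wa, wb in zip(reps_a, reps_b)
    ]
    standard_error = math.sqrt(statistics.pvariance(replicate_differences))
    return difference, standard_error, difference / standard_error


# Toy data (hypothetical): two groups of two respondents, four replicates.
diff, se, t_stat = difference_test(
    [4, 6], [1, 1], [[2, 0], [0, 2], [1, 1], [1, 1]],
    [1, 3], [1, 1], [[1, 1], [1, 1], [2, 0], [0, 2]],
)
```

The resulting statistic would then be compared with the significance level appropriate to the number of comparisons being made.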