Statistics Canada

7.0 Guidelines for analysis and presentation


Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please "contact us" to request a format other than those available.

7.1 Applying weights

The microdata on the public use file are unweighted. Data users are responsible for applying the appropriate weights to any results they produce. If the proper weights are not used, estimates derived from the microdata cannot be considered representative of the survey population, and they will not match the estimates that Statistics Canada would produce. On the Survey of Financial Security (SFS) PUMF, the weight variable is named WEIGHT.
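As a minimal sketch of why the weights matter, the Python below computes a weighted total and a weighted mean from records held as plain dictionaries. Only the name WEIGHT comes from the SFS PUMF; the `income` field, the record values and the helper names are hypothetical, chosen only to show that weighted and unweighted results can differ.

```python
# Hypothetical records: WEIGHT is the PUMF weight variable; "income" and the
# values below are invented for illustration only.
records = [
    {"WEIGHT": 1000.0, "income": 40000.0},
    {"WEIGHT": 3000.0, "income": 60000.0},
]


def weighted_total(rows, include=lambda r: True):
    """Estimated number of population units among rows satisfying `include`."""
    return sum(r["WEIGHT"] for r in rows if include(r))


def weighted_mean(rows, var, include=lambda r: True):
    """Weighted mean of `var` over rows satisfying `include`."""
    selected = [r for r in rows if include(r)]
    total_weight = sum(r["WEIGHT"] for r in selected)
    return sum(r["WEIGHT"] * r[var] for r in selected) / total_weight


# Ignoring the weights treats every sampled record as equally important.
unweighted_mean = sum(r["income"] for r in records) / len(records)

print(weighted_total(records))           # 4000.0 population units
print(weighted_mean(records, "income"))  # 55000.0
print(unweighted_mean)                   # 50000.0, not representative
```

The `include` argument restricts an estimate to a sub-group, for example `weighted_total(records, lambda r: r["income"] > 50000)` for the estimated number of units above that income.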

7.2 Rounding guidelines

Once the reliability of the results has been determined, they should be rounded to a level that reflects the precision the data can actually support. The following rounding guidelines should be used:

  • Estimates of population sub-groups should be rounded to the nearest hundred units.
  • Rates and percentages should be rounded to one decimal place.

Note that all calculations should be derived from their unrounded components; only the final results are then rounded, using the normal rounding technique.

In normal rounding, if the first or only digit to be dropped is 0 to 4, the last digit to be retained is not changed. If the first or only digit to be dropped is 5 to 9, the last digit to be retained is raised by one. For example, in normal rounding to the nearest 100, the estimate 49,448 would be rounded down to 49,400 and an estimate of 49,252 would be rounded up to 49,300. The figure 1.78% would be rounded to 1.8%.
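The rule above is "round half up". Python's built-in `round()` instead sends exact ties to the even digit, so a sketch that follows these guidelines can use the standard `decimal` module. The function name `round_half_up` and the unrounded totals in the last lines are hypothetical; the first three calls reproduce the examples in the text.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, unit):
    """Round `value` to a multiple of `unit` ("100", "0.1", ...) using normal rounding.

    ROUND_HALF_UP rounds ties away from zero, so the digit rule in the text
    applies the same way to negative values.
    """
    unit = Decimal(unit)
    # str() keeps binary floating-point noise out of the Decimal
    scaled = Decimal(str(value)) / unit
    return scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP) * unit


print(round_half_up(49_448, "100"))  # 49400
print(round_half_up(49_252, "100"))  # 49300
print(round_half_up(1.78, "0.1"))    # 1.8

# Built-in round() sends exact ties to the even digit instead:
print(round(49_450, -2))             # 49400
print(round_half_up(49_450, "100"))  # 49500

# Compute from unrounded components; round only the final figure.
# (Hypothetical unrounded weighted estimates.)
subgroup, total = 49_448.0, 2_777_000.0
print(round_half_up(subgroup / total * 100, "0.1"))  # 1.8
```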

7.3 Missing values and reserved codes

There are a few types of missing values on the public use file.

If the coverage of a variable does not extend to a certain population sub-group, there are no valid values for that sub-group; the values that appear instead are reserved codes such as 9, 99, 9.9 and so on, which indicate that the variable is not applicable. The coverage of each variable on the file is referred to in the data dictionary as the "population". This also applies to derived variables for which some components have been capped at a certain limit.

For certain records, no valid value is available even though the variable is applicable. The respondent may not have provided the information, or the value may have failed an edit during processing and not been imputed. Such missing values are coded 7, 97, 9.7 and so on, depending on the format. For certain variables, the number of missing values has been reduced through imputation. Missing values for the income variables have been entirely imputed, but most other variables may still contain missing values.
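The two kinds of reserved codes appear to follow a pattern tied to each variable's format. The sketch below assumes a field of `width` digits, `decimals` of them after the decimal point, where the not-applicable code is all nines and the not-stated code is nines ending in 7 (giving 9, 99, 9.9 and 7, 97, 9.7, as in the examples above). This pattern is an assumption extrapolated from those examples; the data dictionary remains the authority for each variable's actual codes. The function names and values are hypothetical.

```python
from collections import Counter

NOT_APPLICABLE = "not applicable"
NOT_STATED = "not stated"
VALID = "valid"


def reserved_codes(width, decimals=0):
    """Return (not_applicable, not_stated) codes for a field of `width` digits.

    Assumed pattern: all nines for not applicable, nines ending in 7 for
    not stated, e.g. width=2 -> (99, 97); width=2, decimals=1 -> (9.9, 9.7).
    Confirm against the data dictionary before relying on it.
    """
    scale = 10 ** decimals
    return (10 ** width - 1) / scale, (10 ** width - 3) / scale


def classify(value, width, decimals=0):
    """Label a raw value as valid, not applicable or not stated."""
    not_applicable, not_stated = reserved_codes(width, decimals)
    if value == not_applicable:
        return NOT_APPLICABLE
    if value == not_stated:
        return NOT_STATED
    return VALID


# Hypothetical raw values of a two-digit variable.
values = [3, 97, 12, 99, 45]
labels = [classify(v, width=2) for v in values]
print(labels)
print(Counter(labels))
```

In either case, the reserved code itself must never enter a sum or mean as if it were a real value.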

How to deal with missing values of this second kind depends on the type of analysis being carried out and on the extent of the missing data. Although the eventual solution may be to exclude records with missing values from the analysis, a review should first be carried out to assess the impact of the missing values on the overall representativeness of the data. Could the missing data introduce a bias? For example, do the other characteristics of the people with missing values differ from those of the observed part of the sample? If so, it may be necessary to account for that possible impact in the analysis. In all cases, analysts should note in their published results any exclusion of records with missing values.
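One simple form of the review described above is to compare the weighted profile of records with and without a missing value, using a characteristic that is always available. Because the income variables are fully imputed, an income measure can play that role. The sketch below is illustrative only: the field names `income` and `var_x`, the codes 97 and 99, and the records are hypothetical, and a real assessment would examine several characteristics.

```python
NOT_STATED = 97      # hypothetical not-stated code for the two-digit variable var_x
NOT_APPLICABLE = 99  # hypothetical not-applicable code for var_x

# Hypothetical records; WEIGHT is the PUMF weight variable.
records = [
    {"WEIGHT": 1000.0, "income": 30000.0, "var_x": 97},
    {"WEIGHT": 2000.0, "income": 50000.0, "var_x": 12},
    {"WEIGHT": 1000.0, "income": 70000.0, "var_x": 45},
    {"WEIGHT": 1000.0, "income": 20000.0, "var_x": 97},
    {"WEIGHT": 500.0, "income": 40000.0, "var_x": 99},
]


def weighted_mean(rows, var):
    """Weighted mean of `var` over `rows`."""
    total_weight = sum(r["WEIGHT"] for r in rows)
    return sum(r["WEIGHT"] * r[var] for r in rows) / total_weight


# Restrict to the variable's population before looking at missing values.
in_scope = [r for r in records if r["var_x"] != NOT_APPLICABLE]
missing = [r for r in in_scope if r["var_x"] == NOT_STATED]
observed = [r for r in in_scope if r["var_x"] != NOT_STATED]

# Weighted share of the in-scope population whose var_x is missing.
missing_share = sum(r["WEIGHT"] for r in missing) / sum(r["WEIGHT"] for r in in_scope)

print(f"records excluded: {len(missing)} ({missing_share:.1%} of weighted population)")
print(f"mean income, missing var_x:  {weighted_mean(missing, 'income'):,.0f}")
print(f"mean income, observed var_x: {weighted_mean(observed, 'income'):,.0f}")
```

A marked difference between the two means, as in these invented figures, would suggest that simply dropping the records could bias estimates involving `var_x`; the count of excluded records is also what the published results should report.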