Statistics Canada

6.0 Record layout, data dictionary and univariate distributions

Three additional information files are provided to assist users of the SFS public use microdata file: a record layout, a data dictionary and univariate distributions. These information files are organized by content themes and in some cases sub-themes.

6.1 Columns of the record layout

Variable name: The public use microdata file (PUMF) variable name, that is, the name assigned to the variable on the microdata file. In almost every case, this name is identical to the name on the SFS internal database.

Type: Indicates whether the variable is numeric (in the sense that it can logically be used in mathematical operations) or character.

Number of categories: Shows the number of categories in the value set for the variable in question. The number applies only to “character” variables. Numeric variables have ranges, which are specified in the data dictionary.

Length: Indicates the number of positions (spaces) the variable occupies. For numeric variables, this includes the decimal point and the decimal places, if any. The format is expressed as the total length followed by the number of decimal places. For example, a variable which can have values of zero (00.0) to 99.9 would have a format expressed as 4.1. A variable which can have values of zero (00) to 99 would have a format expressed as 2.0.

Sequence number: Indicates the order in which variables appear on the microdata file.

Start position: Shows the position at which the variable begins on the public use microdata file.

Long variable name: A standardized name, with a maximum of 26 characters, which can be used to quickly identify variables, to label tables, and so on. Although still rather cryptic, it is considerably more revealing than the variable name. However, this longer name obviously excludes a lot of important information contained in the variable description shown in the data dictionary. In short, analysts are warned against making assumptions about the variable definition based on the long variable name.
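The start position and length columns are enough to slice a variable out of a fixed-width record. The following is a minimal sketch only: the variable names (VARA, VARB, VARC), the sample record and the layout values are hypothetical, the start position is assumed to be 1-based, and the decimal point is assumed to be physically present in the file, as the definition of length suggests. The actual record layout file should always be used in practice.

```python
from typing import NamedTuple


class LayoutEntry(NamedTuple):
    name: str       # PUMF variable name (hypothetical here)
    var_type: str   # "numeric" or "character"
    start: int      # start position, assumed to be 1-based
    fmt: str        # length as "total.decimals", e.g. "4.1"


def field_width(fmt: str) -> int:
    """Total number of positions, including any decimal point."""
    return int(fmt.split(".")[0])


def parse_record(line: str, layout: list[LayoutEntry]) -> dict:
    """Slice one fixed-width record into a dict keyed by variable name."""
    record = {}
    for entry in layout:
        begin = entry.start - 1  # convert 1-based position to 0-based index
        raw = line[begin:begin + field_width(entry.fmt)]
        if entry.var_type == "numeric":
            record[entry.name] = float(raw)
        else:
            # Character variables keep their codes as text (e.g. "03").
            record[entry.name] = raw
    return record


# Hypothetical layout: not taken from the actual SFS record layout.
LAYOUT = [
    LayoutEntry("VARA", "character", 1, "2.0"),
    LayoutEntry("VARB", "numeric", 3, "4.1"),
    LayoutEntry("VARC", "numeric", 7, "2.0"),
]

example = parse_record("0312.545", LAYOUT)
```

Character variables are left as strings so that leading zeros in codes are preserved; numeric variables are converted so that they can be used in calculations.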

6.2 Data dictionary

The data dictionary presents the complete information about each survey variable on each of the three files. For each variable, it shows the variable name, the description or definition, code lists with descriptions or alternatively the range of values that the variable can take on, the variable type, its length (or format), and the population to which the variable pertains, i.e. for whom it is applicable.

6.3 Univariate distributions

These distributions are provided to allow users of the public use microdata file to verify totals that they produce. These distributions relate to the public use files and not to the internal database; the distributions will be similar but not identical. To compare the public use file to the internal database, please see Appendices A and B at the end of this user guide.

For character variables, the weighted and unweighted frequencies for each code, including reserved codes, are produced. For numeric variables, the values are broken into several ranges and weighted and unweighted frequencies are provided for each range. The minimum value, the maximum value and the weighted mean (excluding reserved codes) are also provided.
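To check totals against these distributions, a user might compute the same statistics from the microdata. The sketch below is illustrative only: the function names, sample values and weights are hypothetical, and it assumes that the minimum and maximum, like the weighted mean, exclude reserved codes.

```python
from collections import defaultdict


def frequencies(codes, weights):
    """Unweighted and weighted frequencies for each code, reserved codes included."""
    unweighted = defaultdict(int)
    weighted = defaultdict(float)
    for code, weight in zip(codes, weights):
        unweighted[code] += 1
        weighted[code] += weight
    return dict(unweighted), dict(weighted)


def numeric_summary(values, weights, reserved):
    """Minimum, maximum and weighted mean, with reserved codes excluded."""
    kept = [(v, w) for v, w in zip(values, weights) if v not in reserved]
    if not kept:
        raise ValueError("no valid (non-reserved) values to summarize")
    total_weight = sum(w for _, w in kept)
    return {
        "min": min(v for v, _ in kept),
        "max": max(v for v, _ in kept),
        "weighted_mean": sum(v * w for v, w in kept) / total_weight,
    }


# Hypothetical character variable with reserved code "9" (not applicable).
unweighted, weighted = frequencies(
    ["1", "2", "1", "9"], [100.0, 250.0, 150.0, 50.0]
)

# Hypothetical numeric variable of format 2.0, reserved codes 97 and 99.
summary = numeric_summary(
    [10, 20, 99, 30], [1.0, 2.0, 5.0, 1.0], reserved={97, 99}
)
```

In this example the value 99 carries the largest weight; leaving it in would raise the weighted mean from 20.0 to about 63.9, which is the kind of inflation that signals a reserved code was not excluded.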

6.4 Reserved codes

It is important to account for reserved codes in any analysis, particularly with numeric variables. If your calculation of means or aggregates seems too high, check to ensure that you have excluded reserved codes from the calculation. With only a few exceptions, the reserved codes are the highest values permitted according to the length of the variable. A brief explanation of reserved codes is provided below.

7, 97, 9.7, etc. : Don't know / Not stated
(the respondent did not have an answer, or the value was rejected during processing without being replaced)
9, 99, 9.9, etc. : Not applicable
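
Because the reserved codes usually follow from the length of the variable, the codes shown above can be derived from the format. The following is a rough sketch under that assumption; the function name is hypothetical, and since the guide notes that a few variables are exceptions, the data dictionary remains the authority for any given variable.

```python
from decimal import Decimal


def reserved_codes(fmt: str) -> dict:
    """Derive the usual reserved codes from a format such as "2.0" or "4.1".

    Assumes the general pattern only (all 9s for not applicable, all 9s
    ending in 7 for don't know / not stated); exceptions are not handled.
    """
    width_text, _, decimals_text = fmt.partition(".")
    width = int(width_text)
    decimals = int(decimals_text) if decimals_text else 0
    # The decimal point, when present, takes up one position of the length.
    digits = width - 1 if decimals > 0 else width
    not_applicable = Decimal("9" * digits).scaleb(-decimals)
    dont_know = Decimal("9" * (digits - 1) + "7").scaleb(-decimals)
    return {"dont_know": dont_know, "not_applicable": not_applicable}


def is_reserved(value: Decimal, fmt: str) -> bool:
    """True if the value matches one of the derived reserved codes."""
    return value in reserved_codes(fmt).values()


codes_1_0 = reserved_codes("1.0")   # 7 and 9
codes_2_0 = reserved_codes("2.0")   # 97 and 99
codes_3_1 = reserved_codes("3.1")   # 9.7 and 9.9
```

Decimal is used rather than float so that a code such as 9.7 compares exactly with the value read from the file.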