Data quality
About Agriculture–Population linkage
An important benefit of conducting the Census of Agriculture at the same time as the Census of Population is that information from these sources can be linked by means of an automated matching process to create the Agriculture–Population Linkage database. This database contains selected variables from the Census of Agriculture and selected variables (such as education, occupation, etc.) included on the Census of Population long-form questionnaire. The Agriculture–Population Linkage database permits the cross-tabulation of socioeconomic characteristics of farm operators and their families (for example, the age, education and income of operators) with the agricultural characteristics of farm operations (for example, farm type, operating arrangement, farm area, total gross farm receipts, total farm business operating expenses and total farm capital).
The 2016 Agriculture–Population long-form linkage database follows the Agriculture–Population linkage databases initially created for the 1971 censuses, and also available for the 1981, 1986, 1991, 1996, 2001 and 2006 censuses. A similar database was created in 2011 but linked the Census of Agriculture with information from the National Household Survey. This database is referred to as the Agriculture-National Household Survey Linkage database. The 2016 database targets farm operators and their families who were identified on the 2016 Census of Agriculture except those residing in Canada's three territories or in collective dwellings.
Because the Agriculture–Population Linkage database is an amalgamation of information from two data sources, users are encouraged to refer to the reference material from the Census of Population and the Census of Agriculture for further information on the data collection, processing and dissemination methods used.
New for 2016
The Agriculture–Population Linkage database is an amalgamation of information from two data sources. Until 2006, the population information came from the Census of Population long form, a mandatory questionnaire distributed to 20% of Canadian households. In 2011, the population data source was the voluntary National Household Survey, which was distributed to approximately 33% of Canadian households. In 2016, the household data source is once again the mandatory Census of Population long form, which was distributed to 25% of Canadian households. As in 2011, but unlike the previous Agriculture–Population Linkage databases, residents of collective dwellings were not eligible to receive the 2016 Census of Population long form and thus are not represented in the Agriculture–Population Linkage database.
The methodology used to generate the weights assigned to each record was changed for the 2016 database. These changes are described in more detail in the Sampling and Weighting section of this document.
In 2016, estimates of variance are calculated for the first time for the point estimates produced from the database. This provides users with additional information related to the precision of the estimates. This is described in more detail in the Quality and Data Suppression section of this document.
Users should be aware of these changes when doing comparisons of results between the 2016 Agriculture–Population Linkage database and the 2011 Agriculture-National Household Survey Linkage database or previous Agriculture–Population Linkage databases.
Sources of Error
In a sample survey like the Census of Population long form, two types of error can occur: sampling errors and non-sampling errors. In a census like the Census of Agriculture, only non-sampling errors exist.
Sampling error arises from estimating a population characteristic by measuring only a portion of the population rather than the entire population. The error can be controlled by the sample size, sample design and the method of estimation.
Non-sampling errors are errors that are unrelated to sampling. They can include errors in the frame from which the sample is drawn, inadequate collection tools, survey non-response and errors in data capture, editing, coding and other processing steps. During the planning stages, steps were implemented to reduce non-sampling error through questionnaire testing, interviewer training, quality control of data capture and coding as well as many other approaches.
Automated matching process
The fundamentals of the Agriculture–Population automated matching process are simple. A farm operator completes a Census of Agriculture questionnaire as well as a Census of Population questionnaire. The operator may also be selected to complete a Census of Population long-form questionnaire, distributed to approximately one quarter of all households. Data from the Census of Agriculture and Census of Population are linked using information which is common to both questionnaires such as name, sex, birth date and address. Using the link which already exists between the Census of Population and Census of Population long-form questionnaires, the Agriculture–Population Linkage database can be formed. The 1991 to 2016 Censuses of Agriculture allowed respondents to report up to three operators per farm, and all farm operators were included in the matching process. With this additional information, the relationship between family members living in the same household and operating the same farm can be analyzed. As well, operators in different households operating the same farm can be included in the analysis.
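The matching idea described above can be sketched as an exact-match join on the fields common to both questionnaires. This is only an illustration under stated assumptions: the record layout, the field names (`name`, `sex`, `birth_date`, `address`, `operator_id`, `person_id`) and the normalization rules are hypothetical, and the actual automated matching process is not documented here and may well use more tolerant comparisons.

```python
from datetime import date


def match_key(record):
    """Build a normalized key from the fields common to both questionnaires.

    The normalization (trimming, case folding, collapsing spaces) is an
    assumption made for this sketch.
    """
    return (
        record["name"].strip().casefold(),
        record["sex"].strip().upper(),
        record["birth_date"],
        " ".join(record["address"].split()).casefold(),
    )


def link_operators(agriculture_records, population_records):
    """Return (operator_id, person_id) pairs whose keys agree exactly."""
    population_index = {match_key(p): p["person_id"] for p in population_records}
    links = []
    for operator in agriculture_records:
        person_id = population_index.get(match_key(operator))
        if person_id is not None:
            links.append((operator["operator_id"], person_id))
    return links


# Hypothetical records, invented for illustration only.
operators = [
    {"operator_id": "A1", "name": "Jane Doe ", "sex": "f",
     "birth_date": date(1970, 5, 1), "address": "12  Rural Road"},
    {"operator_id": "A2", "name": "John Roe", "sex": "M",
     "birth_date": date(1965, 2, 3), "address": "99 Concession 4"},
]
persons = [
    {"person_id": "P7", "name": "jane doe", "sex": "F",
     "birth_date": date(1970, 5, 1), "address": "12 rural road"},
]

links = link_operators(operators, persons)
```

In this sketch operator `A2` has no counterpart and stays unlinked, which corresponds to the unsuccessful-linkage cases that have to be handled by imputation or weight adjustment.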
In some cases, the linkage between the Census of Agriculture operator and the Census of Population long-form household was unsuccessful due to non-response to the Census of Population long form, incomplete operator information on the Census of Agriculture or other factors. In the case of missing Census of Population long-form data, either the long-form data was imputed on the Agriculture–Population Linkage database with the information from a similar responding household from the Census of Population or the weights were adjusted to account for the unlinked records. For the other situations where linkage was unsuccessful, only imputation of the long-form data took place.
Response and Imputation Rates
The tables below present the weighted response rate for the entire Census of Population long form and the weighted imputation rate for the Agriculture–Population Linkage database population in 2016.
| Provinces | Census of Population long-form weighted response rate (%) |
|---|---|
| Canada | 95.9 |
| Newfoundland and Labrador | 95.1 |
| Prince Edward Island | 96.3 |
| Nova Scotia | 96.1 |
| New Brunswick | 96.2 |
| Quebec | 96.6 |
| Ontario | 96.3 |
| Manitoba | 95.8 |
| Saskatchewan | 95.1 |
| Alberta | 94.8 |
| British Columbia | 94.6 |
There is non-response bias when a survey's non-respondents are different from its respondents. In that case, the higher a survey's non-response is, the greater the risk of non-response bias. The quality of the estimates can be affected if such a bias is present. The 2016 Census of Population long-form weighted response rate of 95.9% is much higher than the rate for the 2011 National Household Survey which was 77.2%.
| Provinces | Agriculture–Population Linkage database weighted imputation rate (%) |
|---|---|
| Canada | 0.9 |
| Newfoundland and Labrador | 5.5 |
| Prince Edward Island | 2.0 |
| Nova Scotia | 1.0 |
| New Brunswick | 1.6 |
| Quebec | 1.2 |
| Ontario | 0.7 |
| Manitoba | 0.8 |
| Saskatchewan | 0.9 |
| Alberta | 0.9 |
| British Columbia | 0.9 |
The Agriculture–Population Linkage database weighted imputation rate represents the weighted percentage of households in the database for which the Census of Population long-form data was imputed. The Agriculture–Population Linkage database weighted imputation rates are very low, reducing the risk of bias due to imputation of data.
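As a rough illustration of the definition above, a weighted imputation rate can be computed as the weighted share of households whose long-form data was imputed. The field names `weight` and `imputed` and the sample values are hypothetical; this is a sketch of the arithmetic, not the production calculation.

```python
def weighted_imputation_rate(households):
    """Weighted percentage of households whose long-form data was imputed."""
    total_weight = sum(h["weight"] for h in households)
    imputed_weight = sum(h["weight"] for h in households if h["imputed"])
    return 100.0 * imputed_weight / total_weight


# Hypothetical households, invented for illustration only.
sample_households = [
    {"weight": 4.0, "imputed": False},
    {"weight": 4.0, "imputed": False},
    {"weight": 2.0, "imputed": True},
]

rate = weighted_imputation_rate(sample_households)  # 2.0 / 10.0 of the weight
```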
Sampling and weighting
The Agriculture–Population Linkage database contains agricultural data (farm operations and farm operators) and population data (person, household, census family and economic family). Because only a sample of the Canadian households was selected to receive the Census of Population long form, weights were assigned to the records on the Agriculture–Population Linkage database in order to represent the entire farming population. Weights have been calculated at the farm level, person level, household level, census family level and economic family level.
The weights were calculated independently within each province. An initial weight was obtained for each household from the Census of Population for most records (Note 1), based on the number of households in a sub-provincial area and the number that responded to the Census of Population long form. Then characteristics referred to as "constraints" were identified. These were agricultural and population characteristics of primary importance to data users which were fully enumerated on either the Census of Population or Census of Agriculture. A number of changes were made to the set of constraints in 2016, including the introduction of counts by household total income categories as a constraint for the first time.
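One simple way to read the initial weight is as the ratio of households in an area to responding households in that area. The text says only that the weight is "based on" these two counts, so the exact formula below is an assumption, and the function name and figures are hypothetical.

```python
def initial_household_weight(households_in_area, responding_households):
    """Assumed form: each respondent represents N / n households in its area."""
    if responding_households <= 0:
        raise ValueError("at least one responding household is required")
    return households_in_area / responding_households


# Hypothetical area: 400 households, 100 long-form respondents.
example_weight = initial_household_weight(400, 100)
```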
The manner in which these constraints were used was changed in 2016. Previously, one set of basic weights was created for the Agriculture–Population Linkage database which attempted to respect all of the constraints within the province. Adjustment factors were then applied to these basic weights to produce a total of nine weights. The weight to be used for an analysis depended upon the type of analysis being undertaken. In 2016, rather than create one basic weight, there were five basic weights created (one each at the farm level, person level, household level, census family level and economic family level). Only constraints associated with the level of the basic weight were used in calculating the basic weight. For example, only household level constraints were used to calculate the household level basic weight. For each province, a method known as generalized regression ensured that the Agriculture–Population database estimates of these constraints would match the known population counts at the provincial level. From these five basic weights, a total of six weights were produced. Once again, the weight to be used depended on the type of analysis.
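The following is a minimal sketch of linear (generalized regression) calibration, assuming the textbook form in which each initial weight is multiplied by an adjustment factor so that weighted totals of the constraint variables equal known totals. The constraint variables, totals and function names are hypothetical, and the production method may differ in details such as bounds on the adjustment factors.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with pivoting.

    Assumes the system is non-singular (constraints not collinear).
    """
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def greg_calibrate(initial_weights, constraints, known_totals):
    """Return weights w_i = d_i * (1 + x_i . lam) that hit the known totals."""
    k = len(known_totals)
    pairs = list(zip(initial_weights, constraints))
    gram = [[sum(d * x[a] * x[b] for d, x in pairs) for b in range(k)]
            for a in range(k)]
    shortfall = [known_totals[a] - sum(d * x[a] for d, x in pairs)
                 for a in range(k)]
    lam = solve_linear_system(gram, shortfall)
    return [d * (1.0 + sum(l * xa for l, xa in zip(lam, x))) for d, x in pairs]


# Hypothetical household-level example: each row is
# [1 (household count), 1 if in a given income category else 0].
initial = [4.0, 4.0, 4.0, 4.0]
x_rows = [[1, 1], [1, 1], [1, 0], [1, 0]]
totals = [18.0, 10.0]  # assumed known provincial counts

calibrated = greg_calibrate(initial, x_rows, totals)
```

With these inputs the initial weights estimate 16 households and 8 in the income category; after calibration the weighted totals reproduce the assumed known counts of 18 and 10.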
For any given geographic area, the weighted population, household, family or farm totals or subtotals may differ from similar estimates presented in previous Census of Agriculture data releases. This is because the Census of Agriculture collected data from all farming operations whereas the estimates from the Agriculture–Population Linkage database come from a sample. The discrepancies for any variables highly correlated with at least one of the variables used to define a constraint should, in general, be quite small. For other variables, discrepancies will depend on the relationship with the variable used to define a constraint, and could be large if no relationship exists. Estimates from the Agriculture–Population Linkage database may also differ from those from the Census of Agriculture because collective dwellings (and the farms associated with these dwellings) are not included in the Agriculture–Population Linkage database.
Quality and Data suppression
Results from the Agriculture–Population Linkage database may be suppressed for two reasons: (1) to protect the confidentiality of individual respondent data and (2) to limit the dissemination of data of poor quality (referred to below as data quality).
Confidentiality of respondent data is controlled through two sets of rules. Random rounding transforms all estimates of counts to random rounded counts at a base 5 level. Employing this technique, all figures in each table, including totals, are randomly rounded either up or down to a multiple of 5. While providing protection against disclosure, this procedure does not add significant error to the data. The random rounding algorithm uses a random seed value to initiate the rounding pattern for tables. In these routines, the method used to seed the pattern can result in the same count in the same table being rounded up in one execution and rounded down in the next.
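Random rounding to base 5 can be sketched as follows. The rounding probabilities are not stated in the text; this sketch assumes the common unbiased scheme in which a count is rounded up with probability equal to its remainder divided by 5. The function name and the use of Python's `random.Random` as the seeded generator are likewise assumptions.

```python
import random


def random_round(count, rng, base=5):
    """Randomly round a non-negative count up or down to a multiple of base."""
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base
    lower = count - remainder
    # Assumed rule: round up with probability remainder / base, so the
    # expected value of the rounded count equals the original count.
    if rng.random() < remainder / base:
        return lower + base
    return lower


# The same count can round differently when the seed changes.
outcomes = {random_round(13, random.Random(seed)) for seed in range(200)}
```

Because the result depends on the seed, the same cell can appear as 10 in one run and 15 in another, which matches the behaviour described above.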
There are some variables such as those related to income, which can have highly variable responses and which have a higher risk of revealing information about an individual respondent when certain statistics such as averages are calculated. For statistics based on these variables, a more complex confidentiality review is undertaken which considers the number of contributors to an estimate and whether the estimate is influenced by any single contributor to such a degree that this contributor's (or other contributors') approximate response value could be derived with a high degree of confidence. If this analysis identifies an estimate for which data confidentiality is at risk, the estimate is not published and is replaced by a zero in the publication table.
In 2016, for the first time, the data quality of the estimates from the Agriculture–Population Linkage database is described using the coefficient of variation (CV). The CV of an estimate is the ratio of its estimated standard error to the estimate itself, expressed as a percentage. The lower the CV, the more precise the estimate. The CV is a useful measure of variability because it does not depend on the estimate's unit of measure, which makes it possible to compare the precision of estimates that have different units of measure.
To ease interpretation, in the publication tables the CV value is replaced with a letter indicator. More specifically, those estimates with a CV between 25% and 50% are accompanied by a superscript E to indicate that the user should use caution when interpreting these results due to a moderate level of variability associated with them. Those estimates which have a CV of greater than 50% are not published due to data quality concerns and are replaced in the tables by a letter F.
| Letter | Coefficient of variation (CV) range | Data Quality Interpretation |
|---|---|---|
| [Blank] | 0 to 25% | Acceptable or better |
| E (superscript) | 25% to 50% | Use with caution |
| F | Over 50% | Too unreliable to be published |
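The CV calculation and the letter indicators in the table above can be sketched as below. The function names are hypothetical, and the treatment of values exactly equal to 25% or 50% is an assumption, since the published ranges share their endpoints.

```python
def coefficient_of_variation(estimate, standard_error):
    """CV as a percentage: estimated standard error divided by the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100.0 * standard_error / abs(estimate)


def quality_indicator(cv_percent):
    """Map a CV (in percent) to the letter shown in the publication tables."""
    if cv_percent > 50.0:
        return "F"  # too unreliable to be published
    if cv_percent > 25.0:
        return "E"  # use with caution
    return ""       # acceptable or better (assumed to include exactly 25%)


# Hypothetical estimate of 200 with three different standard errors.
flags = [quality_indicator(coefficient_of_variation(200.0, se))
         for se in (30.0, 80.0, 120.0)]
```

Here the three CVs are 15%, 40% and 60%, which correspond to a blank indicator, E and F respectively.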
Note