Data quality, concepts and methodology: Methodology and data quality
This section provides an overview of the underlying methodology of the Households and the Environment Survey (HES) Energy Use supplement, as well as key aspects of the data quality. It will also provide an understanding of the strengths and limitations of the data. The information may be of particular relevance when making comparisons with data from other surveys or sources of information and when drawing conclusions about the data.
The reference period of the HES Energy Use supplement is the 2007 calendar year and collection was conducted between the months of November 2007 and April 2008. Some questions asked the respondent to respond with respect to "winter", "summer", "heating season" or "past 12 months", while some others asked with respect to 2007.
Energy consumption data were collected for the fourteen months prior to the date on which a household completed the survey, and were then processed to reflect the 2007 calendar year.
The target population consisted of households in Canada excluding households located in Yukon, Northwest Territories and Nunavut, households located on Indian reserves or Crown lands, and households consisting entirely of full-time members of the Canadian Armed Forces. Institutions and households of certain remote regions were also excluded.
The objectives of the Energy Use Supplement were to collect data on the energy use characteristics and energy consumption for occupied dwellings in Canada. The energy use information, coupled with energy consumption data obtained from respondents' energy bills or obtained directly from energy suppliers can be used to assess the effectiveness of energy efficiency programs. The survey content also covers the following themes:
- dwelling characteristics;
- household appliances;
- electrical devices; and
- heating and air conditioning.
Statistics Canada designed the questionnaire in collaboration with Natural Resources Canada and in accordance with standard practices. Content was developed considering the data needs of both the project and the larger research and policy communities. The questionnaire was tested by Statistics Canada's Questionnaire Design Research Centre (QDRC). Focus group sessions for both the "owner/tenant" and "landlord" versions of the questionnaire were conducted in both English and French in Ottawa in February and March 2007.
The Households and the Environment Survey - Energy Use is a supplement to the Households and the Environment Survey (HES). The HES was administered from October 2007 to February 2008 to a sub-sample of the dwellings that were part of the Canadian Community Health Survey (CCHS) Cycle 4.1 between January 1st and June 30th, 2007. Therefore the HES sample design is closely tied to that of the CCHS. All HES respondents were sent a paper questionnaire for the Energy Use supplemental survey.
The following table shows the number of responding dwellings for the 2007 HES – Energy Use supplement.
Respondents were first contacted between the months of January and June 2007 and asked to complete the Canadian Community Health Survey, Cycle 4.1. They were then surveyed for the telephone portion of the HES between the months of October 2007 and February 2008. Finally, households responding to the telephone portion of the HES were asked to complete a paper questionnaire on energy use. Occupants that were not responsible for the payment of energy bills (mostly renters) were asked to provide the name, address and telephone number of the landlord or property manager in order to collect the necessary information on energy use and also characteristics of the building that the occupant could not provide. An option to complete the survey on-line was available to the respondents and 4% of all respondents chose this mode of collection. Data collection for the HES - Energy Use supplement was carried out between November 2007 and April 2008.
The last step of the survey was to establish contact with the energy suppliers. Residential energy consumption for 2007 was collected directly from the suppliers in cases where the account holder had given their consent to do so.
The data were captured using imaging and automated data entry technology. A small proportion of questionnaires, those that could not be read by the optical scanners, were captured using heads-down keying by experienced operators. The questionable zones method, together with standard quality control measures, was used to verify the error rate of the capture operations. For the HES, based on the quality control sample that was selected, it was determined that the overall data capture error rate did not exceed 0.1%.
The first type of error treated was related to the flow of the questionnaire, where questions which did not apply to the respondent (and should therefore not have been answered) were found to contain answers. In this case a computer edit automatically eliminated superfluous data by following the flow of the questionnaire implied by answers to previous, and in some cases, subsequent questions.
The second type of error treated involved a lack of information in questions which should have been answered. For this type of error, a non-response or "not-stated" code was assigned to the item.
The third type of error treated involved the identification of incoherent entries based on the logical relationships between certain questions.
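The first two edit types described above can be illustrated with a minimal sketch. It assumes a record is a dictionary of answers and that the questionnaire flow is expressed as simple skip rules; the question names, the rule format and the `NOT_STATED` label are hypothetical and are not taken from the actual HES processing system. The third edit type (consistency checks) is not shown.

```python
NOT_STATED = "not stated"  # hypothetical label for the non-response code


def edit_record(record, skip_rules):
    """Apply a flow edit, then assign a not-stated code to unanswered items.

    record:     dict mapping question name -> answer (None if unanswered)
    skip_rules: list of (gate_question, gate_value, dependent_questions);
                when the gate question has the gate value, the dependent
                questions did not apply and should hold no answer.
    """
    cleaned = dict(record)
    skipped = set()

    # Flow edit: remove superfluous answers to questions that did not apply.
    for gate_question, gate_value, dependent_questions in skip_rules:
        if cleaned.get(gate_question) == gate_value:
            for question in dependent_questions:
                cleaned[question] = None
                skipped.add(question)

    # Missing information: flag applicable questions left unanswered.
    for question, answer in list(cleaned.items()):
        if question not in skipped and answer is None:
            cleaned[question] = NOT_STATED

    return cleaned


# Illustrative record: the respondent reports no air conditioner but still
# answered the follow-up question, and left another question blank.
example = edit_record(
    {"has_air_conditioner": "no", "ac_type": "central", "num_refrigerators": None},
    [("has_air_conditioner", "no", ["ac_type"])],
)
```

In this example the superfluous `ac_type` answer is removed and `num_refrigerators` receives the not-stated code.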
Coding of open-ended questions
A few data items on the questionnaire were reported in an open-ended format. These questions required coding for inclusion on the data file. The open-ended questions related to responses to "other" categories throughout the questionnaire.
Imputation is the process that supplies valid values for those variables that have been identified for a change either because of invalid information or because of missing information. The new values are supplied in such a way as to preserve the underlying structure of the data and to ensure that the resulting records will pass all required edits. In other words, the objective is not to reproduce the true microdata values, but rather to establish internally consistent data records that yield good aggregate estimates.
There are three types of non-response. Complete non-response is when the respondent does not provide the minimum set of answers. These records are dropped and accounted for in the weighting process. Item non-response is when the respondent does not provide an answer to one question, but goes on to the next question. These are usually handled using the "not stated" code or are imputed. Finally, partial non-response is when the respondent provides the minimum set of answers but does not finish the interview. These records can be handled as either complete non-response or multiple item non-response.
In the case of the HES - Energy Use supplement, donor imputation was used to fill in missing data for some item non-response and partial non-response.
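As an illustration of donor imputation in general, the sketch below fills a missing value by copying the value of a randomly chosen donor from the same imputation class. The choice of classes, the random selection of donors and all variable names are assumptions made for this example; the source does not describe how donors were selected for the HES - Energy Use supplement.

```python
import random


def donor_impute(records, target, class_keys, seed=0):
    """Fill missing `target` values from donors in the same imputation class.

    records:    list of dicts; a missing value is represented by None
    target:     name of the variable to impute
    class_keys: variables that define the imputation class (assumed)
    """
    rng = random.Random(seed)

    # Build the pool of donor values for each imputation class.
    donors = {}
    for rec in records:
        if rec.get(target) is not None:
            key = tuple(rec[k] for k in class_keys)
            donors.setdefault(key, []).append(rec[target])

    imputed = []
    for rec in records:
        out = dict(rec)  # leave the input records unchanged
        if out.get(target) is None:
            pool = donors.get(tuple(out[k] for k in class_keys))
            if pool:  # impute only when at least one donor exists
                out[target] = rng.choice(pool)
                out[target + "_imputed"] = True
        imputed.append(out)
    return imputed
```

A record with no donor in its class keeps its missing value, which would then need another treatment, such as a broader imputation class or a "not stated" code.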
Weighting and estimation
The principle behind estimation in a probability sample is that each unit in the sample "represents", besides itself, several other units not in the sample.
The weighting phase is a step which calculates, for each record, what this number is. This weight appears on the microdata file, and must be used to derive meaningful estimates from the survey.
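To show why the weight must be used, the following minimal sketch computes a weighted proportion from microdata records. The field names and the example values are hypothetical.

```python
def weighted_proportion(records, weight_key, has_characteristic):
    """Estimate the proportion of households with a characteristic.

    Each record counts for as many households as its weight indicates.
    """
    total_weight = sum(r[weight_key] for r in records)
    weight_with = sum(r[weight_key] for r in records if has_characteristic(r))
    return weight_with / total_weight


# Two sampled households representing 300 and 100 households respectively.
sample = [
    {"weight": 300.0, "forced_air_furnace": True},
    {"weight": 100.0, "forced_air_furnace": False},
]
share = weighted_proportion(sample, "weight", lambda r: r["forced_air_furnace"])
```

Here the unweighted share would be one half, while the weighted estimate is three quarters, because the first household represents three times as many households as the second.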
The initial sampling weight was provided to the Households and the Environment Survey by the CCHS and incorporated the probability of selecting the unit in their sample, as well as other adjustments such as the treatment of non-response to the CCHS.
In order to produce the HES Energy Use supplement weights, adjustments to the HES weights were made to account for non-response to the HES Energy Use supplement.
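One common form of non-response adjustment redistributes the weight of non-respondents to respondents within the same adjustment class. The sketch below illustrates that general technique only; the adjustment classes, field names and the method itself are assumptions, since the source does not specify how the HES weights were adjusted.

```python
from collections import defaultdict


def adjust_for_nonresponse(units):
    """Inflate respondent weights so each class keeps its total weight.

    units: list of dicts with keys "adj_class", "hes_weight", "responded"
           (hypothetical field names).
    Returns the responding units with an added "supplement_weight".
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for u in units:
        total[u["adj_class"]] += u["hes_weight"]
        if u["responded"]:
            responding[u["adj_class"]] += u["hes_weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            # Ratio of all sampled weight to responding weight in the class.
            factor = total[u["adj_class"]] / responding[u["adj_class"]]
            adjusted.append({**u, "supplement_weight": u["hes_weight"] * factor})
    return adjusted
```

After the adjustment, the sum of the respondents' weights in each class equals the sum of the original weights of all units in that class.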
The accuracy of the estimates was assessed using the ratio of the standard error of the survey estimate to the average value of the estimate itself. This measure is called the coefficient of variation (CV). This relative measure of sampling error is usually expressed as a percentage (for example, 10% rather than 0.1).
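The definition above translates directly into a short calculation. The function name and the example figures are illustrative only.

```python
def cv_percent(estimate, standard_error):
    """Coefficient of variation: standard error relative to the estimate, in %."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100.0 * standard_error / abs(estimate)


# An estimate of 40.0 with a standard error of 2.0 has a CV of 5%.
example_cv = cv_percent(40.0, 2.0)
```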
Given the complexity of the HES multi-stage survey design and calibration, there is no simple formula that can be used to calculate variance estimates. Therefore, an approximate method was needed. The bootstrap method is used because the sample design and calibration need to be taken into account when calculating variance estimates.
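A bootstrap variance estimate is often computed from a set of replicate weights: the estimate is recalculated with each replicate, and the variance is the average squared deviation of the replicate estimates from the full-sample estimate. The sketch below assumes such replicate weights are available and uses a weighted mean as the estimator; both are assumptions made for illustration, not a description of the HES production system.

```python
def weighted_mean(values, weights):
    """Weighted mean of `values` using survey `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_variance(values, full_weights, replicate_weights):
    """Variance estimate from bootstrap replicate weights.

    replicate_weights: list of weight vectors, one per bootstrap replicate,
                       each the same length as `values`.
    """
    theta = weighted_mean(values, full_weights)
    replicates = [weighted_mean(values, w) for w in replicate_weights]
    return sum((r - theta) ** 2 for r in replicates) / len(replicates)
```

The square root of this variance is the standard error used in the coefficient of variation. Because each replicate weight vector would already reflect the design and calibration, the estimator itself can stay simple.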
Data were compared to similar HES or Survey of Household Energy Use (SHEU) data from previous surveys to ensure consistency. Household energy use data were also compared to residential energy use data from Manufacturing and Energy Division. Explanations were found for any significant differences. Subject-matter experts validated the data against other sources and identified and researched any values that were not consistent with others in the same domain.
Statistics Canada is prohibited by law from releasing any data that would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
The coverage error of the CCHS, of which the HES is a subsample, is estimated at less than 2%.
Response rates and sampling error
The response rate for this survey was 51.8%. Provincial response rates ranged from 46.2% to 57.2%.
The results estimated from the HES Energy Use Supplement are based on a sample of households in Canada. Asking the same questions of all Canadian households would yield somewhat different results. The extent of this sampling error is quantified by the coefficient of variation (CV), with the following guidelines:
- 16.5% and below: acceptable estimate;
- 16.6% to 33.3%: marginal estimate requiring cautionary note to users; and
- more than 33.3%: unacceptable estimate.
Estimates that do not meet an acceptable level of quality are either flagged for caution or suppressed. CV tables are prepared by Statistics Canada and made available to help users understand the quality of individual estimates.
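The CV guidelines listed above can be expressed as a small classification function. The sketch assumes the CV is given as a percentage rounded to one decimal place, so that no value falls between 16.5 and 16.6; the function name and labels are illustrative.

```python
def cv_quality(cv):
    """Classify an estimate by its CV (in percent, one decimal place)."""
    if cv <= 16.5:
        return "acceptable"
    if cv <= 33.3:
        return "marginal"  # requires a cautionary note to users
    return "unacceptable"


labels = [cv_quality(cv) for cv in (4.2, 16.5, 16.6, 33.3, 40.0)]
```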
For example, CVs for the estimated proportion of households that had a forced air furnace in 2007 for Canada and the provinces are as follows:
Data comparability to the Households and the Environment Survey
Some data that were collected through the Households and the Environment Survey were included in the Energy Use supplement and are included in this report. However, a household's response was only included in this report if it had also completed the HES Energy Use supplement paper or internet questionnaire. For this reason, the HES had a larger sample than the HES Energy Use supplement. Estimates may therefore differ slightly.
Data presented in this report on some energy-saving practices (for example, programmable thermostat use) collected during the HES Energy Use supplement may differ slightly from data presented in the 2007 Households and the Environment report, 11-526-X, released February 10, 2009.
Data comparability to Natural Resources Canada's Survey of Household Energy Use
Natural Resources Canada's Survey of Household Energy Use is based on those respondents to the HES Energy Use Supplement who did not refuse to share their responses with Natural Resources Canada. As not all respondents agreed to share their responses, there may be some differences between the results of the HES Energy Use Supplement and the Survey of Household Energy Use.
Data comparability over time
Many of the questions included on the 2007 HES - Energy Use supplement were previously included in the 2003 Survey of Household Energy Use (SHEU). Summary and detailed data tables are available by contacting Natural Resources Canada or at http://www.oee.nrcan.gc.ca/Publications/statistics/sheu03/index.cfm?attr=0.
For the 2007 version of the survey, total energy consumption data included electricity, natural gas, heating oil, propane and wood or wood pellets. The 2003 SHEU did not include energy consumption from wood or wood pellets. Therefore the total and average energy consumption data presented in this report cannot be directly compared to data previously released through SHEU.