Data quality, concepts and methodology: Deriving the estimates


Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please "contact us" to request a format other than those available.

Statistical methodology

The overall estimates are derived from two components: a sampled portion and a non-sampled (take-none) portion. A sample survey using a questionnaire (the Quarterly Survey of Financial Statements) is conducted for larger businesses above a prescribed size. Sample results are multiplied by a weighting factor so that they represent the universe from which the sample was drawn; the sampling weight is based on the probability of the unit being selected in the sample.

For businesses below the sampling threshold, the take-none estimate is derived by applying the quarter-to-quarter movement of sample responses to annual data compiled from Canada Revenue Agency financial statements for the non-sampled portion of the business population. This model projects the value of the take-none portion at the most detailed industry aggregation, using estimates from the surveyed population and other parameters. The results are subsequently benchmarked to the Annual Financial and Taxation data when those data become available.

The relative contribution of the two components (survey and take-none model) to the final estimate varies significantly between industry aggregations. At the most detailed industry aggregation, the survey component accounts for between 5% and 100% of the population's revenue and assets.
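The two-part structure of the estimate can be sketched as follows. This is a simplified illustration under stated assumptions, not the production estimation system: the function and class names are hypothetical, the take-none base level is assumed to have already been derived from the annual CRA data for a base quarter, the movement is assumed to be measured on the same set of respondents in both quarters, and benchmarking to the Annual Financial and Taxation data is not modelled.

```python
from dataclasses import dataclass


@dataclass
class SampledUnit:
    weight: float  # sampling weight: inverse of the unit's selection probability
    value: float   # reported (or imputed) current-quarter value, e.g. operating revenue


def survey_total(units: list[SampledUnit]) -> float:
    """Expand sample responses to the sampled universe by weighting each unit."""
    return sum(u.weight * u.value for u in units)


def take_none_total(base_level: float, sample_base: float, sample_current: float) -> float:
    """Project the take-none portion by applying the sample's movement to a base level.

    base_level: hypothetical take-none level for a base quarter, assumed to be
        derived beforehand from annual CRA financial statement data.
    sample_base, sample_current: totals for the same respondents in the base
        quarter and in the current quarter.
    """
    return base_level * sample_current / sample_base


def overall_estimate(units, base_level, sample_base, sample_current):
    """Overall estimate = weighted survey component + modelled take-none component."""
    return survey_total(units) + take_none_total(base_level, sample_base, sample_current)


units = [SampledUnit(1.0, 100.0), SampledUnit(1.0, 50.0), SampledUnit(8.0, 10.0)]
print(overall_estimate(units, 40.0, 200.0, 210.0))  # 272.0
```

Here the survey component is 230 and the take-none component is 40 grown by the sample's 5% movement, giving 42; the figures are invented purely for illustration.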

(See Text tables 1 and 2 showing the percentage of assets and operating revenue represented by the take-none component.)

Specific industry detail of the take-none tables can be obtained at no charge by contacting Client Services (iofd-clientservicesunit@statcan.gc.ca).

Sample design and imputation

This is a sample survey with a cross-sectional design.

The frame used for sampling is Statistics Canada’s Business Register (BR). A stratified random sample is drawn from this frame based on the size of the unit. Non-financial industries are stratified by assets and revenue, while the finance and insurance industries are stratified by assets only.

The sample includes a take-all portion for the largest enterprises within an industry; these units are sampled with certainty. In addition, there are one or two take-some portions (depending on the industry) in which, on average, one out of eight units is sampled. Finally, there is a take-none portion from which no units are sampled; instead, an estimate is derived by applying the quarter-to-quarter movement of sample responses to annual data compiled from Canada Revenue Agency financial statements for the non-sampled portion of the business population.
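The take-all / take-some / take-none design can be illustrated with a minimal sketch. It assumes a single size measure, two hypothetical stratum boundaries (the real boundaries vary by industry and are available on request), one take-some stratum, and simple random sampling within that stratum at the average rate of one in eight. Each selected unit's weight is the inverse of its selection probability, so take-all units carry a weight of 1.

```python
import random

# Hypothetical boundaries on a single size measure (illustrative only).
TAKE_ALL_MIN = 500.0
TAKE_NONE_MAX = 5.0


def assign_stratum(size: float) -> str:
    """Classify a frame unit by size."""
    if size >= TAKE_ALL_MIN:
        return "take-all"
    if size < TAKE_NONE_MAX:
        return "take-none"
    return "take-some"


def draw_sample(frame: dict[str, float], take_some_rate: float = 1 / 8, seed: int = 0) -> dict[str, float]:
    """Return {unit_id: sampling weight} for the selected units.

    frame maps unit identifiers to their size on the frame.
    """
    rng = random.Random(seed)
    strata: dict[str, list[str]] = {"take-all": [], "take-some": [], "take-none": []}
    for unit, size in frame.items():
        strata[assign_stratum(size)].append(unit)

    # Take-all units are selected with certainty: probability 1, weight 1.
    weights = {u: 1.0 for u in strata["take-all"]}

    # Take-some units: simple random sample of n out of N, so each weight is N / n.
    some = strata["take-some"]
    if some:
        n = max(1, round(len(some) * take_some_rate))
        for u in rng.sample(some, n):
            weights[u] = len(some) / n

    # Take-none units are never selected; they are covered by the take-none model.
    return weights
```

With 16 units in the take-some stratum, two are selected and each represents eight units, which is the "one out of eight" rate described above.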

The total sample size is approximately 5,500 enterprises.

Stratum boundaries

The stratum boundaries for the take-all, take-some and take-none strata vary by industry aggregation. The boundaries are available upon request.

Imputation

Units that do not respond in the current period are imputed; that is, their characteristics are estimated. For units for which partial data have been collected, these partial data are used to estimate the missing data. For units for which no current data have been collected but historical data exist, the historical data are used to calculate current-period estimates, taking into account growth or decline over time. For units with neither current nor historical data, a donor imputation system is used: estimates are created from information reported by a similar-sized respondent.
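The three-step order of imputation can be sketched for a single variable (revenue). This is a hypothetical illustration, not the actual imputation system: the specific rules are assumptions chosen to show the order of precedence. Partial data are handled with a respondents' revenue-to-assets ratio, historical data with respondents' quarter-over-quarter growth, and the donor step scales the nearest-sized respondent's revenue by relative frame size. All respondents are assumed to have complete current and previous-quarter data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    size: float                           # frame (design) value from the Business Register
    revenue: Optional[float] = None       # current-quarter revenue; None if missing
    assets: Optional[float] = None        # current-quarter assets; None if missing
    prev_revenue: Optional[float] = None  # previous-quarter revenue; None if no history


def impute_revenue(unit: Unit, respondents: list[Unit]) -> float:
    """Return reported revenue, or an imputed value using a hypothetical cascade."""
    if unit.revenue is not None:
        return unit.revenue

    # 1. Partial data: assets were reported, so apply respondents' revenue/assets ratio.
    if unit.assets is not None:
        ratio = sum(r.revenue for r in respondents) / sum(r.assets for r in respondents)
        return unit.assets * ratio

    # 2. Historical data: carry last quarter forward using respondents' growth.
    if unit.prev_revenue is not None:
        growth = sum(r.revenue for r in respondents) / sum(r.prev_revenue for r in respondents)
        return unit.prev_revenue * growth

    # 3. No current or historical data: use the closest-sized respondent as donor.
    donor = min(respondents, key=lambda r: abs(r.size - unit.size))
    return donor.revenue * unit.size / donor.size
```

The ordering reflects the text: data reported by the unit itself are preferred, then its own history, and only then information borrowed from another unit.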

Seasonal adjustment

The seasonal component of a time series reflects sub-annual movements caused by characteristic weather changes, holidays or other factors which tend to recur every year at approximately the same time. The seasonal adjustment process attempts to quantify the seasonal component in a time series and to remove its effect from observed data.

The seasonal adjustment method used is a computerized ratio-to-moving-average method in widespread use at Statistics Canada. It is based on the U.S. Bureau of the Census Method II, with some additional features. Beginning with the first quarter of 2009, the Quarterly Financial Statistics for Enterprises series uses X-12-ARIMA for "end-point" seasonal adjustment, which recalculates the seasonal factors each quarter as more recent data become available.
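The core idea of a ratio-to-moving-average method can be shown with the classical textbook version for quarterly data. This sketch is only that basic idea, not Census Method II or X-12-ARIMA, which add, among other things, iterated trend filters, treatment of extreme values and (in X-12-ARIMA) ARIMA forecast extension of the series ends. It assumes a multiplicative seasonal pattern and a series that starts in the first quarter.

```python
def centred_ma_4(x: list[float]) -> list:
    """2x4 centred moving average (trend estimate); None for the first and last two quarters."""
    out = [None] * len(x)
    for t in range(2, len(x) - 2):
        out[t] = (0.5 * x[t - 2] + x[t - 1] + x[t] + x[t + 1] + 0.5 * x[t + 2]) / 4
    return out


def seasonal_factors(x: list[float]) -> list[float]:
    """Average ratio of each quarter to the trend, normalised so the four factors average 1."""
    if len(x) < 8:
        raise ValueError("need at least two years of quarterly data")
    trend = centred_ma_4(x)
    ratios = [[] for _ in range(4)]
    for t, m in enumerate(trend):
        if m is not None:
            ratios[t % 4].append(x[t] / m)  # assumes index 0 is a first quarter
    raw = [sum(r) / len(r) for r in ratios]
    mean = sum(raw) / 4
    return [f / mean for f in raw]


def seasonally_adjust(x: list[float]) -> list[float]:
    """Divide each observation by its quarter's seasonal factor."""
    factors = seasonal_factors(x)
    return [v / factors[t % 4] for t, v in enumerate(x)]
```

For a series with a flat level of 100 and seasonal factors 0.9, 1.1, 1.2 and 0.8, the method recovers those factors and the adjusted series is flat at 100.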

Series containing no significant seasonality have not been seasonally adjusted. In these cases, the unadjusted series are used in the place of seasonally adjusted data.

Data quality

There are two categories of errors in statistical information – sampling errors and non-sampling errors. Sampling errors are errors that arise because estimates are being prepared based on a sample of the universe rather than collecting information from all units in the universe. These errors can be measured.

Non-sampling errors can arise from a variety of sources and are much more difficult to measure. Non-sampling errors include errors in the information provided by respondents, data capture errors and other processing errors.

Sampling errors

Sample surveys are designed to provide the highest sampling efficiency (the smallest sample that will produce a sampling error of a given size). This optimization is usually performed for only a few variables, limited by the data items that are available at the time of sample design and selection, the resources available, and the complexity introduced by trying to optimize for many variables at one time. The sample used for these statistics was designed to produce a reasonable level of accuracy for assets and revenue. Consequently, other items may be less accurately estimated.

A measure of the sampling error is the standard error. It describes how much the estimate would vary across the many samples that could have been drawn under the same design, although in reality only one sample is drawn. Sampling variability can also be expressed relative to the estimate itself: the standard error as a percentage of the estimate is called the coefficient of variation (CV), or the relative standard error. Small CVs are desirable, since the smaller the CV, the smaller the sampling variability relative to the estimate.

The sample for the Quarterly Financial Statistics for Enterprises was designed so that the CV at the most detailed industry level of aggregation would be no more than 10% for operating revenue or total assets. The CV indicators are shown next to these variables in the tables according to the scale presented on page 2.
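To make the CV concrete, here is a minimal sketch using the standard formula for a stratified simple random sample without replacement, where the variance of an estimated total is the sum over strata of N²(1 − n/N)s²/n. This textbook estimator is an assumption for illustration; the survey's own variance estimation, which must also reflect imputation and the take-none model, is not described here. Take-all strata (n = N) contribute no sampling variance.

```python
import math


def stratified_total_and_se(strata: list[tuple[int, list[float]]]) -> tuple[float, float]:
    """Estimate a total and its standard error.

    strata: one (N_h, sample values) pair per stratum; take-some strata need n_h >= 2.
    """
    total = 0.0
    variance = 0.0
    for N, ys in strata:
        n = len(ys)
        mean = sum(ys) / n
        total += N * mean
        if n < N:  # take-all strata are fully enumerated: no sampling variance
            s2 = sum((y - mean) ** 2 for y in ys) / (n - 1)
            variance += N * N * (1 - n / N) * s2 / n
    return total, math.sqrt(variance)


def cv_percent(estimate: float, se: float) -> float:
    """Coefficient of variation: standard error as a percentage of the estimate."""
    return 100 * se / estimate


# Illustrative data: a take-all stratum of 2 units and a take-some stratum of 16 units (2 sampled).
total, se = stratified_total_and_se([(2, [500.0, 700.0]), (16, [10.0, 14.0])])
print(total, round(cv_percent(total, se), 2), cv_percent(total, se) <= 10)
```

The last value checks the estimate against the 10% design target mentioned above; the data are invented.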

Estimation errors in the non-sampled strata

The estimate for small businesses (take-none portion) is prepared by applying a statistical model to predict the value of the take-none portion of the population at the most detailed industry level using the estimates from the surveyed population and other parameters. The error introduced by this method depends on several factors, including the contribution of these strata to the overall estimate and the error in estimating the movement of the strata using sampled units and other external factors.

Other non-sampling errors

There are no objective measures of other non-sampling errors applied to these statistics. However, most reporting and data entry errors are corrected through the detailed computer capture and edit procedures applied to the data. This is particularly effective for financial data, where accounting relationships are established and balancing is required. In addition, most financial data collected are derived from audited financial statements, which results in minimal errors and inconsistencies. As well, the Quarterly Financial Statistics for Enterprises uses trained accounting staff to capture and analyze reported data, which reduces the frequency of non-sampling errors.

One source of non-sampling error is non-response error. Several measures can help the user evaluate this type of error, including the response rate and the data response rate.

The response rate (see Text table 3) measures the proportion of sample units that responded in time for inclusion in the estimate. It is calculated by dividing the number of actual responses by the total number of sampled units. For example, a sample with 20 active units of which 10 respond for a particular quarter would have a response rate of 50%.

Response rate is:

Figure 1: Response rate
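The unweighted response rate is a simple count ratio. A minimal sketch, assuming each active sampled unit is recorded as a boolean (responded in time or not), reproduces the example of 10 respondents out of 20 active units:

```python
def response_rate(responded: list[bool]) -> float:
    """Percentage of active sampled units that responded in time for the estimate."""
    return 100 * sum(responded) / len(responded)


print(response_rate([True] * 10 + [False] * 10))  # 50.0
```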

Specific industry detail of the response rate table can be obtained at no charge by contacting Client Services (iofd-clientservicesunit@statcan.gc.ca).

The data response rate is the proportion of the estimate which is based upon actual reported data. The data response rate can be calculated by dividing the design assets or revenue (or whatever variable is being analyzed) represented by the responding units by the corresponding value for the entire sample. In the previous example, if the 10 responding units have a design asset value totalling $15 billion out of a total sample asset value of $20 billion, the data response rate for assets would be 75%.

(Where the design value is a frame value for the record which is derived from administrative sources and is available for the entire population.)

Data response rate for assets is:

Figure 2: Data response rate for assets

(Where asset values are the design values.)
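The data response rate replaces the count of units with their design values. In this minimal sketch each sampled unit is an assumed (design value, responded) pair; the illustrative figures reproduce the example above, with 10 respondents holding $15 billion of the sample's $20 billion in design assets.

```python
def data_response_rate(units: list[tuple[float, bool]]) -> float:
    """Share (%) of the sample's total design value held by responding units."""
    responding = sum(value for value, responded in units if responded)
    total = sum(value for value, _ in units)
    return 100 * responding / total


# 10 respondents at $1.5B each and 10 non-respondents at $0.5B each (design assets, $ billions).
sample = [(1.5, True)] * 10 + [(0.5, False)] * 10
print(data_response_rate(sample))  # 75.0
```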

Weighted data response rates take into account that sampled units represent more than themselves through their weighting factors, so some units contribute more to the estimates than others. The weighted data response rate is calculated by dividing the respondents’ weighted frame value by the weighted frame value of the entire sample, for assets or revenue (or whatever variable is being analyzed).

In the previous example, if the weighted asset value of the responding 10 units is $40 billion out of a total sample weighted asset value of $50 billion, the weighted asset response rate would be 80%.

Weighted data response rate for assets is:

Figure 3: Weighted data response rate for assets

(Where the weighted assets of a respondent are defined as the weighting factor multiplied by the design assets value.)
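The weighted version multiplies each unit's design value by its weighting factor before summing. In this minimal sketch each unit is an assumed (weight, design value, responded) triple; the three units are invented only so that the weighted totals match the example's $40 billion responding out of $50 billion.

```python
def weighted_data_response_rate(units: list[tuple[float, float, bool]]) -> float:
    """Share (%) of the sample's weighted design value held by responding units."""
    responding = sum(w * v for w, v, responded in units if responded)
    total = sum(w * v for w, v, _ in units)
    return 100 * responding / total


# Weighted assets ($ billions): respondents 1*10 + 6*5 = 40; non-respondent 2*5 = 10.
sample = [(1.0, 10.0, True), (6.0, 5.0, True), (2.0, 5.0, False)]
print(weighted_data_response_rate(sample))  # 80.0
```

Because a high-weight unit stands in for many unsampled units, its non-response lowers the weighted rate more than the unweighted data response rate.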

Limitations of the data

To be valid for time-series or cross-sectional analysis, data definitions must be consistent across time periods (for time-series analysis) and across units within a time period (for cross-sectional analysis). In other words, differences and similarities in the data must reflect only real differences, not differences in the concepts or definitions used to prepare the data. Whether the data can be used for a given analysis depends on the conceptual framework in which they are being used.

Publication data produced according to the Generally Accepted Accounting Principles (GAAP) of the Canadian Institute of Chartered Accountants may not agree with the concepts used in the Canadian System of National Accounts.

While GAAP concepts are appropriate for these data, there may still be problems of consistency (between units or over time) for items where GAAP does not prescribe a particular treatment or allows some latitude. One general problem with GAAP for some uses is that it prescribes a historical-cost treatment of assets (i.e., their cost at the time of acquisition). A particular issue arose on January 1, 2011, when Canadian enterprises adopted new Canadian GAAP, namely International Financial Reporting Standards (IFRS) and Accounting Standards for Private Enterprises (ASPE), which could create inconsistencies in concepts and treatments compared with the Canadian GAAP used until December 31, 2010. As a result, caution should be exercised when comparing balance sheet data, income statement data and ratios over time and across industries.

Disclosure control

Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
