Statistics Canada

Methodology

Description

This survey collects the financial and operating data needed to produce statistics on the Canadian book publishing industry. Commencing with reference year 2004 and every two years thereafter, the survey also collects detailed information on the characteristics of the businesses such as types of books published, authorship, number of titles published by format and sources of funding.

These data are aggregated with information from other sources to produce official estimates of national and provincial economic production for the industry in Canada. The estimates are used by government for national and regional programs and policy planning, and by the private sector for industry performance measurement and market development.

The book publishing survey was formerly administered as part of the Culture Statistics Program. Commencing with reference year 2004, it is administered by the Service Industries Program, in collaboration with the Culture Statistics Program. Historical time series data from the previous Culture Statistics Program are available in The Guide to Culture Statistics (catalogue number 87-008-GIE, available online free of charge). Data from this historical time series should not be compared with data from the new survey because of significant differences in coverage and methodology.

As of 2004, the survey covers a somewhat different set of businesses than in previous years, so the data generally cannot be expected to be comparable with those of earlier years. The list of names and addresses of businesses is now drawn from a central Statistics Canada database. Also, a much more rigorous delineation of the companies considered part of the culture sector has been applied through the implementation of the North American Industry Classification System (NAICS). This industry-based classification is a departure from the activity-based classification used previously. In addition to these changes in coverage, commencing with 2004 the data are based on a sample of businesses, which has affected our ability to publish some culture variables in detail.

Data sources and methodology

Target population

The target population consists of all establishments classified as Book Publishers (NAICS 511130) according to the North American Industry Classification System (NAICS) during the reference year. In addition, exclusive agents who earn at least 10% of their revenue from book publishing are considered in scope for the survey; pure exclusive agents are excluded.

Instrument design

The annual survey questionnaire covers detailed financial and operating characteristics. In addition, every two years, questions on such topics as titles published, employment and sources of revenue are asked. The questionnaire was developed in consultation with potential respondents, data users and questionnaire design specialists.

Sampling

This is a sample survey with a cross-sectional design.

Although the basic objective of the survey is to produce estimates for the whole industry (all incorporated and unincorporated businesses), not all businesses are surveyed. Instead, a sample is surveyed, and the portion eligible for sampling is defined as all statistical establishments with revenue above a certain threshold. (Note: the threshold varies between surveys and sometimes between provinces in the same survey.)

The frame is the list of establishments from which the portion eligible for sampling is determined and the sample is taken. The frame provides basic information about each firm, including address, industry classification, and information from administrative data sources. The frame is maintained by Statistics Canada's Business Register and is updated using administrative data.

Prior to the selection of a random sample, establishments are classified into homogeneous groups (i.e., groups with the same NAICS codes, same geography (province/territory) and ownership (Canadian/foreign controlled)).
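The grouping and selection described above can be illustrated with a minimal sketch. It assumes simple random sampling without replacement within each group and a single sampling fraction; the survey's actual allocation and selection method are not stated here, and the field names (`naics`, `province`, `ownership`) and the functions `stratify` and `draw_sample` are hypothetical.

```python
import random
from collections import defaultdict


def stratify(establishments):
    """Group establishments by (NAICS code, province/territory, ownership)."""
    strata = defaultdict(list)
    for est in establishments:
        key = (est["naics"], est["province"], est["ownership"])
        strata[key].append(est)
    return dict(strata)


def draw_sample(strata, fraction, seed=0):
    """Draw a simple random sample without replacement in each stratum.

    Assumption: the same sampling fraction applies to every stratum and at
    least one unit is taken per stratum. Each selected unit carries a design
    weight N_h / n_h (stratum size divided by stratum sample size).
    """
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        n = max(1, round(len(units) * fraction))
        for est in rng.sample(units, n):
            sample.append({**est, "weight": len(units) / n})
    return sample
```

Under this design the weights of the selected units in a stratum sum to the number of establishments in that stratum, which is what allows the sample to stand in for the units that were not selected.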

The sample size for reference year 2005 was 223 with 39 collection entities.

Data sources

Responding to this survey is mandatory. Data are collected directly from survey respondents and extracted from administrative files.

Data are collected through a mail-out/mail-back process, while providing respondents with the option of telephone or electronic filing methods.

Follow-up procedures are applied when a questionnaire has not been received after a pre-specified period of time.

Error detection

Data are examined for inconsistencies and errors using automated edits coupled with analytical review. Several checks are performed on the collected data to verify internal consistency. For example: section totals must equal the sum of their components; if employees are reported, wages and salaries must be greater than zero; and the main source of income must be consistent with the assigned NAICS code.
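The three consistency checks named above could be expressed as automated edits along the following lines. This is an illustrative sketch only: the record layout, the rounding tolerance, the function `run_edits` and the `EXPECTED_MAIN_SOURCE` lookup are assumptions, not the survey's actual edit specifications.

```python
# Hypothetical lookup: the main income source expected for each NAICS code.
EXPECTED_MAIN_SOURCE = {"511130": "book publishing"}


def run_edits(record, tolerance=0.5):
    """Return a list of edit-failure messages for one questionnaire record."""
    failures = []

    # Edit 1: each section total must equal the sum of its components
    # (within an assumed rounding tolerance).
    for name, section in record["sections"].items():
        if abs(section["total"] - sum(section["components"])) > tolerance:
            failures.append(f"{name}: total does not equal sum of components")

    # Edit 2: if employees are reported, wages and salaries must be positive.
    if record["employees"] > 0 and record["wages_and_salaries"] <= 0:
        failures.append("employees reported but wages and salaries not positive")

    # Edit 3: main source of income must be consistent with the NAICS code.
    expected = EXPECTED_MAIN_SOURCE.get(record["naics"])
    if expected is not None and record["main_income_source"] != expected:
        failures.append("main income source inconsistent with NAICS code")

    return failures
```

A record that fails one or more edits would then be routed to analytical review or imputation rather than being used as reported.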

Imputation

Where information is missing, imputation is performed using one of three methods: a "nearest neighbour" procedure (donor imputation), historical data where available, or, finally, administrative data as a proxy for reported data.
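As a rough illustration of these three approaches, the sketch below tries donor imputation first, then historical data, then administrative data. The order of the fallback, the use of a single matching variable to measure distance between units, and all names (`nearest_donor`, `impute`, `previous`, `admin`) are assumptions made for the example.

```python
def nearest_donor(recipient, donors, match_var):
    """Return the donor whose matching variable is closest to the recipient's."""
    candidates = [d for d in donors if d.get(match_var) is not None]
    if not candidates:
        return None
    return min(candidates, key=lambda d: abs(d[match_var] - recipient[match_var]))


def impute(recipient, field, donors, match_var="revenue"):
    """Fill a missing field and report which method supplied the value."""
    if recipient.get(field) is not None:
        return recipient[field], "reported"

    # 1. Donor imputation: copy the value from the most similar respondent.
    usable = [d for d in donors if d.get(field) is not None]
    if recipient.get(match_var) is not None:
        donor = nearest_donor(recipient, usable, match_var)
        if donor is not None:
            return donor[field], "donor"

    # 2. Historical imputation: the unit's own value from an earlier cycle.
    previous = recipient.get("previous") or {}
    if previous.get(field) is not None:
        return previous[field], "historical"

    # 3. Administrative data used as a proxy for reported data.
    admin = recipient.get("admin") or {}
    if admin.get(field) is not None:
        return admin[field], "administrative"

    return None, "not imputed"
```

In practice donors would normally be restricted to respondents in the same homogeneous group as the recipient; the sketch leaves that filtering to the caller.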

Estimation

As part of the estimation process, survey data are weighted because some units in the sample represent a certain number of other book publishing establishments that were not selected in the sample. These data are then combined with administrative data for small firms whose revenues fell below cut-off thresholds to produce final industry estimates.
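In its simplest form, the estimate described above is a weighted sum over sampled units plus an unweighted sum of administrative values for the firms below the cut-off. The sketch below shows that arithmetic with hypothetical field names (`weight`, `revenue`); it does not reproduce any weight adjustments the survey may apply.

```python
def estimate_total(sample, admin_below_threshold, variable):
    """Estimate an industry total for one variable.

    sample: surveyed units, each with a design "weight" and a reported value.
    admin_below_threshold: administrative records for small firms that were
        not eligible for sampling; each counts once (no weight).
    """
    surveyed_part = sum(unit["weight"] * unit[variable] for unit in sample)
    small_firm_part = sum(unit[variable] for unit in admin_below_threshold)
    return surveyed_part + small_firm_part
```

For example, a sampled unit with weight 2 and revenue 100, a sampled unit with weight 1 and revenue 1,000, and two small firms with administrative revenue of 10 and 5 would give an estimated total of 200 + 1,000 + 15 = 1,215.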

Quality evaluation

Prior to publication, combined survey results are analyzed for quality; in general, this includes a detailed review of individual responses (especially for the largest companies), general economic conditions, historic trends, and comparisons with administrative data (e.g., income tax, goods and services tax, payroll deductions records, industry and trade association sources).

Disclosure control

Statistics Canada is prohibited by law from releasing any data that would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

Data accuracy

While considerable effort is made to ensure high standards throughout all stages of collection and processing, the resulting estimates are inevitably subject to a certain degree of error. These errors can be broken down into two major types: non-sampling and sampling.

Non-sampling error is not related to sampling and may occur for many reasons. For example, non-response is an important source of non-sampling error. Population coverage, differences in the interpretation of questions, incorrect information from respondents, and mistakes in recording, coding and processing data are other examples of non-sampling errors.

Of the sample units contributing to the estimate, the weighted response rate was 87.6% of total industry revenue, after accounting for firms that have gone out of business, have been reclassified to a different industry, are inactive or are duplicates on the frame.

Sampling error occurs because population estimates are derived from a sample of the population rather than the entire population. Sampling error depends on factors such as sample size, sampling design, and the method of estimation. An important property of probability sampling is that sampling error can be computed from the sample itself by using a statistical measure called the coefficient of variation (CV). The assumption is that over repeated surveys, the relative difference between a sample estimate and the estimate that would have been obtained from an enumeration of all units in the universe would be less than twice the CV, 95 times out of 100. The range of acceptable data values yielded by a sample is called a confidence interval. Confidence intervals can be constructed around the estimate using the CV. First, we calculate the standard error by multiplying the sample estimate by the CV. The sample estimate plus or minus twice the standard error is then referred to as a 95% confidence interval.
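The calculation described above can be written out in a few lines. The function name and the figures in the example are hypothetical; the CV is expressed as a proportion (0.05 for 5%).

```python
def confidence_interval_95(estimate, cv):
    """Approximate 95% confidence interval from an estimate and its CV.

    Follows the two steps in the text: the standard error is the estimate
    multiplied by the CV, and the interval is the estimate plus or minus
    twice the standard error.
    """
    standard_error = estimate * cv
    margin = 2 * standard_error
    return estimate - margin, estimate + margin
```

With a hypothetical estimate of 1,000 and a CV of 5%, the standard error is 50 and the 95% confidence interval runs from 900 to 1,100.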

For the Book Publishers Survey, CVs were calculated for each estimate. Generally, the more commonly reported variables obtained excellent CVs (5% or less), while the less commonly reported variables were associated with higher but still very good CVs (under 10%). The CVs are available upon request.