    Results

    All (253) (220 to 230 of 253 results)

    • Surveys and statistical programs – Documentation: 11-522-X19990015650
      Description:

      The U.S. Manufacturing Plant Ownership Change Database (OCD) was constructed using plant-level data taken from the Census Bureau's Longitudinal Research Database (LRD). It contains data on all manufacturing plants that have experienced ownership change at least once during the period 1963-92. This paper reports the status of the OCD and discusses its research possibilities. For an empirical demonstration, data taken from the database are used to study the effects of ownership changes on plant closure.

      Release date: 2000-03-02

    • Surveys and statistical programs – Documentation: 11-522-X19990015652
      Description:

      Objective: To create an occupational surveillance system by collecting, linking, evaluating and disseminating data relating to occupation and mortality with the ultimate aim of reducing or preventing excess risk among workers and the general population.

      Release date: 2000-03-02

    • Articles and reports: 11-522-X19990015654
      Description:

      A meta-analysis was performed to estimate the proportion of liver carcinogens, the proportion of chemicals carcinogenic at any site, and the corresponding proportion of anticarcinogens among chemicals tested in 397 long-term cancer bioassays conducted by the U.S. National Toxicology Program (NTP). Although the estimator used was negatively biased, the study provided persuasive evidence for a larger proportion of liver carcinogens (0.43; 90% CI: 0.35, 0.51) than was identified by the NTP (0.28). A larger proportion of chemicals carcinogenic at any site was also estimated (0.59; 90% CI: 0.49, 0.69) than was identified by the NTP (0.51), although this excess was not statistically significant. A larger proportion of anticarcinogens (0.66) was estimated than carcinogens (0.59). Despite the negative bias, it was estimated that 85% of the chemicals were either carcinogenic or anticarcinogenic at some site in some sex-species group. This suggests that most chemicals tested at high enough doses will cause some sort of perturbation in tumor rates.

      Release date: 2000-03-02

    • Surveys and statistical programs – Documentation: 11-522-X19990015666
      Description:

      The fusion sample obtained by a statistical matching process can be considered a sample out of an artificial population. The distribution of this artificial population is derived. If the correlation between specific variables is the only focus, the strong demand for conditional independence can be weakened. In a simulation study, the effects of violations of some assumptions leading to the distribution of the artificial population are examined. Finally, some ideas concerning the establishing of the claimed conditional independence by latent class analysis are presented.

      Release date: 2000-03-02

    • Surveys and statistical programs – Documentation: 11-522-X19990015670
      Description:

      To reach their target audience efficiently, advertisers and media planners need information on which media their customers use. For instance, they may need to know what percentage of Diet Coke drinkers watch Baywatch, or how many AT&T customers have seen an advertisement for Sprint during the last week. All the relevant data could theoretically be collected from each respondent. However, obtaining fully detailed and accurate information would be very expensive. It would also impose a heavy respondent burden under current data collection technology. This information is currently collected through separate surveys in New Zealand and in many other countries. Exposure to the major media is measured continuously, and product usage studies are common. Statistical matching techniques provide a way of combining these separate information sources. The New Zealand television ratings database was combined with a syndicated survey of print readership and product usage, using statistical matching. The resulting Panorama service meets the targeting information needs of advertisers and media planners. It has since been duplicated in Australia. This paper discusses the development of the statistical matching framework for combining these databases, and the heuristics and techniques used. These included an experiment conducted using a screening design to identify important matching variables. Studies evaluating and validating the combined results are also summarized. The following three major evaluation criteria were used: accuracy of combined results, stability of combined results and the preservation of currency results from the component databases. The paper then discusses how the prerequisites for combining the databases were met. The biggest hurdle at this stage was the differences between the analysis techniques used on the two component databases. Finally, suggestions for developing similar statistical matching systems elsewhere are given.

      Release date: 2000-03-02

    • Surveys and statistical programs – Documentation: 11-522-X19990015672
      Description:

      Data fusion as discussed here means creating a set of data on variables that are not jointly observed, using two different sources. Suppose, for instance, that observations are available for (X,Z) on a set of individuals and for (Y,Z) on a different set of individuals. Each of X, Y and Z may be a vector variable. The main purpose is to gain insight into the joint distribution of (X,Y) using Z as a so-called matching variable. At first, however, an attempt is made to recover as much information as possible on the joint distribution of (X,Y,Z) from the distinct sets of data. Such fusions can only be done at the cost of implementing some distributional properties for the fused data. These are conditional independencies given the matching variables. Fused data are typically discussed from the point of view of how appropriate this underlying assumption is. Here we give a different perspective. We formulate the problem as follows: how can distributions be estimated in situations when only observations from certain marginal distributions are available? It can be solved by applying the maximum entropy criterion. We show in particular that data created by fusing different sources can be interpreted as a special case of this situation. Thus, we derive the needed assumption of conditional independence as a consequence of the type of data available.

      Release date: 2000-03-02
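
The conditional independence property described in the entry above (11-522-X19990015672) can be written out explicitly. The following is only an illustrative sketch: the density notation f is an assumption introduced here and does not come from the abstract. If X and Y are conditionally independent given the matching variable Z, the joint distribution is fully determined by the two observable marginals of (X,Z) and (Y,Z):

```latex
\begin{align}
  f(x, y, z) &= f(z)\, f(x \mid z)\, f(y \mid z)
              = \frac{f(x, z)\, f(y, z)}{f(z)}, \\
  f(x, y)    &= \int f(x \mid z)\, f(y \mid z)\, f(z)\, \mathrm{d}z .
\end{align}
```

Each factor on the right-hand side can be estimated from one of the two sources alone, which is why the fused data necessarily inherit this structure.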

    • Surveys and statistical programs – Documentation: 11-522-X19990015674
      Description:

      The effect of the environment on health is of increasing concern, in particular the effects of the release of industrial pollutants into the air, the ground and into water. An assessment of the risks to public health of any particular pollution source is often made using the routine health, demographic and environmental data collected by government agencies. These datasets have important differences in sampling geography and in sampling epochs which affect the epidemiological analyses which draw them together. In the UK, health events are recorded for individuals, giving cause codes, a date of diagnosis or death, and using the unit postcode as a geographical reference. In contrast, small area demographic data are recorded only at the decennial census, and released as area level data in areas distinct from postcode geography. Environmental exposure data may be available at yet another resolution, depending on the type of exposure and the source of the measurements.

      Release date: 2000-03-02

    • Surveys and statistical programs – Documentation: 11-522-X19990015686
      Description:

      The U.S. Consumer Expenditure Survey uses two instruments, a diary and an in-person interview, to collect data on many categories of consumer expenditures. Consequently, it is important to use these data efficiently to estimate mean expenditures and related parameters. Three options are: (1) use only data from the diary source; (2) use only data from the interview source; and (3) use generalized least squares, or related methods, to combine the diary and interview data. Historically, the U.S. Bureau of Labor Statistics has focused on options (1) and (2) for estimation at the five or six-digit Universal Classification Code level. Evaluation and possible implementation of option (3) depend on several factors, including possible measurement biases in the diary and interview data; the empirical magnitude of these biases, relative to the standard errors of customary mean estimators; and the degree of homogeneity of these biases across strata and periods. This paper reviews some issues related to options (1) through (3); describes a relatively simple generalized least squares method for implementation of option (3); and discusses the need for diagnostics to evaluate the feasibility and relative efficiency of the generalized least squares method.

      Release date: 2000-03-02
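
Option (3) in the entry above (11-522-X19990015686) can be illustrated with a minimal sketch. This is not the paper's method: it is the textbook generalized least squares combination of two estimates of the same mean, and it assumes both estimates are unbiased with known variances and covariance, which is exactly the assumption the abstract flags as needing diagnostics. The function name `gls_combine` and all numbers are hypothetical.

```python
def gls_combine(est_diary, est_interview, var_diary, var_interview, cov=0.0):
    """Combine two estimates of one mean by generalized least squares.

    Assumes (hypothetically) that both inputs are unbiased and that their
    2x2 covariance matrix V = [[var_diary, cov], [cov, var_interview]] is known.
    Returns the combined estimate and its variance.
    """
    det = var_diary * var_interview - cov ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")

    # GLS weights are proportional to the row sums of V^{-1};
    # the common factor 1/det cancels in the weighted average.
    w_diary = var_interview - cov
    w_interview = var_diary - cov
    total = w_diary + w_interview

    combined = (w_diary * est_diary + w_interview * est_interview) / total
    variance = det / total  # equals 1 / (1' V^{-1} 1)
    return combined, variance


# Hypothetical inputs: the interview estimate is more precise, so it gets more weight.
combined, variance = gls_combine(10.0, 20.0, var_diary=4.0, var_interview=1.0)
print(combined, variance)
```

With zero covariance this reduces to inverse-variance weighting, and the combined variance is never larger than the smaller of the two input variances. If either source carries measurement bias, the combined estimate inherits it.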

    • Surveys and statistical programs – Documentation: 11-522-X19990015690
      Description:

      The artificial sample was generated in two steps. The first step, based on a master panel, was a Multiple Correspondence Analysis (MCA) carried out on basic variables. Then, "dummy" individuals were generated randomly using the distribution of each "significant" factor in the analysis. Finally, for each individual, a value was generated for each basic variable most closely linked to one of the previous factors. This method ensured that sets of variables were drawn independently. The second step consisted in grafting some other databases, based on certain property requirements. A variable was generated to be added on the basis of its estimated distribution, using a generalized linear model for common variables and those already added. The same procedure was then used to graft the other samples. This method was applied to the generation of an artificial sample taken from two surveys. The artificial sample that was generated was validated using sample comparison testing. The results were positive, demonstrating the feasibility of this method.

      Release date: 2000-03-02

    • Surveys and statistical programs – Documentation: 11-522-X19990015692
      Description:

      Electricity rates that vary by time-of-day have the potential to significantly increase economic efficiency in the energy market. A number of utilities have undertaken economic studies of time-of-use rate schemes for their residential customers. This paper uses meta-analysis to examine the impact of time-of-use rates on electricity demand, pooling the results of thirty-eight separate programs. There are four key findings. First, very large peak to off-peak price ratios are needed to significantly affect peak demand. Second, summer peak rates are relatively effective compared to winter peak rates. Third, permanent time-of-use rates are relatively effective compared to experimental ones. Fourth, demand charges rival ordinary time-of-use rates in terms of impact.

      Release date: 2000-03-02

    Data (7) (7 results)

    • Public use microdata: 95M0029X
      Description: This hierarchical file provides data on the characteristics of the population. The 2006 Census Public Use Microdata Files (PUMFs) contain samples of anonymous responses to the 2006 Census questionnaire. The files have been carefully scrutinized to ensure the complete confidentiality of the individual responses. The individual file was released on March 4, 2010 and the hierarchical file is available as of today, May 2, 2011.

      Microdata files are unique among census products in that they give users access to non-aggregated data. The PUMFs user can group and manipulate these variables to suit data and research requirements. Tabulations excluded from other census products can be created or relationships between variables can be analysed using different statistical tests. PUMFs provide quick access to a comprehensive social and economic database about Canada and its people.

      Most of the subject matter covered by the census is included in the microdata files. To ensure the respondents' anonymity, geographic identifiers have been restricted to provinces/territories and large metropolitan areas.

      This product, offered on CD-ROM, contains the data file (in ASCII format), user documentation and SAS and SPSS program source codes to enable you to read the set of records. Note: users will require knowledge of data manipulation and retrieval software such as SAS or SPSS to be able to use this product.

      Release date: 2023-09-12

    • Data Visualization: 71-607-X2020010
      Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo enabled data holdings of Statistics Canada at various levels of geography including at the neighbourhood level. Users are able to visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers.
      Release date: 2023-01-24

    • Table: 17-20-00022022001
      Description: The Canadian Social Environment Typology (CanSET) data file on cluster membership by dissemination area is a downloadable data file. The file includes information on the variables that were used to create the clusters and a data table with cluster options on membership by dissemination area.
      Release date: 2022-05-09

    • Table: 13-019-X
      Description: These data tables provide quarterly information on Canada's National Income and Expenditure Accounts (NIEA), 1961-2012. They contain seasonally adjusted data on gross domestic product (GDP) by income and by expenditure, saving and investment, borrowing and lending of each of four broad sectors of the economy: (i) persons and unincorporated businesses, (ii) corporate and government business enterprises, (iii) governments and (iv) non-residents. Information is also provided for selected subsectors. The tables include data beginning in 1961 and are no longer being released.
      Release date: 2012-08-31

    • Table: 23-603-X
      Description:

      This publication contains data from 1976 to date for major livestock series: cattle and calves, hogs, sheep and lambs, wool, furs, trade and prices, stocks of frozen meats, and apparent per capita meat consumption. Data highlights are also included. New and revised estimates for these data are released four times a year.

      Release date: 2003-03-05

    • Table: 51F0007X
      Description:

      For most of the post-war period, Canada and the United States have utilized an open regime to govern trade relations between the two countries. Such has not always been the case for transborder air services, however. In 1966, the two countries signed an air services accord (ASA) that governed commercial air services between the two. The 1966 accord was quite restrictive, limiting entry and price competition in transborder markets. This restrictive agreement governed Canada-U.S. air service for almost 30 years, finally being replaced in 1995 with a new ASA that has granted entry and pricing freedom in transborder markets.

      Release date: 2001-06-05

    • Table: 94F0005X
      Description:

      This CD-ROM is part of the Dimensions Series which provides an in-depth analysis of census data. More than 150 tables represent a variety of special interest subjects linking a number of Census variables. Statistical information is presented on themes of considerable public interest with some tables examining historical trends and other tables detailing significant sub-populations. Data for geographical levels of Canada, Provinces and Territories are most widely represented with some data tables produced at the Census Metropolitan Area level. The Portrait of Official Language Communities in Canada and the Portrait of Aboriginal Population of Canada contain some information at the community level. Some tables show comparisons with data from earlier censuses to provide an historical perspective.

      Release date: 1999-04-06

    Analysis (187) (0 to 10 of 187 results)

    • Journals and periodicals: 11-632-X
      Description: The newsletter offers information aimed at three main groups: small to medium businesses, communities, and ethno-cultural groups. Articles and outreach materials will assist their understanding of national and local data from the many relevant sources found on the Statistics Canada website.
      Release date: 2024-05-23

    • Journals and periodicals: 45-20-0003
      Description: The ‘Eh Sayers’ podcast explores data of interest to Canadians, like social or news-worthy topics. It also aims to foster data literacy and deliver insight into the lives of Canadians by exploring the data the agency produces and tying it to real life situations through storytelling.
      Release date: 2024-05-08

    • Articles and reports: 11-633-X2022007
      Description:

      This paper investigates how Statistics Canada can increase trust by giving users the ability to authenticate data from its website through digital signatures and blockchain technology.

      Release date: 2022-09-19

    • Stats in brief: 89-20-00082021001
      Description: This video is part of the confidentiality vetting support series and presents examples of how to use SAS to perform the dominance and homogeneity test while using the Census.
      Release date: 2022-04-29

    • Stats in brief: 89-20-00082021002
      Description: This video is part of the confidentiality vetting support series and presents examples of how to use SAS to create proportion output for researchers working with confidential data.
      Release date: 2022-04-27

    • Stats in brief: 89-20-00082021003
      Description: This video is part of the confidentiality vetting support series and presents examples of how to use Stata to create proportion output for researchers working with confidential data.
      Release date: 2022-04-27

    • Stats in brief: 89-20-00082021004
      Description: This video is part of the confidentiality vetting support series and presents examples of how to use Stata to perform the dominance and homogeneity test while using the Census.
      Release date: 2022-04-27

    • Stats in brief: 89-20-00082021005
      Description: This video is part of the confidentiality vetting support series and presents examples of how to use R to create proportion output for researchers working with confidential data.
      Release date: 2022-04-27

    • Stats in brief: 89-20-00082021006
      Description: This video is part of the confidentiality vetting support series and presents examples of how to use R to perform the dominance and homogeneity test while using the Census.
      Release date: 2022-04-27

    • Stats in brief: 11-627-M2022016
      Description:

      This infographic explains the steps involved in collecting data for all Statistics Canada household and business surveys. The responses are compiled, analyzed and used to make important decisions and are kept strictly confidential.

      Release date: 2022-02-28

    Reference (55) (40 to 50 of 55 results)

    • Surveys and statistical programs – Documentation: 11-522-X19980015018
      Description:

      This paper presents a method for handling longitudinal data in which individuals belong to more than one unit at a higher level, and also where there is missing information on the identification of the units to which they belong. In education, for example, a student might be classified as belonging sequentially to a particular combination of primary and secondary school, but for some students, the identity of either the primary or secondary school may be unknown. Likewise, in a longitudinal study, students may change school or class from one period to the next, so 'belonging' to more than one higher level unit. The procedures used to model these structures are extensions of a random effects cross-classified multilevel model.

      Release date: 1999-10-22

    • Surveys and statistical programs – Documentation: 11-522-X19980015022
      Description:

      This article extends and further develops the method proposed by Pfeffermann, Skinner and Humphreys (1998) for the estimation of gross flows in the presence of classification errors. The main feature of that method is the use of auxiliary information at the individual level which circumvents the need for validation data for estimating the misclassification rates. The new developments in this article are the establishment of conditions for model identification, a study of the properties of a model goodness of fit statistic and modifications to the sample likelihood to account for missing data and informative sampling. The new developments are illustrated by a small Monte-Carlo simulation study.

      Release date: 1999-10-22

    • Surveys and statistical programs – Documentation: 11-522-X19980015029
      Description:

      In longitudinal surveys, sample subjects are observed over several time points. This feature typically leads to dependent observations on the same subject, in addition to the customary correlations across subjects induced by the sample design. Much research in the literature has focussed on modeling the marginal mean of a response as a function of covariates. Liang and Zeger (1986) used generalized estimating equations (GEE), requiring only correct specification of the marginal mean, and obtained standard errors of regression parameter estimates and associated Wald tests, assuming a "working" correlation structure for the repeated measurements on a sample subject. Rotnitzky and Jewell (1990) developed quasi-score tests and Rao-Scott adjustments to "working" quasi-score tests under marginal models. These methods are asymptotically robust to misspecification of the within-subject correlation structure, but assume independence of sample subjects, which is not satisfied for complex longitudinal survey data based on stratified multi-stage sampling. We propose asymptotically valid Wald and quasi-score tests for longitudinal survey data, using the Taylor linearization and jackknife methods. Alternative tests, based on Rao-Scott adjustments to naive tests that ignore survey design features and on Bonferroni-t, are also developed. These tests are particularly useful when the effective degrees of freedom, usually taken as the total number of sample primary units (clusters) minus the number of strata, is small.

      Release date: 1999-10-22