Results

All (138)

All (138) (10 to 20 of 138 results)

  • Stats in brief: 89-20-00062022001
    Description:

    Gathering, exploring, analyzing and interpreting data are essential steps in producing information that benefits society, the economy and the environment. To properly conduct these processes, data ethics must be upheld in order to ensure the appropriate use of data.

    Release date: 2022-05-24

  • Stats in brief: 89-20-00062022002
    Description:

    This video will break down what it means to be FAIR in terms of data and metadata, and how each pillar of FAIR serves to guide data users and producers alike as they navigate their way through the data journey, in order to gain maximum long-term value.

    Release date: 2022-05-24

  • Articles and reports: 11-633-X2021007
    Description:

    Statistics Canada continues to use a variety of data sources to provide neighbourhood-level variables across an expanding set of domains, such as sociodemographic characteristics, income, services and amenities, crime, and the environment. Yet, despite these advances, information on the social aspects of neighbourhoods is still unavailable. In this paper, answers to the Canadian Community Health Survey on respondents’ sense of belonging to their local community were pooled over the four survey years from 2016 to 2019. Individual responses were aggregated up to the census tract (CT) level.

    Release date: 2021-11-16

  • Articles and reports: 11-522-X202100100018
    Description: Statistics Finland started publishing nowcasts of the trend indicator of output (TIO), the monthly indicator of real economic activity, to answer users' needs during the Covid-19 pandemic. The indicator was first published in April 2020, at the very beginning of the pandemic in Finland, and had a monthly release schedule until June 2021. The TIO nowcasts are produced using open-source data on truck traffic volumes at about 100 automatic measuring points in the Helsinki/Uusimaa region and the Economic Sentiment Indicator for Finland. Estimation is done using a machine learning approach, and the methodology is based on previous work done by Statistics Finland and ETLA Economic Research.

    Key Words: nowcasting; flash estimates; machine learning; experimental statistics.

    Release date: 2021-10-29

  • Articles and reports: 11-522-X202100100025
    Description:

    We propose a longitudinal analysis with a point of view connected to the organizational changes that have taken place in the Italian National Institute of Statistics in recent years. In 2016 the Institute introduced a new Directorate, intending to standardize and generalize the business process of Data Collection according to the European standard of the GAMSO model. The paper discusses the pros and cons of this change from the perspective of survey participation. The ICT survey response rate analysis demonstrates an increase of around 20% since the beginning of the new organization: the paper tries to focus on the impact of the changes introduced with the new organization. We focused our attention on two specific subsets of respondents: the so-called "wanted", who have never answered an ICT survey or any other Istat survey, and the so-called "lost", who were included in two consecutive survey samples and answered in the previous edition but not in the current one. The paper aims to illustrate how an efficient organization of data collection reflects its benefits on survey results and what kind of actions should be taken to catch the attention of the "wanted". Finally, we apply a logistic model measuring the probability that an enterprise responding in 2018 (t-1) also answered in 2019 (t). The analysis suggests some actions that could be taken to improve respondents' participation, data quality, and respondents' perception of the official statistics.

    Key Words: data collection strategy, response rate, paradata, response burden, ICT Survey.

    Release date: 2021-10-29

  • Articles and reports: 11-522-X202100100005
    Description: The Permanent Census of Population and Housing is the new census strategy adopted in Italy in 2018: it is based on statistical registers combined with data collected through surveys specifically designed to improve register quality and assure Census outputs. The register at the core of the Permanent Census is the Population Base Register (PBR), whose main administrative sources are the Local Population Registers. The population counts are determined by correcting the PBR data with coefficients based on the coverage errors estimated with survey data, but the need for additional administrative sources clearly emerged while processing the data collected in the first round of the Permanent Census. The suspension of surveys due to the global pandemic emergency, together with a serious reduction in the census budget for the coming years, makes a change in the estimation process more urgent, so as to use administrative data as the main source. A thematic register has been set up to exploit all the additional administrative sources: knowledge discovery from this database is essential to extract relevant patterns and to build new dimensions called signs of life, useful for population estimation. The availability of the data collected in the first two waves of the Census offers a unique and valuable set for statistical learning: the association between survey results and ‘signs of life’ could be used to build a classification model to predict coverage errors in the PBR. This paper presents the results of the process to produce ‘signs of life’ that proved to be significant in population estimation.

    Key Words: Administrative data; Population Census; Statistical Registers; Knowledge discovery from databases.

    Release date: 2021-10-22

  • Articles and reports: 11-522-X202100100014
    Description: Recent developments in questionnaire administration modes and data extraction have favored the use of nonprobability samples, which are often affected by selection bias that arises from the lack of a sample design or self-selection of the participants. This bias can be addressed by several adjustments, whose applicability depends on the type of auxiliary information available. Calibration weighting can be used when only population totals of auxiliary variables are available. If a reference survey that followed a probability sampling design is available, several methods can be applied, such as Propensity Score Adjustment, Statistical Matching or Mass Imputation, and doubly robust estimators. In the case where a complete census of the target population is available for some auxiliary covariates, estimators based on superpopulation models (often used in probability sampling) can be adapted to the nonprobability sampling case. We studied the combination of some of these methods in order to produce less biased and more efficient estimates, as well as the use of modern prediction techniques (such as Machine Learning classification and regression algorithms) in the modelling steps of the adjustments described. We also studied the use of variable selection techniques prior to the modelling step in Propensity Score Adjustment. Results show that adjustments based on the combination of several methods might improve the efficiency of the estimates, and that the use of Machine Learning and variable selection techniques can help reduce the bias and the variance of the estimators to a greater extent in several situations.

    Key Words: nonprobability sampling; calibration; Propensity Score Adjustment; Matching.

    Release date: 2021-10-15

  • Articles and reports: 11-522-X202100100019
    Description: Official statistical agencies must continually seek new methods and techniques that can increase both program efficiency and product relevance. The U.S. Census Bureau’s measurement of construction activity is currently a resource-intensive endeavor, relying heavily on monthly survey response via questionnaires and extensive field data collection. While our data users continually require more timely and granular data products, the traditional survey approach and associated collection cost and respondent burden limit our ability to meet that need. In 2019, we began research on whether the application of machine learning techniques to satellite imagery could accurately estimate housing starts and completions while meeting existing monthly indicator timelines at a cost equal to or less than existing methods. Using historical Census construction survey data in combination with targeted satellite imagery, the team trained, tested, and validated convolutional neural networks capable of classifying images by their stage of construction, demonstrating the viability of a data science-based approach to producing official measures of construction activity.

    Key Words: Official Statistics; Housing Starts; Machine Learning; Satellite Imagery.

    Release date: 2021-10-15

  • Articles and reports: 82-003-X202000700002
    Description:

    This paper's objectives are to examine the feasibility of pooling linked population health surveys from three countries, facilitate the examination of health behaviours, and present useful information to assist in the planning of international population health surveillance and research studies.

    Release date: 2020-07-29

  • Articles and reports: 12-001-X201900300004
    Description:

    Social or economic studies often need to have a global view of society. For example, in agricultural studies, the characteristics of farms can be linked to the social activities of individuals. Hence, studies of a given phenomenon should be done by considering variables of interest referring to different target populations that are related to each other. In order to get an insight into an underlying phenomenon, the observations must be carried out in an integrated way, in which the units of a given population have to be observed jointly with related units of the other population. In the agricultural example, this means that a sample of rural households should be selected that have some relationship with the farm sample to be used for the study. There are several ways to select integrated samples. This paper studies the problem of defining an optimal sampling strategy for this situation: the solution proposed minimizes the sampling cost, ensuring a predefined estimation precision for the variables of interest (of either one or both populations) describing the phenomenon. Indirect sampling provides a natural framework for this setting since the units belonging to a population can become carriers of information on another population that is the object of a given survey. The problem is studied for different contexts which characterize the information concerning the links available in the sampling design phase, ranging from situations in which the links among the different units are known in the design phase to a situation in which the available information on links is very poor. An empirical study of agricultural data for a developing country is presented. It shows that controlling the inclusion probabilities at the design phase using the available information (namely the links) is effective and can significantly reduce the errors of the estimates for the indirectly observed population. The need for good models for predicting the unknown variables or the links is also demonstrated.

    Release date: 2019-12-17
Stats in brief (3)

Stats in brief (3) (3 results)

  • Stats in brief: 89-20-00062022004
    Description:

    Gathering, exploring, analyzing and interpreting data are essential steps in producing information that benefits society, the economy and the environment. In this video, we will discuss the importance of considering data ethics throughout the process of producing statistical information.

    As a pre-requisite to this video, make sure to watch the video titled “Data Ethics: An introduction” also available in Statistics Canada’s data literacy training catalogue.

    Release date: 2022-10-17

  • Stats in brief: 89-20-00062022001
    Description:

    Gathering, exploring, analyzing and interpreting data are essential steps in producing information that benefits society, the economy and the environment. To properly conduct these processes, data ethics must be upheld in order to ensure the appropriate use of data.

    Release date: 2022-05-24

  • Stats in brief: 89-20-00062022002
    Description:

    This video will break down what it means to be FAIR in terms of data and metadata, and how each pillar of FAIR serves to guide data users and producers alike as they navigate their way through the data journey, in order to gain maximum long-term value.

    Release date: 2022-05-24
Articles and reports (134)

Articles and reports (134) (0 to 10 of 134 results)

  • Articles and reports: 11-522-X202200100001
    Description: Record linkage aims at identifying record pairs related to the same unit and observed in two different data sets, say A and B. Fellegi and Sunter (1969) suggest testing whether each record pair is generated from the set of matched or unmatched pairs. The decision function consists of the ratio between m(y) and u(y), the probabilities of observing a comparison y of a set of k>3 key identifying variables in a record pair under the assumptions that the pair is a match or a non-match, respectively. These parameters are usually estimated by means of the EM algorithm using as data the comparisons on all the pairs of the Cartesian product Ω=A×B. These observations (on the comparisons and on the pairs' status as match or non-match) are assumed to be generated independently of other pairs, an assumption characterizing most of the literature on record linkage and implemented in software tools (e.g. RELAIS, Cibella et al. 2012). On the contrary, comparisons y and matching status in Ω are deterministically dependent. As a result, estimates of m(y) and u(y) based on the EM algorithm are usually bad. This fact jeopardizes the effective application of the Fellegi-Sunter method, as well as the automatic computation of quality measures and the possibility of applying efficient methods for model estimation on linked data (e.g. regression functions), as in Chambers et al. (2015). We propose to explore Ω by a set of samples, each one drawn so as to preserve independence of comparisons among the selected record pairs. Simulations are encouraging.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100003
    Description: Estimation at fine levels of aggregation is necessary to better describe society. Small area estimation model-based approaches that combine sparse survey data with rich data from auxiliary sources have been proven useful to improve the reliability of estimates for small domains. Considered here is a scenario where small area model-based estimates, produced at a given aggregation level, needed to be disaggregated to better describe the social structure at finer levels. For this scenario, an allocation method was developed to implement the disaggregation, overcoming challenges associated with data availability and model development at such fine levels. The method is applied to adult literacy and numeracy estimation at the county-by-group-level, using data from the U.S. Program for the International Assessment of Adult Competencies. In this application the groups are defined in terms of age or education, but the method could be applied to estimation of other equity-deserving groups.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100010
    Description: Growing Up in Québec is a longitudinal population survey that began in the spring of 2021 at the Institut de la statistique du Québec. Among the children targeted by this longitudinal follow-up, some will experience developmental difficulties at some point in their lives. Those same children often have characteristics associated with higher sample attrition (low-income family, parents with a low level of education). This article describes the two main challenges we encountered when trying to ensure sufficient representativeness of these children, in both the overall results and the subpopulation analyses.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100015
    Description: We present design-based Horvitz-Thompson and multiplicity estimators of the population size, as well as of the total and mean of a response variable associated with the elements of a hidden population, to be used with the link-tracing sampling variant proposed by Félix-Medina and Thompson (2004). Since computing the estimators requires knowing the inclusion probabilities of the sampled people, which are unknown, we propose a Bayesian model that allows us to estimate them and consequently to compute the estimators of the population parameters. The results of a small numeric study indicate that the performance of the proposed estimators is acceptable.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100018
    Description: The Longitudinal Social Data Development Program (LSDDP) is a social data integration approach aimed at providing longitudinal analytical opportunities without imposing additional burden on respondents. The LSDDP uses a multitude of signals from different data sources for the same individual, which helps to better understand their interactions and track changes over time. This article looks at how the ethnicity status of people in Canada can be estimated at the most detailed disaggregated level possible using the results from a variety of business rules applied to linked data and to the LSDDP denominator. It will then show how improvements were obtained using machine learning methods, such as decision trees and random forest techniques.
    Release date: 2024-03-25

  • Articles and reports: 75F0002M2023001
    Description: This discussion paper describes the work being achieved and undertaken by Statistics Canada, in partnership with the Treasury Board of Canada Secretariat, the Department of Finance Canada and the Privy Council Office, on developing the Quality of Life Framework for Canada and related outputs, including an online Hub. This is the first paper in a series that will provide updates on the progress of work relating to the Framework.
    Release date: 2023-04-19

  • Articles and reports: 82-003-X202300200003
    Description: Utility scores are an important tool for evaluating health-related quality of life. Utility score norms have been published for Canadian adults, but no nationally representative utility score norms are available for non-adults. Using Health Utilities Index Mark 3 (HUI3) data from two recent cycles of the Canadian Health Measures Survey (i.e., 2016-2017 and 2018-2019), this is the first study to provide utility score norms for children aged 6 to 11 years and adolescents aged 12 to 17 years.
    Release date: 2023-02-15

  • Articles and reports: 11-633-X2022007
    Description:

    This paper investigates how Statistics Canada can increase trust by giving users the ability to authenticate data from its website through digital signatures and blockchain technology.

    Release date: 2022-09-19

  • Articles and reports: 12-001-X202200100002
    Description:

    We consider an intercept only linear random effects model for analysis of data from a two stage cluster sampling design. At the first stage a simple random sample of clusters is drawn, and at the second stage a simple random sample of elementary units is taken within each selected cluster. The response variable is assumed to consist of a cluster-level random effect plus an independent error term with known variance. The objects of inference are the mean of the outcome variable and the random effect variance. With a more complex two stage sampling design, the use of an approach based on an estimated pairwise composite likelihood function has appealing properties. Our purpose is to use our simpler context to compare the results of likelihood inference with inference based on a pairwise composite likelihood function that is treated as an approximate likelihood, in particular treated as the likelihood component in Bayesian inference. In order to provide credible intervals having frequentist coverage close to nominal values, the pairwise composite likelihood function and corresponding posterior density need modification, such as a curvature adjustment. Through simulation studies, we investigate the performance of an adjustment proposed in the literature, and find that it works well for the mean but provides credible intervals for the random effect variance that suffer from under-coverage. We propose possible future directions including extensions to the case of a complex design.

    Release date: 2022-06-21

  • Articles and reports: 11-633-X2021007
    Description:

    Statistics Canada continues to use a variety of data sources to provide neighbourhood-level variables across an expanding set of domains, such as sociodemographic characteristics, income, services and amenities, crime, and the environment. Yet, despite these advances, information on the social aspects of neighbourhoods is still unavailable. In this paper, answers to the Canadian Community Health Survey on respondents’ sense of belonging to their local community were pooled over the four survey years from 2016 to 2019. Individual responses were aggregated up to the census tract (CT) level.

    Release date: 2021-11-16
Journals and periodicals (1)

Journals and periodicals (1) (1 result)

  • Journals and periodicals: 84F0013X
    Geography: Canada, Province or territory
    Description:

    This study was initiated to test the validity of probabilistic linkage methods used at Statistics Canada. It compared the results of data linkages on infant deaths in Canada with infant death data from Nova Scotia and Alberta. It also compared the availability of fetal deaths on the national and provincial files.

    Release date: 1999-10-08