Analytical Studies: Methods and References
The System of National Quality-of-Life Statistics: Future Directions

Release date: January 31, 2022

Executive summary

In response to current demands, Statistics Canada is engaged in a long-term exercise to greatly strengthen the statistics that it produces to measure the quality of life in Canada. Compared with current arrangements, these statistics will be more timely, more granular, more responsive to the needs of users, and more easily accessed.

In line with directions set out in Budget 2021, the system that is being developed will provide comprehensive, useable evidence on:

Over time, the system will also begin to provide evidence that can improve decisions about the particular social interventions that are likely to work best for specific individuals – with that information being accessible at the time when these decisions are being taken, based on analysis of what has worked best in similar situations in the past.

The resulting evidence will support government budgeting and accountability, the design of effective programs, and efficiency in their administration. Decision-making by individual citizens and service providers will be supported with practical ‘what is likely to work best’ data. The outcome will be major improvements in the operation of labour markets, health, learning and other social dimensions of life in Canada, both on average and for all population groups – and direct benefits to individuals as they make big decisions in the social, health and labour domains of life.

Unlike the existing social statistics, which are based around particular data sources such as surveys, censuses and summarized administrative records, the new system will provide far more detailed and timely information by drawing on data from multiple sources. These sources will include a much more flexible suite of surveys, finely-grained information about the operation of social programs, and anonymized data about individuals that currently exist in administrative records. A multi-source system is possible as a result of the major improvements in digital information processing technology that have been introduced in recent decades, with promise of even more powerful tools and data sources on the horizon.

The foundations for this transformation are already in place and the priority on speedy implementation has increased as a result of COVID-19 and the directions outlined in Budget 2021. Many payoffs will become visible over the next several years. Others will take longer to be fully realized.

One of the most visible early products will be the creation of an interactive website that will present high-level quality-of-life indicators that reflect current policy priorities, as well as a series of practical dashboards of indicators designed to support a wide range of current policy agendas for users both in and out of government. Current statistics, such as the latest unemployment rates, would be released on this site where they would be analysed along with indicators related to other dimensions of wellbeing.

The site will provide direct access to interactive tables along the lines of those on the present Statistics Canada site, except that there will be many additional tables with newly-created data, based on consultations with users, that would be drawn from different sources. This contrasts with the present practice of presenting tables based only on a particular survey or other data source.

In addition, the new techniques for combining data from multiple surveys and administrative data sources without any of the resulting data being linkable to real people open up the possibility of developing powerful new kinds of data products for release on the website. These are known as standard, multiple-source microdata files. They contain particularly rich data that can be used to provide a new understanding of the factors that affect the quality of people’s lives over time, including the role played by different social programs and services.

Another early payoff will be greater flexibility in responding to requests for new statistics. The interactive web site will, itself, provide far more information than is currently available. As well, many requests for new data will be met by combining existing data from different sources, without time-consuming and costly new data collection. When new data collection is needed, a flexible suite of new survey vehicles is being developed to allow fast collection of information that is finely-tuned to current needs.

Other statistical products will take longer to produce. For example, some will require developing consistent, finely-grained ways of describing the existing social, learning and health services. Such standard descriptors are needed, for example, in order to provide consistent ‘what is likely to work best’ information that will support service providers and individual citizens in making the best choices among the interventions that are open to them. In some areas, the needed information already exists. In others, much development work will be needed.

This paper describes the system as it may look in some five or so years’ time, assuming the continuation of present plans. It does this by examining the likely future inputs, processes, outputs and outcomes of the system, and by comparing them with the arrangements in the traditional single-source system of social statistics that has, until recently, been the norm.

The paper also describes the main constraints and risks that will be faced as we move ahead and how these will be managed.

In terms of resource constraints, funding will come from internal reallocations and new funds outlined in Budget 2021. More powerful information technology (IT) resources will be required, but these are already being used in other applications. The biggest constraint is likely to be the shortage of people with the skills needed to produce accurate statistics from multiple sources of data, often involving large data sets and complex statistical manipulations, while ensuring privacy. New approaches will be developed to increase the size of the pool of people with these skills and to maximize the use of those skills in both Statistics Canada and in the user communities.

New ways of doing business are being proposed. The largest risk to be managed during this period of rapid change relates to the need to maintain and reinforce the trust and confidence of the users of statistics, the individual Canadians who respond to surveys and the agencies who provide the needed administrative data inputs. Managing this risk will require different kinds of outreach to explain how the new arrangements will work, how privacy and confidentiality will be assured, and how all parties will gain over time from the new approaches.

As well, the risks associated with the introduction of a new product line – i.e., the use of predictive analytics to produce ‘what is likely to work best’ evidence in support of decision-making about particular social interventions – will be mitigated by a gradual, experimental approach, working in partnership with different service providers.

In terms of social acceptance and trust, Statistics Canada has recently put in place a world-leading framework (called the Necessity and Proportionality Framework) for balancing use, privacy, and response burden. Sophisticated tools have already been developed that ensure privacy protection when linking data from multiple sources, using techniques that split out individual identifiers from the data that describe those individuals. Response burden will be reduced by lessening the demand for repetitive inputs. New emphasis will be placed on the role of Statistics Canada itself in producing statistics that are based on multiple sources of data, with built-in assurance of both quality and privacy protection, reducing past reliance on external users to carry out some of these functions. Greater use of synthetic data will simultaneously produce useful new data and assurances of anonymity.

Perhaps the most important new payoffs in terms of building acceptance and support will lie in the production of new statistics that will be directly useful, not only to researchers and policy-makers, but also to those who administer social programs and services and to individual citizens. A more immediate and transparent balance can be struck among response burden, usefulness and the protection of privacy – strengthening the implicit social contract that exists among those who provide, process and use data for statistical purposes.

A separate paper, entitled “The System of National Quality-of-Life Statistics: Conceptual Framework”, will be released as a complement to this paper. It will describe the conceptual framework that underlies the new approach and show how the standard definitions that are used to describe the individual items of data that are collected from different sources will result in a flexible system that allows integrated analysis across all the domains that affect quality of life, including linkages with economic and environmental statistics, compatibility with the frameworks used in many academic disciplines, and comparisons with other countries.

1. Introduction and summary

Deep changes are taking place in national social statistics. This paper presents a picture of what the system might look like in some 5 or 10 years should Statistics Canada continue along its current path of rapid evolution. Possible actions in the shorter term are described, as are strategies for managing the constraints and risks that inevitably arise in making such large changes.

A supporting paper will be available shortly that describes the changes from a more technical perspective. It will describe the conceptual framework that underlies the new system and how it results in a system that is both integrated and flexible. It will show how the framework is consistent with those used in economic and environmental statistics and in different scientific disciplines.

This introductory chapter is structured as follows:

1.1. Why change? With what payoffs?

Deep changes are taking place in the collection and use of national social statistics, which – for reasons that will be explained later – will be referred to as the ‘National System of Quality-of-Life Statistics’.

The new information obtained from multiple sources is changing our understanding of how societies work, particularly as we dig down beyond broad averages and examine the diverse needs and circumstances of people in the many groups that comprise our population. It is starting to provide a far better understanding of how people change over their lifetimes, including the role played by our social institutions and programs.

Box 1. The rationale for the change: some bottom-line Q&As

What new directions were launched in 2019?

In order to meet new demands for statistical information, and as part of a larger modernization exercise, Statistics Canada launched a reform of its social statistics that makes use of current technology to shift from a traditional single-source system to one where statistics would be drawn from multiple sources, including much greater use of administrative records and a more flexible suite of surveys.

  • The purpose was to better meet the currently expressed needs of users – producing statistics that were more relevant, timely and disaggregated. 
  • The new processes would also minimize response burden, increase trust in the statistical system and use state-of-the-art tools and methods.
  • The ultimate goal was a higher quality of life for Canadians as a result of the improved effectiveness of policies and programs, and the improved decision-making in all dimensions of social life, that would result from the use of this new statistical evidence.

What changed as a result of the pandemic and Budget 2021?

The need for rapid responses to COVID-19 resulted in dramatic increases in the development of new data sources, the speed of turnaround times, and the production of disaggregated information that showed which population groups were at greatest risk. It demonstrated the validity and power of the new way of doing business.

Budget 2021 strongly reinforced the emphasis on producing policy-relevant, disaggregated statistics. It also stressed the importance of producing evidence about the multiple factors that impact on quality of life and about the effects of policies and programs in supporting a better quality-of-life, including effects on equity and sustainability.

Who will gain from the new system?

Policy makers and analysts will have much better evidence to support planning, budgeting, evaluation, and accountability.

  • In the shorter term, the production of more disaggregated data for small population groups will result in major gains in understanding the needs and circumstances of people with diverse characteristics and of the effects of programs in addressing those needs.
  • Over the longer-term, as the system becomes mature, new evidence will be created that shows how the many social services and income support programs of different orders of government work together in influencing the subsequent lives of the individuals being served.

Some of this causal information exists today for income support programs, but there is little evidence about the effects of service programs or how different programs act in combination.

Program designers and administrators will have much better evidence about the characteristics of the intended audience for their programs, the actual effects of existing programs and which program designs are likely to work best in the future.

  • In the shorter term, the main payoffs will come from better evidence on which programs are working best for people in diverse circumstances and in making adjustments to program designs or administrative process to better meet those needs.
  • As the system matures, there will be steady improvements not only in the evidence about what program designs and processes are working best on average, but also which components of the program are working best for whom, and which components need improvement.
  • Even more important in the longer-term will be new predictive information that can be produced for some types of programming. This will show which kinds of program interventions are likely to work best for particular clients during the time the referrals are being made. For example, a counsellor or case manager would be able to use this kind of information in advising clients or referring them to additional services.

Researchers in many disciplines will greatly benefit from the transformative increase in the amount of data that will be available, particularly longitudinal data at the micro level which will allow analysis of the factors that affect the course of people’s lives in their social and geographic contexts.

Businesses and organizations will, of course, gain from the better outcomes that will result from more evidence-driven policies and programs. In addition, they will be able to access more disaggregated information for marketing purposes, namely the characteristics of the potential audiences for their products and services.

Individuals will similarly see their lives improved by programs and policies that will be shaped by more disaggregated evidence that focusses on quality of life. This will be especially felt by people in visible minority groups or in particular parts of the country where needs may be quite different from those identified by more aggregated statistics that produce only broad averages.

In some areas there will be, over time, very direct uses of social statistics by individuals. The ‘what is likely to work best’ information that will help counsellors and case managers will also be available directly to individuals on interactive websites. These will provide information on the options that are available for making decisions in many of the social domains of life and the probabilities of success associated with different options, tailor-made to the individual’s circumstances and needs.

Factors that are driving change

The main factor driving these changes has been increased demand for statistical information that cannot be easily met by further investments in traditional single-source collection and processing methods. There are demands for:

This kind of information requires detailed descriptions of social programs and services. It requires detailed micro-level measures of the changes that are taking place over the lifecourses of individuals, with individuals seen in their differing social, economic, and geographic contexts.

That information cannot come only from traditional survey and census sources taken in isolation. The parallel driver of change has been the current availability of new data sources, new processing power, and new statistical methods that can be leveraged to fill these new demands in ways that are affordable, minimize response burden and protect privacy and confidentiality.

The ‘Bottom-line Q&As’ in Box 1 provide an overview of the rationale for the changes and how different audiences will benefit.

Reform is already well launched

In response to these changing demands – and to exploit the potential of new ways of doing business – the field in Statistics Canada with prime responsibility for the collection and release of health, labour and other social statistics launched a future directions exercise as part of the work on modernization that is taking place within Statistics Canada as a whole.

Important first steps were taken in 2019, including a reorganization that separated responsibilities for dealing with the substantive content of the statistics from responsibilities for collecting the statistics. This was a necessary step in the shift to a multiple-source model.

COVID-19 accelerated development. Plans for improving responsiveness and flexibility in meeting the needs of users were put to immediate test with the arrival of COVID-19 in 2020. The response of the statistical system in providing Canadians with the information needed to deal with COVID-19 has been astonishing, marked by innovations that would have been unbelievable only a few years ago. This was accomplished while learning how to deliver in a completely changed work environment with people working from home. These innovations not only met current demands, but also represented important steps in the desired longer-term directions.

The energy devoted to COVID-19 did, however, mean that some longer-term plans were put temporarily on hold, including the articulation and communication of a clear statement of where the social statistics system was headed. This document attempts to fill that gap.

Budget 2021 provided funds and reinforced the directions. The budget allocated funds to Statistics Canada to take concrete action along the lines initiated in 2019. Particular emphasis was placed on timely, disaggregated evidence to support policies and programs. It also stressed the importance of measuring the multiple dimensions of quality of life and of developing ways of using these measures to shape policy-making.

The role of a paper that shows where the system is heading

Many users and partners are not aware of the huge potential for improvement in social policies and in social decision-making that will be opened up by the shift that is underway.

In the absence of a shared understanding of future directions, both inside the statistical agency and among users and partners, it seems unlikely that the frenetic pace of change in 2020 can be sustained. Yet there is still much work to do. It is important to understand that, despite the considerable recent progress, we still do not have a system of national quality-of-life statistics in the ordinary sense of the word ‘system’.

1.2. The system of national quality-of-life statistics in summary

The chapter will end with a series of text boxes that compare the system that is being built, as it will look once it is reasonably mature (perhaps after some 5 or 10 years of development), with the traditional single-source system that has been the norm until recently.

Box 2 sets the context. The shift to a multi-source system in the statistical system is one dimension of a much larger shift in government towards better use of digital technology, openness, and data management. The Box shows:

Box 3 describes the outputs of the mature system, the statistics that will be produced.

Box 4 describes the inputs to the mature system, i.e., the sources of incoming data.

Box 5 describes the processes that convert the inputs to outputs.

Figure 1 is a logic model that shows how inputs, processes and outputs are inter-related and how they work together to produce the outcomes of the mature system – the uses to which the statistics will be put.

1.3. Managing constraints and risks

Chapters 2 to 5 will elaborate on the systematic description of the future system as set out in Section 1.2 and in the boxes at the end of this chapter.

The final Chapter 6 will address the main constraints and risks that will be faced during implementation and how they can be managed.

The largest resource constraint is the relatively small number of people in both Statistics Canada and in the user community with the skills needed to make full use of the potential of systems where data is drawn from multiple sources.

The largest risk would be a failure to maintain and reinforce the trust and confidence of the users and suppliers of statistics as we move to quite new ways of doing business.

1.4. Assumptions about an uncertain future and the need for flexible planning

In the real world, the statistical system will never be mature. It will always continue to evolve and will always be messier than some hypothetical model used in planning exercises. And, of course, the future cannot be predicted with any certainty, as the COVID experience has so clearly demonstrated. For this reason, an ongoing annual planning process is envisioned. In addition to updated statements about longer-term directions, the planning process will also include concrete action plans covering the following three years.

This paper is a draft of the long-term directions as they might appear in the 2022 version of such a strategic plan. It examines the system as it might exist about five or ten years into the future based on a continuation of existing reform initiatives. Given that time frame, this 2022 version assumes:

Predictive analytic tools are, however, included. It might be thought that the use of predictive analytics to provide ‘what is likely to work best’ data belongs to future versions of the plan, given that this tool is not currently used in official statistics.

‘What works’ initiatives refer to the sharing of evidence about the effectiveness of programs, often based on evaluations and randomized controlled trials. The UK, for example, has a What Works Network covering different program areas. The ‘what is likely to work best’ initiative in this paper is quite different. It describes an approach that embeds predictive analytics right into the operational decisions made on individual cases, at the time those decisions are being made. See Section 3.4 for further explanation.

However, it is assumed that the statistical system will be given the mandate to work in partnership with service providers on experimental work over the coming years to create portals that will allow the individual employees who provide the services to have real-time access to information on:

It is included in the 2022 plan because:

Box 2. The context: digital government, openness and managing data as a resource

The growing use of administrative data for statistical purposes is one dimension of current efforts in many countries towards open government, increased use of digital technology and strengthened data management techniques. 

  • Data is being seen as a resource to be carefully and transparently managed, with data holdings made available to the public, consistent with privacy and confidentiality.
  • A theme is that each individual item of data collected, each sliver of information in the system, should be fully used including in multiple applications in and out of government.

Progress has been slow until recently

The potential power of treating information as a shared resource has been discussed for decades, but progress has been slow.

  • Cultural factors. Policy and programs have remained in traditional silos that create cultures and rules that are hostile to data sharing.
  • Some preconditions missing. It is difficult to create the preconditions for effective data sharing and the management of data as a resource, for example:
    • Standard ways must be found to catalogue data holdings using consistent concepts and definitions and to provide user-friendly access to that data.
    • IT and data management solutions must be found for allowing the same data items to be used for multiple purposes, without infringing on privacy.
    • The digital literacy skills of those who collect, manipulate, and use the data must increase.
    • Trust must be built among the individuals and organizations that provide the data initially, including confidence there will be no negative repercussion for privacy and confidentiality.
  • Modest payoffs. Early efforts were often concentrated on open government themes – providing the public with access to data files that were designed to support existing internal operations:
    • One goal was greater transparency in government and a resulting increase in public oversight. Progress here has been slower than hoped for, partly because of the privacy and confidentiality restrictions placed on access to data – and because the data in question was not designed to support public accountability applications.
    • There have been some successes in the related goal of providing users with data that they could use in other applications. However, again, the resulting benefits have been far less than many had anticipated.

In reality, data sets designed for internal operational purposes are often not directly useful for other purposes and there has been a lack of capacity to make the adjustments that would be needed to increase their relevance.

Current initiatives address these weaknesses

Lessons have been learned. Recent initiatives do address the preconditions. They focus on internal data management – the flow of data within the whole system – as well as providing public access to existing data. In Canada, these initiatives include:

  • The recent Data Strategy for the Federal Public Service.
  • The creation of a Minister for Digital Governance and departmental Chief Data Officers.
  • Open government initiatives that include all orders of government, including municipalities, with links to similar initiatives in other countries.

The role of the Statistical Office

Statistics Canada will play a pivotal role in the next phases of digital government and open government reform.

  • In supporting the technical pre-conditions. Statistics Canada has world-leading expertise to help build the capacity to:
    • Develop the common concepts and classification systems that describe, catalogue, and allow access to data.
    • Develop the techniques that allow data items from different sources to be used in combination for a range of purposes, while still protecting privacy.
    • Support increased digital literacy.
    • Use the data from multiple sources to make major improvements in the statistics available to support improved quality of life as described in this paper.
  • In building trust. Statistics Canada is uniquely situated to provide a safe and trusted place where data from different sources can be combined using techniques and processes that avoid concerns about privacy and confidentiality. This trust has been strongly reinforced by recent action in Statistics Canada:
    • A world-leading framework has been developed (the Necessity and Proportionality Framework) for balancing use, privacy, and response burden.
    • Sophisticated tools now exist to ensure privacy protection, using techniques that split out individual identifiers from the data that describe those individuals. No master file of identifiable records is created.
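
The idea of splitting identifiers from descriptive data can be illustrated with a minimal sketch. It is not a description of the tools actually used by Statistics Canada: the keyed hash, the key name `LINKAGE_KEY`, the field names and the sample records are all assumptions made for illustration. Each source record is separated into an identifier, which is replaced by a pseudonym, and its descriptive attributes, which are then combined across sources by pseudonym only.

```python
import hashlib
import hmac

# Hypothetical secret held only by the linkage unit (an assumption, not from the paper).
LINKAGE_KEY = b"example-key-held-by-linkage-unit"


def pseudonym(identifier: str) -> str:
    """Return a keyed hash so the raw identifier never reaches analysts."""
    return hmac.new(LINKAGE_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def split_record(record: dict, id_field: str) -> tuple[str, dict]:
    """Separate the identifier from the data that describe the individual."""
    attributes = {k: v for k, v in record.items() if k != id_field}
    return pseudonym(record[id_field]), attributes


def link(sources: list[list[dict]], id_field: str) -> dict[str, dict]:
    """Combine attributes from several sources, keyed by pseudonym only."""
    linked: dict[str, dict] = {}
    for source in sources:
        for record in source:
            key, attributes = split_record(record, id_field)
            linked.setdefault(key, {}).update(attributes)
    return linked


# Made-up records standing in for a survey file and an administrative file.
survey = [{"person_id": "A123", "life_satisfaction": 8}]
admin = [
    {"person_id": "A123", "training_program": "T1"},
    {"person_id": "B456", "training_program": "T2"},
]

linked = link([survey, admin], id_field="person_id")
```

In this sketch the linked output contains no raw identifiers, and the key needed to recompute the pseudonyms stays outside the analytical environment. A production system would need further safeguards, such as protection against re-identification from the attributes themselves.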

A virtuous circle

The role played by Statistics Canada supports the sharing of data throughout the system. Increasing amounts of data from administrative files will flow into the statistical system along with data from survey respondents. This will result in more useful statistics both for the bodies that provided the administrative data and for the respondents to surveys. That, in turn, will create a climate that will encourage even greater data sharing as the system evolves.

Box 3. Comparison of statistical outputs in a multiple-source and in a traditional single-source model

Audience for statistical outputs

In the traditional single-source model, there are two main audiences for social statistics:

  • A general audience with an interest in trends in the social domains of life, where the goal is to provide a basic socio-economic and demographic profile of Canadians. These data provide a consistent common denominator for applications ranging from indicators of social wellbeing, to measures of the extent of social problems, and to the marketing activities of firms.
  • Analysts and researchers dealing directly or indirectly with social policy and the determinants of social problems.

In a multi-source, quality-of-life model, there will be the same two audiences, with the following differences and additions:

  • The policy applications will be more finely tuned to practical program agendas. They will be focussed on the effects of programs, and combinations of programs, on the lives of citizens in diverse situations and with diverse needs. The focus expands from trends to also include equality and sustainability considerations.
  • A new audience will be served consisting of persons and service providers who will be provided with information on the probabilities of ‘what is likely to work best’ in making decisions in the main domains of social life.

Availability of tabulated outputs
  • In the traditional model, users once had to rely on paper publications reporting on survey results, often produced with considerable time delays. More recently, users have much more flexible and timely access to data tables using Statistics Canada’s website. However, the tables on that site are still organized by individual survey and census sources.
  • In the multi-source, quality-of-life model, there would be similar access via a single interactive website. However, the site would also contain tabulations based on linked data files, would contain synthetic data, and would be increasingly structured around data content rather than data source.

Availability of microdata
  • In the traditional model, there is a range of methods of access to single-source microdata files – with varying restrictions to access related to privacy protection. Microdata is mainly seen as an additional product that users must pay to receive.
  • In the multi-source model, access restrictions related to privacy protection will be at least as strong. However, microdata will be seen as a core, free product of the system, with the statistical system itself providing standard anonymized micro databases drawn from different sources, including synthetic data of high quality. Many of these standard databases would contain lifecourse and longitudinal data.

Indicators and dashboards
  • In the traditional model, at least in Canada, systematic production of indicators and dashboards is mainly seen as the responsibility of users, not of the statistical agency itself.
  • In the multi-source model, quality-of-life indicators and policy-relevant dashboards would be an integral feature of the website that provides access to aggregated and microdata.
Probability of success information
  • In the traditional model this output did not exist.
  • In the multi-source model,
    • Service providers, including those who refer people to particular kinds of services, would have access to a site (created in partnership with the service agency in question) that provides them with tailor-made information on which interventions would likely work best for a particular client, along with information about the ways of accessing the intervention in question.
    • Individuals seeking information – such as information related to job search or training options – would have access to similar interactive sites (again maintained in partnerships with responsible subject-matter agencies) that would provide information on the options that were open – and the likely success of each.

Box 4. Comparison of inputs in a multiple source model with those in a traditional model

Vastly more input data would be collected:

  • As a result of the growing demand for disaggregated, responsive statistics.
  • As a result of the availability of much more detailed, relevant data from administrative and other sources.
  • As a result of the computing power to manage vastly more data – without creating response burden and while maintaining privacy and confidentiality.
A change in the mix of input sources
  • Censuses and traditional surveys such as the Labour Force Survey are likely to collect either about as much data as they have in the past, or possibly less as a result of response burden issues and the availability of alternative data from administrative sources and from the new, more responsive surveys described below.
  • More flexible surveys. A suite of much more flexible survey instruments would be in place whose role is to respond quickly to current demands and to provide information on topics that cannot be well covered by other sources such as administrative data.
  • Administrative data sources would play a much larger role, particularly in the coming five to ten years. Statistics Canada will, partly in consequence, play a more active role in the overall management of data throughout the federal system and nationally.
  • Hybrid vehicles that are designed from the outset to gather data from both surveys and administrative sources are already in place and could well become more common in the future, perhaps allowing designers to draw on the strength of both sources in ways that have not yet been fully explored.

Non-traditional sources such as social media or personal monitoring devices will almost certainly play a larger role in the longer term, but not likely in the next five or so years since these sources are not representative or fully developed. It is difficult to predict when these new technologies will start playing a large role, but their effects are potentially transformative. Experimental and developmental activities are likely to dominate in the coming several years.

  • Feedback loops from individuals participating in ‘What is likely to work best’ initiatives will play a central role in a mature system, providing a major source of data that can explore causal relationships in policy analysis. This will take years to fully develop, however.
Linked data would become the new norm

The defining feature of a multiple-source system is its capacity to draw on data from multiple sources, including combinations of data that were not in any one of the original data sources.

  • Many of the data sets that are used in these applications contain overlapping information which allows the creation of synthetic data and provides quality controls.
Stronger emphasis on longitudinal data

More of the input data will be longitudinal (following the same individual over time), arising mainly from administrative data sources. The traditional model relies on longitudinal surveys.

Inclusion of data about individuals based on their geographic location

Data about geographic areas would be calculated by aggregating the characteristics of the individuals who live and work there. In addition, economic and environmental information about those areas would be obtained from exogenous sources.

  • This would allow analysis of data from both sources to be carried out at the level of the geographic area, a form of analysis that is already in place and steadily improving.
  • More important here, it would allow economic and environmental data to be imputed to individual records based on where the individual lived, enabling a potentially powerful new form of analysis that sets individuals in their temporal and spatial environment.
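
The imputation of area-level data to individual records described above can be pictured as a simple lookup by area code. The sketch below is illustrative only: the area codes, field names and values are hypothetical, and a real application would work on anonymized files under the privacy safeguards described elsewhere in this paper.

```python
# Hypothetical area-level indicators, keyed by a geographic area code.
area_indicators = {
    "A001": {"unemployment_rate": 6.2, "air_quality_index": 41},
    "A002": {"unemployment_rate": 9.8, "air_quality_index": 67},
}

# Hypothetical anonymized individual records, each carrying an area code.
individuals = [
    {"person_id": 1, "area_code": "A001", "employed": True},
    {"person_id": 2, "area_code": "A002", "employed": False},
    {"person_id": 3, "area_code": "A999", "employed": True},  # area not covered
]

AREA_FIELDS = ["unemployment_rate", "air_quality_index"]


def attach_area_context(person, indicators):
    """Return a copy of the person record with area-level values attached.

    If no indicators exist for the person's area, the area fields are set
    to None rather than guessed.
    """
    area = indicators.get(person["area_code"])
    enriched = dict(person)  # copy, so the input record is left unchanged
    for field in AREA_FIELDS:
        enriched[field] = area[field] if area is not None else None
    return enriched


enriched_records = [attach_area_context(p, area_indicators) for p in individuals]
```

Each enriched record then describes the individual together with the economic and environmental setting in which he or she lived, which is the starting point for the form of analysis described above.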
Consistent data about social programs and services
  • In a mature system, some years in the future, highly disaggregated data about specific, micro-level program and service interventions would be collected and used in conjunction with the individual data collected about the participants in those programs. This would allow an examination of the outcomes of particular social programs and of the combined effects of the programs of different orders of government.
  • This contrasts with traditional macro approaches that examine systems as a whole, such as the income security, health, education, social services, and justice systems. Information is collected about the volumes of people served and costs – but, in the case of services, without a direct link between the services and individuals who were served. This capacity for linked micro analysis already exists for income transfer programs.

Box 5. Processing and analysis in a multiple-source, quality-of-life microdata model

Conceptual Framework: standard definitions at the micro-level

When input data are drawn from many sources, there must be an independent set of consistent concepts and definitions that describe the individual data items that are being measured.

In the social area, standard definitions are needed to describe, at a fine level of detail, the characteristics of:

  • Residences: where people live, including the type and quality of the residences and their geographic locations. (Other forms of real estate are also described, but in social statistics the main focus is on dwellings.)
  • People’s characteristics: including basic demographic and cultural identifiers, and snapshots of status at different points in time, including skills status, employment status, health status, income and much more. These snapshots are also known as stocks.
  • People’s activities: how they spend time in a day, including a full range of social and economic activities and economic transactions. These are also known as flows.
  • Organizations, including businesses, which are described using concepts derived from the System of National Accounts, and social programs and social services, which are described using an input-process-output model.
  • Geographic areas, including environmental characteristics.
Standard ways of showing relationships

The concepts used must allow analysis of the important relationships among the topics above. Examples include:

  • People living in the same residence, including household relationships.
  • The activities (known as flows) that link people with each other and with organizations (including both economic transactions and flows of services and information).
  • Geographic proximity of residences, organizations, and commuting patterns.
  • Socio-economic characteristics of residents in different geographic spaces.
  • Social capital defined in terms of network relationships.
Standard ways of showing and describing changes over time

There are four main ways of describing change over time:

  • Longitudinal or recall data, where information for the same individual is gathered at different points in time, or where the individual is asked to recall information about his or her status or activities at earlier periods.
  • Stock and flow data, where stocks at a later period of time equal the stocks at an earlier time, plus or minus the net flows that took place between those periods.
  • Lifecourse concepts, where the point-in-time flows are coded as events. This, in turn, allows analysis of the transitions and trajectories that make up the main domains of life, such as life in the family, in work, in learning, or in caregiving and care-receiving.
  • Hypothetical changes, which are the changes that might have occurred in aggregate systems based on ‘what if’ analysis using microsimulation or other analytic tools. When applied to the micro-level probabilities of success of individual decisions, ‘what is likely to work best’ predictive analytic tools are used.
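
The stock and flow relationship in the list above is a simple accounting identity, and the same identity can serve as a consistency check when stocks and flows come from different sources. The following is a minimal sketch with hypothetical figures; the function names are illustrative and are not drawn from any Statistics Canada system.

```python
def closing_stock(opening_stock, inflows, outflows):
    """Closing stock = opening stock + total inflows - total outflows."""
    return opening_stock + sum(inflows) - sum(outflows)


def reconciliation_gap(opening_stock, inflows, outflows, observed_closing):
    """Difference between an observed closing stock and the one implied by flows.

    A gap of zero means the stock and flow data are consistent; a non-zero
    gap points to a measurement or coverage problem in one of the sources.
    """
    return observed_closing - closing_stock(opening_stock, inflows, outflows)


# Hypothetical counts of employed persons in one area over one period.
opening = 1000
inflows = [40, 25]    # e.g., hires from unemployment, entries to the labour force
outflows = [30, 10]   # e.g., layoffs, retirements

closing = closing_stock(opening, inflows, outflows)           # 1000 + 65 - 40
gap = reconciliation_gap(opening, inflows, outflows, 1030)    # observed minus implied
```

In this example the implied closing stock is 1,025, so an observed closing stock of 1,030 leaves a gap of 5 to be explained.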
Combining input data

Incoming data from the multiple sources identified in Box 4 are combined to produce the new responsive, quality-of-life statistics using new principles and new tools.

  • The principles that guide the choice of data sources. The set of principles used to decide which sources and combinations to use in a particular application is known as the Necessity and Proportionality Framework. It takes into account the importance to the nation of the statistical outputs to be produced and the need to minimize response burden and to protect privacy and confidentiality.
  • Tools to combine data, ensure its quality and protect privacy. A rich set of technical tools is now available to link data from different sources when required, to create synthetic variables, to ensure quality, and to protect privacy, including:
    • Traditional statistical sampling tools.
    • Tools to correct errors and impute missing data.
    • Registry tools developed in Scandinavia and elsewhere.
    • Linkage tools that do not disclose personal identifiers and tools for ensuring that statistical outputs are anonymous.
    • Microsimulation and other tools to create synthetic data.
    • More generally, tools that make use of the multiple accounting devices (double entry bookkeeping) built into the standard micro-level concepts and definitions.
  • Tools of analysis. A wide range of familiar and new analytic tools are used to analyse the rich microdata, including:
    • Static and dynamic microsimulation.
    • Geomatic tools for spatial analysis.
    • Artificial intelligence and machine learning tools.
    • Lifecourse and longitudinal analytic tools.
    • Mixed methods, including new opportunities for qualitative research.
    • Interactive analytic tools, including visualization tools.
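
One way to picture the linkage tools listed above that do not disclose personal identifiers is linkage on a keyed hash of the identifier. The sketch below is an assumption made for illustration, not a description of the methods Statistics Canada uses: the key, the identifiers and the field names are all hypothetical, and operational systems add probabilistic matching, key management and output disclosure controls.

```python
import hashlib
import hmac

# Hypothetical secret key, assumed to be held only by the unit doing the linkage.
SECRET_KEY = b"hypothetical-key-held-by-linkage-unit"


def link_key(identifier: str) -> str:
    """Replace a personal identifier with a keyed hash (HMAC-SHA256)."""
    normalized = identifier.strip().upper().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


# Hypothetical extracts from a survey file and an administrative file.
survey = [{"id": "ab123", "hours_worked": 35}, {"id": "cd456", "hours_worked": 20}]
admin = [{"id": "AB123 ", "benefit_paid": 1200}, {"id": "zz999", "benefit_paid": 300}]

# Index the survey records by hashed key, then match the administrative records.
survey_by_key = {link_key(r["id"]): r["hours_worked"] for r in survey}

linked = []
for record in admin:
    key = link_key(record["id"])
    if key in survey_by_key:
        # The linked record carries the hashed key only, never the raw identifier.
        linked.append({
            "link_key": key,
            "hours_worked": survey_by_key[key],
            "benefit_paid": record["benefit_paid"],
        })
```

The analytic file that results combines variables from both sources while the original identifiers stay behind with the source files.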

Figure 1. A logic model of a mature multi-source system of national social statistics

Description for figure 1

The title of the infographic is "A logic model of a mature multi-source system of national social statistics".

The infographic presents a series of text boxes organized under four sub-headers.

The text boxes are connected by arrows that are either solid or dotted. A solid arrow represents a flow of data or descriptive information about that data. A dotted arrow represents a flow of advice or direction.

The first header is “Inputs”. Under this header, a first text box reads “Governance, outreach, and advice from experts and users (Links, not shown, to all parts of the system)”.

A second text box under the header “Inputs” reads “Requests to collect new statistics”.

A third text box under the header “Inputs” reads “Data inputs about individuals from surveys and censuses, from admin records, from non-traditional sources”.

A fourth text box under the header “Inputs” reads “Data about social programs & institutions”.

A fifth text box under the header “Inputs” reads “Data about geographic areas”.

A sixth text box under the header “Inputs” reads “New data resulting from the direct uses by individuals”.

A seventh text box under the header “Inputs” reads “New data created by internal manipulations of overlapping data sets*”. The asterisk is associated with the note: ‘Note that the input “New data created by internal manipulation” is technically part of the processing function.’

The second header is “Processes”. Under this header, a first text box reads “Maintaining the Conceptual Framework (Consistent concepts, definitions, standards, and protocols)”.

A second text box under the header “Processes” reads “Conducting surveys and censuses”.

A third text box under the header “Processes” reads “Producing statistically sound versions of original data sources”.

A fourth text box under the header “Processes” reads “Maintaining internal databases to store and manipulate the data”.

A fifth text box under the header “Processes” reads “Maintaining the Data Documentation System. (Documenting the quality of the data, including deviation from the standards in the Conceptual Framework)”.

To the side of the five text boxes just described, there are two additional text boxes. The first one reads “Internal analysis, research, and methods: releasing new statistics; analysing social data; statistical and data processing methods; data standards; privacy/trust standards; international linkages”.

The second text box reads “Planning, budgeting, resourcing, and performance measurement functions. Note: links (not shown) are to all processing functions”.

The third header is “Outputs”. Under this header, a first text box reads “Common access site: indicators of social wellbeing; dashboards; supporting data; analytic papers”.

A second text box under the header “Outputs” reads “Data accessed by users: statistically sound source data (public use micro files); aggregated data; standard multi-source micro-databases; specific use applications”.

A third text box under the header “Outputs” reads “Partnered ‘what is likely to work best’ databases”.

The fourth header is “Outcomes (uses of statistics)”. Under this header, a first text box reads “Better policy and program analysis”.

A second text box under the header “Outcomes (uses of statistics)” reads “Supporting public and private decision-making, including new uses directly by citizens and service providers to inform life choices”.

A note at the bottom of figure 1 reads “Note: The chart shows the logic model for the entire system of national quality-of-life statistics, not only those parts that are within the responsibility of the field responsible for social, health and labour statistics.”

2. Outcomes: the purpose and scope of the evolving system of quality-of-life statistics

Section 2.1 discusses the first of the two outcomes identified in the logic chart in Figure 1 – supporting policies and programs.

Section 2.2 provides a similar description of the objective of supporting decision-making.

Section 2.3 proposes a possible statement of vision and mission.

2.1. Supporting policies and programs

Providing evidence to support policies and programs has always been the main goal of national social statistics.
An extended statement of the outcome: supporting policies and programs

The national system of quality-of-life statistics will provide readily accessible, disaggregated and timely information on:

  • Trends, levels, inequalities, and sustainability risks related to wellbeing and the quality of life, both overall and for different population groups — and for the various ways in which social policy responses are organized such as health, income, or learning.
  • The causes of these levels, trends, inequalities and sustainability risks and the role of existing government policies, and of alternatives to existing arrangements, in improving wellbeing.

Canada has always had a reputation for world-leading excellence in statistics. When it comes to its main traditional goal of supporting social policies and programs, the agency can build on strength.

However, there is also a long way to go to fully respond to the policy demands described in Chapter 1. This can be seen in the sidebar which provides an expanded version of the outcome that is summarized in Figure 1. While this wording is illustrative only, it captures much recent thinking in Canada and internationally.

The extended statement suggests that the mature system should be designed to improve on existing arrangements in the following areas:

Implications for the design of the multi-source system
  • A monitoring capacity that is timely and responsive, that covers all areas of social wellbeing, and that provides indicators of trends, levels, inequalities, and sustainability.
  • A capacity for causal analysis based on detailed descriptors of individuals using microanalytic, longitudinal tools.
  • Inclusion of consistent, detailed descriptors of the outputs of social programs and social services and of their beneficiaries.

The consequences for the design of the new multi-source system are shown in the second sidebar above.

2.2. Supporting public and private decision-making

The second purpose or outcome of the statistical system is to support better decisions by a wide range of actors, public and private.

Public decision-making

The improvement in public decision-making will occur because of:

Private decision-making: by organizations

Currently, most direct uses by organizations relate to marketing. Demographic, socio-economic and health data often support the marketing functions of firms and of institutions that provide health, education, and other social services. National social statistics provide a kind of common denominator that describes the needs and characteristics of potential recipients of the products and services of those organizations. Projections of key demographic variables are particularly useful in planning initiatives in both public and private organizations.

Private decision-making: by academics and researchers

Much academic research is, of course, directly or indirectly, related to the policy and program uses above. As well, a wide range of academic social research, especially research that examines the longitudinal dimensions of life, will benefit from the increased granularity and integration of the statistics produced by the multi-source system. The addition of lifecourse concepts to the micro-level conceptual framework will facilitate mixed qualitative and quantitative analysis. Many new tools of access and analysis are being developed to support external researchers (in the research data centres and through a range of virtual tools including the Data Analytics as a Service Platform initiative) and authorized researchers in other departments (e.g., Virtual Data Labs).

Box 6. Supporting decision-making by citizens and service providers

The rhetoric

A familiar, long-standing theme in the social policy literature is the importance of moving away from systems that are designed to make life easier for those who deliver highly siloed programs and services and to move towards systems that are centred on the needs and circumstances of those who use the programs and services.

  • In health, there have been decades-long calls for a shift from disease-centred services to patient-centred services with an emphasis on population health and wellness. While progress has been limited, a recent C. D. Howe article shows how the massive COVID-19 shift to virtual care provides an opportunity to embrace a more patient-centric and cost-effective healthcare system.
  • Educators wish they could move from one-size-fits-all classrooms to student-centred learning. Again, the possibility exists that experience with COVID-driven virtual learning can provide an opportunity to make real progress in the years ahead.
  • Case managers and counsellors aspire to coordinate flexible packages of social, health and learning interventions tailored to individual needs – even if the components of such packages were not designed to be compatible.
  • Police and social services are asked to coordinate their activities at the community level but often without the resources to do so effectively.
  • There have been endless calls to make the existing labyrinth of complex and overlapping income security programs more accessible – with the system itself taking the lead in putting together the most advantageous packages of benefits for recipients, rather than leaving potential recipients with the responsibility for determining eligibility.
The reality

Despite much effort and some successes – and the silver lining opportunities suggested by the pandemic – social programs and service providers still operate largely in separate silos organized in ways that are designed to support accountability and efficiency within each silo.

  • The obstacles to moving forward in the area of income support programming are largely jurisdictional and cultural.
  • In the service area, however, a fundamental obstacle in moving to a client-centred model is the lack of adequate information on which kinds and mixes of services and supports work best in improving quality-of-life at the level of individuals.

In the absence of such hard evidence, it is not possible to develop cross-cutting, client-centred delivery systems that are accountable, efficient, and effective.

Fundamental improvements are becoming possible

New data sources and predictive analytic techniques are opening the possibility of making real progress in providing the statistical evidence that is needed to support the shift to client-centred systems of health, education, and social services. It is now possible to foresee the day when tailor-made statistical information about ‘what is likely to work best’ can be provided directly to individuals and service providers:

  • As they make choices in the social domains of life and at the time when decisions are being made.
  • Based on probabilities of success calculated from what worked best in similar circumstances in the past.
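
The idea of calculating probabilities of success from what worked in similar circumstances can be sketched, in its simplest form, as observed success rates by client profile and intervention. The records, profile labels and minimum-case threshold below are hypothetical. Raw rates of this kind are not causal estimates; a working application would rely on the predictive analytic tools and much richer microdata discussed in this paper.

```python
from collections import defaultdict

# Hypothetical past records: (client profile, intervention, succeeded).
past_outcomes = [
    ("youth_no_diploma", "skills_training", True),
    ("youth_no_diploma", "skills_training", True),
    ("youth_no_diploma", "skills_training", False),
    ("youth_no_diploma", "job_search_help", True),
    ("youth_no_diploma", "job_search_help", False),
    ("youth_no_diploma", "job_search_help", False),
    ("youth_no_diploma", "job_search_help", False),
]


def success_rates(records, profile, min_cases=3):
    """Observed success rate of each intervention for clients with this profile.

    Interventions with fewer than min_cases past cases are left out, since
    a rate based on very few cases would be unreliable.
    """
    counts = defaultdict(lambda: [0, 0])  # intervention -> [successes, cases]
    for rec_profile, intervention, succeeded in records:
        if rec_profile == profile:
            counts[intervention][1] += 1
            if succeeded:
                counts[intervention][0] += 1
    return {
        intervention: successes / cases
        for intervention, (successes, cases) in counts.items()
        if cases >= min_cases
    }


rates = success_rates(past_outcomes, "youth_no_diploma")
# Interventions ordered from highest to lowest observed success rate.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

A profile with no comparable past cases returns no estimates at all, which reflects the point made in this box that the needed microdata must exist before reasonable probabilities can be calculated.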

In some areas, such as training and labour market programming, the needed multi-source microdata has been available since the late 1990s although jurisdictional silos have prevented its use except in evaluations.

  • In other service areas, it will take years to develop the needed micro multi-source information in the detail required to calculate reasonable estimates of probable success.
  • In all cases, success will depend on collecting the needed microdata and on launching demonstrations and pilots over the next several years to develop the tools for using this detailed information in a range of ‘what is likely to work best’ applications.
With high potential payoffs

The payoffs from such a gradual, experimental process will eventually be high:

  • To the individuals themselves.
  • To those who design and administer social programs who will also have access to micro-level information on what is likely to work best.
  • To society, as the result of the improved operation of health, labour market, educational and other social systems, including reductions in the cost of government remedial programs.
  • To the statistical system itself since more people will have direct gains from its operation with increased trust and support and with powerful new data feedback loops that will continually improve our capacity to learn what is working best.

Private decision-making by individuals

Currently, most uses by individual Canadians are indirect. Some individual Canadians may consult the Statistics Canada website to find background information related to personal decisions. However, such direct uses are almost certainly small when compared with indirect benefits from the policy-related statistics described above. These indirect uses will result in better policies and programs, and a more informed media. This, in turn, will have important positive effects in improving people’s lives and in the operation of institutions.

Direct uses by individuals are likely to increase. The importance of directly supporting individuals has been well recognised. For example, as seen in the sidebar, Statistics Canada’s current Modernisation Initiative highlights the importance of supporting quality decision-making by citizens.

New emphasis on serving individuals

Two of the goals of modernization are specifically cast in terms of better serving individual Canadians:

  • Produce more timely and responsive statistics – ensure Canadians have the data they need when they need them.
  • Develop and release more granular statistics to ensure Canadians have the detailed information they need to make the best possible decisions.

However, the reality is that a statistical system based on single-source statistics has, in practice, provided little information that is of direct and immediate use in helping individuals make decisions.

In a multi-source system, on the other hand, a wealth of information is available to support a wide range of individual decisions using predictive analytic tools. Although this will require much development work, over the long-term, these uses could become a central outcome of the system of social statistics.

As elaborated in Box 6, this shift to serving Canadians directly responds to a long-standing aspiration in most areas of social policy, but one where little progress has been possible because of a lack of supporting statistical data.

The largest long-run payoffs will be from ‘what is likely to work best’ data. The shift to a multi-source system will make it possible to provide individual citizens and service providers with real-time information about which type of social or health intervention is likely to work best, based on what has worked best in similar circumstances in the past.

2.3. A statement of vision and mission – and a new title

Developing a statement about the purposes and scope for the multi-source system of quality-of-life statistics could help coordinate implementation activities. It could be useful in building understanding among users and partners.

A draft of such a statement of vision and mission is shown in Box 7. The statement is illustrative only and intended for planning purposes. A final version would be based on consultations and expressed with greater elegance.

Below the statement, the box provides an explanation of the factors that were taken into account in its drafting, including thoughts on the scope of Statistics Canada’s mandate in the social statistics area and the rationale for replacing the phrase social statistics with quality-of-life statistics.

Box 7. Draft statement of vision and mission for the National System of Quality-of-Life Statistics


To provide the statistical evidence that people, organizations, and governments need in order to work together in improving the quality of life in Canada.


The national system of quality-of-life statistics will support the development and evaluation of public social policies and programs by providing readily accessible and timely information on:

  • Trends, levels, inequalities, and sustainability risks related to wellbeing and the quality-of-life, both overall and for different population groups — and for the various ways in which social policy responses are organized such as health, income, or learning.
  • The causes of these levels, trends, inequalities and sustainability risks and the role of existing government policies, and of alternatives to existing arrangements, in improving quality of life.
It will support the decision-making of governments, businesses, social service providers and individual citizens by providing timely and accessible information on the social characteristics and activities of representative Canadians, and how these are shaped by social programs and interventions.

Considerations that shaped the drafting of the statement of vision and mission

Consistency with the larger Statistics Canada mission

The two outcomes of the proposed multi-source model are the same as the first objective of Statistics Canada as seen in the statement of mandate and objectives found on the agency’s web site, namely:

  • To provide statistical information and analysis about Canada’s economic and social structure to:
    • Develop and evaluate public policies and programs.
    • Improve public and private decision making for the benefit of all Canadians.
Considerations in wording a Vision and Mission Statement
  • Wording that is consistent with a gradual approach to providing direct ‘what is likely to work best’ statistics.
  • Wording that highlights the importance of partnership and inter-sectoral horizontality in the statistical system and in the uses of statistics.
  • Wording that does not suggest that the national system has a monopoly on social statistics.
Scope and limits

Related to this last bullet, it may be useful for the mission statement to signal the main roles that the national system will play, namely:

  • Statistics related to nationally representative individuals that describe their quality-of-life and the factors that affect that quality-of-life.

The federally funded, national system should have its focus on statistics that apply across the country and should not arbitrarily favour any one province or locality.

  • Statistics that describe social programs and social services.

The wording of the mission statement should, however, not preclude the following exceptions:

  • Demonstration projects or pilot studies which are not nationally representative but will usually have the potential to be scaled up to the national level if resources and demand are there.
  • In some cases, the needed representative data is simply not available and non-representative proxies must be used, as in crowd-sourcing initiatives.
  • Other exceptions. There will always be cases where only the national statistical system has the expertise and capacity to undertake important projects that are not intended to be national in scope or to address social issues that are not based on data about individuals. In these cases, decisions can be taken at a senior level to make exceptions and undertake such projects on a cost-recovery basis.
Title – What we call ourselves

The use of the phrase ‘quality-of-life’ statistics, instead of the more familiar ‘social statistics’ does not signal any change in the scope of what is included in terms of subject-matter areas. Rather it conforms to language used in Budget 2021 and is intended to signal the new focus on a micro-level understanding of the determinants of the quality of life at the level of the individual.

The change might also resolve some current ambiguities in the use of the term ‘social’:

  • The term ‘social statistics’ sometimes refers to all the domains in society that are not economic or environmental statistics – even though, logically, ‘society’ is the overall heading with the economy being only one dimension of society.
  • Moreover, even within the traditional area of social statistics, the word ‘social’ is sometimes used only to cover residual elements. For example, the title of the Statistics Canada organization, the Social, Health and Labour Statistics Field, logically implies that health and labour statistics are not social statistics – even though the real meaning is ‘social statistics including those related to health and the labour market’.
  • Additional ambiguity arises with the commonly used term ‘social and health statistics’ which sometimes means ‘social statistics, including population health statistics’ but which sometimes has a broader scope and refers to ‘social statistics including those related to population health and to the operation of the health care system which centres on diseases’.

3. Outputs: statistical products

This chapter deals with outputs – the statistical products of the proposed national system of quality-of-life statistics – that were identified in Figure 1.

3.1. Progress in building the interactive web site

Box 3 described the Quality-of-Life information on the Statistics Canada website:

Rationale for releasing on a common website. Continuing the present practice of using a common website for all releases of the latest social statistics, such as the latest Labour Force Survey data or the latest census tables, has several benefits:

Rationale for including systematic social indicators and policy-relevant dashboards. Box 8 describes the continued interest in high level indicators that monitor progress, and point to possible problems, in achieving a high quality of life. In part, such systems are valued because they hold promise of compensating for the highly siloed nature of most current social statistics and of the policies they support. However, Box 8 indicates that indicators are not much used in practical policy-making and proposes a better approach based on the micro-level multi-source model.

Box 8. A different role for social indicators and policy dashboards

Interest in social indicators remains high

For many decades, there have been efforts to compensate for the siloed nature of social statistics by developing social indicators that provide a multi-dimensional measure of the quality of life in a society.

The 2009 Stiglitz-Sen-Fitoussi Report gave much impetus to the social indicator movement, including the development of the OECD’s Better Life Index. Currently, some governments, including Canada, are exploring ways of casting their planning and budgeting process in a quality-of-life framework.

Content of most traditional social indicator initiatives

Typically, social indicator initiatives consist of a manageable number of indicators that attempt to summarize key changes in the different areas that are thought to be most related to quality-of-life – health, learning, income, inequality, sustainability, perceptions of happiness.

  • Arranged in several tiers, with the most summary indicators in the top tier, with further breakouts at a second level and even lower-level tiers.
  • Since there is no consensus on what constitutes wellbeing, much creativity and consultation go into their creation.
Their weaknesses

Despite their theoretical appeal, traditional social indicators have had relatively little impact on social policy-making for the following reasons:

  • The lack of consensus on what constitutes wellbeing.
  • A lack of symmetry between the content of the indicators and the agendas under which policy discussions are framed. Policy decisions are seldom taken at the high level of generality found in typical social indicator tiers.
  • Traditional indicators do not provide quick access to the supporting data to facilitate follow-up analysis.
  • The often-long time delays before indicators are released. They are often conceived as annual paper reports rather than interactive, current web sites.
The proposed solution

The indicators on the interactive web site would:

  • Provide context for, and a quick gateway to, the much lower-level dashboards of indicators that are of actual use in policy contexts.
  • Not be cast as an official top-down way of defining the components of wellbeing or the constituents of a high quality-of-life. Rather, they would be seen only as those chosen by the editors of the web page as being of greatest current interest or as corresponding to clear policy goals, such as meeting poverty reduction targets.
  • Interested users would be encouraged to develop high-level sets of indicators that mirrored their own interests.
    • These would not be seen as being in competition with the ‘correct’ official approach, but rather as a legitimate reflection of the different priorities seen by different groups.
    • For example, media stories might compare the indicator sets chosen by business groups interested in efficiency with those chosen by groups interested in equality, or sustainability, or disadvantaged communities.
A core of common indicators

At any one time, a core set of indicators would be highlighted to cover topics of highest current interest from a pan-Canadian perspective.

  • If, as suggested in Budget 2021, the Government of Canada were to adopt a standard set of quality-of-life indicators as part of the budget process, these would be highlighted.
  • A Canadian version of the OECD’s Better Life indicators would be an alternative.
  • The website would not publish a single overall measure of wellbeing. The emphasis is on moving downward to more policy-relevant dashboards, not upward into theoretical constructs.
Dashboards for policy uses

Lower-level dashboards of indicators would be created. Based on consultations, these would focus on a variety of current policy agendas – where decisions are actually being made.

  • The different dashboards would not need to have the same format and they could overlap. For example, different dashboards could be created to support groups working on different aspects of the same policy issue, such as:
    • Dashboards designed for those working on an issue in a particular geographic area.
    • Dashboards that focussed on effects on different population groups.
    • More complex dashboards could be referred to as Hubs, as is the case on the present Statistics Canada website.
  • These would be designed to provide a small handful of indicators to support analysis of the policy issue at hand, including interactive graphics where this makes sense.
  • The site itself would provide quick access to a large number of standard dashboards that are in demand.
  • As well, users would be invited to create and save their own dashboards. When they next signed in, they would be asked if they wished to see an updated version of the dashboards they had saved.
Why such flexibility in indicators and dashboards is possible
The proposed system can be flexible because consistent, integrated concepts are built into the underlying microdata. For example, when there is need to consider a change to a high-level indicator or a dashboard, it can simply be replaced by an indicator that comes complete with its own historic time series – in a way that allows ready comparison of the old and replacement indicators.
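
The point can be illustrated with a minimal sketch. It assumes, purely for illustration, that an indicator is a function applied to the microdata for each year; the records, the income thresholds and the function names below are hypothetical and are not taken from any Statistics Canada system.

```python
from collections import defaultdict

# Hypothetical microdata: one record per person per year (illustrative values only).
MICRODATA = [
    {"year": 2019, "income": 18000},
    {"year": 2019, "income": 42000},
    {"year": 2019, "income": 65000},
    {"year": 2020, "income": 15000},
    {"year": 2020, "income": 22000},
    {"year": 2020, "income": 70000},
]


def share_below(threshold):
    """Return an indicator: the share of records with income below `threshold`."""
    def indicator(records):
        return sum(r["income"] < threshold for r in records) / len(records)
    return indicator


def time_series(records, indicator):
    """Apply an indicator to every year of microdata, giving its full history."""
    by_year = defaultdict(list)
    for r in records:
        by_year[r["year"]].append(r)
    return {year: indicator(rows) for year, rows in sorted(by_year.items())}


# The old indicator and its proposed replacement are both computed from the
# same microdata, so each comes with a complete, comparable historic series.
old_series = time_series(MICRODATA, share_below(20000))
new_series = time_series(MICRODATA, share_below(25000))

for year in old_series:
    print(year, round(old_series[year], 3), round(new_series[year], 3))
```

Because both series are derived from the same underlying records, a change of definition does not create a break in the series; the replacement can be shown alongside the original for every year covered by the microdata.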

Progress to date and the need for additional action

Many of the desired features of the proposed interactive web site already exist on the current Statistics Canada web site, which is becoming increasingly powerful.

3.2. Access to tabulated data: progress to date and next steps

Recent improvements on the Statistics Canada website, with the shift from CANSIM to CODR, were major steps in the direction of flexible access to aggregated data. In addition, a plan should be developed in consultation with users to add new tables that:

3.3. Access to microdata: progress to date and next steps

There has been much progress in recent years in using microdata for analytic purposes and in creating powerful linked data bases for research purposes as seen in Box 9. More are available in the Research Data Centres including those that use administrative data to extend the life of past longitudinal surveys.

Box 9. Examples of linked and linkable datafiles for research purposes

Longitudinal Worker File is used to measure the evolution of layoff rates. It consists of data drawn from:

  • Record of Employment and T1 and T4 tax files
  • Longitudinal Employment Analysis Program, which provides employment-related information on businesses and which, in turn, is based on tax files and the Statistics Canada Business Register.

Longitudinal Immigration Database, which combines data from the Immigrant Landing File, the Non-Permanent Resident Permit File and T1 tax data.

Canadian Employer-Employee Dynamics Database. It is a set of linkable files that provide longitudinal matched data on employees and employers, including data from the following files:

  • 7 income tax files, both personal and business.
  • Record of Employment from Employment Insurance.
  • Trade importer and exporter files
  • National Accounts Longitudinal Microdata file.
  • Longitudinal Immigration Database and the Temporary Residents File.

Education and Labour Market Longitudinal Linkage Platform. This is a particularly powerful set of linkable data bases that includes:

  • Registered Apprenticeship Information System.
  • Post-Secondary Information System which contains information pertaining to the programs and courses offered at an institution, as well as information regarding the students.
  • T1 tax records.
  • Canadian Education Savings Program.
  • With plans to add more data including the Canada Student Loans Program, Record of Employment and census data.

Today, microdata can be accessed from sites that are designed primarily to support research: the Research Data Centres, the Public Use Microdata File Collection, and the Data Liberation Initiative. Microdata can be analysed at a distance through Real Time Remote Access or special requests. As noted, many tools are being developed to support researchers using microdata, including the Data Analytics as a Service Platform and the Virtual Data Labs.

However, the next step – using these rich sources to create standard multi-source databases that can be accessed through an interactive web site – is not on current planning agendas.
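
As a rough illustration of what a standard multi-source file involves, the sketch below merges per-person records from two sources that are assumed to share an anonymized person identifier. The identifiers, field names, values and the `link_sources` helper are all hypothetical; actual linkage at Statistics Canada involves far more (probabilistic matching, quality checks, disclosure control) than is shown here.

```python
# Hypothetical extracts from two sources, already keyed by an anonymized person ID.
tax_records = {
    "p01": {"employment_income": 41000},
    "p02": {"employment_income": 0},
}
education_records = {
    "p01": {"program": "apprenticeship"},
    "p03": {"program": "bachelor"},
}


def link_sources(**sources):
    """Merge per-person records from several sources into one multi-source file.

    Fields are prefixed with the source name so that their origin stays visible.
    A person who is absent from a source simply has no fields from that source.
    """
    linked = {}
    for source_name, records in sources.items():
        for person_id, fields in records.items():
            row = linked.setdefault(person_id, {})
            for field, value in fields.items():
                row[f"{source_name}.{field}"] = value
    return linked


multi_source = link_sources(tax=tax_records, education=education_records)

for person_id in sorted(multi_source):
    print(person_id, multi_source[person_id])
```

Keeping the source name in each field is one possible design choice: it lets a user of the linked file see at a glance which records are covered by which source, which matters when coverage differs across sources.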

A new product: standard multi-source microdata files

In a multi-source system based on microdata, an important new output will be a set of standard multi-source microdata files that can be used for a variety of purposes. Responsibility for creating these is currently seen as belonging to users of the statistical system, as opposed to being a major output of the statistical system itself. In a future multi-source model:

A consultation strategy will need to be set up to determine the content of these standard databases. Early examples might draw on the linked research files in Box 9 or those in the Research Data Centres. The work could be done in partnership with the researchers involved.

3.4. Partnered ‘what is likely to work best’ databases

Box 6 identified the large potential payoffs from developing predictive tools that could assist in individual decision-making. This would be a new area for Statistics Canada, and it will be important to be clear on how this kind of tool relates to other more traditional evidence-based approaches such as ‘what if’ scenarios and ‘what works’ initiatives.

Implementation strategy

In the next several years, a realistic goal might be to:

Recent work in measuring the subsequent success of post-secondary students in specific academic programs suggests that early pilots in this area might be warranted.

The area of active labour market programming (training, job counselling, subsidized work experience, etc.) would be an obvious area to consider in establishing pilots if partnerships could be arranged with ESDC and one or two provinces:

While predictive analytics are widely used in many applications, the pilots would examine their role in official statistics, particularly in calculating and explaining the level of uncertainty that is inevitable in estimating probabilities of success at the level of particular individuals.

New ways would have to be found to communicate a range of such findings to the individual or counsellor in question, in simple language and in real time. Development work would also be needed to determine the degree of granularity in describing both the individual participants and the intervention in question. What is likely to work best on average may not apply to particular individuals with different needs and aspirations.
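
The uncertainty issue can be made concrete with a small sketch. It assumes, for illustration only, that 'what is likely to work best' is estimated as the success rate among past participants with the same profile, and that uncertainty is expressed with a Wilson score interval; the paper does not specify a method, and the records, field names and functions below are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n == 0:
        raise ValueError("no comparable past cases")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def likely_success(past_cases, profile, intervention):
    """Estimate the success rate of one intervention among similar past cases."""
    similar = [
        c for c in past_cases
        if c["intervention"] == intervention
        and all(c[key] == value for key, value in profile.items())
    ]
    successes = sum(c["employed_after"] for c in similar)
    low, high = wilson_interval(successes, len(similar))
    return {"n": len(similar), "rate": successes / len(similar), "low": low, "high": high}


# Hypothetical past participants in active labour market programs.
PAST_CASES = [
    {"intervention": "training", "age_group": "25-34", "employed_after": True},
    {"intervention": "training", "age_group": "25-34", "employed_after": True},
    {"intervention": "training", "age_group": "25-34", "employed_after": False},
    {"intervention": "training", "age_group": "25-34", "employed_after": True},
    {"intervention": "counselling", "age_group": "25-34", "employed_after": False},
]

estimate = likely_success(PAST_CASES, {"age_group": "25-34"}, "training")
print(estimate)
```

With only a handful of comparable cases the interval is very wide, which is exactly the trade-off noted above: a more granular description of the individual and the intervention leaves fewer similar past cases and therefore more uncertainty to communicate.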

4. Processing: transforming inputs to outputs

This chapter deals with progress and next steps in the processing section of the logic model shown in Figure 1 – how the incoming data is processed into outputs. All of the processing boxes in Figure 1 will be touched, directly or indirectly, by the shift to the multi-source system. However, the main changes will be in three functions:

4.1. The conceptual framework

The goal is to develop a standard way of defining all the data items that are contained in the source files and that will be used in producing statistical outputs.

Progress to date and the need for additional action

The existing Statistics Canada policy on standards calls for:

What is yet to be developed is:

The paper describing the conceptual framework will also:

  • Outline the historic and intellectual background to the shift to a multi-source, micro-level framework.
  • Show how it is consistent with other statistical frameworks, including the System of National Accounts and environmental indicators.
  • Explain the relationship to the concepts and terminologies used in different disciplines, including the disciplines used by statistical methodologists, IT designers, and by the researchers in different scientific disciplines.
  • Provide examples of its use, including in policy applications that involve system-level ‘what works’ applications and individual-level ‘what is likely to work best’ applications.

Separate paper on the conceptual framework. A paper is being drafted that will describe the conceptual framework. It will show:

4.2. Managing and processing the data

The shift to a multi-source system requires a new approach to the management of data. Data become assets that have value in themselves, available to be used in a variety of applications, independent of their source.

Progress to date and the need for additional action

On technical matters, there will be many challenges ahead, but Statistics Canada is well positioned to take them on:

There will, of course, be a need for more development, including new tools for creating the linked data needed to analyze the course of people’s lives in various social and spatial contexts. However, the need has been recognized, a solid start has been made, and future progress will likely follow pathways that have already been established.

4.3. Data documentation

Section 4.1 described the need for a standard set of concepts and definitions that is independent of data sources. There will also, of course, be a continuing need to document the actual data that comes into the system from various sources. That includes information on:

Progress to date and the need for additional action

Existing systems of documenting data (known in statistical circles as metadata) are strong and being steadily improved.

Statistics Canada is therefore well placed technically to meet the new documentation challenges that will arise in a multi-source statistical world.

5. Inputs: the multiple sources of data

This chapter deals with the ‘input’ set of boxes in Figure 1 that describe the incoming data. The Governance, outreach and advice from experts box is discussed in Chapter 6.

5.1. Responsiveness to requests for new statistics

In the past, requests for new data (for example, to meet a new policy priority) would typically be cast in terms of requests for statistics from a particular collection vehicle. Often the request would wind up in the in-baskets of staff responsible for administering a particular survey or administrative data set. If the needed data did not exist, and if the priority for obtaining it were high enough (often as demonstrated by the willingness to pay for the collection of new data), a new survey would be designed, or additional questions added to existing surveys such as the General Social Survey.

That process did work and produced important new information. However, it was typically slow and cumbersome, often taking years before the required data could be obtained.

In a mature, multi-source system, the interactive web site discussed in Chapter 3 will allow much easier access by users to existing statistical information regardless of source. Compared with today, many more requests will be able to be filled from existing data, including the expanded CODR tables and standard multi-source microdata files.

When the needed statistics are not found on the website, new approaches are being developed that will greatly improve responsiveness.

5.2. Response burden and hybrid collection vehicles

Non-response rates are increasing in surveys and response burden is a serious concern. This is one of the key factors behind the increasing use of administrative records whenever possible. Longitudinal surveys have been at special risk as non-response increases during successive waves of the survey.

Hybrid collection vehicles

Part of the solution has been to replace some questions that were once asked on surveys with data obtained from administrative sources.

Do hybrid vehicles allow flexibility in survey design?

It is important in traditional surveys that exactly the same question be asked of all respondents. Does the existence of external control totals mean that questions (and introductory material) can be better tailored to the differing circumstances of respondents, with the same or better quality and reduced non-response? Can survey questions be formulated based on what we already know from admin sources?

This use of administrative data to replace survey questions both reduces response burden and increases quality, since often the administrative data is more accurate than that which respondents to surveys can recall.

It is therefore likely that future multi-source systems will make much greater use of what can be called hybrid vehicles that are designed from the outset to draw on both administrative and survey data. These hybrids can draw on the strengths of both sources and will often be much more powerful than either source taken individually. As indicated in the sidebar, this opens up the possibility of more flexible survey designs.

Will response burden really diminish?

There may seem to be a conflict between the desire to reduce the burden on individual respondents to surveys and censuses and the objective of collecting much more detailed information. What is actually happening is more of a rebalancing:

5.3. Obtaining new administrative data

Major progress. In the traditional system, obtaining administrative data for statistical purposes has typically been difficult, even though the Statistics Act provides the agency with the power to access this information.

As shown in Box 2 in the introductory chapter, we are entering a world where priority is placed on government-wide data management, on the use of digital technologies and on open government – a world where there will be easier access to administrative files for statistical purposes, including the opportunities to shape the content of those administrative files.

In practice, huge progress has already been made in the use of linked data from multiple sources as can be seen by the examples in Box 9.

Short-term action. Despite this success, we are only at the beginning of a journey.

For those who are involved in the shift to a multi-source system of national quality-of-life statistics, future action must be taken in the context of the broader data management initiatives referred to in Box 2 and, perhaps, of political-level, post-pandemic discussions about better data sharing. Next steps to supplement these broader initiatives could include:

5.4. Incorporating geographic information in individual records

There has been much progress in Statistics Canada’s use of geomatics and spatial visualization tools and further strengthening is underway. Basically, these tools allow statistical information to be shown on a map of a particular neighbourhood, municipality, region, health region, or province. This becomes a particularly powerful method of showing interrelationships when data from different sources and different subjects are shown on the map in question.

The information will also be contained in the records of individuals. In a mature multi-source system, this rich new source of data will be available not only at the level of the geographic area but will also be fed back directly into the records of the individuals themselves at the level of the geographic location of the individual’s residence. This will allow much richer analysis of the determinants of the individual’s quality of life in areas such as access to services and employment opportunities, local costs of living, or quality of housing. This would be an important new source of data that would allow richer analysis of individuals in their economic and environmental contexts.
Statistical information that can be both mapped and incorporated in the micro records of individuals
  • Access. The number of hospitals, restaurants, elementary schools, grocery stores, community centres in a geographic space.
  • Environment. Average local temperature, number of parks and the extent of green areas, measures of pollution, health indicators based on sewage wastewater, walkability indices, the incidence of disease including the effects of pandemics.
  • Economy. Local cost of living indicators, vacancies in the firms located in that area and other indicators of economic activity.
  • Society. Data on the socio-economic composition of geographic areas created by adding up the characteristics of the residents in those areas – resulting in data on income inequality, extent of victimization, average educational attainment, ethnic diversity, health indicators, and housing density.

There are already many initiatives related to spatial analysis and there would appear to be potential gains from additional work that incorporates these elements into analysis at the level of individuals.
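
A minimal sketch of the two directions of flow described in this section is given below: area-level indicators are copied into individual records, and 'Society' indicators are built by adding up the characteristics of residents. The area codes, field names and values are hypothetical, and the sketch assumes each anonymized record already carries a code for the area of residence.

```python
from statistics import mean

# Hypothetical area-level indicators keyed by a geographic area code.
AREA_INDICATORS = {
    "A1": {"grocery_stores": 4, "green_space_share": 0.22},
    "A2": {"grocery_stores": 1, "green_space_share": 0.05},
}

# Hypothetical anonymized individual records with a residence area code.
PERSONS = [
    {"id": "p01", "area": "A1", "income": 30000},
    {"id": "p02", "area": "A1", "income": 50000},
    {"id": "p03", "area": "A2", "income": 20000},
]


def area_composition(persons):
    """Build area-level indicators by adding up residents' characteristics."""
    incomes = {}
    for p in persons:
        incomes.setdefault(p["area"], []).append(p["income"])
    return {
        area: {"mean_income": mean(values), "residents": len(values)}
        for area, values in incomes.items()
    }


def attach_context(persons, *area_tables):
    """Copy area-level indicators into a new copy of each individual's record."""
    enriched = []
    for p in persons:
        row = dict(p)  # leave the original record untouched
        for table in area_tables:
            row.update(table.get(p["area"], {}))
        enriched.append(row)
    return enriched


enriched = attach_context(PERSONS, AREA_INDICATORS, area_composition(PERSONS))

for row in enriched:
    print(row)
```

Each enriched record now carries both the person's own characteristics and those of the surrounding area, which is what allows analysis of individuals in their local context rather than of areas alone.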

5.5. Inputs describing social programs and services

The shift to the mature multi-source model will eventually result in micro-level data about social programs and services being included in the statistical system as described in the box below.

Traditional versus micro descriptions of social programs and services
Traditional descriptions

Traditional statistical systems described social programs and services at a macro level using tabulated data, such as:

  • The costs of health, education, and judicial systems, and of income support programs.
  • The general type of service provided, such as treatments related to different diseases, or different types of education provided, or the amount of income transferred in a tax credit program or by social assistance.
  • Characteristics of the recipients of those services.

Trends over time have been to produce increasingly disaggregated information of this type – for smaller geographic areas and more granular descriptions of services and clients. Indeed, some health and educational data now exist for particular hospitals and schools.

Micro descriptions

For each interaction or intervention between the client and the program or service in question, there would be:

  • Standard ways of describing what happened in the organization (in doctors’ offices, in classrooms, in the rules for calculating the size of income transfers).
  • Standard ways of describing the individuals who were served, including their detailed health, skills, and income status before and after the intervention – and later in life.
  • Standard ways of describing the actual activity that took place during the interventions, including its duration and other descriptors that will be described in a forthcoming paper entitled “The System of National Quality-of-Life Statistics: Conceptual Framework”.

Such microdata now exist for programs that provide income support and are contained in Statistics Canada’s Social Policy Simulation Database and Model. Consistent, comprehensive data of this sort do not yet exist for most services.

Such micro-level data about social programs and services will allow analysis of the actual difference that particular social interventions made in the lives of individuals – the kind of information that is essential for:

As noted in the box, the needed microdata already exist in the case of income transfer programs. However, little progress has been made for services:

However, it will take time to develop a comprehensive approach to describing social services using the consistent micro-level descriptors that are discussed in the Annex.

This suggests a gradual approach to development in this area, similar to the approach for ‘what is likely to work best’ pilots. The first steps would be to identify areas of low-hanging fruit and find partners willing to develop and apply these descriptors in their area of expertise.

6. Managing constraints and risks

The future directions discussed in this paper will take Statistics Canada into some quite new territory, some of it world-leading. This comes with some constraints and risks during the implementation process that must be carefully managed.

6.1. Skill shortages

In terms of resource constraints, dollar funding can be largely managed by internal reallocations, many of which have already taken place, and by the new funds that were allocated in Budget 2021.

The agency’s IT plan will need to be adjusted in light of the implementation plan. Some of the newer technologies require a lot of computing power that may require considerable lead time to put in place. However, the technology itself is not new and is being used in other applications.

The big resource challenge will therefore likely be a shortage of people with the needed skills. As already discussed, creating and using multi-source microdata for statistical purposes requires individuals with skill sets that are in short supply.

The literature on the typical life cycle of technologies, summarized in Box 10, suggests we are still at early stages of development where the technology is complex and requires high skill levels to operate. Especially in the early stages of the transition period, the best allocation of scarce resources would be for the same team to be involved in the creation and use of complex multi-source data bases (and in related projects involving complex techniques such as microsimulation and predictive analytics).

Otherwise, staff with about the same skill set will be needed in both the producer and user communities. Or, to be more realistic given the shortage of these skills, there will not be adequate skills available for either community to do a good job.

In practice, this means that:

Box 10. The life cycle of technologies and skill shortages

Technologies, such as those used in developing and using large multi-source files, go through typical life cycles.

In the early stages of development, technological factors dominate:

  • The technology will typically not be easy to use. Instability, difficulty in use and inelegant appearance will often be the norm. Documentation will often be weak.
  • In the early stages, there is also a blurring of the lines among those who develop, maintain, and use the technology.

Problems associated with source data, with technological development, and with new kinds of analysis are best dealt with as an integrated package, by the same team of people.

In the middle stages and later stages, the technology moves to the background. The functions of development, maintenance and use become separated:

  • Work on product innovation takes place in separate R&D groups.
  • Maintenance becomes a separate function that includes setting standards, documentation, training materials, and marketing – making things easier for the user.
  • Users will acquire the technology as a commodity, with emphasis on its speed, flexibility, ease of use, and low cost. The technology itself will be largely invisible, the accuracy of the results simply being taken for granted.

Such a model may require some adjustment to existing approaches to data linkage. As noted earlier, some existing plans assume that Statistics Canada’s role is to provide users with the tools to develop high quality, anonymized multiple source statistics. The analysis here suggests that, in many cases, it should be Statistics Canada itself that plays the lead role, in partnership with others.

Box 10 suggests that, in addition to existing recruitment and training practices, there may be a need for a specific HR strategy to support a smooth transition to the skills that will be needed as the technology matures.

6.2. Building understanding and support in the user community

A main objective in shifting to a multi-source model is to better serve users. However, most users, partners and potential funders of social statistics will be unfamiliar with the techniques that will be used and with the usefulness of the new data that will be created. Concepts such as multi-source microdata files, synthetic data, predictive analytics, and lifecourse analysis will be unfamiliar to many, even among experts. A risk therefore is that Statistics Canada could be seen as moving off in unknown directions, trying to meet new needs and to anticipate the needs of existing users rather than filling existing gaps and responding to needs that have already been formulated.

That risk will be mitigated by the early development of the interactive web site which will provide speedier access to all existing information and by the new approach to responding to current requests for information, including the new suite of flexible survey instruments, that will produce results quickly. However, these steps should be supplemented by outreach mechanisms that will allow users to work in partnership with Statistics Canada in developing practical ways of applying the new statistical evidence that will be produced.

An emphasis on outreach. Much recent attention in Statistics Canada, including in the social area, has been devoted to the issues of horizontality, partnership, and outreach. New centres have been created with a view to increasing the capacity to respond quickly to changing demands for statistics and to anticipate new demands. This emphasis on outreach is important but needs to be carefully managed. To be effective, it should be guided by the following considerations:

A balanced outreach strategy to support the evolution of the multi-source system might therefore include the following strands:

An emphasis on partnership. Many of the outreach activities above will involve close partnerships with users. In addition, as described earlier, it is proposed that particularly close partnership relationships be established in those areas that are at the leading edge from a technical perspective:

6.3. Ethical issues: building trust

Any system that makes use of data about individuals from different sources will, properly, raise alarm bells related to privacy, consent, and confidentiality. These issues are not new. The question of getting the right balance between these concerns and those related to the value of the uses of official statistics and the problems of response burden has been the subject of in-depth discussions for many years in many countries.

The privacy-related issues are part of a wider set of ethical concerns relating to the social acceptance of the boundaries of official statistics and the role of the agencies that produce them. These include the independence and neutrality of the statistical office, the transparency of its operation, trust in the quality and relevance of the statistics that are produced, the accessibility of statistical outputs, and the efficient managing and protection of information within the system.

Canada, and its statistical agency, are well-positioned on all these ethical fronts:

The directions outlined in this paper are designed to further strengthen trust and acceptance in the population at large and among those respondents who provide data inputs:

As implementation proceeds, a more immediate and transparent balance will be struck among response burden, usefulness and the protection of privacy – strengthening the implicit social contract that exists among those who provide, process and use data for statistical purposes.
