User Guide and Data Dictionary for Preliminary COVID-19 Data

Release date: May 22, 2020
Updated on: July 9, 2020

1. Background

COVID-19 is the disease caused by SARS-CoV-2, a novel coronavirus that had not been identified before the first cases were reported in Wuhan, China, in December 2019. The virus has since spread to almost all countries around the world. The first confirmed cases in Canada appeared in January 2020.

There is still a lot that is unknown about the virus, and limited data are available for researchers to study it. In Canada, the 10 provinces and 3 territories provide the Public Health Agency of Canada (PHAC) with information on COVID-19 cases on a daily basis. In collaboration with PHAC, Statistics Canada (STC) helps make these preliminary data available to the research community and to all Canadians.

2. Objectives of the “Detailed preliminary information on confirmed cases of COVID-19 (Revised)” Table

PHAC and STC have been working closely together to provide researchers with the preliminary data received from the provinces and territories (PTs).

The Detailed preliminary information on confirmed cases of COVID-19 table is a data product that provides easy access to as much data as possible, by provincial region, while respecting the confidentiality of the individuals for whom information on COVID-19 history is reported.

Given that the COVID-19 pandemic is still progressing, the content of this table will be updated regularly, making it a unique and relevant product. Each iteration of the table will provide up-to-date case information reported by PTs.

3. Coverage of the “Detailed preliminary information on confirmed cases of COVID-19 (Revised)” Table

The data published by STC contain cases for which detailed case information was submitted by the provincial or territorial public health authority to PHAC. The governments of Canada and the provinces and territories agreed on a common Case Report Form (CRF) to be used to report cases to PHAC. With increasing numbers of cases, most provinces are now sending datasets that include the same variables as the CRF.

These data may not match the total case counts reported at the provincial and territorial levels, which are updated daily by each jurisdiction and compiled by PHAC. The discrepancy is due to factors such as delays in reporting and variability in reporting cut-offs. Because of this undercoverage, these data are a subset of the total reported cases in Canada.

Routine updates on health outcome status are not made uniformly across Canada, and therefore the data may underestimate the number of hospitalizations, admissions to intensive care units, deaths and recoveries. 

The first table iteration contains 30,778 cases; this represents all the confirmed cases received from PHAC as of May 13, 2020. It does not include all confirmed cases in Canada. The number of cases will increase with each table iteration, as more data are received from PHAC.

The data on this table are preliminary and subject to change as updated information is received from the provinces and territories.

4. Content of the “Detailed preliminary information on confirmed cases of COVID-19 (Revised)” Table

This table is a subset of the information that provinces and territories collect using the Coronavirus Disease (COVID-19) Case Report Form. The variables selected were those that were considered to be the most important while meeting a certain quality threshold. Also, some “derived variables” were computed by PHAC based on the information contained in the case report forms.

To minimize the risk of disclosure:

  1. a few categories from the original questions collected on the form have been grouped together:
    • The provinces and territories have been grouped into the following regions:
      • British Columbia & Yukon
      • Alberta, Saskatchewan, Manitoba & the Northwest Territories
      • Ontario & Nunavut
      • Quebec
      • New Brunswick, Nova Scotia, Prince Edward Island & Newfoundland and Labrador
    • The age in years of individuals has been grouped into age groups:
      • 0-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+
  2. a few categories from the original questions have been reclassified:
    • Occupation
      • Health care workers, school or daycare workers/attendees and residents of long-term care facilities have remained as their own categories.
        • Please note that healthcare workers include those with and without direct patient contact. Long-term care residents may include residents of seniors' homes, assisted living facilities, and retirement communities, as well as nursing homes. Long-term care facilities may be privately run or under provincial authority.
      • Farm worker, laboratory worker and veterinary/animal worker have been grouped with ‘other’.
  3. a few derived variables have been created:
    • Dates (Episode Date, Onset Date, Recovery Date)
      • All dates were converted to weeks, i.e., Episode Week, Onset Week, Recovery Week.
      • All cases with an episode date and/or onset date prior to February 23rd, 2020 (the first day of the 8th week of 2020) were grouped with the cases in week 8. For those cases that also had a recovery date, the recovery week was also shifted forward by the same amount as the episode and/or onset date.
      • If a region has too few cases in a given week to release the episode week without compromising confidentiality, the cases for that period are grouped into the week that had the most cases (e.g., if the Atlantic region had too few cases in weeks 23, 24 and 25 and most of them were in week 23, all of those cases are grouped in week 23). For those cases that also had an onset week, the onset week was also shifted.
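
The grouping rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure STC or PHAC actually ran: the function names are hypothetical, the age-group codes are those listed in Appendix III, and week values are assumed to be plain integers with no "not stated" codes.

```python
from collections import Counter
from typing import List, Optional, Sequence, Set, Tuple

FIRST_RELEASED_WEEK = 8  # week beginning February 23rd, 2020


def age_group_code(age_in_years: int) -> int:
    """Map an age in years to the age-group code listed in Appendix III."""
    if age_in_years < 0:
        raise ValueError("age must be non-negative")
    if age_in_years <= 19:
        return 1  # 0-19
    if age_in_years >= 80:
        return 8  # 80+
    return age_in_years // 10  # 20-29 -> 2, ..., 70-79 -> 7


def collapse_early_weeks(
    episode_week: int, recovery_week: Optional[int] = None
) -> Tuple[int, Optional[int]]:
    """Move an episode week earlier than week 8 up to week 8.

    The recovery week, when present, is shifted forward by the same amount.
    """
    shift = max(0, FIRST_RELEASED_WEEK - episode_week)
    shifted_recovery = None if recovery_week is None else recovery_week + shift
    return episode_week + shift, shifted_recovery


def pool_sparse_weeks(weeks: Sequence[int], period: Set[int]) -> List[int]:
    """Reassign every week inside `period` to the period's busiest week."""
    counts = Counter(week for week in weeks if week in period)
    if not counts:
        return list(weeks)
    busiest_week = counts.most_common(1)[0][0]
    return [busiest_week if week in period else week for week in weeks]
```

For example, `pool_sparse_weeks([22, 23, 23, 24, 25], {23, 24, 25})` moves the cases from weeks 24 and 25 into week 23 and leaves the week 22 case untouched. How ties between equally busy weeks were resolved is not stated in this guide, so the sketch simply takes whichever week `Counter` returns first.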

Refer to the data dictionary for detailed information about each variable.

STC is working with PHAC on the preparation of a master file which will include more details and more variables. Its availability will be announced in the near future in STC’s The Daily.

The information on this table is considered preliminary.

5. Limitations

This dataset includes cases who are confirmed according to the Canadian interim national case definition for COVID-19, that is: “A person with laboratory confirmation of infection with the virus that causes COVID-19 performed at a community, hospital or reference laboratory (NML or a provincial public health laboratory) running a validated assay.”

COVID-19 testing was initially performed for diagnostic purposes only (i.e., to confirm the diagnosis of suspected cases of COVID-19), and then, increasingly, for screening based on public health priorities (high-risk groups or contact tracing). None of these tests were conducted for research purposes, and the screening was not designed to be conducted in a probabilistic sample representative of the Canadian population.

The expansion of laboratory testing evolved over time following the epidemiology of the disease, i.e., the spread of the disease from China to other countries and the establishment of community transmission in Canada.

With increasing laboratory capacity, some provinces were able to screen people from targeted groups, e.g., residents and staff of long-term care facilities where cases have occurred, or contacts of cases identified in epidemiologic investigations. These expansions of testing did not occur simultaneously across provinces and territories. Additionally, testing capacity and prioritization continue to differ between provinces and territories, thus skewing any inter-jurisdictional comparison.

The factors listed above must be taken into consideration when interpreting analyses of these data. Examples of possible bias include:

6. Data quality concerns

Routine updates on health outcome status are not made uniformly across Canada, and therefore the data may underestimate the number of hospitalizations, admissions to intensive care units, deaths and recoveries.

There is a high proportion of missing values.

Some of the variables on the PHAC COVID-19 file (occupation, symptoms) contained an “other, specify” free-text field that has not yet been taken into account. These fields will be reviewed, and the variables on this table may be recoded as a result.

Some sections of the case report form were filled in inconsistently.

STC and PHAC are working closely together to improve the quality of the file with the help of all provinces and territories. This will be reflected in each iteration of this table.

Appendix I - Data dictionary, Section on the Form, Concept and Universe


Variable Name | Section on the Form | Concept | Universe
Case identifier number | Administrative Information | Unique identifier for each case | All cases
Region | Administrative Information | Province/Territory where the case resides, grouped by regions. | All cases
Episode week | Administrative Information | Week of the episode, derived using symptom onset date or the closest date available. | All cases
Episode year | Administrative Information | Year of the episode, derived using symptom onset date or the closest date available. | All cases
Gender | Case Details | The gender of the case | All cases
Age group | Case Details | Age group corresponding to the age of the case | All cases
Occupation | Case Details | Indicates the case's occupation | All cases
Asymptomatic | Symptoms | Indicates if the case was asymptomatic. | All cases
Onset week of symptoms | Symptoms | Week of symptom(s) onset | All cases
Onset year of symptoms | Symptoms | Year of symptom(s) onset | All cases
Symptom - cough | Symptoms | Case reported cough | All cases
Symptom - fever | Symptoms | Case reported fever (≥38°C) | All cases
Symptom - chills | Symptoms | Case reported feverish/chills (temperature not taken) | All cases
Symptom - sore throat | Symptoms | Case reported sore throat | All cases
Symptom - runny nose | Symptoms | Case reported runny nose | All cases
Symptom - shortness of breath | Symptoms | Case reported shortness of breath/difficulty breathing | All cases
Symptom - nausea | Symptoms | Case reported nausea/vomiting | All cases
Symptom - headache | Symptoms | Case reported headache | All cases
Symptom - weakness | Symptoms | Case reported general weakness | All cases
Symptom - pain | Symptoms | Case reported pain (muscular, chest, abdominal, joint) | All cases
Symptom - irritability | Symptoms | Case reported irritability/confusion | All cases
Symptom - diarrhea | Symptoms | Case reported diarrhea | All cases
Symptom - other | Symptoms | Case reported other symptoms | All cases
Hospital status | Clinical Course and Outcomes | Indicates if the case was hospitalized and if the case was admitted to the intensive care unit. | All cases
Recovered | Clinical Course and Outcomes | Indicates if the case has recovered. | All cases
Recovery week | Clinical Course and Outcomes | Week reported recovered | Recovered=1
Recovery year | Clinical Course and Outcomes | Year reported recovered | Recovered=1
Death | Clinical Course and Outcomes | Indicates if the case died while infected by COVID-19. | All cases
Transmission | Exposures | Location where exposure occurred | All cases

Appendix II - Data dictionary, Notes and Limitations


Variable Name | Note and Limitation
Case identifier number | Created randomly by Statistics Canada. The same case will have a different number every time that the file is released.
Region | To ensure confidentiality, some provinces/territories have been grouped together by Statistics Canada.
Episode week | Derived by Statistics Canada from EpisodeDate (not available on this dataset). Episode date is derived using the first available date, in the following order of priority: onset date > specimen collection date > lab result > NML confirmation date. 0 represents the first days of the year leading up to, but not including, the first Sunday. 1 represents the first full week of the year, beginning on the first Sunday, and so on.
Episode year | Derived by Statistics Canada from EpisodeDate (not available on this dataset). Episode date is derived using the first available date, in the following order of priority: onset date > specimen collection date > lab result > NML confirmation date.
Gender | Derived from the Gender variable received from PHAC (not available on this dataset). Missing values and “Other” were assigned to “Not stated”.
Age group |
Occupation | Healthcare workers include those with and without direct patient contact. Long-term care residents may include residents of seniors' homes, assisted living facilities, and retirement communities, as well as nursing homes. Long-term care facilities may be privately run or under provincial authority. Laboratory worker handling biological specimens, Veterinary/animal worker and Farm worker have been categorized with “Other” due to low frequencies. The free-text field 'occupation_spec' has not been cleaned, and it is expected that some re-coding is necessary (occupation classification will change for some cases).
Asymptomatic | Derived from the symptoms. If no symptoms were experienced, then asymptomatic is yes. If any symptoms were experienced, then asymptomatic is no.
Onset week of symptoms | Derived by Statistics Canada from OnsetDate (not available on this dataset). 0 represents the first days of the year leading up to, but not including, the first Sunday. 1 represents the first full week of the year, beginning on the first Sunday, and so on.
Onset year of symptoms | Derived by Statistics Canada from OnsetDate (not available on this dataset).
All symptoms | Each symptom variable has a corresponding "sym_specify" free-text field. This field is currently being reviewed and cleaned by medical experts, and it is expected that some cases will change response to the symptom variables. Cleaning and reviewing of "OtherSymptom_spec" is also underway; this is also expected to impact responses to the other symptom variables.
Hospital status, Recovered | Routine updates on health outcome status are not made uniformly across Canada, and therefore the data may underestimate the number of hospitalizations, admissions to intensive care units, deaths and recoveries.
Recovery week | Derived by Statistics Canada from Recovery Date (not available on this dataset). 0 represents the first days of the year leading up to, but not including, the first Sunday. 1 represents the first full week of the year, beginning on the first Sunday, and so on.
Recovery year | Derived by Statistics Canada from Recovery Date (not available on this dataset).
Death | Refer to the comment in “Hospital status”.
Transmission | Domestic acquisition – Contact of COVID case: Includes cases who reported having close contact with a confirmed or probable COVID-19 case in the 14 days prior to symptom onset. Domestic acquisition – Contact with traveller: Includes cases who reported having close contact with a symptomatic person who had travelled to an affected area in the 14 days prior to their illness onset. Domestic acquisition – Unknown source: Includes cases who had not travelled, and 1) who had reported no contact with a COVID-19 case or symptomatic traveller, or 2) whose information on contact with a case or contact with a symptomatic traveller was unknown or missing. International travel: Includes cases who reported having travelled outside of their province/territory of residence or outside of Canada within the 14 days prior to symptom onset. Information pending: Includes cases for which information on contact with a case, contact with a symptomatic traveller, and travel history were all missing or unknown.
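
Two of the derivations described in this appendix, the episode date priority order and the asymptomatic flag, can be sketched in Python as follows. This is a sketch under assumptions rather than the code used by STC or PHAC: the function and parameter names are hypothetical, the answer codes are those of Appendix III, and the handling of records where no symptom is reported as "yes" but some answers are not stated is an assumption, since this guide describes only the "no symptoms" and "any symptoms" cases.

```python
from datetime import date
from typing import Optional, Sequence

YES, NO, NOT_STATED = 1, 2, 9  # answer codes listed in Appendix III


def derive_episode_date(
    onset: Optional[date],
    specimen_collection: Optional[date],
    lab_result: Optional[date],
    nml_confirmation: Optional[date],
) -> Optional[date]:
    """Return the first available date, in the stated order of priority."""
    for candidate in (onset, specimen_collection, lab_result, nml_confirmation):
        if candidate is not None:
            return candidate
    return None  # no date available at all


def derive_asymptomatic(symptom_codes: Sequence[int]) -> int:
    """Derive the asymptomatic flag from the individual symptom answers."""
    if any(code == YES for code in symptom_codes):
        return NO  # at least one symptom was experienced
    if symptom_codes and all(code == NO for code in symptom_codes):
        return YES  # every symptom was answered "no"
    # Assumption: incomplete answers with no "yes" are left as not stated.
    return NOT_STATED
```

With these definitions, a case with no onset date but a specimen collection date of March 2, 2020 gets March 2, 2020 as its episode date, whatever the later lab result date is.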

Appendix III - Data dictionary, Source, Format and Answer Categories


Variable Name | Source | Format | Answer Categories
Case identifier number | Statistics Canada | 8.0 | Continuous value from 1 to 99999999
Region | Public Health Agency of Canada | 2.0 | 1=Atlantic (New Brunswick, Nova Scotia, Prince Edward Island, Newfoundland and Labrador), 2=Quebec, 3=Ontario and Nunavut, 4=Prairies (Manitoba, Saskatchewan, Alberta) and the Northwest Territories, 5=British Columbia and Yukon
Episode week | Public Health Agency of Canada | 2.0 | Continuous value from 0 to 52, 99=Not stated
Episode year | Public Health Agency of Canada | 4.0 | Only year 2020 at this point, 9999=Not stated
Gender | Public Health Agency of Canada | 1.0 | 1=Male, 2=Female, 9=Not stated/Other
Age group | Public Health Agency of Canada | 2.0 | 1=0-19, 2=20-29, 3=30-39, 4=40-49, 5=50-59, 6=60-69, 7=70-79, 8=80+, 99=Not stated
Occupation | Public Health Agency of Canada | 2.0 | 1=Health Care Worker, 2=School or daycare worker/attendee, 3=Long term care resident, 4=Other, 9=Not stated
Asymptomatic | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated
Onset week of symptoms | Public Health Agency of Canada | 2.0 | Continuous value from 0 to 52, 99=Not stated or Not applicable
Onset year of symptoms | Public Health Agency of Canada | 4.0 | Only year 2020 at this point, 9999=Not stated or Not applicable
Symptom - cough | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Symptom - fever | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Symptom - chills | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Symptom - sore throat | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Symptom - runny nose | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Symptom - shortness of breath | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Symptom - nausea | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Symptom - headache | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Symptom - weakness | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Symptom - pain | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Symptom - irritability | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Symptom - diarrhea | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Symptom - other | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Hospital status | Public Health Agency of Canada | 1.0 | 1=Hospitalized - ICU, 2=Hospitalized - Non-ICU, 3=Not Hospitalized, 9=Not stated/Unknown
Recovered | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated/Unknown
Recovery week | Public Health Agency of Canada | 2.0 | Continuous value from 0 to 52, 99=Not stated/Not applicable
Recovery year | Public Health Agency of Canada | 4.0 | Only year 2020 at this point, 9999=Not stated/Not applicable
Death | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated
Transmission | Public Health Agency of Canada | 1.0 | 1=Domestic Acquisition: “Contact of COVID Case” or “Contact with traveler” or “Unknown Source”, 2=International Travel, 9=Not stated/Pending
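
Because "not stated" is stored as a numeric code (9, 99 or 9999) rather than as a blank, analyses that treat the codes as ordinary numbers can be distorted. The Python sketch below shows one possible way to decode a few of the variables above. The helper names and the choice of `None` to represent missing values are assumptions made for illustration, and the region labels are shortened from the answer categories listed in the table.

```python
from typing import Dict, Optional

# Code-to-label mappings taken from the answer categories above.
REGION_LABELS: Dict[int, str] = {
    1: "Atlantic",
    2: "Quebec",
    3: "Ontario and Nunavut",
    4: "Prairies and the Northwest Territories",
    5: "British Columbia and Yukon",
}
HOSPITAL_STATUS_LABELS: Dict[int, str] = {
    1: "Hospitalized - ICU",
    2: "Hospitalized - Non-ICU",
    3: "Not Hospitalized",
}

WEEK_NOT_STATED = 99  # also covers "not applicable" for onset and recovery


def decode(code: int, labels: Dict[int, str], not_stated: int = 9) -> Optional[str]:
    """Return the label for a code, or None when the code means not stated."""
    if code == not_stated:
        return None
    return labels[code]  # raises KeyError on a code outside the dictionary


def week_or_none(week: int) -> Optional[int]:
    """Return the week number, or None when it is coded as not stated."""
    return None if week == WEEK_NOT_STATED else week
```

Raising `KeyError` on an unexpected code is a deliberate choice in this sketch, so that a change in the answer categories between table iterations is noticed rather than silently ignored.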

Appendix IV – Information on the Week Variables, for Year 2020


Week Concept
0 Week of December 29th
1 Week of January 5th
2 Week of January 12th
3 Week of January 19th
4 Week of January 26th
5 Week of February 2nd
6 Week of February 9th
7 Week of February 16th
8 Week of February 23rd
9 Week of March 1st
10 Week of March 8th
11 Week of March 15th
12 Week of March 22nd
13 Week of March 29th
14 Week of April 5th
15 Week of April 12th
16 Week of April 19th
17 Week of April 26th
18 Week of May 3rd
19 Week of May 10th
20 Week of May 17th
21 Week of May 24th
22 Week of May 31st
23 Week of June 7th
24 Week of June 14th
25 Week of June 21st
26 Week of June 28th
27 Week of July 5th
28 Week of July 12th
29 Week of July 19th
30 Week of July 26th
31 Week of August 2nd
32 Week of August 9th
33 Week of August 16th
34 Week of August 23rd
35 Week of August 30th
36 Week of September 6th
37 Week of September 13th
38 Week of September 20th
39 Week of September 27th
40 Week of October 4th
41 Week of October 11th
42 Week of October 18th
43 Week of October 25th
44 Week of November 1st
45 Week of November 8th
46 Week of November 15th
47 Week of November 22nd
48 Week of November 29th
49 Week of December 6th
50 Week of December 13th
51 Week of December 20th
52 Week of December 27th
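
The week numbering in this table (0 for the days before the first Sunday of the year, 1 for the week beginning on the first Sunday, and so on) matches the Sunday-based week number that Python's standard library produces with the `%U` format code. The sketch below uses that to convert between dates and week numbers; the function names are hypothetical and the code is not part of the STC product.

```python
from datetime import date, timedelta


def week_number(d: date) -> int:
    """Sunday-based week of the year: 0 before the first Sunday, then 1, 2, ..."""
    return int(d.strftime("%U"))


def week_start(year: int, week: int) -> date:
    """Return the Sunday on which the given week (1 or later) begins."""
    if week < 1:
        raise ValueError("week 0 has no starting Sunday within the year")
    jan1 = date(year, 1, 1)
    # date.weekday() counts Monday as 0 and Sunday as 6.
    days_to_first_sunday = (6 - jan1.weekday()) % 7
    first_sunday = jan1 + timedelta(days=days_to_first_sunday)
    return first_sunday + timedelta(weeks=week - 1)
```

For 2020, `week_number(date(2020, 2, 23))` gives 8 and `week_start(2020, 23)` gives June 7, in agreement with the table.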
