User Guide and Data Dictionary for Preliminary COVID-19 Data

Release date: May 22, 2020
Updated on: January 12, 2024

1. Background

COVID‑19 is the disease caused by SARS-CoV-2, a novel coronavirus that had not been identified before the first cases were reported in Wuhan, China, in December 2019. The first confirmed cases in Canada appeared in January 2020.

In Canada, the 10 provinces and 3 territories provide the Public Health Agency of Canada (PHAC) with information on COVID‑19 cases on a routine basis. In collaboration with PHAC, Statistics Canada (StatCan) helps make these preliminary data available to the research community and to all Canadians.

2. Objectives of the Preliminary dataset on confirmed cases of COVID‑19

PHAC and StatCan have been working closely together to provide researchers with the preliminary data received from the provinces and territories (PTs).

The Preliminary dataset on confirmed cases of COVID‑19 provides easy access to as much data as possible, by provincial region, while respecting the confidentiality of the individuals for whom information on COVID‑19 history is reported.

Given the COVID‑19 pandemic is still progressing, the content of this dataset will be updated regularly, making it a unique and relevant product. Each iteration of the dataset will provide up-to-date case information reported by PTs.

This information was originally released in the table Detailed preliminary information on confirmed cases of COVID‑19 (Revised). Because of the increasing number of cases, the data could no longer be supported in that format, and the table was deleted on Thursday, December 10, 2020. The information is now available as a downloadable dataset: "Preliminary dataset on confirmed cases of COVID‑19, Public Health Agency of Canada" (13-26-0003).

3. Coverage of the Preliminary dataset on confirmed cases of COVID‑19

The data published by StatCan contain cases for which detailed case information was submitted by provincial or territorial public health authorities to PHAC. The governments of Canada and the provinces and territories agreed on a common Case Report Form (CRF) (Note 1) to be used to report cases to PHAC.

These data may not match the total case counts reported at the provincial and territorial levels, which are updated routinely by each jurisdiction. Discrepancies are due to factors such as delays in reporting or variability in reporting cut-offs. As a result, the data are a subset of the total reported cases in Canada.

Routine updates on health outcome status are not made uniformly across Canada, and therefore the data may underestimate the number of hospitalizations, admissions to intensive care units and deaths.

Throughout the pandemic, the CRF has been updated, which has affected the data published by StatCan. For example, information about symptoms was removed from the dataset as of March 2021 because it was no longer collected on the CRF and the historical information was incomplete.

Variables related to the resolution status of cases were removed from the data file in June 2022. Many provinces and territories were unable to determine the resolution status of a case in alignment with the national definition of a resolved COVID‑19 infection (Note 2).

The variables transmission, asymptomatic, occupation, onset year of symptoms and onset month of symptoms were removed from the data file in January 2023. Most provinces and territories were no longer reporting these variables, so the values were incomplete, no longer representative and difficult to interpret. As a result, the quality of these variables is no longer deemed sufficient for release.

The data in this dataset are preliminary and subject to change as updated information is received from the provinces and territories.

4. Content of the Preliminary dataset on confirmed cases of COVID‑19 (Note 1)

This dataset is a subset of the information that provinces and territories collect using the Coronavirus Disease (COVID‑19) Case Report Form (Note 1). The variables selected were those considered to be the most important while meeting a certain quality threshold. In addition, some "derived variables" were computed by PHAC based on the information contained in the case report forms.

To minimize the risk of disclosure:

  1. a few categories from the original questions collected on the form have been grouped together:
    • The provinces and territories have been grouped into the following regions:
      • British Columbia & Yukon
      • Alberta, Saskatchewan, Manitoba & the Northwest Territories
      • Ontario & Nunavut
      • Quebec
      • New Brunswick, Nova Scotia, Prince Edward Island & Newfoundland and Labrador
    • The age in years of individuals has been grouped into age groups:
      • 0-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+
  2. a few derived variables have been created:
    • Dates (Earliest Date)
      • Earliest Date is used to derive Episode week, Episode week group and Episode year.
      • All dates were converted to weeks, i.e., Episode week, Episode week group.
      • All cases with an earliest date prior to February 23rd, 2020 (the first day of the 8th week of 2020) were grouped with the cases in week 8.
      • If a region has too few cases in a given week to release the episode week without compromising confidentiality, the cases for the affected period are grouped in the week that had the most cases. For example, if the Atlantic region has too few cases in weeks 23, 24 and 25 and most of those cases were in week 23, all of the cases are grouped in week 23. The Episode week group variable indicates whether a grouping was done, and Appendix IX indicates which weeks have been grouped together.
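
The region and age groupings listed above can be sketched in Python. This is a minimal illustration, not the procedure that StatCan or PHAC actually uses. The function names, the two-letter province and territory abbreviations used as input, and the treatment of missing or negative ages as "Not stated" are assumptions; the numeric codes follow the answer categories listed in Appendix III.

```python
# Hypothetical helpers; only the numeric codes come from Appendix III.

# Assumed input: two-letter province/territory abbreviations.
REGION_CODE = {
    "NB": 1, "NS": 1, "PE": 1, "NL": 1,   # Atlantic
    "QC": 2,                              # Quebec
    "ON": 3, "NU": 3,                     # Ontario and Nunavut
    "MB": 4, "SK": 4, "AB": 4, "NT": 4,   # Prairies and the Northwest Territories
    "BC": 5, "YT": 5,                     # British Columbia and Yukon
}


def age_group_code(age):
    """Return the age group code (1 to 8, or 99 for not stated)."""
    if age is None or age < 0:
        return 99  # assumption: unusable ages are treated as "Not stated"
    if age <= 19:
        return 1
    if age >= 80:
        return 8
    return age // 10  # 20-29 -> 2, 30-39 -> 3, ..., 70-79 -> 7


def region_code(province):
    """Return the region code (1 to 5) for a province or territory."""
    return REGION_CODE[province.upper()]
```

Under these assumptions, `age_group_code(45)` gives 4 (the 40-49 group) and `region_code("YT")` gives 5 (British Columbia and Yukon).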

Refer to the data dictionary for detailed information about each variable.

The information in this dataset is considered preliminary.

5. Limitations

This dataset includes cases that are confirmed according to the Canadian interim national case definition for COVID‑19 (Note 2), that is:

  • the detection of at least 1 specific gene target by a validated laboratory-based nucleic acid amplification test (NAAT) assay performed at a community, hospital, or reference laboratory, or
  • the detection of at least 1 specific gene target by a validated point-of-care NAAT that has been deemed acceptable to provide a final result, or
  • seroconversion or diagnostic rise (4-fold or greater from baseline) in viral specific antibody titre in serum or plasma using a validated laboratory-based serological assay for SARS-CoV-2.

COVID‑19 testing was initially performed for diagnostic purposes only (i.e., to confirm the diagnosis of suspected cases of COVID‑19), and then, increasingly, for screening based on public health priorities (high-risk groups or contact tracing). None of these tests were conducted for research purposes, and the screening was not designed to be conducted in a probabilistic sample representative of the Canadian population.

The expansion of laboratory testing evolved over time following the epidemiology of the disease, i.e., the spread of the disease from China to other countries and the establishment of community transmission in Canada.

With increasing laboratory capacity, some provinces were able to screen people from targeted groups, e.g., residents and staff of long-term care facilities where cases have occurred, or contacts of cases identified in epidemiologic investigations. These expansions of testing did not occur simultaneously across provinces and territories. Additionally, testing capacity and prioritization continue to differ between provinces and territories, thus skewing any inter-jurisdictional comparison.

Due to changes in COVID‑19 testing policies in many jurisdictions, which were in response to a surge in demand for laboratory testing starting in December 2021, cases may be skewed towards populations deemed high priority for laboratory testing, and case counts will underestimate the total burden of COVID‑19 in the population.

Furthermore, following the surge in demand for laboratory testing, COVID‑19 rapid antigen tests (RATs) became increasingly available for public use. While positive laboratory tests have a reporting mechanism to provincial public health authorities, RATs do not, and RAT results are therefore not captured within this dataset. The availability of RATs for public use limited demand for laboratory testing, so case counts further underestimate the total burden of COVID‑19 in the population.

The factors listed above must be taken into consideration when interpreting analyses of these data. Examples of possible bias include:

  • Following outbreaks in long-term care facilities, some jurisdictions undertook mass screening in residents of these facilities, which may impact the age distribution of cases. Mass screening in specific segments of the population may lead to their over-representation in the confirmed case data, as general population mass screening has not occurred on a large scale.
  • Any comparisons between provinces and territories using demographics or health outcomes may be biased by differences in testing criteria.
  • Starting in December 2021, cases may be skewed towards populations deemed high priority for laboratory testing, and case counts will underestimate the total burden of COVID‑19 in the population.

6. Data quality concerns

Routine updates on health outcome status are not made uniformly across Canada; therefore, the data may underestimate the number of hospitalizations, admissions to intensive care units and deaths.

There is a high proportion of missing values and some sections of the case report form were provided inconsistently.
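
One way to gauge the extent of missing values is to compute, for each variable, the share of cases coded as not stated. The sketch below assumes each case has been loaded as a dictionary keyed by the variable names in Appendix I, with integer-like values; the function name and the input layout are hypothetical. The "not stated" codes are those listed in Appendix III. Note that for COV_GDR, code 9 combines "Not stated" and "Other", so its share is an upper bound on missing values.

```python
from collections import Counter

# "Not stated" codes per variable, taken from Appendix III.
NOT_STATED = {
    "COV_EW": 99, "COV_EY": 99, "COV_AGR": 99,
    "COV_GDR": 9, "COV_HSP": 9, "COV_DTH": 9,
}


def not_stated_share(records):
    """Return {variable: fraction of records coded as not stated}."""
    counts = Counter()
    total = 0
    for rec in records:
        total += 1
        for var, code in NOT_STATED.items():
            if int(rec[var]) == code:
                counts[var] += 1
    if total == 0:
        # No records: report zero rather than dividing by zero.
        return {var: 0.0 for var in NOT_STATED}
    return {var: counts[var] / total for var in NOT_STATED}
```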

Please note that variables may be recoded or changed based on several factors, including new information being reported for historical cases, updates to the case report form, and revised reporting by provinces and territories.

StatCan and PHAC are working closely together to improve the quality of the file with the help of all provinces and territories. This will be reflected in each iteration of this dataset.


Appendix I
Data dictionary, Concept, Variable Name, Section on the Form, Description and Universe
Concept | Variable Name | Section on the Form | Description | Universe
Case identifier number | COV_ID | Administrative Information | Unique identifier for each case | All cases
Region | COV_REG | Administrative Information | Province/Territory where the case resides, grouped by regions. | All cases
Episode week | COV_EW | Administrative Information | Week of the episode, derived using the earliest of the following dates: symptom onset date, specimen collection date, laboratory testing date, date reported to the province or territory, or date reported to PHAC. | All cases
Episode week group | COV_EWG | Administrative Information | Indicates when multiple episode weeks have been grouped together to protect confidentiality. Refer to Appendix IX. | All cases
Episode year | COV_EY | Administrative Information | Year of the episode, derived using the earliest of the following dates: symptom onset date, specimen collection date, laboratory testing date, date reported to the province or territory, or date reported to PHAC. | All cases
Gender | COV_GDR | Case Details | The gender of the case. Where available, gender data was used; when gender data was unavailable, sex data was used. | All cases
Age group | COV_AGR | Case Details | Age group corresponding to the age of the case | All cases
Hospital status | COV_HSP | Clinical Course and Outcomes | Indicates if the case was hospitalized and if the case was admitted to the intensive care unit. | All cases
Death | COV_DTH | Clinical Course and Outcomes | Indicates if the case died due to COVID-19, which may be attributed when COVID-19 is the cause of death or is a contributing factor. | All cases

Appendix II
Data dictionary, Notes and Limitations
Concept | Note and Limitation
Case identifier number | Created randomly by Statistics Canada. The same case will have a different number every time that the file is released.
Region | To ensure confidentiality, some provinces/territories have been grouped together by Statistics Canada.
Episode week | Derived by Statistics Canada from Earliest Date (not available on this dataset). Earliest Date is derived based on the earliest of the following dates: symptom onset date, specimen collection date, laboratory testing date, date reported to the province or territory, or date reported to PHAC. 0 represents the first days of the year leading up to, but not including, the first Sunday. 1 represents the first full week of the year, beginning on the first Sunday, and so on.
Episode week group | Derived by Statistics Canada from Episode week. Indicates when multiple episode weeks have been grouped together to protect confidentiality. Refer to Appendix IX.
Episode year | Derived by Statistics Canada from Earliest Date (not available on this dataset). Earliest Date is derived based on the earliest of the following dates: symptom onset date, specimen collection date, laboratory testing date, date reported to the province or territory, or date reported to PHAC.
Gender | Derived from the Gender variable received from PHAC (not available on this dataset). Where available, gender data was used; when gender data was unavailable, sex data was used. Missing values and "Other" were assigned to "Not Stated".
Age group | Not applicable
Hospital status | Routine updates on health outcome status are not made uniformly across Canada, and therefore the data may underestimate the number of hospitalizations, admissions to intensive care units and deaths.
Death | Derived by Statistics Canada from COVIDDeath (not available on this dataset). Routine updates on health outcome status are not made uniformly across Canada, and therefore the data may underestimate the number of hospitalizations, admissions to intensive care units and deaths.
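
The week-numbering rule described in the Episode week note can be sketched as follows. This illustrates the stated rule only; it is not StatCan's derivation code, and the function names are hypothetical. Earliest Date is not available in the dataset, so the input date is an assumption for illustration. The guide does not say how dates that this rule would place outside the documented 0 to 52 range (for example, December 31, 2023) are handled, and the sketch does not apply the confidentiality groupings described in Appendix IX.

```python
from datetime import date, timedelta


def first_sunday(year):
    """Return the date of the first Sunday of the given year."""
    jan1 = date(year, 1, 1)
    # date.weekday(): Monday is 0 and Sunday is 6.
    return jan1 + timedelta(days=(6 - jan1.weekday()) % 7)


def episode_week(earliest_date):
    """Week 0 covers the days before the first Sunday; week 1 starts on it."""
    start = first_sunday(earliest_date.year)
    if earliest_date < start:
        return 0
    return (earliest_date - start).days // 7 + 1
```

For example, `episode_week(date(2020, 2, 23))` returns 8, which agrees with the statement that February 23, 2020 is the first day of the 8th week of 2020. The same numbering convention is used by the `%U` directive of Python's `strftime`.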

Appendix III
Data dictionary, Source, Format and Answer Categories
Concept | Source | Format | Answer Categories
Case identifier number | Statistics Canada | 8.0 | Continuous value from 1 to 99999999
Region | Public Health Agency of Canada | 1.0 | 1=Atlantic (New Brunswick, Nova Scotia, Prince Edward Island, Newfoundland and Labrador), 2=Quebec, 3=Ontario and Nunavut, 4=Prairies (Manitoba, Saskatchewan, Alberta) and the Northwest Territories, 5=British Columbia and Yukon
Episode week | Public Health Agency of Canada | 2.0 | Continuous value from 0 to 52, 99=Not stated
Episode week group | Public Health Agency of Canada | 2.0 | Refer to Appendix IX.
Episode year | Public Health Agency of Canada | 2.0 | 20=2020, 21=2021, 22=2022, 23=2023, 24=2024, 99=Not stated
Gender | Public Health Agency of Canada | 1.0 | 1=Male, 2=Female, 9=Not stated/Other
Age group | Public Health Agency of Canada | 2.0 | 1=0-19, 2=20-29, 3=30-39, 4=40-49, 5=50-59, 6=60-69, 7=70-79, 8=80+, 99=Not stated
Hospital status | Public Health Agency of Canada | 1.0 | 1=Hospitalized - ICU, 2=Hospitalized - Non-ICU, 3=Not Hospitalized, 9=Not stated/Unknown
Death | Public Health Agency of Canada | 1.0 | 1=Yes, 2=No, 9=Not Stated
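
The answer categories above can be turned into lookup tables for decoding a record. The following is a minimal sketch under stated assumptions: each case is available as a dictionary keyed by the variable names from Appendix I, and the values are integers or integer-like strings. The `decode` function and the shortened region labels are hypothetical conveniences, not part of the dataset.

```python
# Label tables transcribed from the answer categories in Appendix III.
LABELS = {
    "COV_REG": {
        1: "Atlantic",
        2: "Quebec",
        3: "Ontario and Nunavut",
        4: "Prairies and the Northwest Territories",
        5: "British Columbia and Yukon",
    },
    "COV_EY": {20: 2020, 21: 2021, 22: 2022, 23: 2023, 24: 2024,
               99: "Not stated"},
    "COV_GDR": {1: "Male", 2: "Female", 9: "Not stated/Other"},
    "COV_AGR": {1: "0-19", 2: "20-29", 3: "30-39", 4: "40-49", 5: "50-59",
                6: "60-69", 7: "70-79", 8: "80+", 99: "Not stated"},
    "COV_HSP": {1: "Hospitalized - ICU", 2: "Hospitalized - Non-ICU",
                3: "Not Hospitalized", 9: "Not stated/Unknown"},
    "COV_DTH": {1: "Yes", 2: "No", 9: "Not Stated"},
}


def decode(record):
    """Return a copy of the record with coded values replaced by labels.

    Variables without a label table (COV_ID, COV_EW, COV_EWG) and codes
    that are not listed are returned as plain integers.
    """
    decoded = {}
    for var, value in record.items():
        code = int(value)
        table = LABELS.get(var, {})
        decoded[var] = table.get(code, code)
    return decoded
```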

Appendix IV
Information on the Week Variables for the Year 2020
Week Description
0 Week of December 29th
1 Week of January 5th
2 Week of January 12th
3 Week of January 19th
4 Week of January 26th
5 Week of February 2nd
6 Week of February 9th
7 Week of February 16th
8 Week of February 23rd
9 Week of March 1st
10 Week of March 8th
11 Week of March 15th
12 Week of March 22nd
13 Week of March 29th
14 Week of April 5th
15 Week of April 12th
16 Week of April 19th
17 Week of April 26th
18 Week of May 3rd
19 Week of May 10th
20 Week of May 17th
21 Week of May 24th
22 Week of May 31st
23 Week of June 7th
24 Week of June 14th
25 Week of June 21st
26 Week of June 28th
27 Week of July 5th
28 Week of July 12th
29 Week of July 19th
30 Week of July 26th
31 Week of August 2nd
32 Week of August 9th
33 Week of August 16th
34 Week of August 23rd
35 Week of August 30th
36 Week of September 6th
37 Week of September 13th
38 Week of September 20th
39 Week of September 27th
40 Week of October 4th
41 Week of October 11th
42 Week of October 18th
43 Week of October 25th
44 Week of November 1st
45 Week of November 8th
46 Week of November 15th
47 Week of November 22nd
48 Week of November 29th
49 Week of December 6th
50 Week of December 13th
51 Week of December 20th
52 Week of December 27th

Appendix V
Information on the Week Variables for the Year 2021
Week Description
0 Not applicable
1 Week of January 3rd
2 Week of January 10th
3 Week of January 17th
4 Week of January 24th
5 Week of January 31st
6 Week of February 7th
7 Week of February 14th
8 Week of February 21st
9 Week of February 28th
10 Week of March 7th
11 Week of March 14th
12 Week of March 21st
13 Week of March 28th
14 Week of April 4th
15 Week of April 11th
16 Week of April 18th
17 Week of April 25th
18 Week of May 2nd
19 Week of May 9th
20 Week of May 16th
21 Week of May 23rd
22 Week of May 30th
23 Week of June 6th
24 Week of June 13th
25 Week of June 20th
26 Week of June 27th
27 Week of July 4th
28 Week of July 11th
29 Week of July 18th
30 Week of July 25th
31 Week of August 1st
32 Week of August 8th
33 Week of August 15th
34 Week of August 22nd
35 Week of August 29th
36 Week of September 5th
37 Week of September 12th
38 Week of September 19th
39 Week of September 26th
40 Week of October 3rd
41 Week of October 10th
42 Week of October 17th
43 Week of October 24th
44 Week of October 31st
45 Week of November 7th
46 Week of November 14th
47 Week of November 21st
48 Week of November 28th
49 Week of December 5th
50 Week of December 12th
51 Week of December 19th
52 Week of December 26th

Appendix VI
Information on the Week Variables for the Year 2022
Week Description
0 Not applicable
1 Week of January 2nd
2 Week of January 9th
3 Week of January 16th
4 Week of January 23rd
5 Week of January 30th
6 Week of February 6th
7 Week of February 13th
8 Week of February 20th
9 Week of February 27th
10 Week of March 6th
11 Week of March 13th
12 Week of March 20th
13 Week of March 27th
14 Week of April 3rd
15 Week of April 10th
16 Week of April 17th
17 Week of April 24th
18 Week of May 1st
19 Week of May 8th
20 Week of May 15th
21 Week of May 22nd
22 Week of May 29th
23 Week of June 5th
24 Week of June 12th
25 Week of June 19th
26 Week of June 26th
27 Week of July 3rd
28 Week of July 10th
29 Week of July 17th
30 Week of July 24th
31 Week of July 31st
32 Week of August 7th
33 Week of August 14th
34 Week of August 21st
35 Week of August 28th
36 Week of September 4th
37 Week of September 11th
38 Week of September 18th
39 Week of September 25th
40 Week of October 2nd
41 Week of October 9th
42 Week of October 16th
43 Week of October 23rd
44 Week of October 30th
45 Week of November 6th
46 Week of November 13th
47 Week of November 20th
48 Week of November 27th
49 Week of December 4th
50 Week of December 11th
51 Week of December 18th
52 Week of December 25th

Appendix VII
Information on the Week Variables for the Year 2023
Week Description
1 Week of January 1st
2 Week of January 8th
3 Week of January 15th
4 Week of January 22nd
5 Week of January 29th
6 Week of February 5th
7 Week of February 12th
8 Week of February 19th
9 Week of February 26th
10 Week of March 5th
11 Week of March 12th
12 Week of March 19th
13 Week of March 26th
14 Week of April 2nd
15 Week of April 9th
16 Week of April 16th
17 Week of April 23rd
18 Week of April 30th
19 Week of May 7th
20 Week of May 14th
21 Week of May 21st
22 Week of May 28th
23 Week of June 4th
24 Week of June 11th
25 Week of June 18th
26 Week of June 25th
27 Week of July 2nd
28 Week of July 9th
29 Week of July 16th
30 Week of July 23rd
31 Week of July 30th
32 Week of August 6th
33 Week of August 13th
34 Week of August 20th
35 Week of August 27th
36 Week of September 3rd
37 Week of September 10th
38 Week of September 17th
39 Week of September 24th
40 Week of October 1st
41 Week of October 8th
42 Week of October 15th
43 Week of October 22nd
44 Week of October 29th
45 Week of November 5th
46 Week of November 12th
47 Week of November 19th
48 Week of November 26th
49 Week of December 3rd
50 Week of December 10th
51 Week of December 17th
52 Week of December 24th

Appendix VIII
Information on the Week Variables for the Year 2024 
Week Description
0 Week of December 31st
1 Week of January 7th
2 Week of January 14th
3 Week of January 21st
4 Week of January 28th
5 Week of February 4th
6 Week of February 11th
7 Week of February 18th
8 Week of February 25th
9 Week of March 3rd
10 Week of March 10th
11 Week of March 17th
12 Week of March 24th
13 Week of March 31st
14 Week of April 7th
15 Week of April 14th
16 Week of April 21st
17 Week of April 28th
18 Week of May 5th
19 Week of May 12th
20 Week of May 19th
21 Week of May 26th
22 Week of June 2nd
23 Week of June 9th
24 Week of June 16th
25 Week of June 23rd
26 Week of June 30th
27 Week of July 7th
28 Week of July 14th
29 Week of July 21st
30 Week of July 28th
31 Week of August 4th
32 Week of August 11th
33 Week of August 18th
34 Week of August 25th
35 Week of September 1st
36 Week of September 8th
37 Week of September 15th
38 Week of September 22nd
39 Week of September 29th
40 Week of October 6th
41 Week of October 13th
42 Week of October 20th
43 Week of October 27th
44 Week of November 3rd
45 Week of November 10th
46 Week of November 17th
47 Week of November 24th
48 Week of December 1st
49 Week of December 8th
50 Week of December 15th
51 Week of December 22nd
52 Week of December 29th
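
The week tables in Appendices IV to VIII follow a regular pattern: week 1 begins on the first Sunday of the year, and week 0, where it applies, is the week beginning on the preceding Sunday. Under that reading, the start date of an episode week can be computed from COV_EY and COV_EW rather than looked up. The sketch below is an illustration only; the function names are hypothetical, and it does not check whether week 0 applies in a given year (it is listed as not applicable for 2021 and 2022).

```python
from datetime import date, timedelta


def first_sunday(year):
    """Return the date of the first Sunday of the given year."""
    jan1 = date(year, 1, 1)
    # date.weekday(): Monday is 0 and Sunday is 6.
    return jan1 + timedelta(days=(6 - jan1.weekday()) % 7)


def week_start(cov_ey, cov_ew):
    """Return the Sunday that starts the episode week, or None if not stated."""
    if cov_ey == 99 or cov_ew == 99:
        return None
    year = 2000 + cov_ey  # COV_EY holds the last two digits of the year
    # Week 0 resolves to the Sunday before the first Sunday of the year.
    return first_sunday(year) + timedelta(weeks=cov_ew - 1)
```

For example, `week_start(20, 8)` returns February 23, 2020 and `week_start(24, 0)` returns December 31, 2023, matching the corresponding rows of the tables above.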

Appendix IX
Information on the Episode Week Group Indicator
Episode week group | Region(s) | Episode weeks grouped | Grouped with episode week | Episode Year
0 | All regions | No grouping | Not applicable | Not applicable
1 | All regions | 0 to 8 | 8 | 20
2 | Atlantic | 23, 24 and 25 | 23 | 20
3 | Atlantic | 26, 27, 28, 29, 30 and 31 | 28 | 20
4 | Atlantic | 32, 33, 34 and 35 | 32 | 20
5 | Atlantic | 36, 37, 38 and 39 | 39 | 20
6 | Atlantic | 10, 11 and 12 | 11 | 21
7 | Atlantic | 26, 27, 28, 29 and 30 | 26 | 21
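
When analysing cases by week, it can help to recover the range of weeks that a grouped record may actually belong to. The sketch below transcribes the rows of this appendix into a lookup table. The names `WEEK_GROUPS` and `possible_weeks` are hypothetical, and raising an error when the reported week does not match the table is a design choice of this sketch, not documented behaviour of the dataset.

```python
# Rows transcribed from Appendix IX:
# group -> (region, weeks grouped, week the cases are reported under, COV_EY)
WEEK_GROUPS = {
    1: ("All regions", range(0, 9), 8, 20),
    2: ("Atlantic", range(23, 26), 23, 20),
    3: ("Atlantic", range(26, 32), 28, 20),
    4: ("Atlantic", range(32, 36), 32, 20),
    5: ("Atlantic", range(36, 40), 39, 20),
    6: ("Atlantic", range(10, 13), 11, 21),
    7: ("Atlantic", range(26, 31), 26, 21),
}


def possible_weeks(cov_ew, cov_ewg):
    """Return the list of episode weeks a case may actually belong to."""
    if cov_ewg == 0:
        return [cov_ew]  # no grouping: the reported week is the actual week
    _region, weeks, reported_week, _year = WEEK_GROUPS[cov_ewg]
    if cov_ew != reported_week:
        raise ValueError("COV_EW does not match the week listed for this group")
    return list(weeks)
```

For example, a record with COV_EW 28 and COV_EWG 3 could belong to any of weeks 26 to 31, so `possible_weeks(28, 3)` returns `[26, 27, 28, 29, 30, 31]`.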