Using personal health insurance numbers to link the Canadian Cancer Registry and the Discharge Abstract Database

by Dianne Zakaria, Richard Trudeau, Claudia Sanmartin, Patricia Murison, Gisèle Carrière, Maureen MacIntyre, Donna Turner, Brandon Wagar, Mary Jane King, Kim Vriends, Ryan Woods, Gina Lockwood and Rabiâ Louchini

Data linkages enhance the usefulness of information in different sources (for example, administrative databases, censuses, surveys) and provide insight not available when data sources are used in isolation. Linking information from cancer registries to administrative health data offers opportunities to study health care use patterns (including treatment) of cancer patients.Note 1-3 The Canadian Cancer Registry (CCR),Note 4 which contains unique patient identifiers that facilitate record linkage, has been probabilistically linked to census and mortality data to examine cancer outcomes for key subgroups.Note 5

This study investigates the feasibility and validity of using personal health insurance numbers (HINs) to deterministically link the CCR and the Discharge Abstract Database (DAD) to obtain hospitalization information about people with primary cancers. Because patient names are not captured by the DAD, the provincially assigned HINs are essential for linkage; they have been used previously for deterministic data linkages in Ontario and Manitoba.Note 6-8 The methods employed to link the CCR and the DAD for nine provinces are described, and the quality of this deterministic linkage is evaluated. Details on linkage rates, agreement on demographic identifiers and clinical diagnoses, and out-of-province hospital admissions are presented for prostate, female breast, colorectal and lung cancers, which together account for more than half of primary cancers diagnosed annually.Note 9 Cancers diagnosed from 2005 through 2008 with a valid HIN were included in the linkage. Because the territories have small cancer counts, and because Quebec does not submit data to the DAD, cancers reported by these jurisdictions were excluded from the linkage.

Data and methods

Data sources

Since 1992, the CCR has collected demographic and clinical information about Canadian residents diagnosed with primary tumours.Note 4 The data in the CCR include a person’s name (current surname, birth surname and given name), sex, and date of birth; postal code of residence and HIN at the time of diagnosis are included in the tumour record(s) for each person.

Statistics Canada uses internal record linkage to ensure that all information for an individual is attached to a single person-level identifier. If a provincial/territorial cancer registry submits a new person-record and record linkage indicates that the person may already be registered in the CCR, the cancer registries involved are consulted to confirm that the new record pertains to someone already in the CCR. If it does, that record is assigned the same person-level identifier as the other records for that individual.

Because the HIN is unique to individuals living in a specific province/territory and is a key identifier in hospital discharge databases,Note 10 it is well suited to deterministic linkage. Further, because both residence and personal HIN at time of diagnosis are captured in the CCR and linked to a person-level identifier, people diagnosed with primary tumours in more than one province/territory will have more than one HIN that can be used to link to the DAD. That is, each person in the CCR will be linkable to hospital episodes related to all of his/her primary cancers and associated HINs in the CCR. However, if a person registered as having a tumour moves to another province/territory and receives a new HIN, subsequent hospital episodes recorded under the new HIN would not be obtainable through linkage unless a new primary cancer was registered in the CCR under that HIN.
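A minimal sketch of how a person-level identifier lets every (province, HIN) pair recorded for a person serve as a linkage key. The record layout, field names, HIN values and the `dad_index` lookup are all hypothetical; the reporting province stands in for the HIN-issuing province, as in the linkage described below.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Set, Tuple

Key = Tuple[str, str]  # (province reporting the tumour, HIN)


class TumourRecord(NamedTuple):
    # Hypothetical, simplified CCR tumour record
    person_id: str       # CCR person-level identifier
    tumour_id: str
    province: str        # reporting province, used as a proxy for the HIN issuer
    hin: Optional[str]   # HIN at time of diagnosis (None if not recorded)


def hins_by_person(tumours: List[TumourRecord]) -> Dict[str, Set[Key]]:
    """Collect every (province, HIN) key attached to each person across all tumours."""
    keys: Dict[str, Set[Key]] = defaultdict(set)
    for t in tumours:
        if t.hin:
            keys[t.person_id].add((t.province, t.hin))
    return keys


def dad_records_for_person(
    person_id: str, keys: Dict[str, Set[Key]], dad_index: Dict[Key, List[str]]
) -> List[str]:
    """Return IDs of all DAD abstracts matching any of the person's (province, HIN) keys."""
    found: List[str] = []
    for key in sorted(keys.get(person_id, set())):
        found.extend(dad_index.get(key, []))
    return found


# Hypothetical example: P1 has primary cancers registered in two provinces under two HINs.
tumours = [
    TumourRecord("P1", "T1", "NS", "1111111111"),
    TumourRecord("P1", "T2", "ON", "2222222222"),
    TumourRecord("P2", "T3", "NB", None),
]
dad_index = {
    ("NS", "1111111111"): ["A1"],
    ("ON", "2222222222"): ["A2", "A3"],
}
keys = hins_by_person(tumours)
print(dad_records_for_person("P1", keys, dad_index))  # ['A1', 'A2', 'A3']
print(dad_records_for_person("P2", keys, dad_index))  # []
```

A HIN acquired after a move, with no new primary cancer registered under it, never enters `hins_by_person`, which is exactly the coverage gap the paragraph describes.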

For the present study, analyses were completed on a snapshot of the CCR (April 4, 2012), which included 2,301,833 people and 2,483,305 tumours diagnosed from 1992 through 2008.

The DAD is a national database (excluding QuebecNote 11) that contains information on all separations (discharges, deaths, sign-outs and transfers, excluding stillbirths and cadaveric donor cases) from acute care institutions in Canada. Since its inception in 1963, the DAD’s coverage and content have varied substantially. As of fiscal year (FY) 2004/2005, facilities submitted data directly or indirectly (via ministries of health) to the DAD, using the International Statistical Classification of Diseases and Related Health Problems, 10th Revision, Canada (ICD-10-CA)Note 12 to code diagnoses and the Canadian Classification of Health Interventions (CCI)Note 13 to code diagnostic and therapeutic interventions (Text Table 1). In FY 2004/2005, the Canadian Institute for Health Information (CIHI) introduced the “analytical institution type” variable (for example, acute care, chronic care, rehabilitation), which is assigned to each separation record. The DAD does not have a person-level identifier, but it does capture HIN and the province/territory issuing the HIN.Note 10,Note 16

Canadian Cancer Registry and Discharge Abstract Database linkage

To identify tumours with linkage potential, the CCR HIN variable was assessed for its presence and compliance with basic HIN characteristics (length, alphanumeric format, embedded patterns, check digit). For provinces with a high percentage of HINs in the CCR that satisfied basic checks (“valid” HINs), a deterministic linkage to the DAD was performed based on HIN and province reporting the tumour. The province reporting the tumour was used as a proxy for province issuing the HIN.
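The four basic checks can be sketched as follows. Actual HIN formats differ by province (length, alphanumeric content, check-digit scheme) and are not given here, so the 10-digit format, the repeated-digit pattern test and the Luhn (mod 10) check digit below are assumptions chosen for illustration, not the study's rules. Separating "missing" from "invalid" is useful because, as the Results note, most "invalid" HINs turned out to be missing.

```python
import re
from typing import Optional

# Hypothetical format: exactly 10 digits. Real provincial formats vary.
HIN_FORMAT = re.compile(r"\d{10}")


def luhn_ok(digits: str) -> bool:
    """Luhn (mod 10) check: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_hin(hin: Optional[str]) -> str:
    """Return 'missing', 'invalid' or 'valid' under the hypothetical rules above."""
    if hin is None or not hin.strip():
        return "missing"
    hin = hin.strip()
    if not HIN_FORMAT.fullmatch(hin):   # length and character-class check
        return "invalid"
    if len(set(hin)) == 1:              # embedded pattern, e.g. 0000000000
        return "invalid"
    if not luhn_ok(hin):                # check digit
        return "invalid"
    return "valid"


for value in ["1234567897", "1234567890", "0000000000", "12345", None]:
    print(value, classify_hin(value))
```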

For provinces with a low percentage of valid HINs in the CCR, and for which Statistics Canada had access to the provincial health insurance registry (Ontario and Manitoba), a two-step linkage was performed. Name, sex, date of birth and postal code of residence were used to probabilistically link CCR records to provincial health insurance registries to obtain HINs. HIN and province reporting the tumour were then used to deterministically link the CCR to the DAD. Because tumours are linked to persons in the CCR, it was possible to associate each tumour with all of an individual’s linked DAD records.
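The two-step approach can be sketched as a Fellegi-Sunter style scoring of a CCR person against health insurance registry entries to recover a HIN, followed by an exact join on (HIN, province). The study's comparison weights, blocking strategy and thresholds are not reported here; the m- and u-probabilities, the threshold of 10, and all field names are hypothetical, and a production linkage would also use blocking and approximate string comparison for names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical m- and u-probabilities: P(agree | match), P(agree | non-match).
M_U: Dict[str, Tuple[float, float]] = {
    "surname": (0.95, 0.01),
    "given_name": (0.93, 0.02),
    "sex": (0.99, 0.50),
    "birth_date": (0.97, 0.0003),
    "postal_code": (0.80, 0.001),
}


def match_weight(a: dict, b: dict) -> float:
    """Sum of log2 agreement/disagreement weights over the comparison fields."""
    w = 0.0
    for field, (m, u) in M_U.items():
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue  # missing value: no evidence either way
        w += math.log2(m / u) if va == vb else math.log2((1 - m) / (1 - u))
    return w


def step1_best_hin(person: dict, registry: List[dict], threshold: float = 10.0) -> Optional[str]:
    """Step 1: take the HIN of the best-scoring registry entry if it clears the threshold."""
    if not registry:
        return None
    best = max(registry, key=lambda r: match_weight(person, r))
    return best["hin"] if match_weight(person, best) >= threshold else None


def step2_link(tumours: List[dict], dad: List[dict]) -> Dict[str, List[str]]:
    """Step 2: exact join of tumours to DAD abstracts on (HIN, reporting province)."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for rec in dad:
        index[(rec["hin"], rec["hin_province"])].append(rec["abstract_id"])
    return {t["tumour_id"]: index.get((t["hin"], t["province"]), [])
            for t in tumours if t.get("hin")}


# Hypothetical example records
person = {"surname": "ROY", "given_name": "ANNE", "sex": "F",
          "birth_date": "1950-04-02", "postal_code": "X0X0X0"}
registry = [
    {**person, "hin": "111111111"},
    {"surname": "LEE", "given_name": "JOHN", "sex": "F",
     "birth_date": "1962-11-30", "postal_code": "X9X9X9", "hin": "222222222"},
]
hin = step1_best_hin(person, registry)
tumours = [{"tumour_id": "T1", "province": "MB", "hin": hin}]
dad = [{"abstract_id": "A1", "hin": "111111111", "hin_province": "MB"}]
print(step2_link(tumours, dad))  # {'T1': ['A1']}
```

Because step 2 keys on the tumour while the person-level identifier groups tumours, the linked abstracts can then be pooled across all of a person's tumours.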

Only linkages occurring during clinically relevant follow-up periods were considered. Linkage rates for non-invasive tumours were not examined. For colorectal, lung and female breast cancers, the follow-up period extended from 31 days before to 365 days after the CCR diagnosis date; for prostate cancer, the follow-up period extended from 31 days before to 730 days after the diagnosis date. To be retained as a linked record, the DAD date of admission had to occur within the follow-up period.
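A minimal sketch of the follow-up filter using the windows stated above. Treating both window boundaries as inclusive is an assumption, since the text says only that the admission date had to fall within the period; the function and key names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

DAYS_BEFORE = 31
DAYS_AFTER = {"prostate": 730, "female breast": 365, "colorectal": 365, "lung": 365}


def follow_up_window(diagnosis: date, cancer: str) -> Tuple[date, date]:
    """Return (start, end) of the clinically relevant follow-up period."""
    return (diagnosis - timedelta(days=DAYS_BEFORE),
            diagnosis + timedelta(days=DAYS_AFTER[cancer]))


def retained_admissions(diagnosis: date, cancer: str, admissions: List[date]) -> List[date]:
    """Keep only DAD admission dates inside the window (boundaries assumed inclusive)."""
    start, end = follow_up_window(diagnosis, cancer)
    return [d for d in admissions if start <= d <= end]


dx = date(2006, 3, 1)
admits = [date(2006, 1, 28), date(2006, 1, 29), date(2007, 3, 1), date(2007, 3, 2)]
print(retained_admissions(dx, "female breast", admits))
print(retained_admissions(dx, "prostate", admits))
```

The two-year window for prostate cancer retains the March 2, 2007 admission that the one-year window drops.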

FYs 2004/2005 to 2010/2011 of the DAD were used for the linkage. During this period, neither Alberta nor Ontario submitted day surgeries to the DAD.Note 16 As well, in FY 2004/2005, two facilities in Nova Scotia did not submit day surgeries to the DAD; the number rose to three in FY 2005/2006, and to four in FY 2010/2011.Note 16,Note 17 Day surgery data from Ontario, Alberta and the four Nova Scotia facilities are submitted to the National Ambulatory Care Reporting System (NACRS).Note 18

Tumours with a valid HIN in the CCR or a HIN obtained through probabilistic linkage to provincial health insurance registries were compared with tumours without a valid HIN by sex, age at diagnosis, microscopic confirmation, and tumour behaviour (non-malignant/malignant). Linkage rates of malignant/invasive tumours with valid HINs to at least one DAD record during follow-up were examined by analytical institution type, calendar year of diagnosis, sex and age group at diagnosis. Cancers linking to at least one DAD record were compared with cancers not linking in terms of diagnostic confirmation (histology, cytology, clinical/imaging/unknown, and autopsy only/death certificate only) and number of days alive during follow-up. Because the data file used for this study had vital status confirmation complete to December 31, 2008, examination of the number of days alive was limited to cancer records that had a follow-up end date on or before December 31, 2008.
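The days-alive measure can be sketched as follows. The counting rule is not defined in the text; counting from the start of follow-up (31 days before diagnosis) to the earlier of death and the end of follow-up is an assumption, chosen because it reproduces the 31-day figure the Results report for autopsy-only/death-certificate-only cases. Records whose follow-up ends after December 31, 2008 are excluded by returning `None`. Names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

VITAL_STATUS_COMPLETE_TO = date(2008, 12, 31)


def days_alive(diagnosis: date, days_after: int, death: Optional[date] = None) -> Optional[int]:
    """Days from follow-up start to the earlier of death or follow-up end.

    Returns None when follow-up ends after the vital status cutoff (assumed rule).
    """
    start = diagnosis - timedelta(days=31)
    end = diagnosis + timedelta(days=days_after)
    if end > VITAL_STATUS_COMPLETE_TO:
        return None
    last = end if death is None else min(death, end)
    return (last - start).days


print(days_alive(date(2006, 5, 1), 365))                          # 396: survived follow-up
print(days_alive(date(2006, 5, 1), 365, death=date(2006, 5, 1)))  # 31: diagnosed at death
print(days_alive(date(2008, 6, 1), 365))                          # None: beyond cutoff
```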

The statistical tests for categorical variables were Fisher’s exact test and the chi-square test, and for continuous variables, the two-sample t-test (α = 0.05, two-tailed). To assess the validity of the linkages, agreement on sex, date of birth and diagnosis was examined, and out-of-province hospital admissions were calculated.
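The 2 × 2 categorical comparisons can be sketched with the standard library alone. This is not the study's software: the Pearson chi-square below omits a continuity correction (whether one was used is not stated), its p-value relies on the identity that the 1-df chi-square survival function equals erfc(√(x/2)), and the two-sided Fisher p-value sums the probabilities of all tables no more likely than the observed one. The counts are hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]], 1 df."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    p = math.erfc(math.sqrt(stat / 2))  # P(X > stat) for chi-square with 1 df
    return stat, p


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value from the hypergeometric distribution."""
    r1, c1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, c1)

    def prob(x: int) -> float:
        return math.comb(r1, x) * math.comb(n - r1, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: linked vs not linked (rows) by histologic confirmation (columns).
print(chi_square_2x2(20, 10, 10, 20))
print(fisher_exact_2x2(3, 0, 0, 3))
```

Fisher's test is the usual choice when expected cell counts are small; the chi-square approximation suits the large counts typical of registry data.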

Statistics Canada ensures respondent privacy during the linkage process and subsequent use of linked files. Only employees directly involved in the process have access to the unique identifying information required for linkage (such as names and health insurance numbers) and do not access health-related information. When the data linkage is completed, an analytical file is created from which identifying information is removed. This de-identified file is accessed by analysts for validation and analysis.

Results

Health Insurance Number validity check

The HIN check revealed that deterministic linkage using HINs in the CCR would not be feasible for Ontario and Manitoba (Table 1). Among the remaining provinces, the percentage of valid HINs was 99% or more for 2005 to 2008, except for Newfoundland and Labrador in 2007 and 2008 (98.4% and 95.1%, respectively), where almost all the “invalid” HINs were actually missing.

For Ontario and Manitoba, 97.9% and 95.8% of tumours, respectively, were probabilistically linked to a HIN in the provincial health insurance registry (Table 2). For the 2005-to-2008 period, the percentage of tumours with a valid HIN exceeded 98% in both provinces. Among tumours that were probabilistically linked, agreement on sex between the CCR and the provincial health insurance registry was 99.9% for Ontario and 100.0% for Manitoba; agreement on complete date of birth was 98.7% for Ontario and 71.9% for Manitoba. In Manitoba, agreement on date of birth rose from 51.1% in 1992 to 93.6% in 2008. Examination of the Manitoba discrepancies revealed that the health registry disproportionately recorded the first day of the month as the day of birth.

Several differences emerged between tumours with and without a valid HIN. Tumours with a valid HIN were more likely to belong to males (50.3% versus 46.1%, p < 0.0001), to be malignant (92.9% versus 83.7%, p < 0.0001), and to be microscopically confirmed (85.7% versus 74.6%, p < 0.0001). Among invasive tumours, those with a valid HIN were more likely to be microscopically confirmed (85.4% versus 70.2%, p < 0.0001).

CCR-DAD linkage rate during follow-up

Prostate cancer had the lowest linkage rate to the DAD. For provinces submitting day surgeries to the DAD, the percentage of prostate cancers linking to at least one DAD record ranged from 77.2% to 91.6% (Table 3). When the analysis was limited to DAD records submitted by acute care institutions, the percentage linking was lower, ranging from 58.1% to 65.4%. Linkage rates tended to decline with advancing age at diagnosis up to age 80, and then increased (Table 4). The method of diagnostic confirmation did not differ substantially between prostate cancers that linked and those that did not (Table 5). The average number of days alive during follow-up was actually greater for prostate cancers not linking to the DAD (741.2 versus 716.1 days, p < 0.0001).
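The linkage rates reported in this section are the share of tumours with at least one DAD record during follow-up, computed once over all records and once over acute care records only. A minimal sketch, assuming each tumour carries the analytical institution types of its retained DAD records; the type labels are hypothetical, not CIHI's actual codes.

```python
from typing import Dict, Iterable, List, Optional


def linkage_rate(linked: Dict[str, List[str]], keep: Optional[Iterable[str]] = None) -> float:
    """Percentage of tumours with >= 1 retained DAD record.

    linked maps tumour_id -> institution types of its DAD records in follow-up
    (an empty list means no link). If keep is given, only those types count.
    """
    if not linked:
        raise ValueError("no tumours")
    allowed = None if keep is None else set(keep)
    hits = sum(
        1 for types in linked.values()
        if any(allowed is None or t in allowed for t in types)
    )
    return 100.0 * hits / len(linked)


linked = {
    "T1": ["acute care", "day surgery"],
    "T2": ["day surgery"],
    "T3": [],
    "T4": ["acute care"],
}
print(linkage_rate(linked))                  # 75.0
print(linkage_rate(linked, {"acute care"}))  # 50.0
```

A tumour treated only by day surgery (T2 here) counts toward the overall rate but not the acute-care rate, which is one way the acute-care-only figure can fall well below the overall figure.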

Female breast cancer linkage rates varied by analytical institution type, calendar year, and age at diagnosis. Among provinces submitting day surgeries to the DAD, the percentage of female breast cancers linking to at least one DAD record fell within a narrow range, from 95.6% to 98.1%. When the analysis was limited to DAD records submitted by acute care institutions, the percentage linking varied widely, from 56.8% to 93.2% (Table 3). For provinces with both acute care and day surgery captured in the DAD, overall breast cancer linkage rates remained relatively stable over time, but the linkage rate to acute care records declined, suggesting a shift of procedures toward day surgery (data not shown). Apart from Ontario, linkage rates tended to be lower among women aged 80 or older (Table 4). Compared with breast cancers not linking to a DAD record, those that linked were more likely to be histologically confirmed (98.3% versus 93.3%, p < 0.0001); the average number of days alive during follow-up did not differ significantly (383.9 versus 385.1 days, p = 0.0626) (Table 5).

Colorectal cancer linkage rates varied little by analytical institution type, calendar year, sex and age at diagnosis. Compared with colorectal cancers not linking to a DAD record, those that linked were more likely to be histologically confirmed (95.2% versus 74.2%, p < 0.0001) and had more days alive during follow-up (345.2 versus 294.8 days, p < 0.0001) (Table 5). The difference in average days alive was attributable to the high percentage of autopsy-only/death-certificate-only cases among colorectal cancers that did not link; for all other diagnostic confirmation categories, average days alive were greater for colorectal cancers not linking (data not shown). In the CCR, the date of death and the date of diagnosis are the same for autopsy-only/death-certificate-only cases, which limits the number of days alive during follow-up to 31.

Lung cancer linkage rates varied by analytical institution type and age at diagnosis. Among provinces submitting day surgeries to the DAD, the percentage of lung cancers linking to at least one DAD record ranged from 92.8% to 96.3%. When the analysis was limited to records from acute care institutions, the percentage linking ranged from 81.8% to 91.4% (Table 3). Linkage rates tended to decline in the oldest age group (Table 4). Compared with lung cancers not linking to a DAD record, those that linked were more likely to be histologically confirmed (62.9% versus 45.3%, p < 0.0001) and had fewer days alive during follow-up (239.0 versus 264.4 days, p < 0.0001), despite having a smaller percentage of autopsy-only/death-certificate-only cases (Table 5). The average number of days alive for lung cancers that did not link significantly exceeded the average for cancers that did link in every category of diagnostic confirmation except autopsy-only/death-certificate-only, where average days alive were equal at 31 (data not shown).

Validity of linked records

More than 99% of DAD records that linked to cancers agreed with the patient sex reported in the CCR, and apart from two estimates for Prince Edward Island, more than 97% agreed with the complete date of birth in the CCR (Table 6). For Prince Edward Island, the 42 prostate cancer discrepancies involved 16 people, and the 47 lung cancer discrepancies, 19 people. Common discrepancies were one-day and one-year differences, and transposed month and day.
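The three common discrepancy types named above can be flagged with a simple classifier. The order of the checks is an assumption of this sketch: a one-day difference is tested first, so December 31 versus January 1 of the next year counts as a one-day rather than a one-year discrepancy.

```python
from datetime import date


def classify_dob_discrepancy(ccr: date, dad: date) -> str:
    """Label the disagreement between CCR and DAD dates of birth."""
    if ccr == dad:
        return "agree"
    if abs((ccr - dad).days) == 1:
        return "one-day difference"
    if (ccr.month, ccr.day) == (dad.month, dad.day) and abs(ccr.year - dad.year) == 1:
        return "one-year difference"
    if ccr.year == dad.year and (ccr.month, ccr.day) == (dad.day, dad.month):
        return "transposed month and day"
    return "other"


ccr_dob = date(1940, 3, 5)
for dad_dob in [date(1940, 3, 6), date(1941, 3, 5), date(1940, 5, 3), date(1950, 7, 20)]:
    print(dad_dob, classify_dob_discrepancy(ccr_dob, dad_dob))
```

Counting labels per person rather than per DAD record distinguishes a few people with many discrepant abstracts from many people with one each, as in the Prince Edward Island figures.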

Generally, prostate cancers were most likely, and female breast cancers were least likely, to link to an out-of-province hospital admission. For all cancers, Prince Edward Island had the highest percentage linking to at least one out-of-province admission; Ontario had the lowest (Table 7).
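A sketch of the out-of-province measure: among cancers with at least one linked DAD record, the share with an admission in a province other than the one that issued the HIN. It assumes each linked DAD record carries the province of the treating facility; the data layout and sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def pct_with_out_of_province(linked: Dict[str, Tuple[str, List[str]]]) -> float:
    """linked maps tumour_id -> (HIN province, facility provinces of linked DAD records)."""
    with_links = {t: v for t, v in linked.items() if v[1]}
    if not with_links:
        raise ValueError("no linked cancers")
    out = sum(1 for hin_prov, facilities in with_links.values()
              if any(f != hin_prov for f in facilities))
    return 100.0 * out / len(with_links)


linked = {
    "T1": ("PE", ["PE", "NS"]),  # one admission outside the HIN-issuing province
    "T2": ("PE", ["PE"]),
    "T3": ("PE", []),            # no DAD link: excluded from the denominator
}
print(pct_with_out_of_province(linked))  # 50.0
```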

The likelihood of a CCR record linking to at least one DAD record with a consistent diagnosis varied by cancer (Table 7). Prostate cancers were least likely (77.4% to 87.8%) and colorectal cancers were most likely (94.6% to 97.7%) to link to at least one DAD record with a consistent cancer diagnosis.
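Diagnosis consistency can be sketched as a code-prefix match between the CCR cancer and any diagnosis on a linked DAD abstract. The study's code lists are not given here; the ICD-10-CA prefixes below (C61 prostate, C50 breast, C18 to C20 colorectum, C34 lung) are a plausible but hypothetical mapping.

```python
from typing import Iterable, List

# Hypothetical mapping from CCR cancer group to ICD-10-CA code prefixes.
SITE_PREFIXES = {
    "prostate": ("C61",),
    "female breast": ("C50",),
    "colorectal": ("C18", "C19", "C20"),
    "lung": ("C34",),
}


def has_consistent_diagnosis(cancer: str, abstracts: Iterable[List[str]]) -> bool:
    """True if any diagnosis code on any linked abstract starts with a site prefix."""
    prefixes = SITE_PREFIXES[cancer]
    return any(
        code.replace(".", "").upper().startswith(prefixes)
        for codes in abstracts
        for code in codes
    )


print(has_consistent_diagnosis("colorectal", [["I10"], ["C18.7", "Z51.1"]]))  # True
print(has_consistent_diagnosis("prostate", [["N40"]]))                       # False
```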

Discussion

Deterministic linkage to the DAD is feasible for 8 of the 10 provinces in the CCR, because a high percentage of their registered tumours have a valid HIN. For Ontario and Manitoba, direct linkage using HINs is not feasible, but probabilistic linkage to the provincial health insurance registries obtained HINs for more than 98% of tumours in the CCR from 2005 through 2008, which made deterministic linkage to the DAD feasible. Note, however, that this study examined the validity of HINs as recorded in the CCR; the results for Ontario and Manitoba should therefore not be interpreted as reflecting the validity of HINs in those provinces' own cancer registries.

The agreement on sex and date of birth among linked CCR-DAD records, together with several patterns in the results, supports the construct validity of deterministic linkage using the HIN.

First, cancers for which surgery is the preferred treatment (female breast, colorectal and lung)Note 18 had high linkage rates and were the most likely to have a cancer diagnosis consistent with the CCR on at least one linked DAD record. Conversely, despite a follow-up period that was twice as long, prostate cancer had the lowest linkage rate; its optimal treatment is debated and depends on life expectancy at diagnosis, the likelihood that the cancer will cause problems, and the side effects of treatment.Note 19 Prostate cancers were also the least likely to have a consistent cancer diagnosis on at least one linked DAD record.

Second, Ontario, the province with the lowest overall linkage rate for female breast cancer, is also the most likely to perform mastectomies as day surgeries, which are not submitted to the DAD.Note 20

Last, the percentage of linked cancers with at least one out-of-province admission varied by cancer type and by provincial population, suggesting that certain treatments may not be available nearby within some provinces.

Limitations

The main limitations of this research are differences across provinces in records submitted to the DAD and different clinical practice patterns in performing interventions on an inpatient, day surgery, or outpatient basis. As demonstrated with female breast cancer, even when the institution type is limited to acute care, linkage rates vary substantially. Another limitation is the exclusion of QuebecNote 11 and the territories from the linkage.

Conclusions

Personal HINs can be used to link the CCR and the DAD to obtain hospitalization information about people with primary cancers. Provincial variations in linkage rates of the four most commonly diagnosed cancers reflect differences in records submitted to the DAD and in clinical practice. Among linked records, agreement on basic identifiers was high. As more interventions are performed on a day surgery/outpatient basis and as more provinces/territories submit such records to the NACRS, combining data from multiple sources (for example, DAD, NACRS, physician billing databases) will be important in studying the health care experiences of people with cancer. Finally, if information about date of death or method of diagnostic confirmation is available, researchers may consider starting the follow-up period for cancers diagnosed at death one year, rather than 31 days, before the diagnosis date (that is, the date of death) to increase the potential for linkage.
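The suggested adjustment can be sketched as a change to the start of the follow-up window for cancers diagnosed at death. Taking one year as 365 days, and using a death date equal to the diagnosis date as a fallback when the confirmation method is unavailable, are assumptions; that fallback would misclassify someone diagnosed clinically who dies the same day. Names and labels are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

DCO_METHODS = {"autopsy only", "death certificate only"}  # hypothetical labels


def diagnosed_at_death(diagnosis: date, death: Optional[date] = None,
                       confirmation: Optional[str] = None) -> bool:
    """Prefer the confirmation method; fall back to death date == diagnosis date."""
    if confirmation is not None:
        return confirmation in DCO_METHODS
    return death is not None and death == diagnosis


def follow_up_start(diagnosis: date, at_death: bool) -> date:
    """Look back 365 days for cancers diagnosed at death, otherwise 31 days."""
    return diagnosis - timedelta(days=365 if at_death else 31)


dx = date(2007, 6, 1)
print(follow_up_start(dx, diagnosed_at_death(dx, death=dx)))                  # 2006-06-01
print(follow_up_start(dx, diagnosed_at_death(dx, confirmation="histology")))  # 2007-05-01
```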
