Performance of the Cancer Risk Management Model Lung Cancer Screening Module

by William M. Flanagan, William K. Evans, Natalie R. Fitzgerald, John R. Goffin, Anthony B. Miller and Michael C. Wolfson

The National Lung Screening Trial (NLST), which enrolled participants in the United States from August 2002 through April 2004, was conducted to determine whether low-dose computed tomography (LDCT) screening could reduce lung cancer mortality. The NLST, which involved more than 53,000 participants, showed that three annual scans of a high-risk population resulted in a 20% reduction of lung cancer mortality after about six years of follow-up.Note 1,Note 2 Smaller European trials with somewhat different screening protocols and at-risk populations found either some evidence of lower mortality or no benefit.Note 3-5 The U.S. Preventive Services Task Force rated the quality of these European trials as fair or less, but rated the NLST as a large, good-quality trial,Note 6 and gave a grade B recommendation for annual LDCT screening of people aged 55 to 80 with a 30-pack-year smoking history who currently smoke or who quit within the past 15 years.Note 7 However, the optimal “at-risk” population for screening is not known, nor is the optimal frequency or duration of screening. As well, the cost-effectiveness of LDCT and smoking cessation has been projected for a U.S. population,Note 8 but not for Canada.

The Cancer Risk Management Model (CRMM), developed by the Canadian Partnership Against Cancer, incorporates the risk of developing cancer, disease screening and clinical management with cost and labour data to assess health outcomes and economic impact. A screening module added to the lung cancer module (CRMM-LC) enables a variety of scenarios to be evaluated for different target populations with varying rates of participation, compliance, and frequency of LDCT screening. This study describes that module and assesses how well it reproduces the main outcomes of the NLST.

Methods

The CRMM simulates large, representative samples of the Canadian population, one individual at a time, from birth to death. The simulated individuals are subject to equations and probabilities derived from empirical data that shape their demographic profile, labour force characteristics, risk factor exposures, risk of developing cancer, health status, and risk of death. Life histories unfold in a continuous-time, discrete-event, Monte Carlo micro-simulation with explicit competing risks. The simulation output resembles a comprehensive longitudinal health, demographic and economic survey of the population that extends into future years.

Baseline lung cancer model

The risk of developing lung cancer was calculated using a risk equation from the literatureNote 9 that multiplies baseline incidence rates (age-/sex-/province-specific) by the relative risks associated with an individual’s cumulative packs smoked and cumulative radon exposure. The risk of developing lung cancer is re-evaluated each year as the person ages, and is implemented as a waiting time to clinical detection.
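The multiplicative risk equation and the waiting-time mechanism described above can be illustrated with a minimal sketch. It is not the CRMM implementation: the function names, the use of an exponential waiting time within each year, and all numeric values are assumptions made for illustration only.

```python
import math
import random


def annual_hazard(baseline_rate, rr_smoking, rr_radon):
    """Baseline incidence rate multiplied by the two relative risks."""
    return baseline_rate * rr_smoking * rr_radon


def waiting_time_to_detection(hazard, rng):
    """Draw a waiting time in years, assuming a constant hazard (assumption)."""
    if hazard <= 0:
        return math.inf
    return rng.expovariate(hazard)


def years_until_clinical_detection(start_age, hazard_by_age, rng, max_age=110):
    """Re-evaluate the hazard once per year of age.

    hazard_by_age is a hypothetical callable returning the annual hazard
    at a given age. Returns the years from start_age to clinical
    detection, or None if no cancer is detected before max_age.
    """
    for age in range(start_age, max_age):
        wait = waiting_time_to_detection(hazard_by_age(age), rng)
        if wait < 1.0:  # the event falls inside this year of age
            return (age - start_age) + wait
    return None
```

Redrawing the waiting time each year is valid under the constant-within-year assumption because the exponential distribution is memoryless; competing risks such as death from other causes are omitted here.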

Smoking behaviour was simulated to match Canadian survey data over time, by age, sex and province, based on the 1979 Canada Health Survey, the 1994/1995 National Population Health Survey, and the 2008 Canadian Community Health Survey.Note 10-12 Smoking trajectories were externally validated against other survey years and tobacco manufacturers’ data.Note 13 Trajectories before 1979 were extrapolated and compared with smoking data previously compiled for Canada.Note 14 Recent smoking trends were extrapolated after 2008.

Lung cancer stage at the time of diagnosis was assigned according to age and sex distributions in Canadian Cancer Registry data. Stage-specific survival curves derived from chart review and the literature were applied at initial diagnosis to generate times to relapse and death. Baseline incidence rates were calibrated to the number of new cases in the Canadian Cancer Registry for 2005 and assessed for alignment across years 1999 to 2009. Lung cancer mortality was calibrated to the Canadian Mortality Database for 2005 and compared across time. The full model and its parameters are published on the Canadian Partnership Against Cancer’s website.Note 15 Details about the lung module of CRMM and its data sources are described elsewhere.Note 16,Note 17 CRMM version 2.1 was used for this analysis.

Simulating screening

The CRMM-LC simulates a detectable preclinical cancer phase—the period before clinical identification, during which screening would be able to detect cancer. Detectable preclinical cancer phases are randomly drawn at birth from an assumed exponential distribution (with mean determined through model-fitting) to allow for heterogeneity of tumour growth. As the person ages, the assigned detectable preclinical cancer phase is compared with the current time to clinical detection to determine if the cancer is detectable by screening.
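A minimal sketch of the detectability check follows. The function names and the mean of 2.0 years used in the usage note are hypothetical; the fitted means are those reported in Table 1, and the exact comparison rule used inside the CRMM-LC is assumed here rather than taken from the model code.

```python
import random


def draw_preclinical_phase(mean_years, rng):
    """Draw a detectable preclinical phase (years) from an exponential distribution."""
    return rng.expovariate(1.0 / mean_years)


def is_screen_detectable(time_to_clinical_detection, preclinical_phase):
    """Assumed rule: the cancer is detectable by screening once the remaining
    time to clinical detection is no longer than the assigned phase."""
    return 0.0 < time_to_clinical_detection <= preclinical_phase
```

For example, with a phase of 2.0 years, a person whose cancer would surface clinically in 1.5 years is detectable at the current screen, whereas one whose cancer would surface in 3.0 years is not.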

When cancer is first simulated as detectable, the sensitivity of LDCT screening is applied to generate true positives and false negatives. True positives have a lead time equal to the time between screen-detection and when the cancer would have been detected clinically in the simulation. False positives are generated from the LDCT specificity parameter.
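The classification of a single screen and the resulting lead time can be sketched as below. This is an illustrative simplification with hypothetical names; the sensitivity and specificity values would come from the calibrated parameters, not from this code.

```python
import random


def screen_result(has_detectable_cancer, sensitivity, specificity, rng):
    """Classify one LDCT screen as 'TP', 'FN', 'FP' or 'TN'."""
    if has_detectable_cancer:
        # Sensitivity: probability that a detectable cancer is found.
        return "TP" if rng.random() < sensitivity else "FN"
    # Specificity: probability that a cancer-free person screens negative.
    return "TN" if rng.random() < specificity else "FP"


def lead_time(screen_time, clinical_detection_time):
    """Years gained by detecting at screen_time instead of clinically."""
    return clinical_detection_time - screen_time
```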

LDCT is assumed to be able to detect non-small-cell lung cancers earlier, thereby leading to more effective treatment, and hence, a mortality reduction. LDCT may also detect small-cell lung cancers, but screening is not expected to change the prognosis for this aggressive cancer.

Screening cohort

To participate in the NLST, individuals had to be at high risk of lung cancer; that is, aged 55 to 74 with no history of lung cancer, but a 30-pack-year smoking history for current smokers (48%) and for former smokers who quit within the last 15 years (52%).Note 2 Most NLST participants were aged 55 to 59 (43%) or 60 to 64 (31%); 18% were aged 65 to 69; and 9% were aged 70 to 74. The majority of participants—59%—were men. Stage distributions were reported by screening round and by screening result (positive screen, negative screen, no screen). False positive results were reported.

According to the CRMM-LC, about 1.4 million Canadians met NLST eligibility criteria for screening in 2014. The simulated population was older and contained higher percentages of men and current smokers than the NLST, and was, therefore, at higher risk. For the purposes of model-fitting to the NLST results on a similar at-risk population, a screening cohort was randomly selected from the pool of eligible Canadians to match the age, sex, and smoking strata distributions of NLST participants. The screening cohort consisted of about 555,000 simulated persons, although the results reported from the CRMM-LC were scaled to the NLST trial size for comparability.
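Selecting a cohort that matches the NLST strata can be sketched as stratified random sampling. The sketch below is an assumption about how such matching might be done, not the authors' procedure; the function name, the `key` callable and the record layout are hypothetical. Target shares are normalized because published percentages may not sum to exactly 100 after rounding.

```python
import random
from collections import defaultdict


def sample_to_match(pool, target_shares, cohort_size, key, rng):
    """Randomly sample from pool so strata follow target_shares.

    pool          : list of simulated persons (any objects)
    target_shares : dict mapping stratum label -> share (any positive scale)
    key           : callable returning a person's stratum label
    """
    by_stratum = defaultdict(list)
    for person in pool:
        by_stratum[key(person)].append(person)

    total = sum(target_shares.values())
    cohort = []
    for stratum, share in target_shares.items():
        wanted = round(cohort_size * share / total)
        candidates = by_stratum[stratum]
        # Cannot draw more people than the eligible pool contains.
        cohort.extend(rng.sample(candidates, min(wanted, len(candidates))))
    return cohort
```

In practice the strata would be the cross-classification of age group, sex and smoking status rather than a single variable.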

The screening cohort was simulated with and without LDCT screening for model fit and assessment. The screening scenario replicated the NLST protocol and consisted of three annual LDCT screens, with a 95% chance of a repeat screen at each round. The simulated control scenario was no screening, rather than a chest x-ray, because chest x-rays are not associated with a mortality reduction.Note 18,Note 19

Parameter estimation for model fit

The first objective of the model assessment was to find parameter estimates for sensitivity and mean detectable preclinical cancer phase that, when combined, generated an increased number of cancers over the first three screens and a reduced number of new cancers in rounds 4 to 6, matching the NLST. At the same time, the percentage of cancers detected following both positive and negative screens had to match the NLST. The second objective was to ensure reasonable fit to the mortality reduction reported in the NLST by screening round. Mortality reduction is realized through a combination of stage shift and within-stage survival improvement of screen-detected cases.

A stepwise and iterative directed search was used to obtain parameter estimates that met the CRMM-LC model-fitting objectives. First, stage distributions were evaluated. Second, sensitivity and mean detectable preclinical cancer phase were estimated together to ensure that the number of new cases matched the NLST by screening round. Third, baseline incidence rates were adjusted. Fourth, stage-specific survival curves were matched to the NLST by estimating lead time scalars by stage and applying a hazard ratio to baseline survival curves by stage. One parameter solution is offered. Parameter uncertainty was not assessed. Model structure was assessed by testing for internal consistency and plausibility of results from a range of screening scenarios that extend beyond the scope of the NLST design, and by comparing the results to other studies. Monte Carlo error was minimized by running large simulations of 32 million people.
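To illustrate why large runs reduce Monte Carlo error, the sketch below computes the standard error of a simulated proportion, which shrinks with the square root of the number of simulated persons. The binomial approximation and the event probability used in the usage note are assumptions for illustration; only the run size of 32 million comes from the text.

```python
import math


def monte_carlo_se(p, n):
    """Standard error of an estimated proportion p from n independent simulated persons."""
    return math.sqrt(p * (1.0 - p) / n)
```

For a hypothetical event probability of 0.01, `monte_carlo_se(0.01, 32_000_000)` is roughly 5.7 times smaller than `monte_carlo_se(0.01, 1_000_000)`, because the ratio of standard errors equals the square root of 32.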

Stage-shift probabilities were derived by comparing the non-small-cell lung cancer stage distribution in the screen arm of the NLST with the general U.S. population aged 55 to 74 from the Surveillance, Epidemiology, and End Results Program.Note 20 Stage distributions for small-cell lung cancer were assumed to remain unchanged. The stage distribution of non-small-cell lung cancers by screen result (positive, negative, no screen) was inferred from NLST results by assuming that small-cell lung cancer tumours were distributed proportionally across categories of detection and removing them from the totals. A second set of stage-shift probabilities was estimated based on variations in stage distribution by screening round. In the simulation, the stage of clinically detected non-small-cell lung cancers was shifted, first according to screen result, and then, based on the screening round. The full set of stage-shift probabilities is published as input parameters with the CRMM-LC.Note 15

The specificity of LDCT was calculated directly from the NLST results by screening round. Sensitivity was calculated from the NLST data based on the one-year interval between screens. However, for the purposes of modeling, sensitivity had to be estimated in conjunction with the mean detectable preclinical cancer phase, as it reflects the ability of LDCT to detect cancers during that period. Table 1 shows the sensitivity and mean detectable preclinical cancer phase obtained through model calibration. Calibration targets included the number of cases reported by screening round in the NLST and one-year sensitivity.

Screening detects cancers that would have been diagnosed clinically in subsequent years. To simulate the impact on incidence in later years, baseline incidence rates were adjusted by a set of multipliers that vary by time from screening initiation and cessation. The multipliers account for the cumulative impact of repeated screens, screen frequency, sensitivity of LDCT, mean detectable preclinical cancer phase, and a model-fitting parameter (α=0.8) applied until three years after screening ends to fit the mortality and incidence outcomes of the NLST.

Based on NLST data obtained by special request, lung-cancer-specific survival curves were constructed by stage. The difference in the stage-specific survival curves between the screen and control arms can be attributed to a combination of lead time and improved within-stage survival from early detection. Over-diagnosed cases contribute to the within-stage survival benefit of stage I patients, although these cases do not result in a mortality reduction. Parameters for lead time and within-stage survival improvement were estimated simultaneously (Table 2) so that the difference between survival curves in the screen and control arms matches NLST differences. The within-stage survival benefit was estimated to be 20% for screen-detected stage I non-small-cell lung cancer patients, although this is confounded by over-diagnosis, and of no benefit for higher stages. Lead times were estimated to be longer for cancers detected at earlier stages.

To assess the robustness of the CRMM-LC and to compare it with results from the Cancer Intervention and Surveillance Modeling Network (CISNET),Note 21 sensitivity analyses were performed on age of screening initiation and cessation, smoking history, and screening interval for a cohort of 45-year-olds with perfect screening adherence followed to age 90.

Results

The one-year sensitivity of LDCT scanning, calculated as the ratio of screen-detected cancers to all cancers detected within 12 months of screening, was 0.938 in the NLST and 0.939 in the CRMM-LC.
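The ratio defined above can be written as a short function. The argument names are hypothetical, and the counts in the usage note are illustrative rather than NLST data.

```python
def one_year_sensitivity(screen_detected, interval_cancers):
    """Screen-detected cancers divided by all cancers detected within
    12 months of screening (screen-detected plus interval cancers)."""
    total = screen_detected + interval_cancers
    if total == 0:
        raise ValueError("no cancers detected within 12 months of screening")
    return screen_detected / total
```

With 90 screen-detected cancers and 10 cancers diagnosed after a negative screen within the year, the function returns 0.9.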

Over six years, the difference in the number of lung cancer cases in the screen arms of the CRMM-LC and the NLST was, at most, 2.3% (Figure 1); the difference in cumulative incidence at six years was less than 1% (Figure 2). The control arm of the CRMM-LC generated 32% fewer cases than the control arm of the NLST in the first year, but by year six, virtually no difference existed in cumulative incidence. By comparison, the CRMM-LC generated 120% more lung cancer cases on the first LDCT screen than a comparable unscreened cohort, and closely matched levels reported in the NLST. By year six, the excess cumulative number of lung cancer cases in the screen arm was 19% in the CRMM-LC, compared with 15% in the NLST. Stabilization of the difference in the cumulative number of NLST lung cancer cases in subsequent years demonstrates the return to status quo levels after screening ceases. Follow-up in the last two years of the NLST was insufficient for meaningful comparisons with the CRMM-LC results.

The percentage of cases over-diagnosed, calculated as the net increase in the number of lung cancer cases in the screen arm as a percentage of all screen-detected cancers after an average of 6.5 years of follow-up, was 24.8% in the CRMM-LC, compared with 18.5% in the NLST. The CRMM-LC estimate of over-diagnosis after 20 years of follow-up was 21.9%.
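The over-diagnosis measure defined above reduces to a simple ratio, sketched here with hypothetical argument names; the counts in the usage note are illustrative and are not taken from the NLST or the CRMM-LC.

```python
def overdiagnosis_fraction(cases_screen_arm, cases_control_arm, screen_detected):
    """Net excess of lung cancer cases in the screen arm as a fraction
    of all screen-detected cancers. Assumes arms of equal size."""
    if screen_detected <= 0:
        raise ValueError("screen_detected must be positive")
    return (cases_screen_arm - cases_control_arm) / screen_detected
```

For example, 1,100 cases in the screen arm, 1,000 in the control arm and 500 screen-detected cancers give a fraction of 0.2. Because late-surfacing control-arm cases shrink the numerator, the measure falls as follow-up lengthens.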

The average lead time for screen-detected non-small-cell lung cancers estimated from the CRMM-LC was 1.8 years, and varied by stage (2.4, 0.8, 0.6 and 0.5 years for stages I, II, III and IV, respectively). Survival curves for screen-detected stage I non-small-cell lung cancers were adjusted with an average lead time of 2.4 years (during which death from lung cancer was not possible) and a 20% (relative risk 0.8) reduction from the baseline survival curve to generate survival curves comparable to the NLST.
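One way to express the stage I adjustment is sketched below. It assumes the 20% reduction acts as a proportional hazard ratio of 0.8 on the baseline curve, so that the adjusted survival is the baseline survival raised to the power 0.8 after the lead time has elapsed. The function name and the exponential baseline used in the usage note are hypothetical.

```python
import math


def adjusted_survival(base_survival, t, lead_time, hazard_ratio):
    """Survival probability t years after screen detection.

    base_survival : callable giving baseline survival at time since
                    clinical diagnosis
    lead_time     : years during which lung cancer death is not possible
    hazard_ratio  : assumed proportional-hazards multiplier (e.g. 0.8)
    """
    if t <= lead_time:
        return 1.0
    return base_survival(t - lead_time) ** hazard_ratio
```

With a hypothetical baseline `lambda t: math.exp(-0.2 * t)`, a lead time of 2.4 years and a hazard ratio of 0.8, survival stays at 1.0 through year 2.4 and equals `exp(-0.32)` at year 4.4.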

Differences between the screen and control arms of the CRMM-LC and the NLST are summarized in Table 3. Stage I results are comparable between the CRMM-LC and the NLST, which is the most important result, as the majority of cases are stage I. Stage II was more difficult to align in the first three years of screening, but longer-term survival was similar. This would have little effect on the CRMM-LC simulation results, because stage II accounts for fewer than 8% of cases. Stage III and IV survival curves are similar in shape to the NLST. Five-year survival of lung cancer patients in the simulated unscreened CRMM-LC cohort was lower than in the control arm of the NLST, by as much as 10% for stage I, 8% for stage II, and around 3% for stages III and IV, which may reflect over-diagnosis from chest x-ray screening.

Simulation of three annual scans yielded an 18% mortality reduction after six years, which matched the reduction calculated directly from NLST data (Table 4), and was slightly below the published NLST estimate of 20%. A maximum mortality reduction of 23% was reached in year four of the CRMM-LC, compared with a maximum of 19% calculated from NLST data.
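A relative mortality reduction of this kind can be computed from deaths and person-years in each arm, as in the sketch below. The function name is hypothetical and the figures in the usage note are illustrative only.

```python
def mortality_reduction(deaths_screen, py_screen, deaths_control, py_control):
    """Relative reduction in the lung cancer death rate, screen versus control."""
    rate_screen = deaths_screen / py_screen
    rate_control = deaths_control / py_control
    return 1.0 - rate_screen / rate_control
```

For instance, 82 deaths per 100,000 person-years in the screen arm against 100 per 100,000 in the control arm corresponds to an 18% reduction.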

Sensitivity analyses on a fully compliant 45-year-old cohort demonstrated model robustness to changes in screening age, smoking history and screening interval—more screening yielded more benefit. Compared with the CISNET, the CRMM-LC generated somewhat lower estimates of people eligible for screening, stage shift, and life-years gained; substantially higher estimates of over-diagnosis; and substantially lower mortality reduction (Table 5).

Discussion

CRMM-LC screening parameters were estimated to fit the staging, incidence and mortality patterns in the NLST. Model parameters were estimated for the sensitivity of the LDCT scan, duration of the detectable preclinical cancer phase, stage shift, downstream impact of screening on baseline incidence, lead time, and within-stage survival benefits. Model fit was achieved with plausible increases in incidence by study year and a maximum mortality reduction of 23%, which exceeds the published estimate of 20% from the NLST.

The simulated CRMM-LC cohort was matched to the NLST by age, sex and smoking status. Other characteristics that could improve the match, such as family history of lung cancer and the presence of respiratory diseases, were reported for the NLST but were not incorporated in the simulation because of the complexity of modeling dynamic equations for each one and of developing a new lung-cancer risk incidence equation that includes them as predictors. Tammemagi et al. presented a range of characteristics that would improve identification of high risk and developed a risk calculator, which might be considered for future enhancements of the current model.Note 22 Nevertheless, a comparison of incidence in the chest x-ray control arm of the NLST with the unscreened CRMM-LC cohort shows differences across study years similar to those reported in the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial, a large population-based randomized trial sponsored by the National Cancer Institute to determine the effects of screening on cancer mortality among 55- to 74-year-olds.Note 19

Estimates of sensitivity and mean detectable preclinical cancer phase were varied by screening round to fit the NLST results. This variation may reflect artifacts of the trial that would not materialize in population-based screening. The increased number of cancers detected on the first NLST screen may be due to factors such as prevalent cases and over-diagnosis. A higher prevalence of advanced disease would lead to higher estimates of sensitivity on the first screen; over-diagnosis would contribute to a longer-than-average detectable preclinical cancer phase on the first screen. The increase in the number of cancers detected between NLST screening rounds 2 and 3 could reflect more thorough investigation of suspicious lesions at round 3, the last opportunity to screen patients within the trial. In simulations of a population-based program that promotes continued screening, it may be more plausible to hold sensitivity constant rather than increase it on the third screening round.

The NLST estimate of over-diagnosis (18.5%)Note 23 may be too high, given the loss of follow-up at trial termination. Longer follow-up would determine if some of the excess at 6.5 years would have been detected clinically in the control arm. The CRMM-LC projection of the screening cohort 20 years forward found that the percentage of cases over-diagnosed could be about 3 percentage points lower than the value calculated at six years. The CISNET models projected 8.7% over-diagnosis in a fully compliant 45-year-old cohort screened from ages 55 to 77 who were followed to age 90, compared with 22.9% in the CRMM-LC.

The CRMM-LC’s incidence-based approach is robust to alternative screening frequencies and length of screening, and sensitive to assumptions about the impact of screening on downstream baseline incidence rates. Other models, including CISNET models, have been based on natural histories of disease progression. A comparison of CRMM-LC and CISNET results suggests that the models behave in similar and expected ways for various screening programs; that is, more screening leads to more benefit. However, discrepancies also emerged.

Differences in the percentage of the population eligible for screening likely reflect differences in the prevalence and intensity of smoking between the two models. Lung cancer mortality reduction is substantially less in the CRMM-LC than in the CISNET because of the lower percentage of eligible cases and later stage, but this may also reflect different assumptions about long-term survival, cure and mortality. Life years gained, which is perhaps a more important indicator of screening benefit than is mortality reduction, is more closely aligned in the two models. In the unscreened cohort, the CRMM-LC generated more cancer cases per 100,000 than the CISNET, consistent with findings for the populations of CanadaNote 24 and the U.S.Note 25 Further probing of underlying model assumptions and country differences is needed to fully account for the differences between the models.

The cost-effectiveness of biennial versus annual screening is an important policy question. “Real world” sensitivity, stage shift and within-stage survival benefit may differ from those estimated for annual screening. Because the NLST provides no direct evidence to estimate parameters for biennial screening, sensitivity analyses are necessary to test a range of possible solutions.

Limitations

Fitting a model to the details of the NLST does not guarantee its generalizability. One issue relates to extrapolating the screening parameters beyond the three scans of the NLST. As well, the NLST population differs from the CRMM-LC population. In addition, the way that health professionals work in the context of a clinical trial is not the same as the way a broad, population-based program would operate.

Even the definition of a suspicious lesion and how it is investigated may differ in a population-based screening program compared with the NLST, where the screens and diagnostic work-up were performed in selected centers with a high level of sophistication. Extension of screening to a larger population and wider range of health care providers would likely result in different estimates of sensitivity and specificity for LDCT scans and require additional assumptions about the staging tests used and stage distribution.

Conclusion

Based on the National Lung Screening Trial in the U.S., a new module of the Canadian Cancer Risk Management Model, the CRMM-LC, was developed to simulate low-dose computed tomography screening for lung cancer. By constructing a population designed to replicate the NLST population, a plausible parameter solution was found that allowed simulated screening to fit the NLST results.

The CRMM-LC can generate outcomes of various screening strategies in order to estimate the cost-effectiveness, budgetary impact and resource demands of LDCT screening in Canada. Policy-makers can use this information in deciding whether and how to introduce population-based lung cancer screening programs. The potential costs and benefits of lung screening can be evaluated for various smoking histories, participation rates and compliance, screening frequencies and durations, and adjunct smoking cessation programs. Deterministic sensitivity analyses are recommended to provide a range of projections to reflect model uncertainty.

Acknowledgments

The authors thank Andy Coldman and Sonya Cressman for comments on the manuscript, Sharon Fung for analysis of the NLST data, and Saima Memon for running test scenarios with the model. The authors also acknowledge the contribution of a large number of other individuals who have been involved in the making of the Cancer Risk Management Model (see www.cancerview.ca/crmmacknowledgements).

This analysis is based on the Canadian Partnership Against Cancer’s Cancer Risk Management Model. The Cancer Risk Management Model was made possible by a financial contribution from Health Canada through the Canadian Partnership Against Cancer. The assumptions and calculations underlying the simulation results were not prepared by the Canadian Partnership Against Cancer, and the Partnership is not responsible for the use and interpretation of these data.
