Development of a population-based microsimulation model of physical activity in Canada



Claude Nadeau, Suzy L. Wong, William M. Flanagan, Jillian Oderkirk, Doug Manuel, Ronald Wall and Mark S. Tremblay

While the importance of regular physical activity to good health is widely recognized,Note1-3 the percentage of Canadian adults who meet physical activity guidelines is estimated at just 15%.Note4 A greater understanding of the complex dynamics underlying the association between population levels of physical activity and health outcomes is useful in the formulation of policies and programs to increase physical activity.

With the capacity to integrate data on current conditions, socio-economic and demographic trends, multiple diseases and risk factors, and potential changes in policies and programs, computer simulation modeling has been used for decades to inform population health and health care policy.Note5-7 Projections from computer simulations can provide baseline trends, such as the future prevalence of a disease, assuming no interventions beyond existing policies and practices, and also explore “what-if” scenarios that change baseline assumptions in a population.

Computer simulation has been applied to a wide variety of health outcomes and health policies.Note5-7 For instance, modeling has been used to support epidemic preparedness efforts by projecting infectious disease incidence and prevalence based on patterns of contact, mode of transmission, incubation period, and vectors.Note8 Models have explored the impact of cancer screening on morbidity and mortality.Note9,10 Health care policy models have assessed how providing insurance coverage to the uninsured in the United States would affect service use and health care costs.Note6 Models have also been constructed to estimate prevalence, incidence and outcomes of chronic conditions, such as obesity,Note11 diabetes,Note12,13 and cardiovascular disease.Note14

Although many computer simulation models exist for a wide range of applications, each is typically designed to address specific questions about a specific population. For example, a model designed to predict the prevalence of obesity in the United States cannot be used to predict the prevalence of obesity in Canada.

To inform and support population health policies and programs, a computer simulation model of physical activity was developed for the Canadian adult population using longitudinal data from the National Population Health Survey and cross-sectional data from the Canadian Community Health Survey. The model is based on the Population Health Model (POHEM) platform developed by Statistics Canada.

This article presents an overview of POHEM and describes the additions that were made to create the physical activity module (POHEM-PA). These additions include changes in physical activity over time, and the relationship between physical activity levels and health-adjusted life expectancy, life expectancy and the onset of selected chronic conditions. Estimates from simulation projections are compared with nationally representative survey data to provide an indication of the validity of POHEM-PA.

Population Health Model (POHEM)

POHEM is an empirically grounded, longitudinal microsimulation model of diseases and risk factors, representing the lifecycle dynamics of the Canadian population.Note15 Previous applications include models to evaluate the cost and cost-effectiveness of diagnostic, therapeutic and preventive approaches for lungNote16,17 and breast cancerNote18,19; to examine the impact of screening for colorectal cancer on mortality, cost-effectiveness and resource requirementsNote10; and to quantify the health and economic burden of osteoarthritis associated with changes in risk factors and treatments.Note20 POHEM has also been used by other countries; for example, to investigate the lifetime direct costs associated with metastatic breast cancer in the United States,Note21 and the benefits of interventions to reduce smoking during pregnancy in New Zealand.Note22

POHEM is a continuous-time, Monte Carlo microsimulation tool in which the basic unit of analysis is the individual. The dynamic simulation recreates the Canadian population at a given point in time and ages it, one person at a time, until death. The life trajectory of individual simulated persons unfolds by exposure to a multitude of life events, such as smoking initiation and cessation, changes in physical activity and body mass, and incidence and progression of diseases. POHEM integrates data distributions and equations derived from sources that include nationally representative cross-sectional and longitudinal surveys, and vital statistics and cancer registries.
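To make the idea of a continuous-time, one-person-at-a-time Monte Carlo simulation concrete, the sketch below ages a single simulated person through competing events. It is a minimal illustration under stated assumptions, not POHEM code: the function names, the two events ("death" and "diabetes"), and all hazard values are hypothetical. Each competing event gets an exponential waiting time and the earliest one occurs; hazards are re-evaluated after every event and at each whole year of age, so they are treated as piecewise constant.

```python
import math
import random


def example_hazards(age, state):
    """Hypothetical hazards (events per year); not POHEM-PA estimates."""
    rates = {"death": 0.0005 * math.exp(0.09 * (age - 18))}
    if "diabetes" not in state:
        rates["diabetes"] = 0.004 + 0.0002 * (age - 18)
    return rates


def simulate_life(start_age, hazard_fn, max_age=110.0, seed=0):
    """Age one simulated person event by event until death or max_age.

    Hazards are re-evaluated at every event and at each whole year of age,
    so they are piecewise constant between those points.
    """
    rng = random.Random(seed)
    age, state, history = float(start_age), set(), []
    while age < max_age:
        boundary = min(math.floor(age) + 1.0, max_age)
        hazards = hazard_fn(age, state)
        # One exponential waiting time per competing event; the earliest wins.
        draws = {e: rng.expovariate(r) for e, r in hazards.items() if r > 0}
        event = min(draws, key=draws.get) if draws else None
        if event is None or age + draws[event] >= boundary:
            # No event before the next birthday: move the clock forward and
            # redraw (valid because exponential waiting times are memoryless).
            age = boundary
            continue
        age += draws[event]
        history.append((age, event))
        if event == "death":
            break
        state.add(event)
    return history
```

Calling `simulate_life(18, example_hazards, seed=1)` returns a list of `(age, event)` pairs for one hypothetical person; repeating this over many persons drawn from the starting population is what produces population-level projections.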

Population Health Model Physical Activity Module

The physical activity module (POHEM-PA) starts with an initial population that represents the 2001 Canadian household population aged 18 or older, and includes socio-demographic and health risk exposures, such as physical activity and current disease state. A key step of model development was generation of equations to propel this initial population forward—simulating how individuals will evolve over time. After the model was developed, outcomes and death were calibrated to data that were not used to build the model. To assess model plausibility, projections were compared with observed data for 2001 to 2009.

The additional equations required for the physical activity module were derived from the Canadian Community Health Survey (CCHS) and the National Population Health Survey (NPHS). Many components of the CCHS and NPHS were identical, which provided consistency in data measurement and definitions. This facilitated loading the relevant information from a cross-sectional snapshot of the population (CCHS cycle 1.1) and using equations built on the longitudinal NPHS to simulate how individuals will evolve over time.

Canadian Community Health Survey

Every two years, the CCHS interviews a representative cross-sectional sample of the non-institutionalized household population aged 12 or older in all provinces and territories. The survey excludes full-time members of the Canadian Forces and residents of Indian reserves, Canadian Forces bases and some remote areas. Details of the design, sample and interview procedures can be found elsewhere.Note23,24

The starting population for POHEM-PA is based on data from CCHS cycle 1.1 (conducted in 2000/2001; n > 100,000; restricted to those aged 18 or older for the model). CCHS survey sample weights are applied to the data to recreate the Canadian population in the simulation. Because of missing data that could not be reasonably imputed, a number of records were excluded; the reduced dataset was re-weighted to represent the population for 2001.
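The article does not say how the reduced dataset was re-weighted. One common approach, assumed here, is post-stratification: within each cell (for example, sex by age group), weights of the retained records are scaled so the cell again sums to a known population total. The sketch then draws simulated persons with probability proportional to their weights. The record layout, the `weight` field, and both function names are hypothetical.

```python
import random
from collections import defaultdict


def reweight_by_cell(records, cell_totals, cell_key):
    """Post-stratify: scale weights so each cell sums to its known population total."""
    cell_sums = defaultdict(float)
    for r in records:
        cell_sums[cell_key(r)] += r["weight"]
    return [
        {**r, "weight": r["weight"] * cell_totals[cell_key(r)] / cell_sums[cell_key(r)]}
        for r in records
    ]


def draw_start_population(records, n, seed=0):
    """Draw n simulated persons, each record chosen with probability proportional to its weight."""
    rng = random.Random(seed)
    return rng.choices(records, weights=[r["weight"] for r in records], k=n)
```

For example, `reweight_by_cell(kept, totals, lambda r: (r["sex"], r["age_group"]))` would restore hypothetical sex-by-age totals before the starting population is drawn.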

The advantage of loading the starting population from the CCHS is that it provides a coherent set of variables at the level of the individual at a common point in time: sex, age group, province of residence, education, income quartile, body mass index (BMI), smoking, alcohol consumption, various measures of physical activity, chronic conditions, and the Health Utilities Index Mark 3 (HUI3, a measure of health-related quality of lifeNote25). Nationally representative data from subsequent CCHS cycles—2.1 (2002/2003), 3.1 (2004/2005), 4.1 (2006/2007), and 5.1 (2008/2009)—offer an opportunity to validate the POHEM-PA projections with information not used in the development of the model.

National Population Health Survey

The longitudinal component of the NPHS follows a group of respondents who were randomly selected in 1994/1995 to be representative of the Canadian household population living in the 10 provinces at that time. It excludes full-time members of the Canadian Forces and residents of Indian reserves, Canadian Forces bases, health care institutions, and some remote areas. The initial panel of 17,276 respondents was re-interviewed every other year. Details of the design, sample and interview procedures of the NPHS can be found elsewhere.Note26-28

Data from the first seven NPHS cycles (1994/1995 through 2006/2007) were used to derive equations describing how physical activity, and its relationship to the onset of diabetes, hypertension, heart disease and cancer and to mortality, evolve over time at the individual level. To account for uncertainty in the model coefficients, NPHS bootstrap weights were used in the development of POHEM-PA. Forty bootstrap replicates were used, each yielding its own set of equations, all equally plausible given the available data. A simulation was performed for each bootstrap replicate (that is, for each set of equations), and the results were averaged to produce the POHEM-PA projection estimates.
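The averaging over bootstrap replicates can be sketched as follows. This is an illustration only: `simulate` stands for a complete simulation run with one set of coefficients, and returning the standard deviation across replicates is an optional extra (it gives a rough sense of coefficient uncertainty), not something the article reports.

```python
import statistics


def bootstrap_projection(replicate_coefficients, simulate):
    """Average a projection over bootstrap replicates of the model coefficients.

    replicate_coefficients: one coefficient set per bootstrap replicate.
    simulate: runs the full simulation for one coefficient set and returns an estimate.
    Returns (mean estimate, standard deviation across replicates).
    """
    results = [simulate(coefs) for coefs in replicate_coefficients]
    return statistics.mean(results), statistics.stdev(results)
```

With the 40 replicates described above, `simulate` would be called 40 times, once per set of equations.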

Calibration data

Cancer incidence rates were adjusted to agree with data from the Canadian Cancer Registry database.Note29 Mortality rates were adjusted to agree with Statistics Canada’s population projections for 2000 to 2026.Note30
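The article states only that rates were adjusted to agree with external data. A simple way to do this, assumed here rather than taken from POHEM-PA, is a multiplicative alignment factor per cell (for example, sex, age group and year): the ratio of the target rate to the simulated rate, applied to the model hazard. Because rescaling hazards does not rescale observed rates exactly, such factors may need to be refined over a few simulation runs. The cell keys and function names are hypothetical.

```python
def alignment_factors(simulated_rates, target_rates):
    """Ratio of target to simulated rate for each cell, e.g. (sex, age_group, year)."""
    return {
        cell: target_rates[cell] / simulated_rates[cell]
        for cell in target_rates
        if simulated_rates.get(cell, 0.0) > 0.0
    }


def calibrated_hazard(base_hazard, cell, factors):
    """Scale a model hazard by its cell's factor; cells without a factor are left unchanged."""
    return base_hazard * factors.get(cell, 1.0)
```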

Outcome variables

A brief description of the modeling of the outcome variables is provided below. Details of the derivation of equations and resulting model coefficients are available on request.

Physical activity

POHEM-PA includes four measures of physical activity, based on self-reports from the CCHSNote23 and NPHSNote26:

  • Leisure-time physical activity: none; > 0 to < 30 minutes a day; ≥ 30 to < 60 minutes a day; ≥ 60 minutes a day
  • Walking for transportation: none; some but no more than 5 hours a week; 6 to 10 hours a week; more than 10 hours a week
  • Biking for transportation: none; some
  • Usual daily activities or work habits: sit; stand/walk; carry light loads; carry heavy loads

For each measure, generalized logit regression was used to estimate equations governing the dynamics of change in physical activity based on changes observed at the individual level over the 1994-to-2006 period in the NPHS. These equations were applied starting in 2001 in the simulation to project future physical activity based on a number of covariates (Table 1), which, themselves, co-evolve over time in the simulation.
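A generalized (multinomial) logit model turns a person's covariates into a probability for each activity category, from which the next category is drawn. The sketch below assumes the first category ("none") is the reference; the category labels, covariate names and coefficient values used with it are hypothetical placeholders, not POHEM-PA estimates.

```python
import math
import random


def category_probabilities(x, coefficients, categories):
    """Generalized (multinomial) logit probabilities.

    coefficients maps each non-reference category to a dict of covariate -> beta,
    including an "intercept" entry. categories[0] is the reference (score 0).
    """
    scores = {categories[0]: 0.0}
    for cat in categories[1:]:
        b = coefficients[cat]
        scores[cat] = b["intercept"] + sum(b.get(k, 0.0) * v for k, v in x.items())
    top = max(scores.values())  # subtract the maximum for numerical stability
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp_scores.values())
    return {c: exp_scores[c] / total for c in categories}


def next_category(x, coefficients, categories, rng):
    """Draw the next activity category from the logit probabilities."""
    probs = category_probabilities(x, coefficients, categories)
    return rng.choices(categories, weights=[probs[c] for c in categories], k=1)[0]
```

Because the NPHS equations describe two-year changes, a draw like `next_category(...)` would be made for each simulated person every other simulated year, with covariates such as the current activity level entering `x`.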

The model for a given measure of physical activity (for example, leisure-time physical activity) may include other types of activity (for example, walking for transportation) as covariates. This ensures that both the temporal correlation in physical activity and the correlation between different types of physical activity are captured.

Variable selection was guided by forward selection and by candidate covariates identified from simple models (each containing only the candidate covariate and age). Further investigation could add variables to, or delete them from, the initial automated solution. For instance, accounting for the complex survey design might reveal that the significance tests used in forward selection (which ignored the survey design) were liberal, so that variables were included that were less significant than they appeared. Covariates that seemed to be significant in the simple models but had not been chosen by forward selection were examined to ensure they had not been improperly omitted. As a result of this process, the covariates included in the models differ across the four measures of physical activity (Table 1).
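The automated first step can be illustrated with a generic greedy forward selection. This is a sketch under assumptions: `p_value_of` is a placeholder for whatever test is used (the article notes the original tests ignored the survey design; a design-based test could be substituted), and `forced` holds covariates always kept, such as age.

```python
def forward_selection(candidates, p_value_of, alpha=0.05, forced=()):
    """Greedy forward selection.

    p_value_of(selected, candidate) returns the p-value for adding `candidate`
    to a model that already contains `selected`. At each step the candidate with
    the smallest p-value is added, until no remaining candidate falls below alpha.
    """
    selected = list(forced)
    remaining = [c for c in candidates if c not in selected]
    while remaining:
        pvals = {c: p_value_of(tuple(selected), c) for c in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

The manual review described above (re-testing with the survey design, re-examining covariates significant in the simple models) would then start from the list this returns.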

Because the equations are based on NPHS data, which are collected every two years, activity levels for a given individual in the simulation are updated every other year, at which point the individual may become more active, less active, or remain at the same level. Although all four physical activity measures were included in the model, only the results for leisure-time physical activity are presented here. Results for the remaining measures are available on request.

Chronic conditions

CCHS and NPHS respondents were asked if they had specific chronic conditions that had been diagnosed by a health professional. The conditions included in POHEM-PA were heart disease, diabetes, hypertension and cancer. NPHS data were used to model the onset of these conditions through a hazard function with covariates similar to those used for physical activity (Table 1). Using a hazard function allows the onset of a condition to occur at any time during the simulated life, rather than at discrete intervals such as the beginning or end of the year.
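One way a hazard function yields an onset time at any point in continuous time is inverse-transform sampling of the cumulative hazard. The sketch assumes a proportional-hazards form with a baseline hazard that is constant within age bands and multiplied by `exp(linear_predictor)`; the band layout and argument names are hypothetical, and the article does not specify the functional form POHEM-PA uses.

```python
import math


def sample_onset_age(start_age, band_edges, band_rates, linear_predictor, u):
    """Sample age at onset under a piecewise-constant proportional hazard.

    band_edges: increasing ages [a0, a1, ..., ak]; band_rates[i] is the baseline
    hazard on [a_i, a_{i+1}). The person's hazard is band_rate * exp(linear_predictor).
    u: a uniform(0, 1) draw. Returns None if onset would fall after a_k.
    """
    target = -math.log(1.0 - u)  # cumulative hazard at which onset occurs
    multiplier = math.exp(linear_predictor)
    cumulative = 0.0
    for i, rate in enumerate(band_rates):
        lo, hi = band_edges[i], band_edges[i + 1]
        if hi <= start_age:
            continue  # band lies entirely before the person's current age
        lo = max(lo, start_age)
        h = rate * multiplier
        segment = h * (hi - lo)
        if h > 0 and cumulative + segment >= target:
            # Onset falls inside this band: solve cumulative + h * (t - lo) = target.
            return lo + (target - cumulative) / h
        cumulative += segment
    return None
```

In a simulation, `u` would come from the model's random-number stream, and the returned age would be scheduled as the person's onset event unless another event (such as death) occurs first.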

Variable selection followed the semi-automatic process described for the physical activity models. However, because POHEM-PA was designed to investigate the impact of physical activity on health outcomes, physical activity covariates were given preferential treatment: even if they were not statistically significant, they could be retained in the model if their coefficients were plausible. Some interaction terms involving age and/or sex were also added to the model (not shown in Table 1).

Chronic conditions are not always chronic. An individual with a disease may become disease-free (for example, cancer remission), based on hazard equations. If such a transition occurs, the individual again becomes eligible for the onset of the disease. The covariates involved in the remission equations are a subset of those in the onset equations (Table 1). Because remissions are observed less frequently than onsets in the NPHS data, models for remissions cannot include as many covariates.
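The alternation between onset and remission can be sketched as a two-state process in which a person, after remission, is again at risk of onset. For brevity this illustration assumes constant onset and remission hazards; in POHEM-PA both depend on covariates, and the function name and rates here are hypothetical.

```python
import random


def disease_episodes(start_age, end_age, onset_rate, remission_rate, rng):
    """Alternate between disease-free and diseased states using constant hazards.

    Returns a list of (onset_age, remission_age_or_None) episodes. After a
    remission, the person is again eligible for onset.
    """
    episodes = []
    age, diseased = start_age, False
    while True:
        rate = remission_rate if diseased else onset_rate
        if rate <= 0:
            break  # no further transition is possible from the current state
        age += rng.expovariate(rate)
        if age >= end_age:
            break
        if diseased:
            episodes[-1] = (episodes[-1][0], age)  # close the open episode
        else:
            episodes.append((age, None))  # open a new episode
        diseased = not diseased
    return episodes
```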

Health-related Quality of Life and Health-adjusted Life Expectancy

To derive the Health Utilities Index Mark 3 (HUI3),Note25 CCHS and NPHS respondents were asked a series of questions about overall functional health, based on eight attributes: vision, hearing, speech, cognition, mobility, dexterity, emotion and pain. The maximum value of HUI3 is 1.00, which corresponds to full health, and the minimum value is -0.36 (a negative value indicates such poor health that it is considered worse than being dead, which has a value of 0). The model updates the HUI based on other characteristics of the individual, such as age and chronic conditions (Table 1).

HUI is needed to compute health-adjusted life expectancy (HALE). Each year lived contributes 1 to life expectancy (LE). That same year’s contribution to HALE is the HUI during that year. Thus, for someone in full health (HUI = 1), HALE is incremented by 1; otherwise, by an amount less than 1. Because the first 18 years of life are not explicitly simulated, these years are assumed to contribute 17 to HALE. This is based on data from the NPHS and the Canadian Health Measures Survey, which observed a mean HUI of approximately 0.94 for males and females aged 5 to 17 (inclusive); 18 years × 0.94 ≈ 17.
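The per-person bookkeeping can be sketched as follows: every simulated year adds its HUI to HALE, the final partial year counts in proportion to the fraction lived, and the unsimulated first 18 years add a fixed 17. The function name and the yearly HUI list are illustrative assumptions; population LE and HALE would be averages of these per-person values.

```python
def life_and_health_expectancy(death_age, hui_by_year, childhood_hale=17.0, start_age=18):
    """Return (years lived, health-adjusted years lived) for one simulated person.

    hui_by_year[k] is the HUI during the k-th year after start_age; the final,
    partial year counts in proportion to the fraction of it lived.
    """
    years_simulated = death_age - start_age
    full_years = int(years_simulated)
    hale = childhood_hale + sum(hui_by_year[:full_years])
    fraction = years_simulated - full_years
    if fraction > 0:
        hale += fraction * hui_by_year[full_years]
    return death_age, hale
```

For example, a hypothetical person dying at 20.5 with yearly HUI values 1.0, 0.8 and 0.6 accrues 17 + 1.0 + 0.8 + 0.5 × 0.6 = 19.1 health-adjusted years against 20.5 years of life.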

Unlike physical activity, chronic conditions and mortality, the HUI component of the model was built using CCHS data (cycle 1.1). HUI is seen as a function of an individual’s current characteristics; therefore, cross-sectional data were adequate for deriving the equations.

After a linear transformation to constrain HUI to be within 0 to 1, HUI was assumed to follow the Beta distribution, with the mean and variance being a function of the covariates listed in Table 1. Variable selection was conducted using the same process as variable selection for chronic conditions.
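A sketch of the Beta step follows. It assumes the linear transformation maps the HUI range [-0.36, 1] onto [0, 1] and converts a mean and variance into Beta shape parameters by the method of moments; the article states only that a linear transformation was used and that the mean and variance depend on covariates, so here they are passed in directly. All names are hypothetical.

```python
import random

HUI_MIN, HUI_MAX = -0.36, 1.0


def to_unit(hui):
    """Map HUI from [-0.36, 1] onto [0, 1]."""
    return (hui - HUI_MIN) / (HUI_MAX - HUI_MIN)


def from_unit(y):
    """Map a value on [0, 1] back onto the HUI scale."""
    return HUI_MIN + y * (HUI_MAX - HUI_MIN)


def beta_params(mean, var):
    """Method-of-moments Beta(alpha, beta) parameters from a mean and variance on (0, 1)."""
    if not (0 < mean < 1) or not (0 < var < mean * (1 - mean)):
        raise ValueError("need 0 < mean < 1 and 0 < var < mean * (1 - mean)")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def draw_hui(mean_hui, var_hui, rng):
    """Draw an HUI value given its mean and variance on the original HUI scale."""
    scale = HUI_MAX - HUI_MIN
    # A linear map rescales the variance by the square of the scale factor.
    alpha, beta = beta_params(to_unit(mean_hui), var_hui / scale ** 2)
    return from_unit(rng.betavariate(alpha, beta))
```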

Mortality

Time of death was modeled using a hazard function based on mortality data from the NPHS. As with chronic conditions, the use of a hazard function allows death to occur any time during the simulated life, rather than at discrete intervals. Covariates in the hazard function are listed in Table 1. Variable selection was conducted using the same process as variable selection for chronic conditions.

Covariates

The simulation software automatically increases age and calendar time. The remaining time-varying covariates in Table 1 each have equations to propel them forward through time. Two-year transitions were modeled using NPHS data and involved a number of covariates. Education and income were modeled as categorical variables using generalized logit autoregressive modeling. BMI was modeled with an autoregressive model, with the categories shown in Table 1 derived from the continuous BMI variable. Smoking was modeled using fourth-order Markov chains, with transitions that are conditional on age group, sex, and past history of smoking. Details of the derivation of equations and resulting model coefficients are available on request.
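In a fourth-order Markov chain, the next state depends on the four most recent states rather than only the current one. The sketch below looks up transition probabilities keyed on age group, sex and the last four two-year smoking statuses; the table layout, status labels and function name are hypothetical, and the probabilities would come from the NPHS-based equations.

```python
import random


def next_smoking_status(history, transition_table, age_group, sex, rng):
    """Draw the next two-year smoking status from a fourth-order Markov chain.

    The probabilities depend only on the four most recent statuses in `history`
    (oldest first), together with age group and sex.
    """
    if len(history) < 4:
        raise ValueError("a fourth-order chain needs at least four past statuses")
    probs = transition_table[(age_group, sex, tuple(history[-4:]))]
    states = list(probs)
    return rng.choices(states, weights=[probs[s] for s in states], k=1)[0]
```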

Comparison of POHEM-PA projections with observed data

POHEM-PA projections of trends in physical activity level, chronic disease prevalence, and HUI for 2001 to 2009 were similar to those derived from the CCHS. Figures 1a to 3b are based on data collected by the CCHS and on estimates from POHEM-PA. They show the percentages of adults engaging in various levels of leisure-time physical activity (Figures 1a and 1b), the prevalence of hypertension, heart disease, diabetes and cancer (Figures 2a and 2b), and average HUI3 scores (Figures 3a and 3b). These figures display results for men and women aged 18 or older; results exhibited similar agreement when examined by age group (18 to 29, 30 to 39, 40 to 49, 50 to 59, 60 to 69, 70 to 79, 80 or older) (data not shown). The degree of similarity between the projections and CCHS results is an indication of the plausibility of POHEM-PA.

Limitations

POHEM-PA has a number of limitations. The chronic disease questions in the CCHS and NPHS were not designed to inform the development of this model. For example, respondents were not asked to distinguish between type 1 and type 2 diabetes, although the incidence and prevalence of type 1 diabetes would not be expected to be related to physical activity levels. However, a relatively small percentage of people with diabetes have type 1 (5% to 10%Note31). Similarly, respondents were not asked to specify types of cancer, some of which have been shown to be related to physical activity, while others have not. Nor were respondents told which medical conditions should be considered “heart disease.”

As well, the data on chronic conditions are self-reported, and so are subject to interpretation, recall and response bias. Even so, self-reported data are not necessarily unreliable. For example, in contrast to the results of a review that concluded hypertension awareness was low and highly variable,Note32 a recent nationally representative study found self-report to be relatively accurate for hypertension.Note33 Although hypertension is typically asymptomatic, an estimated 83% of Canadian adults accurately self-reported having hypertension. Further, despite the limitations of self-reported data, obtaining directly measured data is sometimes not possible. Nationally representative surveys have collected directly measured data during exams at mobile examination centres, but hypertension that is controlled by medication would not be detected at such an exam. Similarly, the presence or absence of cancer cannot be determined under such conditions.

Physical activity was self-reported and is therefore subject to bias and to social desirability effects.Note34 Accelerometer-based measures yield much lower levels of physical activity,Note34 but because such data have only recently been collected (for 2007 to 2009 and 2009 to 2011 in Canada,Note35 and for 2003 to 2004 and 2005 to 2006 in the United StatesNote36), it is not yet possible to establish trends. Moreover, the purpose of physical activity (leisure, occupation or transportation) cannot be identified in accelerometer-based data, so it would not be possible to determine if levels are increasing for one purpose but decreasing for others.

Additional limitations of Canada’s physical activity data include changes to the definition of being physically active, the phrasing of questions, and the use of telephone versus personal interviews.Note37 Nonetheless, CCHS and NPHS results are generally consistent with findings in other countries, which report stable or rising levels of leisure-time physical activity among adults.Note38-40 International trends, however, suggest that non-leisure-time physical activity (occupation or transportation) is decreasing, whereas NPHS data indicate slight increases. This may be because the NPHS uses superficial measures of occupation- and transportation-related physical activity, and because of potential double-reporting of activity.Note37

Another limitation is that POHEM-PA does not reflect the dynamic nature of some characteristics, such as alcohol consumption.

The model is restricted to projections of the outcomes for which it was designed: physical activity, selected chronic conditions, HUI, HALE and life expectancy. Modifications would be required to include other physical activity-related health conditions, such as stroke, depression and injuries, and covariates such as sedentary behaviour. However, sufficient and appropriate data are necessary for such modifications. The development of a model is ideally an iterative process, through which updates occur as new data become available and understanding of the relationships among simulated attributes (for example, risk factors and disease status) improves. Although the current version of POHEM-PA does not model every physical activity-related condition, modelling the association with HUI and mortality captures physical activity’s contribution to loss of health and to deaths that resulted from diseases not explicitly modelled.

Conclusion

This is the first dynamic microsimulation model of physical activity based on the Canadian population. Rather than merely extrapolating current trends, POHEM-PA accounts for changes over time in individuals’ physical activity levels, age, BMI, and chronic diseases. In addition, POHEM-PA can explore “what-if” scenarios that change baseline assumptions and thereby provide insight into the possible consequences of such changes. For example: “If everyone engaged in 30 to 60 minutes of physical activity a day, what would be the impact on life expectancy, HALE, HUI and the prevalence of heart disease, diabetes, hypertension, and cancer?”
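A "what-if" comparison amounts to running the same population twice, once as observed and once with the scenario applied, and comparing outcomes. The sketch below is a toy, not POHEM-PA: the mortality hazards by activity level are invented, the scenario is read as moving everyone below 30 minutes a day into the 30-to-59-minute category, and each person reuses the same random stream in both runs (common random numbers, a standard variance-reduction choice that the article does not say POHEM-PA uses) so differences reflect the scenario rather than simulation noise.

```python
import random
import statistics

# Hypothetical annual mortality hazards by leisure-time activity level; not POHEM-PA values.
TOY_HAZARD = {"none": 0.05, "<30": 0.045, "30-59": 0.04, "60+": 0.038}


def toy_death_age(person, rng):
    """Toy outcome model: age at death from a constant hazard starting at 18."""
    return 18.0 + rng.expovariate(TOY_HAZARD[person["ltpa"]])


def raise_to_30_59(person):
    """Scenario: anyone below 30 minutes a day moves to the 30-59 category."""
    if person["ltpa"] in ("none", "<30"):
        return {**person, "ltpa": "30-59"}
    return person


def run_scenario(population, outcome_model, intervention=None, seed=0):
    """Mean outcome over the population, optionally after applying an intervention.

    Person i always uses the same random stream, so baseline and scenario runs
    differ only because of the intervention (common random numbers).
    """
    results = []
    for i, person in enumerate(population):
        if intervention is not None:
            person = intervention(person)
        rng = random.Random(seed * 1_000_003 + i)
        results.append(outcome_model(person, rng))
    return statistics.mean(results)
```

Comparing `run_scenario(pop, toy_death_age)` with `run_scenario(pop, toy_death_age, raise_to_30_59)` gives the toy model's change in life expectancy under the scenario.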

Though usually designed to make projections, microsimulation modeling often improves understanding of how factors are related to each other and how changes in one factor may affect others. Examining how a model is constructed can promote discussion about the parameters that were included and the assumptions that were made, and highlight gaps in knowledge and data.

As currently captured by surveys, self-reported physical activity has well-documented biases. Nevertheless, a consistent and plausible set of equations has been developed using self-reported data. Despite its limitations, POHEM-PA is a powerful tool for exploring the complex dynamics of physical activity and health outcomes as a population ages.

Acknowledgments

The authors acknowledge and thank Geoff Rowe and Didier Garriguet from Statistics Canada and Jan Trumble Waddell, Victoria Edge and Daniel Gillis from the Public Health Agency of Canada for their contributions to the development of the POHEM-PA module. Doug Manuel holds a Canadian Institutes of Health Research/Public Health Agency of Canada Chair in Applied Public Health.
