# Health Reports

Using a Betabinomial distribution to estimate the prevalence of adherence to physical activity guidelines among children and youth

**by Didier Garriguet**

**Release date:** April 20, 2016

Population health surveys routinely use activity monitors to measure physical activity.^{Note 1}^{Note 2}^{Note 3}^{Note 4} This results in a finite number of days when respondents wear the device and accumulate enough physical activity to meet a predetermined level at which the day will be deemed active. In large samples, a minimum number of wear-days for a minimum number of hours is required in order to have a valid representation of a respondent’s activity level. This yields a different number of valid days for each respondent. Each combination of active and valid days can then be used to calculate a probability of adherence to a certain frequency of active days (for example, physical activity guidelines).

In 2008, Troiano^{Note 1} reported the probability of adherence to physical activity guidelines for the American population using a method developed by Dodd.^{Note 5} In this method, the number of active days is distributed as a Binomial(*n*, *p*), where *n* represents the number of days and *p* the probability of being active on any given day. The parameter *p* is itself treated as random: because it is unknown but bounded by 0 and 1, it is assumed to be distributed as a Uniform(0,1). By Bayes’ Theorem, the conditional distribution of *p* given a number of active and inactive days is Beta(1 + active days, 1 + inactive days), and the probability of adherence to the guidelines is calculated from this conditional distribution.

When this method was applied to Canadian data,^{Note 2} 7% of children met the guidelines of 60 minutes of moderate-to-vigorous physical activity every day. However, because the Beta distribution is continuous, the probability of adherence was estimated as the probability of being active at least 6 out of 7 days, or 85.7% of the time. Estimating adherence 100% of the time is not possible; for example, only one failure in a year would make the person non-adherent.

For pre-schoolers, the physical activity guidelines stipulate at least 180 minutes of activity at any level of intensity every day. Using the method developed by Dodd, the conditional probability of adherence to the guidelines at least 6 out of 7 days, given 7 active days out of 7 valid days, is 71%, and 60% given 5 active days out of 5 valid days. In other words, if all respondents reported 7 valid days and they were active on all those days, the prevalence of adherence to the guidelines in the population would be 71%. This reflects the finite number of days and the original assumption that, on average, 50% of the days collected are active. For example, 14 active days out of 14 valid days would result in a prevalence of 90%.
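The figures quoted above can be checked with a short calculation. The sketch below assumes the reading given in the text: a Uniform(0,1) prior updated to Beta(1 + active days, 1 + inactive days), with adherence taken as the probability that *p* is at least 6/7. The function names are illustrative, and the closed-form sum is valid only because both Beta parameters are integers here.

```python
from math import comb


def beta_upper_tail(threshold, a, b):
    """P(p >= threshold) for p ~ Beta(a, b), with positive integer a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a), so the
    upper tail equals P(Binomial(a + b - 1, x) <= a - 1).
    """
    m = a + b - 1
    return sum(
        comb(m, j) * threshold**j * (1 - threshold) ** (m - j) for j in range(a)
    )


def adherence_uniform_prior(active, valid, threshold=6 / 7):
    """Probability of adherence under the Uniform(0,1) prior (Beta approach)."""
    return beta_upper_tail(threshold, 1 + active, 1 + (valid - active))


for active, valid in [(7, 7), (5, 5), (14, 14)]:
    print(active, valid, round(adherence_uniform_prior(active, valid), 3))
```

Rounded to whole percentages, the three cases give about 71%, 60% and 90%, matching the values in the paragraph above.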

More than 90% of the days collected for pre-schoolers are active (that is, 90% of the 4,000 days collected for this population), far from the 50% average assumed in the Uniform(0,1) distribution. As well, more than 80% of pre-schoolers met the guidelines on every valid day they reported.^{Note 4} According to Dodd’s method, the prevalence of adherence to the guidelines at least 6 out of 7 days would be estimated at 61.6% based on 7 days of data in 2009 to 2011, and 44.7% based on 5 days of data in 2012 to 2013. This discrepancy is larger than expected.

These outcomes suggest that a different assumption about the distribution of *p* is warranted. Ideally, the conditional probability using the new distribution would: result in little change in the previously published prevalence of adherence to physical activity guidelines among children aged 6 or older; reconcile the prevalence of adherence for pre-schoolers with estimates using all valid days; and be a discrete distribution allowing estimation of adherence every day.

This study proposes a new distribution for *p* and compares estimates of the prevalence of adherence to the guidelines from this new method with the existing one.

## Distribution of probability of adherence

The conceptual model developed by Dodd remains the same; only the assumption about the distribution of *p* changes. The number of active days is distributed as a Binomial(*n*, *p*), where *n* represents the number of days and *p* is a randomly distributed variable. Rather than assuming that *p* is distributed as a Uniform(0,1), it is assumed that *p* is distributed as a Beta(*α*, *β*). According to Bayes’ Theorem, the conditional distribution of *p* given the observed days remains a Beta distribution, and the corresponding distribution of the number of active days *k* out of *n* days is a Betabinomial distribution BetaBin(*n*, *α*, *β*). The BetaBin distribution is a discrete distribution. Its probability mass function is:

$$
P(X = k \mid n, \alpha, \beta) = \binom{n}{k} \frac{B(k + \alpha,\; n - k + \beta)}{B(\alpha, \beta)}
$$

where *k* is the number of active days; *n* is the number of valid days; *α* and *β* are the parameters of the prior Beta distribution; and B(·,·) is the Beta function.

The parameters *α* and *β* can be estimated by maximum likelihood using the empirical distribution of the probability of a day being active. This can be done with PROC NLMIXED in SAS version 9.3.
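For readers without SAS, the estimation step can be sketched in Python. This is not the PROC NLMIXED procedure used in the study: it assumes the likelihood is the Betabinomial likelihood of each respondent's (active days, valid days) pair, and it swaps in a crude grid refinement in log-space for a proper optimizer. The data set `toy_days` and all function names are hypothetical.

```python
from math import exp, lgamma


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def betabin_loglik(alpha, beta, data, weights=None):
    """Weighted log-likelihood of (active, valid) pairs under BetaBin(valid, alpha, beta).

    The binomial coefficient is omitted because it does not depend on alpha or beta.
    """
    if weights is None:
        weights = [1.0] * len(data)
    return sum(
        w * (log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))
        for w, (k, n) in zip(weights, data)
    )


def fit_betabin(data, weights=None, rounds=25, points=9):
    """Approximate maximum-likelihood fit by repeatedly refining a grid in log-space."""
    centre = (0.0, 0.0)  # log(alpha) = log(beta) = 0, i.e. the Uniform(0,1) prior
    half_width = 4.0
    for _ in range(rounds):
        # An odd number of points keeps the current best point on the grid,
        # so the log-likelihood never decreases between rounds.
        steps = [half_width * (2 * i / (points - 1) - 1) for i in range(points)]
        candidates = [(centre[0] + da, centre[1] + db) for da in steps for db in steps]
        centre = max(
            candidates,
            key=lambda c: betabin_loglik(exp(c[0]), exp(c[1]), data, weights),
        )
        half_width /= 2
    return exp(centre[0]), exp(centre[1])


# Hypothetical (active days, valid days) pairs, NOT taken from the CHMS.
toy_days = [(0, 7)] * 3 + [(1, 7)] * 2 + [(3, 7)] + [(6, 7)] * 2 + [(7, 7)] * 4

alpha_hat, beta_hat = fit_betabin(toy_days)
print(alpha_hat, beta_hat)
```

Because most respondents in this toy data set are active on either almost no days or almost all days, the fitted parameters are both below 1, which corresponds to a U-shaped prior. Survey weights can be passed through the `weights` argument.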

The conditional probability of adherence to the physical activity guidelines for a specific combination of active and inactive days can be obtained by adding the number of active days to *α* and the number of inactive days to *β*. These conditional probabilities can then be assigned to a respondent according to the respondent’s pattern of active and inactive days.
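The update described above can be illustrated as follows. The sketch assumes that adherence under the Betabinomial approach means being active on 7 out of 7 days, evaluated with the updated parameters *α* + active days and *β* + inactive days. The function names and the value used in the example call are illustrative and are not estimates from the study.

```python
from math import comb, exp, lgamma


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def betabin_pmf(k, n, alpha, beta):
    """P(X = k) for X ~ BetaBin(n, alpha, beta)."""
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def adherence_betabin(active, valid, alpha, beta, days=7):
    """Probability of being active on all `days` days, given the observed days.

    The prior Beta(alpha, beta) is updated by adding the active days to alpha
    and the inactive days to beta.
    """
    return betabin_pmf(days, days, alpha + active, beta + (valid - active))


# With alpha = beta = 1 (the Uniform(0,1) prior) and 7 active days out of 7,
# the result is 8/15, or about 0.53.
print(adherence_betabin(7, 7, alpha=1.0, beta=1.0))
```

In practice, *α* and *β* would be the maximum-likelihood estimates for the relevant age group and cycle rather than the uniform values used here.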

The estimated population prevalence is the weighted average of these individual probabilities.^{Note 1}
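Written out, and using notation introduced here only for illustration (*w*<sub>i</sub> for the survey weight of respondent *i* and *π*<sub>i</sub> for that respondent's conditional probability of adherence), the weighted average is:

$$
\hat{P} = \frac{\sum_{i} w_i \, \pi_i}{\sum_{i} w_i}
$$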

### Physical activity guidelines

The physical activity guidelines for 3- and 4-year-olds recommend at least 180 minutes of activity at any intensity every day.^{Note 6} The guidelines for 5- to 17-year-olds advise at least 60 minutes of moderate-to-vigorous physical activity (MVPA) every day.^{Note 7}^{Note 8}

In this study, “meeting the guidelines every day” is defined as the probability of meeting the guidelines at least 6 out of 7 days when using the Beta distribution approach, and 7 out of 7 days when using the Betabinomial distribution.

## Data source

The data are from three cycles of the Canadian Health Measures Survey (CHMS): 2007 to 2009 (cycle 1), 2009 to 2011 (cycle 2), and 2012 to 2013 (cycle 3). The CHMS collected data from private household residents aged 3 to 17 in cycles 2 and 3; in cycle 1, collection started at age 6. Residents of Indian Reserves, institutions, some remote regions, or areas with low population density, and full-time members of the Canadian Forces were excluded. The sample represents more than 96% of the Canadian population.^{Note 9}^{Note 10}^{Note 11} Ethics approval was obtained from Health Canada’s Research Ethics Board.^{Note 12} Detailed information about the content and sample design of the CHMS is available elsewhere.^{Note 9}^{Note 10}^{Note 11}^{Note 12}

The CHMS involves an in-person interview at the respondent’s residence to gather sociodemographic, health and lifestyle information, and a subsequent visit to a mobile examination center (MEC) for direct physical measures. Upon completion of the MEC visit, ambulatory respondents were asked to wear an Actical accelerometer (Philips Respironics, Oregon, USA) over their right hip on an elasticized belt during their waking hours for 7 consecutive days.

### Individual cycles

Cycle 1 data were collected at 15 sites; cycle 2 and 3 data were collected at 18 and 16 sites, respectively. For cycles 1, 2 and 3, respectively, the response rates were 69.6%, 75.9% and 74.1% for the household; 88.3%, 90.5% and 88.4% for the household questionnaire; 84.8%, 81.7% and 78.8% for the MEC visit; and 81.2%, 77.6% and 75.7% for returning the activity monitor with at least 3 valid days (3- to 5-year-olds) or 4 valid days (older children). The resulting combined response rates for the activity monitor were 41.8%, 44.1% and 38.3%. Non-response models were created using all information available at a specific level. Weights were adjusted accordingly.^{Note 9}^{Note 10}^{Note 11} Table 1 shows the number of respondents per cycle.

### Combined cycles

Data from consecutive CHMS cycles can be combined to increase the sample size. The total population for the combined cycles was derived from the average population total for each collection period. Each cycle was adjusted based on region and the number of sites. Information about combining CHMS cycles is available elsewhere.^{Note 13}

Respondents aged 6 to 17 in cycles 1 through 3 can be combined in three ways: cycles 1 and 2 (2007 to 2011), cycles 2 and 3 (2009 to 2013), and all three cycles (2007 to 2013). Respondents aged 3 and 4 can be combined for cycles 2 and 3 (2009 to 2013), but accelerometry data for children of these ages were collected using different epoch lengths (60 seconds in cycle 2; 15 seconds in cycle 3). Also, because of the memory capacity of the Actical accelerometer, a maximum of 5.6 days, rather than 7, is available for cycle 3. Therefore, in the combined sample, days 6 and 7 for cycle 2 are excluded. A correction factor can be used to adjust the cycle 2 data into 15-second epochs^{Note 14} (Text table 1).

Although respondents aged 5 are subject to the same physical activity guidelines as 6- to 17-year-olds, data for these children were collected using the methodology for 3- to 4-year-olds. Five-year-olds were analyzed separately, adjusting the 60-second epoch data to 15-second epochs, similar to 3- and 4-year-olds.

### Accelerometry data reduction

The Actical accelerometer measures and records time-stamped acceleration in all directions, providing an index of physical activity intensity. The Actical has been validated to measure physical activity in children,^{Note 15}^{Note 16} including pre-schoolers.^{Note 17}^{Note 18}

The monitors were initialized to start collecting data at midnight following the MEC appointment, in one-minute epochs (except for children aged 3 to 5 in cycle 3, for whom epoch length was set to 15 seconds). Respondents could not see any data while they wore the device. Respondents received a prepaid envelope in which to return the monitors to Statistics Canada, where the data were downloaded, and the monitor was checked to ensure that it was still within the manufacturer’s calibration specifications.^{Note 19}

The digitized values were summed over the epoch length, resulting in a count per epoch. A valid day was defined as 5 or more hours of wear time^{Note 20} for 3- to 5-year-olds and 10 hours for older children. Wear time was determined by subtracting non-wear time from 24 hours. For 3- to 5-year-olds, non-wear time was defined as at least 240 intervals of 15 seconds of zero counts, with allowance for 30 seconds of counts between 0 and 25.^{Note 1}^{Note 21} For 6- to 17-year-olds, non-wear time was defined as at least 60 consecutive minutes of zero counts, with allowance for 2 minutes of counts between 0 and 100.
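The non-wear rule for 6- to 17-year-olds can be sketched as below. The text does not spell out every detail, so the sketch makes several assumptions: a bout starts at a zero count, up to 2 minutes with counts from 1 to 100 are tolerated inside a bout, tolerated minutes count toward the bout length, and any trailing tolerated minutes are trimmed. The function names are illustrative and this is not the CHMS processing code.

```python
def non_wear_minutes(counts, min_bout=60, max_tolerated_count=100, allowance=2):
    """Total minutes in non-wear bouts for a list of per-minute counts."""
    total = 0
    i = 0
    while i < len(counts):
        if counts[i] != 0:
            i += 1
            continue
        # A candidate bout starts at minute i (a zero count).
        last_zero = i
        tolerated = 0
        j = i
        while j < len(counts):
            if counts[j] == 0:
                last_zero = j
            elif counts[j] <= max_tolerated_count and tolerated < allowance:
                tolerated += 1  # low, non-zero minute allowed inside the bout
            else:
                break  # count too high, or allowance exhausted
            j += 1
        bout_length = last_zero - i + 1  # trailing tolerated minutes are trimmed
        if bout_length >= min_bout:
            total += bout_length
        i = last_zero + 1
    return total


def is_valid_day(counts, min_wear_hours=10):
    """A day is valid when wear time (day length minus non-wear) meets the minimum."""
    wear_minutes = len(counts) - non_wear_minutes(counts)
    return wear_minutes >= min_wear_hours * 60


# Hypothetical day: 10 hours of movement followed by 14 hours with the monitor off.
day = [500] * 600 + [0] * 840
print(non_wear_minutes(day), is_valid_day(day))
```

For 3- to 5-year-olds in cycle 3, the same logic would apply to 15-second epochs with the thresholds given in the text (240 intervals, counts up to 25, an allowance of 2 intervals, and a 5-hour minimum).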

Cut-points were used to determine physical activity intensity (sedentary, light, moderate-to-vigorous). The cut-point for moderate-to-vigorous physical activity (MVPA) was 1,500 counts per minute (cpm)^{Note 16} for respondents aged 6 to 17, and 1,150 cpm or 288 counts per 15 seconds (cp15s) for respondents aged 3 to 5.^{Note 17} A cut-point of 100 cpm or 25 cp15s according to epoch length was used to distinguish sedentary from light physical activity (LPA) intensity.^{Note 21} Total physical activity was defined as the sum of LPA and MVPA.
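As an illustration, the cut-points for 6- to 17-year-olds can be applied to one-minute counts as follows. Treating a count exactly equal to a cut-point as belonging to the higher intensity is an assumption, as are the function names. An active day is taken here to be one with at least 60 minutes of MVPA, in line with the guidelines for this age group.

```python
def intensity(count_per_minute, sedentary_cut=100, mvpa_cut=1500):
    """Classify a one-minute count as 'sedentary', 'LPA' or 'MVPA'."""
    if count_per_minute >= mvpa_cut:
        return "MVPA"
    if count_per_minute >= sedentary_cut:
        return "LPA"
    return "sedentary"


def minutes_by_intensity(counts):
    """Number of minutes spent at each intensity over a list of per-minute counts."""
    totals = {"sedentary": 0, "LPA": 0, "MVPA": 0}
    for count in counts:
        totals[intensity(count)] += 1
    return totals


def is_active_day(counts, required_mvpa_minutes=60):
    """True when the day accumulates enough MVPA minutes to be deemed active."""
    return minutes_by_intensity(counts)["MVPA"] >= required_mvpa_minutes


# Hypothetical day for a 6- to 17-year-old.
day = [0] * 700 + [400] * 670 + [2000] * 70
print(minutes_by_intensity(day), is_active_day(day))
```

For 3- to 5-year-olds, the same approach would use 1,150 cpm (or 288 cp15s) for MVPA and a requirement of 180 minutes of total physical activity (LPA plus MVPA) for 3- and 4-year-olds.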

Respondents aged 3 to 5 with at least 3 valid days and those aged 6 to 17 with at least 4 valid days were retained for analysis.

### Statistical analysis

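Prevalence of adherence to the physical activity guidelines given *a* active days and *n* valid days was estimated using two conditional distributions:

- the Beta distribution, Beta(1 + *a*, 1 + *n* − *a*), with adherence defined as meeting the guidelines at least 6 out of 7 days; and
- the Betabinomial distribution, with prior parameters *α* and *β* updated to *α* + *a* and *β* + *n* − *a*, with adherence defined as meeting the guidelines 7 out of 7 days.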

All analyses were performed using SAS v9.3 (SAS Institute, Cary, NC) and were based on weighted data. To account for survey design effects (primary sampling units, sites, and strata), standard errors, coefficients of variation, 95% confidence intervals and paired t-tests were estimated with bootstrap techniques using 11, 13 and 11 degrees of freedom for cycles 1, 2 and 3, respectively. Degrees of freedom were summed when combining cycles. Statistical significance was set at 0.05.

## Results

For each cycle combination (individual or combined) and age group, Table 1 presents the average probability of a day meeting the physical activity guidelines and the parameters *α* and *β* of the Beta distribution estimated by maximum likelihood. For children aged 5 and 6- to 17-year-olds, the prior Beta distribution is U-shaped (Figure 1); for children aged 3 to 4, it is negatively skewed, with a mode at a probability of 1 (Figure 2).

Conditional probabilities for a given number of valid and active days are shown in Tables 2 to 4 for 3- to 4-year-olds, 5-year-olds, and 6- to 17-year-olds, respectively. In all cases, the conditional Betabinomial distribution yields a higher probability of adherence to the physical activity guidelines. The differences are greater for 3- to 4-year-olds than for 6- to 17-year-olds.

The prevalence of adherence to the physical activity guidelines by CHMS cycle and age group according to the two conditional probabilities is presented in Table 5. Because all individual probabilities shown in Tables 2 to 4 are higher with the Betabinomial distribution, paired t-tests comparing the prevalence with the two conditional probabilities show significant differences. Differences among 3- to 4-year-olds range between 24.5 and 28.3 percentage points; differences among 5-year-olds range between 4.2 and 10.9 percentage points; and among 6- to 17-year-olds, between 1.6 and 2.8 percentage points.

## Discussion

Use of a Betabinomial distribution to estimate the prevalence of adherence to physical activity guidelines addresses most of the shortcomings associated with the use of the Beta distribution. It is a discrete distribution that allows estimation of the prevalence of adherence every day. By more closely estimating the distribution of the probability *p* of an active day, it increases the prevalence of adherence among pre-schoolers to a level in line with the probability of being active on every valid day and their probability of being active on any given day. In addition, the Betabinomial distribution has little impact on the prevalence of adherence among older children.

The original Beta distribution is simple to use, and the conditional probability of adherence, given a certain combination of active and valid days, is always the same and does not depend on data fluctuations by survey year. Within a certain range of empirical distributions for *p*, with *p* averaging between 40% and 60%, both Beta and Betabinomial conditional probabilities yield similar results. However, the cost of this simplicity is too high when *p* falls outside these ranges, as can be the case for the highly skewed distribution of *p* for pre-schoolers.

In both cases, the binomial model assumes that each day is independent of the others. A different method is needed to account for potential correlation between days. The Betabinomial results also confirm that defining adherence as meeting the guidelines on at least 6 out of 7 days was the right choice for the Beta distribution.

Although subject to annual fluctuation, the *α* and *β* parameters of the prior Beta distribution of *p* when using the Betabinomial conditional distribution were stable by age group and year, a consequence of the fact that the probability of a day being active remains the same year after year for each age group in the CHMS. Still, it is necessary to decide at what level the prior distribution of *p* should be estimated. The approach in this study was to use the entire population divided by age group when the guidelines or collection method differed. Future research could look at the prior distribution of *p* in narrower age ranges, by sex or by other variables, especially if significant differences are observed in other physical activity measures (for instance, average minutes of MVPA).

Survey data and accelerometers have inherent limitations, notably, response rates and the inability to measure certain types of activities. However, conclusions about the use of the Betabinomial distribution remain the same, regardless of how the combination of active and valid days was achieved.

This is the first time that the Betabinomial distribution has been used to calculate the prevalence of adherence to physical activity guidelines. Its advantages may help overcome some limitations of the previous method.

## References
