# Experimental Economic Activity Indexes for Canadian Provinces and Territories: Experimental Measures Based on Combinations of Monthly Time Series

Analytical Studies: Methods and References, No. 027

by Nada Habli, Ryan Macdonald and Jesse Tweedle
Economic Analysis Division, Statistics Canada

Release date: August 19, 2020

## Acknowledgements

The authors would like to express their thanks to Etienne Saint-Pierre, Yvan Clermont, Danny Leung and Brenda Bugge for their support and guidance during the course of the project. They would also like to thank Steve Matthews, Fred Picard and Michel Ferland for their comments and input on modelling. They also wish to thank Philip Smith and Karen Wilson for their advice, suggestions and support.

## Abstract

This paper explores methods for creating a monthly indicator of economic activity for the provinces and territories. It begins by constructing a dataset for the provinces and territories composed of monthly series about the labour force, including wages and employment; international trade; output measures such as manufacturing sales or electricity production; and prices (consumer, housing and electricity). Where necessary, the series are seasonally adjusted, linked and deflated to create continuous time series from January 2002 to April 2020. Variable reduction methods are then applied to the monthly provincial and territorial dataset to create the experimental monthly provincial and territorial economic indicator indexes. Three methods are examined: principal components analysis (PCA), least absolute shrinkage and selection operator (LASSO) and a simple index composed of three pre-determined series (total employment, total exports and retail sales). A weighted average of the simple index and the PCA index is also constructed. In general, the indexes produced track provincial economic activity reasonably well, following cyclical movements of provincial and territorial economies. However, the set-up for the models is not ideal as annual data are used to produce model parameters, and this leads to uncertainty about model performance. As a result, multiple indexes are reported. A quality assessment provides an indication of the strengths and limitations of the indexes with respect to different uses.

## 1 Introduction

Timely measures of economic activity are critical for understanding how economies perform, and for informing policy responses to macroeconomic fluctuations. The onset of the pandemic due to the emergence of the SARS-CoV-2 virus emphasized this, as well as the need for geography-specific measures. Presently, Canada has a robust system for producing up-to-date measures of activity, such as real gross domestic product (GDP), at the national level. For provincial and territorial economies, monthly information on labour markets or particular activities such as manufacturing or international trade is available, but a monthly measure of aggregate economic activity is not.

Under normal circumstances, producing a new set of aggregate economic indicators for the provinces and territories would require creating exploratory measures, possibly launching new surveys or expansions of existing statistical collection activities, as well as creating the infrastructure necessary to produce and disseminate the indicators on an ongoing basis. These changes require time to implement. In the current context, where the SARS-CoV-2 pandemic is accentuating needs for monthly regional economic indicators, the time necessary to constitute a new statistical program to meet current requirements means that this approach is unfeasible.

A more timely approach is to adopt a statistical-model based strategy to quickly create exploratory measures of provincial economic activity. While this approach introduces a new measure of economic activity in a timely fashion, the trade-off is that the methods employed are not used in their ideal situations, that inputs for models use currently available series without an ability to tailor their uses to the creation of an indicator, and that the models are somewhat atheoretical. That is, the models look for correlations in data rather than employing economic theory to help guide their construction. Lastly, the models employed here typically have a different set of inputs for each province or territory. As a result, consistency in model structures cannot be maintained across provinces and territories, and this may affect inter-jurisdictional comparisons.

To estimate the activity indexes, a twofold strategy is applied. First, a balanced panel data set of monthly provincial and territorial time series is constructed from publicly available Common Output Data Repository (CODR) tables. This data set spans January 2002 to the latest available data points (currently April 2020). Second, three methods for transforming the monthly series into an indicator of provincial and territorial economic activity are applied to the data set: a simple model, principal components analysis (PCA) and the least absolute shrinkage and selection operator (LASSO). A fourth index that combines the index from the simple model and the index based on PCA is also produced. The results from these approaches form the exploratory measures of provincial aggregate economic activity presented here.

The experimental indexes are based on the use of statistical models with an input data set that contains a series of approximations and assumptions. Notable among the assumptions for the input data set are: the use of national deflators to produce real provincial series when provincial deflators do not exist, the assumption that the growth rates for all series are covariance stationary, and the assumption that winsorizing data at the 5th and 95th percentiles is appropriate.

For the models, annual real GDP growth across the provinces and territories is used as a measure of aggregate provincial activity against which the derived index measures of economic activity are compared. This produces a situation where a small number of annual observations are being used to estimate model parameters that are used to infer monthly fluctuations. The small number of observations affects the ability of models to provide sufficient inference, and the use of annual data may mask important differences in the monthly timing of changes in prices, outputs and employment.

To allow for the possibility that using annual data may affect model performance and inference, the simple model assumes a set of variables is appropriate and uses OLS to determine their relative contributions. For PCA, a maximized adjusted R-squared from OLS regressions is used to select which of the first ten principal components should be included. In both of these situations the OLS regressions include variables that are statistically insignificant. For LASSO, statistical significance of potential input variables determines the final model. However, for LASSO, for some economies, no input variables are selected. In these cases, elastic net or a general-to-specific modelling strategy is employed instead.

Since the input data set contains a number of assumptions and approximations, and since the statistical models are used in an imperfect setting, the experimental indexes that are created should be viewed as approximations to aggregate economic activity rather than as exact measures. The activity indexes tend to present considerable monthly volatility, and when compared with the real GDP estimates produced by the Government of Quebec or the Government of Ontario, they can exhibit greater cyclical volatility.

Across the measures, the simple models and the LASSO models tend to rely more on employment series as inputs. The simple index has the strength that it is straightforward to understand, and is the most comparable across provinces and territories since it holds the variables used in the activity index constant regardless of the jurisdiction. The PCA indexes appear to capture more fluctuations related to other aspects of overall activity (e.g. sales or exports), but they also have greater variability. The weighted combination of the simple, researcher-defined indexes and the PCA indexes has a better correlation with real GDP growth than the constituent indexes.

Currently, the weighted indexes or the LASSO based indexes appear to offer the best trade-off between signals present in the data and variability of monthly series. This assessment is based on how well the models appear to conform with the set-up used for estimation as well as on the behavior of the indexes. In some cases, such as the PCA based index for Newfoundland and Labrador, an anomaly is present that calls the veracity of the index into question.  When these types of situations occur, the data are deemed to be unfit for use at the present time, and the estimates for that index are not provided.  The assessments are ongoing and may lead to changes in recommended uses or indexes as further development of the input data set, model refinements or alternative model strategies are explored.

The remainder of this paper is structured as follows. Section 2 discusses the creation of the input data set as well as the assumptions employed to filter and transform the data prior to modelling. Section 3 describes the models employed, the assumptions embedded in the models, as well as their strengths, their limitations and their application for creating monthly indexes. Section 4 provides analysis of the model performance and illustrates the resulting indexes. Section 5 concludes.

## 2 Input data set

The input data set comprises province- and territory-specific measures of economic activity and Canada-level deflators, except in instances where province- and territory-specific deflators are available. The monthly input series are drawn from monthly surveys for labour, outputs and prices (Table 1). In some cases, active tables do not contain continuous information from January 2002 to the present. In these cases, historical tables are used to backcast active tables.

Table 1
Input data tables of provincial and territorial time series

| Table number | Table title |
| --- | --- |
| 12100099 | Merchandise imports and exports, customs-based, by Harmonized commodity description and coding system (HS) section, Canada, provinces and territories, United States, states |
| 12100119 | International merchandise trade by province, commodity, and Principal Trading Partners |
| 14100036 | Actual hours worked by industry, monthly, unadjusted for seasonality |
| 14100201 | Employment by industry, monthly, unadjusted for seasonality |
| 14100222 | Employment, average hourly and weekly earnings (including overtime), and average weekly hours for the industrial aggregate excluding unclassified businesses, monthly, seasonally adjusted |
| 14100287 | Labour force characteristics, monthly, seasonally adjusted and trend-cycle, last 5 months |
| 14100292 | Labour force characteristics by territory, three-month moving average, seasonally adjusted and unadjusted, last 5 months |
| 14100355 | Employment by industry, monthly, seasonally adjusted and unadjusted, and trend-cycle, last 5 months |
| 16100048 | Manufacturing sales by industry and province, monthly (dollars unless otherwise noted) |
| 18100004 | Consumer Price Index, monthly, not seasonally adjusted |
| 18100204 | Electric power selling price index, monthly |
| 18100205 | New housing price index, monthly |
| 20100008 | Retail trade sales by province and territory |
| 20100074 | Wholesale trade, sales |
| 21100019 | Monthly survey of food services and drinking places |
| 24100002 | Number of vehicles travelling between Canada and the United States |
| 25100001 | Electric power statistics, with data for years 1950 - 2007 |
| 25100015 | Electric power generation, monthly generation by type of electricity |
| 34100003 | Building permits, values by activity sector |
| 34100066 | Building permits, by type of structure and type of work |
| 34100158 | Canada Mortgage and Housing Corporation, housing starts, all areas, Canada and provinces, seasonally adjusted at annual rates, monthly |

Deflators are primarily collected from Canada-level survey programs for measures of economic activity (Table 2). Statistics Canada does not currently produce province- and territory-specific deflators for current dollar measures of international trade, manufacturing sales, wholesale sales, retail sales or the Monthly Survey of Food Services. To collect deflators, Canada-level price indexes are taken from surveys when they are available. In the case of manufacturing, the implicit price index is derived as the ratio of the nominal value to the real value. Deflators exist for nominal series back to January 2002, except for manufacturing. For manufacturing, the Industrial Product Price Index (IPPI) by industry, the IPPI by product group, and the ratio of nominal to real monthly GDP are used as projectors for manufacturing deflators.

Table 2
Data tables for deflator time series

| Table number | Table title |
| --- | --- |
| 12100128 | International merchandise trade, by commodity, price and volume indexes, monthly |
| 16100013 | Real manufacturing sales, orders, inventory owned and inventory to sales ratio, 2012 dollars, seasonally adjusted |
| 16100047 | Manufacturers' sales, inventories, orders and inventory to sales ratios, by industry (dollars unless otherwise noted) |
| 18100004 | Consumer Price Index, monthly, not seasonally adjusted |
| 18100029 | Industrial product price index, by major product group, monthly |
| 18100032 | Industrial product price index, by industry, monthly |
| 20100003 | Wholesale sales, price and volume, by industry, seasonally adjusted |
| 20100038 | Retail trade, sales, chained dollars and price index, inactive |
| 20100051 | Wholesale trade, sales, chained dollars and price index, inactive |
| 20100078 | Retail sales, price, and volume, seasonally adjusted |
| 36100434 | Gross domestic product (GDP) at basic prices, by industry, monthly |

To combine the provincial data and deflator data to create the input data set, four steps are followed:

1. Assemble data. The CODR tables are filtered to select the desired data. Only variables with continuous data are selected. Series subject to suppression are excluded; however, series with 0 values are included. Series subject to suppression are typically smaller value series, meaning they contain less information for aggregate economic fluctuations. Although methods exist for interpolating these data points, should the suppression occur in the latest month, a forecast would be required to infill the suppressed data point. Given that the index is being constructed to provide information about the largest shock to affect the Canadian economy since the Second World War, it is considered inadvisable to include series with forecasted values.
2. Seasonally adjust data series. Not all series are provided on a seasonally adjusted basis. A total of 966 series are seasonally adjusted for use in the indicator. Given the high number of series, the automatic options of the ARIMA-SEATS algorithm from the R package seasonal are employed to remove seasonality. Seasonal parameters are determined based on the available monthly time series up to December 2019. In the event the time series do not span the whole period, such as discontinued series, the data up to the most recent period are used to determine the seasonal adjustment options.
   • To ensure that the seasonally adjusted series are of good quality, they were validated against a range of quality measures. These include checking for the presence of seasonality, the amount of stable seasonality present relative to the amount of moving seasonality (M7), the absence of seasonal effects in the irregular component, and the smoothness of the seasonally adjusted series against its raw form, as well as limiting the number of outliers automatically detected by ARIMA-SEATS to a maximum of five. Series that were found to be non-seasonal (125 series) are kept in their raw form. Seasonally adjusted series with poor quality (183 series) are filtered out of the data set prior to estimation.
3. Link data. After seasonal adjustment, data are linked as necessary. When overlapping periods exist, links are made by treating data as indexes and chaining backwards over historical periods. If overlapping periods do not exist, level values are joined “as is”.
4. Apply deflators. Where necessary, deflators are applied to current dollar series or to price series to produce relative price variables.
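The linking and deflating steps can be illustrated with a minimal Python sketch. It assumes two hypothetical series stored as dictionaries keyed by `"YYYY-MM"` labels, a single overlapping period used as the link point, and a deflator indexed to 100. The function names, the data and the single-period link are assumptions made for illustration and are not taken from the paper.

```python
def link_backwards(active, historical, overlap):
    """Backcast `active` with `historical` by chaining on one overlap period.

    The historical series is treated as an index and rescaled so that it
    equals the active series in the overlap period, which preserves the
    historical growth rates.
    """
    ratio = active[overlap] / historical[overlap]
    linked = {p: v * ratio for p, v in historical.items() if p < overlap}
    linked.update(active)  # values from the active table are kept unchanged
    return dict(sorted(linked.items()))


def deflate(nominal, deflator, base=100.0):
    """Convert a current-dollar series to real terms using a price index."""
    return {p: nominal[p] / (deflator[p] / base) for p in nominal}


# Hypothetical data: the historical table ends where the active table begins.
historical = {"2002-11": 50.0, "2002-12": 52.0, "2003-01": 54.0}
active = {"2003-01": 108.0, "2003-02": 110.0}

linked = link_backwards(active, historical, overlap="2003-01")

# Hypothetical Canada-level deflator applied to the linked series.
deflator = {p: 100.0 if p < "2003-01" else 102.0 for p in linked}
real = deflate(linked, deflator)
```

In this sketch the backcast values inherit the month-to-month growth of the historical table (4% between the first two periods), while levels are anchored to the active table.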

The full data set prior to filtering contains province- and territory-specific seasonally adjusted and not-seasonally adjusted series, nominal series and deflated series. For modelling provincial and territorial economic activity, series in natural units (e.g. employees, hours worked), deflated series, rates (e.g. the unemployment rate), and relative prices are selected.

## 3 Estimation

The objective is to estimate monthly time series for aggregate economic activity in the provinces and territories as a function of available monthly provincial and territorial economic time series:

${y}_{monthly,t}=f\left(\text{monthly time series}\right)$

The most commonly cited measure of aggregate economic activity is the real GDP measure described in the 2008 System of National Accounts (United Nations 2010). For the Canada-wide level, the function for transforming input series into monthly real GDP is based on industry-specific methodologies and benchmarks that have been developed and enriched over time. The methodologies use a number of data sources to estimate changes in gross output that serve as proxies for real GDP. The proxies are combined with annual real GDP benchmarks to produce the monthly real GDP series. In many cases, a direct measure of gross output is available. However, in some industries, direct measures of output are not available and estimates are constructed using alternative data sources, such as employment (Statistics Canada 2020b). This methodology forms the function $f\left(.\right)$ into which the monthly series are placed to produce monthly real GDP for Canada.

The challenge for measuring provincial and territorial aggregate economic activity is that the function $f\left(.\right)$ for the provinces and territories is unknown, and that the desired series, ${y}_{monthly,t}$ is also unknown.

Since the goal is to estimate ${y}_{monthly,t}$ , if a close substitute existed, it could be used as an instrument for the true ${y}_{monthly,t}$ . The function $f\left(.\right)$ for combining monthly information to produce an aggregate measure of economic activity could then be approximated. While a close monthly substitute for ${y}_{monthly,t}$ does not exist, the Canadian System of Macroeconomic Accounts does produce a measure of provincial and territorial GDP at an annual frequency. The approach followed here, therefore, assumes that the annual data can be used as an instrument to help inform the structure of $f\left(.\right)$ when the monthly growth rates from the available input series are averaged within a calendar year and used to estimate parameter values. The parameters of $f\left(.\right)$ and the variance characteristics of the monthly input series are then adjusted to account for the difference in periodicity. Monthly indexes of aggregate economic activity are then constructed from the estimated monthly values of ${y}_{monthly,t}$.

Using a lower frequency variable as the instrument for monthly economic activity lowers the number of degrees of freedom and introduces issues related to the timing of monthly versus annual fluctuations. These issues will have consequences for the ability of the models to produce a monthly estimate of aggregate economic activity. The small number of degrees of freedom and the covariance around the 2008 recession will tend to produce statistically insignificant parameters for $f\left(.\right)$ if the regressors are not importantly affected by business cycle fluctuations. Moreover, models may tend toward selecting a smaller number of inputs at an annual frequency than is necessary for explaining monthly variation, as important monthly fluctuations may be masked through aggregation to a lower frequency. Additionally, changes related to prices, sales/output and employment will occur contemporaneously at an annual frequency. However, at a monthly frequency these fluctuations may not align.

Given that a conversion from annual to monthly frequency is necessary to generate the desired estimated values, the modelling strategy includes some approaches that err on the side of capturing more variation in the data rather than focusing solely on model parsimony. This does not mean that models are produced in an ad hoc fashion. Rather, selection criteria, such as maximizing an adjusted R-squared, are employed alongside more traditional general-to-specific-type modelling strategies.
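For reference, the adjusted R-squared has a standard textbook definition, reproduced here as background. The symbols $n$ (number of annual observations) and $k$ (number of regressors, excluding the constant) are introduced for this illustration and are not the paper's notation.

```latex
\bar{R}^{2} = 1 - \left(1 - R^{2}\right)\frac{n-1}{n-k-1}
```

Because $n$ is small when annual data are used, the penalty factor $(n-1)/(n-k-1)$ grows quickly as regressors are added, so maximizing $\bar{R}^{2}$ limits the number of inputs even when statistical significance is not imposed.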

In summary, because the functional form for transforming monthly data into an aggregate measure of economic activity is unknown, and because the actual values for the series ${y}_{monthly,t}$ are also unknown, the best that is possible is to approximate the true ${y}_{monthly,t}$. This means that the series ${\hat{y}}_{monthly,t}$ will have the flavor of what a real GDP series could look like, but it will not be a true measure of monthly real GDP. Instead, it will be an estimate for an economic activity index which corresponds to macroeconomic conditions in the provinces and territories.

### 3.1 Estimation strategy

The function $f\left(.\right)$ is a set of instructions for transforming a large number of inputs into a single series. In this paper, it is assumed that the function can be approximated based on a linear combination of inputs, and that the inputs can be selected either by selecting a subset of the available data or by creating a combination of all input data series. Below, three approaches are explored: 1) a simple model; 2) PCA; and 3) LASSO. The simple and the LASSO model fall in the former category while PCA falls in the latter category.

The assumptions and implementation of the models are discussed in detail below. Across modelling strategies, the following steps are followed to estimate index values in all cases:

• 1: Prepare the input data.

The input data set has 1,341 series that are distributed unevenly across the provinces and territories (Table 3). Not all series have equal utility for modelling aggregate economic activity. In cases where seasonal adjustment failed, the series are removed. Similarly, series with 0 values are removed. These tend to be series where 0 values are interspersed with nominal values. In these cases, seasonally adjusted values can be negative, growth rate or log-difference transformations do not work, and the series have questionable value for use as an ongoing indicator of economic activity. Overall, 198 variables are dropped for these reasons.

Table 3

| | Starting vectors | With 0 | SA failed | Dropped | Top 15% | Top 25% | Vectors for LASSO | Vectors for PCA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 108 | 3 | 8 | 8 | 15 | 25 | 83 | 73 |
| | 101 | 12 | 15 | 20 | 12 | 20 | 65 | 57 |
| | 115 | 2 | 14 | 14 | 15 | 25 | 84 | 74 |
| | 115 | 1 | 12 | 13 | 15 | 24 | 81 | 72 |
| | 131 | 1 | 10 | 11 | 18 | 29 | 97 | 86 |
| | 133 | 0 | 11 | 11 | 18 | 30 | 99 | 87 |
| | 120 | 1 | 6 | 6 | 17 | 28 | 94 | 83 |
| | 118 | 0 | 11 | 11 | 16 | 26 | 85 | 75 |
| | 122 | 0 | 8 | 8 | 17 | 27 | 91 | 81 |
| | 122 | 0 | 8 | 8 | 17 | 28 | 92 | 81 |
| | 63 | 22 | 20 | 26 | 6 | 10 | 31 | 27 |
| | 59 | 27 | 29 | 31 | 5 | 7 | 23 | 21 |
| | 46 | 29 | 29 | 32 | 2 | 4 | 11 | 9 |
| Total | 1,353 | 98 | 181 | 199 | 173 | 283 | 936 | 826 |

Notes: LASSO: least absolute shrinkage and selection operator; PCA: principal components analysis; SA: seasonal adjustment.
Source: Statistics Canada, authors' compilation.

The filtered input data set then contains 1,143 series. However, the series are typically reported in levels (e.g. hours worked or manufacturing sales in chained dollars) and present strong trends over the sample period. To account for the trends, the series are transformed to month-to-month growth rates. These growth rates will ultimately be compared to measures of real GDP growth, and they have the advantage of being bound by -100% for the maximum decline. The growth rates for the series are assumed to be covariance stationary.

To use the series in estimation, all series are demeaned and scaled to have unit variance at a monthly frequency. This normalization process is applied to variables to prevent variables with naturally larger unit values from affecting results. Because the monthly time series can have high variability, and because periods of economic shocks such as recessions can produce aberrant data points, all series are winsorized (or top and bottom coded) prior to estimation. This prevents extreme data points from affecting results. For creating monthly indexes, the parameter values from models based on the winsorized data are combined with un-winsorized data, which permits larger values to have their full influence when large shocks occur.
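The winsorizing and normalization described above can be sketched in a few lines of Python using only the standard library. The 5th and 95th percentile cut-offs follow the assumption stated in the introduction; the order of operations (winsorize first, then demean and scale), the percentile interpolation method and the data are assumptions made for illustration.

```python
import statistics


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clip values to the given percentiles (top and bottom coding)."""
    # quantiles(..., n=100) returns 99 cut points: the 1st to 99th percentiles.
    cuts = statistics.quantiles(values, n=100)
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


def standardize(values):
    """Demean a series and scale it to unit (sample) variance."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical monthly growth rates (percent) with one extreme observation.
growth = [0.2, 0.1, -0.1, 0.3, 0.0, 0.2, -0.2, 0.1, 0.4, -12.0,
          0.3, 0.1, 0.2, -0.1, 0.0, 0.1, 0.3, 0.2, -0.3, 0.1]

clipped = winsorize(growth)   # used when estimating model parameters
z = standardize(clipped)      # demeaned, unit-variance input for the models
```

The un-winsorized `growth` series would be retained separately so that, once parameters are estimated, large shocks can have their full influence on the monthly index.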

Finally, the noisiest series are removed. For PCA, the top 25% of series by variance are removed by province while for LASSO the top 15% of series by variance are removed. These thresholds are arbitrary, but their imposition was found to improve the ability of the models to inform on aggregate activity (i.e. improve the signal relative to noise), and to improve consistency of results across methods. After adjusting for high variance series there are a total of 928 input variables for LASSO and 820 for PCA. Nunavut has the fewest series available while Ontario and Quebec have the most.

• 2: Using the winsorized growth rates, calculate the annual average of monthly growth rates for use in the models.
• 3: Estimate the model parameters.

The estimation is initiated by combining real GDP growth with annual averages of monthly input series for years 2002 to 2018. Real GDP growth is not scaled or demeaned. Using real GDP growth as the target variable and the annual averages of monthly series as input variables, the functional form of $f\left(.\right)$ is estimated. In all models employed, it is assumed that a linear combination of input variables can be used to transform the multiplicity of input variables into a single measure of aggregate activity. In the case of the simple model and LASSO, a subset of the variables is used directly. In the case of PCA, the first ten principal components are employed as the starting point. It is also assumed that OLS can be used to generate contributions for combining input variables. By using regression methods to combine inputs, a further assumption that economic structures are, on average, the same over the entire sample period is imposed.
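A minimal Python sketch of steps 2 and 3 follows. It averages monthly growth rates within each calendar year and then fits an OLS regression of annual real GDP growth on the annual averages. For brevity it uses a single regressor, whereas the models in the paper combine several inputs; the function names and all numbers are hypothetical.

```python
from collections import defaultdict


def annual_average(monthly):
    """Average monthly values within each calendar year.

    `monthly` maps "YYYY-MM" labels to (standardized) growth rates.
    """
    by_year = defaultdict(list)
    for period, value in monthly.items():
        by_year[period[:4]].append(value)
    return {year: sum(v) / len(v) for year, v in sorted(by_year.items())}


def ols_one_regressor(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    slope = cov_xy / var_x               # cov(x, y) / var(x)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical standardized monthly growth rates for three years.
pattern = [0.0, 0.1, -0.1] * 4           # sums to zero within each year
offsets = {"2016": -1.0, "2017": 0.0, "2018": 1.0}
monthly = {
    f"{year}-{month:02d}": offset + pattern[month - 1]
    for year, offset in offsets.items()
    for month in range(1, 13)
}
gdp_growth = {"2016": 0.5, "2017": 2.0, "2018": 3.5}  # percent, hypothetical

x_annual = annual_average(monthly)
years = sorted(gdp_growth)
intercept, slope = ols_one_regressor(
    [x_annual[yr] for yr in years], [gdp_growth[yr] for yr in years]
)
```

Because the regressor in this example has a zero mean, the fitted intercept coincides with the average of the hypothetical GDP growth rates, and the slope measures fluctuations around that average.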

• 4: Use the model to estimate monthly growth rates.

Since all approaches can be viewed as an OLS regression with demeaned inputs, the intercept can be interpreted as the average annual growth rate of real GDP between 2002 and 2018. The selected inputs (monthly time series or principal components) produce fluctuations around this average growth rate. For 2019 and 2020, it is assumed that the average growth rate from 2002 to 2018 is representative of underlying growth.

To produce monthly estimates, it is necessary to adjust parameter estimates or monthly series to account for the difference in periodicity. The model constant is adjusted to a monthly frequency based on the monthly compound growth rate that is equivalent to the annual estimate:

${\hat{\beta }}_{0,monthly}={\left({\hat{\beta }}_{0,annual}+1\right)}^{1/12}-1$
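As a worked example of this conversion, the following Python sketch computes the monthly compound rate for a hypothetical annual intercept of 2%. It assumes growth rates are expressed as fractions (0.02 rather than 2); an intercept estimated in percent would need to be divided by 100 first.

```python
def annual_to_monthly(rate_annual):
    """Monthly compound growth rate equivalent to an annual growth rate.

    Rates are fractions: 0.02 means 2%.
    """
    return (1.0 + rate_annual) ** (1.0 / 12.0) - 1.0


# Hypothetical intercept: average annual real GDP growth of 2%.
beta0_monthly = annual_to_monthly(0.02)

# Compounding the monthly rate over 12 months recovers the annual rate.
recovered_annual = (1.0 + beta0_monthly) ** 12 - 1.0
```

For an annual rate of 2%, the equivalent monthly rate is about 0.165%, slightly below the simple division 2%/12 because of compounding.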

To estimate monthly fluctuations around the trend growth rate, the raw input series are used. This allows large fluctuations in the time series to present their full impact when economic shocks, such as recessions or commodity price cycles, impact provincial and territorial economies. The monthly inputs have their variances adjusted to match the annual variance prior to use. OLS regression estimates are based on the ratio of cov(x,y)/var(x). In the current context, aggregation through time reduces the variance of the X matrix. To account for this, the variance of the monthly data is re-scaled to match the variance of the annual data for each series as:

${x}_{monthly,adjusted}=\left(\frac{x-{\mu }_{x}}{{\sigma }_{x,monthly}}\right)\cdot {\sigma }_{x,annual}+{\mu }_{x}$

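The rescaling in the formula above can be sketched in Python as follows. The monthly series and the annual standard deviation are hypothetical, and the use of the sample standard deviation is an assumption; in practice the annual value would come from the annual averages of the same series.

```python
import statistics


def rescale_to_annual_sd(monthly, sd_annual):
    """Rescale a monthly series so its standard deviation equals `sd_annual`.

    The mean of the series is left unchanged.
    """
    mu = statistics.fmean(monthly)
    sd_monthly = statistics.stdev(monthly)
    return [(v - mu) / sd_monthly * sd_annual + mu for v in monthly]


# Hypothetical monthly input series and annual standard deviation.
monthly = [0.4, -0.6, 1.2, -1.0, 0.3, 0.9, -0.2, -0.8, 0.5, 0.1, -0.7, 0.6]
sd_annual = 0.25

adjusted = rescale_to_annual_sd(monthly, sd_annual)
```

After the adjustment, the monthly series has the dispersion of the annual data used to estimate the coefficients, so the estimated coefficients are applied to inputs on a comparable scale.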
• 5: Generate level indexes

The fitted values from the models are estimates for monthly growth in economic activity. They can be transformed into indexes by adding 1 to create a linking value for a chain index. The index level is then calculated by chaining forward from January 2002.
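The chaining step can be sketched in Python as below. The growth rates are hypothetical fractions, and setting the starting period to 100 is an assumption made for illustration.

```python
def chain_index(growth_rates, base=100.0):
    """Chain monthly growth rates (fractions) forward from a base level."""
    index = [base]
    for g in growth_rates:
        # (1 + g) is the linking value applied to the previous index level.
        index.append(index[-1] * (1.0 + g))
    return index


# Hypothetical fitted monthly growth rates.
levels = chain_index([0.01, -0.02, 0.005])
```

Each index level depends on every preceding growth rate, which is why errors in the estimated growth rates compound through time.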

The growth rate estimates have a confidence interval associated with them as there is quantifiable uncertainty that comes from the model. There is also unquantifiable uncertainty that arises from possible model misspecification. To produce a level index, it is necessary to assume that the growth rate estimates are sufficiently accurate that they can be employed for chaining even though errors are compounded through time. This is a strong assumption, but is consistent with the way mean values from survey data for prices, values and quantities are combined to produce chain-quantity or chain-price indexes.

The use of multiple models in step 3 will ultimately lead to different flavors of the activity index being presented. In the current context, where statistical models are being used to inform about economic activity in an environment where they cannot be optimally implemented, the creation of multiple versions of the activity index serves an important role. Because the true value for ${y}_{monthly,t}$ is unknown, validating $\hat{f}\left(.\right)$ and ${\hat{y}}_{monthly,t}$ is challenging. The outputs from the different methods provide a natural form of data confrontation which helps to gauge the adequacy and generalizability of the estimates.

#### 3.1.1 Simple model

The simple model imposes the a priori assumption that total employment, total exports and total retail sales contain the appropriate information for understanding aggregate economic fluctuations. This is likely too strong an assumption as more than three inputs are needed to fully capture the complexities of aggregate economic activity. However, the model is consistent across all provinces and territories, and it is straightforward to understand. It, therefore, has value as a base against which more complex methods can be assessed. The simple approach also represents a method that can be viewed as consistent with the types of projectors that are used to infer gross output movements that are used as inputs for monthly GDP for Canada (Statistics Canada 2020b).

Since the series do not have a natural aggregation structure for combining them, regressions are used to determine relative contributions of the variables rather than an index number formula. Because the inputs are assumed to be the necessary inputs, the three series are included regardless of their statistical significance in regressions.

#### 3.1.2 Principal components analysis

PCA is a variable reduction technique that aims to explain the variance of a given data set using a smaller number of principal components (OECD 2008, Jolliffe 2002). A data set with p variables $X=\left[{x}_{1},{x}_{2},...,{x}_{p}\right]$ can be transformed to produce p principal components:

$Z=AX$

where

${z}_{1}={a}_{1,1}{x}_{1}+{a}_{1,2}{x}_{2}+\dots +{a}_{1,p}{x}_{p}$

${z}_{2}={a}_{2,1}{x}_{1}+{a}_{2,2}{x}_{2}+\dots +{a}_{2,p}{x}_{p}$

$\vdots$

${z}_{p}={a}_{p,1}{x}_{1}+{a}_{p,2}{x}_{2}+\dots +{a}_{p,p}{x}_{p}$

The principal components are constructed as linear combinations of the input variables (the monthly series). The first principal component explains the largest proportion of the variance of the input variables. Its loadings are given by the eigenvector associated with the largest eigenvalue of the covariance matrix of the inputs. The second principal component is orthogonal to (uncorrelated with) the first principal component, and its loadings are given by the eigenvector associated with the second largest eigenvalue. It explains the second largest proportion of the variance of the input variables. It can, therefore, be said to measure a different statistical dimension of the available series. The third principal component is orthogonal to the first two and measures the third largest proportion of variance in the data. And so on until the ${p}^{th}$ principal component.
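In matrix terms, a standard formulation, given here as background rather than as the authors' exact implementation, obtains the loadings from the eigendecomposition of the sample covariance matrix of the demeaned inputs. The symbols $S$, ${\lambda }_{k}$, $T$ (number of monthly observations) and $m$ are introduced for this illustration; ${a}_{k}$ denotes the $k$-th row of $A$ written as a column vector, and $X$ is arranged with one row per variable, as in $Z=AX$.

```latex
\begin{aligned}
S &= \frac{1}{T-1}\, X X^{\top}, \qquad
S a_k = \lambda_k a_k, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \\
z_k &= a_k^{\top} X, \qquad
\operatorname{Var}\left(z_k\right) = \lambda_k, \qquad
a_j^{\top} a_k = 0 \quad \text{for } j \ne k, \\
\text{share of variance explained by the first } m \text{ components}
  &= \frac{\sum_{k=1}^{m} \lambda_k}{\sum_{k=1}^{p} \lambda_k}.
\end{aligned}
```

When the inputs are scaled to unit variance, $S$ is the correlation matrix and the denominator of the share equals $p$.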

To implement PCA here, the principal components that are used for model estimation and the loadings are determined using the winsorized, demeaned, unit variance monthly time series. The loadings are then applied to the raw, un-winsorized series to produce the raw principal components that are used to predict the monthly growth rates.
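The implementation step described above can be sketched in Python as follows. This is a minimal illustration on synthetic data, not the authors' code: the 5th and 95th percentile winsorizing cut-offs, the function names, and the choice to standardize the raw series with the winsorized means and standard deviations are all assumptions made for the example.

```python
import numpy as np

def winsorize(x, lower=0.05, upper=0.95):
    """Clip each column to its own quantiles (cut-offs are assumed, not from the paper)."""
    lo = np.quantile(x, lower, axis=0)
    hi = np.quantile(x, upper, axis=0)
    return np.clip(x, lo, hi)

def pca_loadings(x):
    """Estimate loadings on winsorized, demeaned, unit-variance data."""
    w = winsorize(x)
    mean, std = w.mean(axis=0), w.std(axis=0, ddof=1)
    z = (w - mean) / std
    corr = np.cov(z, rowvar=False)           # correlation matrix of the winsorized data
    eigval, eigvec = np.linalg.eigh(corr)    # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]         # reorder so component 1 explains the most
    share = eigval[order] / eigval.sum()
    return eigvec[:, order], share, mean, std

rng = np.random.default_rng(0)
raw = rng.normal(size=(120, 6))              # synthetic stand-in for the monthly series
raw[5, 2] = 25.0                             # an outlier that winsorizing dampens

loadings, share, mean, std = pca_loadings(raw)

# Apply the loadings to the raw (un-winsorized) series.
# Assumption: the raw series are scaled with the winsorized moments.
components = ((raw - mean) / std) @ loadings
cumulative_share = np.cumsum(share)          # cumulative proportion of variance explained
```

Here `cumulative_share` is the kind of quantity one would inspect to judge how many components are needed to summarize the data set.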

When PCA works well, a large portion of the variance of a data set can be explained by the first few principal components, and only the first few principal components are used for analysis. Unfortunately, in the case of the input data set for the provincial and territorial economies, PCA does not work well for reducing the scope of information in the data set (Table 4). The first principal component typically accounts for less than 10% of the variation in the data set. And, when averaged to produce an annual frequency estimate, the first principal component does not correlate well with annual real GDP growth for most provinces and territories (Table 5).

﻿
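Table 4

| Principal components | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 10.5 | 16.2 | 21.0 | 25.4 | 29.7 | 33.3 | 36.6 | 39.7 | 42.6 | 45.3 |
|  | 12.1 | 19.5 | 25.7 | 31.2 | 35.2 | 39.1 | 42.7 | 46.0 | 49.2 | 52.2 |
|  | 9.0 | 15.2 | 20.6 | 24.9 | 28.5 | 32.0 | 35.2 | 38.3 | 41.1 | 43.8 |
|  | 9.6 | 16.4 | 21.3 | 25.8 | 29.8 | 33.6 | 37.3 | 40.4 | 43.4 | 46.3 |
|  | 7.0 | 13.0 | 17.8 | 21.9 | 25.5 | 28.7 | 31.6 | 34.3 | 37.0 | 39.7 |
|  | 8.9 | 14.6 | 19.3 | 23.1 | 26.4 | 29.6 | 32.5 | 35.3 | 37.8 | 40.2 |
|  | 7.9 | 13.6 | 19.2 | 23.8 | 27.3 | 30.7 | 34.0 | 37.2 | 39.9 | 42.6 |
|  | 8.6 | 14.5 | 19.7 | 24.4 | 28.7 | 32.0 | 35.1 | 38.0 | 40.9 | 43.6 |
|  | 10.5 | 16.4 | 21.5 | 25.9 | 29.5 | 33.0 | 36.1 | 39.0 | 41.7 | 44.3 |
|  | 8.6 | 14.7 | 20.2 | 25.3 | 29.4 | 32.8 | 35.8 | 38.6 | 41.3 | 44.0 |
|  | 14.2 | 24.8 | 33.8 | 39.8 | 45.1 | 50.2 | 54.9 | 59.3 | 63.4 | 67.5 |
|  | 17.8 | 30.6 | 37.6 | 44.3 | 50.3 | 56.1 | 61.4 | 66.6 | 71.4 | 76.1 |
|  | 35.8 | 49.4 | 62.6 | 73.6 | 83.7 | 92.5 | 98.1 | 99.6 | 100.0 | ... |

Note: ... = not applicable.
Source: Statistics Canada, authors' compilation.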

The correlations indicate that outside of Alberta, British Columbia and Ontario, using only the first principal component will not produce an activity index that provides a suitable measure for determining the performance of the provinces and territories based on aggregating month-to-month fluctuations. As a result, an activity index based only on the first principal component, such as the one produced by the Federal Reserve Board of Chicago (Federal Reserve Board of Chicago 2020, Brave and Butters 2010, Evans and Pham-Kanter 2002), is not pursued here.

﻿
Table 5

| Winsorized index | Un-Winsorized index |
| --- | --- |
| -0.349 | -0.256 |
| -0.199 | -0.307 |
| 0.531 | 0.491 |
| -0.709 | -0.673 |
| 0.657 | 0.661 |
| 0.775 | 0.778 |
| 0.562 | 0.567 |
| 0.477 | 0.338 |
| 0.937 | 0.940 |
| 0.840 | 0.848 |
| 0.177 | 0.251 |
| 0.423 | 0.422 |
| 0.355 | 0.444 |

Source: Statistics Canada, authors' compilation.

While the first principal component has difficulties correlating with annual fluctuations in real GDP, this does not mean there is no information in the first few principal components for explaining real GDP growth. Therefore, to generate a model based on the principal components, regressions are performed on all combinations of the first 10 principal components as regressors for explaining real GDP growth. The regression that maximizes the adjusted R-squared is then selected as the preferred model. This produces 13 models that perform reasonably well for explaining real GDP growth. Moreover, the models generally do well for explaining the 2008 recession and other province- and territory-specific fluctuations.
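The search over all combinations of principal components can be illustrated with a short Python sketch. It uses synthetic data and hypothetical function names (`adjusted_r2`, `best_subset`); the sample of 17 annual observations is an assumption chosen only so that the largest model still has degrees of freedom left.

```python
from itertools import combinations
import numpy as np

def adjusted_r2(y, x):
    """Adjusted R-squared of an OLS regression of y on x plus an intercept."""
    n, k = x.shape
    design = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def best_subset(y, pcs):
    """Try every non-empty subset of columns and keep the best adjusted R-squared."""
    best_score, best_cols = -np.inf, ()
    p = pcs.shape[1]
    for size in range(1, p + 1):
        for cols in combinations(range(p), size):
            score = adjusted_r2(y, pcs[:, list(cols)])
            if score > best_score:
                best_score, best_cols = score, cols
    return best_cols, best_score

rng = np.random.default_rng(42)
pcs = rng.normal(size=(17, 10))        # synthetic annualized principal components
gdp_growth = 2.0 * pcs[:, 0] - pcs[:, 3] + 0.1 * rng.normal(size=17)

best_cols, best_score = best_subset(gdp_growth, pcs)
```

With 10 candidate components there are 1,023 non-empty subsets, so an exhaustive search is inexpensive. Because the adjusted R-squared can rise when a regressor with a modest t-statistic is added, the selected model can contain statistically insignificant components.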

#### 3.1.3 Least Absolute Shrinkage and Selection Operator

LASSO is the solution to a constrained optimization problem similar to OLS. Under classical linear regression, $X=\left[{x}_{1},{x}_{2},...,{x}_{p}\right]$ is an n × p matrix holding the predictor variables, which are used to explain the variation in a target vector $y$ of length n. The coefficients for the regression $\stackrel{^}{\beta }=\left[\stackrel{^}{{\beta }_{0}},...,\stackrel{^}{{\beta }_{p}}\right]$ are then the solution to the problem that seeks to minimize the sum of the squared errors between $y$ and a linear combination of the variables in $X$:

${\stackrel{^}{\beta }}_{OLS}=\underset{\beta }{\text{argmin}}\sum _{i=1}^{n}{\left({y}_{i}-{\beta }_{0}-{\beta }_{1}{x}_{i1}-...-{\beta }_{p}{x}_{ip}\right)}^{2}$

LASSO (Tibshirani 1996) is one of a class of estimators that seeks to penalize the OLS estimator for overfitting (i.e., including too many variables) through its regularization parameter $\lambda$. It is similar to using an adjusted R-squared or an information criterion to penalize for including too many regressors. However, it goes further than penalizing for extra regressors when assessing model quality: it also selects the relevant variables. LASSO is the solution to:

${\stackrel{^}{\beta }}_{LASSO}=\underset{\beta }{\text{argmin}}\left[\sum _{i=1}^{n}{\left({y}_{i}-{\beta }_{0}-{\beta }_{1}{x}_{i1}-...-{\beta }_{p}{x}_{ip}\right)}^{2}+\lambda \sum _{j=1}^{p}\left(|{\beta }_{j}|\right)\right]$

The $\lambda \ge 0$ parameter controls the strength of the penalty: the larger the value of $\lambda$, the greater the amount of shrinkage. Equivalently, the LASSO algorithm is only permitted to include values for ${\beta }_{j}$ up to a particular absolute total. As a result, LASSO sets the coefficients of less consequential variables to 0. This can be viewed as similar to the type of result found using a general-to-specific modelling strategy, but one that is applicable on a larger scale. The result is a method for dealing with large data sets where a large number of predictors can be included, and the algorithm will select those whose covariance properties are most important for predicting the target variable ( $y$ ).

LASSO comes with its own limitations. When groups of predictor variables are highly correlated with each other, LASSO tends to keep one variable from each group and shrink the coefficients of the other variables to zero. In addition, when the data set has small n and large p, LASSO selects at most n variables before it is saturated, even though there may be more than n variables with non-zero coefficients in the true model.

The Elastic Net method (Zou and Hastie 2005) is an extension of LASSO. By controlling the penalty weight $\alpha$ , the Elastic Net model stabilizes the variable selection from a group of correlated variables and removes the limitation on the number of variables selected. The coefficients are estimated as follows:

${\stackrel{^}{\beta }}_{EN}=\underset{\beta }{\text{argmin}}\left\{\sum _{i=1}^{n}{\left({y}_{i}-{\beta }_{0}-{\beta }_{1}{x}_{i1}-...-{\beta }_{p}{x}_{ip}\right)}^{2}+\lambda \sum _{j=1}^{p}\left[\frac{1}{2}\left(1-\alpha \right){\beta }_{j}^{2}+\alpha |{\beta }_{j}|\right]\right\}$

where $0\le \alpha \le 1$ is the penalty weight. With $\alpha$ equal to 1, the Elastic Net is the same as the LASSO model and, with $\alpha$ close to 1, the Elastic Net behaves similarly to LASSO but removes the problematic behavior caused by high correlations among variables.
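To make the role of $\lambda$ and $\alpha$ concrete, the following Python sketch minimizes the Elastic Net objective written above using cyclic coordinate descent. Coordinate descent is a solver chosen here for illustration only; the paper does not state which solver was used. The data are synthetic, and the function names and the values of $\lambda$ are assumptions.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning 0 inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def elastic_net(x, y, lam, alpha, n_iter=500):
    """Minimize sum of squared errors + lam * sum(0.5*(1-alpha)*b^2 + alpha*|b|).

    The intercept is left unpenalized, as in the objective above.
    """
    n, p = x.shape
    beta = np.zeros(p)
    intercept = y.mean()
    for _ in range(n_iter):
        for j in range(p):
            # Residual with variable j's current contribution added back.
            partial = y - intercept - x @ beta + x[:, j] * beta[j]
            rho = x[:, j] @ partial
            denom = x[:, j] @ x[:, j] + lam * (1.0 - alpha) / 2.0
            beta[j] = soft_threshold(rho, lam * alpha / 2.0) / denom
        intercept = np.mean(y - x @ beta)
    return intercept, beta

rng = np.random.default_rng(1)
x = rng.normal(size=(60, 8))                       # synthetic predictors
y = 3.0 * x[:, 0] - 2.0 * x[:, 1] + 0.1 * rng.normal(size=60)

_, beta_lasso = elastic_net(x, y, lam=60.0, alpha=1.0)   # alpha = 1 is LASSO
_, beta_ridge = elastic_net(x, y, lam=60.0, alpha=0.0)   # alpha = 0 is a pure squared penalty
```

With $\alpha = 1$ the soft-threshold step sets the coefficients of the six irrelevant predictors exactly to zero, which is the variable selection property described above. With $\alpha = 0$ the coefficients are shrunk but none is set exactly to zero.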

The outputs of LASSO, in terms of the number of variables selected and their statistical significance, were carefully studied. In almost all cases where LASSO worked, it appeared to include variables in the model that are not statistically significant. To ensure that the relation between the target $y$ and the regressors is statistically justifiable, a stepwise regression with backward selection is applied to the variables selected by LASSO to remove non-significant variables from the model.

Stepwise regression is a method that examines the statistical significance of each independent variable within the model. It builds a model by successively adding (forward selection) or removing (backward selection) variables based on the t-statistics of their estimated coefficients. The backward elimination method begins by including all variables in the model; each variable is then removed one at a time to test its importance. Those variables that are not statistically significant are removed from the model.
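A backward elimination pass of this kind might look like the Python sketch below. It implements one common variant, in which the least significant variable is dropped at each step until every remaining variable passes a threshold. The threshold of 2.0 on the absolute t-statistic, the variable names and the synthetic data are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def ols_t_stats(x, y):
    """t-statistics of the slope coefficients in an OLS regression with intercept."""
    n, k = x.shape
    design = np.column_stack([np.ones(n), x])
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    sigma2 = (resid @ resid) / (n - k - 1)
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return (beta / se)[1:]                      # drop the intercept

def backward_elimination(x, y, names, t_crit=2.0):
    """Drop the weakest variable until all remaining |t| values reach t_crit."""
    keep = list(range(x.shape[1]))
    while keep:
        t_abs = np.abs(ols_t_stats(x[:, keep], y))
        weakest = int(np.argmin(t_abs))
        if t_abs[weakest] >= t_crit:
            break
        del keep[weakest]
    return [names[i] for i in keep]

rng = np.random.default_rng(7)
x = rng.normal(size=(40, 5))                    # synthetic candidate regressors
y = 1.5 * x[:, 0] - 2.0 * x[:, 2] + 0.5 * rng.normal(size=40)
names = ["x0", "x1", "x2", "x3", "x4"]          # hypothetical variable names

selected = backward_elimination(x, y, names)
```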

The LASSO model did not select any variables for New Brunswick, Nova Scotia, Ontario and the Northwest Territories. The Elastic Net method is used for these jurisdictions instead. For the other two territories, Yukon and Nunavut, a manual stepwise regression is performed.

In both methods, LASSO and Elastic Net, cross-validation from the R package caret is used to tune the parameters $\lambda$ and $\alpha$. The cross-validation uses a rolling forecasting origin technique (Hyndman and Athanasopoulos 2018) instead of simple random sampling. This technique is specific to time series data sets.
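The idea behind a rolling forecasting origin can be shown with a few lines of Python. The sketch below generates the training and test index sets with an expanding training window; the function name, the window length and the one-step horizon are assumptions for illustration. caret offers a helper for this kind of split (`createTimeSlices`), but the paper does not say which settings were used.

```python
def rolling_origin_splits(n_obs, initial_window, horizon=1):
    """Return (train, test) index lists where the test block always follows the training data."""
    splits = []
    for end in range(initial_window, n_obs - horizon + 1):
        train = list(range(0, end))               # all observations before the origin
        test = list(range(end, end + horizon))    # the next `horizon` observations
        splits.append((train, test))
    return splits

# Example: 6 observations, at least 3 in the first training window.
splits = rolling_origin_splits(n_obs=6, initial_window=3)
for train, test in splits:
    print(train, "->", test)
```

Unlike random sampling, every test observation lies strictly after its training observations, so a candidate model is never evaluated on data that precede the data used to fit it.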

## 4 Monthly index assessment

The three approaches have different strengths and weaknesses, which affect their use (Table 6). The simple index and the LASSO index have the strength that their models are parsimonious, and the indexes they produce are less noisy than PCA-based indexes. However, these indexes are based on a greatly reduced set of variables, which for the simple indexes are often statistically insignificant in annual regressions. These indexes also tend to focus on employment series rather than a broad range of economic activities, and so may not be ideal predictors of monthly activity fluctuations if changes in production are not contemporaneously aligned with labour variables.

The PCA index has the strength that the methodology is sound and well understood. It works for all provinces and territories. However, it produces the noisiest activity indexes, making them difficult to interpret, and in some cases (e.g., Newfoundland and Labrador) the index can decline sharply. The PCA indexes are also combined based on maximizing the adjusted R-squared across regressions. This produces a linear combination of principal components, some of which are statistically significant and some of which are not. These inclusions err on the side of adding information that includes some noise, as it is not clear that data at an annual frequency represent month-to-month variability.

﻿
Table 6
Characteristics of index estimation approaches

| Criteria | Simple index | PCA index | Weighted index | LASSO index |
| --- | --- | --- | --- | --- |
| Consistent inputs across geographies | Yes | No | No | No |
| Consistent model-types across geographies | Yes | Yes | Yes | No |
| Model specification | 3 inputs, some insignificant variables | Variable number of principal components. Some insignificant variables | Combination of Simple and PCA | Variable input selection |
| Model fit | Goodness of fit can vary across provinces and territories | Generally good in-sample fit | Improved in-sample fit compared to the simple or PCA indexes | Generally good in-sample fit |
| Interpretability | Easy to understand inputs and contributions | Difficult to understand what contributes to changes; difficult to interpret principal components; high variability indexes | Difficult to understand what contributes to changes | Inputs based on correlations; interpretable contributions; low variance index |
| Model suitability | Models can perform poorly based on statistical significance; inputs align with expectations about important variables | Models can perform poorly based on statistical significance; comprehensive use of input data | Inherits properties of input indexes | Modelling approach not well suited to current set-up |

Combining the indexes provides an additional method for their use. Since the simple index is relatively stable, but focuses on a limited number of fundamental series, and the PCA is more variable, but includes linear combinations of all inputs, these series are combined to produce a weighted index that has better characteristics than the components. As with the regression coefficients, annual real GDP growth is used as the comparison as it is the primary source of aggregate economic activity available for the provinces and territories. To combine the PCA index and simple index, values of nu between 1% and 100% are used to create weighted indexes as:

$\text{weighted\_index}=\left(1-\nu \right)\times \text{simple\_index}+\nu \times \text{activity\_index}$

The nu corresponding to the weighted index that has the highest correlation with real GDP growth is then selected.
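The selection of nu can be sketched as a simple grid search. The Python example below is an illustration on synthetic data, not the authors' procedure: it assumes the two indexes are combined as monthly growth rates, that monthly values are averaged within each year before being compared with annual real GDP growth, and that the grid runs from 0.01 to 1.00 in steps of 0.01. All function and variable names are hypothetical.

```python
import numpy as np

def annual_average(monthly):
    """Average 12 consecutive monthly values into one annual value."""
    return monthly.reshape(-1, 12).mean(axis=1)

def select_nu(simple, pca, gdp_growth):
    """Return the nu in {0.01, ..., 1.00} whose weighted index correlates best with GDP growth."""
    best_nu, best_corr = None, -np.inf
    for nu in np.arange(1, 101) / 100.0:
        weighted = (1.0 - nu) * simple + nu * pca
        corr = np.corrcoef(annual_average(weighted), gdp_growth)[0, 1]
        if corr > best_corr:
            best_nu, best_corr = nu, corr
    return best_nu, best_corr

rng = np.random.default_rng(3)
signal = rng.normal(size=216)                       # 18 years of synthetic monthly growth
simple = signal + 0.5 * rng.normal(size=216)        # stand-in for the simple index
pca = signal + 1.5 * rng.normal(size=216)           # stand-in for the (noisier) PCA index

# Synthetic annual target built so that the best weight is known in advance.
gdp_growth = annual_average(0.6 * simple + 0.4 * pca)

best_nu, best_corr = select_nu(simple, pca, gdp_growth)
```

Because the synthetic target is constructed with a weight of 0.4 on the PCA series, the search recovers a nu of 0.4 in this example. With real data the maximum correlation is below 1 and the selected nu differs by jurisdiction.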

The methods for generating monthly indexes generally return similar types of information on economic cycles and major economic shocks in the provinces and territories. As examples, the indexes for Alberta (Panel 1) and Newfoundland and Labrador (Panel 2) are presented below.

For larger economies, such as Alberta, all approaches return similar information on periods of growth or decline, but the magnitude of the cycles can differ depending on methodology. In general, the PCA-based indexes have the largest variability while the simple index has the least. In some cases, such as the PCA index for Newfoundland and Labrador, the model fails to produce a reasonable result. In these cases, the index is deemed not fit for use and will not be made available. Nevertheless, when the indexes appear to have the appropriate characteristics, there is a strong correlation across measures for the implied economic activity, and the movement of the indexes through time corresponds with what is known about provincial and territorial economic performance.

Additionally, comparisons with sub-annual real GDP estimates for Ontario and Quebec show that the year-to-year growth rates are highly correlated but that business cycles can be accentuated in the activity indexes. The indexes, therefore, appear to capture relevant information for economic cycles, periods of stronger or weaker growth and for understanding economic performance. They do not, however, correspond directly with real GDP, and should not be interpreted as a direct measure of real GDP.

Chart 1
Series shown: Principal component based activity index; Simple index; Weighted index; LASSO based index.
Notes: LASSO = Least absolute shrinkage and selection operator.
Source: Statistics Canada, authors' calculations.

Chart 2
Series shown: Principal component based activity index; Simple index; Weighted index; LASSO based index.
Notes: LASSO = Least absolute shrinkage and selection operator.
Source: Statistics Canada, authors' calculations.

## 5 Conclusion

Measures of aggregate economic activity for economies are important for informing decisions about fiscal and monetary policy, for determining the characteristics of business cycles and for examining economic performance. In this study, four indexes of provincial and territorial economic activity based on different methodological approaches are estimated and presented. The methodologies are based on a simple model; PCA; a weighted combination of the simple index and the PCA index; and LASSO. In most cases, all approaches produce roughly similar results. However, the degree of cyclicality and the variance of month-to-month changes can differ significantly. As a general rule, PCA produces the greatest variability and the largest cycles while the simple index is the most stable.

Based on the properties of the methodologies and their outputs, the simple index is the most consistent across provinces and territories. It is also the easiest to interpret in terms of variable contributions and justification for variable inclusion. However, parameter values are often statistically insignificant and the input series are chosen as much for their economic importance as for their presence in all jurisdictions. As a result, these models offer a more limited approach for examining aggregate economic activity, but also present a basis for comparisons to more complex models.

Indexes based on PCA appear to offer a more complete sense of how activity is evolving over time, but it is unclear at the moment how the principal components should be interpreted. Because of this, and because the PCA indexes have the largest variability, they present a trade-off between overall use of input series and interpretability.

Weighting the simple index and PCA index produces a result that has a superior correlation with annual GDP fluctuations. The weighted combination continues to have more variability than the simple index. Since the PCA is included, it is also not as easy to interpret as the simple index, but likely provides a better measure of aggregate activity than its constituent parts.

The LASSO index performs well when compared to annual real GDP, but the model set-up is not as well suited to the situation encountered when trying to estimate the activity indexes. In particular, the relatively small number of observations limits the ability of the algorithms to perform cross validation. Moreover, while the input series are a distinct subset of the input data set, and their contributions can be generated in a straightforward manner, there is no theoretical reason for why the variables are important, and this limits the model’s interpretability.

Given the strengths and weaknesses present between the suitability of the models, their performance and examinations of their outputs, the assessments made thus far suggest that the simple indexes or LASSO indexes present results related to a set of fundamental inputs (often heavily influenced by employment series), that the PCA indexes relate more to some form of short-term activity (but the signal is noisy), and that the weighted index presents a compromise between the two.

The indexes as currently estimated are correlated with annual measures of real GDP and sub-annual measures of real GDP for Ontario and Quebec, but they should not be interpreted as being a real GDP measure. The indexes display greater variability and cyclicality than real GDP measures, and are constituted from measures of gross outputs, employment, relative prices and important ratios such as the unemployment rate. This makes the indexes appropriate for understanding economic activity, but they are not real GDP. Moreover, the indexes do not inform about differing levels of economic activity between provinces and territories.

The indexes are also based on an input data set and modelling strategies that are not ideal. Numerous assumptions must be imposed to produce the indexes, any of which may be a source of important measurement errors. As a consequence, the indexes presented here should be viewed as experimental, and are subject to revision or replacement as future research improves the processes and/or tests the assumptions for their validity.

At the current time, the correlations between the different approaches, their positive correlation with provincially produced measures of sub-annual real GDP and examinations of their properties against known provincial and territorial economic performance support their use as indicators of business cycles, for understanding the magnitude of shocks relative to a province's or territory's history and for understanding how regional economies are progressing. Inter-provincial comparisons are also supported, but with the caveat that model performance is difficult to understand in all situations, and that level comparisons across provinces are not possible using the index values.

## References

Brave, Scott., and R. Andrew Butters. 2010. “Chicago Fed National Activity Index Turns Ten — Analyzing Its First Decade of Performance.” Chicago Fed Letter, no. 273 (April). Federal Reserve Bank of Chicago. https://www.chicagofed.org/~/media/publications/chicago-fed-letter/2010/cflapril2010-273-pdf.pdf.

Statistics Canada. 2020a. “Gross domestic product (GDP) at basic prices, by industry, monthly (36100434).” Statistics Canada, January 24, 2020. https://www150.statcan.gc.ca/n1/pub/13-607-x/2016001/230-eng.htm (accessed June 2, 2020).

Statistics Canada. 2020b. “Gross domestic product (GDP) at basic prices, by industry, monthly (36100434).” Statistics Canada, July 31, 2019. https://www.statcan.gc.ca/eng/statistical-programs/document/1301_D1_V3 (accessed June 2, 2020).

Federal Reserve Board of Chicago. 2020. “Chicago Fed National Activity Index (CFNAI) Current Data.” Federal Reserve Board of Chicago, June 22, 2020. https://www.chicagofed.org/research/data/cfnai/current-data (accessed June 2, 2020).

Evans, Charles L., Chin Te Liu, and Genevieve Pham-Kanter. 2002. “The 2001 Recession and the Chicago Fed National Activity Index: Identifying Business Cycle Turning Points.” Economic Perspectives 26 (3). Federal Reserve Bank of Chicago: 26–43. https://www.chicagofed.org/~/media/publications/economic-perspectives/2002/3qepart2-pdf.pdf.

Hyndman, Rob J., and George Athanasopoulos. 2018. Forecasting: Principles and Practice. 2nd edition. Melbourne, Australia: OTexts. OTexts.com/fpp2 (accessed June 2, 2020).

Jolliffe, I.T. 2002. Principal Component Analysis. 2nd edition. New York, NY: Springer-Verlag.

United Nations (UN), European Commission (EC), International Monetary Fund (IMF), Organisation for Economic Co-operation and Development (OECD), and World Bank (WB). 2009. System of National Accounts, 2008. New York: United Nations. Available at: https://unstats.un.org/unsd/nationalaccount/docs/sna2008.pdf (accessed June 2, 2020).

Organisation for Economic Co-operation and Development (OECD). 2008. Handbook on Constructing Composite Indicators: Methodology and User Guide. Paris: OECD Publishing. https://www.oecd.org/sdd/42495745.pdf (accessed June 2, 2020).

Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society: Series B (Methodological) 58 (1): 267–88.

Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2): 301–20. doi:10.1111/j.1467-9868.2005.00503.x.
