Analytical Studies: Methods and References
Experimental Economic Activity Indexes for Canadian Provinces and Territories: Experimental Measures Based on Combinations of Monthly Time Series
by Nada Habli, Ryan Macdonald and Jesse Tweedle
Economic Analysis Division, Statistics Canada No. 027
Acknowledgements
The authors would like to express their thanks to Etienne Saint-Pierre, Yvan Clermont, Danny Leung and Brenda Bugge for their support and guidance during the course of the project. They would also like to thank Steve Matthews, Fred Picard and Michel Ferland for their comments and input on modelling. They also wish to thank Philip Smith and Karen Wilson for their advice, suggestions and support.
Abstract
This paper explores methods for creating a monthly indicator of economic activity for the provinces and territories. It begins by constructing a dataset for the provinces and territories composed of monthly series about the labour force, including wages and employment; international trade; output measures such as manufacturing sales or electricity production; and prices (consumer, housing and electricity). Where necessary, the series are seasonally adjusted, linked and deflated to create continuous time series from January 2002 to April 2020. Variable reduction methods are then applied to the monthly provincial and territorial dataset to create the experimental monthly provincial and territorial economic indicator indexes. Three methods are examined: Principal components analysis (PCA), Least absolute shrinkage and selection operator (LASSO) and a simple index composed of three predetermined series (total employment, total exports and retail sales). A weighted average of the simple index and the PCA index is also constructed. In general, the indexes produced track provincial economic activity reasonably well, following cyclical movements of provincial and territorial economies. However, the setup for the models is not ideal as annual data are used to produce model parameters, and this leads to uncertainty about model performance. As a result, multiple indexes are reported. A quality assessment provides an indication of the strengths and limitations of the indexes with respect to different uses.
1 Introduction
Timely measures of economic activity are critical for understanding how economies perform, and for informing policy responses to macroeconomic fluctuations. The onset of the pandemic due to the emergence of the SARS-CoV-2 virus emphasized this, as well as the need for geography-specific measures. Presently, Canada has a robust system for producing up-to-date measures of activity, such as real gross domestic product (GDP), at the national level. For provincial and territorial economies, monthly information on labour markets or particular activities such as manufacturing or international trade is available, but a monthly measure of aggregate economic activity is not.
Under normal circumstances, producing a new set of aggregate economic indicators for the provinces and territories would require creating exploratory measures, possibly launching new surveys or expansions of existing statistical collection activities, as well as creating the infrastructure necessary to produce and disseminate the indicators on an ongoing basis. These changes require time to implement. In the current context, where the SARS-CoV-2 pandemic is accentuating needs for monthly regional economic indicators, the time necessary to constitute a new statistical program to meet current requirements means that this approach is unfeasible.
A more timely approach is to adopt a statistical-model-based strategy to quickly create exploratory measures of provincial economic activity. While this approach introduces a new measure of economic activity in a timely fashion, the trade-off is that the methods employed are not used in their ideal situations, that inputs for models use currently available series without an ability to tailor their uses to the creation of an indicator, and that the models are somewhat atheoretic. That is, the models look for correlations in data rather than employing economic theory to help guide their construction. Lastly, the models employed here typically have a different set of inputs for each province or territory. As a result, consistency in model structures cannot be maintained across provinces and territories, and this may affect interjurisdictional comparisons.
To estimate the activity indexes, a twofold strategy is applied. First, a balanced panel data set of monthly provincial and territorial time series is constructed from publicly available Common Output Data Repository (CODR) tables. This data set spans January 2002 to the latest available data points (currently April 2020). Second, three methods for transforming the monthly series into an indicator of provincial and territorial economic activity are applied to the data set: a simple model, Principal components analysis (PCA) and Least absolute shrinkage and selection operator (LASSO). A fourth index that combines the index from the simple model and the index based on PCA is also produced. The results from these approaches form the exploratory measures of provincial aggregate economic activity presented here.
The experimental indexes are based on the use of statistical models with an input data set that contains a series of approximations and assumptions. Notable among the assumptions for the input data set are: the use of national deflators to produce real provincial series when provincial deflators do not exist, the assumption that the growth rates for all series are covariance stationary, and the assumption that winsorizing data at the 5^{th} and 95^{th} percentiles is appropriate.
For the models, annual real GDP growth across the provinces and territories is used as a measure of aggregate provincial activity against which the derived index measures of economic activity are compared. This produces a situation where a small number of annual observations are being used to estimate model parameters that are used to infer monthly fluctuations. The small number of observations affects the ability of models to provide sufficient inference, and the use of annual data may mask important differences in the monthly timing of changes in prices, outputs and employment.
To allow for the possibility that using annual data may affect model performance and inference, the simple model assumes a set of variables is appropriate and uses OLS to determine their relative contributions. For PCA, a maximized adjusted R-squared from OLS regressions is used to select which of the first ten principal components should be included. In both of these situations the OLS regressions include variables that are statistically insignificant. For LASSO, statistical significance of potential input variables determines the final model. However, for LASSO, for some economies, no input variables are selected. In these cases, elastic net or a general-to-specific modelling strategy is employed instead.
Since the input data set contains a number of assumptions and approximations, and since the statistical models are used in an imperfect setting, the experimental indexes that are created should be viewed as approximations to aggregate economic activity rather than as exact measures. The activity indexes tend to present considerable monthly volatility, and when compared with the real GDP estimates produced by the Government of Quebec or the Government of Ontario, they can exhibit greater cyclical volatility.
Across the measures, the simple models and the LASSO models tend to rely more on employment series as inputs. The simple index has the strength that it is straightforward to understand, and is the most comparable across provinces and territories since it holds the variables used in the activity index constant regardless of the jurisdiction. The PCA indexes appear to capture more fluctuations related to other aspects of overall activity (e.g. sales or exports), but they also have greater variability. The weighted combination of the simple, researcher-defined indexes and the PCA indexes has a better correlation with real GDP growth than the constituent indexes.
Currently, the weighted indexes or the LASSO-based indexes appear to offer the best trade-off between signals present in the data and variability of monthly series. This assessment is based on how well the models appear to conform with the setup used for estimation as well as on the behavior of the indexes. In some cases, such as the PCA-based index for Newfoundland and Labrador, an anomaly is present that calls the veracity of the index into question. When these types of situations occur, the data are deemed to be unfit for use at the present time, and the estimates for that index are not provided. The assessments are ongoing and may lead to changes in recommended uses or indexes as further development of the input data set, model refinements or alternative model strategies are explored.
The remainder of this paper is structured as follows. Section 2 discusses the creation of the input data set as well as the assumptions employed to filter and transform the data prior to modelling. Section 3 describes the models employed, the assumptions embedded in the models, as well as their strengths, their limitations and their application for creating monthly indexes. Section 4 provides analysis of the model performance and illustrates the resulting indexes. Section 5 concludes.
2 Input data set
The input data set comprises province- and territory-specific measures of economic activity and Canada-level deflators, except in instances where province- and territory-specific deflators are available. The monthly input series are drawn from monthly surveys of labour, outputs and prices (Table 1). In some cases, active tables do not contain continuous information from January 2002 to the present. In these cases, historical tables are used to backcast active tables.
Table 1
| Table number | Table title |
| --- | --- |
| 12100099 | Merchandise imports and exports, customs-based, by Harmonized commodity description and coding system (HS) section, Canada, provinces and territories, United States, states |
| 12100119 | International merchandise trade by province, commodity, and Principal Trading Partners |
| 14100036 | Actual hours worked by industry, monthly, unadjusted for seasonality |
| 14100201 | Employment by industry, monthly, unadjusted for seasonality |
| 14100222 | Employment, average hourly and weekly earnings (including overtime), and average weekly hours for the industrial aggregate excluding unclassified businesses, monthly, seasonally adjusted |
| 14100287 | Labour force characteristics, monthly, seasonally adjusted and trend-cycle, last 5 months |
| 14100292 | Labour force characteristics by territory, three-month moving average, seasonally adjusted and unadjusted, last 5 months |
| 14100355 | Employment by industry, monthly, seasonally adjusted and unadjusted, and trend-cycle, last 5 months |
| 16100048 | Manufacturing sales by industry and province, monthly (dollars unless otherwise noted) |
| 18100004 | Consumer Price Index, monthly, not seasonally adjusted |
| 18100204 | Electric power selling price index, monthly |
| 18100205 | New housing price index, monthly |
| 20100008 | Retail trade sales by province and territory |
| 20100074 | Wholesale trade, sales |
| 21100019 | Monthly survey of food services and drinking places |
| 24100002 | Number of vehicles travelling between Canada and the United States |
| 25100001 | Electric power statistics, with data for years 1950 - 2007 |
| 25100015 | Electric power generation, monthly generation by type of electricity |
| 34100003 | Building permits, values by activity sector |
| 34100066 | Building permits, by type of structure and type of work |
| 34100158 | Canada Mortgage and Housing Corporation, housing starts, all areas, Canada and provinces, seasonally adjusted at annual rates, monthly |

Note: HS: harmonized system. Source: Statistics Canada, authors' compilation.
Deflators are primarily collected from Canada-level survey programs for measures of economic activity (Table 2). Statistics Canada does not currently produce province- and territory-specific deflators for current dollar measures of international trade, manufacturing sales, wholesale sales, retail sales or the Monthly Survey of Food Services. To collect deflators, Canada-level price indexes are taken from surveys when they are available. In the case of manufacturing, the implicit price index is derived as the ratio of the nominal value to the real value. Deflators exist for nominal series back to January 2002, except for manufacturing. For manufacturing, the Industrial Product Price Index (IPPI) by industry, IPPI by product group, and the ratio of nominal to real monthly GDP are used as projectors for manufacturing deflators.
Table 2
| Table number | Table title |
| --- | --- |
| 12100128 | International merchandise trade, by commodity, price and volume indexes, monthly |
| 16100013 | Real manufacturing sales, orders, inventory owned and inventory to sales ratio, 2012 dollars, seasonally adjusted |
| 16100047 | Manufacturers' sales, inventories, orders and inventory to sales ratios, by industry (dollars unless otherwise noted) |
| 18100004 | Consumer Price Index, monthly, not seasonally adjusted |
| 18100029 | Industrial product price index, by major product group, monthly |
| 18100032 | Industrial product price index, by industry, monthly |
| 20100003 | Wholesale sales, price and volume, by industry, seasonally adjusted |
| 20100038 | Retail trade, sales, chained dollars and price index, inactive |
| 20100051 | Wholesale trade, sales, chained dollars and price index, inactive |
| 20100078 | Retail sales, price, and volume, seasonally adjusted |
| 36100434 | Gross domestic product (GDP) at basic prices, by industry, monthly |

Note: HS: harmonized system. Source: Statistics Canada, authors' compilation.
To combine the provincial data and deflator data to create the input data set, there are 4 steps:
 Assemble data. The CODR tables are filtered to select the desired data. Only variables with continuous data are selected. Series subject to suppression are excluded; however, series with 0 values are included. Series subject to suppression are typically smaller value series, meaning they contain less information about aggregate economic fluctuations. Although methods exist for interpolating these data points, should the suppression occur in the latest month, a forecast would be required to infill the suppressed data point. Given that the index is being constructed to provide information about the largest shock to affect the Canadian economy since the Second World War, it is considered inadvisable to include series with forecasted values.
 Seasonally adjust data series. Not all series are provided on a seasonally adjusted basis. A total of 966 series are seasonally adjusted for use in the indicator. Given the high number of series, the automatic options of the ARIMA-SEATS algorithm from the R package Seasonal are employed to remove seasonality. Seasonal parameters are determined based on the available monthly time series up to December 2019. In the event the time series do not span the whole period, such as discontinued series, the data up to the most recent period are used to determine the seasonal adjustment options.
 To ensure that the seasonally adjusted series are of good quality, they are validated against a range of quality measures. These include checking for the presence of seasonality, the amount of stable seasonality present relative to the amount of moving seasonality (M7), the absence of seasonal effects in the irregular component and the smoothness of the seasonally adjusted series against its raw form, and limiting the number of outliers automatically detected by ARIMA-SEATS to a maximum of five. Series that are found to be non-seasonal (125 series) are kept in their raw form. Seasonally adjusted series with poor quality (183 series) are filtered out of the data set prior to estimation.
 Link data. After seasonal adjustment, data are linked as necessary. When overlapping periods exist, links are made by treating data as indexes and chaining backwards over historical periods. If overlapping periods do not exist, level values are joined “as is”.
 Apply deflators. Where necessary, deflators are applied to current dollar series or to price series to produce relative price variables.
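The "Link data" step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each series is held as a dictionary keyed by a sortable period label and that the earliest shared period is used as the link month; the function name `backcast_by_chaining` and the sample values are hypothetical and are not taken from the paper.

```python
def backcast_by_chaining(active, historical):
    """Extend `active` backwards using the growth rates of `historical`.

    Both arguments map a sortable period label (e.g. "2002-01") to a level.
    When the series overlap, the historical levels are rescaled so that they
    match the active series in the link period, which preserves historical
    growth rates (chaining backwards). Without overlap, levels are joined as is.
    """
    overlap = sorted(set(active) & set(historical))
    if not overlap:
        # No overlapping period: join the level values "as is".
        return dict(sorted({**historical, **active}.items()))
    link = overlap[0]  # assumption: link at the earliest shared period
    ratio = active[link] / historical[link]
    linked = dict(active)
    for period, value in historical.items():
        if period < link:
            linked[period] = value * ratio
    return dict(sorted(linked.items()))


# Illustrative values only.
historical = {"2002-01": 50.0, "2002-02": 52.5, "2002-03": 55.0}
active = {"2002-03": 110.0, "2002-04": 121.0}
linked = backcast_by_chaining(active, historical)
```

Rescaling by a single ratio is equivalent to applying the historical month-to-month growth rates backwards from the link period, so the backcast months inherit the growth profile of the historical table while the levels of the active table are left unchanged.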
The full data set prior to filtering contains province- and territory-specific seasonally adjusted and non-seasonally adjusted series, nominal series and deflated series. For modelling provincial and territorial economic activity, series in natural units (e.g. employees, hours worked), deflated series, rates (e.g. the unemployment rate), and relative prices are selected.
3 Estimation
The objective is to estimate monthly time series for aggregate economic activity in the provinces and territories as a function of available monthly provincial and territorial economic time series:
$${y}_{monthly,t}=f\left(\text{monthly time series}\right)$$
The most commonly cited measure of aggregate economic activity is the real GDP measure described in the 2008 System of National Accounts (United Nations 2010). For the Canada-wide level, the function for transforming input series into monthly real GDP is based on industry-specific methodologies and benchmarks that have been developed and enriched over time. The methodologies use a number of data sources to estimate changes in gross output that serve as proxies for real GDP. The proxies are combined with annual real GDP benchmarks to produce the monthly real GDP series. In many cases, a direct measure of gross output is available. However, in some industries, direct measures of output are not available and estimates are constructed using alternative data sources, such as employment (Statistics Canada 2020b). This methodology forms the function $f(.)$ into which the monthly series are placed to produce monthly real GDP for Canada.
The challenge for measuring provincial and territorial aggregate economic activity is that the function $f(.)$ for the provinces and territories is unknown, and that the desired series, ${y}_{monthly,t}$, is also unknown.
Since the goal is to estimate ${y}_{monthly,t}$ , if a close substitute existed, it could be used as an instrument for the true ${y}_{monthly,t}$ . The function $f(.)$ for combining monthly information to produce an aggregate measure of economic activity could then be approximated. While a close monthly substitute for ${y}_{monthly,t}$ does not exist, the Canadian System of Macroeconomic Accounts does produce a measure of provincial and territorial GDP at an annual frequency. The approach followed here, therefore, assumes that the annual data can be used as an instrument to help inform the structure of $f(.)$ when the monthly growth rates from the available input series are averaged within a calendar year and used to estimate parameter values. The parameters of $f(.)$ and the variance characteristics of the monthly input series are then adjusted to account for the difference in periodicity. Monthly indexes of aggregate economic activity are then constructed from the estimated monthly values of ${y}_{monthly,t}$.
Using a lower frequency variable as the instrument for monthly economic activity lowers the number of degrees of freedom and introduces issues related to the timing of monthly versus annual fluctuations. These issues will have consequences for the ability of the models to produce a monthly estimate of aggregate economic activity. The small number of degrees of freedom and the covariance around the 2008 recession will tend to produce statistically insignificant parameters for $f(.)$ if the regressors are not importantly affected by business cycle fluctuations. Moreover, models may tend toward selecting a smaller number of inputs at an annual frequency than is necessary for explaining monthly variation, as important monthly fluctuations may be masked through aggregation to a lower frequency. Additionally, changes related to prices, sales/output and employment will occur contemporaneously at an annual frequency. However, at a monthly frequency these fluctuations may not align.
Given that a conversion from annual to monthly frequency is necessary to generate the desired estimated values, the modelling strategy includes some approaches that err on the side of capturing more variation in the data rather than focusing solely on model parsimony. This does not mean that models are produced in an ad hoc fashion. Rather, selection criteria, such as maximizing an adjusted R-squared, are employed alongside more traditional general-to-specific-type modelling strategies.
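As a reminder of what the adjusted R-squared criterion rewards, the short Python sketch below computes it with the standard textbook formula, $1-(1-R^{2})(n-1)/(n-k-1)$, for $n$ observations and $k$ regressors excluding the intercept. The function name and the sample numbers are hypothetical, and the sketch does not reproduce the paper's actual selection procedure.

```python
def adjusted_r_squared(y, fitted, n_regressors):
    """Adjusted R-squared for a regression that includes an intercept.

    y            : observed values (e.g. annual real GDP growth)
    fitted       : fitted values from the regression
    n_regressors : number of regressors, excluding the intercept
    """
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    r_squared = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - n_regressors - 1)


# Illustrative values only.
example = adjusted_r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], 1)
```

With only a small number of annual observations, the penalty term $(n-1)/(n-k-1)$ grows quickly as regressors are added, which is why the criterion can still favour compact models even when parsimony is not the sole objective.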
In summary, because the functional form for transforming monthly data into an aggregate measure of economic activity is unknown, and because the actual values for the series ${y}_{monthly,t}$ are also unknown, the best that is possible is to approximate the true ${y}_{monthly,t}$. This means that the series ${\widehat{y}}_{monthly,t}$ will have the flavor of what a real GDP series could look like, but it will not be a true measure of monthly real GDP. Instead, it will be an estimate of an economic activity index which corresponds to macroeconomic conditions in the provinces and territories.
3.1 Estimation strategy
The function $f(.)$ is a set of instructions for transforming a large number of inputs into a single series. In this paper, it is assumed that the function can be approximated based on a linear combination of inputs, and that the inputs can be selected either by selecting a subset of the available data or by creating a combination of all input data series. Below, three approaches are explored: 1) a simple model; 2) PCA; and 3) LASSO. The simple and the LASSO models fall in the former category while PCA falls in the latter category.
The assumptions and implementation of the models are discussed in detail below. Across modelling strategies, the following steps are followed to estimate index values in all cases:
 1: Prepare the input data.
The input data set has 1,341 series that are distributed unevenly across the provinces and territories (Table 3). Not all series have equal utility for modelling aggregate economic activity. In cases where seasonal adjustment failed, the series are removed. Similarly, series with 0 values are removed. These tend to be series where 0 values are interspersed with nominal values. In these cases, seasonally adjusted values can be negative, growth rate or log-difference transformations do not work, and the series have questionable value for use as an ongoing indicator of economic activity. Overall, 198 variables are dropped for these reasons.
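Table 3
| Province or territory | Starting vectors | With 0 | SA failed | Dropped | Top 15% | Top 25% | Vectors for LASSO | Vectors for PCA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Newfoundland and Labrador | 108 | 3 | 8 | 8 | 15 | 25 | 83 | 73 |
| Prince Edward Island | 101 | 12 | 15 | 20 | 12 | 20 | 65 | 57 |
| Nova Scotia | 115 | 2 | 14 | 14 | 15 | 25 | 84 | 74 |
| New Brunswick | 115 | 1 | 12 | 13 | 15 | 24 | 81 | 72 |
| Quebec | 131 | 1 | 10 | 11 | 18 | 29 | 97 | 86 |
| Ontario | 133 | 0 | 11 | 11 | 18 | 30 | 99 | 87 |
| Manitoba | 120 | 1 | 6 | 6 | 17 | 28 | 94 | 83 |
| Saskatchewan | 118 | 0 | 11 | 11 | 16 | 26 | 85 | 75 |
| Alberta | 122 | 0 | 8 | 8 | 17 | 27 | 91 | 81 |
| British Columbia | 122 | 0 | 8 | 8 | 17 | 28 | 92 | 81 |
| Yukon | 63 | 22 | 20 | 26 | 6 | 10 | 31 | 27 |
| Northwest Territories | 59 | 27 | 29 | 31 | 5 | 7 | 23 | 21 |
| Nunavut | 46 | 29 | 29 | 32 | 2 | 4 | 11 | 9 |
| Total | 1,353 | 98 | 181 | 199 | 173 | 283 | 936 | 826 |

Notes: LASSO: least absolute shrinkage and selection operator; PCA: Principal components analysis; SA: Seasonal adjustment. Source: Statistics Canada, authors' compilation.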
The filtered input data set then contains 1,143 series. However, the series are typically reported in levels (e.g. hours worked or manufacturing sales in chained dollars) and present strong trends over the sample period. To account for the trends, the series are transformed to month-to-month growth rates. These growth rates will ultimately be compared to measures of real GDP growth, and they have the advantage of being bounded at 100% for the maximum decline. The growth rates for the series are assumed to be covariance stationary.
To use the series in estimation, all series are demeaned and scaled to have unit variance at a monthly frequency. This normalization process is applied to prevent variables with naturally larger unit values from affecting results. Because the monthly time series can have high variability, and because periods of economic shocks such as recessions can produce aberrant data points, all series are winsorized (or top and bottom coded) prior to estimation. This prevents extreme data points from affecting results. For creating monthly indexes, the parameter values from models based on the winsorized data are combined with unwinsorized data, which permits larger values to have their full influence when large shocks occur.
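The transformations described so far (growth rates, winsorizing, demeaning and scaling) can be sketched in a few lines of Python. The sketch assumes that winsorizing at the 5th and 95th percentiles is applied before standardizing and that percentiles are interpolated with the standard library's "inclusive" method; the paper does not state either detail, and the function names and sample values are hypothetical.

```python
import statistics


def growth_rates(levels):
    """Month-to-month growth rates from a list of positive levels."""
    return [curr / prev - 1.0 for prev, curr in zip(levels, levels[1:])]


def winsorize(values, lower_pct=5, upper_pct=95):
    """Top and bottom code values at the given percentiles."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


def standardize(values):
    """Demean and scale to unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Illustrative growth rates containing one large drop and one large spike.
raw = [-0.5, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.9]
estimation_input = standardize(winsorize(raw))
```

In this sketch the winsorized series feeds parameter estimation, while `raw` would be retained so that large shocks keep their full size when the monthly index is built.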
Finally, the noisiest series are removed. For PCA, the top 25% of series by variance are removed by province while for LASSO the top 15% of series by variance are removed. These thresholds are arbitrary, but their imposition was found to improve the ability of the models to inform on aggregate activity (i.e. improve the signal relative to noise), and to improve consistency of results across methods. After adjusting for high variance series there are a total of 928 input variables for LASSO and 820 for PCA. Nunavut has the fewest series available while Ontario and Quebec have the most.
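A variance filter of this kind might look like the following Python sketch. It assumes that variance is measured on the growth rates before they are scaled to unit variance and that the number of dropped series is rounded down; neither rule is stated in the paper, and `drop_noisiest` and the sample data are hypothetical.

```python
import statistics


def drop_noisiest(series_by_name, share):
    """Remove the given share of series with the largest variance.

    series_by_name : dict mapping a series name to its list of growth rates
    share          : fraction to drop (e.g. 0.25 for PCA, 0.15 for LASSO)
    """
    ranked = sorted(
        series_by_name,
        key=lambda name: statistics.pvariance(series_by_name[name]),
    )
    n_drop = int(share * len(ranked))  # assumption: round down
    kept = ranked[: len(ranked) - n_drop]
    return {name: series_by_name[name] for name in kept}


# Illustrative data: series "b" is far noisier than the others.
growth = {
    "a": [0.0, 0.01, -0.01, 0.0],
    "b": [0.0, 0.50, -0.50, 0.0],
    "c": [0.0, 0.02, -0.02, 0.0],
    "d": [0.0, 0.03, -0.03, 0.0],
}
inputs_for_pca = drop_noisiest(growth, 0.25)
```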
 2: Using the winsorized growth rates, calculate the annual average of monthly growth rates for use in the models.
 3: Estimate the model parameters.
The estimation is initiated by combining real GDP growth with annual averages of monthly input series for years 2002 to 2018. Real GDP growth is not scaled or demeaned. Using real GDP growth as the target variable and the annual averages of monthly series as input variables, the functional form of $f(.)$ is estimated. In all models employed, it is assumed that a linear combination of input variables can be used to transform the multiplicity of input variables into a single measure of aggregate activity. In the case of the simple model and LASSO, a subset of the variables is used directly. In the case of PCA, the first ten principal components are employed as the starting point. It is also assumed that OLS can be used to generate contributions for combining input variables. By using regression methods to combine inputs, a further assumption that economic structures are, on average, the same over the entire sample period is imposed.
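Steps 2 and 3 can be sketched as follows: monthly inputs are averaged within calendar years and annual real GDP growth is regressed on those averages by OLS. The Python below is a minimal illustration that solves the normal equations directly with the standard library; the function names and all numbers are hypothetical, and variable selection (the simple model's fixed inputs, principal components or LASSO) is outside the sketch.

```python
def annual_means(monthly, months_per_year=12):
    """Average consecutive blocks of monthly values into annual values.

    Assumes the series starts in January and covers whole calendar years.
    """
    return [
        sum(monthly[i:i + months_per_year]) / months_per_year
        for i in range(0, len(monthly), months_per_year)
    ]


def ols(y, columns):
    """OLS coefficients [intercept, b1, ..., bk] from the normal equations."""
    n, k = len(y), len(columns) + 1
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Augmented system (X'X | X'y).
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Illustrative data: four years of one standardized monthly input.
x_monthly = [-0.02] * 12 + [0.01] * 12 + [-0.01] * 12 + [0.02] * 12
x_annual = annual_means(x_monthly)
gdp_growth = [0.02 + 0.5 * x for x in x_annual]  # synthetic target
intercept, slope = ols(gdp_growth, [x_annual])
```

Because the inputs are demeaned before averaging, the estimated intercept plays the role of average annual growth, and the slope coefficients describe fluctuations around it.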
 4: Use the model to estimate monthly growth rates.
Since all approaches can be viewed as an OLS regression with demeaned inputs, the intercept can be interpreted as the average annual growth rate of real GDP between 2002 and 2018. The selected inputs (monthly time series or principal components) produce fluctuations around this average growth rate. For 2019 and 2020, it is assumed that the average growth rate from 2002 to 2018 is representative of underlying growth.
To produce monthly estimates, it is necessary to adjust parameter estimates or monthly series to account for the difference in periodicity. The model constant is adjusted to a monthly frequency based on the monthly compound growth rate that is equivalent to the annual estimate:
$${\widehat{\beta}}_{0,monthly}={({\widehat{\beta}}_{0,annual}+1)}^{1/12}-1$$
To estimate monthly fluctuations around the trend growth rate, the raw input series are used. This allows large fluctuations in the time series to present their full impact when economic shocks, such as recessions or commodity price cycles, impact provincial and territorial economies. The monthly inputs have their variances adjusted to match the annual variance prior to use. OLS regression estimates are based on the ratio of cov(x,y)/var(x). In the current context, aggregation through time reduces the variance of the X matrix. To account for this, the variance of the monthly data is rescaled to match the variance of the annual data for each series as:
$${\sigma}_{monthly,adjusted}=\left(\frac{x-{\mu}_{x}}{{\sigma}_{x,monthly}}\right)\times{\sigma}_{x,annual}+{\mu}_{x}$$
 5: Generate level indexes
The fitted values from the models are estimates for monthly growth in economic activity. They can be transformed into indexes by adding 1 to create a linking value for a chain index. The index level is then calculated by chaining forward from January 2002.
The growth rate estimates have a confidence interval associated with them as there is quantifiable uncertainty that comes from the model. There is also unquantifiable uncertainty that arises from possible model misspecification. To produce a level index, it is necessary to assume that the growth rate estimates are sufficiently accurate that they can be employed for chaining even though errors are compounded through time. This is a strong assumption, but is consistent with the way mean values from survey data for prices, values and quantities are combined to produce chainquantity or chainprice indexes.
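Steps 4 and 5 can be illustrated with a short Python sketch: the annual intercept is converted to its monthly compound equivalent, a monthly input is rescaled to the annual standard deviation, and fitted growth rates are chained into a level index. The function names, the base value of 100 and the sample numbers are assumptions made for illustration only.

```python
import statistics


def monthly_intercept(beta0_annual):
    """Monthly compound growth rate equivalent to an annual growth rate."""
    return (1.0 + beta0_annual) ** (1.0 / 12.0) - 1.0


def rescale_to_annual_variance(x_monthly, sigma_annual):
    """Rescale a monthly series so its standard deviation equals sigma_annual."""
    mean = statistics.fmean(x_monthly)
    sigma_monthly = statistics.pstdev(x_monthly)
    return [(x - mean) / sigma_monthly * sigma_annual + mean for x in x_monthly]


def chain_index(growth, base=100.0):
    """Chain growth rates forward from a base period (assumed base of 100)."""
    index = [base]
    for g in growth:
        index.append(index[-1] * (1.0 + g))
    return index


# Illustrative values only.
beta0_monthly = monthly_intercept(0.02)
x_adjusted = rescale_to_annual_variance([0.01, -0.02, 0.03, 0.0], 0.005)
slope = 0.5  # hypothetical coefficient estimated at an annual frequency
fitted_growth = [beta0_monthly + slope * x for x in x_adjusted]
activity_index = chain_index(fitted_growth)
```

Because each index level is the product of all earlier link values, an error in any single month's fitted growth rate carries forward into every later level, which is the compounding issue noted above.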
The use of multiple models in step 3 will ultimately lead to different flavors of the activity index being presented. In the current context, where statistical models are being used to inform about economic activity in an environment where they cannot be optimally implemented, the creation of multiple versions of the activity index serves an important role. Because the true value for ${y}_{monthly,t}$ is unknown, validating $\widehat{f(.)}$ and ${\widehat{y}}_{monthly,t}$ is challenging. The outputs from the different methods provide a natural form of data confrontation which helps to gauge the adequacy and generalizability of the estimates.
3.1.1 Simple model
The simple model imposes the a priori assumption that total employment, total exports and total retail sales contain the appropriate information for understanding aggregate economic fluctuations. This is likely too strong an assumption as more than three inputs are needed to fully capture the complexities of aggregate economic activity. However, the model is consistent across all provinces and territories, and it is straightforward to understand. It, therefore, has value as a base against which more complex methods can be assessed. The simple approach also represents a method that can be viewed as consistent with the types of projectors that are used to infer gross output movements that are used as inputs for monthly GDP for Canada (Statistics Canada 2020b).
Since the series do not have a natural aggregation structure for combining them, regressions are used to determine relative contributions of the variables rather than an index number formula. Because the inputs are assumed to be the necessary inputs, the three series are included regardless of their statistical significance in regressions.
3.1.2 Principal components analysis
PCA is a variable reduction technique that aims to explain the variance of a given data set using a smaller number of principal components (OECD 2008, Jolliffe 2002). A data set with p variables $X=\left[{x}_{1},{x}_{2},\dots,{x}_{p}\right]$ can be transformed to produce p principal components:
$$Z=AX$$
where
$${z}_{1}={a}_{1,1}{x}_{1}+{a}_{1,2}{x}_{2}+\dots+{a}_{1,p}{x}_{p}$$
$${z}_{2}={a}_{2,1}{x}_{1}+{a}_{2,2}{x}_{2}+\dots+{a}_{2,p}{x}_{p}$$
$$\vdots$$
$${z}_{p}={a}_{p,1}{x}_{1}+{a}_{p,2}{x}_{2}+\dots+{a}_{p,p}{x}_{p}$$
The principal components are constructed as linear combinations of the input variables (the monthly series). The first principal component explains the largest proportion of the variance of the input variables. Its loadings are given by the eigenvector associated with the largest eigenvalue. The second principal component is orthogonal to (uncorrelated with) the first principal component, and its loadings are given by the eigenvector associated with the second largest eigenvalue. It explains the second largest proportion of the variance of the input variables. It can, therefore, be said to measure a different statistical dimension of the available series. The third principal component is orthogonal to the first two and measures the third largest proportion of variance in the data, and so on until the ${p}^{th}$ principal component.
To implement PCA here, the principal components that are used for model estimation and the loadings are determined using the winsorized, demeaned, unit variance monthly time series. The loadings are then applied to the raw, unwinsorized series to produce the raw principal components that are used to predict the monthly growth rates.
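The two-step procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the 5% and 95% winsorizing limits, the use of the winsorized means and standard deviations to scale the raw series, and all function and variable names are assumptions made for the example.

```python
import numpy as np


def winsorize(x, lower=0.05, upper=0.95):
    """Clip each column to its own quantiles (the 5%/95% limits are assumed)."""
    lo = np.quantile(x, lower, axis=0)
    hi = np.quantile(x, upper, axis=0)
    return np.clip(x, lo, hi)


def pca_loadings(x):
    """Eigenvalues (descending) and loadings of the standardized columns of x."""
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


# Synthetic stand-in for the monthly data set: 220 months, 6 series.
rng = np.random.default_rng(0)
common = rng.normal(size=(220, 1))
raw = 0.6 * common + rng.normal(size=(220, 6))
raw[10, 2] = 25.0  # one extreme month that winsorizing dampens

# Step 1: estimate the loadings on the winsorized, standardized series.
wins = winsorize(raw)
eigvals, loadings = pca_loadings(wins)

# Step 2 (assumption): centre and scale the raw series with the winsorized
# moments, then apply the loadings, so outliers pass through to the components.
mu, sd = wins.mean(axis=0), wins.std(axis=0, ddof=1)
raw_components = ((raw - mu) / sd) @ loadings

# Cumulative share of variance explained by the first k components.
cum_share = np.cumsum(eigvals) / eigvals.sum()
```

Estimating the loadings on winsorized data keeps a few extreme months from dominating the components, while applying them to the raw series preserves those months in the resulting index.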
When PCA works well, a large portion of the variance of a data set can be explained by the first few principal components, and only the first few principal components are used for analysis. Unfortunately, in the case of the input data set for the provincial and territorial economies, PCA does not work well for reducing the scope of information in the data set (Table 4). The first principal component typically accounts for less than 10% of the variation in the data set. And, when averaged to produce an annual frequency estimate, the first principal component does not correlate well with annual real GDP growth for most provinces and territories (Table 5).
Principal components (cumulative percentage of variance explained)

Province or territory  1  2  3  4  5  6  7  8  9  10
Newfoundland and Labrador  10.5  16.2  21.0  25.4  29.7  33.3  36.6  39.7  42.6  45.3 
Prince Edward Island  12.1  19.5  25.7  31.2  35.2  39.1  42.7  46.0  49.2  52.2 
Nova Scotia  9.0  15.2  20.6  24.9  28.5  32.0  35.2  38.3  41.1  43.8 
New Brunswick  9.6  16.4  21.3  25.8  29.8  33.6  37.3  40.4  43.4  46.3 
Quebec  7.0  13.0  17.8  21.9  25.5  28.7  31.6  34.3  37.0  39.7 
Ontario  8.9  14.6  19.3  23.1  26.4  29.6  32.5  35.3  37.8  40.2 
Manitoba  7.9  13.6  19.2  23.8  27.3  30.7  34.0  37.2  39.9  42.6 
Saskatchewan  8.6  14.5  19.7  24.4  28.7  32.0  35.1  38.0  40.9  43.6 
Alberta  10.5  16.4  21.5  25.9  29.5  33.0  36.1  39.0  41.7  44.3 
British Columbia  8.6  14.7  20.2  25.3  29.4  32.8  35.8  38.6  41.3  44.0 
Yukon  14.2  24.8  33.8  39.8  45.1  50.2  54.9  59.3  63.4  67.5 
Northwest Territories  17.8  30.6  37.6  44.3  50.3  56.1  61.4  66.6  71.4  76.1 
Nunavut  35.8  49.4  62.6  73.6  83.7  92.5  98.1  99.6  100.0  ...
Note: ... not applicable
Source: Statistics Canada, authors' compilation.
The correlations indicate that, outside of Alberta, British Columbia and Ontario, using only the first principal component will not produce an activity index that provides a suitable measure for determining the performance of the provinces and territories based on aggregating month-to-month fluctuations. As a result, an activity index based only on the first principal component, such as the one produced by the Federal Reserve Board of Chicago (Federal Reserve Board of Chicago 2020, Brave and Butters 2010, Evans and Pham-Kanter 2002), is not pursued here.
Province or territory  Winsorized index  Unwinsorized index

Newfoundland and Labrador  0.349  0.256 
Prince Edward Island  0.199  0.307 
Nova Scotia  0.531  0.491 
New Brunswick  0.709  0.673 
Quebec  0.657  0.661 
Ontario  0.775  0.778 
Manitoba  0.562  0.567 
Saskatchewan  0.477  0.338 
Alberta  0.937  0.940 
British Columbia  0.840  0.848 
Yukon  0.177  0.251 
Northwest Territories  0.423  0.422 
Nunavut  0.355  0.444 
Source: Statistics Canada, authors' compilation. 
While the first principal component has difficulties correlating with annual fluctuations in real GDP, this does not mean there is no information in the first few principal components for explaining real GDP growth. Therefore, to generate a model based on the principal components, regressions are performed on all combinations of the first 10 principal components as regressors for explaining real GDP growth. The regression that maximizes the adjusted R-squared is then selected as the preferred model. This produces 13 models that perform reasonably well for explaining real GDP growth. Moreover, the models generally do well for explaining the 2008 recession and other province- and territory-specific fluctuations.
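The search over all combinations of the first 10 principal components can be sketched as below. This is a hedged illustration on synthetic data: the function names, the 18 annual observations and the simulated coefficients are assumptions for the example, and the components are taken to be already averaged to an annual frequency.

```python
import itertools
import numpy as np


def adjusted_r2(y, x):
    """Adjusted R-squared from an OLS regression of y on x plus an intercept."""
    n, k = x.shape
    design = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def best_subset(y, components):
    """Try every non-empty subset of columns; keep the best adjusted R-squared."""
    best_score, best_cols = -np.inf, ()
    n_comp = components.shape[1]
    for size in range(1, n_comp + 1):
        for cols in itertools.combinations(range(n_comp), size):
            score = adjusted_r2(y, components[:, list(cols)])
            if score > best_score:
                best_score, best_cols = score, cols
    return best_cols, best_score


# Synthetic stand-in: 18 annual observations of 10 annualized components.
rng = np.random.default_rng(1)
annual_pcs = rng.normal(size=(18, 10))
gdp_growth = (2.0 + 1.5 * annual_pcs[:, 0] - 0.8 * annual_pcs[:, 3]
              + rng.normal(scale=0.1, size=18))

chosen, score = best_subset(gdp_growth, annual_pcs)
```

With 10 candidate components there are 1,023 non-empty subsets, so an exhaustive search is cheap. Because the adjusted R-squared only penalizes extra regressors mildly, the selected model can retain components that are not individually significant.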
3.1.3 Least Absolute Shrinkage and Selection Operator
LASSO is the solution to a constrained optimization problem similar to OLS. Under classical linear regression, $X=\left[{x}_{1},{x}_{2},\mathrm{...},{x}_{p}\right]$ is an n × p matrix holding the predictor variables, which are used to explain the variation in a target vector $y$ of length n. The regression coefficients $\widehat{\beta}=\left[\widehat{{\beta}_{0}},\mathrm{...},\widehat{{\beta}_{p}}\right]$ are then the solution to the problem of minimizing the sum of the squared errors between $y$ and a linear combination of the variables in $X$ :
$${\widehat{\beta}}_{OLS}=\underset{\beta}{\text{argmin}}{\displaystyle \sum}_{i=1}^{n}{({y}_{i}-{\beta}_{0}-{\beta}_{1}{x}_{i1}-\mathrm{...}-{\beta}_{p}{x}_{ip})}^{2}$$
LASSO (Tibshirani 1996) is one of a class of estimators that penalize the OLS estimator for overfitting (i.e., including too many variables) through a regularization parameter $\lambda $ . This is similar to using an adjusted R-squared or an information criterion to penalize the inclusion of too many regressors. However, LASSO goes further than penalizing extra regressors when assessing model quality: it also selects the relevant variables. LASSO is the solution to:
$${\widehat{\beta}}_{LASSO}=\underset{\beta}{\text{argmin}}\left[{\displaystyle \sum}_{i=1}^{n}{({y}_{i}-{\beta}_{0}-{\beta}_{1}{x}_{i1}-\mathrm{...}-{\beta}_{p}{x}_{ip})}^{2}+\lambda {\displaystyle \sum}_{j=1}^{p}\left|{\beta}_{j}\right|\right]$$
The $\lambda \ge 0$ parameter controls the strength of the penalty: the larger the value of $\lambda $ , the greater the amount of shrinkage. The LASSO algorithm is only permitted to include values for ${\beta}_{j}$ up to a particular absolute total. As a result, LASSO sets the coefficients of less consequential variables to 0. This can be viewed as similar to the type of result found using a general-to-specific modelling strategy, but one that is applicable on a larger scale. The result is a method for dealing with large data sets where a large number of predictors can be included, and the algorithm will select those whose covariance properties are most important for predicting the target variable ( $y$ ).
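To illustrate how the penalty sets coefficients exactly to zero, consider a special case that is not part of the paper's estimation. Assume there is no intercept and that the predictors have orthonormal columns ( ${X}^{\top}X=I$ ), and let ${b}_{j}={x}_{j}^{\top}y$ denote the OLS coefficient on variable j. Under these assumptions, the LASSO problem above separates by coefficient and has the standard closed-form solution:
$${\widehat{\beta}}_{j,LASSO}=\mathrm{sign}\left({b}_{j}\right)\mathrm{max}\left(\left|{b}_{j}\right|-\frac{\lambda}{2},0\right)$$
For example, with $b=\left(2.0,0.3\right)$ and $\lambda =1$ , the estimates are $\left(1.5,0\right)$ : the larger coefficient is shrunk by $\lambda /2$ and the smaller one is set exactly to zero.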
LASSO comes with its own limitations. When groups of predictor variables are highly correlated with each other, LASSO tends to keep one variable from each group and shrink the coefficients of the other variables to zero. In addition, when the data set has small n and large p, LASSO selects at most n variables before it is saturated, even though there may be more than n variables with nonzero coefficients in the true model.
The Elastic Net method (Zou and Hastie 2005) is an extension of LASSO. By controlling the penalty weight $\alpha $ , the Elastic Net model stabilizes the variable selection from a group of correlated variables and removes the limitation on the number of variables selected. The coefficients are estimated as follows:
$${\widehat{\beta}}_{EN}=\underset{\beta}{\text{argmin}}\left\{{\displaystyle \sum}_{i=1}^{n}{({y}_{i}-{\beta}_{0}-{\beta}_{1}{x}_{i1}-\mathrm{...}-{\beta}_{p}{x}_{ip})}^{2}+\lambda {\displaystyle \sum}_{j=1}^{p}\left[\frac{1}{2}\left(1-\alpha \right){\beta}_{j}^{2}+\alpha \left|{\beta}_{j}\right|\right]\right\}$$
where $0\le \alpha \le 1$ is the penalty weight. With $\alpha $ equal to 1, the Elastic Net is the same as the LASSO model and, with $\alpha $ close to 1, the Elastic Net behaves similarly to LASSO but removes the problematic behavior caused by high correlations among variables.
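A minimal coordinate-descent sketch of this objective is given below, with the sum of squared errors unscaled and the penalty $\lambda \left[\frac{1}{2}\left(1-\alpha \right){\beta}_{j}^{2}+\alpha \left|{\beta}_{j}\right|\right]$ applied to each slope. It is an illustration only: the paper tunes the models with the R package caret, and the function names, synthetic data and values of lambda and alpha used here are assumptions. Library implementations often scale the loss differently, so their lambda values are not directly comparable.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def elastic_net(x, y, lam, alpha, n_iter=500, tol=1e-10):
    """Coordinate descent for the unscaled objective; alpha=1 gives LASSO."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean  # centring handles the intercept
    beta = np.zeros(xc.shape[1])
    col_ss = (xc ** 2).sum(axis=0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(beta)):
            # Residual with variable j's current contribution added back.
            partial_resid = yc - xc @ beta + xc[:, j] * beta[j]
            rho = xc[:, j] @ partial_resid
            new = (soft_threshold(rho, lam * alpha / 2.0)
                   / (col_ss[j] + lam * (1.0 - alpha) / 2.0))
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return y_mean - x_mean @ beta, beta


# Synthetic example: two relevant predictors and five irrelevant ones.
rng = np.random.default_rng(2)
x = rng.normal(size=(60, 7))
y = 3.0 + 2.0 * x[:, 0] - 1.5 * x[:, 1] + rng.normal(scale=0.3, size=60)

intercept_ols, beta_ols = elastic_net(x, y, lam=0.0, alpha=1.0)
intercept_lasso, beta_lasso = elastic_net(x, y, lam=60.0, alpha=1.0)
intercept_enet, beta_enet = elastic_net(x, y, lam=60.0, alpha=0.9)
```

Setting lam to 0 recovers OLS. With a positive lam and alpha equal to 1, coefficients whose partial correlation with the residual falls inside the threshold are set exactly to zero, which is the variable selection behaviour described above.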
The outputs of LASSO, in terms of the number of variables selected and their statistical significance, were carefully studied. In almost all cases where LASSO worked, it appears to include variables in the model that are not statistically significant. To ensure that the relation between the target y and the regressors is statistically justifiable, a stepwise regression with backward selection is applied to the variables selected by LASSO to remove nonsignificant variables from the model.
Stepwise regression is a method that examines the statistical significance of each independent variable within the model. It builds a model by successively adding (forward selection) or removing (backward selection) variables based on the t-statistics of their estimated coefficients. The backward elimination method begins by including all variables in the model; variables are then removed one at a time to test their importance. Those variables that are not statistically significant are removed from the model.
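One common variant of backward elimination can be sketched as follows: refit the regression, drop the variable with the smallest absolute t-statistic, and stop once every remaining variable clears a threshold. The threshold of 2.0, the series names and the synthetic data are assumptions for illustration; the paper does not report the exact rule used.

```python
import numpy as np


def ols_t_stats(x, y):
    """t-statistics of the slope coefficients from OLS with an intercept."""
    n, k = x.shape
    design = np.column_stack([np.ones(n), x])
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k - 1)
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
    return (beta / std_err)[1:]  # drop the intercept


def backward_eliminate(x, y, names, t_crit=2.0):
    """Drop the weakest variable until all remaining |t| values reach t_crit."""
    keep = list(range(x.shape[1]))
    while keep:
        t_abs = np.abs(ols_t_stats(x[:, keep], y))
        weakest = int(np.argmin(t_abs))
        if t_abs[weakest] >= t_crit:
            break
        del keep[weakest]
    return [names[j] for j in keep]


# Synthetic example: series_a and series_b drive y; the others are noise.
rng = np.random.default_rng(3)
names = ["series_a", "series_b", "series_c", "series_d", "series_e"]
x = rng.normal(size=(40, 5))
y = 1.0 + 2.0 * x[:, 0] - 1.5 * x[:, 1] + rng.normal(scale=0.3, size=40)

selected = backward_eliminate(x, y, names)
```

Refitting after each removal matters because the t-statistics of the remaining variables change when a correlated variable leaves the model.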
The LASSO model did not select any variables for New Brunswick, Nova Scotia, Ontario and the Northwest Territories. The Elastic Net method is used for these jurisdictions instead. For the other two territories, Yukon and Nunavut, a manual stepwise regression is performed.
In both methods, LASSO and Elastic Net, cross-validation from the R package caret is used to tune the parameters lambda and alpha. The cross-validation uses a rolling forecasting origin technique (Hyndman and Athanasopoulos 2014) instead of simple random sampling. This technique is specific to time series data sets.
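The rolling forecasting origin idea can be sketched as below: each training window ends just before the observations it is asked to predict, so no future data leak into the fit. The function and its parameter names are illustrative choices for this sketch, not settings reported in the paper.

```python
def rolling_origin_splits(n_obs, initial_window, horizon=1, fixed_window=False):
    """Return (train, test) index lists with a forecast origin that rolls forward."""
    splits = []
    for end in range(initial_window, n_obs - horizon + 1):
        # An expanding window starts at 0; a fixed window slides forward.
        start = end - initial_window if fixed_window else 0
        train = list(range(start, end))
        test = list(range(end, end + horizon))
        splits.append((train, test))
    return splits


# Example: 18 annual observations, first model fitted on 10 years.
splits = rolling_origin_splits(n_obs=18, initial_window=10, horizon=1)
```

Each candidate pair of lambda and alpha would be scored by its average forecast error across these splits, in contrast to random k-fold sampling, which would mix past and future observations.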
4 Monthly index assessment
The three approaches have different strengths and weaknesses, which affect their use (Table 6). The simple index and the LASSO index have the strength that their models are parsimonious, and the indexes they produce are less noisy than PCA-based indexes. However, these indexes are based on a greatly reduced set of variables, which for the simple indexes are often statistically insignificant in annual regressions. These indexes also tend to focus on employment series rather than a broad range of economic activities, and so may not be ideal predictors of monthly activity fluctuations if changes in production are not contemporaneously aligned with labour variables.
The PCA index has the strength that the methodology is sound and well understood. It works for all provinces and territories. However, it produces the noisiest activity indexes, making them difficult to interpret, and in some cases (e.g., Newfoundland and Labrador) the index can decline sharply. The PCA indexes are also combined based on maximizing the adjusted R-squared across regressions. This produces a linear combination of principal components, some statistically significant and some not. These inclusions err on the side of adding information that includes some noise, as it is not clear that data at an annual frequency represent month-to-month variability.
Criteria  Simple index  PCA index  Weighted index  LASSO index

Consistent inputs across geographies  Yes  No  No  No
Consistent model types across geographies  Yes  Yes  Yes  No
Model specification  3 inputs, some insignificant variables  Variable number of principal components, some insignificant variables  Combination of simple and PCA indexes  Variable input selection
Model fit  Goodness of fit can vary across provinces and territories  Generally good in-sample fit  Improved in-sample fit compared with the simple or PCA indexes  Generally good in-sample fit
Interpretability  Easy to understand inputs and contributions  Difficult to understand what contributes to changes; difficult to interpret principal components; high-variability indexes  Difficult to understand what contributes to changes  Inputs based on correlations; interpretable contributions; low-variance index
Model suitability  Models can perform poorly based on statistical significance; inputs align with expectations about important variables  Models can perform poorly based on statistical significance; comprehensive use of input data  Inherits properties of input indexes  Modelling approach not well suited to current setup
Notes: PCA: principal components analysis; LASSO: least absolute shrinkage and selection operator.
Source: Statistics Canada, authors' compilation.
Combining the indexes provides an additional method for their use. Since the simple index is relatively stable but focuses on a limited number of fundamental series, and the PCA index is more variable but includes linear combinations of all inputs, these series are combined to produce a weighted index that has better characteristics than its components. As with the regression coefficients, annual real GDP growth is used as the comparison, as it is the primary source of aggregate economic activity available for the provinces and territories. To combine the PCA index and the simple index, values of $\nu $ between 1% and 100% are used to create weighted indexes as:
$$weighted\_index=\left(1-\nu \right)*simple\_index+\nu *activity\_index$$
where activity\_index is the PCA-based activity index. The value of $\nu $ corresponding to the weighted index that has the highest correlation with real GDP growth is then selected.
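The grid search over the weight can be sketched as follows. This is a hedged illustration: it assumes the weighting is applied to annual growth rates of the two indexes before correlating with annual real GDP growth, which the text does not state explicitly, and the function name and synthetic data are hypothetical.

```python
import numpy as np


def best_weight(simple_growth, pca_growth, gdp_growth):
    """Try nu = 1%, 2%, ..., 100% and keep the weight with the highest correlation."""
    best_nu, best_corr = None, -np.inf
    for pct in range(1, 101):
        nu = pct / 100.0
        weighted = (1.0 - nu) * simple_growth + nu * pca_growth
        corr = np.corrcoef(weighted, gdp_growth)[0, 1]
        if corr > best_corr:
            best_nu, best_corr = nu, corr
    return best_nu, best_corr


# Synthetic annual growth rates for 18 years (illustrative only).
rng = np.random.default_rng(4)
simple_growth = rng.normal(size=18)
pca_growth = simple_growth + rng.normal(scale=2.0, size=18)
gdp_growth = (0.7 * simple_growth + 0.3 * pca_growth
              + rng.normal(scale=0.2, size=18))

nu, corr = best_weight(simple_growth, pca_growth, gdp_growth)
```

Because the grid starts at 1%, the weighted index always places some weight on the PCA-based index, and a weight of 100% reproduces that index exactly.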
The methods for generating monthly indexes generally return similar types of information on economic cycles and major economic shocks in the provinces and territories. As examples, the indexes for Alberta (Panel 1) and Newfoundland and Labrador (Panel 2) are presented below.
For larger economies, such as Alberta, all approaches return similar information on periods of growth or decline, but the magnitude of the cycles can differ depending on methodology. In general, the PCA-based indexes have the largest variability while the simple index has the least. In some cases, such as the PCA index for Newfoundland and Labrador, the model fails to produce a reasonable result. In these cases, the index is deemed not fit for use and will not be made available. Nevertheless, when the indexes appear to have the appropriate characteristics, there is a strong correlation across measures for the implied economic activity, and the movement of the indexes through time corresponds with what is known about provincial and territorial economic performance.
Additionally, comparisons with sub-annual real GDP estimates for Ontario and Quebec show that the year-to-year growth rates are highly correlated but that business cycles can be accentuated in the activity indexes. The indexes, therefore, appear to capture relevant information for economic cycles, periods of stronger or weaker growth and for understanding economic performance. They do not, however, correspond directly with real GDP, and should not be interpreted as a direct measure of real GDP.
Data table for Chart 1
Date (dd/mm/yyyy)  Principal component-based activity index  Simple index  Weighted index  LASSO-based index

index level  
01/01/2002  44.02  46.61  43.8  53.41 
01/02/2002  42.57  47.96  43.94  54.03 
01/03/2002  43.38  46.68  43.61  53.1 
01/04/2002  43.66  47.1  43.95  52.9 
01/05/2002  42.7  47.27  43.63  53.07 
01/06/2002  46.2  48.41  45.75  54.21 
01/07/2002  48.41  48.82  46.9  54.67 
01/08/2002  49.49  51.78  48.98  56.37 
01/09/2002  48.55  51.02  48.18  55.53 
01/10/2002  48.16  51.42  48.23  55.81 
01/11/2002  48.55  51.27  48.32  55.87 
01/12/2002  51.29  51.54  49.6  56.95 
01/01/2003  49.35  51.91  49.03  56.81 
01/02/2003  50.07  50.62  48.62  56.36 
01/03/2003  49.06  52.54  49.28  57.45 
01/04/2003  46.89  51.99  48.06  57.38 
01/05/2003  51.1  51.61  49.67  57.31 
01/06/2003  51.11  51.9  49.84  57.42 
01/07/2003  53.42  51.25  50.42  57.95 
01/08/2003  54.38  50.81  50.55  57.93 
01/09/2003  55.08  51.13  51.01  58.18 
01/10/2003  58.9  53.63  53.94  60.27 
01/11/2003  57.34  53.12  53.04  59.42 
01/12/2003  58.65  51.93  52.87  58.7 
01/01/2004  58.52  53.3  53.62  59.04 
01/02/2004  57.42  52.46  52.71  58.44 
01/03/2004  61.58  54.96  55.77  60.01 
01/04/2004  63.78  57.09  57.86  61.91 
01/05/2004  64.62  58.28  58.88  62.76 
01/06/2004  63.31  58.73  58.64  62.47 
01/07/2004  63.92  59.92  59.57  63.02 
01/08/2004  67.74  61.08  61.74  63.31 
01/09/2004  72.25  64.06  65.2  64.77 
01/10/2004  69.04  63.96  63.93  64.7 
01/11/2004  71.06  65.47  65.59  65.95 
01/12/2004  75.37  68.87  69.24  67.56 
01/01/2005  74.58  68.91  68.95  66.55 
01/02/2005  78.88  70.83  71.74  68.35 
01/03/2005  79.81  71.25  72.34  69.63 
01/04/2005  83.03  74.76  75.64  70.38 
01/05/2005  86.59  78.15  78.99  72.45 
01/06/2005  86.26  77.58  78.53  72.65 
01/07/2005  90.69  80.97  82.21  74.71 
01/08/2005  92.54  82.08  83.57  75.23 
01/09/2005  89.46  82.22  82.48  75.36 
01/10/2005  93.96  83.52  84.98  76 
01/11/2005  99.23  84.08  87.32  76.41 
01/12/2005  105.42  86.85  91.27  77.9 
01/01/2006  112.03  92.56  97.16  81.19 
01/02/2006  124.82  97.78  104.99  86.05 
01/03/2006  129.22  98.89  107.23  85.29 
01/04/2006  131.71  104.09  111.37  87.32 
01/05/2006  151.64  108.36  121.1  90.3 
01/06/2006  157.75  114.43  127.09  92.96 
01/07/2006  155.85  111.55  124.59  92.11 
01/08/2006  154.08  110.1  123.06  91.1 
01/09/2006  158.42  114.46  127.34  94.08 
01/10/2006  164.81  114.27  129.37  95.42 
01/11/2006  189.61  123.48  143.6  99.53 
01/12/2006  182.2  129.73  145.46  100.77 
01/01/2007  190.56  134.1  151.1  102.35 
01/02/2007  198.82  136.14  155.18  103.41 
01/03/2007  198.63  138.32  156.56  105.25 
01/04/2007  199.1  134.66  154.32  102.72 
01/05/2007  190.58  133.71  150.91  102.79 
01/06/2007  206.86  138.25  159.3  104.63 
01/07/2007  206.74  135.7  157.55  104.9 
01/08/2007  221.09  137.19  163.15  106.49 
01/09/2007  212.06  134.81  158.71  104.74 
01/10/2007  212.26  137.38  160.52  105.98 
01/11/2007  222.26  136.46  163.08  105.2 
01/12/2007  214.99  134.42  159.43  105.75 
01/01/2008  213.26  139.22  162.19  107.36 
01/02/2008  210.38  137.17  159.88  105.66 
01/03/2008  229.88  145.51  171.75  109.47 
01/04/2008  227.03  147.08  171.93  111.48 
01/05/2008  217.63  143.91  166.79  110.19 
01/06/2008  224.1  146.92  170.9  111.12 
01/07/2008  228.86  148.4  173.41  111.88 
01/08/2008  231.95  154.54  178.56  113.96 
01/09/2008  245.26  151.44  180.79  113.02 
01/10/2008  251.6  152.78  183.68  113.92 
01/11/2008  226.74  146.33  171.56  110.44 
01/12/2008  199.86  140.83  159.28  106.41 
01/01/2009  184.33  130.35  147.21  101.74 
01/02/2009  152.56  123.85  132.29  98.07 
01/03/2009  132.39  110.29  116.55  91.91 
01/04/2009  120.77  104.76  108.86  88.75 
01/05/2009  117.82  104.35  107.49  86.86 
01/06/2009  111.94  97.83  101.34  84.4 
01/07/2009  109.03  97.57  100.08  83.67 
01/08/2009  96.8  87.94  89.64  79.63 
01/09/2009  103.38  89.66  93.21  80.47 
01/10/2009  95.74  87.62  89.09  78.73 
01/11/2009  98.14  86.53  89.38  78.59 
01/12/2009  101.93  86.01  90.52  78.65 
01/01/2010  104.88  91.76  95.14  80.38 
01/02/2010  101.29  91.16  93.4  79.83 
01/03/2010  106.13  92.36  96  80.49 
01/04/2010  112.77  93.93  99.46  82.04 
01/05/2010  114.31  93.21  99.59  82.08 
01/06/2010  114.69  95.96  101.43  83.79 
01/07/2010  113.32  97.73  102.01  83.77 
01/08/2010  114.88  97.17  102.26  83.54 
01/09/2010  115.52  99.14  103.7  85.08 
01/10/2010  123.24  102.62  108.72  87.49 
01/11/2010  117.97  100.26  105.32  87.09 
01/12/2010  124.56  105.13  110.76  89.72 
01/01/2011  129.69  103.5  111.68  89.83 
01/02/2011  139.07  106.75  117.11  92.08 
01/03/2011  142.32  108.93  119.64  92.54 
01/04/2011  150.84  113.07  125.29  94.7 
01/05/2011  143.64  111.31  121.65  93.88 
01/06/2011  159.07  118.1  131.44  97.47 
01/07/2011  162.9  121.64  135.05  98.81 
01/08/2011  170.97  121.2  137.58  99.78 
01/09/2011  177.27  125.39  142.46  102.34 
01/10/2011  181.7  126.83  144.91  103.3 
01/11/2011  187.92  130.88  149.68  105.69 
01/12/2011  202.43  135.47  157.58  108.6 
01/01/2012  200.04  136.08  157.21  108.2 
01/02/2012  203.94  137.33  159.33  107.93 
01/03/2012  215.35  143.44  167.19  111.99 
01/04/2012  231.23  147.11  174.84  113.99 
01/05/2012  246.96  158.95  188  119.03 
01/06/2012  245.54  163.36  190.57  121.09 
01/07/2012  251.4  163.48  192.57  121.8 
01/08/2012  255.07  167.99  196.83  123.66 
01/09/2012  256.26  166.98  196.53  122.35 
01/10/2012  256.04  163.98  194.41  121.8 
01/11/2012  282.48  168.88  206.21  124.01 
01/12/2012  257.26  162.99  194.31  121.25 
01/01/2013  269.64  166.71  200.8  123.41 
01/02/2013  278.58  174.14  208.79  126.47 
01/03/2013  285.1  178.01  213.54  128.04 
01/04/2013  284.24  178.9  213.88  129.26 
01/05/2013  295.09  181.18  218.89  130.5 
01/06/2013  276.77  180.16  212.47  129.64 
01/07/2013  293.1  185.99  221.72  132.52 
01/08/2013  308.64  190.45  229.75  134.29 
01/09/2013  297.08  187.28  223.91  134.34 
01/10/2013  303.6  193.58  230.35  136.04 
01/11/2013  309.71  194.63  233.02  136.76 
01/12/2013  300.19  192.49  228.53  136.8 
01/01/2014  305.95  195.89  232.71  138.24 
01/02/2014  331.2  199.63  243.35  139.67 
01/03/2014  320  202.51  241.93  139.89 
01/04/2014  306.86  206.6  240.59  141.57 
01/05/2014  338.35  210.18  253.38  144.25 
01/06/2014  360.37  217.84  265.66  145.48 
01/07/2014  362.64  217.87  266.39  146.54 
01/08/2014  365.55  228.27  274.66  149.51 
01/09/2014  380.11  229.51  280.12  152.28 
01/10/2014  375.74  229.38  278.68  153.12 
01/11/2014  352.84  218.69  264.01  149.41 
01/12/2014  366.34  218.46  268.09  149.61 
01/01/2015  361.92  217.89  266.33  151.59 
01/02/2015  306.55  204.05  239.4  143.6 
01/03/2015  299.28  197.25  232.39  140.47 
01/04/2015  295.88  191.31  227.22  139.19 
01/05/2015  281.93  189.24  221.3  136.94 
01/06/2015  260.26  175.06  204.53  131.56 
01/07/2015  252.67  173.31  200.84  130.75 
01/08/2015  230.04  163.62  186.78  126.55 
01/09/2015  231.82  162.33  186.53  124.92 
01/10/2015  211.19  158.65  177.11  123.26 
01/11/2015  199.83  154.68  170.53  120.67 
01/12/2015  194.02  151.04  166.12  118.77 
01/01/2016  168.77  137.88  148.65  112.8 
01/02/2016  155.99  129.98  138.98  108.6 
01/03/2016  157.89  128.71  138.91  108.89 
01/04/2016  152.23  128.23  136.51  106.99 
01/05/2016  122.95  118.24  119.31  102.17 
01/06/2016  123.07  118.25  119.37  101.38 
01/07/2016  123.43  116.97  118.77  100.09 
01/08/2016  119.78  110.53  113.5  97.46 
01/09/2016  120.69  109.76  113.4  97.07 
01/10/2016  118.47  106.32  110.47  96.07 
01/11/2016  114.29  105.29  108.2  94.64 
01/12/2016  115.2  106.63  109.37  96.17 
01/01/2017  114.58  105.27  108.31  95.49 
01/02/2017  122.6  109.75  114.17  98.4 
01/03/2017  119.37  107.19  111.36  96.66 
01/04/2017  122.15  107.16  112.44  96.39 
01/05/2017  132.14  110.7  118.45  98.47 
01/06/2017  136.75  115.5  123.16  101.72 
01/07/2017  134.51  115.82  122.52  102.28 
01/08/2017  134.36  116.62  122.94  102.25 
01/09/2017  133.1  116.94  122.66  102.76 
01/10/2017  138.57  114.98  123.58  102.86 
01/11/2017  139.61  113.7  123.17  102.16 
01/12/2017  136.83  113.16  121.81  102.4 
01/01/2018  137.46  114.2  122.69  102.85 
01/02/2018  138.15  112.28  121.76  102.74 
01/03/2018  145.44  115.03  126.19  104.34 
01/04/2018  141.36  114.17  124.15  105.22 
01/05/2018  143.43  116.15  126.16  106.82 
01/06/2018  135.88  115.2  122.77  106.09 
01/07/2018  127.34  109.22  115.84  102.31 
01/08/2018  132.52  110.19  118.41  103.62 
01/09/2018  127.19  108.21  115.18  101.71 
01/10/2018  124.91  109.43  115.06  102.67 
01/11/2018  140.51  108.39  120.46  104.76 
01/12/2018  122.05  105.9  112.21  102.77 
01/01/2019  121.52  108.71  113.73  103.86 
01/02/2019  119.51  107.99  112.51  104.21 
01/03/2019  117.9  106.8  111.15  103.37 
01/04/2019  120.14  106.63  111.94  103.2 
01/05/2019  118.47  106.15  110.99  103.11 
01/06/2019  120.62  104.4  110.78  102.44 
01/07/2019  115.19  106.68  110.08  102.69 
01/08/2019  112.98  106.6  109.15  103.03 
01/09/2019  113.06  104.25  107.79  103.06 
01/10/2019  112.01  103.32  106.81  102.05 
01/11/2019  101.43  101.08  101.22  100.56 
01/12/2019  100  100  100  100 
01/01/2020  95.17  96.91  96.18  99.03 
01/02/2020  95.22  91.26  92.95  96.41 
01/03/2020  36.95  55.25  47.79  72.21 
Notes: LASSO = Least absolute shrinkage and selection operator. Source: Statistics Canada, authors' calculations. 
Data table for Chart 2
Date (dd/mm/yyyy)  Principal component-based activity index  Simple index  Weighted index  LASSO-based index

index level  
01/01/2002  5562.24  79.43  85.09  80.21 
01/02/2002  5224.48  82.88  87.45  78.31 
01/03/2002  4243.91  82.37  84.54  78.13 
01/04/2002  5626.24  84.65  90.65  80.77 
01/05/2002  6239.63  81.43  89.2  82.66 
01/06/2002  5657.11  81.77  88.28  83.57 
01/07/2002  6292.55  82.05  90.01  84.18 
01/08/2002  5814.76  82.49  89.4  83.92 
01/09/2002  6267.22  82.16  90.14  85.7 
01/10/2002  7667.45  82.73  93.69  89.83 
01/11/2002  7413.24  82.92  93.41  91.66 
01/12/2002  6900.72  82.58  92.12  90.72 
01/01/2003  6240.78  82.16  90.4  92.15 
01/02/2003  7319.34  81.15  91.79  93.04 
01/03/2003  8199.98  81.5  93.79  95.4 
01/04/2003  8855.62  80.74  94.17  91.16 
01/05/2003  4712.39  81.16  87.98  88.97 
01/06/2003  4942.04  81.57  89  91.22 
01/07/2003  6042.07  81.75  92.14  92.7 
01/08/2003  5297.57  82.13  90.8  90.71 
01/09/2003  5638.03  81.89  91.45  92.14 
01/10/2003  5705.55  82.33  92.03  89.2 
01/11/2003  6637.96  80.51  92.55  91.53 
01/12/2003  6354.64  80.75  92.2  92.96 
01/01/2004  6377.55  83.85  95.26  94.08 
01/02/2004  5984.01  83.32  93.86  93.11 
01/03/2004  5781.09  84.31  94.33  92.74 
01/04/2004  8976.38  83.1  101  93.32 
01/05/2004  1043.44  82.13  86.61  89.81 
01/06/2004  1157.32  84.15  89.84  92.31 
01/07/2004  919.54  83  86.03  92.46 
01/08/2004  1002.15  82.87  87.07  93.35 
01/09/2004  1046.54  82.96  87.73  93.51 
01/10/2004  956.18  83.13  86.75  91.98 
01/11/2004  1082.97  83.53  88.83  93.92 
01/12/2004  1096.23  82.95  88.47  93.98 
01/01/2005  1116.83  82.75  88.53  93.83 
01/02/2005  1076.08  83.74  88.95  95.46 
01/03/2005  1129  82.7  88.67  94.95 
01/04/2005  1016.3  83.22  87.82  93.02 
01/05/2005  1098.25  81.74  87.55  91.89 
01/06/2005  1176.3  82.06  88.78  92.55 
01/07/2005  1088.7  82.72  88.39  93.77 
01/08/2005  1129.19  82.43  88.62  94.73 
01/09/2005  1369.72  82.15  91.19  97.08 
01/10/2005  1143.89  81.95  88.75  96.92 
01/11/2005  1272.18  82.39  90.65  97.13 
01/12/2005  1155.15  82.72  89.71  96.55 
01/01/2006  1105.85  83.92  90.24  97.59 
01/02/2006  1022.72  83.8  89.11  96.3 
01/03/2006  1259.88  82.88  91.38  98.09 
01/04/2006  1598.84  85.67  97.68  100.91 
01/05/2006  1009.01  84.61  91.25  97.97 
01/06/2006  920.31  84.23  89.7  96.91 
01/07/2006  1062.74  84.44  91.97  99.3 
01/08/2006  1117.95  84.42  92.67  100.63 
01/09/2006  962.48  84.22  90.55  98.02 
01/10/2006  891.39  83.85  89.21  93.66 
01/11/2006  515.95  84.98  84.6  95.24 
01/12/2006  639.62  85.12  87.75  97.2 
01/01/2007  718.78  85.51  89.72  98.18 
01/02/2007  648.56  85.72  88.6  96 
01/03/2007  670.28  86.26  89.51  99.94 
01/04/2007  592.42  85.91  87.65  100.93 
01/05/2007  762.1  86.63  92.04  101.5 
01/06/2007  769.86  87.65  93.1  103.11 
01/07/2007  518.87  87.32  88.25  101.85 
01/08/2007  487.58  87.06  87.22  100.81 
01/09/2007  499.02  87.96  88.3  100.92 
01/10/2007  584.42  86.96  89.71  104.28 
01/11/2007  276.1  87.64  83.21  105.64 
01/12/2007  420.52  87.82  89.89  105.83 
01/01/2008  397.36  88.9  90.08  107.45 
01/02/2008  378.08  87.45  88.18  106.61 
01/03/2008  381.3  88.94  89.56  107.61 
01/04/2008  343.09  89.24  88.48  106.06 
01/05/2008  382.96  89.14  89.94  109 
01/06/2008  484.06  90.99  95.08  109.61 
01/07/2008  552.41  91.47  97.52  111.14 
01/08/2008  564.08  90.41  96.87  110.34 
01/09/2008  496.56  91.21  95.86  108.02 
01/10/2008  401.15  91.39  93.26  105.07 
01/11/2008  395.31  89.98  91.83  106.9 
01/12/2008  317.61  88.5  87.84  102.41 
01/01/2009  315.15  87.29  86.72  100.27 
01/02/2009  375.72  90.33  91.79  100.7 
01/03/2009  309.45  89.43  88.58  98.69 
01/04/2009  381  88.94  91.24  98.34 
01/05/2009  227.61  89.52  86.23  95.2 
01/06/2009  179.03  88.21  82.4  95.84 
01/07/2009  187.74  88.7  83.39  94.13 
01/08/2009  194.6  90.51  85.29  95.36 
01/09/2009  186.3  89.79  84.18  92.3 
01/10/2009  200.51  90.83  85.97  93.41 
01/11/2009  236.33  90.01  87.61  95.4 
01/12/2009  211.69  89.78  86.05  93.21 
01/01/2010  192.67  90.29  85.3  93.07 
01/02/2010  191.56  90.71  85.57  92.63 
01/03/2010  187.44  90.41  85.05  91.48 
01/04/2010  222.62  90.44  87.47  96.67 
01/05/2010  186.31  90.45  85.34  97.23 
01/06/2010  181.82  91.75  86.07  92.9 
01/07/2010  195.99  90.85  86.36  93.1 
01/08/2010  179.67  91.77  86.03  92.81 
01/09/2010  150.84  91.84  84.01  93.32 
01/10/2010  180.48  92.38  86.9  97.67 
01/11/2010  221.21  92.94  90.29  99.72 
01/12/2010  181.91  93.74  88.55  97.65 
01/01/2011  206.77  93.54  90.21  98.85 
01/02/2011  187.98  92.59  88.2  98.28 
01/03/2011  201  95.23  91.25  98.67 
01/04/2011  140.74  94.21  86.32  102.23 
01/05/2011  162.37  93.96  88.11  102.39 
01/06/2011  157.52  93.24  87.14  98.21 
01/07/2011  144.83  93.42  86.23  100.57 
01/08/2011  140.45  93.81  86.15  98.95 
01/09/2011  166.69  95.38  89.78  99.3 
01/10/2011  138.77  95.91  87.95  101.57 
01/11/2011  164.42  96.13  90.56  104.76 
01/12/2011  147.8  94.85  88.17  99.43 
01/01/2012  142.08  96.22  88.74  98.02 
01/02/2012  148.76  95.74  88.98  99.26 
01/03/2012  138.32  96.41  88.58  98.81 
01/04/2012  130.13  97.23  88.44  100.15 
01/05/2012  148.26  96.3  89.56  99.07 
01/06/2012  139.09  96.74  89.08  99.27 
01/07/2012  147.07  96.41  89.59  101.18 
01/08/2012  139.89  95.43  88.16  100.48 
01/09/2012  143.71  95.52  88.59  98.6 
01/10/2012  144.87  94.49  87.88  98.07 
01/11/2012  76.93  97.55  84.12  98.14 
01/12/2012  105.77  98.14  89.29  97.44 
01/01/2013  110.1  98.33  89.98  96.9 
01/02/2013  124.72  97.25  90.93  101.37 
01/03/2013  126.11  97.57  91.34  96.88 
01/04/2013  117.26  97.79  90.55  95.77 
01/05/2013  121.19  100.05  92.78  96.3 
01/06/2013  140.83  97.78  93.25  95.59 
01/07/2013  147.7  98.68  94.67  95.71 
01/08/2013  150.69  99.45  95.58  93.94 
01/09/2013  145.86  99.56  95.21  93.83 
01/10/2013  123.04  99.44  92.88  95.26 
01/11/2013  125.26  98.31  92.23  95.27 
01/12/2013  129.96  99.35  93.58  98.35 
01/01/2014  116.14  99.82  92.46  97.39 
01/02/2014  112.75  100.65  92.71  97.06 
01/03/2014  118.67  101.05  93.76  98.02 
01/04/2014  161  98.79  96.99  96.92 
01/05/2014  105.85  100.86  93.73  96.98 
01/06/2014  107.89  100.51  93.73  96.43 
01/07/2014  114.48  102.25  95.96  96.48 
01/08/2014  100.08  100.96  93.12  93.48 
01/09/2014  98.67  98.89  91.31  92.31 
01/10/2014  110.99  99.04  93.14  91.93 
01/11/2014  120.98  99.41  94.69  91.65 
01/12/2014  100.72  98.89  91.89  92.84 
01/01/2015  86.14  98.79  89.82  90.75 
01/02/2015  95.56  99.16  91.58  91.14 
01/03/2015  106.77  98.31  92.52  90.5 
01/04/2015  78.52  99.25  89.6  89.76 
01/05/2015  94.69  99.1  92.25  92.56 
01/06/2015  109.12  100.51  95.48  91.86 
01/07/2015  109.15  98.69  94.01  91.64 
01/08/2015  97.06  101.17  94.45  88.82 
01/09/2015  80.65  100.61  91.61  90.74 
01/10/2015  94.85  99.78  93.39  90.89 
01/11/2015  94.74  98.67  92.5  93.48 
01/12/2015  77.41  99.14  90.33  87.75 
01/01/2016  78.59  98.31  89.89  85.93 
01/02/2016  73.8  98.83  89.48  84.6 
01/03/2016  62.92  98.26  87.06  82.98 
01/04/2016  70.72  99.04  89.26  83.76 
01/05/2016  84.59  98.91  91.79  84.32 
01/06/2016  85.66  99.87  92.72  87.33 
01/07/2016  76.91  97.76  89.63  82.94 
01/08/2016  80.6  98.9  91.17  82.78 
01/09/2016  96.01  98.86  93.75  85.34 
01/10/2016  96.03  100.01  94.69  83.33 
01/11/2016  101.63  100.81  96.15  85.67 
01/12/2016  85.76  99.91  93.18  87.36 
01/01/2017  104.53  100.64  96.81  89.11 
01/02/2017  117.8  99.94  98.08  89.2 
01/03/2017  99.03  99.56  95.42  87.82 
01/04/2017  107.97  98.53  95.87  85.07 
01/05/2017  87.54  98.89  93.45  84.81 
01/06/2017  67.77  99.19  90.53  81.75 
01/07/2017  67.93  98.47  90  82.18 
01/08/2017  76.06  98.5  91.64  85 
01/09/2017  79.49  99.06  92.7  88.71 
01/10/2017  75.91  99.16  92.16  85.73 
01/11/2017  76.33  98.36  91.6  87.48 
01/12/2017  88.05  98.08  93.49  89.36 
01/01/2018  90.06  97.6  93.42  87.6 
01/02/2018  84.04  99.14  93.73  87.7 
01/03/2018  74.03  98.21  91.31  86.61 
01/04/2018  79.26  98.29  92.35  85.5 
01/05/2018  87.79  97.65  93.32  88.91 
01/06/2018  84.41  97.19  92.41  90.13 
01/07/2018  88.39  99.65  95.05  90.52 
01/08/2018  83.11  98.62  93.36  91.14 
01/09/2018  78.01  98.75  92.61  89.41 
01/10/2018  80.64  98.88  93.19  87.89 
01/11/2018  54.06  97.45  87.43  87.69 
01/12/2018  72.03  97.61  91.91  90.2 
01/01/2019  50.7  98.78  88.76  89 
01/02/2019  57.45  98.4  90.25  89.15 
01/03/2019  68.69  99.5  93.75  91.92 
01/04/2019  73.79  99.2  94.56  92.19 
01/05/2019  66.16  98.65  92.64  92.53 
01/06/2019  72.85  97.98  93.52  92.59 
01/07/2019  75.07  98.67  94.5  94.2 
01/08/2019  82.25  97.96  95.28  94.37 
01/09/2019  81.95  97.08  94.5  94.25 
01/10/2019  83.12  99.87  97.01  95.23 
01/11/2019  95.79  99.41  98.85  96.55 
01/12/2019  100  100  100  100 
01/01/2020  106.7  98.56  99.78  99.31 
01/02/2020  83.86  98.9  96.87  96.97 
01/03/2020  116.72  95.31  99.58  96.15 
Notes: LASSO = Least absolute shrinkage and selection operator. Source: Statistics Canada, authors' calculations. 
5 Conclusion
Measures of aggregate economic activity for economies are important for informing decisions about fiscal and monetary policy, for determining the characteristics of business cycles and for examining economic performance. In this study, four indexes of provincial and territorial economic activity based on different methodological approaches are estimated and presented. The methodologies are based on a simple model; PCA; a weighted combination of the simple index and the PCA index; and LASSO. In most cases, all approaches produce roughly similar results. However, the degree of cyclicality and the variance of month-to-month changes can differ significantly. As a general rule, PCA produces the greatest variability and the largest cycles while the simple index is the most stable.
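The comparison of month-to-month variability across indexes can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, the index levels are invented for the example, and the population standard deviation of percentage changes is assumed as the variability measure.

```python
from statistics import pstdev


def monthly_pct_changes(index_values):
    """Return month-to-month percentage changes of an index series."""
    return [
        100.0 * (curr - prev) / prev
        for prev, curr in zip(index_values, index_values[1:])
    ]


def change_volatility(index_values):
    """Population standard deviation of month-to-month percentage changes."""
    return pstdev(monthly_pct_changes(index_values))


# Hypothetical index levels, invented for illustration only.
volatile_index = [100.0, 110.0, 99.0, 108.9]
stable_index = [100.0, 101.0, 100.0, 101.0]

volatility = {
    "volatile": change_volatility(volatile_index),
    "stable": change_volatility(stable_index),
}
```

Applied to each index for a given province or territory, a measure of this kind would make the ranking of the approaches by variability explicit.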
Based on the properties of the methodologies and their outputs, the simple index is the most consistent across provinces and territories. It is also the easiest to interpret in terms of variable contributions and justification for variable inclusion. However, parameter values are often statistically insignificant and the input series are chosen as much for their economic importance as for their presence in all jurisdictions. As a result, these models offer a more limited approach for examining aggregate economic activity, but also present a basis for comparisons to more complex models.
Indexes based on PCA appear to offer a more complete sense of how activity is evolving over time, but it is unclear at the moment how the principal components should be interpreted. Because of this, and because the PCA indexes have the largest variability, they present a trade-off between overall use of input series and interpretability.
Weighting the simple index and PCA index produces a result that has a superior correlation with annual GDP fluctuations. The weighted combination continues to have more variability than the simple index. Since the PCA is included, it is also not as easy to interpret as the simple index, but likely provides a better measure of aggregate activity than its constituent parts.
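A weighted combination of this kind, and its comparison with annual GDP, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the helper names, the weight of 0.5, the synthetic monthly series and the annual GDP growth figures are all hypothetical, and the monthly index is assumed to be annualized by simple averaging before growth rates are compared.

```python
from math import sqrt


def weighted_index(simple, pca, weight_simple):
    """Combine two monthly indexes with weights that sum to one."""
    if not 0.0 <= weight_simple <= 1.0:
        raise ValueError("weight_simple must lie between 0 and 1")
    if len(simple) != len(pca):
        raise ValueError("series must have the same length")
    return [
        weight_simple * s + (1.0 - weight_simple) * p
        for s, p in zip(simple, pca)
    ]


def annual_averages(monthly, months_per_year=12):
    """Average a monthly series over each complete year."""
    n_years = len(monthly) // months_per_year
    return [
        sum(monthly[y * months_per_year:(y + 1) * months_per_year])
        / months_per_year
        for y in range(n_years)
    ]


def growth_rates(levels):
    """Period-over-period percentage growth of a series of levels."""
    return [100.0 * (c - p) / p for p, c in zip(levels, levels[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Synthetic data, invented for illustration only (four years of months).
simple = [100.0 + 0.2 * m for m in range(48)]
pca = [100.0 + 0.2 * m + (4.0 if m % 2 else -4.0) for m in range(48)]
gdp_growth = [2.1, 2.6, 2.3]  # hypothetical annual real GDP growth, per cent

combined = weighted_index(simple, pca, weight_simple=0.5)
index_growth = growth_rates(annual_averages(combined))
corr = pearson(index_growth, gdp_growth)
```

With only a handful of annual observations, a correlation computed this way is imprecise, which is consistent with the caution expressed about model performance.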
The LASSO index performs well when compared to annual real GDP, but the model setup is not as well suited to the situation encountered when trying to estimate the activity indexes. In particular, the relatively small number of observations limits the ability of the algorithms to perform cross-validation. Moreover, while the input series are a distinct subset of the input data set, and their contributions can be generated in a straightforward manner, there is no theoretical reason why the selected variables are important, and this limits the model’s interpretability.
Given the strengths and weaknesses present between the suitability of the models, their performance and examinations of their outputs, the assessments made thus far suggest that the simple indexes or LASSO indexes present results related to a set of fundamental inputs (often heavily influenced by employment series), that the PCA indexes relate more to some form of short-term activity (but the signal is noisy), and that the weighted index presents a compromise between the two.
The indexes as currently estimated are correlated with annual measures of real GDP and sub-annual measures of real GDP for Ontario and Quebec, but they should not be interpreted as being a real GDP measure. The indexes display greater variability and cyclicality than real GDP measures, and are constituted from measures of gross outputs, employment, relative prices and important ratios such as the unemployment rate. This makes the indexes appropriate for understanding economic activity, but they are not real GDP. Moreover, the indexes do not inform about differing levels of economic activity between provinces and territories.
The indexes are also based on an input data set and modelling strategies that are not ideal. Numerous assumptions must be imposed to produce the indexes, any of which may be a source of important measurement errors. As a consequence, the indexes presented here should be viewed as experimental, and are subject to revision or replacement as future research improves the processes and/or tests the assumptions for their validity.
At the current time, the correlations between the different approaches, their positive correlation with provincially produced measures of sub-annual real GDP and examinations of their properties against known provincial and territorial economic performance support their use as indicators of business cycles, for understanding the magnitude of shocks relative to a province’s or territory’s history and for understanding how regional economies are progressing. Interprovincial comparisons are also supported, but with the caveat that model performance is difficult to understand in all situations, and that level comparisons across provinces are not possible using the index values.
References
Brave, Scott, and R. Andrew Butters. 2010. “Chicago Fed National Activity Index Turns Ten — Analyzing Its First Decade of Performance.” Chicago Fed Letter, no. 273 (April). Federal Reserve Bank of Chicago. https://www.chicagofed.org/~/media/publications/chicagofedletter/2010/cflapril2010273pdf.pdf.
Statistics Canada. 2020a. “Gross domestic product (GDP) at basic prices, by industry, monthly (36-10-0434).” Statistics Canada, January 24, 2020. https://www150.statcan.gc.ca/n1/pub/13607x/2016001/230eng.htm (accessed June 2, 2020).
Statistics Canada. 2020b. “Gross domestic product (GDP) at basic prices, by industry, monthly (36-10-0434).” Statistics Canada, July 31, 2019. https://www.statcan.gc.ca/eng/statisticalprograms/document/1301_D1_V3 (accessed June 2, 2020).
Federal Reserve Bank of Chicago. 2020. “Chicago Fed National Activity Index (CFNAI) Current Data.” Federal Reserve Bank of Chicago, June 22, 2020. https://www.chicagofed.org/research/data/cfnai/currentdata (accessed June 2, 2020).
Evans, Charles L., Chin Te Liu, and Genevieve Pham-Kanter. 2002. “The 2001 Recession and the Chicago Fed National Activity Index: Identifying Business Cycle Turning Points.” Economic Perspectives 26 (3). Federal Reserve Bank of Chicago: 26–43. https://www.chicagofed.org/~/media/publications/economicperspectives/2002/3qepart2pdf.pdf.
Hyndman, R.J., and G. Athanasopoulos. 2018. Forecasting: Principles and Practice, 2nd edition. Melbourne, Australia: OTexts. OTexts.com/fpp2 (accessed June 2, 2020).
Jolliffe, I.T. 2002. Principal Component Analysis, Second Edition. New York, NY: Springer-Verlag New York Inc.
United Nations (UN), European Commission (EC), International Monetary Fund (IMF), Organisation for Economic Co-operation and Development (OECD), and World Bank (WB). 2009. System of National Accounts, 2008. New York: United Nations. Available at: https://unstats.un.org/unsd/nationalaccount/docs/sna2008.pdf (accessed June 2, 2020).
Organisation for Economic Co-operation and Development (OECD). 2008. Handbook on Constructing Composite Indicators: Methodology and User Guide. Organisation for Economic Co-operation and Development. https://www.oecd.org/sdd/42495745.pdf (accessed June 2, 2020).
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society: Series B (Methodological) 58 (1): 267–288.
Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2): 301–320. doi:10.1111/j.1467-9868.2005.00503.x.