Analytical Studies: Methods and References
Experimental Economic Activity Indexes for Canadian Provinces and Territories: Experimental Measures Based on Combinations of Monthly Time Series

by Nada Habli, Ryan Macdonald and Jesse Tweedle
Economic Analysis Division, Statistics Canada No. 027

Release date: August 19, 2020

Acknowledgements

The authors would like to express their thanks to Etienne Saint-Pierre, Yvan Clermont, Danny Leung and Brenda Bugge for their support and guidance during the course of the project. They would also like to thank Steve Matthews, Fred Picard and Michel Ferland for their comments and input on modelling, and Philip Smith and Karen Wilson for their advice, suggestions and support.

Abstract

This paper explores methods for creating a monthly indicator of economic activity for the provinces and territories. It begins by constructing a dataset for the provinces and territories composed of monthly series about the labour force, including wages and employment; international trade; output measures such as manufacturing sales or electricity production; and prices (consumer, housing and electricity). Where necessary, the series are seasonally adjusted, linked and deflated to create continuous time series from January 2002 to April 2020. Variable reduction methods are then applied to the monthly provincial and territorial dataset to create the experimental monthly provincial and territorial economic indicator indexes. Three methods are examined: principal components analysis (PCA), least absolute shrinkage and selection operator (LASSO) and a simple index composed of three pre-determined series (total employment, total exports and retail sales). A weighted average of the simple index and the PCA index is also constructed. In general, the indexes produced track provincial economic activity reasonably well, following cyclical movements of provincial and territorial economies. However, the set-up for the models is not ideal because annual data are used to produce model parameters, and this leads to uncertainty about model performance. As a result, multiple indexes are reported. A quality assessment provides an indication of the strengths and limitations of the indexes with respect to different uses.

1 Introduction

Timely measures of economic activity are critical for understanding how economies perform, and for informing policy responses to macroeconomic fluctuations. The onset of the pandemic due to the emergence of the SARS-CoV-2 virus emphasized this, as well as the need for geography-specific measures. Presently, Canada has a robust system for producing up-to-date measures of activity, such as real gross domestic product (GDP), at the national level. For provincial and territorial economies, monthly information on labour markets or particular activities such as manufacturing or international trade is available, but a monthly measure of aggregate economic activity is not.

Under normal circumstances, producing a new set of aggregate economic indicators for the provinces and territories would require creating exploratory measures, possibly launching new surveys or expanding existing statistical collection activities, as well as creating the infrastructure necessary to produce and disseminate the indicators on an ongoing basis. These changes require time to implement. In the current context, where the SARS-CoV-2 pandemic is accentuating the need for monthly regional economic indicators, the time necessary to constitute a new statistical program means that this approach is infeasible.

A more timely approach is to adopt a statistical-model based strategy to quickly create exploratory measures of provincial economic activity. While this approach introduces a new measure of economic activity in a timely fashion, the trade-off is that the methods employed are not used in their ideal situations, that inputs for models use currently available series without an ability to tailor them to the creation of an indicator, and that the models are somewhat atheoretical. That is, the models look for correlations in data rather than employing economic theory to help guide their construction. Lastly, the models employed here typically have a different set of inputs for each province or territory. As a result, consistency in model structures cannot be maintained across provinces and territories, and this may affect inter-jurisdictional comparisons.

To estimate the activity indexes, a twofold strategy is applied. First, a balanced panel data set of monthly provincial and territorial time series is constructed from publicly available Common Output Data Repository (CODR) tables. This data set spans January 2002 to the latest available data points (currently April 2020). Second, three methods for transforming the monthly series into an indicator of provincial and territorial economic activity are applied to the data set: a simple model, principal components analysis (PCA) and least absolute shrinkage and selection operator (LASSO). A fourth index that combines the index from the simple model and the index based on PCA is also produced. The results from these approaches form the exploratory measures of provincial aggregate economic activity presented here.

The experimental indexes are based on the use of statistical models with an input data set that contains a series of approximations and assumptions. Notable among the assumptions for the input data set are: the use of national deflators to produce real provincial series when provincial deflators do not exist, the assumption that the growth rates for all series are covariance stationary, and the assumption that winsorizing data at the 5th and 95th percentiles is appropriate.

For the models, annual real GDP growth across the provinces and territories is used as a measure of aggregate provincial activity against which the derived index measures of economic activity are compared. This produces a situation where a small number of annual observations are being used to estimate model parameters that are used to infer monthly fluctuations. The small number of observations affects the ability of models to provide sufficient inference, and the use of annual data may mask important differences in the monthly timing of changes in prices, outputs and employment.

To allow for the possibility that using annual data may affect model performance and inference, the simple model assumes a set of variables is appropriate and uses OLS to determine their relative contributions. For PCA, a maximized adjusted R-squared from OLS regressions is used to select which of the first ten principal components should be included. In both of these situations the OLS regressions include variables that are statistically insignificant. For LASSO, statistical significance of potential input variables determines the final model. However, for LASSO, for some economies, no input variables are selected. In these cases, elastic net or a general-to-specific modelling strategy is employed instead.

Since the input data set contains a number of assumptions and approximations, and since the statistical models are used in an imperfect setting, the experimental indexes that are created should be viewed as approximations to aggregate economic activity rather than as exact measures. The activity indexes tend to present considerable monthly volatility, and when compared with the real GDP estimates produced by the Government of Quebec or the Government of Ontario, they can exhibit greater cyclical volatility.

Across the measures, the simple models and the LASSO models tend to rely more on employment series as inputs. The simple index has the strength that it is straightforward to understand, and is the most comparable across provinces and territories since it holds the variables used in the activity index constant regardless of the jurisdiction. The PCA indexes appear to capture more fluctuations related to other aspects of overall activity (e.g. sales or exports), but they also have greater variability. The weighted combination of the simple, researcher-defined indexes and the PCA indexes has a better correlation with real GDP growth than the constituent indexes.

Currently, the weighted indexes or the LASSO based indexes appear to offer the best trade-off between signals present in the data and variability of monthly series. This assessment is based on how well the models appear to conform with the set-up used for estimation as well as on the behavior of the indexes. In some cases, such as the PCA based index for Newfoundland and Labrador, an anomaly is present that calls the veracity of the index into question. When these types of situations occur, the data are deemed to be unfit for use at the present time, and the estimates for that index are not provided. The assessments are ongoing and may lead to changes in recommended uses or indexes as further development of the input data set, model refinements or alternative model strategies are explored.

The remainder of this paper is structured as follows. Section 2 discusses the creation of the input data set as well as the assumptions employed to filter and transform the data prior to modelling. Section 3 describes the models employed, the assumptions embedded in the models, as well as their strengths, their limitations and their application for creating monthly indexes. Section 4 provides analysis of the model performance and illustrates the resulting indexes. Section 5 concludes.

2 Input data set

The input data set comprises province- and territory-specific measures of economic activity and Canada-level deflators, except in instances where province- and territory-specific deflators are available. The monthly input series are drawn from monthly surveys of labour, outputs and prices (Table 1). In some cases, active tables do not contain continuous information from January 2002 to the present. In these cases, historical tables are used to backcast active tables.


Table 1
Input data tables of provincial and territorial time series
Table number Table title
12100099 Merchandise imports and exports, customs-based, by Harmonized commodity description and coding system (HS) section, Canada, provinces and territories, United States, states
12100119 International merchandise trade by province, commodity, and Principal Trading Partners
14100036 Actual hours worked by industry, monthly, unadjusted for seasonality
14100201 Employment by industry, monthly, unadjusted for seasonality
14100222 Employment, average hourly and weekly earnings (including overtime), and average weekly hours for the industrial aggregate excluding unclassified businesses, monthly, seasonally adjusted
14100287 Labour force characteristics, monthly, seasonally adjusted and trend-cycle, last 5 months
14100292 Labour force characteristics by territory, three-month moving average, seasonally adjusted and unadjusted, last 5 months
14100355 Employment by industry, monthly, seasonally adjusted and unadjusted, and trend-cycle, last 5 months
16100048 Manufacturing sales by industry and province, monthly (dollars unless otherwise noted)
18100004 Consumer Price Index, monthly, not seasonally adjusted
18100204 Electric power selling price index, monthly
18100205 New housing price index, monthly
20100008 Retail trade sales by province and territory
20100074 Wholesale trade, sales
21100019 Monthly survey of food services and drinking places
24100002 Number of vehicles travelling between Canada and the United States
25100001 Electric power statistics, with data for years 1950 - 2007
25100015 Electric power generation, monthly generation by type of electricity
34100003 Building permits, values by activity sector
34100066 Building permits, by type of structure and type of work
34100158 Canada Mortgage and Housing Corporation, housing starts, all areas, Canada and provinces, seasonally adjusted at annual rates, monthly

Deflators are primarily collected from Canada-level survey programs for measures of economic activity (Table 2). Statistics Canada does not currently produce province- and territory-specific deflators for current dollar measures of international trade, manufacturing sales, wholesale sales, retail sales or the Monthly Survey of Food Services. To collect deflators, Canada-level price indexes are taken from surveys when they are available. In the case of manufacturing, the implicit price index is derived as the ratio of the nominal value to the real value. Deflators exist for nominal series back to January 2002, except for manufacturing. For manufacturing, the Industrial Product Price Index (IPPI) by industry, the IPPI by product group, and the ratio of nominal to real monthly GDP are used as projectors for manufacturing deflators.


Table 2
Data tables for deflator time series
Table number Table title
12100128 International merchandise trade, by commodity, price and volume indexes, monthly
16100013 Real manufacturing sales, orders, inventory owned and inventory to sales ratio, 2012 dollars, seasonally adjusted
16100047 Manufacturers' sales, inventories, orders and inventory to sales ratios, by industry (dollars unless otherwise noted)
18100004 Consumer Price Index, monthly, not seasonally adjusted
18100029 Industrial product price index, by major product group, monthly
18100032 Industrial product price index, by industry, monthly
20100003 Wholesale sales, price and volume, by industry, seasonally adjusted
20100038 Retail trade, sales, chained dollars and price index, inactive
20100051 Wholesale trade, sales, chained dollars and price index, inactive
20100078 Retail sales, price, and volume, seasonally adjusted
36100434 Gross domestic product (GDP) at basic prices, by industry, monthly

Combining the provincial data and deflator data to create the input data set involves four steps:

  1. Assemble data. The CODR tables are filtered to select the desired data. Only variables with continuous data are selected. Series subject to suppression are excluded; however, series with 0 values are included. Series subject to suppression are typically smaller value series, meaning they contain less information for aggregate economic fluctuations. Although methods exist for interpolating these data points, should the suppression occur in the latest month, a forecast would be required to infill the suppressed data point. Given that the index is being constructed to provide information about the largest shock to affect the Canadian economy since the Second World War, it is considered inadvisable to include series with forecasted values.
  2. Seasonally adjust data series. Not all series are provided on a seasonally adjusted basis. A total of 966 series are seasonally adjusted for use in the indicator. Given the high number of series, the automatic options of the ARIMA-SEATS algorithm from the R package seasonal are employed to remove seasonality. Seasonal parameters are determined based on the available monthly time series up to December 2019. In the event the time series do not span the whole period, such as discontinued series, the data up to the most recent period are used to determine the seasonal adjustment options.
  3. Link data. After seasonal adjustment, data are linked as necessary. When overlapping periods exist, links are made by treating data as indexes and chaining backwards over historical periods. If overlapping periods do not exist, level values are joined “as is”.
  4. Apply deflators. Where necessary, deflators are applied to current dollar series or to price series to produce relative price variables.
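
The linking and deflation steps above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names `link_backwards` and `deflate`, the use of dictionaries keyed by "YYYY-MM" labels, and the choice to rescale at the earliest overlapping period are all assumptions made for the example.

```python
def link_backwards(active, historical):
    """Extend `active` backwards using the movements of `historical`.

    Both arguments map a sortable period label (e.g. "2003-12") to a level.
    If the series overlap, the historical series is rescaled to match the
    active series in the earliest overlapping period (assumed chaining rule).
    If they do not overlap, the historical levels are joined as is.
    """
    overlap = sorted(set(active) & set(historical))
    if overlap:
        first = overlap[0]
        factor = active[first] / historical[first]
    else:
        factor = 1.0
    start = min(active)
    linked = {p: v * factor for p, v in historical.items() if p < start}
    linked.update(active)
    return dict(sorted(linked.items()))


def deflate(nominal, deflator, base=100.0):
    """Divide a current-dollar series by a price index whose base equals `base`."""
    return {p: nominal[p] / (deflator[p] / base) for p in nominal}


historical = {"2003-10": 50.0, "2003-11": 55.0, "2003-12": 60.0}
active = {"2003-12": 120.0, "2004-01": 126.0}
linked = link_backwards(active, historical)
real = deflate({"2004-01": 126.0}, {"2004-01": 105.0})
```

Rescaling preserves the month-to-month growth rates of the historical segment, which is what matters once the series are later converted to growth rates.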

The full data set prior to filtering contains province- and territory-specific seasonally adjusted and not-seasonally adjusted series, nominal series and deflated series. For modelling provincial and territorial economic activity, series in natural units (e.g. employees, hours worked), deflated series, rates (e.g. the unemployment rate), and relative prices are selected.

3 Estimation

The objective is to estimate monthly time series for aggregate economic activity in the provinces and territories as a function of available monthly provincial and territorial economic time series:

$$y_{\text{monthly},t} = f(\text{monthly time series})$$

The most commonly cited measure of aggregate economic activity is the real GDP measure described in the 2008 System of National Accounts (United Nations 2010). For the Canada-wide level, the function for transforming input series into monthly real GDP is based on industry-specific methodologies and benchmarks that have been developed and enriched over time. The methodologies use a number of data sources to estimate changes in gross output that serve as proxies for real GDP. The proxies are combined with annual real GDP benchmarks to produce the monthly real GDP series. In many cases, a direct measure of gross output is available. However, in some industries, direct measures of output are not available and estimates are constructed using alternative data sources, such as employment (Statistics Canada 2020b). This methodology forms the function $f(\cdot)$ into which the monthly series are placed to produce monthly real GDP for Canada.

The challenge for measuring provincial and territorial aggregate economic activity is that the function $f(\cdot)$ for the provinces and territories is unknown, and that the desired series, $y_{\text{monthly},t}$, is also unknown.

Since the goal is to estimate $y_{\text{monthly},t}$, if a close substitute existed, it could be used as an instrument for the true $y_{\text{monthly},t}$. The function $f(\cdot)$ for combining monthly information to produce an aggregate measure of economic activity could then be approximated.

While a close monthly substitute for $y_{\text{monthly},t}$ does not exist, the Canadian System of Macroeconomic Accounts does produce a measure of provincial and territorial GDP at an annual frequency. The approach followed here, therefore, assumes that the annual data can be used as an instrument to help inform the structure of $f(\cdot)$ when the monthly growth rates from the available input series are averaged within a calendar year and used to estimate parameter values. The parameters of $f(\cdot)$ and the variance characteristics of the monthly input series are then adjusted to account for the difference in periodicity.

Monthly indexes of aggregate economic activity are then constructed from the estimated monthly values of $y_{\text{monthly},t}$.
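
As a minimal sketch of how estimated monthly growth rates could be turned into an index level, the example below cumulates them from a base value. The paper does not state the exact construction at this point, so the cumulative product, the base of 100 and the function name `growth_to_index` are assumptions for illustration.

```python
def growth_to_index(monthly_growth, base=100.0):
    """Cumulate month-to-month growth rates (0.01 means 1%) into index levels.

    The first element of the result is the base period; each later element
    applies one month of estimated growth to the previous level.
    """
    index = [base]
    for g in monthly_growth:
        index.append(index[-1] * (1.0 + g))
    return index


levels = growth_to_index([0.01, -0.02])
```

With these inputs the index rises from 100 to 101 and then falls by 2% of 101.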

Using a lower frequency variable as the instrument for monthly economic activity lowers the number of degrees of freedom and introduces issues related to the timing of monthly versus annual fluctuations. These issues will have consequences for the ability of the models to produce a monthly estimate of aggregate economic activity. The small number of degrees of freedom and the covariance around the 2008 recession will tend to produce statistically insignificant parameters for $f(\cdot)$ if the regressors are not materially affected by business cycle fluctuations. Moreover, models may tend toward selecting a smaller number of inputs at an annual frequency than is necessary for explaining monthly variation, as important monthly fluctuations may be masked through aggregation to a lower frequency. Additionally, changes related to prices, sales/output and employment will appear to occur contemporaneously at an annual frequency. However, at a monthly frequency these fluctuations may not align.

Given that a conversion from annual to monthly frequency is necessary to generate the desired estimated values, the modelling strategy includes some approaches that err on the side of capturing more variation in the data rather than focusing solely on model parsimony. This does not mean that models are produced in an ad hoc fashion. Rather, selection criteria, such as maximizing an adjusted R-squared, are employed alongside more traditional general-to-specific-type modelling strategies.
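
To make the adjusted R-squared criterion concrete, the sketch below applies the standard formula for a regression with an intercept. The sample size of 17 assumes one annual observation per year from 2002 to 2018, and the R-squared values are invented purely to show how the penalty behaves with so few observations.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R-squared for a regression with an intercept.

    Penalizes R-squared for the number of regressors (excluding the constant)
    relative to the number of observations.
    """
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Hypothetical fits on 17 annual observations: three regressors versus six.
small_model = adjusted_r_squared(0.80, 17, 3)
large_model = adjusted_r_squared(0.82, 17, 6)
```

In this made-up case the larger model has the higher R-squared but the lower adjusted R-squared, so a rule that maximizes adjusted R-squared would keep the three-regressor model.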

In summary, because the functional form for transforming monthly data into an aggregate measure of economic activity is unknown, and because the actual values for the series $y_{\text{monthly},t}$ are also unknown, the best that is possible is to approximate the true $y_{\text{monthly},t}$. This means that the series $\hat{y}_{\text{monthly},t}$ will have the flavour of what a real GDP series could look like, but it will not be a true measure of monthly real GDP. Instead, it will be an estimate of an economic activity index that corresponds to macroeconomic conditions in the provinces and territories.

3.1 Estimation strategy

The function $f(\cdot)$ is a set of instructions for transforming a large number of inputs into a single series. In this paper, it is assumed that the function can be approximated based on a linear combination of inputs, and that the inputs can be selected either by selecting a subset of the available data or by creating a combination of all input data series. Below, three approaches are explored: 1) a simple model; 2) PCA; and 3) LASSO. The simple model and the LASSO model fall in the former category while PCA falls in the latter category.

The assumptions and implementation of the models are discussed in detail below. Across modelling strategies, the same steps are followed to estimate index values:

The input data set has 1,341 series that are distributed unevenly across the provinces and territories (Table 3). Not all series have equal utility for modelling aggregate economic activity. In cases where seasonal adjustment failed, the series are removed. Similarly, series with 0 values are removed. These tend to be series where 0 values are interspersed with nominal values. In these cases, seasonally adjusted values can be negative, growth rate or log-difference transformations do not work, and the series have questionable value for use as an ongoing indicator of economic activity. Overall, 198 variables are dropped for these reasons.


Table 3
Number of input variables
Province or territory | Starting vectors | With 0 | SA failed | Dropped | Top 15 | Top 25 | Vectors for LASSO | Vectors for PCA
Newfoundland and Labrador | 108 | 3 | 8 | 8 | 15 | 25 | 83 | 73
Prince Edward Island | 101 | 12 | 15 | 20 | 12 | 20 | 65 | 57
Nova Scotia | 115 | 2 | 14 | 14 | 15 | 25 | 84 | 74
New Brunswick | 115 | 1 | 12 | 13 | 15 | 24 | 81 | 72
Quebec | 131 | 1 | 10 | 11 | 18 | 29 | 97 | 86
Ontario | 133 | 0 | 11 | 11 | 18 | 30 | 99 | 87
Manitoba | 120 | 1 | 6 | 6 | 17 | 28 | 94 | 83
Saskatchewan | 118 | 0 | 11 | 11 | 16 | 26 | 85 | 75
Alberta | 122 | 0 | 8 | 8 | 17 | 27 | 91 | 81
British Columbia | 122 | 0 | 8 | 8 | 17 | 28 | 92 | 81
Yukon | 63 | 22 | 20 | 26 | 6 | 10 | 31 | 27
Northwest Territories | 59 | 27 | 29 | 31 | 5 | 7 | 23 | 21
Nunavut | 46 | 29 | 29 | 32 | 2 | 4 | 11 | 9
Total | 1,353 | 98 | 181 | 199 | 173 | 283 | 936 | 826

The filtered input data set then contains 1,143 series. However, the series are typically reported in levels (e.g. hours worked or manufacturing sales in chained dollars) and present strong trends over the sample period. To account for the trends, the series are transformed to month-to-month growth rates. These growth rates will ultimately be compared to measures of real GDP growth, and they have the advantage of being bounded below at -100% for the maximum decline. The growth rates for the series are assumed to be covariance stationary.
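
A small sketch of the growth-rate transformation follows. The function name `month_to_month_growth` and the use of plain lists are assumptions for the example; the second call shows the lower bound of -100% when a level falls to zero.

```python
def month_to_month_growth(levels):
    """Return simple month-to-month growth rates (0.10 means 10%)."""
    return [curr / prev - 1.0 for prev, curr in zip(levels, levels[1:])]


growth = month_to_month_growth([100.0, 110.0, 55.0])
collapse = month_to_month_growth([100.0, 0.0])
```

A series with n levels yields n - 1 growth rates, so the first month of the sample has no growth observation.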

To use the series in estimation, all series are demeaned and scaled to have unit variance at a monthly frequency. This normalization is applied to prevent variables with naturally larger unit values from affecting results. Because the monthly time series can have high variability, and because periods of economic shocks such as recessions can produce aberrant data points, all series are winsorized (or top and bottom coded) prior to estimation. This prevents extreme data points from affecting results. For creating monthly indexes, the parameter values from models based on the winsorized data are combined with un-winsorized data, which permits larger values to have their full influence when large shocks occur.
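
The normalization and winsorizing steps can be illustrated as follows. The 5th and 95th percentile cut-offs are those stated earlier in the paper; the use of the population standard deviation and of the "inclusive" interpolation rule in `statistics.quantiles` are assumptions, since the paper does not say which conventions were used.

```python
import statistics


def standardize(values):
    """Demean a series and scale it to unit variance (population convention)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values below the lower percentile or above the upper percentile."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


z = standardize([1.0, 2.0, 3.0, 4.0])
w = winsorize([float(v) for v in range(1, 22)])  # 1.0, 2.0, ..., 21.0
```

In the second call only the smallest and largest observations are pulled in, to 2.0 and 20.0 respectively; interior values are unchanged.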

Finally, the noisiest series are removed. For PCA, the top 25% of series by variance are removed by province while for LASSO the top 15% of series by variance are removed. These thresholds are arbitrary, but their imposition was found to improve the ability of the models to inform on aggregate activity (i.e. improve the signal relative to noise), and to improve consistency of results across methods. After adjusting for high variance series there are a total of 928 input variables for LASSO and 820 for PCA. Nunavut has the fewest series available while Ontario and Quebec have the most.
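
The variance filter could look something like the sketch below. Several details are assumptions because the paper does not specify them: which version of each series the variance is computed on (here, whatever values are passed in, for example unscaled growth rates), how the cut-off count is rounded (here, truncated), and the function name `drop_noisiest`.

```python
import statistics


def drop_noisiest(series_by_name, share):
    """Drop the given share (e.g. 0.25) of series with the highest variance.

    `series_by_name` maps a series name to its list of values. The number of
    series dropped is truncated to an integer (an assumed rounding rule).
    """
    variances = {name: statistics.pvariance(vals)
                 for name, vals in series_by_name.items()}
    ranked = sorted(variances, key=variances.get)  # lowest variance first
    n_drop = int(len(ranked) * share)
    keep = set(ranked[:len(ranked) - n_drop])
    return {name: vals for name, vals in series_by_name.items() if name in keep}


example = {
    "a": [0.0, 0.1, -0.1],
    "b": [0.0, 0.2, -0.2],
    "c": [0.0, 0.3, -0.3],
    "d": [0.0, 5.0, -5.0],  # the noisiest series
}
kept = drop_noisiest(example, 0.25)
```

With four series and a 25% threshold, one series is removed and the remaining three keep their original order.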

The estimation is initiated by combining real GDP growth with annual averages of monthly input series for years 2002 to 2018. Real GDP growth is not scaled or demeaned. Using real GDP growth as the target variable and the annual averages of monthly series as input variables, the functional form of $f(\cdot)$ is estimated. In all models employed, it is assumed that a linear combination of input variables can be used to transform the multiplicity of input variables into a single measure of aggregate activity. In the case of the simple model and LASSO, a subset of the variables is used directly. In the case of PCA, the first ten principal components are employed as the starting point. It is also assumed that OLS can be used to generate contributions for combining input variables. By using regression methods to combine inputs, a further assumption that economic structures are, on average, the same over the entire sample period is imposed.

Since all approaches can be viewed as an OLS regression with demeaned inputs, the intercept can be interpreted as the average annual growth rate of real GDP between 2002 and 2018. The selected inputs (monthly time series or principal components) produce fluctuations around this average growth rate. For 2019 and 2020, it is assumed that the average growth rate from 2002 to 2018 is representative of underlying growth.

To produce monthly estimates, it is necessary to adjust parameter estimates or monthly series to account for the difference in periodicity. The model constant is adjusted to a monthly frequency based on the monthly compound growth rate that is equivalent to the annual estimate:

$$\hat{\beta}_{0,\text{monthly}} = \left(\hat{\beta}_{0,\text{annual}} + 1\right)^{1/12} - 1$$
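
As a numerical illustration of this conversion, the sketch below uses a hypothetical annual constant of 2% (not an estimate from the paper) and checks that compounding the monthly rate over 12 months recovers the annual rate.

```python
def annual_to_monthly(rate_annual):
    """Monthly compound growth rate equivalent to an annual growth rate."""
    return (1.0 + rate_annual) ** (1.0 / 12.0) - 1.0

beta0_annual = 0.02  # hypothetical average annual real GDP growth of 2%
beta0_monthly = annual_to_monthly(beta0_annual)

# Compounding the monthly rate for 12 months should return the annual rate.
compounded = (1.0 + beta0_monthly) ** 12 - 1.0
```

The monthly rate is slightly below one twelfth of the annual rate because of compounding.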

To estimate monthly fluctuations around the trend growth rate, the raw input series are used. This allows large fluctuations in the time series to present their full impact when economic shocks, such as recessions or commodity price cycles, impact provincial and territorial economies. The monthly inputs have their variances adjusted to match the annual variance prior to use. OLS regression estimates are based on the ratio of cov(x,y)/var(x). In the current context, aggregation through time reduces the variance of the X matrix. To account for this, the variance of the monthly data is re-scaled to match the variance of the annual data for each series as:

$$\sigma_{\text{monthly},\text{adjusted}} = \left(\frac{x - \mu_x}{\sigma_{x,\text{monthly}}}\right) \times \sigma_{x,\text{annual}} + \mu_x$$
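
The rescaling can be sketched as below. The monthly series is hypothetical, and the sketch assumes that the annual standard deviation is computed from calendar-year averages of the monthly series and that population (rather than sample) standard deviations are used.

```python
from statistics import mean, pstdev

def annual_averages(x_monthly):
    """Average complete blocks of 12 monthly observations."""
    return [mean(x_monthly[i:i + 12]) for i in range(0, len(x_monthly) - 11, 12)]

def rescale_to_annual_variance(x_monthly, sd_annual):
    """Standardize the monthly series, then restore its mean with the annual spread."""
    mu = mean(x_monthly)
    sd_monthly = pstdev(x_monthly)
    return [(x - mu) / sd_monthly * sd_annual + mu for x in x_monthly]

# Hypothetical series: noisy month to month, with a shift between two years.
x = [(-1) ** i + (0.5 if i >= 12 else -0.5) for i in range(24)]

sd_annual = pstdev(annual_averages(x))
x_adj = rescale_to_annual_variance(x, sd_annual)
```

After the adjustment the monthly series keeps its mean and its pattern of movements, but its dispersion matches that of the annual averages used to estimate the coefficients.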

The fitted values from the models are estimates for monthly growth in economic activity. They can be transformed into indexes by adding 1 to create a linking value for a chain index. The index level is then calculated by chaining forward from January 2002.
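
A minimal sketch of the chaining step follows. The growth rates are hypothetical and the base value of 100 for the first period is an assumption; the paper does not state the reference value used for January 2002.

```python
def chain_index(growth_rates, base=100.0):
    """Chain fitted growth rates forward from a base-period level."""
    levels = [base]
    for g in growth_rates:
        # The linking value is 1 plus the estimated growth rate.
        levels.append(levels[-1] * (1.0 + g))
    return levels

# Hypothetical fitted monthly growth rates for the months after the base period.
fitted_growth = [0.01, -0.02, 0.005]
index_levels = chain_index(fitted_growth)
```

Because each level is the product of all earlier linking values, an error in any one growth estimate carries forward into every later index level.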

The growth rate estimates have a confidence interval associated with them as there is quantifiable uncertainty that comes from the model. There is also unquantifiable uncertainty that arises from possible model misspecification. To produce a level index, it is necessary to assume that the growth rate estimates are sufficiently accurate that they can be employed for chaining even though errors are compounded through time. This is a strong assumption, but is consistent with the way mean values from survey data for prices, values and quantities are combined to produce chain-quantity or chain-price indexes.

The use of multiple models in step 3 will ultimately lead to different flavors of the activity index being presented. In the current context, where statistical models are being used to inform about economic activity in an environment where they cannot be optimally implemented, the creation of multiple versions of the activity index serves an important role. Because the true value for $y_{\text{monthly},t}$ is unknown, validating $\widehat{f(\cdot)}$ and $\hat{y}_{\text{monthly},t}$ is challenging. The outputs from the different methods provide a natural form of data confrontation which helps to gauge the adequacy and generalizability of the estimates.

3.1.1 Simple model

The simple model imposes the a priori assumption that total employment, total exports and total retail sales contain the appropriate information for understanding aggregate economic fluctuations. This is likely too strong an assumption as more than three inputs are needed to fully capture the complexities of aggregate economic activity. However, the model is consistent across all provinces and territories, and it is straightforward to understand. It, therefore, has value as a base against which more complex methods can be assessed. The simple approach also represents a method that can be viewed as consistent with the types of projectors that are used to infer gross output movements that are used as inputs for monthly GDP for Canada (Statistics Canada 2020b).

Since the series do not have a natural aggregation structure for combining them, regressions are used to determine relative contributions of the variables rather than an index number formula. Because the inputs are assumed to be the necessary inputs, the three series are included regardless of their statistical significance in regressions.

3.1.2 Principal components analysis

PCA is a variable reduction technique that aims to explain the variance of a given data set using a smaller number of principal components (OECD 2008, Jolliffe 2002). A data set with p variables, $X = [x_1, x_2, \dots, x_p]$, can be transformed to produce p principal components:

$$Z = AX$$

where

$$
\begin{aligned}
z_1 &= a_{1,1} x_1 + a_{1,2} x_2 + \dots + a_{1,p} x_p \\
z_2 &= a_{2,1} x_1 + a_{2,2} x_2 + \dots + a_{2,p} x_p \\
&\;\;\vdots \\
z_p &= a_{p,1} x_1 + a_{p,2} x_2 + \dots + a_{p,p} x_p
\end{aligned}
$$

The principal components are constructed as linear combinations of the input variables (the monthly series). The first principal component explains the largest proportion of the variance of the input variables. Its loadings are given by the eigenvector associated with the largest eigenvalue. The second principal component is orthogonal to (uncorrelated with) the first principal component, and its loadings are given by the eigenvector associated with the second largest eigenvalue. It explains the second largest proportion of the variance of the input variables. It can, therefore, be said to measure a different statistical dimension of the available series. The third principal component is orthogonal to the first two and measures the third largest proportion of variance in the data, and so on until the $p^{th}$ principal component.

To implement PCA here, the principal components that are used for model estimation and the loadings are determined using the winsorized, demeaned, unit variance monthly time series. The loadings are then applied to the raw, un-winsorized series to produce the raw principal components that are used to predict the monthly growth rates.
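
The two-stage use of the loadings can be sketched as below for the first principal component only. This is an illustration under stated assumptions rather than the authors' code: the three input series are hypothetical, the winsorization bound of plus or minus 2 is assumed, and power iteration is used here simply as a dependency-free way to obtain the leading eigenvector.

```python
from statistics import mean, pstdev

def standardize(col):
    mu, sd = mean(col), pstdev(col)
    return [(v - mu) / sd for v in col]

def winsorize(col, bound):
    return [min(max(v, -bound), bound) for v in col]

def covariance_matrix(cols):
    """Population covariance matrix of a list of series."""
    n = len(cols[0])
    centred = []
    for c in cols:
        m = mean(c)
        centred.append([v - m for v in c])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in centred] for ci in centred]

def first_loading(cov, iters=500):
    """Power iteration: unit eigenvector of `cov` with the largest eigenvalue."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in cov]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def scores(cols, loading):
    """Apply loadings to series to obtain the principal component through time."""
    return [sum(a * c[t] for a, c in zip(loading, cols)) for t in range(len(cols[0]))]

# Hypothetical monthly series; the last month is a large negative shock.
a = [0.5, -0.2, 0.3, 0.8, -1.0, 0.1, 0.4, -0.6, 0.9, -0.3, 0.2, -5.0]
b = [0.4, -0.1, 0.2, 0.9, -0.8, 0.0, 0.5, -0.7, 1.0, -0.2, 0.1, -4.0]
c = [0.1, 0.3, -0.2, 0.0, 0.2, -0.1, 0.3, -0.3, 0.1, 0.0, -0.2, 0.2]

z_raw = [standardize(s) for s in (a, b, c)]
z_wins = [winsorize(z, 2.0) for z in z_raw]  # assumed bound, not from the paper

cov = covariance_matrix(z_wins)   # loadings come from the winsorized data
loading = first_loading(cov)
lam = sum(loading[i] * sum(cov[i][j] * loading[j] for j in range(3)) for i in range(3))
share = lam / sum(cov[i][i] for i in range(3))  # share of variance explained

pc1_wins = scores(z_wins, loading)  # used for estimation
pc1_raw = scores(z_raw, loading)    # used for the monthly index
```

In this sketch the shock in the final month is larger in the raw principal component than in the winsorized one, which is the intended effect of applying the loadings to the un-winsorized series.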

When PCA works well, a large portion of the variance of a data set can be explained by the first few principal components, and only the first few principal components are used for analysis. Unfortunately, in the case of the input data set for the provincial and territorial economies, PCA does not work well for reducing the scope of information in the data set (Table 4). The first principal component typically accounts for less than 10% of the variation in the data set. And, when averaged to produce an annual frequency estimate, the first principal component does not correlate well with annual real GDP growth for most provinces and territories (Table 5).


Table 4
Percent of variation by principal component

| Province or territory | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Newfoundland and Labrador | 10.5 | 16.2 | 21.0 | 25.4 | 29.7 | 33.3 | 36.6 | 39.7 | 42.6 | 45.3 |
| Prince Edward Island | 12.1 | 19.5 | 25.7 | 31.2 | 35.2 | 39.1 | 42.7 | 46.0 | 49.2 | 52.2 |
| Nova Scotia | 9.0 | 15.2 | 20.6 | 24.9 | 28.5 | 32.0 | 35.2 | 38.3 | 41.1 | 43.8 |
| New Brunswick | 9.6 | 16.4 | 21.3 | 25.8 | 29.8 | 33.6 | 37.3 | 40.4 | 43.4 | 46.3 |
| Quebec | 7.0 | 13.0 | 17.8 | 21.9 | 25.5 | 28.7 | 31.6 | 34.3 | 37.0 | 39.7 |
| Ontario | 8.9 | 14.6 | 19.3 | 23.1 | 26.4 | 29.6 | 32.5 | 35.3 | 37.8 | 40.2 |
| Manitoba | 7.9 | 13.6 | 19.2 | 23.8 | 27.3 | 30.7 | 34.0 | 37.2 | 39.9 | 42.6 |
| Saskatchewan | 8.6 | 14.5 | 19.7 | 24.4 | 28.7 | 32.0 | 35.1 | 38.0 | 40.9 | 43.6 |
| Alberta | 10.5 | 16.4 | 21.5 | 25.9 | 29.5 | 33.0 | 36.1 | 39.0 | 41.7 | 44.3 |
| British Columbia | 8.6 | 14.7 | 20.2 | 25.3 | 29.4 | 32.8 | 35.8 | 38.6 | 41.3 | 44.0 |
| Yukon | 14.2 | 24.8 | 33.8 | 39.8 | 45.1 | 50.2 | 54.9 | 59.3 | 63.4 | 67.5 |
| Northwest Territories | 17.8 | 30.6 | 37.6 | 44.3 | 50.3 | 56.1 | 61.4 | 66.6 | 71.4 | 76.1 |
| Nunavut | 35.8 | 49.4 | 62.6 | 73.6 | 83.7 | 92.5 | 98.1 | 99.6 | 100.0 | ... |

Note: ... = not applicable. Column headers 1 to 10 are principal components.

The correlations indicate that outside of Alberta, British Columbia and Ontario, using only the first principal component will not produce an activity index that provides a suitable measure for determining the performance of the provinces and territories based on aggregating month-to-month fluctuations. As a result, an activity index based only on the first principal component, such as the one produced by the Federal Reserve Board of Chicago (Federal Reserve Board of Chicago 2020, Brave and Butters 2010, Evans and Pham-Kanter 2002), is not pursued here.


Table 5
Correlation between first principal component and annual real gross domestic product (GDP) growth

| Province or territory | Winsorized index | Un-Winsorized index |
|---|---|---|
| Newfoundland and Labrador | -0.349 | -0.256 |
| Prince Edward Island | -0.199 | -0.307 |
| Nova Scotia | 0.531 | 0.491 |
| New Brunswick | -0.709 | -0.673 |
| Quebec | 0.657 | 0.661 |
| Ontario | 0.775 | 0.778 |
| Manitoba | 0.562 | 0.567 |
| Saskatchewan | 0.477 | 0.338 |
| Alberta | 0.937 | 0.940 |
| British Columbia | 0.840 | 0.848 |
| Yukon | 0.177 | 0.251 |
| Northwest Territories | 0.423 | 0.422 |
| Nunavut | 0.355 | 0.444 |

While the first principal component has difficulties correlating with annual fluctuations in real GDP, this does not mean there is no information in the first few principal components for explaining real GDP growth. Therefore, to generate a model based on the principal components, regressions are performed on all combinations of the first 10 principal components as regressors for explaining real GDP growth. The regression that maximizes the adjusted R-squared is then selected as the preferred model. This produces 13 models that perform reasonably well for explaining real GDP growth. Moreover, the models generally do well for explaining the 2008 recession and other, province- and territory-specific fluctuations.
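
The search over combinations of principal components can be sketched as follows. The data are hypothetical (three components and eight annual observations instead of ten components and the 2002 to 2018 sample), and the small Gaussian-elimination solver is included only to keep the example self-contained.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def adjusted_r2(y, cols):
    """OLS of y on an intercept plus `cols`; returns the adjusted R-squared."""
    n, k = len(y), len(cols)
    X = [[1.0] + [c[t] for c in cols] for t in range(n)]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k + 1)] for i in range(k + 1)]
    Xty = [sum(r[i] * yt for r, yt in zip(X, y)) for i in range(k + 1)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    ybar = sum(y) / n
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))

def best_subset(y, components):
    """Try every non-empty combination and keep the highest adjusted R-squared."""
    best_score, best_idx = float("-inf"), ()
    for size in range(1, len(components) + 1):
        for idx in combinations(range(len(components)), size):
            score = adjusted_r2(y, [components[i] for i in idx])
            if score > best_score:
                best_score, best_idx = score, idx
    return best_score, best_idx

# Hypothetical annualized components and GDP growth (component 2 is irrelevant).
pc1 = [1, -1, 1, -1, 1, -1, 1, -1]
pc2 = [1, 1, -1, -1, 1, 1, -1, -1]
pc3 = [1, -1, -1, 1, 1, -1, -1, 1]
noise = [0.1, 0.1, 0.1, 0.1, -0.1, -0.1, -0.1, -0.1]
gdp = [2.0 + 1.5 * a - 0.8 * c + e for a, c, e in zip(pc1, pc3, noise)]

score, chosen = best_subset(gdp, [pc1, pc2, pc3])
```

With ten candidate components the same loop evaluates 1,023 regressions per province or territory, which remains inexpensive with 17 annual observations.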

3.1.3 Least Absolute Shrinkage and Selection Operator

LASSO is the solution to a constrained optimization problem similar to OLS. Under classical linear regression, $X = [x_1, x_2, \dots, x_p]$ is an $n \times p$ matrix holding the predictor variables, which are used to explain the variation in a target vector $y$ of length $n$. The coefficients for the regression, $\hat{\beta} = [\hat{\beta}_0, \dots, \hat{\beta}_p]$, are then the solution to the problem that seeks to minimize the sum of the squared errors between $y$ and a linear combination of the variables in $X$:

$$\hat{\beta}_{OLS} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_p x_{ip} \right)^2$$

LASSO (Tibshirani 1996) is one of a class of estimators that seeks to penalize the OLS estimator for overfitting (i.e., including too many variables) through its regularization parameter $\lambda$. It is similar to using an adjusted R-squared or information criterion to penalize for including too many regressors. However, it goes further than penalizing for extra regressors when assessing model quality: it also selects the relevant variables. LASSO is the solution to:

$$\hat{\beta}_{LASSO} = \underset{\beta}{\operatorname{argmin}} \left[ \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_p x_{ip} \right)^2 + \lambda \sum_{j=1}^{p} \left| \beta_j \right| \right]$$

The parameter $\lambda \geq 0$ controls the strength of the penalty: the larger the value of $\lambda$, the greater the amount of shrinkage. The LASSO algorithm is only permitted to include values for $\beta_j$ up to a particular absolute total. As a result, LASSO sets the coefficients of less consequential variables to 0. This can be viewed as similar to the type of result found using a general-to-specific modelling strategy, but applicable on a larger scale. The result is a method for dealing with large data sets where a large number of predictors can be included, and the algorithm will select those whose covariance properties are most important for predicting the target variable ($y$).
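
To see why the penalty produces coefficients of exactly zero, consider the special case of a single demeaned predictor $x$ and a demeaned target $y$, so that the intercept drops out. This worked case is an illustration derived from the LASSO objective as written in this section; it is not part of the original estimation. Minimizing $\sum_{i=1}^{n} (y_i - \beta x_i)^2 + \lambda |\beta|$ gives:

$$\hat{\beta}_{LASSO} = \operatorname{sign}\left(\hat{\beta}_{OLS}\right) \max\left( \left|\hat{\beta}_{OLS}\right| - \frac{\lambda}{2 \sum_{i=1}^{n} x_i^2},\; 0 \right), \qquad \hat{\beta}_{OLS} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$

The OLS coefficient is shrunk toward zero by a fixed amount, and it is set exactly to zero whenever $\left|\sum_{i=1}^{n} x_i y_i\right| \leq \lambda / 2$.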

LASSO comes with its own limitations. When groups of predictor variables are highly correlated with each other, LASSO tends to keep one variable from each group and shrink the coefficients of the other variables to zero. In other cases, when the data set has small n and large p, LASSO selects at most n variables before it is saturated. However, there may be more than n variables with non-zero coefficients in the true model.

The Elastic Net method (Zou and Hastie 2005) is an extension of LASSO. By controlling the penalty weight $\alpha$, the Elastic Net model stabilizes the variable selection from a group of correlated variables and removes the limitation on the number of variables selected. The coefficients are estimated as follows:

$$\hat{\beta}_{EN} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_p x_{ip} \right)^2 + \lambda \sum_{j=1}^{p} \left[ \frac{1}{2} (1 - \alpha) \beta_j^2 + \alpha \left| \beta_j \right| \right] \right\}$$

where $0 \leq \alpha \leq 1$ is the penalty weight. With $\alpha$ equal to 1, the Elastic Net is the same as the LASSO model and, with $\alpha$ close to 1, the Elastic Net behaves similarly to LASSO, but removes the problematic behavior caused by high correlations among variables.
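
A coordinate-descent sketch of this estimator is given below. It is an illustration, not the routine used by the authors (who rely on R): it assumes demeaned predictors and a demeaned target so that the intercept can be ignored, it uses the objective exactly as written in this section, and the data and the value of lambda are hypothetical. Setting alpha to 1 gives the LASSO solution.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t, returning exactly zero inside [-t, t]."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def elastic_net(X, y, lam, alpha, sweeps=200):
    """Coordinate descent for
    sum_i (y_i - x_i'b)^2 + lam * sum_j [0.5*(1-alpha)*b_j^2 + alpha*|b_j|].
    X is a list of rows of demeaned predictors; y is demeaned."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of predictor j with the residual that excludes it.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            )
            z = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam * alpha / 2.0) / (z + lam * (1.0 - alpha) / 2.0)
    return beta

# Hypothetical orthogonal, demeaned predictors (columns x1, x2, x3).
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.1, -2.1, 1.9, -1.9]  # equals 2*x1 + 0.1*x3

beta_lasso = elastic_net(X, y, lam=2.0, alpha=1.0)  # LASSO special case
beta_en = elastic_net(X, y, lam=2.0, alpha=0.5)     # mixed penalty
```

In this example the weak third predictor and the irrelevant second predictor receive coefficients of exactly zero, while the first coefficient is shrunk below its OLS value of 2.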

The outputs of LASSO, in terms of the number of variables selected and their statistical significance, were carefully studied. In almost all cases where LASSO worked, it appeared to include variables in the model that are not statistically significant. To ensure that the relation between the target y and the regressors is statistically justifiable, a stepwise regression with backward selection is applied to the variables selected by LASSO to remove non-significant variables from the model.

Stepwise regression is a method that examines the statistical significance of each independent variable within the model. It builds a model by successively adding (forward selection) or removing (backward selection) variables based on the t-statistics of their estimated coefficients. The backward elimination method begins by including all variables in the model; each variable is then removed, one at a time, to test its importance. Those variables that are not statistically significant are removed from the model.
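
One common form of backward elimination can be sketched as below. The removal rule (drop the variable with the smallest absolute t-statistic until all remaining ones exceed a critical value) and the critical value of 2.0 are assumptions, since the paper does not give its exact stopping rule; the data and variable names are hypothetical.

```python
def invert(A):
    """Gauss-Jordan inverse of a small square matrix."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        piv = M[i][i]
        M[i] = [v / piv for v in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [a - f * b for a, b in zip(M[r], M[i])]
    return [row[n:] for row in M]

def t_statistics(y, cols):
    """OLS of y on an intercept plus `cols`; t-statistics of the slope coefficients."""
    n, k = len(y), len(cols) + 1
    X = [[1.0] + [c[t] for c in cols] for t in range(n)]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    inv = invert(XtX)
    Xty = [sum(r[i] * yt for r, yt in zip(X, y)) for i in range(k)]
    beta = [sum(inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    resid = [yt - sum(b * v for b, v in zip(beta, r)) for r, yt in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    return [beta[j] / (s2 * inv[j][j]) ** 0.5 for j in range(1, k)]

def backward_eliminate(y, cols_by_name, t_crit=2.0):
    """Drop the least significant variable until all pass the assumed threshold."""
    kept = dict(cols_by_name)
    while kept:
        names = list(kept)
        tstats = t_statistics(y, [kept[nm] for nm in names])
        weakest = min(range(len(names)), key=lambda j: abs(tstats[j]))
        if abs(tstats[weakest]) >= t_crit:
            break
        del kept[names[weakest]]
    return list(kept)

# Hypothetical inputs selected by LASSO; x2 has no true effect on y.
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
x3 = [1, -1, -1, 1, 1, -1, -1, 1]
noise = [0.1, 0.1, 0.1, 0.1, -0.1, -0.1, -0.1, -0.1]
y = [2.0 + 1.5 * a - 0.8 * c + e for a, c, e in zip(x1, x3, noise)]

selected = backward_eliminate(y, {"x1": x1, "x2": x2, "x3": x3})
```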

The LASSO model did not select any variables for New Brunswick, Nova Scotia, Ontario and the Northwest Territories. The Elastic Net method is used for these jurisdictions instead. For the other two territories, Yukon and Nunavut, a manual stepwise regression is performed.

In both methods, LASSO and Elastic Net, cross validation from the R package caret is used to tune the parameters lambda and alpha. The cross validation uses a rolling forecasting origin technique (Hyndman and Athanasopoulos 2014) instead of simple random sampling. This technique is specific to time series data sets.
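
The rolling forecasting origin idea can be sketched in a few lines. The function and parameter names are hypothetical, as are the initial window of 10 years and the one-year horizon; the paper does not report the window settings it uses.

```python
def rolling_origin_splits(n_obs, initial_window, horizon=1, fixed_window=False):
    """Training and test index sets in which the test period always follows training."""
    splits = []
    for end in range(initial_window, n_obs - horizon + 1):
        start = end - initial_window if fixed_window else 0
        train = list(range(start, end))
        test = list(range(end, end + horizon))
        splits.append((train, test))
    return splits

# 17 annual observations (2002 to 2018); window sizes are assumed for illustration.
splits = rolling_origin_splits(n_obs=17, initial_window=10, horizon=1)
```

Unlike random sampling, every split trains only on observations that precede the test observation, so candidate values of lambda and alpha are compared on genuine out-of-sample forecasts.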

4 Monthly index assessment

The three approaches have different strengths and weaknesses, which affects their use (Table 6). The simple index and the LASSO index have the strength that their models are parsimonious, and the indexes they produce are less noisy than PCA-based indexes. However, these indexes are based on a greatly reduced set of variables, which for the simple indexes are often statistically insignificant in annual regressions. These indexes also tend to focus on employment series rather than a broad range of economic activities, and so may not present ideal predictors of monthly activity fluctuations if changes in production are not contemporaneously aligned with labour variables.

The PCA index has the strength that the methodology is sound and well understood. It works for all provinces and territories. However, it produces the noisiest activity indexes, making them difficult to interpret, and in some cases (e.g., Newfoundland and Labrador) the index can decline sharply. The PCA indexes are also combined based on maximizing the adjusted R-squared across regressions. This produces a linear combination of principal components, some of which are statistically significant and some of which are not. These inclusions err on the side of adding additional information that includes some noise, as it is not clear that data at an annual frequency represent month-to-month variability.


Table 6
Characteristics of index estimation approaches

| Criteria | Simple index | PCA index | Weighted index | LASSO index |
|---|---|---|---|---|
| Consistent inputs across geographies | Yes | No | No | No |
| Consistent model-types across geographies | Yes | Yes | Yes | No |
| Model specification | 3 inputs, some insignificant variables | Variable number of principal components; some insignificant variables | Combination of simple and PCA | Variable input selection |
| Model fit | Goodness of fit can vary across provinces and territories | Generally good in-sample fit | Improved in-sample fit compared to the simple or PCA indexes | Generally good in-sample fit |
| Interpretability | Easy to understand inputs and contributions | Difficult to understand what contributes to changes; difficult to interpret principal components; high variability indexes | Difficult to understand what contributes to changes | Inputs based on correlations; interpretable contributions; low variance index |
| Model suitability | Models can perform poorly based on statistical significance; inputs align with expectations about important variables | Models can perform poorly based on statistical significance; comprehensive use of input data | Inherits properties of input indexes | Modelling approach not well suited to current set-up |

Combining the indexes provides an additional method for their use. Since the simple index is relatively stable, but focuses on a limited number of fundamental series, and the PCA is more variable, but includes linear combinations of all inputs, these series are combined to produce a weighted index that has better characteristics than the components. As with the regression coefficients, annual real GDP growth is used as the comparison as it is the primary source of aggregate economic activity available for the provinces and territories. To combine the PCA index and simple index, values of nu between 1% and 100% are used to create weighted indexes as:

$$\text{weighted\_index} = (1 - \nu) \times \text{simple\_index} + \nu \times \text{activity\_index}$$

The nu corresponding to the weighted index that has the highest correlation with real GDP growth is then selected.
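The selection of nu can be illustrated with a small grid search. The following is a minimal sketch, not the authors' implementation: it assumes the simple and PCA indexes have already been converted to annual levels aligned with the GDP years, and that growth is measured as a simple year-over-year percentage change. The function names (`pearson`, `growth`, `select_nu`) and all input numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def growth(levels):
    """Period-over-period growth rates of a level series."""
    return [cur / prev - 1.0 for prev, cur in zip(levels, levels[1:])]


def select_nu(simple_index, activity_index, gdp_growth):
    """Return (nu, correlation) with the highest correlation to GDP growth.

    Assumption: both indexes are annual levels aligned with the GDP years,
    so gdp_growth has one fewer element than each index series.
    """
    best_nu, best_corr = None, float("-inf")
    for pct in range(1, 101):  # nu = 1%, 2%, ..., 100%
        nu = pct / 100.0
        weighted = [(1.0 - nu) * s + nu * a
                    for s, a in zip(simple_index, activity_index)]
        corr = pearson(growth(weighted), gdp_growth)
        if corr > best_corr:
            best_nu, best_corr = nu, corr
    return best_nu, best_corr


# Made-up annual index levels and GDP growth rates, for illustration only.
simple = [100.0, 102.0, 105.0, 103.0, 108.0, 110.0]
activity = [100.0, 110.0, 95.0, 120.0, 90.0, 130.0]
gdp_growth = [0.030, -0.010, 0.040, -0.020, 0.080]

nu, corr = select_nu(simple, activity, gdp_growth)
print(f"selected nu = {nu:.2f}, correlation = {corr:.3f}")
```

Because the grid has only 100 points, an exhaustive search is sufficient and avoids any dependence on an optimizer.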

The methods for generating monthly indexes generally return similar types of information on economic cycles and major economic shocks in the provinces and territories. As examples, the indexes for Alberta (Panel 1) and Newfoundland and Labrador (Panel 2) are presented below.

For larger economies, such as Alberta, all approaches return similar information on periods of growth or decline, but the magnitude of the cycles can differ depending on methodology. In general, the PCA-based indexes have the largest variability while the simple index has the least. In some cases, such as the PCA index for Newfoundland and Labrador, the model fails to produce a reasonable result. In these cases, the index is deemed not fit for use and will not be made available. Nevertheless, when the indexes appear to have the appropriate characteristics, there is a strong correlation across measures for the implied economic activity, and the movement of the indexes through time corresponds with what is known about provincial and territorial economic performance.
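One way to compare the variability of the indexes is to look at the standard deviation of their month-to-month percentage changes. The sketch below is illustrative only: the choice of this statistic and the function name `monthly_volatility` are assumptions, not the paper's quality measure. The input values are the January to December 2019 index levels copied from the data table for Chart 1.

```python
from statistics import stdev

# Index levels for January to December 2019, from the data table for Chart 1.
pca_index = [121.52, 119.51, 117.9, 120.14, 118.47, 120.62,
             115.19, 112.98, 113.06, 112.01, 101.43, 100.0]
simple_index = [108.71, 107.99, 106.8, 106.63, 106.15, 104.4,
                106.68, 106.6, 104.25, 103.32, 101.08, 100.0]
weighted_index = [113.73, 112.51, 111.15, 111.94, 110.99, 110.78,
                  110.08, 109.15, 107.79, 106.81, 101.22, 100.0]


def monthly_volatility(levels):
    """Sample standard deviation of month-to-month percentage changes."""
    changes = [100.0 * (cur / prev - 1.0)
               for prev, cur in zip(levels, levels[1:])]
    return stdev(changes)


for name, series in [("PCA", pca_index),
                     ("Simple", simple_index),
                     ("Weighted", weighted_index)]:
    print(f"{name}: {monthly_volatility(series):.2f} percentage points")
```

A single year is a short window, so the result only indicates the kind of comparison described above; a fuller assessment would use the whole sample.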

Additionally, comparisons with sub-annual real GDP estimates for Ontario and Quebec show that the year-to-year growth rates are highly correlated but that business cycles can be accentuated in the activity indexes. The indexes, therefore, appear to capture relevant information for economic cycles, periods of stronger or weaker growth and for understanding economic performance. They do not, however, correspond directly with real GDP, and should not be interpreted as a direct measure of real GDP.

Chart 1

Data table for Chart 1
Columns, in order: date (dd/mm/yyyy), principal component based activity index, simple index, weighted index, LASSO based index. All values are index levels.
01/01/2002 44.02 46.61 43.8 53.41
01/02/2002 42.57 47.96 43.94 54.03
01/03/2002 43.38 46.68 43.61 53.1
01/04/2002 43.66 47.1 43.95 52.9
01/05/2002 42.7 47.27 43.63 53.07
01/06/2002 46.2 48.41 45.75 54.21
01/07/2002 48.41 48.82 46.9 54.67
01/08/2002 49.49 51.78 48.98 56.37
01/09/2002 48.55 51.02 48.18 55.53
01/10/2002 48.16 51.42 48.23 55.81
01/11/2002 48.55 51.27 48.32 55.87
01/12/2002 51.29 51.54 49.6 56.95
01/01/2003 49.35 51.91 49.03 56.81
01/02/2003 50.07 50.62 48.62 56.36
01/03/2003 49.06 52.54 49.28 57.45
01/04/2003 46.89 51.99 48.06 57.38
01/05/2003 51.1 51.61 49.67 57.31
01/06/2003 51.11 51.9 49.84 57.42
01/07/2003 53.42 51.25 50.42 57.95
01/08/2003 54.38 50.81 50.55 57.93
01/09/2003 55.08 51.13 51.01 58.18
01/10/2003 58.9 53.63 53.94 60.27
01/11/2003 57.34 53.12 53.04 59.42
01/12/2003 58.65 51.93 52.87 58.7
01/01/2004 58.52 53.3 53.62 59.04
01/02/2004 57.42 52.46 52.71 58.44
01/03/2004 61.58 54.96 55.77 60.01
01/04/2004 63.78 57.09 57.86 61.91
01/05/2004 64.62 58.28 58.88 62.76
01/06/2004 63.31 58.73 58.64 62.47
01/07/2004 63.92 59.92 59.57 63.02
01/08/2004 67.74 61.08 61.74 63.31
01/09/2004 72.25 64.06 65.2 64.77
01/10/2004 69.04 63.96 63.93 64.7
01/11/2004 71.06 65.47 65.59 65.95
01/12/2004 75.37 68.87 69.24 67.56
01/01/2005 74.58 68.91 68.95 66.55
01/02/2005 78.88 70.83 71.74 68.35
01/03/2005 79.81 71.25 72.34 69.63
01/04/2005 83.03 74.76 75.64 70.38
01/05/2005 86.59 78.15 78.99 72.45
01/06/2005 86.26 77.58 78.53 72.65
01/07/2005 90.69 80.97 82.21 74.71
01/08/2005 92.54 82.08 83.57 75.23
01/09/2005 89.46 82.22 82.48 75.36
01/10/2005 93.96 83.52 84.98 76
01/11/2005 99.23 84.08 87.32 76.41
01/12/2005 105.42 86.85 91.27 77.9
01/01/2006 112.03 92.56 97.16 81.19
01/02/2006 124.82 97.78 104.99 86.05
01/03/2006 129.22 98.89 107.23 85.29
01/04/2006 131.71 104.09 111.37 87.32
01/05/2006 151.64 108.36 121.1 90.3
01/06/2006 157.75 114.43 127.09 92.96
01/07/2006 155.85 111.55 124.59 92.11
01/08/2006 154.08 110.1 123.06 91.1
01/09/2006 158.42 114.46 127.34 94.08
01/10/2006 164.81 114.27 129.37 95.42
01/11/2006 189.61 123.48 143.6 99.53
01/12/2006 182.2 129.73 145.46 100.77
01/01/2007 190.56 134.1 151.1 102.35
01/02/2007 198.82 136.14 155.18 103.41
01/03/2007 198.63 138.32 156.56 105.25
01/04/2007 199.1 134.66 154.32 102.72
01/05/2007 190.58 133.71 150.91 102.79
01/06/2007 206.86 138.25 159.3 104.63
01/07/2007 206.74 135.7 157.55 104.9
01/08/2007 221.09 137.19 163.15 106.49
01/09/2007 212.06 134.81 158.71 104.74
01/10/2007 212.26 137.38 160.52 105.98
01/11/2007 222.26 136.46 163.08 105.2
01/12/2007 214.99 134.42 159.43 105.75
01/01/2008 213.26 139.22 162.19 107.36
01/02/2008 210.38 137.17 159.88 105.66
01/03/2008 229.88 145.51 171.75 109.47
01/04/2008 227.03 147.08 171.93 111.48
01/05/2008 217.63 143.91 166.79 110.19
01/06/2008 224.1 146.92 170.9 111.12
01/07/2008 228.86 148.4 173.41 111.88
01/08/2008 231.95 154.54 178.56 113.96
01/09/2008 245.26 151.44 180.79 113.02
01/10/2008 251.6 152.78 183.68 113.92
01/11/2008 226.74 146.33 171.56 110.44
01/12/2008 199.86 140.83 159.28 106.41
01/01/2009 184.33 130.35 147.21 101.74
01/02/2009 152.56 123.85 132.29 98.07
01/03/2009 132.39 110.29 116.55 91.91
01/04/2009 120.77 104.76 108.86 88.75
01/05/2009 117.82 104.35 107.49 86.86
01/06/2009 111.94 97.83 101.34 84.4
01/07/2009 109.03 97.57 100.08 83.67
01/08/2009 96.8 87.94 89.64 79.63
01/09/2009 103.38 89.66 93.21 80.47
01/10/2009 95.74 87.62 89.09 78.73
01/11/2009 98.14 86.53 89.38 78.59
01/12/2009 101.93 86.01 90.52 78.65
01/01/2010 104.88 91.76 95.14 80.38
01/02/2010 101.29 91.16 93.4 79.83
01/03/2010 106.13 92.36 96 80.49
01/04/2010 112.77 93.93 99.46 82.04
01/05/2010 114.31 93.21 99.59 82.08
01/06/2010 114.69 95.96 101.43 83.79
01/07/2010 113.32 97.73 102.01 83.77
01/08/2010 114.88 97.17 102.26 83.54
01/09/2010 115.52 99.14 103.7 85.08
01/10/2010 123.24 102.62 108.72 87.49
01/11/2010 117.97 100.26 105.32 87.09
01/12/2010 124.56 105.13 110.76 89.72
01/01/2011 129.69 103.5 111.68 89.83
01/02/2011 139.07 106.75 117.11 92.08
01/03/2011 142.32 108.93 119.64 92.54
01/04/2011 150.84 113.07 125.29 94.7
01/05/2011 143.64 111.31 121.65 93.88
01/06/2011 159.07 118.1 131.44 97.47
01/07/2011 162.9 121.64 135.05 98.81
01/08/2011 170.97 121.2 137.58 99.78
01/09/2011 177.27 125.39 142.46 102.34
01/10/2011 181.7 126.83 144.91 103.3
01/11/2011 187.92 130.88 149.68 105.69
01/12/2011 202.43 135.47 157.58 108.6
01/01/2012 200.04 136.08 157.21 108.2
01/02/2012 203.94 137.33 159.33 107.93
01/03/2012 215.35 143.44 167.19 111.99
01/04/2012 231.23 147.11 174.84 113.99
01/05/2012 246.96 158.95 188 119.03
01/06/2012 245.54 163.36 190.57 121.09
01/07/2012 251.4 163.48 192.57 121.8
01/08/2012 255.07 167.99 196.83 123.66
01/09/2012 256.26 166.98 196.53 122.35
01/10/2012 256.04 163.98 194.41 121.8
01/11/2012 282.48 168.88 206.21 124.01
01/12/2012 257.26 162.99 194.31 121.25
01/01/2013 269.64 166.71 200.8 123.41
01/02/2013 278.58 174.14 208.79 126.47
01/03/2013 285.1 178.01 213.54 128.04
01/04/2013 284.24 178.9 213.88 129.26
01/05/2013 295.09 181.18 218.89 130.5
01/06/2013 276.77 180.16 212.47 129.64
01/07/2013 293.1 185.99 221.72 132.52
01/08/2013 308.64 190.45 229.75 134.29
01/09/2013 297.08 187.28 223.91 134.34
01/10/2013 303.6 193.58 230.35 136.04
01/11/2013 309.71 194.63 233.02 136.76
01/12/2013 300.19 192.49 228.53 136.8
01/01/2014 305.95 195.89 232.71 138.24
01/02/2014 331.2 199.63 243.35 139.67
01/03/2014 320 202.51 241.93 139.89
01/04/2014 306.86 206.6 240.59 141.57
01/05/2014 338.35 210.18 253.38 144.25
01/06/2014 360.37 217.84 265.66 145.48
01/07/2014 362.64 217.87 266.39 146.54
01/08/2014 365.55 228.27 274.66 149.51
01/09/2014 380.11 229.51 280.12 152.28
01/10/2014 375.74 229.38 278.68 153.12
01/11/2014 352.84 218.69 264.01 149.41
01/12/2014 366.34 218.46 268.09 149.61
01/01/2015 361.92 217.89 266.33 151.59
01/02/2015 306.55 204.05 239.4 143.6
01/03/2015 299.28 197.25 232.39 140.47
01/04/2015 295.88 191.31 227.22 139.19
01/05/2015 281.93 189.24 221.3 136.94
01/06/2015 260.26 175.06 204.53 131.56
01/07/2015 252.67 173.31 200.84 130.75
01/08/2015 230.04 163.62 186.78 126.55
01/09/2015 231.82 162.33 186.53 124.92
01/10/2015 211.19 158.65 177.11 123.26
01/11/2015 199.83 154.68 170.53 120.67
01/12/2015 194.02 151.04 166.12 118.77
01/01/2016 168.77 137.88 148.65 112.8
01/02/2016 155.99 129.98 138.98 108.6
01/03/2016 157.89 128.71 138.91 108.89
01/04/2016 152.23 128.23 136.51 106.99
01/05/2016 122.95 118.24 119.31 102.17
01/06/2016 123.07 118.25 119.37 101.38
01/07/2016 123.43 116.97 118.77 100.09
01/08/2016 119.78 110.53 113.5 97.46
01/09/2016 120.69 109.76 113.4 97.07
01/10/2016 118.47 106.32 110.47 96.07
01/11/2016 114.29 105.29 108.2 94.64
01/12/2016 115.2 106.63 109.37 96.17
01/01/2017 114.58 105.27 108.31 95.49
01/02/2017 122.6 109.75 114.17 98.4
01/03/2017 119.37 107.19 111.36 96.66
01/04/2017 122.15 107.16 112.44 96.39
01/05/2017 132.14 110.7 118.45 98.47
01/06/2017 136.75 115.5 123.16 101.72
01/07/2017 134.51 115.82 122.52 102.28
01/08/2017 134.36 116.62 122.94 102.25
01/09/2017 133.1 116.94 122.66 102.76
01/10/2017 138.57 114.98 123.58 102.86
01/11/2017 139.61 113.7 123.17 102.16
01/12/2017 136.83 113.16 121.81 102.4
01/01/2018 137.46 114.2 122.69 102.85
01/02/2018 138.15 112.28 121.76 102.74
01/03/2018 145.44 115.03 126.19 104.34
01/04/2018 141.36 114.17 124.15 105.22
01/05/2018 143.43 116.15 126.16 106.82
01/06/2018 135.88 115.2 122.77 106.09
01/07/2018 127.34 109.22 115.84 102.31
01/08/2018 132.52 110.19 118.41 103.62
01/09/2018 127.19 108.21 115.18 101.71
01/10/2018 124.91 109.43 115.06 102.67
01/11/2018 140.51 108.39 120.46 104.76
01/12/2018 122.05 105.9 112.21 102.77
01/01/2019 121.52 108.71 113.73 103.86
01/02/2019 119.51 107.99 112.51 104.21
01/03/2019 117.9 106.8 111.15 103.37
01/04/2019 120.14 106.63 111.94 103.2
01/05/2019 118.47 106.15 110.99 103.11
01/06/2019 120.62 104.4 110.78 102.44
01/07/2019 115.19 106.68 110.08 102.69
01/08/2019 112.98 106.6 109.15 103.03
01/09/2019 113.06 104.25 107.79 103.06
01/10/2019 112.01 103.32 106.81 102.05
01/11/2019 101.43 101.08 101.22 100.56
01/12/2019 100 100 100 100
01/01/2020 95.17 96.91 96.18 99.03
01/02/2020 95.22 91.26 92.95 96.41
01/03/2020 36.95 55.25 47.79 72.21

Chart 2

Data table for Chart 2
Columns, in order: date (dd/mm/yyyy), principal component based activity index, simple index, weighted index, LASSO based index. All values are index levels.
01/01/2002 5562.24 79.43 85.09 80.21
01/02/2002 5224.48 82.88 87.45 78.31
01/03/2002 4243.91 82.37 84.54 78.13
01/04/2002 5626.24 84.65 90.65 80.77
01/05/2002 6239.63 81.43 89.2 82.66
01/06/2002 5657.11 81.77 88.28 83.57
01/07/2002 6292.55 82.05 90.01 84.18
01/08/2002 5814.76 82.49 89.4 83.92
01/09/2002 6267.22 82.16 90.14 85.7
01/10/2002 7667.45 82.73 93.69 89.83
01/11/2002 7413.24 82.92 93.41 91.66
01/12/2002 6900.72 82.58 92.12 90.72
01/01/2003 6240.78 82.16 90.4 92.15
01/02/2003 7319.34 81.15 91.79 93.04
01/03/2003 8199.98 81.5 93.79 95.4
01/04/2003 8855.62 80.74 94.17 91.16
01/05/2003 4712.39 81.16 87.98 88.97
01/06/2003 4942.04 81.57 89 91.22
01/07/2003 6042.07 81.75 92.14 92.7
01/08/2003 5297.57 82.13 90.8 90.71
01/09/2003 5638.03 81.89 91.45 92.14
01/10/2003 5705.55 82.33 92.03 89.2
01/11/2003 6637.96 80.51 92.55 91.53
01/12/2003 6354.64 80.75 92.2 92.96
01/01/2004 6377.55 83.85 95.26 94.08
01/02/2004 5984.01 83.32 93.86 93.11
01/03/2004 5781.09 84.31 94.33 92.74
01/04/2004 8976.38 83.1 101 93.32
01/05/2004 1043.44 82.13 86.61 89.81
01/06/2004 1157.32 84.15 89.84 92.31
01/07/2004 919.54 83 86.03 92.46
01/08/2004 1002.15 82.87 87.07 93.35
01/09/2004 1046.54 82.96 87.73 93.51
01/10/2004 956.18 83.13 86.75 91.98
01/11/2004 1082.97 83.53 88.83 93.92
01/12/2004 1096.23 82.95 88.47 93.98
01/01/2005 1116.83 82.75 88.53 93.83
01/02/2005 1076.08 83.74 88.95 95.46
01/03/2005 1129 82.7 88.67 94.95
01/04/2005 1016.3 83.22 87.82 93.02
01/05/2005 1098.25 81.74 87.55 91.89
01/06/2005 1176.3 82.06 88.78 92.55
01/07/2005 1088.7 82.72 88.39 93.77
01/08/2005 1129.19 82.43 88.62 94.73
01/09/2005 1369.72 82.15 91.19 97.08
01/10/2005 1143.89 81.95 88.75 96.92
01/11/2005 1272.18 82.39 90.65 97.13
01/12/2005 1155.15 82.72 89.71 96.55
01/01/2006 1105.85 83.92 90.24 97.59
01/02/2006 1022.72 83.8 89.11 96.3
01/03/2006 1259.88 82.88 91.38 98.09
01/04/2006 1598.84 85.67 97.68 100.91
01/05/2006 1009.01 84.61 91.25 97.97
01/06/2006 920.31 84.23 89.7 96.91
01/07/2006 1062.74 84.44 91.97 99.3
01/08/2006 1117.95 84.42 92.67 100.63
01/09/2006 962.48 84.22 90.55 98.02
01/10/2006 891.39 83.85 89.21 93.66
01/11/2006 515.95 84.98 84.6 95.24
01/12/2006 639.62 85.12 87.75 97.2
01/01/2007 718.78 85.51 89.72 98.18
01/02/2007 648.56 85.72 88.6 96
01/03/2007 670.28 86.26 89.51 99.94
01/04/2007 592.42 85.91 87.65 100.93
01/05/2007 762.1 86.63 92.04 101.5
01/06/2007 769.86 87.65 93.1 103.11
01/07/2007 518.87 87.32 88.25 101.85
01/08/2007 487.58 87.06 87.22 100.81
01/09/2007 499.02 87.96 88.3 100.92
01/10/2007 584.42 86.96 89.71 104.28
01/11/2007 276.1 87.64 83.21 105.64
01/12/2007 420.52 87.82 89.89 105.83
01/01/2008 397.36 88.9 90.08 107.45
01/02/2008 378.08 87.45 88.18 106.61
01/03/2008 381.3 88.94 89.56 107.61
01/04/2008 343.09 89.24 88.48 106.06
01/05/2008 382.96 89.14 89.94 109
01/06/2008 484.06 90.99 95.08 109.61
01/07/2008 552.41 91.47 97.52 111.14
01/08/2008 564.08 90.41 96.87 110.34
01/09/2008 496.56 91.21 95.86 108.02
01/10/2008 401.15 91.39 93.26 105.07
01/11/2008 395.31 89.98 91.83 106.9
01/12/2008 317.61 88.5 87.84 102.41
01/01/2009 315.15 87.29 86.72 100.27
01/02/2009 375.72 90.33 91.79 100.7
01/03/2009 309.45 89.43 88.58 98.69
01/04/2009 381 88.94 91.24 98.34
01/05/2009 227.61 89.52 86.23 95.2
01/06/2009 179.03 88.21 82.4 95.84
01/07/2009 187.74 88.7 83.39 94.13
01/08/2009 194.6 90.51 85.29 95.36
01/09/2009 186.3 89.79 84.18 92.3
01/10/2009 200.51 90.83 85.97 93.41
01/11/2009 236.33 90.01 87.61 95.4
01/12/2009 211.69 89.78 86.05 93.21
01/01/2010 192.67 90.29 85.3 93.07
01/02/2010 191.56 90.71 85.57 92.63
01/03/2010 187.44 90.41 85.05 91.48
01/04/2010 222.62 90.44 87.47 96.67
01/05/2010 186.31 90.45 85.34 97.23
01/06/2010 181.82 91.75 86.07 92.9
01/07/2010 195.99 90.85 86.36 93.1
01/08/2010 179.67 91.77 86.03 92.81
01/09/2010 150.84 91.84 84.01 93.32
01/10/2010 180.48 92.38 86.9 97.67
01/11/2010 221.21 92.94 90.29 99.72
01/12/2010 181.91 93.74 88.55 97.65
01/01/2011 206.77 93.54 90.21 98.85
01/02/2011 187.98 92.59 88.2 98.28
01/03/2011 201 95.23 91.25 98.67
01/04/2011 140.74 94.21 86.32 102.23
01/05/2011 162.37 93.96 88.11 102.39
01/06/2011 157.52 93.24 87.14 98.21
01/07/2011 144.83 93.42 86.23 100.57
01/08/2011 140.45 93.81 86.15 98.95
01/09/2011 166.69 95.38 89.78 99.3
01/10/2011 138.77 95.91 87.95 101.57
01/11/2011 164.42 96.13 90.56 104.76
01/12/2011 147.8 94.85 88.17 99.43
01/01/2012 142.08 96.22 88.74 98.02
01/02/2012 148.76 95.74 88.98 99.26
01/03/2012 138.32 96.41 88.58 98.81
01/04/2012 130.13 97.23 88.44 100.15
01/05/2012 148.26 96.3 89.56 99.07
01/06/2012 139.09 96.74 89.08 99.27
01/07/2012 147.07 96.41 89.59 101.18
01/08/2012 139.89 95.43 88.16 100.48
01/09/2012 143.71 95.52 88.59 98.6
01/10/2012 144.87 94.49 87.88 98.07
01/11/2012 76.93 97.55 84.12 98.14
01/12/2012 105.77 98.14 89.29 97.44
01/01/2013 110.1 98.33 89.98 96.9
01/02/2013 124.72 97.25 90.93 101.37
01/03/2013 126.11 97.57 91.34 96.88
01/04/2013 117.26 97.79 90.55 95.77
01/05/2013 121.19 100.05 92.78 96.3
01/06/2013 140.83 97.78 93.25 95.59
01/07/2013 147.7 98.68 94.67 95.71
01/08/2013 150.69 99.45 95.58 93.94
01/09/2013 145.86 99.56 95.21 93.83
01/10/2013 123.04 99.44 92.88 95.26
01/11/2013 125.26 98.31 92.23 95.27
01/12/2013 129.96 99.35 93.58 98.35
01/01/2014 116.14 99.82 92.46 97.39
01/02/2014 112.75 100.65 92.71 97.06
01/03/2014 118.67 101.05 93.76 98.02
01/04/2014 161 98.79 96.99 96.92
01/05/2014 105.85 100.86 93.73 96.98
01/06/2014 107.89 100.51 93.73 96.43
01/07/2014 114.48 102.25 95.96 96.48
01/08/2014 100.08 100.96 93.12 93.48
01/09/2014 98.67 98.89 91.31 92.31
01/10/2014 110.99 99.04 93.14 91.93
01/11/2014 120.98 99.41 94.69 91.65
01/12/2014 100.72 98.89 91.89 92.84
01/01/2015 86.14 98.79 89.82 90.75
01/02/2015 95.56 99.16 91.58 91.14
01/03/2015 106.77 98.31 92.52 90.5
01/04/2015 78.52 99.25 89.6 89.76
01/05/2015 94.69 99.1 92.25 92.56
01/06/2015 109.12 100.51 95.48 91.86
01/07/2015 109.15 98.69 94.01 91.64
01/08/2015 97.06 101.17 94.45 88.82
01/09/2015 80.65 100.61 91.61 90.74
01/10/2015 94.85 99.78 93.39 90.89
01/11/2015 94.74 98.67 92.5 93.48
01/12/2015 77.41 99.14 90.33 87.75
01/01/2016 78.59 98.31 89.89 85.93
01/02/2016 73.8 98.83 89.48 84.6
01/03/2016 62.92 98.26 87.06 82.98
01/04/2016 70.72 99.04 89.26 83.76
01/05/2016 84.59 98.91 91.79 84.32
01/06/2016 85.66 99.87 92.72 87.33
01/07/2016 76.91 97.76 89.63 82.94
01/08/2016 80.6 98.9 91.17 82.78
01/09/2016 96.01 98.86 93.75 85.34
01/10/2016 96.03 100.01 94.69 83.33
01/11/2016 101.63 100.81 96.15 85.67
01/12/2016 85.76 99.91 93.18 87.36
01/01/2017 104.53 100.64 96.81 89.11
01/02/2017 117.8 99.94 98.08 89.2
01/03/2017 99.03 99.56 95.42 87.82
01/04/2017 107.97 98.53 95.87 85.07
01/05/2017 87.54 98.89 93.45 84.81
01/06/2017 67.77 99.19 90.53 81.75
01/07/2017 67.93 98.47 90 82.18
01/08/2017 76.06 98.5 91.64 85
01/09/2017 79.49 99.06 92.7 88.71
01/10/2017 75.91 99.16 92.16 85.73
01/11/2017 76.33 98.36 91.6 87.48
01/12/2017 88.05 98.08 93.49 89.36
01/01/2018 90.06 97.6 93.42 87.6
01/02/2018 84.04 99.14 93.73 87.7
01/03/2018 74.03 98.21 91.31 86.61
01/04/2018 79.26 98.29 92.35 85.5
01/05/2018 87.79 97.65 93.32 88.91
01/06/2018 84.41 97.19 92.41 90.13
01/07/2018 88.39 99.65 95.05 90.52
01/08/2018 83.11 98.62 93.36 91.14
01/09/2018 78.01 98.75 92.61 89.41
01/10/2018 80.64 98.88 93.19 87.89
01/11/2018 54.06 97.45 87.43 87.69
01/12/2018 72.03 97.61 91.91 90.2
01/01/2019 50.7 98.78 88.76 89
01/02/2019 57.45 98.4 90.25 89.15
01/03/2019 68.69 99.5 93.75 91.92
01/04/2019 73.79 99.2 94.56 92.19
01/05/2019 66.16 98.65 92.64 92.53
01/06/2019 72.85 97.98 93.52 92.59
01/07/2019 75.07 98.67 94.5 94.2
01/08/2019 82.25 97.96 95.28 94.37
01/09/2019 81.95 97.08 94.5 94.25
01/10/2019 83.12 99.87 97.01 95.23
01/11/2019 95.79 99.41 98.85 96.55
01/12/2019 100 100 100 100
01/01/2020 106.7 98.56 99.78 99.31
01/02/2020 83.86 98.9 96.87 96.97
01/03/2020 116.72 95.31 99.58 96.15

5 Conclusion

Measures of aggregate economic activity for economies are important for informing decisions about fiscal and monetary policy, for determining the characteristics of business cycles and for examining economic performance. In this study, four indexes of provincial and territorial economic activity based on different methodological approaches are estimated and presented. The methodologies are based on a simple model; PCA; a weighted combination of the simple index and the PCA index; and LASSO. In most cases, all approaches produce roughly similar results. However, the degree of cyclicality and the variance of month-to-month changes can differ significantly. As a general rule, PCA produces the greatest variability and the largest cycles while the simple index is the most stable.

Based on the properties of the methodologies and their outputs, the simple index is the most consistent across provinces and territories. It is also the easiest to interpret in terms of variable contributions and justification for variable inclusion. However, parameter values are often statistically insignificant and the input series are chosen as much for their economic importance as for their presence in all jurisdictions. As a result, these models offer a more limited approach for examining aggregate economic activity, but also present a basis for comparisons to more complex models.

Indexes based on PCA appear to offer a more complete sense of how activity is evolving over time, but it is unclear at the moment how the principal components should be interpreted. Because of this, and because the PCA indexes have the largest variability, they present a trade-off between overall use of input series and interpretability.

Weighting the simple index and PCA index produces a result that has a superior correlation with annual GDP fluctuations. The weighted combination continues to have more variability than the simple index. Since the PCA is included, it is also not as easy to interpret as the simple index, but likely provides a better measure of aggregate activity than its constituent parts.

The LASSO index performs well when compared to annual real GDP, but the model set-up is not as well suited to the situation encountered when trying to estimate the activity indexes. In particular, the relatively small number of observations limits the ability of the algorithms to perform cross validation. Moreover, while the input series are a distinct subset of the input data set, and their contributions can be generated in a straightforward manner, there is no theoretical reason for why the variables are important, and this limits the model’s interpretability.

Given the strengths and weaknesses present between the suitability of the models, their performance and examinations of their outputs, the assessments made thus far suggest that the simple indexes or LASSO indexes present results related to a set of fundamental inputs (often heavily influenced by employment series), that the PCA indexes relate more to some form of short-term activity (but the signal is noisy), and that the weighted index presents a compromise between the two.

The indexes as currently estimated are correlated with annual measures of real GDP and sub-annual measures of real GDP for Ontario and Quebec, but they should not be interpreted as being a real GDP measure. The indexes display greater variability and cyclicality than real GDP measures, and are constituted from measures of gross outputs, employment, relative prices and important ratios such as the unemployment rate. This makes the indexes appropriate for understanding economic activity, but they are not real GDP. Moreover, the indexes do not inform about differing levels of economic activity between provinces and territories.

The indexes are also based on an input data set and modelling strategies that are not ideal. Numerous assumptions must be imposed to produce the indexes, any of which may be a source of important measurement errors. As a consequence, the indexes presented here should be viewed as experimental, and are subject to revision or replacement as future research improves the processes and/or tests the assumptions for their validity.

At the current time, the correlations between the different approaches, their positive correlation with provincially produced measures of sub-annual real GDP and examinations of their properties against known provincial and territorial economic performance support their use as indicators of business cycles, for understanding the magnitude of shocks relative to a province’s or territory’s history and for understanding how regional economies are progressing. Inter-provincial comparisons are also supported, but with the caveat that model performance is difficult to understand in all situations, and that level comparisons across provinces are not possible using the index values.

References

Brave, Scott, and R. Andrew Butters. 2010. “Chicago Fed National Activity Index Turns Ten — Analyzing Its First Decade of Performance.” Chicago Fed Letter, no. 273 (April). Federal Reserve Bank of Chicago. https://www.chicagofed.org/~/media/publications/chicago-fed-letter/2010/cflapril2010-273-pdf.pdf.

Statistics Canada. 2020a. “Gross domestic product (GDP) at basic prices, by industry, monthly (36100434).” Statistics Canada, January 24, 2020. https://www150.statcan.gc.ca/n1/pub/13-607-x/2016001/230-eng.htm (accessed June 2, 2020).

Statistics Canada. 2020b. “Gross domestic product (GDP) at basic prices, by industry, monthly (36100434).” Statistics Canada, July 31, 2019. https://www.statcan.gc.ca/eng/statistical-programs/document/1301_D1_V3 (accessed June 2, 2020).

Federal Reserve Bank of Chicago. 2020. “Chicago Fed National Activity Index (CFNAI) Current Data.” Federal Reserve Bank of Chicago, June 22, 2020. https://www.chicagofed.org/research/data/cfnai/current-data (accessed June 2, 2020).

Evans, Charles L., Chin Te Liu, and Genevieve Pham-Kanter. 2002. “The 2001 Recession and the Chicago Fed National Activity Index: Identifying Business Cycle Turning Points.” Economic Perspectives 26 (3). Federal Reserve Bank of Chicago: 26–43. https://www.chicagofed.org/~/media/publications/economic-perspectives/2002/3qepart2-pdf.pdf.

Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia. OTexts.com/fpp2. Accessed on June 2, 2020

Jolliffe, I.T. 2002. Principal Component Analysis. 2nd edition. Springer-Verlag New York Inc, New York, NY.

United Nations (UN), European Commission (EC), International Monetary Fund (IMF), Organisation for Economic Co-operation and Development (OECD), and World Bank (WB). 2009. System of National Accounts, 2008. New York: United Nations. Available at: https://unstats.un.org/unsd/nationalaccount/docs/sna2008.pdf (accessed June 2, 2020).

Organisation for Economic Co-operation and Development (OECD). 2008. Handbook on Constructing Composite Indicators: Methodology and User Guide. Paris: OECD Publishing. https://www.oecd.org/sdd/42495745.pdf (accessed June 2, 2020).

Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society: Series B (Methodological) 58 (1): 267–88.

Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2): 301–20. doi:10.1111/j.1467-9868.2005.00503.x.
