Analytical Studies Branch Research Paper Series
Going the Distance: Estimating the Effect of Provincial Borders on Trade when Geography Matters
by Robby K. Bemrose, W. Mark Brown and Jesse Tweedle
Economic Analysis Division
Statistics Canada
Acknowledgements
The authors would like to thank Danny Leung, Bart Los, Dennis Novy, Trevor Tombe, Dan Trefler, Yoto Yotov and participants at the Economic Analysis Division and Université du Québec à Montréal seminar series and at the North American Regional Science Council and Western Regional Science Association meetings for their helpful comments. The authors are grateful for research assistance from Javad Sadeghzadeh, Olena Melin, and Afshan DarBrodeur.
Abstract
For many goods, such as dairy products and alcoholic beverages, the presence of substantial (nontariff) barriers to provincial trade is widely recognized. If these nontariff barriers matter, intraprovincial trade should be stronger than interprovincial trade, all else being equal. However, comparing intraprovincial and interprovincial trade levels is challenging, because intraprovincial trade is heavily skewed toward short-distance flows. When these are not properly taken into account by gravity-based trade models, intraprovincial trade levels (provincial border effects) tend to be overestimated. To resolve this problem, new subprovincial trade flows developed from a set of transaction-level transportation files are used to estimate provincial border effects. The sensitivity of the results to distance is captured by estimating models across standard geographies of varying size (provinces, economic regions, census divisions) and nonstandard geographies (hexagonal lattices) of differing size and placement via a series of simulations. The results show that provincial border effects diminish as distance is measured more accurately and geographies are more fine-grained and uniform in shape and size. Nonetheless, border effects persist, with an implied ad valorem tariff equivalent of 6.9%. This contrasts with the United States, where state border effects are eliminated when similar approaches are applied.
Key words: Border effects, interprovincial trade, transportation costs, gravity model
JEL: R4 R15 F15
Executive summary
For many goods, such as dairy products and alcoholic beverages, the presence of substantial (nontariff) barriers to provincial trade is widely recognized. If these nontariff barriers matter, intraprovincial trade should be stronger than interprovincial trade, all else being equal. However, comparing intraprovincial and interprovincial trade levels is challenging, because intraprovincial trade is heavily skewed toward short-distance flows. When these are not properly taken into account, intraprovincial trade levels (provincial border effects) tend to be overestimated.
To resolve this problem, subprovincial trade flows developed from a set of transaction-level transportation files are used to estimate provincial border effects during the 2004-to-2012 period. Each shipment is measured in terms of value, cost, distance shipped, and origin and destination. Based on the origin and destination points (latitudes and longitudes), flows between regions of any size can be constructed, which makes possible the estimation of flows between regions within and across provincial borders.
The analysis shows that intraprovincial trade is consistently stronger than interprovincial trade when the distance between the trading regions and the ability of the trading units to generate and absorb trade flows are taken into account. The relative strength of intraprovincial trade depends on the geographic units used to measure trade. When provinces are used, within-province trade is estimated to be 2.26-fold greater than interprovincial trade. This suggests that the impediments to interprovincial trade are equivalent to the imposition of a 13.6% ad valorem tariff. When subprovincial areas are used instead of provinces, the border-effect tariff equivalent falls to 6.9%. This lower estimate proved to be robust to a large number of statistical tests. It also contrasts with the United States, where state border effects are eliminated when similar approaches are applied.
1. Introduction
For some goods, such as dairy products and alcoholic beverages, substantial (nontariff) barriers to interprovincial trade exist. However, the degree to which these barriers are reflected in the level of interprovincial trade is not clear. One way to assess this is to determine if provinces trade more with themselves than with each other. If nontariff barriers matter, their imprint should be seen in patterns of provincial trade.
Efforts to investigate if this is the case have been hampered by a lack of data that allow the estimation of trade flows within provinces, as it is against these flows that trade between provinces is compared. The problem is overcome by developing estimates of intraprovincial and interprovincial trade from a set of transaction-level flows. Using a gravity-based trade model, this study compares trade among regions within provinces with trade between provinces, thereby providing an estimate of the extent to which provincial economies are integrated through trade, that is, the degree to which provincial borders dampen provincial trade, commonly called the “border effect.”
Building on McCallum’s (1995) initial work, a large literature has been devoted to measuring border effects, national and subnational. Over time, estimated border effects have been reduced as McCallum’s initial specification was modified to take into account the effects of market access and competition (Anderson and van Wincoop 2003; Anderson and Yotov 2010); estimates of distance were refined (Head and Mayer 2010); and new estimators were applied (Head and Mayer 2014). Still, a consistent finding has been that trade is stronger within countries than between them.
While much of the literature focused on measuring border effects between countries, the same methods have been applied to subnational regions. The arc of the intranational border effects literature has also been a reduction of effects with the application of more refined methods. In some instances, this led to elimination of the border effect altogether. In the United States, initially high estimates of interstate border effects (Wolf 2000) were reduced by using more accurate measures of distance (Hillberry and Hummels 2003, Head and Mayer 2010, and Crafts and Klein 2015), restricting trade flows to shipments from manufacturers (Hillberry and Hummels 2003), using a panel specification and controlling for internal migration (Millimet and Osang 2007), and using more fine-grained geographies to define subnational trading units (Hillberry and Hummels 2008; see also Coughlin and Novy 2016).
Two strategies might be used to develop an estimate of provincial border effects. One is to further refine the now-standard estimators in order to mitigate measurement error and missing variable bias. This is the approach of Agnosteva, Anderson and Yotov (2014), who take advantage of the panel nature of current measures of intraprovincial and interprovincial trade to estimate provincial border effects. The second, and arguably complementary, strategy, which is employed in this study, is to further develop data on intraprovincial and interprovincial trade in order to address concerns raised in the literature, while also using an estimation strategy that seeks to reduce remaining biases.
Particular attention is paid to the influence of measured distance (Head and Mayer 2010) and geography (Hillberry and Hummels 2003, 2008). Head and Mayer (2010) show that inaccurate measures of distance can bias estimates of the border effect upward, because intraregional distances tend to be overestimated relative to interregional distances. Estimating distance from actual point-to-point flows of goods effectively eliminates this bias. Hillberry and Hummels (2008) demonstrate that as the size of the geographic unit of analysis is reduced, estimated barriers to state trade fall to zero. Interstate barriers are an artefact of the geographic scale at which estimates are made. This effect stems from the larger number of short-distance flows of intermediate goods covered by the U.S. Commodity Flow Survey (CFS), which results in inaccurate estimates of the distance travelled by goods. An impediment to this work, particularly in the Canadian context, is the lack of detailed subprovincial data that would take the biasing effect of short-distance flows into account. The problem is addressed by taking advantage of the microdata used to calculate intraprovincial and interprovincial trade flows.
The analysis shows that intraprovincial trade is consistently stronger than interprovincial trade after the distance between the trading regions and the ability of the trading units to generate and absorb trade flows are taken into account. The relative strength of intraprovincial trade depends on the geographic units used to measure it, and to a lesser degree, the distance measure employed. When subprovincial areas are used instead of provinces, the border effect tariff equivalent falls from 13.6% to 6.9%. The latter is the estimate that held after application of an extensive set of checks aimed at mitigating the upward biasing effects of model misspecification (nonlinear effects of distance) and geography (size and placement of the geographic units). It is, therefore, a conservative, low-end estimate that contrasts sharply with the United States, where the application of similar approaches eliminates state border effects.
The remainder of this paper is organized as follows. Section 2 (Data development) reviews the method used to estimate trade between subprovincial geographic units and builds a picture of internal Canadian trade. Particular attention is paid to how these estimates are benchmarked to known intraprovincial and interprovincial trade totals and to the underlying validity of these estimates. Section 3 (Model and estimation strategy) outlines the structure of the trade model and the identification of an appropriate estimator. Section 4 (Model estimates) presents the estimates, building from standard intraprovincial and interprovincial estimates through trade based on subprovincial geographic units to a set of robustness checks that test for biases associated with misspecification and the Modifiable Areal Unit Problem (MAUP). Section 5 (Border effect tariff equivalent) estimates tariff-equivalent barriers to interprovincial trade in aggregate and by commodity. Section 6 (Conclusions) concludes with a summary of the results and their implications.
2. Data development
Up to now, analysis of Canada’s internal trade has been limited to the provincial level, relying on trade tables from the provincial input-output accounts or from provincial trade patterns reported by the Annual Survey of Manufactures (see Brown [2003] for the latter). This report develops a new, flexible, transaction-level, point-to-point dataset that permits measurement of trade flows between an almost limitless set of subprovincial geographic units, thereby providing a means to address many of the econometric issues raised in the literature. Because this database is new, it is useful to outline how it was constructed and describe some of its basic characteristics.
2.1 Database construction
The data are from Statistics Canada’s Surface Transportation File (STF), which provides estimates of the value of goods traded between regions in Canada, and between Canada and the United States. The STF is derived from the Trucking Commodity Origin and Destination Survey (TCOD) and railway waybills for the 2002-to-2012 period, with the analysis focusing on the 2004-to-2012 period.^{Note 1} Because these data are constructed from the trucking and rail waybill data, the STF in its original form is a “logistics file”.^{Note 2}
The STF measures the movement of goods from the point where they are picked up to the point where they are dropped off. These points do not necessarily represent locations where goods are made or where they are used. However, the analysis requires a database that captures the level of trade between subprovincial regions, which is embedded as a concept in the gravity-based trade model applied here. This objective has much in common with those stipulated when developing the provincial input-output accounts:
“In analyzing economic interdependence, it is necessary to maintain the link between the original supply sources and final consumers, by commodity. It follows then that the point of origin (the original supply source) is where goods and services are produced or goods are sold out of inventory stocks of producers, wholesalers and retailers. The point of destination (i.e., the final consumer) is the point where goods and services are purchased for current consumption, capital formation, input into the production process of other commodities, or added to inventory stocks.” (Généreux and Langen 2002, p. 7)
To transform the STF from a logistics file to a trade file, provincial trade flows from the input-output accounts are used to benchmark intraprovincial and interprovincial flows by commodity. Each transaction in the STF is given a weight such that the aggregate adds to the total for the corresponding intraprovincial or interprovincial flow from the input-output tables. The benchmarking procedure and, in particular, the development of a concordance between the Input-Output Commodity Classification (IOCC) and the Standard Classification of Transported Goods (SCTG) (Statistics Canada, n.d.a) codes on the STF file, is explained in Appendix A.3. A synopsis of the procedure is presented here.
The nominal value of trade between subprovincial regions (hereafter, regions) $i$ and $j$, $X_{ij}$, is the sum of the survey-weighted value of shipment $x$ indexed by $l$ between origin region $i$ and destination region $j$,^{Note 3} multiplied by the benchmark weight for shipment $l$:
$$X_{ij}=\sum_{l\in ij} w_{l}^{b}\,x_{lij};\quad \text{where } w_{l}^{b}=w_{l}\times w^{b}.\qquad \text{(1)}$$
The shipment benchmark weight, $w_{l}^{b}$, is the shipment-based survey weight, $w_{l}$, multiplied by the province-pair benchmark weight, $w^{b}$, for the commodity being shipped, with notation for the province pair and commodity suppressed to simplify the exposition. The benchmark weight is set such that trade between a given province pair (or within the same province) adds to known totals from the provincial trade accounts by detailed commodity and year.
Figure 1 illustrates the benchmarking procedure. Consider the toy example of flows of vehicles made in various locations in Ontario and ultimately used at various locations in Manitoba and Saskatchewan. The vehicles may first be shipped to a distribution centre in Manitoba, with a portion of the shipment sent to Saskatchewan, which is represented by the unbenchmarked flows in the upper left-hand quadrant of the figure. From a logistics perspective, this is a correct representation of the flows, but from a trade perspective, the flow from Ontario where the vehicles are made to Saskatchewan where they are used is underestimated, and the flow from Manitoba to Saskatchewan is overestimated. As presented in Figure 1 (top right quadrant), benchmarking to the input-output tables weights up the flow from Ontario to Saskatchewan and weights down (to zero) the flow from Manitoba to Saskatchewan.
The weighting strategy relies on there being a flow on the STF between each province pair; otherwise, there is nothing to weight up (or down): ${w}^{b}=0$. The result is no flow between the province pairs (lower half of Figure 1). If these “broken links” are too common and/or correlated with the distance between the province pairs, benchmarking will result in biased estimates. One source of bias is simply replaced by another.
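The benchmarking in Equation (1) and the broken-link problem can be illustrated with a minimal sketch. The shipment records, region names and input-output totals below are hypothetical, and the function names `benchmark_weights` and `regional_trade` are illustrative assumptions rather than part of the STF processing system. The sketch covers a single commodity and year.

```python
from collections import defaultdict

# Hypothetical shipment records: (origin province, destination province,
# origin region, destination region, survey weight w_l, shipment value x_l).
shipments = [
    ("ON", "MB", "Toronto", "Winnipeg", 2.0, 100.0),
    ("ON", "MB", "Windsor", "Winnipeg", 1.0, 50.0),
    ("ON", "SK", "Toronto", "Regina", 1.0, 20.0),
    ("MB", "SK", "Winnipeg", "Regina", 3.0, 40.0),
]

# Hypothetical input-output totals by province pair (one commodity, one year).
io_totals = {("ON", "MB"): 200.0, ("ON", "SK"): 150.0,
             ("MB", "SK"): 0.0, ("ON", "AB"): 80.0}

def benchmark_weights(shipments, io_totals):
    """Return w^b per province pair: IO total / survey-weighted STF total."""
    stf_totals = defaultdict(float)
    for op, dp, _, _, w, x in shipments:
        stf_totals[(op, dp)] += w * x
    # A pair with an IO total but no STF shipments never enters stf_totals,
    # so it receives no weight: this is a "broken link".
    return {pair: io_totals.get(pair, 0.0) / total
            for pair, total in stf_totals.items() if total > 0}

def regional_trade(shipments, wb):
    """Equation (1): X_ij is the sum over shipments of w_l * w^b * x_l."""
    X = defaultdict(float)
    for op, dp, i, j, w, x in shipments:
        X[(i, j)] += w * wb.get((op, dp), 0.0) * x
    return dict(X)

wb = benchmark_weights(shipments, io_totals)
X = regional_trade(shipments, wb)

# Province pairs with positive IO trade but nothing on the file to weight up.
broken = [pair for pair, total in io_totals.items()
          if total > 0 and pair not in wb]
```

In this toy case the Ontario-to-Saskatchewan flow is weighted up, the Manitoba-to-Saskatchewan flow is weighted down to zero, and the Ontario-to-Alberta total is lost because no shipment exists to carry it.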
Description for Figure 1
The title of Figure 1 is "Transformation of logistic to trade flows, full and broken sets."
The diagram depicts the benchmarking process, which is represented by four triangles delineated by arrows of different widths. Not all triangles are fully delineated by arrows because the arrows represent the trade flows. On the three points of each of the four triangles are small triangles named Ontario, Manitoba and Saskatchewan. Ontario is the province of origin of the trade flows.
Two of these four triangles represent the full set of links, one for unbenchmarked vehicles and the other for benchmarked vehicles.
In the triangle for unbenchmarked vehicles, the goods are shipped between Ontario, Manitoba, and Saskatchewan.
In the triangle for benchmarked vehicles, the goods are shipped only between Ontario and Manitoba and between Ontario and Saskatchewan.
The other two triangles represent the broken set of links, one for unbenchmarked vehicles and the other for benchmarked vehicles.
In the triangle for unbenchmarked vehicles, the goods are shipped between Ontario and Manitoba and between Manitoba and Saskatchewan.
In the triangle for benchmarked vehicles, the goods are shipped only from Ontario to Manitoba.
The source of this figure is “Statistics Canada.”
Table 1 presents the ratio of the benchmarked STF interprovincial or intraprovincial flows to the actual flows from the input-output tables. Because the Atlantic Provinces had a larger number of broken links, particularly with Western Canada, they were aggregated for benchmarking. As a result, there are relatively few pairs with a serious loss of trade. Overall, the benchmarked flows amount to 99% of the input-output-based trade levels. Intraprovincial flows tend to have less of a loss, but the difference is small. Otherwise, there do not appear to be large losses with distance. For instance, the loss for Atlantic Canada’s exports to Alberta or British Columbia is about the same as Ontario’s loss. The effect of these broken links is tested further below by estimating the gravity model with the input-output-derived provincial flows and the benchmarked flows; both sets of data provide qualitatively similar results (see Section 4).
| Origin / Destination (percent) | Atlantic Canada | Que. | Ont. | Man. | Sask. | Alta. | B.C. | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Atlantic Canada | 99 | 99 | 89 | 94 | 77 | 94 | 89 | 95 |
| Quebec | 99 | 99 | 100 | 98 | 94 | 98 | 98 | 99 |
| Ontario | 100 | 100 | 100 | 99 | 98 | 100 | 100 | 100 |
| Manitoba | 93 | 97 | 95 | 96 | 95 | 97 | 95 | 96 |
| Saskatchewan | 87 | 96 | 96 | 95 | 98 | 97 | 97 | 97 |
| Alberta | 89 | 97 | 98 | 97 | 99 | 100 | 100 | 99 |
| British Columbia | 96 | 82 | 99 | 97 | 96 | 99 | 98 | 97 |
| Total | 98 | 99 | 99 | 97 | 98 | 99 | 98 | 99 |

Note: Shares are based on the average level of trade between 2004 and 2012.
Source: Statistics Canada, authors' calculations.
While benchmarking accounts for the level of intraprovincial and interprovincial trade, the pattern of trade, especially within provinces, may be affected by the functioning of the transport/distribution system; that is, shorter-distance, logistics-driven flows may be more prevalent. This has important implications because, when pooled with interprovincial flows, these shorter-distance, intraprovincial flows tend to be underestimated, biasing the estimated interprovincial border effect upward.
As with the toy example above, the effect of benchmarking should be to stretch out interprovincial trade, as short-distance flows to/from distribution centres or wholesalers are weighted down, and longer-distance flows from points where goods are produced to where they are used are weighted up. This is apparent in Figure 2, which reports the shipment distance kernel densities with survey weights $\left(w_{l}\right)$ and survey and benchmark weights together $\left(w_{l}^{b}\right)$, with shipment distances divided between interprovincial and intraprovincial flows. As expected, for interprovincial shipments, benchmarking tends to reduce the importance of shorter-distance flows (less than 1,000 km) and increase the importance of longer-distance flows, particularly those exceeding 3,000 km. For intraprovincial trade, after benchmarking, short-distance flows are reduced, as imported commodities (for instance, shoes and apparel) that are distributed locally are weighted downward. Still, within provinces, short-distance, logistics-driven flows may be more prevalent. This effect can be tested by observing whether distance has a stronger effect on intraprovincial trade relative to interprovincial trade. The results indicate that this is not the case (Appendix B.1 contains a detailed discussion).
Description for Figure 2
The title of Figure 2 is "Intraprovincial and interprovincial distance, benchmark and survey weights."
This is a line chart of kernel densities.
The horizontal axis is “Distance (kilometres)”. It starts at 0 and ends at 5,000, with tick marks every 1,000 points.
The vertical axis is “Density”. It starts at 0 and ends at 0.01, with tick marks every 0.002 points.
There are 4 series in this graph.
The title of series 1 is "Intra (survey)," the intraprovincial survey-weighted density of distance. It starts out very high and drops significantly after around 100 kilometres, and is flat thereafter.
The title of series 2 is "Intra (benchmark)," the intraprovincial benchmark-weighted density of distance. It starts out very high and drops significantly after around 100 kilometres, and is flat thereafter. It is lower than series 1 at the start, but then crosses it around 50 kilometres and is higher than series 1 thereafter.
The title of series 3 is "Inter (survey)," the interprovincial survey-weighted density of distance. It starts out low, peaks around 500 kilometres, and is relatively constant thereafter, with bumps around 3,400 kilometres and 4,400 kilometres. It is higher than series 1 and 2.
The title of series 4 is "Inter (benchmark)," the interprovincial benchmark-weighted density of distance. It starts out low, peaks around 50 kilometres, and is relatively constant thereafter, with bumps around 3,400 kilometres and 4,400 kilometres. It is higher than series 1 and 2. It is slightly lower than series 3 before 2,000 kilometres, and slightly higher than series 3 thereafter.
The source of Figure 2 is “Statistics Canada, authors’ calculations.”
For additional information, please contact Statistics Canada at STATCAN.infostats-infostats.STATCAN@canada.ca.
2.2 Patterns of trade
Before estimating interprovincial barriers to trade, it is helpful to provide a picture of trade between provinces and between subprovincial regions.
Table 2 shows the pattern of exports across provinces (and Atlantic Canada) averaged over the 2004-to-2012 period. With the exceptions of Saskatchewan, Manitoba and Atlantic Canada, most trade occurs within provinces. This is not necessarily because of interprovincial barriers, but because of the influence of distance on trade flows (Figure 2). Table 2 also demonstrates provinces’ tendency to trade with those nearby. Atlantic Canada’s most important export market is Quebec; Saskatchewan’s is Alberta. However, although intraprovincial flows are large, the pattern of trade within provinces is not known.
| Origin / Destination (percent) | Atlantic Canada | Que. | Ont. | Man. | Sask. | Alta. | B.C. | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Atlantic Canada | 28 | 34 | 24 | 5 | 1 | 5 | 4 | 100 |
| Quebec | 1 | 67 | 22 | 1 | 1 | 4 | 3 | 100 |
| Ontario | 1 | 11 | 76 | 2 | 1 | 6 | 4 | 100 |
| Manitoba | 1 | 7 | 18 | 48 | 6 | 14 | 6 | 100 |
| Saskatchewan | 0 | 4 | 13 | 8 | 44 | 24 | 6 | 100 |
| Alberta | 0 | 4 | 9 | 4 | 5 | 68 | 10 | 100 |
| British Columbia | 0 | 5 | 8 | 2 | 2 | 15 | 68 | 100 |
| Total | 1 | 23 | 42 | 4 | 4 | 16 | 10 | 100 |

Note: Shares are based on the average level of trade between regions for the period from 2004 to 2012. Percentages may not add up to 100% because of rounding.
Source: Statistics Canada, authors' calculations.
Patterns of intraprovincial and interprovincial trade are determined by a discrete set of origins and destinations. Map 1 displays the locations that are served by truck and/or rail. Each point represents a location that participates in trade, generating a potential surface where goods are most likely to be made and used. On this surface, any given geography—province, economic region (ER), census division (CD) or other configuration—is overlaid to generate a set of aggregate flows.
As is evident in Map 1, while most trade occurs within provinces, the greatest potential for trade is even more geographically concentrated. This is confirmed when total trade (exports plus imports) by value across ERs is calculated as a percentage of overall goods trade in Canada. A small minority of ERs accounts for a disproportionate share of internal trade. Of the 73 ERs, three (Toronto, Montréal and Hamilton–Niagara Peninsula) account for 30% of the value of goods shipped in Canada. More trade moves in and out of the Toronto ER than any province except Ontario and Quebec. Thus, understanding provincial trade means understanding trade between subprovincial regions, especially large urban centres.
Description for Map 1
The title of Map 1 is "Origin/destination density."
The map depicts Canada and shows provincial and territorial boundaries.
It is a map of the density of locations in the Surface Transportation File (STF).
Places in Canada with a high amount of trade activity are coloured in red.
The places with the most trade activity are concentrated around major cities: Toronto, Ottawa, Montréal, Quebec City, Calgary, Edmonton and Vancouver.
Other places have a medium amount of trade.
There is less activity in rural areas.
The notes for Map 1 are as follows:
Notes: Each point is a 4-km-sided (42 km²) hexagon with one or more origins/destinations (postal codes or railway terminals). The gradation in colour from yellow to orange to red denotes a greater number of origins/destinations.
The source of Map 1 is “Statistics Canada, authors’ calculations.”
3. Model and estimation strategy
Estimation of provincial border effects relies on the development of data of sufficient quality to generate well-founded estimates and a model and estimator that are appropriate for the data.
3.1 Trade model
As is standard in the literature (Head and Mayer 2014),^{Note 4} trade between regions $i$ and $j$ is treated as a multiplicative function of the capacity of $i$ to serve export markets $\left(S_{i}\right)$, the absorptive capacity of the export market in $j$ $\left(M_{j}\right)$, and a measure that captures the effect of trade costs between $i$ and $j$ $\left(\varphi_{ij}\right)$:
$$X_{ij}=G\,S_{i}\,M_{j}\,\varphi_{ij};\quad 0\le \varphi_{ij}\le 1,\qquad \text{(2)}$$
where $G$ is a constant term. This general form can be expressed as a structural gravity model:
$$X_{ij}=G\,\frac{X_{i}}{\Omega_{i}}\,\frac{X_{j}}{\Phi_{j}}\,\varphi_{ij};\quad S_{i}=\frac{X_{i}}{\Omega_{i}}\text{ and }M_{j}=\frac{X_{j}}{\Phi_{j}},\qquad \text{(3)}$$
where $X_{i}$ is the value of output in $i$ and is the sum of exports across all trading partners (including itself) $\left(X_{i}=\sum_{j}X_{ij}\right)$, and $X_{j}$ is the value of consumption in $j$ and is the sum of imports across all trading partners (including itself) $\left(X_{j}=\sum_{i}X_{ij}\right)$. The terms $\Omega_{i}$ and $\Phi_{j}$ are multilateral resistance terms (Anderson and van Wincoop 2003), where
$$\Omega_{i}=\sum_{k}\frac{\varphi_{ik}}{\Phi_{k}}\text{ and }\Phi_{j}=\sum_{k}\frac{\varphi_{kj}}{\Omega_{k}}.\qquad \text{(4)}$$
${\Omega}_{i}$ is a measure of market access for exporting region $i$, and ${\Phi}_{j}$ measures the level of competition in market $j$. Trade costs $\left({\varphi}_{ij}\right)$ are accounted for by the distance between $i$ and $j$ $\left({d}_{ij}\right)$ and a set of binary variables that account for intraprovincial $\left({\delta}_{p}\right)$ and intraregional trade $\left({\delta}_{r}\right)$.
3.2 Estimator
Equation (3) can be estimated by substituting fixed effects for $S_{i}$ and $M_{j}$, adding a multiplicative error term and taking logarithms of both sides:
$$\ln X_{ij}=\ln G+\lambda_{i}+\gamma_{j}+\underbrace{\beta \ln d_{ij}+\delta_{p}+\delta_{r}}_{\ln \varphi_{ij}}+\epsilon_{ij},\qquad \text{(5)}$$
where $\lambda_{i}$ and $\gamma_{j}$ take into account $i$’s output and market potential $\left(X_{i}/\Omega_{i}\right)$ and $j$’s size and level of competition $\left(X_{j}/\Phi_{j}\right)$, respectively.
This estimation strategy has become the standard means of estimating the gravity model^{Note 5} (Anderson and Yotov 2012), because of ease of estimation, and because the fixed effects may pick up origin- and destination-specific unobservables that can bias full-information-based estimates (Anderson 2010 and Head and Mayer 2014).
Missing-variable bias is particularly important in this work. Although every effort is made to assign trade flows to where goods are made and used, there may be cases where a destination is acting as a distribution centre, thereby inflating its level of exports and imports. Also, some provinces (for example, British Columbia) may have stronger ties with world markets than with other provinces, which reduces their role as a domestic trading partner. In both instances, the fixed effects should account for these unobservables, which affect the level of trade in and out of a region (Head and Mayer 2014).
Ordinary least squares (OLS), the standard approach to estimating Equation (5), introduces two potential sources of bias. First, OLS estimates of a log-linearized multiplicative model are biased in the presence of heteroscedastic errors (Santos Silva and Tenreyro 2006). Second, OLS estimates are biased in instances with a large number of zero flows, which are dropped when the gravity model is estimated using OLS (Head and Mayer 2014). The latter is particularly important here, because the models are estimated using flows between subprovincial regions, which results in many zero flows between actively trading region pairs.
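To make the role of the intraprovincial term in Equation (5) concrete, the following sketch evaluates the right-hand side of the equation (without the error term) for two otherwise identical region pairs. All parameter values are illustrative assumptions, except that $\delta_{p}$ is set to $\ln 2.26$ so that it reproduces the 2.26-fold provincial figure quoted earlier. Treating $\delta_{p}$ and $\delta_{r}$ as coefficients that are switched on by binary indicators is also an assumption about how the terms enter.

```python
import math

def log_trade(ln_G, lam_i, gam_j, beta, delta_p, delta_r,
              dist_km, same_province, same_region):
    """Right-hand side of Equation (5), excluding the error term."""
    ln_phi = beta * math.log(dist_km)
    if same_province:      # intraprovincial indicator
        ln_phi += delta_p
    if same_region:        # intraregional indicator
        ln_phi += delta_r
    return ln_G + lam_i + gam_j + ln_phi

# Illustrative values only; they are not estimates from the paper.
params = dict(ln_G=0.0, lam_i=1.2, gam_j=0.7, beta=-1.0,
              delta_p=math.log(2.26), delta_r=0.5)

inter = log_trade(dist_km=500.0, same_province=False, same_region=False, **params)
intra = log_trade(dist_km=500.0, same_province=True, same_region=False, **params)

# With distance and fixed effects held constant, the ratio of predicted
# intraprovincial to interprovincial trade is exp(delta_p).
border_effect = math.exp(intra - inter)
```

Because the fixed effects and the distance term cancel in the difference, the border effect is read directly from the exponentiated coefficient.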
The first step in addressing these problems is to determine if the error term is heteroscedastic. To do so, the Manning and Mullahy (2001) test is applied. It is estimated based on the following specification:
$$\ln \widehat{\epsilon}_{ij}^{\,2}=\alpha +\kappa \ln \widehat{X}_{ij},\qquad \text{(6)}$$
where $\ln \widehat{X}_{ij}$ is the predicted log-level of trade from the OLS estimation of (5) and $\widehat{\epsilon}_{ij}=X_{ij}-\exp\left(\ln \widehat{X}_{ij}\right)$ is the difference between the data and the fitted values from the same estimator. Without zero flows, Head and Mayer (2014) find $\kappa \approx 2$ when the data-generating process produces lognormal errors, but $\kappa \approx 1.6$ when the process produces (Poisson) heteroscedastic errors. In Table 3, the estimates of $\kappa$ are presented by province, ER and CD. For provincial trade, the point estimate for $\kappa$ is 2.11, suggesting lognormal errors. However, when the model is estimated by ER and CD, the point estimates for $\kappa$ are about 1.7. For ERs, where the share of zero flows is around 8%, the estimate is close to what would be expected based on Monte Carlo simulations (Figure 4 in Head and Mayer 2014). For the CD estimates, where almost half the pairs have zero flows, the expected value of $\kappa$ is 1.6, with the actual estimate at 1.7. However, this is close to what Head and Mayer find when they obtain estimates of $\kappa$ from real data. The upshot is that, in both instances, the estimate for $\kappa$ is significantly different from 2, suggesting the OLS estimator is inappropriate.
| Geography | κ | 95% confidence interval: lower bound | 95% confidence interval: upper bound | Number of observations |
| --- | --- | --- | --- | --- |
| Province | 2.11 | 1.92 | 2.30 | 100 |
| Economic region | 1.71 | 1.68 | 1.74 | 5,069 |
| Census division | 1.68 | 1.67 | 1.69 | 47,156 |

Notes: κ is estimated with Equation (6) using trade flows between provinces, economic regions, and census divisions. When κ is significantly different from 2, the test can be interpreted as indicating that ordinary least squares is not the appropriate estimator.
Source: Statistics Canada, authors' calculations.
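The test described above can be sketched in a few lines. The example below is illustrative only: it uses simulated flows with log-normal errors and a single distance regressor, and the function name `mm_test_kappa` and all parameter values are assumptions rather than the authors' code. It regresses the log of the squared level residual on the predicted log flow and returns the slope, which should be close to 2 under log-normal errors.

```python
import numpy as np

def mm_test_kappa(flows, design):
    """Slope from regressing ln(residual^2) on the predicted log flow.

    flows  : 1-D array of strictly positive flows (OLS in logs drops zeros)
    design : 2-D regressor matrix for the log-linear model, constant included
    """
    log_flows = np.log(flows)
    # Step 1: OLS on the log-linearized model.
    beta, *_ = np.linalg.lstsq(design, log_flows, rcond=None)
    log_pred = design @ beta
    # Step 2: residuals in levels (data minus exponentiated fitted values).
    resid = flows - np.exp(log_pred)
    # Step 3: auxiliary regression of ln(resid^2) on a constant and log_pred.
    aux = np.column_stack([np.ones_like(log_pred), log_pred])
    coef, *_ = np.linalg.lstsq(aux, np.log(resid ** 2), rcond=None)
    return coef[1]

# Simulated example (all values hypothetical): log-normal multiplicative errors.
rng = np.random.default_rng(0)
n = 20_000
log_dist = rng.uniform(3.0, 8.0, n)
design = np.column_stack([np.ones(n), log_dist])
mean_flow = np.exp(12.0 - 1.0 * log_dist)
flows = mean_flow * np.exp(rng.normal(0.0, 0.5, n))

kappa_lognormal = mm_test_kappa(flows, design)
print(round(kappa_lognormal, 2))
```

A slope well below 2 on real data would point toward a pseudo-maximum-likelihood estimator, as discussed in the text.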
The second step is to select an appropriate estimator in the presence of zero flows and heteroscedastic errors. Based on Monte Carlo simulation results, Head and Mayer (2014) find that the Poisson Pseudo-Maximum-Likelihood estimator (Poisson-PML) tends to produce the least bias. Therefore, it is the preferred estimator, especially when estimates are based on flows between subprovincial regions. It is also preferred because it perfectly replicates the Anderson and van Wincoop (2003) structural equation estimates (Fally 2015).
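To make the estimator concrete, here is a minimal sketch of Poisson-PML fitted by iteratively reweighted least squares on simulated data that contain zero flows. It is an illustration under stated assumptions, not the authors' implementation: the function name `poisson_pml`, the simulated coefficients, and the omission of the origin and destination controls for multilateral resistance are all simplifications introduced here.

```python
import numpy as np

def poisson_pml(y, X, max_iter=100, tol=1e-10):
    """Poisson pseudo-maximum-likelihood via iteratively reweighted least squares.

    y may contain zeros; X must include a constant column.
    """
    mu = y + 0.1                      # starting values that keep log(mu) finite
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu       # working response for the log link
        XtW = X.T * mu                # weights equal the fitted mean
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated flows (hypothetical values): constant, log distance, own-province dummy.
rng = np.random.default_rng(1)
n = 50_000
log_dist = rng.uniform(0.0, 4.0, n)
own_prov = rng.integers(0, 2, n).astype(float)
X = np.column_stack([np.ones(n), log_dist, own_prov])
true_beta = np.array([2.0, -0.7, 0.5])
y = rng.poisson(np.exp(X @ true_beta)).astype(float)

beta_hat = poisson_pml(y, X)
print(np.round(beta_hat, 3), "share of zero flows:", round(float((y == 0).mean()), 2))
```

Because the dependent variable enters in levels, the zero flows stay in the sample instead of being dropped by a log transformation.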
3.3 Geography and estimation
The analysis is based on the aggregation of point data into a set of geographic units, of which the Standard Geographic Classification-based units (hereafter, standard geography), such as ERs and CDs, are just one of an almost limitless number of geographies. As demonstrated by Hillberry and Hummels (2008), estimates of barriers to trade can be strongly influenced by the geography chosen.
Hillberry and Hummels’ (2008) findings are an instance of the Modifiable Areal Unit Problem (MAUP): “. . . the sensitivity of analytical results to the definition of units for which real data are collected” (Fotheringham and Wong, 1991, p. 1025). MAUP is characterized by both a scale and zoning effect. That is, analytical results depend on the spatial resolution (scale effect) and the morphology (zoning effect) of the geography used to aggregate the data (Páez and Scott 2005).
These problems apply to multivariate statistics, including spatial interaction models like the gravity model (Fotheringham and Wong 1991, Amrhein and Flowerdew 1992 and Briant, Combes and Lafourcade 2010). In particular, Briant, Combes and Lafourcade (2010) show that gravity model results are more sensitive to scale and less sensitive to zoning effects. However, these are of secondary importance compared with model specification problems (for example, missing variable bias). Nevertheless, as Amrhein (1995) demonstrates, MAUP can emerge even when specification issues are taken into account.
Thus, the effects of geographic aggregation must be taken into account. This is accomplished by applying different geographies to the data. Two strategies are employed here. The first is to determine how sensitive the results are to the application of standard geographies, namely, defining trading regions on the basis of provinces, ERs and CDs. The second strategy takes advantage of the insight of Arbia (1989), who shows that biases resulting from the scale and zoning of the geography can be minimized by ensuring that geographic units are identical and spatially independent. A hexagonal lattice^{Note 6} is overlaid on the geocoded origin and destination points, creating an identical and spatially independent geography (Map 2). Hexagons that cross provincial borders are split and treated as discrete geographic units.
However, use of a hexagonal geography, while possibly minimizing the bias created by aggregating data, does not eliminate it. Issues of scale and zoning remain. Because no theoretically predetermined scale for the hexagons exists, the sensitivity of the results to size requires testing. An example is the geographic coverage of the 75 km and 225 km per side hexagons in Maps A1 and B1 (in Map 2). The smaller hexagons cover portions of metropolitan areas, while the larger hexagons can envelop several. Similarly, although the shape of the hexagons does not change, zoning still matters because they are arbitrarily positioned over the origin and destination points. In Map A1, Toronto is split across two hexagons, while in Map A2, it is split across three. Scaling and zoning effects are tested by running the model across different scales and zonings.
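The coding of points to hexagons, and the splitting of hexagons at provincial borders, can be sketched as follows. This is a minimal illustration under assumptions that are not in the paper: coordinates are taken to be planar (projected) kilometres, hexagons are oriented with a vertex pointing up, and the function names `hex_cell` and `trading_unit` are hypothetical. A point is converted to axial hexagon coordinates and rounded to the nearest cell; pairing the cell with the province code makes a hexagon that straddles a border count as two units.

```python
import math

def _cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon cell."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Reset the component with the largest rounding error so q + r + s = 0.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr

def hex_cell(x_km, y_km, side_km, shift=(0.0, 0.0)):
    """Axial (q, r) index of the hexagon containing a planar point.

    shift moves the whole lattice, which is how a re-placement of the
    lattice (zoning) can be represented.
    """
    x = x_km - shift[0]
    y = y_km - shift[1]
    q = (math.sqrt(3) / 3 * x - y / 3) / side_km
    r = (2 / 3 * y) / side_km
    return _cube_round(q, r)

def trading_unit(province, x_km, y_km, side_km, shift=(0.0, 0.0)):
    """Hexagons are split at provincial borders, so the unit is (province, cell)."""
    return (province,) + hex_cell(x_km, y_km, side_km, shift)

# The same location falls in different cells at the two scales.
point = (75.0 * math.sqrt(3), 0.0)
print(hex_cell(*point, 75.0), hex_cell(*point, 225.0))
```

Rerunning the assignment with a different `side_km` changes the scale of the geography, while a different `shift` changes its zoning.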
Description for Map 2
The title of Map 2 is "Size and placement of hexagonal lattices."
Map 2 contains four panels: Map A1, Map A2, Map B1 and Map B2. Each panel depicts the same area and shows the density of locations in the Surface Transportation File (STF), along with the hexagonal lattices that make up the geography in the paper.
The notes for Map 2 are as follows:
Notes: Maps A1 and A2 present two different overlays of hexagons with 75 km sides on southern Ontario and Quebec, where each dot represents an origin/destination, while Maps B1 and B2 do the same for hexagons with 225 km sides. Hexagons must respect provincial boundaries and are split across provinces.
The source of Map 2 is “Statistics Canada, authors’ calculations.”
For additional information, please contact Statistics Canada at STATCAN.infostats-infostats.STATCAN@canada.ca.
4. Model estimates
In this section, border effects are first estimated using province-level estimates of trade flows, thereby providing a base case. Border effects are then estimated using subprovincial geographies. This forms the core of the analysis. The remainder of the discussion focuses on a set of robustness checks, with particular attention to the sensitivity of the estimates to MAUP, alternative specifications of the model, or combinations thereof.
4.1 Standard province-based estimates
Interprovincial barriers to trade are measured by comparing intraprovincial and interprovincial aggregate trade levels. This serves several purposes. First, by comparing the actual level of interprovincial trade to the benchmarked estimates, the sensitivity of the results to the loss of trade from benchmarking can be determined. Second, the OLS, Poisson-PML and Gamma-PML estimates can be compared absent zero flows. Based on their first-order conditions, the Poisson estimator puts more emphasis on the absolute deviation between actual and predicted flows, while the OLS and Gamma-PML place more emphasis on the percentage deviation, and as such, are expected to give similar results (Head and Mayer 2014). Third, the provincial results form a baseline to compare the estimated barriers to interprovincial trade using trade between subprovincial regions.
Table 4-1 shows the estimated effects of distance and own-province on provincial trade, using the input-output-based flows and those derived after benchmarking. The model is estimated using an appropriately transformed version of (5) with the mean level of provincial trade from 2004 to 2012 as the dependent variable. Several observations may be drawn from the table.
First, estimates based on the input-output and benchmarked flows are similar. Own-province estimates tend to be lower when using the benchmarked estimates, but the effect is relatively small, particularly when the Poisson estimator is used. Because there is relatively little loss of generality resulting from the benchmarking, the remainder of the discussion focuses on these estimates.
Second, there is evidence of a border effect for nearly every estimator. The exception is the OLS estimate for the benchmarked flows, which is not significant. Using the input-output benchmarked estimates, the border effect ranges from 1.61 (OLS) to 2.26 (Poisson); that is, within-province trade is 61% to 126% higher than interprovincial trade when distance and multilateral resistance are taken into account.
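As a check on how these figures relate to the regression output, the reported border effects appear to be the exponentiated own-province coefficients. This reading is an inference from the reported numbers, not a formula stated in the text, and the symbol ${\widehat{\beta}}_{\text{own}}$ is introduced here for illustration. For the benchmarked flows with network distance:

$$\text{border effect}=\mathrm{exp}\left({\widehat{\beta}}_{\text{own}}\right):\qquad \mathrm{exp}\left(0.479\right)\approx 1.61\ \text{(OLS)},\qquad \mathrm{exp}\left(0.816\right)\approx 2.26\ \text{(Poisson)}$$

The percentage premium is then $100\times \left(\text{border effect}-1\right)$, which gives 61% and 126%, respectively.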
A benefit of building the trade estimates up from shipment data is that it is possible to obtain a more accurate measure of the distance goods travel within and between provinces. Sensitivity of the results to the distance measure can be tested by comparing estimates based on the network distance to the great-circle distance, which is typically used in the literature (Appendix A.4).
Network distance

| | Input-output: OLS | Input-output: Poisson | Input-output: Gamma | Benchmarked: OLS | Benchmarked: Poisson | Benchmarked: Gamma |
| --- | --- | --- | --- | --- | --- | --- |
| Distance, coefficient | 1.025** | 0.661** | 0.999** | 1.077** | 0.686** | 1.0780** |
| Distance, standard error | 0.0458 | 0.0496 | 0.0453 | 0.0576 | 0.0522 | 0.0537 |
| Own-province, coefficient | 0.607** | 0.865** | 0.775** | 0.479 | 0.816** | 0.634* |
| Own-province, standard error | 0.2230 | 0.0807 | 0.1900 | 0.2890 | 0.0827 | 0.2540 |
| Constant, coefficient | 12.31** | 9.916** | 12.42** | 11.70** | 9.515** | 12.08** |
| Constant, standard error | 0.4100 | 0.5590 | 0.3730 | 0.6300 | 0.8770 | 0.5350 |
| Border effect | 1.83 | 2.38 | 2.17 | 1.61 | 2.26 | 1.89 |
| R-squared | 0.954 | ... | ... | 0.959 | ... | ... |
| Number of observations | 100 | 100 | 100 | 100 | 100 | 100 |

... not applicable
Source: Statistics Canada, authors' calculations.
Great-circle distance

| | Input-output: OLS | Input-output: Poisson | Input-output: Gamma | Benchmarked: OLS | Benchmarked: Poisson | Benchmarked: Gamma |
| --- | --- | --- | --- | --- | --- | --- |
| Distance, coefficient | 1.058** | 0.778** | 1.037** | 1.100** | 0.806** | 1.106** |
| Distance, standard error | 0.0462 | 0.0571 | 0.0436 | 0.0613 | 0.0591 | 0.0564 |
| Own-province, coefficient | 0.747** | 0.780** | 0.840** | 0.653* | 0.728** | 0.743** |
| Own-province, standard error | 0.1940 | 0.0907 | 0.1710 | 0.2740 | 0.0882 | 0.2490 |
| Constant, coefficient | 12.01** | 10.49** | 12.17** | 11.29** | 10.12** | 11.70** |
| Constant, standard error | 0.4050 | 0.5470 | 0.3600 | 0.6440 | 0.8480 | 0.5350 |
| Border effect | 2.11 | 2.18 | 2.32 | 1.92 | 2.07 | 2.10 |
| R-squared | 0.956 | ... | ... | 0.959 | ... | ... |
| Number of observations | 100 | 100 | 100 | 100 | 100 | 100 |

... not applicable
Source: Statistics Canada, authors' calculations.
How distance is measured matters. On average, network distance is 33% greater than great-circle distance. Because great-circle distances are compressed relative to network distances, the parameter on distance should be more negative for great-circle distance-based estimates, which is true regardless of the estimator (see Table 4-2). As well, great-circle within-province distances are, in relative terms, overestimated (Appendix A.4), which biases the own-province effect upward. The OLS and Gamma estimators show this effect, but not the Poisson, where the bias appears to be captured by the coefficient on distance.
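For reference, a great-circle distance between two geocoded points can be computed with the haversine formula. The sketch below is illustrative and not taken from the paper: the function name `great_circle_km`, the mean Earth radius of 6,371 km, and the approximate city coordinates are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate coordinates for Toronto and Montréal (illustrative values).
toronto = (43.6532, -79.3832)
montreal = (45.5019, -73.5674)
print(round(great_circle_km(*toronto, *montreal), 1))
```

A network distance for the same pair would follow the road or rail route actually travelled, which is why it tends to be longer than the straight-line figure.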
4.2 Estimates by subprovincial geography
Estimates of provincial border effects based on comparisons of intraprovincial with interprovincial trade flows may still be biased, if these units do not effectively capture the pattern of trade. As shown by Hillberry and Hummels (2008), if short-distance flows predominate and are not properly captured by the internal distance measure, the estimated border effect may be biased upward.
To further establish the presence and strength of provincial border effects, intraprovincial and interprovincial trade flows are measured using subprovincial geographies of different sizes and morphologies. Because trade can be both within and between subprovincial geographic units, a binary variable is included for within-unit trade (own-region). It should capture nonlinearities in the effect of distance for these shorter-distance flows and/or differences in the nature of own-unit versus between-region trade. Within-region trade is more likely to include short-distance flows between manufacturers and distribution centres, between distribution centres and retail stores (Hillberry and Hummels 2003), and between upstream suppliers and downstream users of intermediate inputs (Hillberry and Hummels 2008).
Moving down to subprovincial units introduces the problem of zero flows between trading units. The set of trading units is defined as those that either make or use the good. Units that do not engage in goods trade, either within themselves or with other units, are excluded. This may result from no measurable goods production in the unit or from sampling variability. Because the estimates are based on the average value of trade over nine years, the effect of sampling variability is likely to be low. Of course, the units included in the trading set do not trade with all potential units, resulting in zero flows. Zero flows may be due to random chance (again, sampling variability), or they may be structural (producers incur costs above the trading threshold). To permit the presence of zeros, the Poisson estimator is used. For zero flows, the distance between regions is measured as the great-circle distance.^{Note 7}
Of the four geographic units used in the analysis, two (ERs and CDs) are based on standard geographies; the other two are hexagon lattices with 75 km and 225 km per side. If hexagons were larger than 225 km per side, some of the smaller provinces would have very few. Hexagons smaller than 75 km per side would result in such a large number of fixed effects to be estimated that the Poisson-PML estimator fails to reliably converge. Table 5 displays the characteristics of the units.
| | Standard: economic regions | Standard: census divisions | Hexagons: 225 km sides | Hexagons: 75 km sides |
| --- | --- | --- | --- | --- |
| Geographic units (number) | | | | |
| Total | 76 | 288 | 90 | 511 |
| Trading set | 73 | 282 | 90 | 380 |
| Area (square kilometres) | | | | |
| Average | 74,930 | 19,397 | 131,528 | 14,614 |
| Standard deviation | 129,347 | 59,759 | ... | ... |
| Minimum | 247 | 193 | 131,528 | 14,614 |
| 25th percentile | 10,416 | 1,863 | 131,528 | 14,614 |
| Median | 20,880 | 3,771 | 131,528 | 14,614 |
| 75th percentile | 77,903 | 15,202 | 131,528 | 14,614 |
| Maximum | 747,158 | 747,158 | 131,528 | 14,614 |

... not applicable
Notes: The trading set is defined as geographic units that engage in measured trade, excluding those in the territories. The area of subprovincial geographic units is calculated for the trading set.
Source: Statistics Canada, authors' calculations.
With ERs as the trading unit, the distance parameter tends to be less negative than the province-based estimates, with own-region likely picking up the nonlinear effect of short-distance flows (Table 6). The own-province estimate is smaller, resulting in a border effect of 2.10. Using CDs, a fundamental building block of ERs, the number of potential trading pairs rises from 5,329 to 79,524. For this much larger set of smaller trading units, the border effect falls to 1.97.
For both the small and large hexagons, own-region effects were not statistically significant; the own-province effect remained significant, but notably smaller than that for standard geographies. The result is a border effect that falls in a narrow range from 1.60 (large hexagons) to 1.62 (small hexagons). In other words, intraprovincial trade is estimated to be 60% to 62% higher than interprovincial trade, all else being equal (Table 6).
| | Standard: economic region | Standard: census division | Hexagon: 225 km sides | Hexagon: 75 km sides | Forward Sortation Area |
| --- | --- | --- | --- | --- | --- |
| Distance, coefficient | 0.551** | 0.573** | 0.820** | 0.742** | 0.426** |
| Distance, standard error | 0.0461 | 0.0278 | 0.0620 | 0.0357 | 0.0146 |
| Own-region, coefficient | 0.408** | 0.467** | 0.101 | 0.0215 | 1.052** |
| Own-region, standard error | 0.1380 | 0.1210 | 0.1270 | 0.1170 | 0.0966 |
| Own-province, coefficient | 0.743** | 0.679** | 0.472** | 0.483** | 0.909** |
| Own-province, standard error | 0.0951 | 0.0633 | 0.0872 | 0.0783 | 0.0421 |
| Constant, coefficient | 6.981*** | 7.094*** | 3.142*** | 2.540*** | 2.015*** |
| Constant, standard error | 0.4900 | 0.3590 | 0.7760 | 0.4770 | 0.3830 |
| Border effect | 2.10 | 1.97 | 1.60 | 1.62 | 2.48 |
| Number of observations | 5,329 | 77,274 | 8,619 | 132,862 | 2,574,640 |

Source: Statistics Canada, authors’ calculations.
These results contrast with those of Hillberry and Hummels (2008), who find state border effects to be an artefact of the geography used to measure internal trade. However, in their analysis, border effects disappeared only when an even finer-grained geography than that applied here was used, namely, 5-digit ZIP codes. To account for this, the model was rerun using Forward Sortation Areas (FSAs), the closest Canadian equivalent to ZIP codes.^{Note 8} The point estimate for own-province remains positive and significant (Table 6), but at a higher level than for the other geographies. Even with a very fine-grained geography, provincial border effects remain: a finding that is robust to a wide set of specifications (Subsection 4.3.4).
The provincial and subprovincial-based estimates of border effects indicate that the geography chosen matters, but information from which to draw strong conclusions is still insufficient. Two issues need to be addressed. The first is how sensitive the results are to the MAUP, namely, scaling and zoning effects (size and placement of hexagons). It is not clear if variations in provincial border effects across hexagons of different sizes (or lack thereof) are outweighed by variability resulting from the placement of the lattices. The second issue is whether a nonlinear effect of distance on trade influences estimates of provincial border effects. The elasticity on distance varies across geographies and estimators, and as Head and Mayer (2014) note, variation on the distance term between the Poisson and Gamma estimators may indicate model misspecification, which is observed in Table 4. Therefore, a more rigorous assessment of how the geography and model specification, particularly nonlinear effects of distance, influence estimated border effects is necessary.
4.3 Robustness of subprovincial estimates
The robustness of the estimates is tested in four steps. The first tests how sensitive the results are to the MAUP. The second tests whether there is a nonlinear effect of distance on trade that may influence estimates of provincial border effects. The third combines the first two by determining how sensitive the results are to taking both MAUP and the nonlinear effect of distance into account. The fourth step returns to Hillberry and Hummels’ (2008) analysis to determine if provincial border effects remain when FSAs are used as trading units after applying their specification and estimator, as well as the fully specified model.
4.3.1 Modifiable Areal Unit Problem
To test the sensitivity of the results to the MAUP, the models are rerun on randomly shifted hexagonal lattices of varying sizes. For a given size of hexagon, the lattice is superimposed on Canada’s landmass, with each origin and destination point coded to their respective province and hexagon. The centroid of each hexagon is then shifted to any random point within a circle circumscribed by its borders. The set of points is limited to the circumscribed circle, because shifting over more than one unit simply repeats the pattern. The origin and destination points are recoded to their province and hexagon. The lattice is randomly shifted 100 times,^{Note 9} resulting in a set of parameters that describes how sensitive the estimates are to the placement of the lattice (MAUP zoning effect) for a given size of hexagon. This is repeated for seven sizes, increasing in 25 km per side increments from 75 km to 225 km. This accounts for how sensitive the results are to the size of hexagons (the MAUP scaling effect).
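The simulation design described above can be laid out as a plan of lattice shifts: seven hexagon sizes with 100 random placements each. The sketch below is a hedged illustration; it assumes that the shift is drawn uniformly over the circle whose radius equals the hexagon's side length (the circumradius of a regular hexagon), and the names `random_shift` and `simulation_plan` are hypothetical.

```python
import math
import random

def random_shift(side_km, rng):
    """Draw a lattice offset uniformly within a circle of radius side_km.

    Assumption: the circle in question is the one passing through the
    hexagon's vertices, whose radius equals the side length.
    """
    radius = side_km * math.sqrt(rng.random())  # sqrt gives uniform density over the disc
    angle = 2.0 * math.pi * rng.random()
    return radius * math.cos(angle), radius * math.sin(angle)

def simulation_plan(sizes_km=range(75, 226, 25), draws=100, seed=0):
    """Map each hexagon size to its list of random lattice offsets."""
    rng = random.Random(seed)
    return {side: [random_shift(side, rng) for _ in range(draws)] for side in sizes_km}

plan = simulation_plan()
print(list(plan), len(plan[75]))
```

Each offset would then be used to recode origins and destinations and re-estimate the model, giving 100 coefficient vectors per hexagon size.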
To represent the distribution of coefficients resulting from the simulations for the main variables, namely own-province, own-region (hexagon) and distance, Figure 3 presents box plots by size of hexagon. The boxes represent the interquartile range, with the line intersecting the box being the median coefficient value. The ends of the whiskers, the upper and lower adjacent values, are the most extreme ranked coefficient values that lie within 1.5 times the interquartile range above the upper quartile or below the lower quartile. The dots identify extreme values.
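The box-plot quantities can be computed directly from a set of simulated coefficients. The sketch below is illustrative: the function name `box_stats` and the sample values are hypothetical, and quartiles use NumPy's default linear interpolation, which may differ slightly from the convention of the software used to draw Figure 3.

```python
import numpy as np

def box_stats(values):
    """Quartiles, adjacent values (whisker ends) and extreme values for a box plot."""
    v = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = v[(v >= lower_fence) & (v <= upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_adjacent": inside.min(),   # most extreme value still inside the fences
        "upper_adjacent": inside.max(),
        "extreme": v[(v < lower_fence) | (v > upper_fence)].tolist(),
    }

# Hypothetical own-province coefficients from eight lattice placements.
stats = box_stats([0.40, 0.44, 0.45, 0.46, 0.47, 0.48, 0.50, 0.70])
print(stats["lower_adjacent"], stats["upper_adjacent"], stats["extreme"])
```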
Description for Figure 3
The title of Figure 3 is "Coefficient estimates for own province, own region, and distance by size (kilometre per side) and placement of hexagons."
There are 3 box-and-whisker plots of regression coefficient estimates in this figure.
The horizontal axis of each plot has 7 categories: 75, 100, 125, 150, 175, 200 and 225.
The vertical axis of the first plot is "Coefficient: Own province." It starts at 0.35 and ends at 0.55, with tick marks every 0.05.
The median value of the first plot for category 75 is 0.5, the median value for category 225 is 0.45, with the other median values being in between these two.
The vertical axis of the second plot is "Coefficient: Own region." It starts at 0.4 and ends at 0.2, with tick marks at every 0.2.
The median value of the second plot for category 75 is about 0.15, the median value for category 225 is about 0.05, with the other median values being in between these two. The upper whisker on each category in the second plot is above 0.
The vertical axis of the third plot is "Coefficient: Distance." It starts at 0.9 and ends at 0.7, with tick marks at every 0.05.
The median value of the third plot for category 75 is about 0.75, the median value for category 225 is about 0.82, with the other median values being in between these two.
The notes for Figure 3 are as follows:
Notes: The boxes represent the interquartile range, with the line intersecting the box being the median coefficient value. The ends of the whiskers—the upper and lower adjacent values—represent the ranked coefficient value that is nearest to but not above (below) 1.5 times the interquartile range from above (below). The dots signify extreme values.
The source for Figure 3 is “Statistics Canada, authors’ calculations.”
For own-province, the median coefficient values range from 0.50 for the smallest hexagons to 0.45 for the largest (scaling effect), with the coefficients asymptotically converging toward the lower median value as the size of hexagons increases. This is consistent with Coughlin and Novy (2016), who report that if trade within small units is particularly strong, as the size of the unit increases, the border effect tends to diminish. Placement of the hexagonal lattice (zoning effect) has a larger effect on the estimates, with the difference between the box plot lower and upper adjacent values being greater than the difference in the medians across the size of hexagons. This contrasts with Briant, Combes and Lafourcade (2010), who find the scaling effect is more important. More broadly, shifting to a uniform geography has a qualitative effect on the estimated border effect, a result that holds when accounting for the effect of the size and placement of the hexagons on the estimates.
4.3.2 Nonlinear effects of distance
Variation in the results across hexagons of different sizes may stem from a nonlinear effect of distance on trade, a telltale sign of which is the negative effect of hexagon size on the distance coefficient (Figure 3). As the hexagons become smaller, the average distance shipped decreases. If these more prevalent shorter-distance flows are underestimated, the provincial border effect will be overestimated, because intraprovincial trade occurs over shorter distances than interprovincial trade (Figure 2). This appears to be the case, as a positive association exists between the own-province and the distance coefficients (Figure 3).
For at least two reasons, the effect of distance on trade is expected to vary. First, prices charged by trucking firms, for instance, include fixed and variable (line-haul) cost components. Because fixed costs per shipment are around $200 and line-haul costs increase at about $0.80 per kilometre (Brown 2015), prices inclusive of transport costs will effectively be uniform over short distances. Second, the endogenous clustering of upstream suppliers and downstream firms^{Note 10} and hub-and-spoke distribution networks^{Note 11} (Hillberry and Hummels 2008) may result in a large volume of trade over short distances, with a steep drop as distance shipped moves beyond “just down the street” shipments. Uniform prices over short distances, combined with clustering/distribution effects, result in a complicated set of expectations. For very short-distance flows, the effect of distance on trade may be negative (at least after a short plateau). Beyond these very short-distance flows, the negative effect of distance is expected to be weak initially and to strengthen as variable costs come to outweigh fixed costs in transportation rates. This pattern in the data requires moving beyond the standard quadratic form to account for nonlinearities.
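To illustrate why transport rates are nearly flat over short hauls, suppose the trucking rate is linear in distance with the approximate figures cited above. The linear form and the symbol $c\left(d\right)$ are simplifying assumptions made for this illustration, not the authors' specification:

$$c\left(d\right)=200+0.80\,d,\qquad \frac{\partial \,\mathrm{ln}\,c}{\partial \,\mathrm{ln}\,d}=\frac{0.80\,d}{200+0.80\,d}$$

Under this sketch, the elasticity of the rate with respect to distance is about 0.09 at 25 km, 0.29 at 100 km and 0.67 at 500 km, so the fixed component dominates until shipments travel a few hundred kilometres.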
To account for the nonlinear effects of distance, the model is re-estimated using a spline with knots at 25 km, 100 km and 500 km (Table 7) employing the hexagonal lattices for the estimates in Table 6.^{Note 12} Based on the smaller hexagon results, the distance elasticities are consistent with a steep drop in shipments over very short distances (reflecting co-location of input-output-linked plants, for instance), while the insignificant effect of distance for 25 to 100 km flows is consistent with a relatively constant transportation rate charged by firms over short distances. Accounting for the nonlinear effect of distance causes the coefficient on own-province to become more similar across hexagon size classes. However, given the sensitivity of the results to placement of the hexagonal lattices, from this single set of point estimates, it is unclear how similar the border effect estimates are between the large and small hexagons.
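One way to build such a spline is as a piecewise-linear function of log distance, with one regressor per distance band so that each coefficient is the elasticity within that band. The sketch below is an assumption about the construction, since the text does not spell it out; in particular, placing the knots in log distance and the function name `log_distance_spline` are choices made here for illustration.

```python
import numpy as np

def log_distance_spline(dist_km, knots_km=(25.0, 100.0, 500.0)):
    """Piecewise-linear spline basis in log distance.

    Column j holds the part of ln(distance) that falls within band j, so the
    columns sum to ln(distance) and each coefficient is a within-band slope.
    """
    x = np.log(np.asarray(dist_km, dtype=float))
    knots = np.log(np.asarray(knots_km, dtype=float))
    lower = np.concatenate(([-np.inf], knots))
    upper = np.concatenate((knots, [np.inf]))
    columns = []
    for lo, hi in zip(lower, upper):
        offset = lo if np.isfinite(lo) else 0.0   # first band is not shifted
        columns.append(np.clip(x, lo, hi) - offset)
    return np.column_stack(columns)

distances = np.array([10.0, 50.0, 250.0, 1000.0])
basis = log_distance_spline(distances)
print(np.round(basis, 3))
```

For a 50 km shipment, for example, the first column holds ln 25, the second holds ln 2, and the remaining columns are zero.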
Finally, a binary variable is added for hexagons that share a border (contiguous regions). The expectation is that this contiguity measure will account for short-distance flows across boundaries. For both the large and small hexagons, the contiguous region coefficient is insignificant, but the own-province coefficient falls while remaining significant.
| | Hexagon, 225 km sides: Model 1 | Hexagon, 225 km sides: Model 2 | Hexagon, 75 km sides: Model 1 | Hexagon, 75 km sides: Model 2 |
| --- | --- | --- | --- | --- |
| Distance, 0 km to 25 km, coefficient | 1.356** | 1.338** | 0.932** | 0.923** |
| Distance, 0 km to 25 km, standard error | 0.284 | 0.281 | 0.122 | 0.122 |
| Distance, 25 km to 100 km, coefficient | 0.544 | 0.561 | 0.268 | 0.276 |
| Distance, 25 km to 100 km, standard error | 0.471 | 0.469 | 0.227 | 0.227 |
| Distance, 100 km to 500 km, coefficient | 0.836** | 0.720** | 0.711** | 0.801** |
| Distance, 100 km to 500 km, standard error | 0.1190 | 0.1200 | 0.0598 | 0.0915 |
| Distance, greater than 500 km, coefficient | 0.818** | 0.772** | 0.862** | 0.858** |
| Distance, greater than 500 km, standard error | 0.0854 | 0.1090 | 0.0684 | 0.0689 |
| Own-region, coefficient | 0.0608 | 0.2330 | 0.312† | 0.1790 |
| Own-region, standard error | 0.173 | 0.237 | 0.161 | 0.199 |
| Own-province, coefficient | 0.458** | 0.412** | 0.431** | 0.418** |
| Own-province, standard error | 0.0839 | 0.0755 | 0.0713 | 0.0709 |
| Contiguous regions, coefficient | ... | 0.195 | ... | 0.132 |
| Contiguous regions, standard error | ... | 0.1380 | ... | 0.0972 |
| Constant, coefficient | 4.513** | 4.184** | 2.638** | 2.746** |
| Constant, standard error | 0.807 | 0.833 | 0.494 | 0.501 |
| Border effect | 1.58 | 1.51 | 1.54 | 1.52 |
| Number of observations | 8,619 | 8,619 | 132,862 | 132,862 |

... not applicable
Source: Statistics Canada, authors' calculations.
4.3.3 Nonlinear effects of distance and the Modifiable Areal Unit Problem
The next check assesses whether accounting for the nonlinear effects of distance reduces the degree of variation in results across different sizes and placement of hexagons. Again, this is accomplished by randomly perturbing the hexagonal lattices for the largest (225 km per side) and smallest (75 km per side) hexagons, and also, across model specifications. The “base” model estimates replicate those in Figure 3 (which uses the specification in Table 6); Model 1 and Model 2 match those in Table 7. Taking the nonlinear effect of distance into account reduces the median coefficient of the small hexagons, but increases that of the large hexagons (Figure 4), effectively reversing the pattern in Figure 3. Adding contiguity to the model (Model 2) produces large- and small-hexagon-based provincial border effects that are statistically indistinguishable. The coefficients on own hexagons also converge, but only when contiguity is taken into account. While the central tendencies of the small and large hexagon coefficient distributions are the same, their variances are not: the large hexagons have more than double the interquartile range of the small hexagons. On this basis, the small-hexagon border effects are the most reliable.
Description for Figure 4
The title of Figure 4 is "Coefficient estimates for own province and own region by model, hexagon size (kilometre per side) and placement."
There are two box-and-whisker plots of regression coefficient estimates in this figure.
The horizontal axis of each plot has two categories: 75 and 225.
Each category in each plot has three series: series 1 is labeled "Base," series 2 is labeled "Model 1," and series 3 is labeled "Model 2."
The vertical axis of the first plot is "Coefficient: Own province."
In the first plot, the median values for the three series at category 75 are as follows: series 1, 0.5; series 2, around 0.44; series 3, around 0.43.
In the same plot, the median values for the three series at category 225 are as follows: series 1, 0.45; series 2, around 0.47; series 3, around 0.45.
The vertical axis of the second plot is "Coefficient: Own region."
In the second plot, the median values for the three series at category 75 are as follows: series 1, 0.1; series 2, around 0.5; series 3, around 0.5.
In the same plot, the median values for the three series at category 225 are as follows: series 1, around 0.1; series 2, around 0.0; series 3, around 0.45.
The notes for Figure 4 are as follows:
Notes: The ‘base’ model estimates replicate those presented in Figure 3 (which use the specification presented in Table 6), while Model 1 and Model 2 match those in Table 7. The boxes represent the interquartile range, with the line intersecting the box being the median coefficient value. The ends of the whiskers—the upper and lower adjacent values—represent the ranked coefficient value that is nearest to but not above (below) 1.5 times the interquartile range from above (below). The dots signify extreme values.
The source of Figure 4 is “Statistics Canada, authors’ calculations.”
4.3.4 Provincial border effects based on Forward Sortation Areas
As a last robustness check, the analysis revisits Hillberry and Hummels’ (2008) finding that state border effects are eliminated when trade is measured using five-digit ZIP codes. This entails initially using the estimator (OLS) and model specification (quadratic term on distance) in their analysis, and then applying the preferred estimator (Poisson) and model (distance effects estimated using a spline) used above.
While the model and estimators can be equated, results may vary because of differences in the underlying data. First, because Hillberry and Hummels’ (2008) CFS data are shipper-based, shipments can be limited to those of manufacturers, rather than wholesalers and distributors. However, the data used here are carrier-based, so it is not possible to distinguish between the two. To the extent that shipments from wholesalers are more localized, stronger trade is more likely at short distances, which should be accounted for by the own-region (FSA) term.
Second, because the estimates are based on the nine-year average of flows across FSAs, a fuller set of flows is likely to be obtained than that provided by the CFS. As a result of this and of Canada’s geography, the FSA data are weighted toward longer-distance flows, with the average distance shipped between FSAs being 1,679 km (1,049 miles), whereas the average distance between ZIP codes is 837 km (523 miles) (Hillberry and Hummels 2008). Because distance elasticity increases (in absolute terms) with distance shipped, the effect of distance is expected to be stronger in the present analysis.
Table 8 contains the estimates, with the first three columns showing the model equivalent to that in Hillberry and Hummels (2008, Table 2). The first column contains OLSbased estimates; the second and third, Poissonbased estimates without and with zeros included, respectively. Evaluating the effect of distance using the mean distance of 837 km, elasticity is 0.42, more than double the ZIPcodebased estimate of 0.19. Also, the point estimate for ownregion (FSA) is much higher. These disparities were expected, given the differences in the underlying data. Specifically, the ownprovince effect is positive and significant when using the same estimator, model and geography as Hillberry and Hummels.
Application of the Poisson estimator reduces the effect of distance, because larger (typically) shortdistance flows are weighted more heavily. Evaluated at 837 km, elasticity on distance is 0.25, and only slightly lower in absolute terms when zero flows are added. The Poisson estimator also produces smaller, but still significant, ownregion and ownprovince effects. Inclusion of zero flows results in a positive coefficient on distance up to 5 km, and a declining point estimate thereafter. Adding zeros also raises the point estimates on ownregion and province. The highly nonlinear effect of distance when the Poisson estimator is applied suggests that the influence of distance on trade has to be treated in a flexible manner. This is accomplished by estimating a spline on distance.
Table 8
Estimator  OLS  Poisson  Poisson  Poisson  Poisson  Poisson  
Model  Model 1  Model 1  Model 1  Model 2  Model 3  Model 4  
Distance  
Coefficient  0.490^{Note **}  0.0105  0.217^{Note **}  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable 
Standard error  0.0220  0.0517  0.0566  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable 
Distance squared  
Coefficient  0.0104^{Note **}  0.0353^{Note **}  0.0661^{Note **}  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable 
Standard error  0.00178  0.00446  0.00479  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable 
Distance  
0 km to 25 km  
Coefficient  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.0678  Note ...: not applicable  Note ...: not applicable 
Standard error  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.0587  Note ...: not applicable  Note ...: not applicable 
0 km to 10 km  
Coefficient  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.303^{Note **}  Note ...: not applicable 
Standard error  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.110  Note ...: not applicable 
0 km to 5 km  
Coefficient  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.542^{Note *} 
Standard error  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.245 
5 km to 10 km  
Coefficient  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.0223 
Standard error  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.3470 
10 km to 25 km  
Coefficient  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.461^{Note **}  0.393^{Note **} 
Standard error  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.116  0.131 
25 km to 100 km  
Coefficient  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.296^{Note **}  0.198^{Note **}  0.206^{Note **} 
Standard error  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.0668  0.0669  0.0667 
100 km to 500 km  
Coefficient  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.497^{Note **}  0.507^{Note **}  0.505^{Note **} 
Standard error  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.0377  0.0374  0.0375 
Greater than 500 km  
Coefficient  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.767^{Note **}  0.764^{Note **}  0.764^{Note **} 
Standard error  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable  0.0235  0.0235  0.0235 
Ownregion  
Coefficient  2.357^{Note **}  1.316^{Note **}  1.494^{Note **}  1.472^{Note **}  1.561^{Note **}  1.551^{Note **} 
Standard error  0.1040  0.1010  0.1010  0.1040  0.0998  0.1010 
Ownprovince  
Coefficient  1.211^{Note **}  0.468^{Note **}  0.616^{Note **}  0.592^{Note **}  0.592^{Note **}  0.593^{Note **} 
Standard error  0.0154  0.0361  0.0385  0.0393  0.0393  0.0393 
Constant  
Coefficient  3.117^{Note **}  1.618^{Note **}  1.123^{Note **}  1.416^{Note **}  0.793^{Table 8 Note †}  0.551 
Standard error  0.217  0.399  0.408  0.415  0.442  0.472 
Number of observations  652,214  652,214  2,574,640  2,574,640  2,574,640  2,574,640 
Border effect  3.36  1.60  1.85  1.81  1.81  1.81 
Distance elasticity at 837 km  0.42  0.25  0.23  Note ...: not applicable  Note ...: not applicable  Note ...: not applicable 
Includes zero flows  no  no  yes  yes  yes  yes 
... not applicable
Source: Statistics Canada, authors' calculations. 
Model 2 uses the same structure as Model 1 in Table 7, with knots at 25 km, 100 km and 500 km. The provincial border effect is lower than when the quadratic is used on distance, but remains significant. Unlike when hexagons are used, no strong negative effect on distance is evident between 0 and 25 km. Instead, because the vast majority of FSAs are small and located in metropolitan areas, the effect of short-distance flows is captured by the own-region term, with a strong positive coefficient. Hexagons, whose size distribution is not associated with the density of short-distance flows, have a weaker relationship. Further subdividing the effect of distance over short-distance shipments reveals a positive association with distance between 0 and 10 km (Model 3) and between 0 and 5 km (Model 4). At such short distances, increases in distance have little effect on trade costs, but apparently increase the number of potential sources of demand. The estimated provincial border effect is unchanged with these changes to the specification. In short, contrary to Hillberry and Hummels (2008), adoption of very small trading units does not eliminate border effects. Therefore, provincial border effects, while sensitive to the specification and the geography, are never eliminated. However, the question of whether they are economically meaningful remains.
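The way a spline lets the distance slope differ by segment can be sketched as follows. This is a minimal illustration only: it assumes that distance enters in logs and that the spline is piecewise linear with one slope per segment, neither of which is spelled out in this section. The function name `linear_spline_terms` is hypothetical.

```python
import math


def linear_spline_terms(x, knots):
    """Split x into piecewise-linear segment terms that sum back to x.

    Assumes `knots` is a non-empty, increasing list. A regression on the
    returned terms gives one slope per segment (illustrative basis only).
    """
    terms = []
    lower = None
    for k in knots:
        if lower is None:
            # First segment: everything up to the first knot.
            terms.append(min(x, k))
        else:
            # Middle segments: the part of x lying between two knots.
            terms.append(max(min(x, k), lower) - lower)
        lower = k
    # Last segment: the part of x beyond the final knot.
    terms.append(max(x, lower) - lower)
    return terms


# Knots at 25 km, 100 km and 500 km, expressed in log kilometres (assumed).
knots = [math.log(25), math.log(100), math.log(500)]
terms = linear_spline_terms(math.log(50), knots)
print(terms)
```

For a 50 km flow, only the first two terms are non-zero, so only the 0 km to 25 km and 25 km to 100 km slopes contribute to its predicted trade.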
5. Border effect tariff equivalent
To estimate the tariff equivalent of the provincial border effect, Head and Mayer’s (2014, p. 32–34) approach is applied. As above, $\delta_{p}$ denotes the provincial border effect coefficient, which reflects the reduction in trade costs between subprovincial regions simply by virtue of being part of the same province. Given that $\delta_{p}=\eta\left(\ln\rho^{inter}-\ln\rho^{intra}\right)$, where $\rho^{inter}$ and $\rho^{intra}$ are interprovincial and intraprovincial trade costs, respectively, and $\eta$ is the trade elasticity with respect to transportation costs, if $t$ is the tariff that must be removed to equate the cost of moving goods within and between provinces, the interprovincial trade tariff equivalent is
$$t=\left(1+\upsilon\right)\left[\exp\left(\delta_{p}/\eta\right)-1\right],\qquad\text{(7)}$$
where $\upsilon$ is the tariff equivalent of within-province barriers to trade, which is assumed to be zero. Hence, the only missing information is the elasticity of trade with respect to trade costs, $\eta$, which is estimated from
$$\ln X_{ij}=\phi_{i}+\xi_{j}+\eta\ln\tau_{ij}+\mu_{ij},\qquad\text{(8)}$$
where $\tau_{ij}$ is 1 plus the ad valorem transportation costs, $\phi_{i}$ and $\xi_{j}$ are, respectively, origin and destination fixed effects, and $\mu_{ij}$ is the error term. Ad valorem transportation costs are derived from the STF, which reports the price charged to shippers and the estimated value of each shipment. Estimated^{Note 13} price elasticity based on (8) is 6.40, which is between the median (5.03) and average (6.74) price elasticities identified in Head and Mayer’s (2014) meta-analysis.
For the median provincial border effect coefficient on the 75 km per side hexagon (Figure 4, Model 2), $t=\exp\left(0.426/6.40\right)-1=0.069$ or 6.9%. Using a different methodology, Agnosteva, Anderson and Yotov (2014) arrive at a lower, but statistically indistinguishable,^{Note 14} estimate of 5.6%.
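The calculation in Equation (7) can be reproduced with a few lines of code. The sketch below is illustrative; the function name `tariff_equivalent` is hypothetical, and the inputs are the border effect coefficient (0.426) and price elasticity (6.40) reported in the text, with within-province barriers set to zero as assumed above.

```python
import math


def tariff_equivalent(delta_p, eta, upsilon=0.0):
    """Equation (7): t = (1 + upsilon) * (exp(delta_p / eta) - 1)."""
    if eta == 0:
        raise ValueError("eta must be non-zero")
    return (1.0 + upsilon) * (math.exp(delta_p / eta) - 1.0)


# Median coefficient for the 75 km per side hexagon (Figure 4, Model 2).
t = tariff_equivalent(0.426, 6.40)
print(f"{t:.3f}")  # 0.069
```

Because the elasticity sits in the denominator of the exponent, a larger assumed elasticity gives a smaller tariff equivalent for the same border effect coefficient.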
The tariff equivalents of the border effect across the standard and hexagonal geographies are presented in Chart 1 and illustrate the impact of the trading unit chosen. The hexagons use the median point estimates from the simulations in Figures 3 and 4. The provincial estimates are the highest at 13.6%, followed closely by ER- and CD-based tariff equivalents of 12.3% and 11.2%, respectively. Imposition of a uniform hexagonal geography causes the most notable drop in the tariff rate. As the hexagons become larger, the point estimates converge to a tariff equivalent of 7.3%. The tariff equivalent for the 75 km and 225 km per side hexagons that accounts for the nonlinear effect of distance and contiguity (Figure 4, Model 2) provides the lowest estimates, which are essentially indistinguishable. Therefore, in the fully specified model, the size of hexagon chosen is of little consequence. At 6.9%, the 75 km per side hexagons provide the preferred estimate, because of the smaller interquartile range relative to the 225 km per side hexagons. Compared with this estimate, relying on provincial trade would increase border effect estimates by 6.7 percentage points, a substantial difference. To put it in perspective, this value is above Canada’s mean tariff rate (4.9%).^{Note 15}
Data table for Chart 1
Geography  Tariff equivalent (percent)  
Standard geography  
Province  13.6 
Economic region  12.3 
Census division  11.2 
Hexagon (kilometre per side)  
75  8.1 
100  7.6 
125  7.4 
150  7.4 
175  7.3 
200  7.4 
225  7.3 
Hexagon with spline (kilometre per side)  
75  6.9 
225  7.2 
Notes: All tariff equivalents are estimated using a price elasticity on transportation costs of 6.40. The standard geography ad valorem tariff equivalents are based on the provincial border effect estimates from Table 4 (Poisson estimate of the benchmarked flows using the network measure of distance) and Table 5. The hexagon-based tariff equivalents are based on the median point estimate from Figure 3, while the hexagon with spline-based tariff equivalents are the median point estimate from Figure 4 based on Model 2 from Table 7, which includes the control for contiguous regions. Source: Statistics Canada, authors' calculations. 
Table 9 Tariff equivalent of the border effect, by commodity
Commodity  Tariff equivalent (coefficient)  
Grains and other crop products  0.07^{Note *} 
Live animals  0.09^{Note *} 
Fish and seafood, live, fresh, chilled or frozen  0.01 
Other farm products  0.14^{Note *} 
Forestry products and services  0.32 
Metal ores and concentrates  0.23^{Note *} 
Nonmetallic minerals  0.09 
Alcoholic beverages and tobacco products  0.01 
Food and nonalcoholic beverages  0.04^{Note *} 
Textile products, clothing, and products of leather and similar materials  0.04^{Note *} 
Wood products  0.12^{Note *} 
Wood pulp, paper and paper products and paper stock  0.20^{Note *} 
Printed products and services  0.07^{Note *} 
Refined petroleum products (except petrochemicals)  0.02^{Note *} 
Chemical products  0.02 
Plastic and rubber products  0.08^{Note *} 
Nonmetallic mineral products  0.24^{Note *} 
Primary metallic products  0.06^{Note *} 
Fabricated metallic products  0.08^{Note *} 
Industrial machinery  0.02^{Note *} 
Computer and electronic products  0.01 
Electrical equipment, appliances and components  0.02^{Note *} 
Motor vehicle parts  0.11^{Note *} 
Transportation equipment  0.03^{Note *} 
Furniture and related products  0.08^{Note *} 
Other manufactured products and custom work  0.05^{Note *} 
Source: Statistics Canada, authors' calculations. 
The tariff equivalent of the border effect by commodity is calculated by estimating, for each commodity, the border effect $\left(\delta_{p}\right)$ and the elasticity of trade with respect to transportation costs $\left(\eta\right)$. The statistical significance of the border effect is reported alongside the tariff equivalent (see Table 9). The border effect is small and statistically insignificant for many commodities, including transportation equipment, industrial machinery and refined petroleum products. For others that are largely intermediate goods (for example, motor vehicle parts), an apparent border effect is evident, potentially reflecting variation in industrial demand across provincial borders. For instance, most demand for auto parts comes from assembly plants concentrated in Ontario. However, for other commodities, there may be some aggregation bias. No significant border effect exists for alcoholic beverages and tobacco products, but this stems largely from the inclusion of tobacco products. When disaggregated, the expected pattern emerges: the tariff equivalents for distilled liquor, wine and brandy, and beer are 8%, 56%, and 2%,^{Note 16} respectively. Dairy products might be viewed in the same way, but the commodity aggregations in the trucking data are too broad to isolate flows of dairy products and accurately estimate the border effect.
6. Conclusions
Because of a lack of geographically detailed data on trade within and across provinces, intranational border effects are difficult to measure. By using a new flexible transaction-level transportation data file to generate regional trade flows within and across provincial borders, this analysis shows that regardless of the model or geography chosen, provincial border effects persist, with an implied ad valorem tariff equivalent of 6.9%. This contrasts with estimates for the United States, in which state border effects are eliminated when similar approaches are applied. While the presence of provincial border effects is consistent with a dampening influence of nontariff barriers on interprovincial trade, they do not, in and of themselves, account for all barriers, because border effects capture multiple factors that affect interprovincial trade.
Several methods-based observations may be drawn from this analysis. First, while the results are sensitive to the size of geographic unit chosen (province, economic region, census division or hexagon), a simple linear relationship does not exist between (average) size and border effects. In fact, a uniform shape (hexagons) is more important than the size of the geographic unit. This supports Arbia’s (1989) finding that biases resulting from the scale and zoning of the geography are minimized by using identical units.
Second, accounting for the nonlinear effect of distance is at least as important as controlling for geography (the Modifiable Areal Unit Problem [MAUP]). The negative association between hexagon size and border effects stemming from a nonlinear relationship between distance and trade predicted by Coughlin and Novy (2016) is effectively eliminated when these nonlinearities are explicitly taken into account. Nonetheless, accounting for MAUP is useful, as it provides a means to test the model’s specification.
Further work is needed to identify the effect of provincial nontariff barriers on estimated border effects, because direct information is required on the extent of these barriers and on other factors that influence interprovincial trade (for instance, firm linkages and migratory flows across provincial borders). Furthermore, while this analysis was able to estimate provincial border effects, the overall welfare implications resulting from their elimination are not measured here. Nevertheless, as Albrecht and Tombe (2016) show, these can be substantial.
Appendix A Data appendix
A.1 Valuing shipments
The waybills on which the Surface Transportation File (STF) is based describe the commodity and tonnage of each shipment, but not its value. To estimate value, a measure of the value per tonne is required. This is derived from an experimental transaction-level trade file that measures the value and tonnage of goods by detailed Harmonized Commodity Description and Coding System (HS) commodity in 2008. Because the trade file identifies the mode used for each shipment, the value per tonne for each commodity also varies by the mode used. Export price indices are used to project the value per tonne estimates through time (see Brown [2015] for a detailed discussion).
A.2 Geocoding shipment origins and destinations
Based on postal code data from the Trucking Commodity Origin and Destination Survey (TCOD) and Standard Point Location Codes (SPLCs) from the rail waybill file, each shipment is geocoded (assigned a latitude and longitude for the origin and destination) from 2004 to 2012. These are used to give the file a 2006 Standard Geographic Classification (Statistics Canada, n.d.b). As a result, each origin and destination is coded to its economic region (ER), census division (CD) and consolidated census subdivision (CCSD). Before 2004, the TCOD did not use postal codes to identify origins and destinations. For these years, the flows are coded only to ERs. Because origins and destinations are given latitudes and longitudes, other nonstandard geographies can be applied, such as the hexagonal lattices in this analysis. Imputation of just over half the postal codes likely reduces their accuracy. Nevertheless, when mapped, imputed and nonimputed postal codes presented similar geographic patterns.
A.3 Benchmarked weights
A primary goal when constructing the file is to ensure that the value of trade on a shipment basis in the STF adds to known trade totals by commodity from the interprovincial trade flow file. To do this, two problems must be overcome, because the files represent different trade concepts and use different commodity classifications.
In the interprovincial trade flow file, an origin represents the point of production, and a destination represents a point of consumption. However, in the STF, an origin represents the point at which the shipment is picked up, and the destination is the point at which the shipment is dropped off, including warehouses that are transportation waypoints. A commodity produced in Quebec and consumed in British Columbia would be recorded as a flow from Quebec to British Columbia in the interprovincial trade flow database, but that flow may have multiple sources and destinations in the STF if it stops at warehouses in different provinces along the way. For instance, a Quebec-to-British Columbia trade flow might be counted as flows from Quebec to Ontario, and then from Ontario to British Columbia. This results in the STF overestimating flows between close provinces and underestimating flows between provinces that are farther apart, potentially biasing border effect estimates upward. Benchmarking is an attempt to reweight the surface transportation shipments to reflect the interprovincial trade flow concept.
The two files use related, but in practice, different, commodity classification systems. Although both commodity classifications are built from the commodity-based HS, the resulting aggregate classifications are so different as to eliminate any one-to-one matching between them. The STF uses the Standard Classification of Transported Goods (SCTG) (Statistics Canada, n.d.a), while the interprovincial trade flow file uses the Input-Output Commodity Classification (IOCC). At every level of aggregation, some SCTG codes map to multiple IOCC codes, and vice versa. Because of the large number of multiple matches, no attempt is made to force a single IOCC code to any SCTG code. Instead, the goal is to benchmark the transportation file so that it represents the same values as the interprovincial flow file without specifying which transported commodities represent which input-output commodities. Rather than forcing a one-to-one concordance between the files, the benchmark weights are set such that flows add to total commodity flows generated by the input-output system. The process involves a series of steps.
In the first step, each file is aggregated to include values of flows by year, origin province, destination province and commodity (SCTG for the STF; IOCC for the interprovincial flow file). This generates two vectors of the value of trade for IOCC commodity flows and SCTG commodity flows: ${X}_{I}$ and ${X}_{S}$, respectively.
The second step builds a concordance between SCTG and IOCC by province pair and year. This is done through one-to-many mappings from SCTG to HS and from IOCC to HS, which combine to form a many-to-many map from SCTG to IOCC, creating a concordance matrix $C$ used in the third and final step.
In the final step, the benchmark weights are calculated. For each year and origin and destination province pair, the two commodity vectors, ${X}_{I}$ and ${X}_{S}$, are combined with the concordance matrix $C$, of which all values are either 0 or 1 (depending on whether a given SCTG commodity maps to a given IOCC commodity). Defining the number of IOCC commodities as $M$ and the number of SCTG commodities as $N$, then ${X}_{I}$ has length $M$, ${X}_{S}$ has length $N$, and $C$ is an $M\times N$ matrix. The benchmarking problem can be written:
$$\left(B\circ C\right)X_{S}=X_{I},$$
where $B$ is the $M\times N$ matrix of benchmarking values, and $\circ$ is the element-wise matrix product (Hadamard product). Any $B$ that solves this system of equations will benchmark $X_{S}$ to $X_{I}$. The problem is to find a solution to $M$ equations given $M\times N$ unknowns. A typical solution is to force $C$ to be one-to-one, such that if $c_{mn}=1$, then $c_{mo}=0$ for all $o\ne n$ and $c_{on}=0$ for all $o\ne m$, where $m$ and $n$ index the rows and columns of $C$. In that way, the matrix $B\circ C$ has only $M$ nonzero values, and the benchmark weight is $b_{mn}=X_{I_{m}}/X_{S_{n}}$. In this case, the concordance would be static; there would be no need to undertake a concordance by year, let alone province pair. However, this approach discards considerable amounts of information about underlying trading relationships between provinces, because the commodity profile of trade varies across province pairs. For instance, the commodity in a forced pairing may not be found in the trade between the two provinces. Hence, the benchmarking concordance should reflect, and indeed, take advantage of those differences.
To preserve information in the face of a particularly severe many-to-many concordance problem in $C$, each element of $B$ is separated into two parts, $b_{mn}=b_{m}\widehat{b}_{mn}$, where
$$\widehat{b}_{mn}=\left(\frac{X_{S_{n}}}{\sum_{o}c_{mo}X_{S_{o}}}\right)\left(\frac{X_{I_{m}}}{\sum_{o}c_{on}X_{I_{o}}}\right).\qquad\text{(9)}$$
Equation (9) is simply the product of the trade shares of the concorded SCTG- and IOCC-based flows. It is assumed that the SCTG- and IOCC-based flows are an accurate representation of the patterns of trade and so provide appropriate splits against which to benchmark. $b_{m}$ is the value that solves the equation
$$b_{m}\sum_{n}\widehat{b}_{mn}c_{mn}X_{S_{n}}=X_{I_{m}},\qquad\text{(10)}$$
for each equation in the system, with the convention that $b_{m}=0$ if $X_{I_{m}}=0$ or the sum on the left-hand side of (10) is zero. The only remaining issue is to calculate a single benchmark value for one SCTG code given by
$$w_{n}^{b}=\sum_{m}b_{mn}c_{mn},$$
which is considered the benchmark weight for all shipments of SCTG commodity $n$ in that year and province origin–destination pair. In other words, $w_{n}^{b}$ is the sum of the values of column $n$ of $B\circ C$.
Again, any $B$ that solves this equation will be a benchmark, but the choice is to maximize the information available. Specifically, $\widehat{b}_{mn}$ is chosen to use the value of an SCTG commodity flow relative to the total SCTG flows that point to the same IOCC code $m$, and also the value of the flow of that IOCC code relative to all of the IOCC codes that are pointed at by SCTG commodity $n$. In addition, although two commodities cannot be compared directly, the total value of benchmarked trade is the same as the total value of interprovincial trade (for each year-province-province observation), because
$$\sum_{n}w_{n}^{b}X_{S_{n}}=\sum_{m}X_{I_{m}}.$$
Hence, the procedure achieves the goal of ensuring that trade flows add to known totals from the provincial accounts.
In some instances, the sample of shipments will not cover all the SCTG commodities between two provinces in a year (Figure 1 in Subsection 2.1). In this case, for some IOCC commodity $m$, the $m$th element of the vector $\left(B\circ C\right)X_{S}$ is zero, because $X_{S_{n}}=0$ for all the possible commodities that map to $I_{m}$ (those for which $c_{mn}=1$). The element $X_{I_{m}}$ is then included in total interprovincial trade, but the corresponding benchmarked STF flow is zero, which means that total trade in the STF is less than total trade in the interprovincial flows,
$$\sum_{n}w_{n}^{b}X_{S_{n}}<\sum_{m}X_{I_{m}}.$$
Finally, in the main body of the text, the subscript $n$ is suppressed such that the benchmark weight is ${w}^{b}$.
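The benchmarking steps in Equations (9) and (10) can be sketched for a single year and province pair as follows. This is a minimal illustration under stated assumptions, not the production procedure: the function name `benchmark_weights` and the toy values are hypothetical, and plain Python lists stand in for the vectors $X_{I}$ and $X_{S}$ and the 0/1 concordance matrix $C$.

```python
def benchmark_weights(x_i, x_s, c):
    """Return one benchmark weight per SCTG commodity (w_n^b).

    x_i: IOCC flow values (length M); x_s: SCTG flow values (length N);
    c:   M x N concordance matrix of 0/1 entries.
    """
    m_count, n_count = len(x_i), len(x_s)
    # Denominators of Equation (9): SCTG flows that map to IOCC code m,
    # and IOCC flows that SCTG code n maps to.
    sctg_tot = [sum(c[m][o] * x_s[o] for o in range(n_count)) for m in range(m_count)]
    iocc_tot = [sum(c[o][n] * x_i[o] for o in range(m_count)) for n in range(n_count)]

    weights = [0.0] * n_count
    for m in range(m_count):
        # Equation (9): product of the two trade shares (zero when undefined).
        b_hat = [
            (x_s[n] / sctg_tot[m]) * (x_i[m] / iocc_tot[n])
            if c[m][n] and sctg_tot[m] > 0 and iocc_tot[n] > 0 else 0.0
            for n in range(n_count)
        ]
        # Equation (10): scale row m so that it adds to X_I_m, with b_m = 0
        # when the IOCC flow or the left-hand sum is zero.
        lhs = sum(b_hat[n] * c[m][n] * x_s[n] for n in range(n_count))
        b_m = x_i[m] / lhs if x_i[m] != 0 and lhs != 0 else 0.0
        # w_n^b is the column sum of B (Hadamard) C.
        for n in range(n_count):
            weights[n] += b_m * b_hat[n] * c[m][n]
    return weights


# Toy example: 2 IOCC codes, 3 SCTG codes, many-to-many concordance.
x_i = [100.0, 50.0]
x_s = [30.0, 20.0, 10.0]
c = [[1, 1, 0],
     [0, 1, 1]]
w = benchmark_weights(x_i, x_s, c)
total = sum(w_n * x for w_n, x in zip(w, x_s))
print(w, total)
```

In the toy example the benchmarked STF total equals total interprovincial trade. If the first two SCTG flows are set to zero, the first IOCC code has no matching shipments and the benchmarked total falls short, which is the coverage problem described above.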
A.4 Comparing network and great-circle intraprovincial and interprovincial distances
The analysis relies on the network distance between geocoded origins and destinations, which is the average of transaction-level intraprovincial and interprovincial distances. Traditionally, intraprovincial and interprovincial distances are measured using the origin–destination population-weighted great-circle distance (hereafter, great-circle distance) between subprovincial units (see, for example, Brown and Anderson 2002). This is calculated for the set of subprovincial units (CDs) within each province for intraprovincial trade and between the sets of subprovincial units for each province pair:
$$d_{op}=\frac{\sum_{i\in o}\sum_{j\in p}pop_{i}\,pop_{j}\,d_{ij}}{\sum_{i\in o}\sum_{j\in p}pop_{i}\,pop_{j}},\qquad\text{(11)}$$
where $o$ and $p$ index provinces, $i$ and $j$ index CDs, $pop$ is the population of the CD, and $d$ is the great-circle distance between the centroids of CDs. For intraprovincial trade $\left(o=p\right)$, within-CD distance is the radius of a circle of an area equal to that of the CD: $d_{ij}=\sqrt{area/\pi}$.
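Equation (11) can be sketched in code as follows. The example is illustrative only: the haversine formula and the Earth radius of 6,371 km are assumptions (the text does not say how great-circle distance is computed), and the function names and the dictionary layout of each CD record are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_unit_km(area_km2):
    """Within-CD distance: radius of a circle with the same area as the CD."""
    return math.sqrt(area_km2 / math.pi)


def weighted_distance_km(units_o, units_p):
    """Equation (11): population-weighted mean distance between two sets of CDs.

    Each unit is a dict with keys 'id', 'pop', 'lat', 'lon' and 'area_km2'.
    """
    num = den = 0.0
    for i in units_o:
        for j in units_p:
            if i["id"] == j["id"]:
                d = within_unit_km(i["area_km2"])
            else:
                d = great_circle_km(i["lat"], i["lon"], j["lat"], j["lon"])
            w = i["pop"] * j["pop"]
            num += w * d
            den += w
    return num / den


# Hypothetical province made up of two CDs with made-up coordinates.
province = [
    {"id": "A", "pop": 900.0, "lat": 45.0, "lon": -75.0, "area_km2": 400.0},
    {"id": "B", "pop": 100.0, "lat": 46.0, "lon": -75.0, "area_km2": 2500.0},
]
print(round(weighted_distance_km(province, province), 1))
```

Because the weights are products of populations, the pairing of the large CD with itself dominates the intraprovincial average, regardless of how many shipments actually move over each pair.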
It might be assumed that network distance always exceeds great-circle distance. However, because the actual (network) distance travelled is skewed toward short-distance trips, when short-distance trips are more prevalent (for example, intraprovincial trade or trade between contiguous provinces), measured network distance may be shorter. That is, for the great-circle distance, holding population constant, the distance between nearer CD pairs is weighted the same as between more distant CD pairs. Because they are derived from actual trips, the network distance estimates will weigh closer CD pairs more highly.
This pattern is evident in the data in Table A.1, which presents the network and great-circle distance within and between provinces. On average, network distance is 33% greater than great-circle distance. However, there is a tendency for intraprovincial network distances and network distances between contiguous provinces to be closer to (or even less than) the great-circle distance. Within provinces, between contiguous provinces and between non-contiguous provinces, network distance is 9%, 25% and 38% greater than great-circle distance, respectively. The exceptions are the Atlantic Provinces, which form a de facto archipelago whose internal network distances quite naturally exceed great-circle distances by a wide margin (Table A.1).
Two implications for the econometric analysis follow from these distance patterns. First, because great-circle distance is less than network distance, elasticity on distance will be less when network distance is used. Second, the relatively shorter intraprovincial great-circle distances will tend to inflate the intraprovincial trade coefficient (border effect), because of the overestimated intraprovincial trade given the actual distance travelled. Both effects are apparent in the estimates.
Table A.1 Network and great-circle distance within and between provinces
Province  N.L.  P.E.I.  N.S.  N.B.  Que.  Ont.  Man.  Sask.  Alta.  B.C.  

kilometres  
Panel A: Network distance  
Newfoundland and Labrador  386  1,364  1,226  1,344  1,567  2,789  4,650  5,223  6,074  6,902 
Prince Edward Island  1,412  61  333  272  1,115  1,706  3,584  4,209  4,810  5,696 
Nova Scotia  1,326  324  136  389  1,173  1,815  3,616  4,308  4,977  5,802 
New Brunswick  1,359  240  396  153  692  1,357  3,293  3,946  4,588  5,307 
Quebec  1,478  1,095  1,222  728  280  584  2,459  3,114  3,734  4,607 
Ontario  2,818  1,730  1,819  1,456  599  191  2,026  2,798  3,429  4,320 
Manitoba  4,573  3,526  3,627  3,236  2,410  1,707  213  654  1,340  2,207 
Saskatchewan  5,249  4,127  4,322  3,929  3,118  2,692  621  221  683  1,570 
Alberta  5,806  4,907  4,908  4,578  3,720  3,248  1,316  660  219  905 
British Columbia  6,873  5,750  5,872  5,476  4,640  4,283  2,244  1,631  1,010  204 
Panel B: Great-circle distance  
Newfoundland and Labrador  261  715  762  894  1,407  1,987  3,056  3,539  4,056  4,717 
Prince Edward Island  715  39  193  211  756  1,322  2,547  3,071  3,621  4,274 
Nova Scotia  762  193  143  290  805  1,344  2,635  3,167  3,723  4,374 
New Brunswick  894  211  290  140  578  1,134  2,377  2,909  3,464  4,115 
Quebec  1,407  756  805  578  208  615  1,893  2,442  3,010  3,648 
Ontario  1,987  1,322  1,344  1,134  615  226  1,541  2,107  2,688  3,292 
Manitoba  3,056  2,547  2,635  2,377  1,893  1,541  145  604  1,173  1,780 
Saskatchewan  3,539  3,071  3,167  2,909  2,442  2,107  604  234  628  1,233 
Alberta  4,056  3,621  3,723  3,464  3,010  2,688  1,173  628  221  709 
British Columbia  4,717  4,274  4,374  4,115  3,648  3,292  1,780  1,233  709  213 
percent  
Panel C: Difference between network and great-circle distance  
Newfoundland and Labrador  48  91  61  50  11  40  52  48  50  46 
Prince Edward Island  98  59  72  29  47  29  41  37  33  33 
Nova Scotia  74  67  -5  34  46  35  37  36  34  33 
New Brunswick  52  14  37  9  20  20  39  36  32  29 
Quebec  5  45  52  26  34  -5  30  28  24  26 
Ontario  42  31  35  28  -3  -15  31  33  28  31 
Manitoba  50  38  38  36  27  11  47  8  14  24 
Saskatchewan  48  34  36  35  28  28  3  -6  9  27 
Alberta  43  36  32  32  24  21  12  5  -1  28 
British Columbia  46  35  34  33  27  30  26  32  43  -5 
Source: Statistics Canada, authors' calculations. 
Appendix B Robustness checks
B.1 Testing for the differential effect on distance on intraprovincial and interprovincial trade
If intraprovincial trade is populated with a large set of logistics-truncated flows, the distance parameter on intraprovincial flows should be more negative than that on interprovincial flows, whose pattern results from benchmarking to the flows from the provincial input-output accounts. To test for this effect, a modified version of Equation (5) is estimated,
$$X_{ij}=\exp\left[\lambda_{i}^{intra}+\lambda_{i}^{inter}+\gamma_{j}^{intra}+\gamma_{j}^{inter}+\left(\beta+\theta_{p}\right)\ln\varphi_{ij}\right]\epsilon_{ij},\qquad\text{(12)}$$
with the distance parameter permitted to vary across intraprovincial and interprovincial flows using an indicator variable for intraprovincial flows $\left(\theta_{p}\right)$, where $p$ indicates the set of intraprovincial regions.^{Note 17} To isolate this effect, the model is estimated with separate origin and destination fixed effects for intraprovincial and interprovincial trade. Intraregion flows are excluded.^{Note 18} When estimated for ERs, the distance parameter was -0.769 for interprovincial trade, but less negative for intraprovincial trade (-0.601), a difference that is significant at the 10% level $\left(\widehat{\theta}_{p}=0.168;\ P>\left|Z\right|=0.064\right)$. Using CDs, a subunit of ERs, there was no significant difference between the distance parameters on intraprovincial and interprovincial trade $\left(\widehat{\theta}_{p}=0.017;\ P>\left|Z\right|=0.252\right)$. In neither case is the intraprovincial distance parameter significantly more negative than the interprovincial one. To the extent that it is present, truncation of intraprovincial flows therefore does not appear to be sufficient to bias the estimates.
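The ER estimates can be combined to see what the interaction term implies. In Equation (12), the distance elasticity is $\beta$ for interprovincial flows and $\beta+\theta_{p}$ for intraprovincial flows. The following minimal sketch in Python takes the reported point estimates as given; the variable and function names are illustrative, and the doubling of distance is an arbitrary example rather than a calculation from the paper.

```python
beta_inter = -0.769  # distance elasticity on interprovincial flows (ERs)
theta_p = 0.168      # estimated shift in the elasticity for intraprovincial flows

# Elasticity on intraprovincial flows implied by Equation (12)
beta_intra = beta_inter + theta_p

def trade_ratio(elasticity, distance_multiple=2.0):
    """Factor by which predicted trade changes when distance is multiplied."""
    return distance_multiple ** elasticity

print(round(beta_intra, 3))
print(round(trade_ratio(beta_inter), 3), round(trade_ratio(beta_intra), 3))
```

Under these point estimates, doubling distance reduces predicted trade by less for intraprovincial flows than for interprovincial flows, which is the opposite of what a dominant truncation effect would produce.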
B.2 Estimates by year
The estimates are presented for trade averaged across the nine-year study period from 2004 through 2012. This is long enough to observe changes stemming from policy initiatives or shifts in the macroeconomy. To account for these effects, the baseline model was estimated with all the variables interacted with time fixed effects, with 2004 being the excluded year. Whether the model is estimated using provinces, ERs, or CDs as the trading units, no significant difference in the own-province coefficients is evident across years (Table B.1). Hence, the estimates based on average trade levels reported in the main body of the paper provide a reasonable picture of provincial border effects over the entire period.
Geography  Province  Economic region  Census division  
Own-province  
Coefficient  0.756^{Note **}  0.752^{Note **}  0.747^{Note **} 
Standard error  0.113  0.128  0.093 
2005  
Coefficient  0.021  0.027  0.072 
Standard error  0.149  0.194  0.150 
2006  
Coefficient  0.0449  0.0256  0.0595 
Standard error  0.143  0.180  0.134 
2007  
Coefficient  0.0548  0.1170  0.1170 
Standard error  0.144  0.189  0.128 
2008  
Coefficient  0.028  0.128  0.146 
Standard error  0.151  0.187  0.145 
2009  
Coefficient  0.0173  0.0310  0.0495 
Standard error  0.158  0.173  0.131 
2010  
Coefficient  0.0663  0.1470  0.0619 
Standard error  0.180  0.196  0.134 
2011  
Coefficient  0.0192  0.0619  0.1420 
Standard error  0.174  0.190  0.132 
2012  
Coefficient  0.2630  0.0598  0.1030 
Standard error  0.248  0.171  0.127 
R-squared  0.875  0.721  0.983 
Number of observations  900  47,961  713,480 
Source: Statistics Canada, authors' calculations. 
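The absence of significant year-to-year differences can be checked directly from Table B.1 by dividing each year-interaction coefficient by its standard error. The sketch below is a rough illustration in Python, assuming a normal approximation and a two-sided 5% critical value of 1.96; both are assumptions made here, not choices stated in the paper. Only the absolute value of each ratio is used, so the check does not depend on the sign of the coefficients.

```python
# Year-interaction estimates from Table B.1 as (coefficient, standard error),
# ordered 2005 through 2012 for each geography.
year_terms = {
    "Province": [(0.021, 0.149), (0.0449, 0.143), (0.0548, 0.144), (0.028, 0.151),
                 (0.0173, 0.158), (0.0663, 0.180), (0.0192, 0.174), (0.2630, 0.248)],
    "Economic region": [(0.027, 0.194), (0.0256, 0.180), (0.1170, 0.189), (0.128, 0.187),
                        (0.0310, 0.173), (0.1470, 0.196), (0.0619, 0.190), (0.0598, 0.171)],
    "Census division": [(0.072, 0.150), (0.0595, 0.134), (0.1170, 0.128), (0.146, 0.145),
                        (0.0495, 0.131), (0.0619, 0.134), (0.1420, 0.132), (0.1030, 0.127)],
}

CRITICAL_Z = 1.96  # assumed two-sided 5% critical value (normal approximation)

# Largest absolute z-ratio among the year interactions, by geography
max_abs_z = {
    geography: max(abs(coef) / se for coef, se in terms)
    for geography, terms in year_terms.items()
}

for geography, z in max_abs_z.items():
    print(geography, round(z, 2), z < CRITICAL_Z)
```

In every geography the largest ratio falls well below the critical value, in line with the conclusion that the own-province effect does not differ significantly from its 2004 level in any later year.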
B.3 Differential border effect estimates for Quebec
To test for the effect of Quebec on internal trade, own-province is interacted with an indicator variable for internal Quebec trade flows. While the point estimate on the interaction term is positive, it is not significantly different from zero (Table B.2).
Distance  Estimates 

0 km to 25 km  
Coefficient  0.931^{Note ***} 
Standard error  0.122 
25 km to 100 km  
Coefficient  0.273 
Standard error  0.225 
100 km to 500 km  
Coefficient  0.803^{Note ***} 
Standard error  0.092 
Greater than 500 km  
Coefficient  0.877^{Note **} 
Standard error  0.066 
Own-region  
Coefficient  0.176 
Standard error  0.198 
Own-province  
Coefficient  0.346^{Note ***} 
Standard error  0.0844 
Own-province × Quebec  
Coefficient  0.244 
Standard error  0.209 
Contiguous regions  
Coefficient  0.139 
Standard error  0.097 
Constant  
Coefficient  2.819^{Note **} 
Standard error  0.503 
Number of observations  132,862 
Source: Statistics Canada, authors' calculations. 
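The statement that the Quebec interaction is positive but not significantly different from zero can be illustrated with the estimates in Table B.2. The sketch below, in Python, assumes a standard normal reference distribution for the test and assumes that the model is of the exponential form used in Equation (12), so that the exponentiated own-province coefficient gives the factor by which intraprovincial trade exceeds otherwise comparable interprovincial trade. The variable names are illustrative.

```python
import math
from statistics import NormalDist

own_province = 0.346             # own-province coefficient
own_province_x_quebec = 0.244    # own-province x Quebec interaction
se_interaction = 0.209           # standard error of the interaction

# Two-sided test of the interaction against zero (normal approximation)
z = own_province_x_quebec / se_interaction
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

# Implied border effects as multiplicative factors on trade
border_effect_other = math.exp(own_province)
border_effect_quebec = math.exp(own_province + own_province_x_quebec)

print(round(z, 2), round(p_two_sided, 2))
print(round(border_effect_other, 2), round(border_effect_quebec, 2))
```

The point estimates imply a larger border effect for Quebec than for the other provinces, but the p-value is well above conventional thresholds, so the difference cannot be distinguished from zero.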
References
Agnosteva, D.E., J.E. Anderson, and Y.V. Yotov. 2014. Intranational Trade Costs: Measurement and Aggregation. NBER Working Paper Series, no. 19872. Cambridge, Massachusetts: National Bureau of Economic Research.
Albrecht, L., and T. Tombe. 2016. “Internal trade, productivity, and interconnected industries: A quantitative analysis.” Canadian Journal of Economics 49 (1).
Amrhein, C.G. 1995. “Searching for the elusive aggregation effect: Evidence from statistical simulation.” Environment and Planning A 27 (1): 105–119.
Amrhein, C.G., and R. Flowerdew. 1992. “The effect of data aggregation on a Poisson regression model of Canadian migration.” Environment and Planning A 24 (10): 1381–1391.
Anderson, J.E. 2010. The Gravity Model. NBER Working Paper Series, no. 16576. Cambridge, Massachusetts: National Bureau of Economic Research.
Anderson, J.E., and E. van Wincoop. 2003. “Gravity with gravitas: A solution to the border puzzle.” American Economic Review 93 (1): 170–192.
Anderson, J.E., and Y.V. Yotov. 2010. “The changing incidence of geography.” American Economic Review 100 (5): 2157–2186.
Anderson, J.E., and Y.V. Yotov. 2012. Gold Standard Gravity. NBER Working Paper Series, no. 17835. Cambridge, Massachusetts: National Bureau of Economic Research.
Arbia, G. 1989. Spatial Data Configuration in Statistical Analysis of Regional Economic and Related Problems. Dordrecht, Netherlands: Kluwer Academic Publishers.
Baldwin, J.R., and W. Gu. 2009. “The impact of trade on plant scale, production-run length and diversification.” In Producer Dynamics: New Evidence from Micro Data, ed. T. Dunne, J.B. Jensen, and M. Roberts. Vol. 68 of Studies in Income and Wealth. National Bureau of Economic Research. Chicago: University of Chicago Press.
Behrens, K., T. Bougna, and W.M. Brown. 2015. The World Is Not Yet Flat: Transportation Costs Matter! CEPR Discussion Paper no. 10356. London: Centre for Economic Policy Research.
Briant, A., P.P. Combes, and M. Lafourcade. 2010. “Dots to boxes: Do the size and shape of spatial units jeopardize economic geography estimations?” Journal of Urban Economics 67 (3): 287–302.
Brown, W.M. 2003. Overcoming Distance, Overcoming Borders: Comparing North American Regional Trade. Economic Analysis Research Paper Series, no. 8. Statistics Canada Catalogue no. 11F0027M. Ottawa: Statistics Canada.
Brown, W.M. 2015. “How much thicker is the Canada–U.S. border? The cost of crossing the border by truck in the pre- and post-9/11 eras.” Research in Transportation Business and Management 16: 50–56.
Brown, W.M., and W.P. Anderson. 2002. “Spatial markets and the potential for economic integration between Canadian and U.S. regions.” Papers in Regional Science 81 (1): 99–120.
Coughlin, C.C., and D. Novy. 2016. Estimating Border Effects: The Impact of Spatial Aggregation. Federal Reserve Bank of St. Louis Working Paper Series, no. 2016-006A. St. Louis, Missouri: Federal Reserve Bank of St. Louis.
Crafts, N., and A. Klein. 2015. “Geography and intranational home bias: U.S. domestic trade in 1949 and 2007.” Journal of Economic Geography 15 (3): 477–497.
Fally, T. 2015. “Structural gravity and fixed effects.” Journal of International Economics 97 (1): 76–85.
Fotheringham, A.S., and M.E. O’Kelly. 1989. Spatial Interaction Models: Formulations and Applications. Dordrecht, Netherlands: Kluwer Academic Publishers.
Fotheringham, A.S., and D.W.S. Wong. 1991. “The modifiable areal unit problem in multivariate statistical analysis.” Environment and Planning A 23 (7): 1025–1044.
Généreux, P.A., and B. Langen. 2002. The Derivation of Provincial (Interregional) Trade Flows: The Canadian Experience. Paper presented at the 14th International Input-Output Techniques Conference, Montréal, October 10 to 15, 2002.
Head, K., and T. Mayer. 2010. “Illusory border effects: Distance mismeasurement inflates estimates of home bias in trade.” In The Gravity Model in International Trade: Advances and Applications, ed. P.A.G. van Bergeijk and S. Brakman, chapter 6, p. 165–192. Cambridge: Cambridge University Press.
Head, K., and T. Mayer. 2013. “What separates us? Sources of resistance to globalization.” Canadian Journal of Economics 46 (4): 1196–1231.
Head, K., and T. Mayer. 2014. “Gravity equations: Workhorse, toolkit, and cookbook.” In Handbook of International Economics, ed. G. Gopinath, E. Helpman, and K. Rogoff, Vol. 4, Chapter 3, p. 131–195. Amsterdam: North Holland.
Hillberry, R., and D. Hummels. 2003. “Intranational home bias: Some explanations.” Review of Economics and Statistics 85 (4): 1089–1092.
Hillberry, R., and D. Hummels. 2008. “Trade responses to geographic frictions: A decomposition using microdata.” European Economic Review 52 (3): 527–550.
Manning, W., and J. Mullahy. 2001. “Estimating log models: To transform or not to transform?” Journal of Health Economics 20 (4): 461–494.
McCallum, J. 1995. “National borders matter: Canada–U.S. regional trade patterns.” American Economic Review 85 (3): 615–623.
Melitz, M.J., and G.I.P. Ottaviano. 2008. “Market size, trade and productivity.” Review of Economic Studies 75 (1): 295–316.
Millimet, D.L., and T. Osang. 2007. “Do state borders matter for U.S. intranational trade? The role of history and internal migration.” Canadian Journal of Economics 40 (1): 93–126.
Páez, A., and D.M. Scott. 2005. “Spatial statistics for urban analysis: A review of techniques with examples.” GeoJournal 61 (1): 53–67.
Santos Silva, J.M.C., and S. Tenreyro. 2006. “The log of gravity.” Review of Economics and Statistics 88 (4): 641–658.
Sen, A., and T. Smith. 1995. Gravity Models of Spatial Interaction Behavior. Berlin: Springer.
Statistics Canada. n.d.a. Standard Classification of Transported Goods (SCTG) 1996. Last updated on April 17, 2015. Available at: http://www.statcan.gc.ca/eng/subjects/standard/sctg/sctgmenu (accessed January 26, 2017).
Statistics Canada. n.d.b. Standard Geographical Classification (SGC) 2006 - Volume I, The Classification. Last updated January 30, 2017. Archived. Available at: http://www.statcan.gc.ca/eng/subjects/standard/sgc/2006/2006indfin (accessed February 8, 2017).
Wolf, H.C. 2000. “Intranational home bias in trade.” Review of Economics and Statistics 82 (4): 555–563.
The World Bank Group. 2016. Tariff rate, most favored nation, simple mean, all products (%) (Chart, and table giving overview per country). Annual data from 1996 to 2012. Available at: http://data.worldbank.org/indicator/TM.TAX.MRCH.SM.FN.ZS (accessed January 17, 2017).