
Methodology

Index analysis as a function of the number of parameters

Index analysis as a function of the number of samples

Analysis of the contribution of the index terms

Analysis of the correlation between the index terms

Index analysis using three samples instead of four

Index variability analysis within stations

Seasonally adjusted index analysis

In this chapter we will perform a sensitivity analysis to explain the index behaviour in view of: a) its complexity; b) the fact that the provinces do not necessarily use the same number of parameters and samples in their calculations, and c) the fact that the samples were not taken during the same period (date).

We will start by describing the underlying methodology used to complete this analysis. Next, we will examine: a) how the index behaves when we increase the number of parameters; b) how it behaves when we increase the number of samples; c) the contribution of each index term and the correlation between the terms in the index calculation; d) whether using three samples instead of four per year influences the index results; e) index variability within stations; and f) index behaviour when we take seasons into account.

For each dataset, we will use the following procedure in every section up to and including the section Index results using three and four samples:

- We attribute a random number to each sample using a uniform distribution.
- We order the samples in ascending order of the random number per station and per year.
- We make all possible combinations of *x* parameters among the total number of parameters. For each of these combinations:
  - We select all sequences of *y* consecutive samples without sample repetition within a given station and year.
  - We group *y* samples for three consecutive years.
  - We calculate the terms F1, F2 and F3 as well as the index itself based on three consecutive years.
- We analyze the results for a given number of parameters and samples.
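The first steps of this procedure can be sketched in Python. This is only an illustration under stated assumptions: the function names, the sample identifiers and the parameter names are hypothetical, and "sequences of consecutive samples without sample repetition" is read here as non-overlapping runs taken from the randomly ordered list.

```python
import random
from itertools import combinations

def random_order(samples, rng):
    """Tag each sample with a uniform random number, then sort by that number."""
    tagged = [(rng.random(), s) for s in samples]
    return [s for _, s in sorted(tagged, key=lambda t: t[0])]

def consecutive_blocks(ordered, y):
    """Non-overlapping runs of y consecutive samples (no sample is reused)."""
    return [ordered[i:i + y] for i in range(0, len(ordered) - y + 1, y)]

def parameter_subsets(parameters, x):
    """All combinations of x parameters out of the full list."""
    return list(combinations(parameters, x))

rng = random.Random(42)  # fixed seed so the sketch is reproducible
station_year = [f"s{i}" for i in range(10)]  # hypothetical sample IDs for one station and year
ordered = random_order(station_year, rng)
blocks = consecutive_blocks(ordered, y=4)
subsets = parameter_subsets(["pH", "Cu", "Zn", "Pb", "Cd"], x=4)
```

With ten samples and *y* = 4, two blocks are formed and the two leftover samples are not used; how blocks from different years are paired is not specified in the procedure and is left out of the sketch.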

In this section, for each dataset, we will analyse index behaviour when we increase the number of selected parameters for a given number of samples in order to see if the number of parameters has any impact on the index calculation.

Graph 9

Index behaviour as a function of the number of parameters for the Newfoundland and Labrador dataset

**Observations:** This graph shows that as the number of selected parameters increases for a given number of samples, the extreme categories ("poor" and "excellent") and the "good" category become less populated, while the "marginal" and "fair" categories become more populated. Consider the case of four samples: when ten parameters instead of four are used in the index calculation, the cumulative percentage of the "excellent" and "good" categories diminishes from 26.2% to 12%, while that of the "marginal" and "fair" categories increases from 51.1% to 83%. Moreover, the percentage of the "poor" category drops from 18.7% to 4.4%.

**Observations:** This graph shows that when the number of selected parameters increases for a given number of samples, the percentage of the extreme categories ("poor" and "excellent") drops and the percentage of the "fair" category increases. Also, when ten samples or more are selected per year, the percentage of the "good" category diminishes as the number of parameters increases. Consider the case of ten samples: the cumulative percentage of the "excellent" and "good" categories drops from 29.1% to 10.6% if we use 14 parameters instead of four in the index calculation. Moreover, the percentage of the "poor" category drops from 12.4% to 4.5% and the percentage of the "fair" category increases from 26.9% to 50%.

**Observations:** As the number of selected parameters increases for a given number of samples, the extreme categories ("poor" and "excellent") shrink and the remaining three categories become more populated. Consider the case of eight samples: the percentages of the "excellent" and "poor" categories drop from 13.5% to 0% and from 4.8% to 0%, respectively, when we use eight parameters instead of four in the index calculation. On the other hand, the cumulative percentage of the "fair" and "good" categories increases from 50.0% to 73.3%.

**Observations:** Once again, as the number of selected parameters increases for a given number of samples, the percentage of the extreme categories ("poor" and "excellent") drops and the percentage of the "fair" category increases. Consider the case of six samples: the percentages of the "excellent" and "poor" categories drop from 18.4% to 9.4% and from 8.9% to 6.3%, respectively, when we use six parameters instead of four in the index calculation. On the other hand, the percentage of the "fair" category increases from 27.7% to 43.8%.

**Discussion:** The last four graphs indicate that the index is sensitive to the number of parameters used in the calculation. The larger the number of parameters, the less populated the extreme categories ("poor" and "excellent") become in comparison with the "marginal" and "fair" categories, regardless of the starting point. This can be explained by the fact that as the number of parameters used in the index calculation increases, the denominator of the first term increases, and therefore, for a given number of non-compliant parameters, the value of the first term decreases. The total number of analytical findings also increases, which raises the probability of having a non-compliant result and affects the calculation of the index terms.
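To make the denominator effect concrete, here is a minimal Python sketch of the index calculation. It assumes the commonly published CCME form of the index (F1 as the share of non-compliant parameters, F2 as the share of non-compliant tests, F3 derived from the normalized sum of excursions, and a divisor of 1.732), category floors of 95, 80, 65 and 45, and guidelines that are all of the "must not exceed" type. The data and function names are hypothetical.

```python
import math

def wqi_terms(values, limits):
    """Return (F1, F2, F3, index).

    values: {parameter: [measurements]}; limits: {parameter: upper limit}.
    Assumption: every guideline is a 'must not exceed' objective.
    """
    failed_params = failed_tests = total_tests = 0
    excursions = 0.0
    for param, obs in values.items():
        fails = [v for v in obs if v > limits[param]]
        total_tests += len(obs)
        failed_tests += len(fails)
        failed_params += bool(fails)
        excursions += sum(v / limits[param] - 1 for v in fails)
    f1 = 100 * failed_params / len(values)  # share of non-compliant parameters
    f2 = 100 * failed_tests / total_tests   # share of non-compliant tests
    nse = excursions / total_tests          # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)          # amplitude, scaled to 0-100
    index = 100 - math.sqrt(f1**2 + f2**2 + f3**2) / 1.732
    return f1, f2, f3, index

def category(index):
    """Map an index value to a category (assumed floors: 95, 80, 65, 45)."""
    for floor, name in [(95, "excellent"), (80, "good"), (65, "fair"), (45, "marginal")]:
        if index >= floor:
            return name
    return "poor"

def dataset(n_params):
    """Hypothetical data: four samples per parameter, one exceedance in the first parameter."""
    values = {f"p{i}": [1.0, 1.0, 1.0, 1.0] for i in range(n_params)}
    values["p0"] = [1.0, 1.0, 1.0, 4.0]
    limits = {f"p{i}": 2.0 for i in range(n_params)}
    return values, limits

four = wqi_terms(*dataset(4))   # one failing parameter out of four
ten = wqi_terms(*dataset(10))   # the same failure, out of ten parameters
```

In this toy case the single exceedance gives F1 = 25 with four parameters but F1 = 10 with ten, so the index rises when compliant parameters are added. The sketch isolates only the denominator effect; it does not reproduce the second effect described above, in which more parameters also bring more chances of a non-compliant result.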

We will now examine the behaviour of the index for each dataset when we increase the number of samples selected for a given number of parameters to determine whether the number of samples affects the index calculation.

Graph 13

Index behaviour as a function of the number of samples for the Newfoundland and Labrador dataset

**Observations:** In general, this graph shows that as the number of selected samples increases for a given number of parameters, the percentage of the "good" and "excellent" categories diminishes and that of the "poor" and "marginal" categories increases. Consider the case of four parameters: the cumulative percentage of the "good" and "excellent" categories drops from 26.2% to 18% if we use ten samples instead of four in the index calculation, while the cumulative percentage of the "poor" and "marginal" categories increases from 51.7% to 60.1%.

**Observations:** We can draw the same conclusions as for the Newfoundland and Labrador dataset: when the number of selected samples increases for a given number of parameters, the "good" and "excellent" categories become less populated and the "poor" and "marginal" categories become more populated. Consider the case of 10 parameters: the cumulative percentage of the "good" and "excellent" categories drops from 31.6% to 2.9% when we use 15 samples instead of 4 in the index calculation, while the cumulative percentage of the "poor" and "marginal" categories increases from 25.6% to 57.5%.

**Observations:** Generally, as the number of selected samples increases for a given number of parameters, the percentage of the "good" and "excellent" categories diminishes and that of the "poor" and "marginal" categories increases. Moreover, using eight parameters, the percentage of the "excellent" category is 0% regardless of the number of samples. Consider the case of seven parameters: the cumulative percentage of the "good" and "excellent" categories drops from 38.5% to 3.9% when we use 15 samples instead of 4 in the index calculation. For the "poor" and "marginal" categories, the cumulative percentage increases from 16.4% to 66.6%.

**Observations:** We note that as the number of selected samples increases, the "good" and "excellent" categories become less populated and the "poor" and "fair" categories become more populated. Consider the case of six parameters: the cumulative percentage of the "good" and "excellent" categories diminishes from 45.8% to 37.5% when we use six samples instead of four in the index calculation. On the other hand, the cumulative percentage of the "poor" and "fair" categories increases from 37.5% to 50.1%.

**Discussion:** The last four graphs indicate that the index is sensitive to the number of samples used in the calculation. Generally, the larger the number of samples used in the calculation, the less populated the "good" and "excellent" categories become in comparison with the "poor" and "marginal" categories, regardless of the starting point. This can be explained by the fact that as the number of samples increases, so do the total number of analytical findings to be analysed and the probability of having a non-compliant result, which in turn affects the calculation of the three terms of the index.

In this section, for each dataset, we will analyse the contribution of the three terms of the index when we increase the number of parameters and samples, in order to determine whether the contribution of each term of the index is similar. We cannot calculate the contribution of these terms to the index directly because the square root includes the sum of squares of the three terms. However, we can express the index equation as follows to evaluate the contribution of each squared term.
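The equation itself is not reproduced in this text. As a sketch only, assuming the index takes the commonly published CCME form $\mathrm{WQI} = 100 - \sqrt{F_1^2 + F_2^2 + F_3^2}/1.732$, one rearrangement that isolates the squared terms is shown below. The symbol $C_i$ for the share of each squared term is our own notation, not the report's.

```latex
\left[\,1.732\,(100 - \mathrm{WQI})\,\right]^2 = F_1^2 + F_2^2 + F_3^2,
\qquad
C_i = \frac{F_i^2}{F_1^2 + F_2^2 + F_3^2}, \quad i = 1, 2, 3,
\qquad
C_1 + C_2 + C_3 = 1 .
```

Under this reading, a statement such as "the first term accounts for 50% to 60%" would correspond to $C_1$ lying between 0.5 and 0.6.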

**Observations:** The first term accounts for 50% to 60% of the total index value, while the third term contributes from 20% to 30%. Moreover, we note that the contribution of the first term diminishes when we increase the number of parameters for a given number of samples, whereas the contribution of the third term appears to increase. Conversely, the contribution of the first term increases when we increase the number of samples for a given number of parameters, while that of the other two terms diminishes.

**Observations:** The contribution of the first term is even higher for the Ontario dataset and varies from 60% to 75% of the total index contribution. We can also note that the contribution of the first term diminishes when we increase the number of parameters for a given number of samples, while that of the third term increases. Moreover, the contribution of the first term increases when we increase the number of samples from four to ten for a given number of parameters, while that of the other two terms decreases.

**Observations:** The first term contributes from 45% to 74% to the total index. We also note that the contribution of the first term diminishes when we increase the number of parameters for a given number of samples (except for 15 samples), while that of the third term appears to increase. Moreover, the contribution of the first term increases when we decrease the number of samples for a given number of parameters, while that of the other two terms decreases.

**Observations:** The first term contributes around 80% to the total index contribution, while each of the two remaining terms contributes around 10%. For the Quebec dataset, the contribution of each term remains stable, regardless of the slight variation of the number of parameters or samples.

**Discussion:** These four graphs show that the contribution of the first term is much higher (varying from 45% to 80%) than the contribution of the other two terms. Increasing the number of parameters in the index calculation causes a sharp increase of the first term's denominator, which is counterbalanced by an increased probability of having a non-compliant result, leading to a slight reduction of its contribution. By increasing the number of samples, we increase the probability of having a non-compliant parameter and therefore we increase the contribution of the first term.

In this section we will look at the correlation between the index terms taken in pairs when the number of parameters and samples is changed. This will be done for each dataset. This analysis makes it possible to identify whether there is a strong correlation between two terms of the index. If this is the case, one of these terms may be redundant in the index calculation.

This table shows that the correlation is generally stronger (or at least equal) between the second and the third term of the index than between the first two terms. Moreover, the correlation between the second and the third term is non-negligible in most cases, varying from 0.58 to 0.96. As a general rule, the correlation between the index terms considered in pairs depends on the number of parameters and samples used for the index calculation. The more we increase the number of samples for a given number of parameters, the weaker the correlation with the first term becomes. It is more difficult to determine the direction of the relationship when we increase the number of parameters used in the calculation for a given number of samples: sometimes the correlation increases and sometimes it decreases.
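A pairwise correlation of this kind can be sketched as follows. The text does not say which correlation coefficient was used; the sketch assumes Pearson's coefficient, and the (F1, F2, F3) triples are hypothetical values, not data from the study.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical (F1, F2, F3) triples, one per station.
terms = [(25.0, 6.3, 5.9), (50.0, 12.5, 9.1), (10.0, 2.5, 2.4), (40.0, 15.0, 14.2)]
f1, f2, f3 = (list(col) for col in zip(*terms))

pairs = {
    "F1-F2": pearson(f1, f2),
    "F1-F3": pearson(f1, f3),
    "F2-F3": pearson(f2, f3),
}
```

A coefficient close to 1 for a pair such as F2-F3 would be the signal of possible redundancy discussed above.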

The purpose of this section is to verify whether using three samples instead of four per year influences the index results. Even though at least four samples per year are recommended for the index calculation, some stations with three samples have been considered, since they are remote, more difficult to access and more costly to sample. We will begin by comparing the index results obtained with three and four samples. Then we will look at the percentage of stations that change category when we use three instead of four samples in the index calculation.

As in the previous sections, here are graphs showing the index findings for each province with three and four samples, and using a variable number of parameters in the calculation.

Graph 21

Index results when three and four samples are used for the Newfoundland and Labrador dataset

**Observation:** Looking at this graph we see no significant difference if we use three samples instead of four in the index calculation, regardless of the number of parameters used in the calculation.

**Observations:** This graph seems to show a slight difference when three samples instead of four are used in the index calculation. The majority of differences are below 3% with the maximum close to 7% for the "good" category with 14 parameters.

**Observations:** Here again, there seems to be a slight difference when three samples are used instead of four in the index calculation. The majority of differences are below 3% with a maximum close to 7.7% for the "good" category with 8 parameters.

**Observation:** Looking at this graph we note that there is really no difference when we use three samples instead of four in the index calculation.

**Discussion:** These last four graphs show that sometimes there may be a slight difference for certain provinces when three samples are used instead of four in the index calculation. However, good representativeness of water quality must be considered when we use only three or four samples per year in the index calculation.

To perform this analysis, we will use the following procedure for each dataset:

- Attribute a random number to each sample using a uniform distribution.
- Sort the samples in ascending order based on the random number per station and per year.
- Use all possible combinations of *x* parameters out of the total number of parameters. For each of these combinations:
  - Select all consecutive sequences of four samples, without repeating a sample, for a given station and year.
  - Group the *four* samples for three consecutive years together.
  - Calculate the index using the 12 samples.
  - Randomly choose *three* samples out of the four selected in the first sub-step.
  - Group the *three* samples from the three consecutive years.
  - Calculate the index using the nine samples.
  - Compare the results obtained with 9 and 12 samples for each of the stations.

- Present the results.
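The comparison at the heart of this procedure can be sketched as below for one station. The sketch assumes the commonly published CCME form of the index with "must not exceed" guidelines and category floors of 95, 80, 65 and 45; the parameter names, limits and measurements are hypothetical.

```python
import math
import random

def wqi(samples, limits):
    """samples: list of {parameter: value}; limits: {parameter: upper limit}.
    Assumption: every guideline is a 'must not exceed' objective."""
    tests = [(p, s[p]) for s in samples for p in limits]
    fails = [(p, v) for p, v in tests if v > limits[p]]
    f1 = 100 * len({p for p, _ in fails}) / len(limits)
    f2 = 100 * len(fails) / len(tests)
    nse = sum(v / limits[p] - 1 for p, v in fails) / len(tests)
    f3 = nse / (0.01 * nse + 0.01)
    return 100 - math.sqrt(f1**2 + f2**2 + f3**2) / 1.732

def category(index):
    """Map an index value to a category (assumed floors: 95, 80, 65, 45)."""
    for floor, name in [(95, "excellent"), (80, "good"), (65, "fair"), (45, "marginal")]:
        if index >= floor:
            return name
    return "poor"

def compare_three_vs_four(years, limits, rng):
    """years: three lists of four samples each.
    Returns (index with 12 samples, index with 9 samples, category changed?)."""
    twelve = [s for year in years for s in year]
    nine = [s for year in years for s in rng.sample(year, 3)]  # three of the four, per year
    i12, i9 = wqi(twelve, limits), wqi(nine, limits)
    return i12, i9, category(i12) != category(i9)

limits = {"Cu": 2.0, "Zn": 30.0}
clean = {"Cu": 1.0, "Zn": 10.0}
years = [[dict(clean) for _ in range(4)] for _ in range(3)]
years[0][0]["Cu"] = 5.0  # a single exceedance in the first year
result = compare_three_vs_four(years, limits, random.Random(1))
```

Whether the category changes here depends on whether the random draw keeps or drops the one non-compliant sample, which is consistent with the later remark that the impact of three versus four samples depends on how homogeneous the water quality is.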

**Observations:** This table shows that the water quality category does not change for more than 86% of stations when we use three samples instead of four in the index calculation, with a mean WQI difference lower than 0.8. For the stations where water quality improves (fewer than 13% of stations), the mean WQI difference varies from 5.44 to 16.10. The percentage of stations that change category does not appear to be related to the number of parameters used in the index calculation.

**Discussion:** This section enables us to state that using three samples instead of four per year has little influence on the index calculation for these four datasets. In fact, the impact of three or four samples in the WQI calculation depends on water quality homogeneity and variations through time. Once again, the question arises as to whether or not this number of samples (three or four) is large enough to properly represent water quality.

This section enables us to look at the potential index variability within stations. We will attempt to evaluate potential variability by selecting numerous sub-samples and by calculating the index value for each of them.

To perform this analysis, we will use all the parameters available for the stations and we will keep only the last three years for each dataset. This is the procedure we will use:

- Keep all samples, which have a value for each parameter.
- Select 1,000 replications of *x* samples per year for each station using a random selection without replacement.
- Keep only the replications in which the combination of *x* samples is different from others. We do not want to have the same combination of *x* samples in the analysis twice.
- Calculate the index for each of the 1,000 replications for each station.
- Calculate the mean, standard deviation and 95% confidence interval for the index value obtained for the 1,000 replications for each station if the number of samples for the given station for the three years in question permits.
- Present the results.
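The replication and summary steps can be sketched as follows. The text does not state how the 95% interval was built; the sketch assumes a normal-approximation interval, mean ± 1.96 standard deviations. The function names and sample identifiers are hypothetical.

```python
import random
import statistics

def unique_replications(samples_by_year, x, n_rep, rng):
    """Draw up to n_rep distinct selections of x samples per year, without replacement."""
    seen = set()
    tries, max_tries = 0, 20 * n_rep  # guard against stations with too few combinations
    while len(seen) < n_rep and tries < max_tries:
        pick = tuple(tuple(sorted(rng.sample(year, x))) for year in samples_by_year)
        seen.add(pick)  # a set silently drops repeated combinations
        tries += 1
    return sorted(seen)

def summary(index_values):
    """Mean, standard deviation and assumed 95% interval (mean +/- 1.96 sd)."""
    mean = statistics.mean(index_values)
    sd = statistics.stdev(index_values)
    return mean, sd, (mean - 1.96 * sd, mean + 1.96 * sd)

rng = random.Random(7)
station = [list(range(6)) for _ in range(3)]  # hypothetical: six sample IDs per year, three years
reps = unique_replications(station, x=4, n_rep=1000, rng=rng)
mean, sd, interval = summary([60.0, 64.0, 68.0])  # hypothetical index values
```

As a plausibility check on the assumed interval, a standard deviation of 3.54 would give a width of 2 × 1.96 × 3.54 ≈ 13.9 points, and 3.46 would give about 13.6 points. These are close to the mean widths of 14 and 13.57 points reported below for four samples per year, although this does not confirm the method actually used.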

Table 32 indicates the number of parameters and years that will be used in each dataset.

Graph 25

A 95% confidence interval of the index value within stations when four samples per year are used for the Newfoundland and Labrador dataset

**Observations:** This graph shows that there is fairly substantial variability in the index within the stations. Calculating the difference between the upper and the lower limit of the confidence interval, we obtain a mean difference of 14 points. The width of the 95% confidence interval ranges from 4 to 27 points, depending on the station. Moreover, categorization of the index is also affected at certain stations, since the category can decrease or increase by one or even two categories.

By increasing the number of samples per year^{1}, the mean value of the index and its variability diminish within a given station, which results in increased index stability. When we increase the number of samples from four to six and then eight per year, the mean value of the index changes from 64.37 to 63.33 and then to 62.41, while the standard deviation changes from 3.54 to 3.08 and then to 2.79.

Graph 26

A 95% confidence interval of the index value within stations when four samples per year are used for the Ontario dataset

**Observations:** We note that here too there is significant index variability within the stations. Moreover, for the Ontario dataset it seems that the lower the index value, the higher the variability. Here the mean difference between the upper and lower limit of the 95% confidence interval is 13.57 points, with a minimum of 5.88 and a maximum of 35 points. Index categorization is also affected, since for certain stations the category changes by one or even two.

When we increase the number of selected samples per year^{2}, the mean value of the index and its variability diminish within a given station, which results in increased index stability. The mean value of the index changes from 73.57 to 70.86, while the standard deviation changes from 3.46 to 2.45, when we increase the number of samples per year from 4 to 6.

Graph 27

A 95% confidence interval of the index value within stations when four samples per year are used for the British Columbia dataset

**Observations:** This graph shows that one of the four stations consistently met the guidelines during the three-year study period. We also note a significant index variability for the other three stations. The mean difference between the upper limit and lower limit value of the 95% confidence interval is 15.78 with a minimum of 0 and a maximum of 29.6 points. Index categorization is affected for the three stations that present variability.

When we increase the number of samples from four to six and eight per year^{2}, the mean value of the index diminishes, while its variability remains quite stable within a given station.

Graph 28

A 95% confidence interval of the index value within stations when four samples per year are used for the Quebec dataset

**Observations:** Once again, we note significant index variability at certain stations. The mean difference between the upper and lower limit of the 95% confidence interval is 13.15 points with a minimum of 0 and a maximum of 38.7 points. Also, index categorization is affected, since for certain stations, categorization increases or decreases by one or even two categories.

Unfortunately, for the Quebec dataset, we cannot increase the number of samples selected per year because of the small number of samples per station in the initial dataset.

**Discussion:** This analysis shows that there is substantial index value variability within certain stations, regardless of the number of samples selected per year for the index calculation. We also noted that the higher the number of samples selected per year, the lower the index value, which is consistent with the conclusion of the section on index analysis as a function of the number of samples. Moreover, index variability tends to decrease for a given station when we increase the number of samples selected per year. This decrease in variability can be explained by the fact that we have a more representative, more homogeneous set when the number of samples is larger. The appendix contains the results per station when we increase the number of samples selected per year for each dataset.

Since the period (date) when the samples are taken is not always the same, it is important to find out if the seasons have an impact on index calculation. If this is the case, it would be preferable to take this into account when determining the periods when samples must be taken.

We will start this section by analysing whether the guidelines are met for each parameter and for each season. Then we will examine the index behaviour when we take the season into account in the calculation for each dataset, except for the Newfoundland and Labrador data, which do not contain sampling dates. The "season" variable is defined as follows:

Winter: December 21 to March 20.

Spring: March 21 to June 20.

Summer: June 21 to September 20.

Autumn: September 21 to December 20.
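The season definition above translates directly into a small date classifier. The sketch below is illustrative; the function names and the sample dates are hypothetical. The second helper shows how a "without one season" scenario could be built from dated samples.

```python
from datetime import date

def season(d):
    """Classify a date using the season boundaries defined above."""
    md = (d.month, d.day)
    if (3, 21) <= md <= (6, 20):
        return "spring"
    if (6, 21) <= md <= (9, 20):
        return "summer"
    if (9, 21) <= md <= (12, 20):
        return "autumn"
    return "winter"  # December 21 to March 20, spanning the year end

def drop_season(dated_samples, excluded):
    """Keep only the (date, sample) pairs that fall outside the excluded season."""
    return [(d, s) for d, s in dated_samples if season(d) != excluded]

dated = [
    (date(2004, 1, 15), "a"),
    (date(2004, 4, 2), "b"),
    (date(2004, 7, 30), "c"),
    (date(2004, 10, 11), "d"),
]
without_winter = drop_season(dated, "winter")
```

Comparing (month, day) tuples keeps the boundaries independent of the year, and winter is handled as the fall-through case because it wraps around the end of the calendar year.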

For each dataset (except for Newfoundland and Labrador), we will present the percentage of values that comply with the guidelines for each season. This analysis shows whether the compliance percentage for each parameter is more or less the same from one season to the next.

Table 33

Percentage of samples compliant with the guidelines per season for each parameter for the Ontario dataset

**Observations:** This table shows that the percentage of samples that comply with the guidelines is more or less the same from one season to the next for a large number of parameters. However, there seems to be a slight seasonal difference for the following parameters: cadmium, chloride, chromium, nitrate, suspended solids (SS) and temperature. The compliance percentage for the cadmium, chloride, chromium and nitrate parameters is slightly lower in winter than in the other seasons. In addition, spring appears to be the season when the suspended solids (SS) parameter fails most often, while summer appears to be the most difficult season for the temperature parameter to comply with guidelines.

Table 34

Percentage of samples compliant with the guidelines per season for each parameter for the British Columbia dataset

**Observations:** For the British Columbia dataset, the season in which the samples were collected has no impact on the chromium, pH and lead parameters. The situation is slightly different for the other parameters. Winter appears to be the season when it is most difficult to meet guidelines for the cadmium parameter. The copper and zinc parameters are slightly less compliant with the guidelines in spring, while the highest non-compliance percentage for the phosphorus and temperature parameters occurs in summer.

Table 35

Percentage of samples compliant with the guidelines per season for each parameter for the Quebec dataset

**Observations:** The Quebec dataset does not contain any samples taken during the winter. In general, the percentage of samples that comply with the guidelines is more or less the same for all seasons, except for the chlorophyll a and turbidity parameters, which show lower compliance in summer.

**Discussion:** Generally, we can say that the parameter compliance percentage is slightly different for certain parameters, depending on the season. We will look now at what happens when we remove one season from the index calculation.

In this subsection, we will study how the index behaves when we remove one season from the index calculation. To do so, we use all parameters and samples available to us for each dataset, keeping only the stations that have at least four samples and four parameters per year with at least 12 samples for three consecutive years for each scenario (overall, without winter, without spring, without summer and without autumn).

**Observations:** Looking at this graph, we note a small difference if we remove one season from the index calculation. By comparing the results of the overall scenario with each of the other scenarios, we note that the percentage for the "marginal" category diminishes and the percentage for the "good" category increases when the winter or spring data is removed. Moreover, the percentage for the "fair" category decreases and that for the "good" category increases when we exclude summer or autumn data.

**Observations:** This graph shows a difference when we remove the summer season or the autumn season from the index calculation. Comparing the results of the overall scenario with each of the other scenarios, we see that the percentage of the "fair" category diminishes while the percentage of the "good" category increases when summer or autumn are removed from the index calculation.

**Observations:** For Quebec, there are differences if we remove one season from the index calculation. Comparing the results of the overall scenario with each of the other scenarios, we note that the percentage of the "marginal" and "good" categories diminishes, while the percentage of the "fair" and "excellent" categories increases, when we exclude the spring data. Moreover, there is a reduction of percentage of the "fair" category and an increase of the percentage of the "good" category, when data for the autumn season is excluded from the index calculation.

**Discussion:** It seems that there is a difference in index categorization when we exclude data for a given season from the index calculation. Not including data for a given season may introduce a bias in the index calculation and interpretation.