Objective 1: Identifying Relevant Factors in PV Farm Siting

This study used nine factors deemed relevant to PV farm siting in the literature: land class, land use, slope, aspect, drainage, population density, and proximity to water bodies, roads and transformer stations.

Land use data was retrieved from SOLARIS to identify the classes of land on which PV farm development is occurring. Categories included plantations, grasslands, urban areas, forests and swamp areas. Land class was an important siting factor because it identifies the quality of land for agricultural purposes. The Canada Land Inventory ranks land on a scale of 1-7, with 1 being the best agricultural land. This data was used to identify the relationship between land class and PV farm siting. Slope was derived from a digital elevation model (DEM) obtained from the Ontario Ministry of Natural Resources, because gentler slopes have been suggested to be preferable when constructing PV farms (Carrion et al., 2008; Macknick et al., 2014; Mondino et al., 2013; Nguyen & Pearce, 2010; Watson & Hudson, 2015). The DEM was also used to derive aspect, the direction in which the slopes face. Drainage capacity was extracted from the Soil Survey Complex, as drainage is associated with flooding and thus with the suitability of land for PV development (Nguyen & Pearce, 2010). Population density was retrieved from the Statistics Canada 2001 Census. Proximity to water bodies, roads and transformer stations was also considered relevant. Water bodies influence flooding potential, which could be a factor in PV farm development. PV farms need to be sited close to roads and transformer stations to reduce transportation and maintenance costs (Carrion et al., 2008). These nine factors are considered the most important to PV farm siting in Ontario (fig. 1-9 in appendix).

Objective 2: Model Development

The individual logistic regressions on each of the siting factors yielded mixed results and gave valuable insight into the relative significance of each factor. The most significant factor on an individual basis was the distance from the site to the nearest transformer station (TSDist), which received the lowest AIC of 179.86 and a likelihood ratio (R²L) of 0.218, or 21.8%. The next most significant factor was the categorical land use data (LandUse), which attained an AIC of 193.6 and a likelihood ratio of 15.7%. According to the Pr(>|z|) results, most of the factors were significant at the 99.5% confidence level or higher; the exceptions were slope (99.2%), agricultural land class (98.5%), and population density (87.6%). A table summarizing the relevant information from the individual regression analyses is displayed below.

Summary of Individual Regression Analyses on Siting Factors

| Factor | Data Input Description | AIC | Null Deviance | Fitted Deviance | Likelihood Ratio | Model Coefficient | Pr(>\|z\|) | Significance (% CI) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Aspect | Slope direction (Degrees from North, 0-180) | 219.38 | 224.95 | 215.38 | 0.0425 | 1.167E-02 | 0.00281 | 99.719 |
| Drainage | Soil drainage (Categorical ranked 0-7) | 202.56 | 224.95 | 198.56 | 0.1173 | 3.864E-01 | 0.00000182 | 99.999818 |
| LandClass | Agricultural land class (Categorical ranked 1-7) | 222.9 | 224.95 | 218.9 | 0.0269 | 1.457E-01 | 0.0151 | 98.49 |
| LandUse | Land use (2-good, 1-acceptable, 0-not acceptable) | 193.6 | 224.95 | 189.6 | 0.1571 | 1.726E+00 | 0.000000116 | 99.9999884 |
| PopDensity | Population density based on census tracts (Cap/km^2) | 226.3 | 224.95 | 222.3 | 0.0118 | 2.735E-03 | 0.124 | 87.6 |
| RoadDist | Distance to nearest road in meters | 207.98 | 224.95 | 203.98 | 0.0932 | -1.677E-03 | 0.00229 | 99.771 |
| Slope | Surface slope in degrees | 218.44 | 224.95 | 214.44 | 0.0467 | -2.768E-01 | 0.00764 | 99.236 |
| TSDist | Distance to nearest transformer station in meters | 179.86 | 224.95 | 175.86 | 0.2182 | -1.893E-04 | 0.0000118 | 99.99882 |
| WaterDist | Distance to nearest water feature in meters | 217.18 | 224.95 | 213.18 | 0.0523 | 8.364E-04 | 0.00115 | 99.885 |
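The fit indices in the table above are linked by simple relationships. As a minimal sketch, assuming each single-factor model has two estimated parameters (one coefficient plus an intercept), the tabulated values are consistent with AIC = fitted deviance + 2k and with a likelihood ratio of 1 minus the fitted deviance divided by the null deviance. The function names are illustrative and are not taken from the study's R workflow.

```python
def aic_from_deviance(fitted_deviance, n_parameters):
    """AIC for a binary logistic model: fitted deviance plus 2 per parameter."""
    return fitted_deviance + 2 * n_parameters


def likelihood_ratio(null_deviance, fitted_deviance):
    """Likelihood ratio index: share of the null deviance explained by the model."""
    return 1.0 - fitted_deviance / null_deviance


NULL_DEVIANCE = 224.95  # null deviance shared by every model in the table

# Fitted deviances copied from the TSDist and LandUse rows of the table.
fitted = {"TSDist": 175.86, "LandUse": 189.6}

for factor, deviance in fitted.items():
    # Assumption: two parameters per model (factor coefficient + intercept).
    aic = aic_from_deviance(deviance, n_parameters=2)
    ratio = likelihood_ratio(NULL_DEVIANCE, deviance)
    print(f"{factor}: AIC = {aic:.2f}, likelihood ratio = {ratio:.4f}")
```

Because every single-factor model carries the same number of parameters, ranking these models by AIC is equivalent to ranking them by fitted deviance.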

The model coefficients derived by the statistical software indicate the effect that each factor has on the siting of PV farms. RoadDist, Slope, and TSDist have negative coefficients, which suggests that siting suitability decreases as their values increase; the opposite is true for factors with positive coefficients. The magnitude of a coefficient depends heavily on the units and range of the underlying data, and is therefore difficult to compare between factors.
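One way to see why coefficient magnitudes are hard to compare is to convert them to odds multipliers. The sketch below uses the standard logistic regression identity that a change of Δx in a factor multiplies the odds of site presence by exp(coefficient × Δx); the function name and the choice of a 1 km step are illustrative assumptions.

```python
import math


def odds_multiplier(coefficient, change):
    """Factor by which the odds of a site change when the input rises by `change`."""
    return math.exp(coefficient * change)


# Coefficients from the individual regressions (per metre and per degree).
TSDIST_COEF = -1.893e-4
SLOPE_COEF = -2.768e-1

per_metre = odds_multiplier(TSDIST_COEF, 1.0)         # tiny effect per metre
per_kilometre = odds_multiplier(TSDIST_COEF, 1000.0)  # same model, coarser unit
per_degree = odds_multiplier(SLOPE_COEF, 1.0)

print(f"TSDist, +1 m:   odds x {per_metre:.5f}")
print(f"TSDist, +1 km:  odds x {per_kilometre:.3f}")
print(f"Slope, +1 deg:  odds x {per_degree:.3f}")
```

The TSDist coefficient looks negligible next to the Slope coefficient only because it is expressed per metre; over a kilometre its effect on the odds is of a similar order.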

The LandUse factor has the highest statistical significance (99.9999884%); however, this is likely due in part to the data being limited to three possible values (2, 1, 0), whereas a factor such as TSDist spans a wide range of continuous values. Factors comprised of rank-based categorical data (Drainage, LandClass, LandUse) may be expected to obtain a higher significance in logistic regression because of the simplicity of their data. Among the continuous factors, TSDist attained by far the highest significance (99.99882%), further supporting that it is the most significant factor when compared on an individual basis.

To begin the second stage of model development, which tested combinations of factors, a base model combining every factor was created to gain an understanding of how the siting factors behave when affected by each other. In all following model summary tables, the significance of each factor within a model is assigned a significance level according to the table below.

Multi-Variable Model Significance Levels

| Significance Level | Confidence Interval | Pr(>\|z\|) Cutoff |
| --- | --- | --- |
| ***** | 99.9% | 0.001 |
| **** | 99.5% | 0.005 |
| *** | 99% | 0.01 |
| ** | 95% | 0.05 |
| * | 90% | 0.1 |
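The cutoffs above can be applied mechanically to any Pr(>|z|) value. The following is a minimal sketch; the function name is illustrative, and treating each cutoff as a strict upper bound is an assumption, since the source does not say how values exactly on a cutoff are handled.

```python
# (cutoff, label) pairs from the significance table, strictest first.
SIGNIFICANCE_LEVELS = [
    (0.001, "*****"),
    (0.005, "****"),
    (0.01, "***"),
    (0.05, "**"),
    (0.1, "*"),
]


def significance_stars(p_value):
    """Return the star label for a Pr(>|z|) value, or '' if above every cutoff."""
    for cutoff, stars in SIGNIFICANCE_LEVELS:
        # Assumption: a value must fall strictly below the cutoff to earn the level.
        if p_value < cutoff:
            return stars
    return ""


# Pr(>|z|) values taken from the All Factors model summary.
for factor, p in [("Aspect", 0.002010), ("TSDist", 0.021280), ("RoadDist", 0.581510)]:
    print(f"{factor}: {p:.6f} {significance_stars(p)}")
```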

The base model, used to gain a background understanding of the factors’ impact on each other, is summarized in the table below.

Base Model Summary

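| Model | Factor | Coefficient | Pr(>\|z\|) | Significance | AIC | D(null) | D(fitted) | Likelihood Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All Factors | Aspect | 1.72E-02 | 0.002010 | **** | 149.86 | 224.95 | 129.86 | 0.4227 |
| | Drainage | 2.45E-01 | 0.059470 | * | | | | |
| | LandClass | 1.39E-01 | 0.173830 | | | | | |
| | LandUse | 2.04E-02 | 0.003450 | **** | | | | |
| | PopDensity | -3.01E-03 | 0.222330 | | | | | |
| | RoadDist | -3.62E-04 | 0.581510 | | | | | |
| | Slope | -3.20E-01 | 0.063900 | * | | | | |
| | TSDist | -1.25E-04 | 0.021280 | ** | | | | |
| | WaterDist | 3.22E-04 | 0.426080 | | | | | |
| | Intercept | -3.27E+00 | 0.002920 | **** | | | | |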

This base model provides a clear initial indication of which factors are most significant when modelled together, as shown by their significance levels. The LandUse factor continues to be one of the most significant factors; however, the Aspect factor obtained the highest significance when modelled with all other factors. This is slightly unusual considering that terrain aspect is a continuous surface derived from analysis of a DEM, but not surprising overall, because solar panels in the Northern Hemisphere receive the greatest insolation, and operate most efficiently, when facing as close to south as possible. TSDist was the next most significant variable in the logistic equation, consistent with the finding that it was the most significant variable on an individual basis, and Slope and Drainage returned significant coefficients from the logistic regression as well. All other siting factors returned coefficients with a significance below 90% and were deemed insignificant in this specific model. The AIC determined by the R statistical software (149.86) was substantially lower than that of any model based on an individual factor, lending evidence to the proper selection of modelling factors, and the likelihood ratio (42.3%) was far higher than with the individual factors.

For the first iteration of the trial-and-error factor combination process, the five factors that returned a coefficient with greater than 90% significance in the ‘All Factors’ model (Aspect, Drainage, LandUse, Slope, TSDist) were combined and tested. The factors’ significance values and the AIC were then analysed iteratively and used to guide the creation of each subsequent model. With each iteration, the behavior of each factor became more predictable, and after seven iterations it was determined that the model that best fit the site location data had been found. A table summarizing this iterative process with each factor combination is displayed below:

Multi-Variable Regression Model Summaries

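| Model | Factor | Coefficient | Pr(>\|z\|) | Significance | AIC | D(null) | D(fitted) | Likelihood Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Combo 1 | Aspect | 1.877E-02 | 0.000606 | ***** | 147.74 | 224.95 | 135.74 | 0.3966 |
| | Drainage | 3.164E-01 | 0.006564 | *** | | | | |
| | LandUse | 9.527E-01 | 0.003306 | **** | | | | |
| | Slope | -3.278E-01 | 0.028927 | ** | | | | |
| | TSDist | -1.649E-04 | 0.000718 | ***** | | | | |
| | Intercept | -3.059E+00 | 0.000891 | ***** | | | | |
| Combo 2 | Aspect | 1.77E-02 | 0.000782 | ***** | 151.66 | 224.95 | 141.66 | 0.3703 |
| | Drainage | 2.64E-01 | 0.015843 | ** | | | | |
| | LandUse | 1.01E+00 | 0.001810 | **** | | | | |
| | TSDist | -1.71E-04 | 0.000565 | ***** | | | | |
| | Intercept | -3.32E+00 | 0.000268 | ***** | | | | |
| Combo 3 | Aspect | 1.29E-02 | 0.003490 | **** | 183.14 | 224.95 | 175.14 | 0.2214 |
| | Drainage | 4.46E-01 | 0.000002 | ***** | | | | |
| | Slope | -4.12E-01 | 0.001800 | **** | | | | |
| | Intercept | -2.52E+00 | 0.000021 | ***** | | | | |
| Combo 4 | Aspect | 1.56E-02 | 0.001490 | **** | 161.36 | 224.95 | 151.36 | 0.3271 |
| | Drainage | 2.98E-01 | 0.005250 | *** | | | | |
| | LandClass | 7.99E-02 | 0.360170 | | | | | |
| | TSDist | -1.95E-04 | 0.000031 | ***** | | | | |
| | Intercept | -1.85E+00 | 0.007430 | *** | | | | |
| Combo 5 | Aspect | 1.55E-02 | 0.001600 | **** | 160.19 | 224.95 | 152.19 | 0.3234 |
| | Drainage | 3.29E-01 | 0.001290 | **** | | | | |
| | TSDist | -1.90E-04 | 0.000046 | ***** | | | | |
| | Intercept | -1.70E+00 | 0.010950 | ** | | | | |
| Combo 6 | Aspect | 1.42E-02 | 0.004215 | **** | 157.77 | 224.95 | 145.77 | 0.3520 |
| | Drainage | 2.74E-01 | 0.012767 | ** | | | | |
| | RoadDist | -6.63E-04 | 0.291149 | | | | | |
| | TSDist | -1.69E-04 | 0.000381 | ***** | | | | |
| | WaterDist | 7.17E-04 | 0.045147 | ** | | | | |
| | Intercept | -1.82E+00 | 0.016312 | ** | | | | |
| Combo 7 | Aspect | 1.44E-02 | 0.004020 | **** | 157.41 | 224.95 | 147.41 | 0.3447 |
| | Drainage | 3.17E-01 | 0.002560 | **** | | | | |
| | TSDist | -1.79E-04 | 0.000095 | ***** | | | | |
| | WaterDist | 7.44E-04 | 0.038140 | ** | | | | |
| | Intercept | -2.18E+00 | 0.002210 | **** | | | | |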

Upon initial comparison of the model fit indices, it is apparent that Combo 1 was the most accurate model of the site locations among the iterative combinations, with the lowest AIC of 147.74 and the highest likelihood ratio of 39.7%, followed by Combo 2 with an AIC of 151.66 and a likelihood ratio of 37.0%. The one difference in inputs between Combos 1 and 2 was the removal of Slope, the least significant factor in Combo 1. The fact that this removal increased the AIC gives evidence that, while slope is not as significant as the other factors in the model, it is still integral to overall model performance. Another noteworthy observation comes from comparing Combo 1 to the All Factors model: the AIC was lowered by only about 1.4%, and the likelihood ratio actually decreased by 2.6 percentage points (from 42.3% to 39.7%). This indicates that while the fitted deviance of Combo 1 was slightly higher, its overall model fit was better, and the All Factors model likely experienced over-fitting from including too many variables. Surprisingly, the All Factors model received a better AIC score than every other model attempted in the iterative process (Combos 2 through 7); its downfall came in the statistical significance of the factor coefficients, which was higher in the iterative combinations than in the All Factors model.
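The comparison above can be reproduced directly from the fit indices in the model summary tables. The sketch below ranks the models by AIC and computes the two differences between the All Factors model and Combo 1; the variable names are illustrative.

```python
# (AIC, likelihood ratio) copied from the model summary tables.
MODELS = {
    "All Factors": (149.86, 0.4227),
    "Combo 1": (147.74, 0.3966),
    "Combo 2": (151.66, 0.3703),
    "Combo 3": (183.14, 0.2214),
    "Combo 4": (161.36, 0.3271),
    "Combo 5": (160.19, 0.3234),
    "Combo 6": (157.77, 0.3520),
    "Combo 7": (157.41, 0.3447),
}

# Lower AIC indicates a better trade-off between fit and model size.
ranked_by_aic = sorted(MODELS, key=lambda name: MODELS[name][0])

all_aic, all_ratio = MODELS["All Factors"]
best_aic, best_ratio = MODELS["Combo 1"]

# Relative AIC improvement of Combo 1 over the All Factors model, in percent.
aic_drop_percent = (all_aic - best_aic) / all_aic * 100
# Loss in likelihood ratio, in percentage points.
ratio_drop_points = (all_ratio - best_ratio) * 100

print("Ranking by AIC:", ", ".join(ranked_by_aic))
print(f"AIC lowered by {aic_drop_percent:.1f}%")
print(f"Likelihood ratio lowered by {ratio_drop_points:.1f} percentage points")
```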

Analysis of the performance of specific factors in the iterative process shows that TSDist continued to be the most significant factor, scoring a significance greater than 99.9% in every model in which it was used. Aspect and LandUse were also consistently high-scoring factors, while Slope and Drainage were significant but had varying significance levels between models. The LandClass, RoadDist, and WaterDist factors consistently had lower significance values, and the models that included them had a higher AIC than Combo 1; they were therefore not included in the final model. For both this reason and the previously stated comparison of model fit indices, Combo 1 was chosen as the best model of siting factors.

Objective 3: Model Validation

In order to validate the model chosen through the iterative process, the same regression analysis used to create the model was performed on the site locations previously set aside for the model validation stage. The factors chosen for the final model (Aspect, Drainage, LandUse, Slope, TSDist) were modelled (Validation 1), and a model for All Factors was also created for the validation site data (Validation 2) in order to provide a basis for comparison. A summary table for the validation models is provided below:

Multi-Variable Validation Model Summaries

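| Model | Factor | Coefficient | Pr(>\|z\|) | Significance | AIC | D(null) | D(fitted) | Likelihood Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Validation 1 | Aspect | 6.26E-03 | 0.171780 | | 149.65 | 159.83 | 137.65 | 0.1388 |
| | Drainage | 2.38E-01 | 0.021250 | ** | | | | |
| | LandUse | 8.19E-01 | 0.026170 | ** | | | | |
| | Slope | -1.66E-01 | 0.161990 | | | | | |
| | TSDist | -1.57E-05 | 0.450330 | | | | | |
| | Intercept | -2.91E+00 | 0.001120 | **** | | | | |
| Validation 2 | Aspect | 4.73E-03 | 0.301910 | | 147.76 | 159.83 | 127.76 | 0.2007 |
| | Drainage | 2.07E-01 | 0.094640 | * | | | | |
| | LandClass | 7.02E-02 | 0.465170 | | | | | |
| | LandUse | 6.37E-01 | 0.099390 | * | | | | |
| | PopDensity | -1.90E-03 | 0.518760 | | | | | |
| | RoadDist | -1.82E-03 | 0.054710 | * | | | | |
| | Slope | -1.66E-01 | 0.184690 | | | | | |
| | TSDist | -1.53E-06 | 0.944600 | | | | | |
| | WaterDist | 5.59E-04 | 0.129400 | | | | | |
| | Intercept | -2.47E+00 | 0.009330 | *** | | | | |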

The most apparent observation upon inspection of the validation results is that the factor significance values and the overall likelihood ratio are considerably lower in the validation models than in the regression models. This is likely explained by the smaller number of points in the validation site dataset, which can have a significant impact on statistical analysis. The most noteworthy points are that the AIC values changed little between the model creation and validation stages (less than 2%), and that the factor coefficients kept a constant sign and remained broadly similar (41.5% average deviation), indicating that the factors were properly modelled in the previous stage. Overall, the statistical results indicate that the chosen model is valid with a moderate (0.40-0.60) correlation.
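The sign and deviation checks described above can be sketched as follows for Combo 1 against Validation 1. The source does not state how its 41.5% figure was computed, so the definition used here (mean absolute relative change across the five factors and the intercept) is an assumption and need not reproduce that figure exactly.

```python
# Coefficients copied from the Combo 1 and Validation 1 model summaries.
COMBO_1 = {
    "Aspect": 1.877e-2, "Drainage": 3.164e-1, "LandUse": 9.527e-1,
    "Slope": -3.278e-1, "TSDist": -1.649e-4, "Intercept": -3.059,
}
VALIDATION_1 = {
    "Aspect": 6.26e-3, "Drainage": 2.38e-1, "LandUse": 8.19e-1,
    "Slope": -1.66e-1, "TSDist": -1.57e-5, "Intercept": -2.91,
}


def same_sign(a, b):
    """True when both coefficients point in the same direction."""
    return (a > 0) == (b > 0)


def relative_change(fitted, validated):
    """Absolute change in a coefficient, relative to its fitted value."""
    return abs(validated - fitted) / abs(fitted)


signs_agree = all(same_sign(COMBO_1[n], VALIDATION_1[n]) for n in COMBO_1)
changes = {n: relative_change(COMBO_1[n], VALIDATION_1[n]) for n in COMBO_1}
# Assumed definition of the average deviation (see the note above).
mean_change = sum(changes.values()) / len(changes)

print("All signs agree:", signs_agree)
for name, change in changes.items():
    print(f"{name}: {change:.1%}")
print(f"Mean relative change: {mean_change:.1%}")
```

The per-factor output shows how unevenly the change is distributed: the average is dominated by the Aspect and TSDist coefficients, while LandUse and the intercept move comparatively little.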

Objective 4: Model Evaluation

The data collected to form the regression model was of high quality. The Canada Land Inventory and SOLARIS datasets are considered reputable and have been used in other academic studies to obtain land class and land use data in Ontario (Calvert & Mabee, 2014; Calvert et al., 2013). One limitation of the SOLARIS data is that it only covers southern Ontario; PV farm points outside southern Ontario received a null value for land use. Apart from the SOLARIS data, the study had sufficient data intensity and coverage for Ontario. In terms of completeness, the study used all factors deemed relevant and important to PV farm siting in the literature.

Points were used to represent PV farm sites. However, vector points are not fully representative of PV farms: the X Y coordinates may vary from the actual position of a PV farm, and the cell size may not be representative of the PV farm site. These issues could influence the quality of the analysis and were therefore accommodated. Numeric data, such as slope, aspect and population density, were aggregated with a low-pass filter to reduce variability. A majority filter was applied to categorical data, such as land class and land use, to find the mode of the data surrounding the points. Both filters were set to the neighbouring cells, gathering data from 9 cells in total to find the mean or mode of the data surrounding the vector points. Although the data had to be aggregated to accommodate variations in the X Y coordinates and size of PV farms, this was done in a consistent way that was more realistic and representative of PV site areas.
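The two 9-cell filters can be illustrated with a minimal sketch on small grids. The function names, the clipping of the window at grid edges, and the tie-breaking of the majority filter (first value encountered) are assumptions for illustration and do not describe the GIS tools used in the study; the grid values are invented.

```python
from collections import Counter


def window_3x3(grid, row, col):
    """Values of the up-to-nine cells centred on (row, col), clipped at edges."""
    values = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                values.append(grid[r][c])
    return values


def low_pass(grid, row, col):
    """Mean of the 3 x 3 neighbourhood, for numeric data such as slope."""
    values = window_3x3(grid, row, col)
    return sum(values) / len(values)


def majority(grid, row, col):
    """Mode of the 3 x 3 neighbourhood, for categorical data such as land use."""
    values = window_3x3(grid, row, col)
    return Counter(values).most_common(1)[0][0]


# Hypothetical grids invented for illustration (not study data).
slope_grid = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
land_use_grid = [[2, 2, 1], [2, 0, 1], [2, 2, 1]]

print(low_pass(slope_grid, 1, 1))     # mean of all nine slope values
print(majority(land_use_grid, 1, 1))  # most common land use class
```

In the land use grid the centre cell itself has class 0, but the majority filter assigns the class that dominates its surroundings, which is the intended effect when a point may be slightly offset from the actual site.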

Overall, this study was effective in determining statistically significant factors in PV farm siting and creating a regression model. The data was of high quality and good intensity, and the results concurred with previously conducted studies.

Objective 5: Suitability Map Development

After a valid model was identified using the R statistical software, the determined coefficients were used to create a suitability map: a continuous surface representing the probability, from 0 to 1, of a site being present at any given location. This surface was created using the raster calculator to evaluate the logistic regression equation on rasters representing the five factors in the final siting factors model. Before applying the raster calculator, it was ensured that the raster data was in the same form as was used in the logistic regression within R, and that all of the data was in floating point form to allow for complex equations. The first step was to create a raster representing the linear model equation (the linear predictor of the logistic regression), including the determined factor coefficients and intercept, displayed below.

Next, the linear model equation was placed in an exponential term and arranged into the equation that determines the probability of presence in logistic regression, displayed below.
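The two equations referred to above are not reproduced in this text. As a sketch, assuming the final model is Combo 1 and using its coefficients and intercept from the model summary table, they would take the following form. The symbols z (the linear combination of factors) and P (the resulting probability of presence) are introduced here for illustration.

```latex
\begin{aligned}
z &= -3.059 + 0.01877\,\mathrm{Aspect} + 0.3164\,\mathrm{Drainage}
     + 0.9527\,\mathrm{LandUse} - 0.3278\,\mathrm{Slope}
     - 0.0001649\,\mathrm{TSDist} \\[4pt]
P &= \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}
\end{aligned}
```

Because the logistic function maps any real z into the interval between 0 and 1, the resulting raster can be read directly as a probability surface.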

The raster created from this equation served as a suitability map, visualizing the model in a simple range of values from 0 to 1, with 1 representing the highest probability of site placement. In addition, this raster allowed final suitability values to be extracted to each of the site points for additional information on model performance and validity. The suitability map is displayed below, with both regression and validation PV site points overlaid on the suitability surface.
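The raster calculation can be sketched cell by cell in a few lines. This is an illustration under stated assumptions, not the workflow used in the study: it takes the Combo 1 coefficients from the model summary table, represents each raster as a nested list, and uses invented cell values; the function names are hypothetical.

```python
import math

# Combo 1 coefficients and intercept from the model summary table.
INTERCEPT = -3.059
COEFFICIENTS = {
    "Aspect": 1.877e-2,
    "Drainage": 3.164e-1,
    "LandUse": 9.527e-1,
    "Slope": -3.278e-1,
    "TSDist": -1.649e-4,
}


def suitability(cell):
    """Probability of site presence for one cell, given its five factor values."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * cell[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def suitability_surface(rasters):
    """Apply `suitability` cell by cell to equally sized factor rasters."""
    n_rows = len(rasters["Aspect"])
    n_cols = len(rasters["Aspect"][0])
    return [
        [suitability({name: rasters[name][r][c] for name in COEFFICIENTS})
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical 1 x 2 rasters, invented for illustration (not study data).
example_rasters = {
    "Aspect": [[180.0, 20.0]],
    "Drainage": [[5.0, 2.0]],
    "LandUse": [[2.0, 0.0]],
    "Slope": [[1.0, 6.0]],
    "TSDist": [[2000.0, 20000.0]],
}
surface = suitability_surface(example_rasters)
print(surface)
```

The first example cell (south-facing, well drained, good land use, gentle slope, close to a transformer station) scores far higher than the second, which matches the direction of each coefficient in the model.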