Objective 4: To assess the strengths and limitations of the regression model and the monitoring data used.
The best model, identified through model selection, was evaluated using error and accuracy metrics: the RMSE (Root Mean Square Error), the MAE (Mean Absolute Error), R², and an additional MAE calculated from the predicted and actual test set values. These metrics are explained in further detail in Objective 3 and were used to assess the strengths of each model.
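As a minimal illustration of how these metrics relate to predicted and actual test set values, the sketch below computes the RMSE, MAE, and R² in plain Python. The function names and the four example values are hypothetical and chosen only for demonstration; they are not the study's data, nor necessarily the software used in the analysis.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute residuals."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical test set values (illustrative only, not study data).
actual_test = [1.0, 2.0, 3.0, 4.0]
predicted_test = [1.5, 2.0, 2.5, 4.0]

print("RMSE:", rmse(actual_test, predicted_test))
print("MAE: ", mae(actual_test, predicted_test))
print("R2:  ", r_squared(actual_test, predicted_test))
```

Because the RMSE squares each residual before averaging, it penalises large individual errors more heavily than the MAE does, so reporting both gives a sense of whether a model's error is dominated by a few poorly predicted observations.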
The data set is derived from a natural system, which is inherently stochastic. Because of this, the models will never be able to account for every environmental variable. This source of error was considered when assessing the strengths of the various models. The data set is also a time series, which allowed for a sample size of 2006 from only 80 monitoring points. The limited number of observation locations was also considered when assessing model strength. A further data limitation is that not all potential factors influencing N and P were assessed. For example, the physical characteristics of streams, including size, depth, and flow (EPA, 2005), could have been considered but were out of scope for this project. Likewise, more detailed information on the main factors identified (e.g. different types of agricultural or natural landcover) could have been used, but this also fell outside the scope of this study. These constraints were acknowledged when assessing the strengths and weaknesses of the overall approach.