(EST 2019 Forecast) Performance of Prediction Algorithms for Modeling Outdoor Air Pollution Spatial Surfaces
Expressions worth borrowing:
However, in the past decade linear (stepwise) regression methods have been criticized for their lack of flexibility, their ignorance of potential interaction between predictors, and their limited ability to incorporate highly correlated predictors.
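However, linear (stepwise) regression methods have been criticized for lacking flexibility, overlooking potential interactions between predictors, and having limited ability to incorporate highly correlated predictors.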
Higher training data R² did not equate to higher test R² for the external long-term average exposure estimates, making the argument that external validation data are critical to compare model performance.
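For the external long-term average exposure estimates, a higher R² on the training data did not mean a higher R² on the test data, which shows that external validation data are key to comparing model performance.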
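The gap between training and external R² can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic, and the two models (a straight-line fit and a 1-nearest-neighbour model that simply memorizes the training sites) are stand-ins, not the algorithms compared in the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Synthetic training sites: signal 2*x plus alternating noise of +3 / -3.
train_x = [float(i) for i in range(10)]
train_y = [2.0 * x + (3.0 if i % 2 == 0 else -3.0) for i, x in enumerate(train_x)]

# Synthetic external sites: same signal, no noise, at locations not in training.
test_x = [i + 0.4 for i in range(9)]
test_y = [2.0 * x for x in test_x]

def fit_line(xs, ys):
    """Simple linear regression; returns (intercept, slope)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def nearest_neighbour(xs, ys, x_new):
    """Predict with the value of the closest training site (memorizes the data)."""
    idx = min(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return ys[idx]

intercept, slope = fit_line(train_x, train_y)

r2_lin_train = r_squared(train_y, [intercept + slope * x for x in train_x])
r2_lin_test = r_squared(test_y, [intercept + slope * x for x in test_x])
r2_nn_train = r_squared(train_y, [nearest_neighbour(train_x, train_y, x) for x in train_x])
r2_nn_test = r_squared(test_y, [nearest_neighbour(train_x, train_y, x) for x in test_x])

print(f"linear: train R2 = {r2_lin_train:.2f}, external R2 = {r2_lin_test:.2f}")
print(f"1-NN:   train R2 = {r2_nn_train:.2f}, external R2 = {r2_nn_test:.2f}")
```

On this toy data the memorizing model reaches a training R² of 1 but an external R² of only about 0.66, while the straight line has a training R² of about 0.76 and an external R² of about 0.99. The ranking of the two models flips once external data are used.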
LUR modeling is an empirical technique with the measured concentration of a pollutant as dependent variable and potential predictors such as road type, traffic count, elevation, and land cover as independent variables in a multiple regression model.
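LUR modeling is an empirical technique that takes the measured concentration of a pollutant as the dependent variable and potential predictors such as road type, traffic volume, elevation, and land cover as the independent variables in a multiple regression model.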
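A minimal sketch of this kind of model, assuming made-up monitoring sites and made-up coefficients (none of the numbers, predictor choices, or function names come from the paper). The concentrations are generated without noise, so ordinary least squares should recover the generating coefficients almost exactly.

```python
# Hypothetical sites: (traffic count in thousands of vehicles/day,
# elevation in m, fraction of green land cover). All values are invented.
sites = [
    (5.0, 10.0, 0.60),
    (20.0, 5.0, 0.10),
    (12.0, 30.0, 0.35),
    (8.0, 50.0, 0.80),
    (25.0, 15.0, 0.05),
    (15.0, 20.0, 0.40),
]

# Assumed generating coefficients: intercept, traffic, elevation, green cover.
TRUE_COEFS = [8.0, 0.9, -0.05, -6.0]

def predict(coefs, x):
    """Linear prediction: intercept plus the weighted sum of the predictors."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], x))

# Noise-free synthetic "measured" concentrations (the dependent variable).
concentrations = [predict(TRUE_COEFS, x) for x in sites]

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    rows = [[1.0, *x] for x in X]          # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

coefs = fit_ols(sites, concentrations)
for name, value in zip(["intercept", "traffic", "elevation", "green"], coefs):
    print(f"{name:>9}: {value:.3f}")
```

Each fitted coefficient reads directly as the change in concentration per unit change in its predictor, which is the interpretability the notes refer to.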
These models are simple, fast, and often provide interpretable coefficients of predictors.
These models are simple and fast, and they often provide interpretable predictor coefficients.
Standard linear regression methods are prone to overfitting, especially when few training sites are used for model development along with a large number of predictor variables. Linear regression therefore may not identify the optimal model.
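Standard linear regression methods overfit easily, particularly when a model is developed from few training sites and a large number of predictor variables. As a result, linear regression may fail to identify the best model.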
Machine learning techniques (such as neural networks and random forests) offer possibilities to create spatial models of air pollutants by learning the underlying relationships in a training data set, without any predefined constrictions.
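Machine learning techniques (such as neural networks and random forests) make it possible to build spatial models of air pollutants by learning the underlying relationships in a training data set, with no predefined restrictions.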
KRLS had a higher training model R² compared to linear regression, but differences decreased when external data were used to compare predictions.
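Compared with linear regression, KRLS had a higher R² on the training set, but the difference shrank when external data were used to compare predictions.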
As each modeling algorithm is likely not optimal, we also explored if a combination of models (stacking) could increase predictive performance.
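Because each modeling algorithm is probably not optimal on its own, we also explored whether combining models (stacking) could improve predictive performance.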
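One simple form of stacking is a weighted blend of two base models, with the weight fitted by least squares. The sketch below assumes hypothetical held-out predictions from two unnamed models; it is not the stacking procedure used in the paper, and all numbers are invented.

```python
# Hypothetical observed concentrations and held-out predictions of two models.
observed = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
pred_model_a = [12.0, 10.0, 16.0, 14.0, 19.0, 19.0]
pred_model_b = [9.0, 13.0, 13.0, 17.0, 17.0, 21.0]

def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def blend_weight(obs, pa, pb):
    """Least-squares weight w for the blend w * pa + (1 - w) * pb.

    Minimizing sum((o - (w*a + (1-w)*b))**2) over w gives the closed form
    w = sum((a - b) * (o - b)) / sum((a - b)**2).
    """
    num = sum((a - b) * (o - b) for o, a, b in zip(obs, pa, pb))
    den = sum((a - b) ** 2 for a, b in zip(pa, pb))
    return num / den

w = blend_weight(observed, pred_model_a, pred_model_b)
stacked = [w * a + (1.0 - w) * b for a, b in zip(pred_model_a, pred_model_b)]

mse_a = mse(observed, pred_model_a)
mse_b = mse(observed, pred_model_b)
mse_stacked = mse(observed, stacked)

print(f"weight on model A: {w:.3f}")
print(f"MSE  A: {mse_a:.3f}  B: {mse_b:.3f}  stacked: {mse_stacked:.3f}")
```

Here the two models err in opposite directions, so the blend cancels much of the error. The weight is fitted and scored on the same predictions, so the improvement is optimistic; a fair comparison would score the blend on separate external data.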