Can machine learning-based regression models capture spatial patterns?

Choosing the "right" model?

Objective:

To find out whether nonlinear regression models, such as support vector machines, random forests, artificial neural networks, and Gaussian processes, can properly capture complex spatial processes. This is assessed by analyzing the spatial autocorrelation of their residual errors.
Example of residual errors

Description:

Machine learning is revolutionizing the world. It is everywhere: in self-driving cars, in automatic translation of written and spoken text, in recommendation systems, and more. Machine learning is also heavily used in image processing applications. For example, machine learning classifiers such as random forests and support vector machines are now used by the GIS and remote sensing communities.

Machine learning regression methods are also becoming increasingly popular because spatial statistical methods cannot always cope with complex, multidimensional, and massive datasets. Despite their power and popularity, machine learning methods do not explicitly consider the spatial properties of the data.

In this MSc topic we propose investigating whether machine learning regression models can properly capture complex spatial processes by analyzing the residuals of such models (the differences between the predicted and the observed/measured values). The analysis of residuals can be used to spot structural problems in a regression model. For instance, a strong spatial autocorrelation of the residuals indicates that the machine learning model could not properly capture the geographic phenomenon under study.
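As an illustration of the residual analysis described above, the following minimal sketch computes global Moran's I, one commonly used measure of spatial autocorrelation, on two sets of synthetic residuals. It is not a method prescribed by this topic: the choice of Moran's I, the binary k-nearest-neighbour weights, the permutation test, the function names (`knn_neighbours`, `morans_i`, `permutation_test`) and the synthetic grid data are all assumptions made for the example.

```python
import math
import random


def knn_neighbours(coords, k=4):
    """For each point, return the indices of its k nearest neighbours."""
    neighbours = []
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def morans_i(values, neighbours):
    """Global Moran's I with binary weights (w_ij = 1 if j neighbours i)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(nb) for nb in neighbours)  # sum of all weights
    cross = sum(z[i] * z[j] for i, nb in enumerate(neighbours) for j in nb)
    return (n / s0) * cross / sum(zi * zi for zi in z)


def permutation_test(values, neighbours, n_perm=199, seed=0):
    """Return (observed I, one-sided pseudo p-value) under random relabelling."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbours)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbours) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic demo data (assumed, not from the tick or phenology datasets):
# a 10 x 10 grid of locations and two hypothetical sets of model residuals.
rng = random.Random(42)
coords = [(x, y) for x in range(10) for y in range(10)]
neighbours = knn_neighbours(coords, k=4)

# Residuals with a leftover west-east trend: the model missed a spatial pattern.
residuals_structured = [0.5 * x + rng.gauss(0, 0.5) for x, _ in coords]
# Residuals that are pure noise: no spatial structure left to explain.
residuals_random = [rng.gauss(0, 1) for _ in coords]

i_structured, p_structured = permutation_test(residuals_structured, neighbours)
i_random, p_random = permutation_test(residuals_random, neighbours)

print(f"structured residuals: I = {i_structured:.3f}, p = {p_structured:.3f}")
print(f"random residuals:     I = {i_random:.3f}, p = {p_random:.3f}")
```

In this sketch, a value of Moran's I well above zero with a small pseudo p-value suggests that nearby residuals resemble each other, which is the kind of signal that would point to a spatial pattern the model failed to capture. A value close to zero is what one would expect from spatially unstructured residuals. The neighbourhood definition (here k = 4) is a modelling choice and can change the result.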

A couple of interesting datasets collected by volunteers are available for this MSc topic. We have time series of tick sampling data across multiple locations in the Netherlands and we have a large collection of volunteered phenological observations that report the timing of recurring biological events like leafing and flowering. Both datasets can be modeled using spatial environmental data (e.g. gridded temperature datasets) as explanatory variables.

References:

  • Yee Leung, Chang-Lin Mei, Wen-Xiu Zhang. Testing for Spatial Autocorrelation among the Residuals of the Geographically Weighted Regression. Environment and Planning A, 2000, 32(5), 871-890.

Domain(s):

Study Program(s):

Researchers working on this field: