What is spatial regression?

noetsi

Fortran must die
#1
This is the first time I have heard of this technique [I think they are simply modifying linear regression]. They are concerned with what they call spatial issues, for example how rural versus urban influences their model [or the units that make up their model].

To address these concerns, spatial extensions of the fixed and random effects models were estimated in the context of the spatial autoregressive framework. Comparisons of out-of-sample predictive performance were again used to assess model performance. Spatial dependence was incorporated by adding an additional term to the models, a spatially lagged dependent variable of the form ρWy, where W was a row-normalized n x n spatial weight matrix that represented the spatial connectivity among the various locations, ρ was the spatial dependence parameter representing the strength of the spatial dependence between neighboring observations and y was the dependent variable.
I have never encountered this before, can anyone point me in the direction they are talking about?
 
#2
Hi, could it be that they consider spatial data on a lattice? These formulations remind me of the "Analysis and modelling of lattice data" chapter from the book "Analysing Ecological Data" by Alain F. Zuur et al. These models are defined analogously to autocorrelation models for time series, i.e., they assume that the space is divided a priori by a lattice and that data are only given at the grid points of this lattice. This leads e.g. to "Simultaneous Autoregressive (SAR) Models" or "Spatial Moving Average (SMA) Models", analogous to "AR" and "MA" models for time series. If you have spatially continuous data, the analogous models are Kriging models with different types of variograms.
 

gianmarco

TS Contributor
#3
They are concerned with what they call spatial issues, for example how rural versus urban influences their model [or the units that make up their model].
Noetsi,
the main issue in performing any "regular" statistical approach on spatial data is the so-called "spatial autocorrelation": locations that are close to one another tend to have more similar values relative to locations that are further apart. For example, think about ground elevation (above the sea level): if you select two points that are close to each other, they are likely to have similar elevation relative to a third point that lies miles away from those two.

The fact that spatial data tend to be autocorrelated (either positively [low values tend to go with low values, high values with high values] or negatively [high values go with low values]) hampers the use of traditional statistical approaches since the latter are based on the assumption of the independence of the observations. As a matter of fact, in spatial data, the value of a variable at a location can be correlated to the values of the nearby locations (due to spatial autocorrelation).
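To make positive and negative spatial autocorrelation concrete, here is a minimal sketch that computes Moran's I, a common summary of spatial autocorrelation, on a simulated 10 x 10 grid. The grid, the rook-neighbour weights and the function names `rook_weights` and `morans_i` are my own assumptions for illustration; nothing here comes from the paper discussed in this thread.

```python
import numpy as np

def rook_weights(nrow, ncol):
    """Row-normalized weights: each cell's neighbours share an edge with it."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[i, rr * ncol + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def morans_i(values, W):
    """Moran's I = (n / sum of weights) * (z' W z) / (z' z), z = centred values."""
    z = values - values.mean()
    return (len(values) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(0)
W = rook_weights(10, 10)
rows, cols = np.divmod(np.arange(100), 10)

# Elevation-like surface: nearby cells have similar values (positive autocorrelation).
smooth = rows + cols + rng.normal(scale=0.5, size=100)
# Checkerboard: every neighbour has the opposite value (negative autocorrelation).
checker = ((rows + cols) % 2).astype(float)
# Independent noise: no spatial structure.
noise = rng.normal(size=100)

i_smooth = morans_i(smooth, W)
i_checker = morans_i(checker, W)
i_noise = morans_i(noise, W)
print(f"smooth: {i_smooth:.2f}  checkerboard: {i_checker:.2f}  noise: {i_noise:.2f}")
```

Values near +1 mean neighbours look alike, values near -1 mean neighbours look opposite, and values near 0 are what you would expect if the observations were spatially independent.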

To circumvent that issue, spatial regression models have been devised: these models "modify" traditional modelling strategies incorporating the way in which observations are spatially correlated into the model.
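As a rough illustration of what "incorporating the spatial correlation into the model" can look like, the sketch below simulates a spatial lag model, y = rho * W y + x * beta + eps, and recovers rho by a grid search over the concentrated log-likelihood. Everything specific is an assumption of mine: the locations sit on a line with their two neighbours as W, and the names `line_weights` and `concentrated_loglik` are hypothetical. It is a toy, not a substitute for a dedicated spatial econometrics package.

```python
import numpy as np

def line_weights(n):
    """Row-normalized weights for n locations on a line (neighbours: i-1, i+1)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
n, rho, beta = 200, 0.6, 2.0          # assumed "true" values for the simulation
W = line_weights(n)
x = rng.normal(size=n)
eps = rng.normal(size=n)

# Reduced form of y = rho*W y + beta*x + eps:  y = (I - rho*W)^{-1} (beta*x + eps)
y = np.linalg.solve(np.eye(n) - rho * W, beta * x + eps)

X = np.column_stack([np.ones(n), x])  # intercept + covariate

def concentrated_loglik(r):
    """Log-likelihood in r after profiling out the coefficients and error variance."""
    A = np.eye(n) - r * W
    Ay = A @ y
    coef, *_ = np.linalg.lstsq(X, Ay, rcond=None)
    resid = Ay - X @ coef
    _, logdet = np.linalg.slogdet(A)
    return logdet - 0.5 * n * np.log(resid @ resid / n)

grid = np.linspace(-0.95, 0.95, 191)
rho_hat = grid[np.argmax([concentrated_loglik(r) for r in grid])]
print(f"true rho = {rho}, estimated rho = {rho_hat:.2f}")
```

The log-determinant term is what distinguishes this from simply regressing y on W y by ordinary least squares, which would be inconsistent because W y is correlated with the error term.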

There is a large amount of literature on this; e.g.:
Dormann, C.F., McPherson, J.M., Araújo, M.B., Bivand, R., Bolliger, J., Carl, G., Davies, R.G., Hirzel, A., Jetz, W., Kissling, W.D., Kühn, I., Ohlemüller, R., Peres-Neto, P.R., Reineking, B., Schröder, B., Schurr, F.M., Wilson, R., 2007. Methods to account for spatial autocorrelation in the analysis of species distributional data: A review. Ecography 30, 609–628. doi:10.1111/j.2007.0906-7590.05171.x

Dormann, C.F., 2007. Effects of incorporating spatial autocorrelation into the analysis of species distribution data. Glob. Ecol. Biogeogr. 16, 129–138. doi:10.1111/j.1466-8238.2006.00279.x

de Frutos, Á., Olea, P.P., Vera, R., 2007. Analyzing and modelling spatial distribution of summering lesser kestrel: The role of spatial autocorrelation. Ecol. Modell. 200, 33–44. doi:10.1016/j.ecolmodel.2006.07.007

Augustin, N.H., Mugglestone, M.A., Buckland, S.T., 1996. An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339–347. doi:10.2307/2404755


If you are interested in reading one (or all) of them, feel free to ask.


Best
Gm
 

hlsmith

Omega Contributor
#4
Both responses provide good information. Though, I am going to attempt to fuse them together and make my response even simpler.


If I give a random intervention to villages, the villages next to an intervention village may have somewhat similar outcomes due to spillover, in contrast to villages farther away. Another example, where I have actually used this concept, is serial surveys: a respondent's answer is going to be closer to their last response than to their response two surveys ago. So it is almost like incorporating time series concepts into non-time series data: groups or observations are correlated because of their proximity.


You already know the concepts of lags and covariance structures. In these spatial models you use a covariance structure, e.g., AR(1), to explain some of the variability in the data.
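For anyone who has not seen an AR(1) covariance structure written out, here is a small sketch. The function name `ar1_cov` and the parameters `phi` (correlation between adjacent observations) and `sigma2` (variance) are names I made up for the example.

```python
import numpy as np

def ar1_cov(n, phi, sigma2=1.0):
    """AR(1) covariance: cov(i, j) = sigma2 * phi ** |i - j|."""
    positions = np.arange(n)
    lags = np.abs(np.subtract.outer(positions, positions))
    return sigma2 * phi ** lags

# Four serial responses, adjacent responses correlated at 0.5.
cov = ar1_cov(4, 0.5)
print(cov)
```

The correlation decays geometrically with distance: the last response is correlated 0.5 with the current one, the one before that only 0.25. That is the "closer observations are more alike" idea expressed as a covariance matrix.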
 

noetsi

Fortran must die
#5
I was wondering if multilevel approaches, which look at clustering within units, apply to this. That is, whether the solutions they use to deal with clusters within a unit, which violate the independence assumption, can also be used to address spatial autocorrelation. But I have not seen the term spatial autocorrelation talked about in multilevel approaches, so maybe not. :p

Thanks for the comments. I will have to explore this more. Interestingly, they decided that spatial issues could be ignored based on their results. Having worked with data sets similar to theirs, I think that is unreasonable.
 

hlsmith

Omega Contributor
#6
I am not privy to the source you reference at the end of your last post, but they can fit the model with and without the spatial component and see whether controlling for the spatial construct explains more variability. If it does not (and the samples are not underpowered), they may be able to make that statement.
 

noetsi

Fortran must die
#7
The source I am referencing is a PDF that has no link. I do not know how to post it here.

In addition to the traditional fixed and random effects variants, spatial econometric extensions were explored to investigate the importance of spatial dependence in this sample data. The primary motivation was that spatial units (states in this context) can differ in their background variables, which tend to be space-specific time-invariant variables that affect the dependent variable but that are difficult or impossible to directly measure. For example, some spatial units are located in coastal tourist locations while others are not. Some units are primarily rural areas with higher concentrations of periphery industries and transportation infrastructures while others are urban with higher concentrations of urban industries and transportation networks. In addition, norms and values regarding factors such as education, religion, criminal behavior, labor/leisure decisions, land use patterns, etc., can differ rather dramatically from place to place. Failing to account for the spatial distribution of these factors can lead to biased estimation results if spatial correlation is substantial.
To address these concerns, spatial extensions of the fixed and random effects models were estimated in the context of the spatial autoregressive framework. Comparisons of out-of-sample predictive performance were again used to assess model performance. Spatial dependence was incorporated by adding an additional term to the models, a spatially lagged dependent variable of the form.... Where, W was a row normalized n x n spatial weight matrix that represented the spatial connectivity among the various locations, ... was the spatial dependence parameter representing the strength of the spatial dependence between neighboring observations and y was the dependent variable.
They predicted a hold-out sample with a model that used a spatial correction and with one that did not [using RMSE to compare the results] and decided that including the correction did not matter.
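A hold-out comparison of this kind can be sketched on simulated data. To be clear about what is assumed: the data below are generated with spatial dependence built in (rho = 0.6 on a line of 150 locations), the spatial model is a plain spatial lag model fitted by grid-search maximum likelihood, and the helper names `line_weights`, `fit_spatial_lag` and `rmse` are hypothetical. It says nothing about the paper's data or its fixed/random effects specifications; it only shows the mechanics of the comparison.

```python
import numpy as np

def line_weights(n):
    """Row-normalized weights for n locations on a line (neighbours: i-1, i+1)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def fit_spatial_lag(W, X, y, grid):
    """Grid-search concentrated ML for y = rho*W y + X b + e; returns (rho, b)."""
    n = len(y)
    best_ll, best_rho, best_b = -np.inf, None, None
    for r in grid:
        A = np.eye(n) - r * W
        Ay = A @ y
        b, *_ = np.linalg.lstsq(X, Ay, rcond=None)
        e = Ay - X @ b
        ll = np.linalg.slogdet(A)[1] - 0.5 * n * np.log(e @ e / n)
        if ll > best_ll:
            best_ll, best_rho, best_b = ll, r, b
    return best_rho, best_b

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

rng = np.random.default_rng(2)
n, rho, beta = 150, 0.6, np.array([1.0, 2.0])   # assumed simulation settings
W = line_weights(n)

def simulate():
    """One cross-section from the spatial lag model (same locations, new draws)."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + rng.normal(size=n))
    return X, y

X_train, y_train = simulate()
X_test, y_test = simulate()   # hold-out sample

# Non-spatial model: ordinary least squares, prediction uses own covariates only.
b_ols, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
rmse_ols = rmse(y_test, X_test @ b_ols)

# Spatial model: prediction from the reduced form (I - rho*W)^{-1} X b.
rho_hat, b_sar = fit_spatial_lag(W, X_train, y_train, np.linspace(-0.9, 0.9, 91))
pred_sar = np.linalg.solve(np.eye(n) - rho_hat * W, X_test @ b_sar)
rmse_sar = rmse(y_test, pred_sar)

print(f"hold-out RMSE  OLS: {rmse_ols:.2f}  spatial lag: {rmse_sar:.2f}")
```

In this toy setup the spatial model should predict better because the dependence is real and fairly strong. If rho were close to zero, the two RMSEs would be nearly identical, which is presumably the pattern the authors observed before concluding that the correction did not matter.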