This page was generated from notebooks/15_ML_estimation_spatial_error.ipynb.

Maximum Likelihood Estimation - Spatial Error Model

Luc Anselin

(revised 09/21/2024)

Preliminaries

Similar to the treatment of the spatial lag model, the estimation of the spatial error model is covered in two notebooks. This first one covers Maximum Likelihood estimation. General Method of Moments (GMM) estimation is considered in a separate notebook.

As mentioned in the spatial lag notebook, it should be kept in mind that the maximum likelihood estimation in spreg is primarily included for pedagogical purposes. Generally, the GMM approach is preferred. In addition, an optimal maximum likelihood estimation implementation, based on the Smirnov-Anselin (2001) approximation, is not currently implemented in spreg. It is implemented in C++ in GeoDa. This is the preferred approach for ML estimation in large(r) data sets.

The spreg module implements ML estimation of the spatial error model in the ML_Error class. The estimation of the SLX-Error model is implemented through the inclusion of the slx_lags argument.

Modules Needed

As before, the main module is spreg for spatial regression analysis. From this, OLS and ML_Error are imported. In addition, the utilities in libpysal (to open spatial weights and access the sample data set), pandas and geopandas are needed, as well as time (for some timing results), matplotlib.pyplot and seaborn for visualization. All of these rely on numpy as a dependency. Finally, in order to carry out the Likelihood Ratio tests, likratiotest is imported from spreg.diagnostics.

The usual numpy set_printoptions is included as well.

[1]:
import warnings
warnings.filterwarnings("ignore")
import os
os.environ['USE_PYGEOS'] = '0'

import numpy as np
import pandas as pd
import geopandas as gpd
import time
import matplotlib.pyplot as plt
import seaborn as sns
from libpysal.io import open
from libpysal.examples import get_path
from libpysal.weights import lag_spatial

from spreg import OLS, ML_Error
from spreg.diagnostics import likratiotest
np.set_printoptions(legacy="1.25")

Functions Used

  • from pandas/geopandas:

    • read_file

    • DataFrame

    • head

    • describe

  • from libpysal:

    • io.open

    • examples.get_path

    • weights.lag_spatial

  • from numpy:

    • hstack

  • from matplotlib/seaborn:

    • regplot

    • show

  • from spreg:

    • spreg.OLS

    • spreg.ML_Error

    • spreg.diagnostics.likratiotest

Variable definition and data input

The data set and spatial weights are from the chicagoSDOH sample data set:

  • Chi-SDOH.shp,shx,dbf,prj: socio-economic indicators of health for 2014 in 791 Chicago tracts

  • Chi-SDOH_q.gal: queen contiguity weights

To illustrate the methods, the same descriptive model is used that relates the rate of uninsured households in a tract (for health insurance, EP_UNINSUR) to the lack of high school education (EP_NOHSDP), the economic deprivation index (HIS_ct), limited command of English (EP_LIMENG) and the lack of access to a vehicle (EP_NOVEH). This is purely illustrative of a spatial error specification and does not have a particular theoretical or policy motivation.

The file names and variable names are set in the usual manner. Any customization for different data sets/weights and different variables should be specified in this top cell.

[2]:
infileshp = get_path("Chi-SDOH.shp")     # input shape file with data
infileq = get_path("Chi-SDOH_q.gal")     # queen contiguity weights created with GeoDa

y_name = 'EP_UNINSUR'
x_names = ['EP_NOHSDP','HIS_ct','EP_LIMENG','EP_NOVEH']
ds_name = 'Chi-SDOH'
w_name = 'Chi-SDOH_q'

The read_file and open functions are used to access the sample data set and contiguity weights. The weights are row-standardized and the data frames for the dependent and explanatory variables are constructed. As before, this functionality is agnostic to the actual data sets and variables used, since it relies on the specification given in the initial block above.

[3]:
dfs = gpd.read_file(infileshp)
wq =  open(infileq).read()
wq.transform = 'r'    # row-transform the weights
y = dfs[y_name]
x = dfs[x_names]

OLS and SLX with Spatial Diagnostics

For ease of reference, standard OLS and SLX regressions with spatial diagnostics are included in this notebook as well. These results are identical to the ones provided in the spatial lag notebooks.

OLS

[ ]:
ols1 = OLS(y,x,w=wq,spat_diag=True,moran=True,
                 name_w=w_name,name_ds=ds_name)
print(ols1.summary)

The specification achieves an acceptable \(R^2\) of about 0.63 and all coefficients are positive and highly significant.

The non-spatial diagnostics suggest non-normality as well as a high degree of heteroskedasticity. There is no problem with multicollinearity.

The spatial diagnostics against the SARERR alternatives show very significant LM-Lag and LM-Error, but of the two robust tests, only RLM-Lag is highly significant (RLM-Error only at p < 0.03). Hence, there is a strong indication that a Lag rather than an Error alternative may be appropriate. While the joint LM test is also highly significant, this is likely due to a strong one-sided (Lag) alternative.

Interestingly, the diagnostics against a spatial Durbin alternative strongly support the latter as well. Both LM tests and their robust forms are highly significant, and so is the joint test. Moreover, the value for the robust forms of the test is smaller than the original, which is the expected behavior (although not always reflected in empirical practice).

In sum, in addition to a spatial Lag model as an alternative, the spatial Durbin specification deserves consideration as well.

SLX

[ ]:
slx1 = OLS(y,x,w=wq,slx_lags=1,spat_diag=True,moran=True,
                 name_w=w_name,name_ds=ds_name)
print(slx1.summary)

Relative to the classic regression model, the fit improves slightly, but the constant, EP_NOHSDP and HIS_ct become non-significant at p = 0.01 (they are marginally significant at p = 0.05). All but one coefficient of the SLX terms are significant (W_EP_NOVEH is not). The signs and magnitudes of the SLX coefficients relative to their unlagged counterparts remain a bit confusing. Only for EP_LIMENG and W_EP_LIMENG are they the same, with the lag coefficient smaller than the unlagged one, in accordance with Tobler’s law. The coefficient for W_HIS_ct is significant and larger than that of HIS_ct, while the latter is not significant at p = 0.01. In other words, the interpretation of these results in terms of distance decay and Tobler’s law may be a bit problematic.

In terms of diagnostics, there is a slight problem with multicollinearity (often the case in SLX specifications), strong non-normality and evidence of heteroskedasticity. Moran’s I is significant, as are both LM-tests, but neither of the robust forms is significant. Based on the relative magnitudes of the test statistics, there is a slight indication of a possible Lag alternative, i.e., a spatial Durbin specification. However, this indication is not as strong as that provided by the LM-SDM test statistics in the classic specification.

ML Estimation of the Error Model

Principle

The spatial Error model is a regular linear regression model with spatially autoregressive error terms:

\[y = X\beta + u, \;\; u = \lambda Wu + e,\]

where \(\lambda\) is the spatial autoregressive (error) parameter.

The point of departure of the Maximum Likelihood estimation of this model is again the assumption of joint normality of the error terms. However, the error terms are no longer independent, but they have a covariance matrix that follows from the spatial autoregressive specification. Specifically:

\[E[uu'] = \Sigma = \sigma^2 [(I - \lambda W)'(I - \lambda W)]^{-1},\]

where \(\sigma^2\) is the variance of the remaining error terms.
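This covariance structure can be made concrete with a small numerical sketch. The weights matrix, \(\lambda\) and \(\sigma^2\) below are hypothetical toy values, chosen only to show that the implied \(\Sigma\) has nonzero off-diagonal elements (spatially correlated errors) and unequal diagonal elements (induced heteroskedasticity), even though \(e\) itself is homoskedastic:

```python
import numpy as np

# Hypothetical row-standardized weights for a 4-observation chain
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
lam, sig2 = 0.5, 1.0                     # hypothetical lambda and sigma^2

A = np.eye(4) - lam * W
Sigma = sig2 * np.linalg.inv(A.T @ A)    # E[uu'] under the SAR error process
print(np.round(Sigma, 3))
```

The off-diagonal entries show how a single autoregressive parameter propagates correlation to all pairs of observations, not just contiguous ones.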

This leads to the log-likelihood function:

\[\begin{split}\ln L = -(n/2)(\ln 2\pi) - (n/2) \ln \sigma^2 + \ln | I - \lambda W | \\ - \frac{1}{2\sigma^2} (y - X \beta)'(I - \lambda W)'(I - \lambda W)(y - X \beta).\end{split}\]

The last term in this expression is a sum of squared residuals in a regression of \((I - \lambda W)y\) on \((I - \lambda W)X\), i.e., a standard OLS estimation, but based on the spatially filtered dependent and explanatory variables (of course, this assumes a value for \(\lambda\)). The regression of the spatially filtered variables is referred to as spatially weighted least squares or spatial Cochrane-Orcutt, the latter due to its similarity to a familiar time series transformation.
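This filtering step can be sketched on synthetic data. The circular weights, true coefficients and the assumption that \(\lambda\) is known are all hypothetical; the point is only that OLS on the filtered variables recovers \(\beta\):

```python
import numpy as np

rng = np.random.default_rng(12345)
n, lam = 200, 0.4
# hypothetical circular contiguity weights, row-standardized
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

X = np.hstack((np.ones((n, 1)), rng.normal(size=(n, 1))))
beta = np.array([[1.0], [2.0]])
# u = (I - lam W)^{-1} e gives spatially autoregressive errors
u = np.linalg.solve(np.eye(n) - lam * W, rng.normal(size=(n, 1)))
y = X @ beta + u

# spatial Cochrane-Orcutt: regress (I - lam W)y on (I - lam W)X
A = np.eye(n) - lam * W
b_hat, *_ = np.linalg.lstsq(A @ X, A @ y, rcond=None)
print(b_hat.ravel())  # close to the true [1, 2]
```

In practice \(\lambda\) is of course unknown, which is where the concentrated likelihood discussed next comes in.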

As in the spatial lag case, maximization of the log-likelihood is simplified since a concentrated likelihood can be derived that is only a function of the single parameter \(\lambda\). Once an estimate for \(\lambda\) is obtained, the corresponding estimates for \(\beta\) and \(\sigma^2\) are easily computed. For technical details, see Chapter 10 of Anselin and Rey (2014).
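A minimal sketch of this idea follows, using synthetic data and an eigenvalue-based Jacobian (Ord's device); this is an illustration of the principle, not spreg's internal implementation. For each trial value of \(\lambda\), \(\beta\) and \(\sigma^2\) are concentrated out by the filtered regression, leaving a one-dimensional optimization:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n, lam_true = 200, 0.4
# hypothetical symmetric circular weights, row-standardized
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
evals = np.linalg.eigvalsh(W)            # symmetric toy W: real eigenvalues

X = np.hstack((np.ones((n, 1)), rng.normal(size=(n, 1))))
y = X @ np.array([[1.0], [2.0]]) + np.linalg.solve(
    np.eye(n) - lam_true * W, rng.normal(size=(n, 1)))

def neg_conc_ll(lam):
    """Negative concentrated log-likelihood as a function of lambda only."""
    A = np.eye(n) - lam * W
    ys, Xs = A @ y, A @ X
    b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    e = ys - Xs @ b
    sig2 = float(e.T @ e) / n            # concentrated sigma^2
    logjac = np.sum(np.log(1.0 - lam * evals))   # ln|I - lam W| via eigenvalues
    return -(-(n / 2) * np.log(2 * np.pi * sig2) - n / 2 + logjac)

res = minimize_scalar(neg_conc_ll, bounds=(-0.99, 0.99), method="bounded")
print(round(res.x, 2))                   # ML estimate of lambda (true value 0.4)
```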

Inference is again based on an asymptotic variance matrix.

Implementation methods

ML estimation of the spatial error model works in the same way as for the lag model. It is implemented by means of spreg.ML_Error, with all the standard regression arguments (i.e., at a minimum, y, x and w). Again, three different methods are implemented: full, ord and LU. These differ only in the way the Jacobian term \(\ln | I - \lambda W |\) is computed.

As in the lag case, the default optimization method is brute force, or method="full". The other options are the Ord eigenvalue method, method = "ord", and the LU matrix decomposition, method = "LU". Again, the latter is the only reliable method for larger data sets.

The ML estimation is illustrated for the same specification as before, first using method="full". Since this is also the default, it is not necessary to explicitly include this argument, but it is listed here for clarity. To compare the relative speed of the different methods, time.time() is used.

Unlike what holds for the spatial lag model, there are no impacts for the spatial error model, since any spillover effects are limited to the error terms. Since the model impacts are based on averages (conditional expectation of y given X), the error terms are immaterial (on average, they are zero).

[ ]:
t0 = time.time()
err1 = ML_Error(y,x,w=wq,method="full",
                     name_w=w_name,name_ds=ds_name)
t1 = time.time()
print("Time in seconds: ",t1-t0)
print(err1.summary)

Even though the LM tests provided only weak evidence of an error alternative, the estimate of \(\lambda\), 0.451, is highly significant. The other regression coefficients change slightly relative to the OLS estimates, but not nearly as much as was the case for the spatial lag model. In fact, the estimates should not change much, since OLS remains unbiased (but becomes inefficient) in the presence of spatial error autocorrelation.

The measures of fit include a pseudo \(R^2\), 0.633 (squared correlation between observed and predicted y), the log-likelihood and the AIC and SC information criteria. Since the lag and error models have the same number of parameters, their log-likelihoods are directly comparable. In the lag model, the result was -2418.99, here it is -2428.85. In other words, the log-likelihood in the lag model is somewhat larger than for the error model, confirming the indication given by the LM test statistics (in favor of the lag model).
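The information criteria reported in the summary are simple functions of the log-likelihood. The sketch below uses the two log-likelihoods quoted above together with a hypothetical parameter count k (how spreg counts parameters is not reproduced here); it only illustrates the formulas and why equal-parameter models can be compared on the log-likelihood directly:

```python
import numpy as np

def aic(logl, k):
    """Akaike information criterion: -2 lnL + 2k (smaller is better)."""
    return -2.0 * logl + 2.0 * k

def sc(logl, k, n):
    """Schwarz criterion (BIC): -2 lnL + k ln(n)."""
    return -2.0 * logl + k * np.log(n)

# log-likelihoods from the text; k = 7 is a hypothetical parameter count
print(aic(-2418.99, 7), aic(-2428.85, 7))  # lower AIC favors the first (lag) model
```

With equal k, the AIC ranking is fully determined by the log-likelihoods, which is why the direct comparison in the text is valid.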

As before, important attributes of the results are stored in the regression object. These include the regression coefficients, in betas, with the spatial autoregressive coefficient as the last element. The latter is also included separately as lam. The standard errors are in std_err, z-statistics and p-values in z_stat, and the complete variance-covariance matrix is vm.

Since there is no reduced form for the error model, there is only one type of predicted value, contained in predy. However, there are two types of residuals, the classic residual, \(u = y - X \hat{\beta}\), stored in u, and the spatially filtered residuals, \(u - \lambda W u\), stored in e_filtered. The estimate for the error variance, \(\sigma^2\), is based on the latter.
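The relation between the two residual types can be written out directly. The numbers below are toy values (spreg computes e_filtered internally; this only illustrates the formula \(e = u - \lambda W u\) and how the variance estimate is based on it):

```python
import numpy as np

# hypothetical residual vector and row-standardized weights
u = np.array([[1.0], [-0.5], [0.2], [0.8]])
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
lam = 0.45                                   # hypothetical lambda estimate

e_filtered = u - lam * (W @ u)               # spatially filtered residuals
sig2 = float(e_filtered.T @ e_filtered) / len(u)   # error variance estimate
print(e_filtered.ravel(), sig2)
```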

The contents of the betas and lam attributes show how the estimate for \(\lambda\) is also the last element in betas.

[ ]:
print("betas ",err1.betas)
print(f"lambda: {err1.lam:0.3f}")

The Ord eigenvalue method is invoked by means of method="ord". All other attributes are the same as before.

[ ]:
t0 = time.time()
err2 = ML_Error(y,x,w=wq,method="ord",
                     name_w=w_name,name_ds=ds_name)
t1 = time.time()
print("Time in seconds: ",t1-t0)
print(err2.summary)

The coefficient estimates are identical to those obtained with the full method. There are some slight differences in the computed standard errors (and thus also in the z-values and p-values), but the overall effect is minimal.

Again, all arguments are the same, except for method = "LU".

[ ]:
t0 = time.time()
err3 = ML_Error(y,x,w=wq,method="LU",
                     name_w=w_name,name_ds=ds_name)
t1 = time.time()
print("Time in seconds: ",t1-t0)
print(err3.summary)

In this case, the estimation results are identical to those for the full method.

Predicted Values and Residuals

The two types of residuals can be readily turned into a data frame by means of pd.DataFrame applied to an array constructed with np.hstack, in the same way as was done for the OLS predicted values and residuals. In the example, the associated variable names are resid and filtered, passed as the columns argument.

Descriptive statistics are obtained with describe().

[ ]:
preds = pd.DataFrame(np.hstack((err1.u,err1.e_filtered)),columns=['resid','filtered'])
preds.describe()

Note some important differences between the two concepts. The filtered residuals are the proper estimates for the regression error term e (not u). They have a mean of zero, which is not quite the case for the unfiltered residuals. Also, the variance of the filtered residuals is slightly smaller, and their range is very much smaller.

The correlation between the two concepts is high, but not perfect, at 0.968.

[ ]:
print(f"Correlation between residuals:        {preds['resid'].corr(preds['filtered']):0.3f}")

Spatial pattern of residuals

A final interesting comparison is between the spatial pattern of the two types of residuals. To assess this, a simple Moran scatterplot is constructed, where the spatial lag is computed by means of libpysal.lag_spatial. The plot itself is constructed with sns.regplot, which superimposes a regression line on the scatter plot of the spatial lag on the original variable. No customization of the graph is carried out.

For the naive residuals, this yields the following plot.

[ ]:
werr = lag_spatial(wq,preds['resid']).reshape(-1,1)
sns.regplot(x=preds['resid'],y=werr)
plt.show()

The regression line shows a strong positive slope, suggesting remaining spatial clustering.

[ ]:
wfor = lag_spatial(wq,preds['filtered']).reshape(-1,1)
sns.regplot(x=preds['filtered'],y=wfor)
plt.show()

In contrast, the slope for the filtered residuals is essentially flat, suggesting that the spatial autocorrelation has been removed.

Mapping predicted values and residuals

Optionally, the predicted values and residuals can be added to the spatial data frame in order to construct associated maps. However, since these maps create only visual impressions of spatial patterning, this is not further pursued here.

ML Estimation of the SLX-Error Model

ML estimation of the SLX-Error model is a special case of spreg.ML_Error, with the additional argument of slx_lags=1 (or a larger value). Everything else remains the same. More specifically, the three methods of full, ord and LU are again available. Only the default full is considered here. The results are essentially the same for the other methods.

[ ]:
t0 = time.time()
slxerr = ML_Error(y,x,w=wq,slx_lags=1,
                    name_w=w_name,name_ds=ds_name)
t1 = time.time()
print("Time in seconds: ",t1-t0)
print(slxerr.summary)

Compared to the OLS SLX estimates, there are some minor changes, but much less so than for the spatial Durbin model. For example, W_EP_NOHSDP becomes marginally non-significant (p=0.02), whereas it was significant in OLS. This is the only lag coefficient where the sign differs from that of the original coefficient, which was also the case in OLS. In terms of fit, there is a slight improvement, from an AIC of 4903.5 in OLS to 4841.6 here (smaller is better).

The spatial autoregressive coefficient, 0.400, is highly significant.

As in the lag case, further refinements of the model specification can be carried out by eliminating some lag terms by means of slx_vars, as in the standard SLX model. This is not further pursued here.

Likelihood-Ratio Tests

A likelihood ratio test is \(LR = 2.0 \times (LogL_1 - LogL_0)\), where \(LogL_1\) is the log-likelihood for the unrestricted model (i.e., with more non-zero parameters), and \(LogL_0\) is the log-likelihood for the restricted model (i.e., where some parameters, like \(\rho\), are set to zero). For example, a likelihood ratio test on the coefficient \(\rho\) in the spatial lag model would use the log likelihood in the spatial lag model as \(LogL_1\), and the log-likelihood from the classic regression as \(LogL_0\).

The \(LR\) statistic is distributed as a Chi-square random variable with degrees of freedom equal to the number of restrictions, i.e., 1 for the spatial autoregressive coefficient, but more for the SLX and spatial Durbin models, depending on how many explanatory variables are included. The LR tests are an alternative to the Wald tests (asymptotic t-values) on the spatial coefficient and the LM tests for spatial effects considered earlier.
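The mechanics of the test can be sketched in a few lines with scipy; the log-likelihood values below are hypothetical placeholders, not results from the models in this notebook:

```python
from scipy.stats import chi2

def lr_test(logl_restricted, logl_unrestricted, df):
    """Likelihood ratio test: 2(LogL1 - LogL0), chi-squared with df restrictions."""
    stat = 2.0 * (logl_unrestricted - logl_restricted)
    return stat, chi2.sf(stat, df)

# hypothetical log-likelihoods: restricted -100, unrestricted -95, one restriction
stat, pval = lr_test(-100.0, -95.0, 1)
print(f"LR = {stat:.2f}, p-value = {pval:.4f}")  # LR = 10.00
```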

The same likelihood ratio test as in the lag model can be implemented with spreg.diagnostics.likratiotest. Its two arguments are the regression object for the constrained model and the regression object for the unconstrained model. The result is a dictionary with the statistic (likr), the degrees of freedom (df) and the p-value (p-value).

Four different LR tests consider the following constraints:

  • Error vs OLS, i.e., \(\lambda = 0\) in the Error model: arguments are ols1 and err1

  • SLX-Error vs OLS, i.e., both \(\lambda = 0\) and \(\gamma = 0\) in the SLX-Error model: arguments are ols1 and slxerr

  • SLX-Error model vs Error model, i.e., \(\gamma = 0\) in the SLX-Error model: arguments are err1 and slxerr

  • SLX-Error model vs SLX, i.e., \(\lambda = 0\) in the SLX-Error model: arguments are slx1 and slxerr

[ ]:
LR_Error = likratiotest(ols1,err1)
LR_SLXO = likratiotest(ols1,slxerr)
LR_SLXE = likratiotest(err1,slxerr)
LR_SLXS = likratiotest(slx1,slxerr)

print(f"LR statistic Error-OLS: {LR_Error['likr']:0.3f}, d.f. {LR_Error['df']:2d}, p-value {LR_Error['p-value']:0.4f}")
print(f"LR statistic SLX-Err-OLS: {LR_SLXO['likr']:0.3f}, d.f. {LR_SLXO['df']:2d}, p-value {LR_SLXO['p-value']:0.4f}")
print(f"LR statistic SLX-Err-Error: {LR_SLXE['likr']:0.3f}, d.f. {LR_SLXE['df']:2d}, p-value {LR_SLXE['p-value']:0.4f}")
print(f"LR statistic SLX-Err-SLX: {LR_SLXS['likr']:0.3f}, d.f. {LR_SLXS['df']:2d}, p-value {LR_SLXS['p-value']:0.4f}")

In the current example, all null hypotheses are strongly rejected.

For the error model in this example, the LM-Error test statistic was 88.33, the Wald test was 9.560^2 or 91.38, and the LR test (above) 72.72. Whereas the LR and Wald tests follow the prescribed order (LM < LR < W), the LM-Error test does not, which may point to potential remaining specification problems.

As mentioned, the model can be refined by selectively setting slx_vars, but this is not pursued here.

Practice

As practice, different model specifications could be considered, including adding additional explanatory variables, selectively removing some lag terms, and using different spatial weights.