This page was generated from notebooks/13_ML_estimation_spatial_lag.ipynb. Interactive online version:

Maximum Likelihood Estimation - Spatial Lag Model¶

Luc Anselin¶

(revised 09/19/2024)¶

Preliminaries¶

This notebook is the first of two that deal with the estimation of the spatial Lag model and the spatial Durbin model. Here, the Maximum Likelihood approach is illustrated. Instrumental variable estimation is considered in a separate notebook.

The maximum likelihood estimation in spreg is primarily included for pedagogical purposes. Generally, the instrumental variables approach is preferred. In addition, an optimal maximum likelihood estimation implementation, based on the Smirnov-Anselin (2001) approximation, is not currently implemented in spreg. It is implemented in C++ in GeoDa. This is the preferred approach for ML estimation in large(r) data sets, although it currently does not support estimation of the spatial Durbin specification (this must be implemented by hand by constructing the spatially lagged explanatory variables explicitly).

The spreg module implements ML estimation of the spatial lag model in the ML_Lag class. Given the problems in the optimization of the log-likelihood for the SARSAR model (and the issues with interpretation of the results), ML estimation of this model is purposely not included. For the same reason, the general nested model is not implemented either. These models can be estimated by means of IV/GMM.

The estimation of the Spatial Durbin model is implemented through the inclusion of the slx_lags argument.

Modules Needed¶

As before, the main module is spreg for spatial regression analysis. From this, OLS and ML_Lag are imported. In addition, the utilities in libpysal (to open spatial weights and access the sample data set), pandas and geopandas are needed, as well as time (for some timing results), matplotlib.pyplot and seaborn for visualization. All of these rely on numpy as a dependency. Finally, in order to carry out the Likelihood Ratio tests, likratiotest is imported from spreg.diagnostics.

The usual numpy set_printoptions is included as well.

[1]:

import warnings
warnings.filterwarnings("ignore")
import os
os.environ['USE_PYGEOS'] = '0'

import numpy as np
import pandas as pd
import geopandas as gpd
import time
import matplotlib.pyplot as plt
import seaborn as sns
from libpysal.io import open
from libpysal.examples import get_path
from libpysal.weights import lag_spatial

from spreg import OLS, ML_Lag
from spreg.diagnostics import likratiotest
np.set_printoptions(legacy="1.25")

Functions Used¶

from pandas/geopandas:
- read_file
- DataFrame
- head
- describe
- corr
from libpysal:
- io.open
- examples.get_path
- weights.lag_spatial
from numpy:
- hstack
from matplotlib/seaborn:
- regplot
- show
from spreg:
- spreg.OLS
- spreg.ML_Lag
- spreg.diagnostics.likratiotest

Variable definition and data input¶

The data set and spatial weights are from the chicagoSDOH sample data set:

Chi-SDOH.shp,shx,dbf,prj: socio-economic indicators of health for 2014 in 791 Chicago tracts
Chi-SDOH_q.gal: queen contiguity weights

To illustrate the methods, a descriptive model is used that relates the rate of uninsured households in a tract(for health insurance, EP_UNINSUR) to the lack of high school education (EP_NOHSDP), the economic deprivation index (HIS_ct), limited command of English (EP_LIMENG) and the lack of access to a vehicle (EP_NOVEH). This is purely illustrative of a spatial lag specification and does not have a particular theoretical or policy motivation.

The file names and variable names are set in the usual manner. Any customization for different data sets/weights and different variables should be specified in this top cell.

[2]:

infileshp = get_path("Chi-SDOH.shp")     # input shape file with data
infileq = get_path("Chi-SDOH_q.gal")     # queen contiguity weights created with GeoDa

y_name = 'EP_UNINSUR'
x_names = ['EP_NOHSDP','HIS_ct','EP_LIMENG','EP_NOVEH']
ds_name = 'Chi-SDOH'
w_name = 'Chi-SDOH_q'

The read_file and open functions are used to access the sample data set and contiguity weights. The weights are row-standardized and the data frames for the dependent and explanatory variables are constructed. As before, this functionality is agnostic to the actual data sets and variables used, since it relies on the specification given in the initial block above.

[3]:

dfs = gpd.read_file(infileshp)
wq =  open(infileq).read()
wq.transform = 'r'    # row-transform the weights
y = dfs[y_name]
x = dfs[x_names]

OLS and SLX with Spatial Diagnostics¶

Standard OLS and SLX regressions with spatial diagnostics are carried out to provide a point of reference. Moran’s I is included by setting moran=True and, of course, spat_diag=True as well. Refer to the specific OLS notebook for further details.

OLS¶

[ ]:

ols1 = OLS(y,x,w=wq,spat_diag=True,moran=True,
                 name_w=w_name,name_ds=ds_name)
print(ols1.summary)

The specification achieves an acceptable \(R^2\) of about 0.63 and all coefficients are positive and highly significant.

The non-spatial diagnostics suggest non-normality as well as a hight degree of heteroskedasticity. There is no problem with multicollinearity.

The spatial diagnostics against the SARERR alternatives show very significant LM-Lag and LM-Error, but of the two robust tests, only RLM-Lag is highly significant (RLM-Error only at p < 0.03). Hence, there is a strong indication that a Lag rather than an Error alternative may be appropriate. While the joint LM test is also highly significant, this is likely due to a strong one-sided (Lag) alternative.

Interestingly, the diagnostics against a spatial Durbin alternative strongly support the latter as well. Both LM tests and their robust forms are highly significant, and so is the joint test. Moreover, the value for the robust forms of the test is smaller than the original, which is the expected behavior (although not always reflected in empirical practice).

In sum, in addition to a spatial Lag model as an alternative, the spatial Durbin specification deserves consideration as well.

SLX¶

[ ]:

slx1 = OLS(y,x,w=wq,slx_lags=1,spat_diag=True,moran=True,
                 name_w=w_name,name_ds=ds_name)
print(slx1.summary)

Relative to the classic regression model, the fit improves slightly, but the constant, EP_NOHSDP and HIS_CT become non-significant at p = 0.01 (they are marginally signifcant at p=0.05). All but one coefficient of the SLX terms are significant (W_EP_NOVEH is not). The signs and magnitudes of the SLX coefficients relative to their unlagged counterparts remain a bit confusing. Only for EP_LIMENG and W_EP_LIMENG are they the same, with the lag coefficient smaller than the unlagged one, in accordance with Tobler’s law. The coefficient for W_HIS_ct is significant and larger than that of HIS_ct, while the latter is not significant at p = 0.01. In other words, the interpretation of these results in terms of distance decay and Tobler’s law may be a bit problematic.

In terms of diagnostics, there is a slight problem with multicollinearity (often the case in SLX specifications), strong non-normality and evidence of heteroskedasticity. Moran’s I is significant, as are both LM-tests, but neither of the robust forms is significant. Based on the relative magnitudes of the test statistics, there is a slight indication of a possible Lag alternative, i.e., a spatial Durbin specification. However, this indication is not as strong as that provided by the LM-SDM test statistics in the classic specification.

ML Estimation of the Lag Model¶

Principle¶

The point of departure of the Maximum Likelihood estimation of the spatial Lag model is an assumption of normal, independent and identically distributed errors. From this, the distribution for the observable vector of the dependent variable is obtained as multivariate normal.

For the lag model \(y = \rho Wy + X\beta + u\), the corresponding log-likelihood is:

\[\begin{split}\ln L = \ln | I - \rho W | -(n/2)(\ln 2\pi) - (n/2) \ln \sigma^2 \\ - (1/2 \sigma^2)(y - \rho W y - X \beta)'(y - \rho W y - X \beta)\end{split}\]

Except for the first term, this is identical to the log-likelihood in the classic regression model. The maximization of the classic log-likelihood would correspond to the minimization of the sum of squared residuals, in this case \((y - \rho W y - X \beta)'(y - \rho W y - X \beta)\), but this ignores the Jacobian term \(\ln | I - \rho W |\). As a consequence, simple minimization of the sum of squared residuals (i.e., OLS), which ignores this Jacobian term, will yield biased estimates.

Maximization of the log-likelihood is simplified since a concentrated likelihood can be derived that is only a function of the single parameter \(\rho\). Once an estimate for \(\rho\) is obtained, the corresponding estimates for \(\beta\) and \(\sigma^2\) are easily computed. For technical details, see Chapter 8 of Anselin and Rey (2014).

Inference is based on an asymptotic variance matrix, which is computed as the inverse of the so-called information matrix (the expected value of the matrix of second partial derivatives of the log-likelihood function).

Implementation methods¶

ML estimation of the classic spatial lag model is implemented by means of spreg.ML_Lag, with all the standard regression arguments (i.e., at a minimum, y, x and w). Three different methods are implemented: full, ord an LU. These differ only in the way the Jacobian term \(\ln | I - \rho W |\) is computed. As the logarithm of the determinant of a \(n \times n\) matrix, this calculation runs into numerical difficulties for large(r) data sets.

The default optimization method is brute force, or method="full". This uses dense matrix expressions to calculate the required determinants and inverse matrix terms. This method should not be used for large(r) data sets.

The Ord eigenvalue method, method="ord" (Ord, 1975) uses the eigenvalues of the spatial weights matrix as a shortcut to compute the Jacobian determinant. Since this method relies on eigenvalue computations, it also is not reliable for large(r) data sets. The method argument must be included (since the default is method="full").

The method="LU" uses the LU (Cholesky) matrix decomposition for sparse matrices to efficiently compute the Jacobian determinant for large data sets. The sparse matrix conversion is done internally, so the only needed additional argument is method = "LU". This is the only reliable method for larger data sets.

The ML estimation is illustrated for the same specification as before, first using method="full". Since this is also the default, it is not necessary to explicitly include this argument, but it is listed here for clarity. To compare the relative speed of the different methods, time.time() is used.

In addition, since the impacts calculation is set to simple by default, it is turned off for now by means of spat_impacts = None (see below for more specifics).

[ ]:

t0 = time.time()
lag1a = ML_Lag(y,x,w=wq,method="full",
                     name_w=w_name,name_ds=ds_name,
                     spat_impacts=None)
t1 = time.time()
print("Time in seconds: ",t1-t0)
print(lag1a.summary)

The spatial autoregressive coefficient (W_EP_UNINSUR) at 0.39 is highly significant. The effect of its inclusion on the other coefficient estimates is major as well. All are substantially smaller than in the classic regression, except for the coefficient of EP_NOVEH, which is marginally larger. The main effect is on the constant term and the coefficient of EP_NOHSDP, neither of which is any longer significant. In essence, ignoring the spatial spillover effects in the classic regression means that some of these spillovers are reflected in the regression coefficients, which become biased with respect to their true magnitude. The spillover effects are considered more closely in the discussion of impacts below.

In addition to the coefficient estimates, the output includes information about the model fit. There are two Pseudo R-squared measures: one is based on the naive residuals, \(e = y - \hat{\rho} Wy - X\hat{\beta}\), the other (Spatial Pseudo R-squared) is computed from the forecasting errors when using the reduced form to compute predicted values. The two types of predicted values and residuals are included in the regression object as predy and u for the naive form and predy_e and e_pred for the reduced form results. As is typically the case, the measure of fit based on the reduced form predicted values is (slightly) worse than the naive one.

Other indications of the fit of the model (although strictly speaking not measures of fit) are the Log Likelihood (-2418.99), the Akaike info criterion (4849.97), and the Schwarz criterion, also sometimes referred to as BIC (4878.01). Compared to the results for the classic regression (respectively -2465.21, 4940.42, and 4963.79), the log-likelihood is clearly less negative (thus larger) and the AIC and SC are smaller (better) than their counterparts.

Other interesting attributes of the regression object are the regression coefficients, in betas, with the spatial autoregressive coefficient as the last element. The latter is also included separately as rho. The standard errors are in std_err, z-statistics and p-values in z_stat, and the complete variance-covariance matrix is vm.

The contents of the betas and rho attributes show how the estimate for \(\rho\) is also the last element in betas.

[ ]:

print("betas ",lag1a.betas)
print("rho ",lag1a.rho)

The Ord eigenvalue method is invoked by means of method="ord". All other attributes are the same as before (with again spat_impacts = None).

[ ]:

t0 = time.time()
lag1b = ML_Lag(y,x,w=wq,method="ord",
                     name_w=w_name,name_ds=ds_name,
                     spat_impacts=None)
t1 = time.time()
print("Time in seconds: ",t1-t0)
print(lag1b.summary)

The coefficient estimates are identical to those obtained with the full method. There are some slight differences in the computed standard errors (and thus also in the z-values and p-values), but the overall effect is minimal.

Again, all arguments are the same, except for method = "LU".

[ ]:

t0 = time.time()
lag1c = ML_Lag(y,x,w=wq,method="LU",
                     name_w=w_name,name_ds=ds_name,
                     spat_impacts=None)
t1 = time.time()
print("Time in seconds: ",t1-t0)
print(lag1c.summary)

In this case, the estimation results are identical to those for the full method.

Spatial Multipliers - Impacts¶

In models that include a spatially lagged dependent variable Wy, with or without additional spatially lagged explanatory variables, the impact of a change in X on y is not simply the coefficient of X, as is the case in the standard regression model. Instead, the effect that results from changes in the neighboring values must also be accounted for. These are the spatial multipliers, indirect effects or spatial impacts.

In Kim, Phipps and Anselin (2003), it was shown that if the change in the explanatory variable is uniform across observations, the spatial multiplier is \(1 / (1- \rho)\), with the total effect of a change in variable \(x_k\) amounting to \(\beta_k / (1 - \rho)\). In the example, the spatial multiplier in the spatial lag model (lag1a) would be 1.0 / (1.0 - 0.39151) = 1.643.

The Kim et al. approach distinguishes between the direct effect, i.e., the coefficient of the \(\beta\) coefficients as estimated, and the total effect, which corresponds to this coefficient times the multiplier. An indirect effect is then the difference between the two.

LeSage and Pace (2009) introduce a slightly different set of concepts and use the terms average direct impact (\(ADI\)), average indirect impact (\(AII\)) and average total impact (\(ATI\)) as summaries computed from the matrix expression \((I - \rho W)^{-1}\) in the reduced form, \((I - \rho W)^{-1}X\beta\). The main difference is that what they refer to as direct effect also includes some feedbacks as reflected in the diagonal elements of the inverse matris. As a result, in their approach, the direct effects will differ from the estimates for \(\beta\).

More formally, LeSage-Pace define \(ADI\) as the average trace of the inverse matrix, or, \(ADI = (1/n) tr[(I - \rho W)^{-1}] = (1/n) \sum_i (I - \rho W)^{-1}_{ii}\). The \(ATI\) is the average of all the elements of the matrix, or, \(ATI = (1/n) \sum_i \sum_j (I - \rho W)^{-1}\). Note that with some algebra, one can show that this equals \(1 / (1 - \rho)\), the same as the total multiplier in the Kim et al. approach.

The \(AII\) then follows as \(ATI - ADI\). The actual impacts are obtained by multiplying the \(\beta\) coefficient by respectively \(ATI\), \(ADI\) and \(AII\).

The impact measures are listed in the spatial lag regression output when the spat_impacts argument is specified (it is by default set to spat_impacts = "simple"). Options include simple (Kim et al. approach), full and power (both based on LeSage-Pace, but with full using a dense matrix computation for the inverse, whereas power uses a power approximation with higher order weights), as well as all, for all three. In addition, any combination of methods can be passed in a list. For example, to obtain both Kim et al. and LeSage-Pace measures, the argument can be set as spat_impacts = ["simple","full"], as in the listing below. Since the default setting is spat_impacts = "simple", when listing the impacts is not desired, spat_impacts must explicitly be set to None.

Based on extensive timing experiments for the LeSage-Pace approach, the power method is superior for data sets with more than 5,000 observations. For larger data sets, it quickly becomes the only viable option, being orders of magnitude faster than the brute force calculations. The Kim et al. approach has no such limitation, since it does not use any matrix calculations.

Note that the reported impacts are only average effects. See the spatial multipliers notebook for a more extensive analysis of the associated spatial pattern.

[ ]:

lag2 = ML_Lag(y,x,w=wq,method="full",
                     name_w=w_name,name_ds=ds_name,
                     spat_impacts=['simple','full'])
print(lag2.summary)

The listing of SPATIAL LAG MODEL IMPACTS for the simple and full methods clearly illustrates the slight differences between the two approaches. The Total effect is the same for both, but the distribution of the effect between Direct and Indirect is slightly different. For the simple method, the Direct effects are identical to the coefficient estimates, but for the full method, they are slightly larger. This is due to the use of the diagonal elements of the inverse matrix, rather than the original estimates. As a result, some of the spillover feed-back effects are characterized as Direct. As a consequence, the part attributed to the Indirect effect is larger for the simple method than for the full method.

These measures should only be interpreted as rough indications of spatial spillovers since they are both based on a rather unrealistic assumption of a uniform change in the X-variable. Also, the Total effect is simply the original coefficient times the spatial multiplier. For example, for EP_LIMENG, this amounts to \(0.3116 \times 1.643 = 0.512\).

This total effect can be compared to that implied by the SLX model, which would be the sum of \(\beta\) and \(\gamma\). For example, for EP_LIMENG, this would be \(0.38547 + 0.22040 = 0.60587\), which is actually larger than the total effect suggested by the spatial lag model. In part, this can be explained by recognizing that the SLX model is a truncated form of the reduced form for the spatial lag specification, i.e., only the first order contiguity elements are included. When ignoring the remainder, its impact tends to be attributed to the coefficients of \(\beta\) and \(\gamma\).

Predicted Values and Residuals¶

As mentioned, the naive predicted values and residuals are attributes of the regression object as predy and u. These are somewhat misleading, since they take the value of \(Wy\) as observed, in a so-called conditional approach. In the simultaneous spatial lag model, the spatial pattern for the dependent variable \(y\) is jointly determined as a function of the X-variables only. A predicted value that reflects this lesser degree of information is computed from the reduced form, as \(y_{pr} = (I - \hat{\rho} W)^{-1} X\hat{\beta}\). This is predy_e in the regression object. The associated forecast error, \(y - y_{pr}\) is e_pred in the regression object.

The two types of predicted values and residuals can be readily turned into a data frame by means of pd.DataFrame applied to an array constructed with np.hstack, in the same way as was done for the OLS predicted values and residuals. In the example, the associated variable names are ypred, yreduce, resid and forcerr, passed as the columns argument.

Descriptive statistics are obtained with describe()

[ ]:

preds = pd.DataFrame(np.hstack((lag1a.predy,lag1a.predy_e,lag1a.u,lag1a.e_pred)),columns=['ypred','yreduce','resid','forcerr'])
preds.describe()

Note some important differences between the two concepts. First, whereas the mean of ypred equals the mean of the actual dependent variable of 18.4852 (see Mean dependent var in the regression output listing), the mean of the reduced form prediction is slightly different (18.5128). Consequently, the mean of resid is essentially zero, but the mean of forcerr is slightly negative, at -0.0375. There are other slight differences as well.

The correlation between the two concepts is high, but not perfect, respectively 0.988 for the predicted values and 0.978 for the residuals.

[ ]:

print(f"Correlation between predicted values: {preds['ypred'].corr(preds['yreduce']):0.3f}")
print(f"Correlation between residuals:        {preds['resid'].corr(preds['forcerr']):0.3f}")

Spatial pattern of residuals¶

A final interesting comparison is between the spatial pattern of the two types of residuals. To assess this, a simple Moran scatterplot is constructed, where the spatial lag is computed by means of libpysal.lag_spatial. The plot itself is constructed with sns.regplot, which superimposes a regression line on the scatter plot of the spatial lag on the original variable. No customization of the graph is carried out.

For the naive residuals, this yields the following plot.

[ ]:

werr = lag_spatial(wq,preds['resid']).reshape(-1,1)
sns.regplot(x=preds['resid'],y=werr)
plt.show()

The regression line is essentially flat, which means most/all of the remaining spatial correlation has been eliminated. In contrast, the Moran scatterplot for the prediction error shows a strong positive slope, suggesting remaining spatial clustering.

[ ]:

wfor = lag_spatial(wq,preds['forcerr']).reshape(-1,1)
sns.regplot(x=preds['forcerr'],y=wfor)
plt.show()

Why would this be the case? Recall that the predicted value is computed as \(y_{pr} = (I - \hat{\rho} W)^{-1} X\hat{\beta}\). However, the complete expression for the reduced form is \(y = (I - \rho W)^{-1} X\beta + (I - \rho W)^{-1}u\). Since the error term is unobserved and its mean is zero, the second term is ignored in the predicted value. As a result, the actual difference between \(y\) and \(y_{pr}\) is \(e = (I - \rho W)^{-1}u\). This can be written as \(e = \rho We + u\), a spatial autoregressive process, which accounts for the spatial pattern in the residuals.

Since the residual is not filtered for the existing spatial correlation, it will remain spatially correlated itself. This is reflected in the positive slope of the regression line in the Moran scatter plot.

Mapping predicted values and residuals¶

Optionally, the predicted values and residuals can be added to the spatial data frame in order to construct associated maps. However, since these maps create only visual impressions of spatial patterning, this is not further pursued here.

ML Estimation of Spatial Durbin Model¶

ML estimation of the Spatial Durbin model is a special case of spreg.ML_Lag, with the additional argument of slx_lags=1 (or a larger value). Everything else remains the same. More specifically, the three methods of full, ord and LU are again available. Only the default full is considered here. The results are essentially the same for the other methods.

To illustrate the difference between the two types of impact measures, spat_impacts is set to ["simple","full"] (note, the default setting remains spat_impacts = "simple").

Another default setting is spat_diag = True, which yields the results for the Common Factor Hypothesis test. To avoid this test, spat_diag must be set to False. In the illustration, both arguments are listed explicitly for clarity.

[ ]:

t0 = time.time()
spdur = ML_Lag(y,x,w=wq,slx_lags=1,
                    name_w=w_name,name_ds=ds_name,
                    spat_impacts = ['simple','full'],
                    spat_diag=True)
t1 = time.time()
print("Time in seconds: ",t1-t0)
print(spdur.summary)

The inclusion of the spatial lag terms affects the results relative to both the original classic specification and the SLX model. The spatial autoregressive coefficient of 0.40 is highly significant. Similar to what happened in the SLX model relative to the classic specification, EP_NOHSDP and HIS_ct are no longer significant and neither is W_EP_NOVEH, but now W_EP_LIMENG is also no longer significant. As in the SLX specification, some of the signs and magnitudes of the WX coefficients run counter to Tobler’s Law. According to the latter, the coefficients of \(\beta\) and the matching \(\gamma\) should be the same, which is not the case for EP_NOHSDP (but not significant) and EP_NOVEH (but the lag is not significant). A consistent pattern of opposite signs may be an indication that the common factor hypothesis holds (see below).

Relative to the standard Lag model, the inclusion of the WX terms makes the spatial autoregressive coefficient slightly larger (0.40 relative to 0.39), but it renders HIS_ct non-significant (this is similar to what happened in the SLX model relative to the classic specification).

Significance improves slightly relative to the SLX model, with a Log Likelihood of -2410.7 (compared to -2442.8), AIC of 4841.5 (relative to 4903.5), and SC of 4888.2 (relative to 4945.6).

As in the spatial Lag model, there will be two types of predicted values and residuals, respectively based on a naive approach and the reduced form. This is not considered further since the treatment and interpretation are identical to that in the standard spatial Lag model.

Further refinements of the model specification can be carried out by eliminating some lag terms by means of slx_vars, as in the SLX model. This is not further pursued here.

Spatial Multipliers - Impacts¶

The impact measures for the Spatial Durbin model follow the same logic as in the spatial lag model. They are derived from the reduced form, which is now, ignoring the error term: \begin{equation*} y = (I - \rho W)^{-1} X \beta + (I - \rho W)^{-1} WX \gamma. \end{equation*} The effect of this is that the various multipliers must be applied to both \(\beta\) and the matching \(\gamma\) to compute the overall impacts.

As it turns out, the total multiplier is the same as in the lag model, i.e., \(1.0 / (1.0 - \rho)\). However, to get the total effect, this factor needs to be multiplied by both \(\beta\) and the matching \(\gamma\).

For example, taking \(\rho = 0.4044\) as in the example, the total multiplier follows as \(1.673\). The total impact for the variable EP_LIMENG (ignoring for now that W_LIMENG turned out to be non-significant) would be \(1.673 \times 0.37397 + 1.673 \times 0.01243 = 0.6464\), the value given in the Total column of the impacts listing. As in the lag model, the total impact is the same for the Kim et al approach and the LeSage-Pace approach. The main difference is in the way the direct effect is computed.

In the Kim et al approach, the direct effect is the coefficient of \(\beta\). This is a strict interpretation of direct effect, i.e., it considers the effect of \(\gamma\) to be indirect. This is consistent with the interpretation of \(\gamma\) in the SLX model, but it is not the approach used by LeSage-Pace in their original treatment. In their formulation, the \(ADI\) is applied to both \(\beta\) and \(\gamma\) to compute the direct effect, typically yielding a larger value for the direct effect (as long as \(\beta\) and \(\gamma\) have the same sign). This approach is not followed by spreg. Instead, the direct effect is obtained by multiplying the \(ADI\) with the \(\beta\) coefficient only. As a result, the share attributed to the indirect impact will be larger than in original the LeSage-Pace formulation (in contrast, the latter is followed in the R spatialreg package).

The total and direct/indirect impacts can be compared to those suggested by the simple spatial lag model and the SLX model. For example, for EP_LIMENG, the total impact of a uniform change in that variable was 0.606 in SLX, 0.512 in the spatial lag model and 0.646 in the spatial Durbin specification. Of this (using the Kim et al logic), 0.220 was spatial spillover (indirect effect) in SLX, 0.201 in spatial lag, and 0.272 in the spatial Durbin model.

As mentioned above, these impact measures are only summaries. A more meaningful indication of spatial spillovers would follow from a non-uniform change in (some of) the X variables through the use of the full reduced form. In addition, the average may mask some interesting spatial patterns, as demonstrated in an earlier notebook.

Common Factor Hypothesis¶

A complication in the interpretation of the spatial Durbin model occurs because the spatial autoregressive Error model has an equivalent counterpart as a spatial Lag formulation that includes WX terms, i.e., the same specification as the spatial Durbin model. Formally: \begin{equation*} y = X \beta + (I - \lambda W)^{-1} u = \lambda Wy + X \beta - \lambda WX \beta + u. \end{equation*} In this alternative specification, the coefficients for the \(WX\) variables correspond to the negative product of the autoregressive and the matching regression parameters (except for the constant term), the so-called common factor constraint.

Following Anselin(1988), the test on the common factor hypothesis \(H_0: \rho \beta^* + \gamma = 0\) (with \(\beta^*\) as the vector of regression coefficients without the constant term) consists of three elements:

the constraint as a \(h \times 1\) vector \(g = \hat{\rho} \hat{\beta}^* + \hat{\gamma}\) (with \(h = k-1\))
a \(2h+1 \times h\) matrix of partial derivatives:

\begin{equation*} G = \begin{bmatrix} \hat{\rho} \times I_h\\ I_h\\ \hat{\beta'}^* \end{bmatrix} \end{equation*}

and the \(2h+1 \times 2h+1\) square asymptotic variance matrix \(V\) from the estimated spatial Durbin model (vm in the regression object)

Using the delta method, the common factor statistic then follows as: \begin{equation*} CF = g'[G'VG]^{-1}g \sim \chi^2(h). \end{equation*} For our purposes, this suffices, although it should be noted that Juhl(2021) has pointed out potential problems due to the lack of invariance of the Wald test to different reparameterizations of the null hypothesis.

A test on the common factor hypothesis is included in the spatial Durbin output when spat_diag = True, which is the default for this specification.

In the example above, the value of the test statistic is 32.954, which strongly rejects the null. Some indication of the failure of the common factor hypothesis could also be gleaned from the lack of opposite signs of \(\beta\) and \(\gamma\), which is necessary for the constraint to hold with a positive \(\rho\).

Likelihood-Ratio Tests¶

A likelihood ratio test is \(LR = 2.0 \times (LogL_1 - LogL_0)\), where \(LogL_1\) is the log-likelihood for the unrestricted model (i.e., with more non-zero parameters), and \(LogL_0\) is the log-likelihood for the restricted model (i.e., where some parameters, like \(\rho\), are set to zero). For example, a likelihood ratio test on the coefficient \(\rho\) in the spatial lag model would use the log likelihood in the spatial lag model as \(LogL_1\), and the log-likelihood from the classic regression as \(LogL_0\).

The \(LR\) statistic is distributed as a Chi-square random variable with degrees of freedom equal to the number of restrictions, i.e., 1 for the spatial autoregressive coefficient, but more for the SLX and spatial Durbin models, depending on how many explanatory variables are included. The LR tests are an alternative to the Wald tests (asymptotic t-values) on the spatial coefficient and the LM tests for spatial effects considered earlier.

A likelihood ratio test is implemented as spreg.diagnostics.likratiotest. Its two arguments are the regression object for the constrained model and the regression object for the unconstrained model. The result is a dictionary with the statistic (likr), the degrees of freedom (df) and the p-value (p-value)

Four different LR test consider the following constraints:

Lag vs OLS, i.e., \(\rho = 0\) in the Lag model: arguments are ols1 and lag2
SDM vs OLS, i.e., both \(\rho = 0\) and \(\gamma = 0\) in the spatial Durbin model: argumentes are ols1 and spdur
SDM vs Lag, i.e., \(\gamma = 0\) in the spatial Durbin model: arguments are lag2 and spdur
SDM vs SLX, i.e., \(\rho = 0\) in the spatial Durbin model: arguments are slx1 and spdur

[ ]:

LR_Lag = likratiotest(ols1,lag2)
LR_SDMO = likratiotest(ols1,spdur)
LR_SDML = likratiotest(lag2,spdur)
LR_SDMS = likratiotest(slx1,spdur)

print(f"LR statistic Lag-OLS: {LR_Lag["likr"]:0.3f}, d.f. {LR_Lag["df"]:2d}, p-value {LR_Lag["p-value"]:0.4f}")
print(f"LR statistic SDM-OLS: {LR_SDMO["likr"]:0.3f}, d.f. {LR_SDMO["df"]:2d}, p-value {LR_SDMO["p-value"]:0.4f}")
print(f"LR statistic SDM-Lag: {LR_SDML["likr"]:0.3f}, d.f. {LR_SDML["df"]:2d}, p-value {LR_SDML["p-value"]:0.4f}")
print(f"LR statistic SDM-SLX: {LR_SDMS["likr"]:0.3f}, d.f. {LR_SDMS["df"]:2d}, p-value {LR_SDMS["p-value"]:0.4f}")

In the current example, all null hypotheses are strongly rejected. Based on this evidence alone, it would suggest that the most appropriate specification is the spatial Durbin model. However, conflicts with the signs and magnitudes of the coefficients make that model difficult to interpret. Most importantly, several parameters violate Tobler’s law, one of the most important tenets of spatial analysis. While in and of itself this is not sufficient to dismiss the model specification, it does require careful consideration of the interpretation.

The Likelihood Ratio, Wald (the square of the asymptotic t-ratio) and Lagrange Multiplier test statistics are considered to be classic tests. Asymptotically, they are equivalent, but in finite samples, they tend to follow the order LM < LR < W.

For the lag model in this example, the LM-Lag test statistic was 109.463, the Wald test was 9.922^2 or 98.446, and the LR test (above) 92.446. Whereas the LR and Wald test follow the prescribed order, the LM-Lag test does not, which may point to potential remaining specification problems.

As mentioned, the model can be refined by selectively setting slx_vars, but this is not pursued here.

Practice¶

While the ML estimation paradigm is very powerful, it also is not robust to various forms of misspecification. This is difficult to consider when the results are viewed in isolation, but it is important to keep in mind. As practice, different model specifications could be considered, including adding additional explanatory variables, selectively removing some lag terms, and using different spatial weights. Make sure to carefully consider the interpretation of the estimated coefficients and associated direct and indirect effects.