This page was generated from notebooks/10_specification_tests_properties.ipynb.

Specification Tests - Properties

Luc Anselin

(revised 09/11/2024)

Preliminaries

In this notebook, a closer look is taken at the properties of the various specification tests for spatial effects. This is carried out by means of a series of simulation experiments on data generated under the null hypothesis of no spatial effects as well as under various alternatives. This provides insight into the distribution of the test statistics under the null and their relative power against various alternatives.

Prerequisites

Familiarity with OLS estimation in spreg is assumed, as covered in the OLS notebook and the Specification Tests notebook. For the graphs, it may be useful to have some familiarity with matplotlib and the plot functionality of pandas, although that is not strictly needed to simply replicate the graphs shown here.

Modules Needed

The main module for spatial regression in PySAL is spreg. In addition, libpysal is needed for spatial weights manipulation, and pandas for data frame manipulation. In these exercises, geopandas is not needed. In order to get nicer looking graphs, matplotlib.pyplot is imported as well, although this is not critical. In addition, the module time is used for timing experiments (optional).

As before, only the relevant parts of libpysal and spreg are imported.

[1]:
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import time
import libpysal.weights as weights
from spreg import OLS, make_x, make_xb, make_wx, make_wxg, \
    make_error, dgp_ols, dgp_lag, dgp_sperror, dgp_slx
np.set_printoptions(legacy="1.25")

Functions Used

  • from numpy:

    • random.default_rng

    • rng.chisquare

    • zeros

    • array

    • reshape

    • hstack

  • from pandas:

    • DataFrame

    • describe

    • plot

  • from matplotlib.pyplot:

    • show

  • from libpysal:

    • weights.lat2W

    • w.transform

    • w.n

  • from spreg:

    • OLS

    • make_x

    • make_xb

    • make_wx

    • make_wxg

    • make_error

    • dgp_ols

    • dgp_lag

    • dgp_sperror

    • dgp_slx

Files and Variables

In this notebook, no actual data are used, since the data sets will be created by means of simulation.

Model Parameters and Variables

The various model parameters are set here, so that it is easy to replicate the experiments for different sample sizes and coefficient values.

  • gridx: the number of cells in the horizontal dimension of a regular lattice of dimension gridx x gridy

  • gridy: the number of cells in the vertical dimension of a regular lattice of dimension gridx x gridy

  • b1: a list with regression parameters (includes a coefficient for the constant term as the first element)

  • k: length of b1 less one (no constant term counted)

  • rndseed: the random seed to ensure reproducibility

  • reps: the number of replications

  • rhovals: a list with spatial autoregressive coefficients \(\rho\) for the lag variables Wy

  • lamvals: a list with spatial coefficients \(\lambda\) for the error lag variables We

  • gamvals: a list with coefficients for the SLX variables (WX)

  • gamma: coefficient for WX in the Spatial Durbin Model

  • w: queen contiguity spatial weights

  • n: number of observations

  • p_value: critical value to be used for tests

  • x1: x values (not including constant)

  • xb1: \(X \beta\)

  • wx1: \(WX\)

[2]:
# grid layout and weights
gridx = 20
gridy = 20
w = weights.lat2W(gridx,gridy,rook=False)
w.transform = 'r'
n = w.n

# model coefficient values
b1 = [1, 1, 1, 1]
k = len(b1)-1
rhovals = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5,
           0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9]
lamvals = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5,
           0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9]
gamvals = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5,
           0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9]
gamma = 0.5

# simulation parameters
rndseed = 123456789
reps = 1000
p_value = 0.05

# Create X
rng=np.random.default_rng(seed=rndseed) # set seed for X
xx = make_x(rng,n*k,mu=[0],varu=[6],method="uniform")
x1 = np.reshape(xx,(n,k))
xb1 = make_xb(x1,b1) # no constant in x1, but a coefficient for the constant in b1
wx1 = make_wx(x1,w) # default first order no constant
[ ]:
print("Summary of Simulation Design Parameters")
print("grid size: ",gridx," x ",gridy)
print("n: ",n," k: ",k)
print("betas: ",b1)
print("random seed: ",rndseed)
print("replications: ",reps)
print("p-value: ",p_value)

Distribution Under the Null

The distribution under the null is obtained by simulating reps data sets under the null of standard normal errors. The reference distributions for Chi-squared are obtained by means of rng.chisquare with the appropriate degrees of freedom. For the LMWX and LMSDM tests, these depend on the number of variables in the X matrix (\(k\)).

The values for the test statistics are taken from the OLS regression object with the arguments spat_diag=True and moran=True.

All the results are collected into a pandas dataframe that is then used to compute the descriptive statistics by means of describe and to create the various plots.

[ ]:
t0 = time.time()

# all distributions under the null
alltests = ["N01","Chi2-1","Chi2-2","Chi2-k","Chi2-kr",
            "Moran","LM-Lag","LM-Error","LMWX",
            "LMSARER","LMSDM"]
# initialize
best = np.zeros((reps,len(alltests)))
rng=np.random.default_rng(seed=rndseed)
# reference distributions as random number draws
# standard normal
nn = make_error(rng,reps)
best[:,0] = nn[:,0]
# chi-squared
for j in range(2):
    df = j+1
    best[:,df] = rng.chisquare(df,reps)
for dff in [k,k+1]:  # d.f. for LMWX and LMSDM depends on k
    df = df + 1
    best[:,df] = rng.chisquare(dff,reps)

# replications
for i in range(reps):
    u = make_error(rng,n)

    y = dgp_ols(u,xb1)
    reg = OLS(y,x1,w=w,spat_diag=True,moran=True)
    testres = [reg.moran_res[1],reg.lm_lag[0],reg.lm_error[0],reg.lm_wx[0],
                reg.lm_sarma[0],reg.lm_spdurbin[0]]
    for jj in range(len(testres)):
        best[i,jj+5] = testres[jj]

t1 = time.time()
print("time in minutes: ",(t1-t0)/60.0)

results = pd.DataFrame(best,columns=alltests)
results.describe()

The most relevant characteristics are the mean and the standard deviation. For the draws from the theoretical distributions, the means are roughly what would be expected, i.e., around 0 for the standard normal, and around the degrees of freedom for the Chi-squared distributions. The standard deviation for the standard normal should be around 1, whereas for the Chi-squared distributions it should be roughly the square root of twice the degrees of freedom. For example, for 1 degree of freedom, it should be around 1.41 (here, 1.26), for two degrees of freedom, around 2 (here, 2.01), etc.
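
As a quick check (an illustrative addition, not part of the original notebook), the theoretical reference moments can be computed directly; for a \(\chi^2\) variate the mean equals the degrees of freedom and the standard deviation is the square root of twice the degrees of freedom.

[ ]:
# quick check of the theoretical reference moments (illustrative addition)
# mean = df, standard deviation = sqrt(2*df) for a chi-squared variate
for df in [1, 2, k, k + 1]:
    print(f"chi2({df}): mean = {df}, std = {np.sqrt(2 * df):.2f}")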

The mean for the standardized z-value for Moran’s I is -0.015 with a standard deviation of 0.986, close to the moments of a standard normal distribution. The means of the LM statistics are roughly what would be expected.

A closer look at the distribution of the test statistics under the null is obtained in a graph.

Moran’s I

The plot functionality of a pandas dataframe is used to create a density plot for Moran’s I. It is contrasted with a density plot from simulated standard normal variates. The argument kind="kde" is used to obtain a density plot. The columns in the data frame are N01 and Moran.

[ ]:
results.plot(y=["N01","Moran"],kind="kde",
             title="Distribution under Null")
plt.show()

The result illustrates how Moran’s I standardized z-value closely tracks the standard normal distribution under the null.

LM-Lag and LM-Error

For LM-Lag and LM-Error, the reference distribution is a Chi-squared distribution with one degree of freedom. Since the kde interpolation results in values less than 0, which is impossible for Chi-squared, the graph is truncated at 0 by means of xlim=((0,None)). This still results in a slight bump at zero for one degree of freedom, where in the strict sense there should not be one. For the purposes here, this does not matter.

The columns in the data frame are Chi2-1, LM-Lag and LM-Error.

[ ]:
results.plot(y=["Chi2-1","LM-Lag","LM-Error"],kind="kde",
             title="Distribution under Null",xlim=((0,None)))
plt.show()

Whereas Moran’s I was assessed against the standard normal, the LM-Lag and LM-Error curves closely follow the pattern of the \(\chi^2(1)\) distribution.

LM-SARERROR

In the same way, the distribution of LM-SARERROR is compared to a \(\chi^2(2)\) distribution by selecting the proper columns in the data frame, i.e., Chi2-2 and LMSARER.

[ ]:
results.plot(y=["Chi2-2","LMSARER"],kind="kde",
             title="Distribution under Null",xlim=((0,None)))
plt.show()

The two plots track closely.

LM-WX

The plot for LM-WX is again obtained by selecting the proper columns. The reference distribution is now \(\chi^2(k)\), where \(k\) is the number of explanatory variables, not counting the constant. In the example, this is 3. The corresponding columns in the data frame are Chi2-k and LMWX.

[ ]:
results.plot(y=["Chi2-k","LMWX"],kind="kde",
             title="Distribution under Null",xlim=((0,None)))
plt.show()

The mode obtained for the test statistic is somewhat smaller than its expected value, but otherwise the two curves track closely.

LM-Spatial Durbin

For the LM-Spatial Durbin test, the theoretical distribution has \(k+1\) degrees of freedom, which includes one degree of freedom for the spatial autoregressive parameter in addition to those for the explanatory variables. The data frame columns are Chi2-kr and LMSDM.

[ ]:
results.plot(y=["Chi2-kr","LMSDM"],kind="kde",
             title="Distribution under Null",xlim=((0,None)))
plt.show()

The same general pattern is obtained as for LM-WX, i.e., the mode is somewhat lower than the expected theoretical value, but otherwise the two graphs track closely.

Power Functions - Lag Alternative

Power functions show the rejection percentage of a test statistic for a given p-value at different parameter values for the alternative hypothesis. For the Lag alternative, this is obtained by setting the value for \(\rho\) and generating a vector for \(y\) using dgp_lag. This dependent variable is then used in a standard OLS regression. For each replication, the p-values for the various spatial diagnostics are extracted from the regression object and compared to the critical value (p_value).

The comparison is turned into a 0-1 variable for each replication. Finally, the mean over all replications is the rejection frequency for each test for the given value of \(\rho\). The resulting array is turned into a pandas data frame to plot the corresponding power functions.
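
As a small illustration of this logic (an added toy example, not part of the original notebook; the array toy_p is purely hypothetical), the comparison with p_value and the column means can be seen directly:

[ ]:
# toy illustration of turning p-values into rejection frequencies
toy_p = np.array([[0.01, 0.20],
                  [0.03, 0.04],
                  [0.60, 0.02]])
print((toy_p < p_value) * 1)           # 1 = reject at the 5% level
print((toy_p < p_value).mean(axis=0))  # rejection frequency for each test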

Depending on the hardware, this simulation can take a few minutes.

[12]:
pvals = ["Moran","LM-Lag","LM-Error","LMWX",
         "LMSARER","LMSDM","RLM-Lag","RLM-Error","RLMWX"]
[ ]:
t0 = time.time()
powvals = np.zeros((len(rhovals),len(pvals)+1))
powvals[:,0] = rhovals
for r in range(len(rhovals)):
    best = np.zeros((reps,len(pvals)))
    rng=np.random.default_rng(seed=rndseed)
    for i in range(reps):
        u = make_error(rng,n)
        y = dgp_lag(u,xb1,w,rho=rhovals[r])
        reg = OLS(y,x1,w,spat_diag=True,moran=True)
        testp = [reg.moran_res[2],reg.lm_lag[1],reg.lm_error[1],
                     reg.lm_wx[1],reg.lm_sarma[1],
                     reg.lm_spdurbin[1],reg.rlm_lag[1],
                     reg.rlm_error[1],reg.rlm_wx[1]]
        best[i,:] = testp

    bestp = (best < p_value) * 1    # significant
    mm = bestp.mean(axis=0)
    powvals[r,1:]= mm

t1 = time.time()
print("time in minutes: ",(t1-t0)/60.0)

powresult = pd.DataFrame(powvals,columns=["rho"]+pvals)

print("Test Power for different Rho")
print(powresult)

The results data frame shows how the rejection frequency for the various spatial diagnostics changes with the value of the spatial autoregressive parameter \(\rho\).

The relative shape of the associated power curves can be compared by means of the plot functionality of the pandas data frame. For the current purposes, only a rudimentary graph is shown. Fancier versions can be obtained by means of the full functionality of matplotlib.
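
For instance, a somewhat fancier version could be constructed directly with matplotlib along the following lines (a sketch only; the selection of tests and the styling choices are illustrative):

[ ]:
# sketch of a customized power plot using matplotlib directly
fig, ax = plt.subplots(figsize=(6, 4))
for test in ["LM-Lag", "RLM-Lag", "LM-Error", "RLM-Error", "Moran"]:
    ax.plot(powresult["rho"], powresult[test], label=test)
ax.axhline(p_value, color="grey", linestyle=":", label="nominal size")
ax.set_xlabel(r"$\rho$")
ax.set_ylabel("rejection frequency")
ax.set_title("Lag Alternative - Power Functions")
ax.legend()
plt.show()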

Single alternative tests for Lag and Error

The first comparison is between the traditional LM-Lag and LM-Error tests, their robust forms and Moran’s I.

[ ]:
testnames = ["LM-Lag","RLM-Lag","Moran","LM-Error","RLM-Error"]
powresult.plot(x="rho",y=testnames,
               title=" Lag Alternative - Power Functions")
plt.show()

The power functions for LM-Lag and its robust form are the two left-most curves. They track each other very closely and achieve a 100% rejection rate for values of \(\rho\) as low as 0.2. Next most powerful are LM-Error and Moran’s I, which achieve almost identical power, with 100% rejection for \(\rho = 0.3\). Finally, while the robust form of LM-Error provides an effective correction for small values of \(\rho\), it too achieves a 100% rejection rate, but only for \(\rho = 0.4\). This illustrates the difficulty of identifying the proper alternative when the spatial autoregressive parameter is large.

Lag tests and test against WX

A second comparison is between the LM-Lag test and its robust form and the LM test on WX and its robust form (robust to the presence of a spatial lag term).

[ ]:
testnames = ["LM-Lag","RLM-Lag","LMWX","RLMWX"]
powresult.plot(x="rho",y=testnames,
               title=" Lag Alternative - Power Functions")
plt.show()

The power of the LM-WX test tracks that of LM-Lag and Robust LM-Lag very closely, with only slightly less power for the smallest values of \(\rho\). However, just as for the Lag tests, it reaches 100% rejection for \(\rho = 0.2\). In contrast, the robust form of the LM-WX test has much less power and only reaches the 100% rejection rate for \(\rho = 0.45\). This illustrates the effectiveness of the correction for small values of \(\rho\), i.e., for local alternatives.

Lag tests and higher order tests

A final comparison is between the Lag tests and the higher order diagnostics, i.e., LM-SAR-Error and LM-SDM.

[ ]:
testnames = ["LM-Lag","RLM-Lag","LMSARER","LMSDM"]
powresult.plot(x="rho",y=testnames,
               title=" Lag Alternative - Power Functions")
plt.show()

The results clearly illustrate the unfortunate property of the higher order tests to have strong power against the single parameter Lag alternative. Both LM-SARER and LM-SDM reach 100% rejection for \(\rho = 0.2\) and they have only slightly less power than the Lag tests for smaller values of \(\rho\).

Power Functions - Error Alternative

The same approach can be taken to assess the relative power of the spatial diagnostics against an error alternative. The code is essentially the same as before, except that lamvals is used for the spatial parameter and dgp_sperror is used for the data generating process.

A more pythonic solution would be to put all these operations in a function and pass the dgp as a function object to that function, but the current structure of the dgp module makes that difficult to generalize.
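
A minimal sketch of what such a generalization could look like is given below; the helper name power_curve and the lambda adapters are purely illustrative and not part of spreg, and the differing signatures of the dgp functions are handled by wrapping each one in a closure that only takes the error vector and the parameter value.

[ ]:
# illustrative sketch (not part of the original notebook): the power loop
# wrapped in a function that takes the DGP as a callable with a
# (u, parameter) interface
def power_curve(dgp_func, parvals, x, w, reps, p_value, rndseed):
    powvals = np.zeros((len(parvals), 10))
    powvals[:, 0] = parvals
    for r, par in enumerate(parvals):
        pvs = np.zeros((reps, 9))
        rng = np.random.default_rng(seed=rndseed)
        for i in range(reps):
            u = make_error(rng, w.n)
            y = dgp_func(u, par)
            reg = OLS(y, x, w, spat_diag=True, moran=True)
            pvs[i, :] = [reg.moran_res[2], reg.lm_lag[1], reg.lm_error[1],
                         reg.lm_wx[1], reg.lm_sarma[1], reg.lm_spdurbin[1],
                         reg.rlm_lag[1], reg.rlm_error[1], reg.rlm_wx[1]]
        powvals[r, 1:] = (pvs < p_value).mean(axis=0)   # rejection frequencies
    return powvals

# each DGP is adapted to the (u, parameter) interface with a lambda, e.g.
# powlag = power_curve(lambda u, r: dgp_lag(u, xb1, w, rho=r),
#                      rhovals, x1, w, reps, p_value, rndseed)
# powerr = power_curve(lambda u, l: dgp_sperror(u, xb1, w, lam=l),
#                      lamvals, x1, w, reps, p_value, rndseed)

Each such call would reproduce the corresponding power loop used in this notebook, at the cost of hiding the explicit structure that the notebook deliberately keeps in view.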

The same spatial diagnostics as before are considered. The simulation can take quite a bit longer than for the Lag case.

[ ]:
pvals = ["Moran","LM-Lag","LM-Error","LMWX","LMSARER","LMSDM",
         "RLM-Lag","RLM-Error","RLMWX"]
t0 = time.time()
powvals = np.zeros((len(lamvals),len(pvals)+1))
powvals[:,0] = lamvals
for r in range(len(lamvals)):
    best = np.zeros((reps,len(pvals)))
    rng=np.random.default_rng(seed=rndseed)
    for i in range(reps):
        u = make_error(rng,n)
        y = dgp_sperror(u,xb1,w,lam=lamvals[r])
        reg = OLS(y,x1,w,spat_diag=True,moran=True)
        testp = [reg.moran_res[2],reg.lm_lag[1],reg.lm_error[1],
                     reg.lm_wx[1],reg.lm_sarma[1],
                     reg.lm_spdurbin[1],reg.rlm_lag[1],
                     reg.rlm_error[1],reg.rlm_wx[1]]
        best[i,:] = testp
    bestp = (best < p_value) * 1    # significant
    mm = bestp.mean(axis=0)
    powvals[r,1:]= mm

t1 = time.time()
print("time in minutes: ",(t1-t0)/60.0)

powresult = pd.DataFrame(powvals,columns=["lam"]+pvals)

print("Test Power for different Lambda")
print(powresult)

Single alternative tests for Lag and Error

As in the case of the Lag DGP, the first comparison is between the LM tests, their robust versions and Moran’s I.

[ ]:
testnames = ["LM-Lag","RLM-Lag","Moran",
             "LM-Error","RLM-Error"]
powresult.plot(x="lam",y=testnames,
               title=" Error Alternative - Power Functions")
plt.show()

The left-most power curves are for Moran’s I, LM-Error and Robust LM-Error, with a slight edge for Moran’s I for the smaller values of \(\lambda\). Unlike the patterns observed for the Lag alternative, the 100% rejection rate is not reached until \(\lambda = 0.45\) for Moran’s I and LM-Error, and \(\lambda = 0.5\) for Robust LM-Error, illustrating the overall lower power of these tests against the error alternative relative to the lag alternative. Also, LM-Lag has much less power against this alternative than LM-Error did against the Lag alternative. Its curve is well to the right of that for the specific error tests and only reaches the 100% rejection rate for \(\lambda = 0.85\). The Robust LM-Lag test has negligible power against the error alternative. Even for \(\lambda = 0.9\), its rejection rate is only 34%.

Error tests and test against WX

A second comparison is between the two LM tests against error and LM-WX and its robust form (robust against lag).

[ ]:
testnames = ["LM-Error","RLM-Error","LMWX",
             "RLMWX"]
powresult.plot(x="lam",y=testnames,
               title=" Error Alternative - Power Functions")
plt.show()

As is to be expected, LM-Error and Robust LM-Error have the highest power, but, surprisingly, Robust LM-WX has much higher power than LM-WX itself. Its power curve is consistently above that of LM-WX, which contradicts the theoretical requirement that the robust form of a test should be smaller than its original value.

The LM-WX test only reaches a 60.6% rejection rate for \(\lambda = 0.9\), whereas Robust LM-WX has a 100% rejection rate for \(\lambda = 0.5\), essentially the same as Robust LM-Error, although it remains below the two LM curves for all smaller values of \(\lambda\).

This phenomenon illustrates the two out of three problem associated with the robust LM-tests, namely that they are constructed to take into account only one of potentially two types of misspecification. In this case, the correction of LM-WX is for the presence of a spatial Lag term, but the actual DGP is a spatial error model, which is no longer a local alternative, hence the strange behavior of the robust test.

Error tests and higher order tests

The final comparison is between LM-Error and its Robust form and the LM-SARER and LM-SDM tests.

[ ]:
testnames = ["LM-Error","RLM-Error","LMSARER",
             "LMSDM"]
powresult.plot(x="lam",y=testnames,
               title=" Error Alternative - Power Functions")
plt.show()

As was the case for the Lag DGP, the LM-SARER test has strong power against the one-directional error alternative. It reaches 100% rejection rate for \(\lambda = 0.5\), the same as the robust LM-Error, although for the smaller values of \(\lambda\) its power curve is always slightly below those of the one-directional LM tests.

More disturbing are the results for the two-directional test against the spatial Durbin alternative (i.e., both \(\rho\) and \(\gamma\) non-zero), which achieves almost the same power against the error DGP as the other tests. In other words, even in the total absence of an SDM model, the LM-SDM test will point to that alternative when the true DGP is an error model. At some level, this may be expected, since SDM is equivalent to an error specification under the spatial common factor coefficient constraints. However, in practice, this result may be confusing. One would expect that after estimating an SDM, the common factor test would point to an error specification, but this is not the most efficient approach.
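
To see why, recall that the spatial error model \(y = X\beta + u\) with \(u = \lambda W u + \varepsilon\) can be rewritten by premultiplying with \((I - \lambda W)\):

\[ y = \lambda W y + X\beta - \lambda W X \beta + \varepsilon , \]

which is a spatial Durbin specification with \(\rho = \lambda\) and the (nonlinear) common factor constraint \(\gamma = -\lambda \beta\).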

Power Functions - SLX Alternative

A final analysis considers the power of the various spatial diagnostics when the alternative is an SLX model. The same approach is taken as for the other two cases, but this time the loop is over the values of \(\gamma\), the coefficients of the WX variables. For the sake of simplicity, these are taken to be the same for each spatially lagged explanatory variable.

The code is again essentially the same as before, except that now the argument wxg1 must be calculated for each value of \(\gamma\) and then passed to the dgp_slx function for the data generating process. The results are again collected in a data frame for visualization.

This simulation takes slightly less time than the previous cases.

[ ]:
pvals = ["Moran","LM-Lag","LM-Error","LMWX","LMSARER",
         "LMSDM","RLM-Lag","RLM-Error","RLMWX"]
t0 = time.time()
powvals = np.zeros((len(gamvals),len(pvals)+1))
powvals[:,0] = gamvals
for r in range(len(gamvals)):
    g = gamvals[r]
    gg = [g for i in b1[0:-1]]    # create list of gamma values of the correct length
    wxg1 = make_wxg(wx1,gg)
    best = np.zeros((reps,len(pvals)))
    rng=np.random.default_rng(seed=rndseed)
    for i in range(reps):
        u = make_error(rng,n)
        y = dgp_slx(u,xb1,wxg1)
        reg = OLS(y,x1,w,spat_diag=True,moran=True)
        testp = [reg.moran_res[2],reg.lm_lag[1],reg.lm_error[1],reg.lm_wx[1],reg.lm_sarma[1],
                     reg.lm_spdurbin[1],reg.rlm_lag[1],reg.rlm_error[1],reg.rlm_wx[1]]
        best[i,:] = testp
    bestp = (best < p_value) * 1    # significant
    mm = bestp.mean(axis=0)
    powvals[r,1:]= mm

t1 = time.time()
print("time in minutes: ",(t1-t0)/60.0)

powresult = pd.DataFrame(powvals,columns=["gam"]+pvals)

print("Test Power for different Gamma")
print(powresult)

SLX tests and Moran’s I

As a first comparison, the power curves of LM-WX and its Robust version are plotted together with the power curve of Moran’s I for this alternative.

[ ]:
testnames = ["LMWX","RLMWX","Moran"]
powresult.plot(x="gam",y=testnames,
               title=" SLX Alternative - Power Functions")
plt.show()

Overall, the LM-WX test has good power, achieving a 100% rejection rate for \(\gamma = 0.2\). Not surprisingly, Moran’s I has decent power against this alternative as well, illustrating its usefulness as a misspecification test. It only starts to gain power around \(\gamma = 0.2\), but reaches a 100% rejection rate for \(\gamma = 0.45\).

On the other hand, the shape of the power curve for Robust LM-WX is bizarre. The test has essentially no power at all after correcting for the (inappropriate) presence of a spatial autoregressive term. This again illustrates how the robustness correction is not appropriate when the ignored alternative is not local.

SLX tests and Error tests

A second comparison rates the power curve for LM-WX against that for the LM-Error test and its robust form.

[ ]:
testnames = ["LMWX","RLMWX","LM-Error","RLM-Error"]
powresult.plot(x="gam",y=testnames,
               title=" SLX Alternative - Power Functions")
plt.show()

Whereas the LM-Error test has power against the SLX alternative, similar to that of Moran’s I, its robust version does not. Since it corrects for an inappropriate DGP, its behavior is non-standard and it has essentially no power against this alternative. LM-Error has slightly less power than Moran’s I (in the previous graph), but also reaches the 100% rejection rate for \(\gamma = 0.45\).

SLX tests and Lag tests

Another interesting comparison is between LM-WX and the power curves for LM-Lag and its robust form. In contrast to the error DGP, which has nothing in common with the SLX specification, the SLX DGP can be considered a truncated form of the Lag DGP. Hence one would expect the Lag tests to have some power against the SLX alternative.
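
This can be seen from the reduced form of the Lag model, \(y = \rho W y + X\beta + u\), which (for \(|\rho| < 1\) and row-standardized weights) expands as

\[ y = X\beta + \rho W X \beta + \rho^2 W^2 X \beta + \dots + (I - \rho W)^{-1} u , \]

so that the SLX specification \(y = X\beta + W X \gamma + u\) amounts to keeping only the first-order spatially lagged term (with \(\gamma\) in the role of \(\rho \beta\)) and ignoring the higher-order lags and the spatial filtering of the error term.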

[ ]:
testnames = ["LMWX","RLMWX","LM-Lag","RLM-Lag"]
powresult.plot(x="gam",y=testnames,
               title=" SLX Alternative - Power Functions")
plt.show()

The Lag tests indeed show strong power against the SLX alternative. In fact, for small values of \(\gamma\), their power slightly exceeds that of LM-WX. In this case as well, the robust form of LM-Lag corrects for the wrong alternative. As a result, it tends to have slightly higher power than LM-Lag itself, which is not the proper behavior. All three tests achieve 100% rejection rate for \(\gamma = 0.2\). The graph for RLMWX is the same as before.

SLX tests and higher error tests

A final comparison is between the LM-WX test and the two higher order tests.

[ ]:
testnames = ["LMWX","RLMWX","LMSARER","LMSDM"]
powresult.plot(x="gam",y=testnames,
               title=" SLX Alternative - Power Functions")
plt.show()

In this case as well, the power curves of the higher order tests closely track that for LM-WX, achieving 100% rejection rate for \(\gamma = 0.2\). For the smaller values of \(\gamma\), the power of LM-SARER is slightly higher and that of LM-SDM slightly lower than that of LM-WX.

This again highlights the caution that is needed when interpreting the results of the higher order tests. Both also have excellent power against one-directional alternatives, which can easily provide misleading guidance in a specification search.

Practice

The range of comparisons can easily be expanded using different spatial layouts and associated sample sizes, as well as different DGPs, such as a moving average error process, or various higher-order DGPs.
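
For example, a larger regular lattice (and hence a larger sample size) could be set up as in the sketch below, after which the remaining cells can be re-run unchanged; the 30 x 30 dimension is just an illustration.

[ ]:
# sketch of one possible extension: a larger regular lattice and sample size
gridx = 30
gridy = 30
w = weights.lat2W(gridx, gridy, rook=False)
w.transform = 'r'
n = w.n
print("new sample size: ", n)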