This page was generated from notebooks/4_spatial_weights.ipynb.

Spatial Weights

Luc Anselin

09/06/2024

Preliminaries

In this notebook, basic operations pertaining to spatial weights are reviewed. Two major cases are considered: reading weights files constructed by other software, such as GeoDa, and creating weights from GeoDataFrames or spatial layers using the functionality in libpysal.weights. In addition, some special operations are covered, such as creating spatial weights for regular grids and turning a PySAL weights object into a full matrix. The computation of a spatially lagged variable is illustrated as well.

A video recording is available from the GeoDa Center YouTube channel playlist Applied Spatial Regression - Notebooks, at https://www.youtube.com/watch?v=IbmTItot0q8&list=PLzREt6r1NenmhNy-FCUwiXL17Vyty5VL6&index=4.

Modules Needed

The main functionality is provided by the utilities in libpysal for spatial weights, and the functionality in geopandas for data input and output. All of these rely on numpy as a dependency.

To simplify notation, the libpysal.weights module is imported as weights, and get_path and open are imported from libpysal.examples and libpysal.io, respectively.

The warnings module filters some warnings about future changes. To avoid some arguably obnoxious new features of numpy 2.0, it is necessary to include the set_printoptions command if you are using a Python 3.12 environment with numpy 2.0 or greater.

[2]:
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
from libpysal.examples import get_path
from libpysal.io import open
import libpysal.weights as weights
np.set_printoptions(legacy="1.25")

Functions Used

  • from numpy:

    • array

    • mean

    • std

    • flatten

    • @

  • from geopandas:

    • read_file

    • astype

  • from libpysal.examples:

    • get_path

  • from libpysal.io:

    • open

  • from libpysal.weights:

    • neighbors

    • weights

    • n

    • min_neighbors, max_neighbors, mean_neighbors

    • pct_nonzero

    • asymmetry, asymmetries

    • Kernel.from_file

    • Queen.from_dataframe

    • transform

    • Queen.from_file

    • KNN.from_dataframe

    • symmetrize

    • Kernel

    • Kernel.from_shapefile

    • lat2W

    • full

    • lag_spatial

Files and Variables

This notebook uses data on socio-economic correlates of health outcomes contained in the chicagoSDOH sample shape files and associated spatial weights. It is assumed that all sample files have been installed.

  • Chi-SDOH.shp,shx,dbf,prj: socio-economic indicators of health for 2014 in 791 Chicago tracts

  • Chi-SDOH_q.gal: queen contiguity spatial weights from GeoDa

  • Chi-SDOH_k6s.gal: k-nearest neighbor weights for k=6, made symmetric in GeoDa

  • Chi-SDOH_k10tri.kwt: triangular kernel weights based on a variable bandwidth with 10 nearest neighbors from GeoDa

As before, file names and variable names are specified at the top of the notebook so that this is the only part that needs to be changed for other data sets and variables.

[9]:
infileshp = "Chi-SDOH.shp"         # input shape file
infileq = "Chi-SDOH_q.gal"         # queen contiguity from GeoDa
infileknn = "Chi-SDOH_k6s.gal"     # symmetric k-nearest neighbor weights from GeoDa
infilekwt = "Chi-SDOH_k10tri.kwt"  # triangular kernel weights for a variable knn bandwidth from GeoDa
outfileq = "test_q.gal"            # output file for queen weights computed with libpysal
outfilek = "test_k.kwt"            # output file for kernel weights computed with libpysal
y_name = ["YPLL_rate"]             # variable to compute spatial lag

Spatial Weights from a File (GeoDa)

Spatial weights are an essential part of any spatial autocorrelation analysis and spatial regression. Functionality to create and analyze spatial weights is contained in the libpysal.weights library. The full range of functions is much beyond the current scope and can be found at https://pysal.org/libpysal/api.html.

Only the essentials are covered here, sufficient to proceed with the spatial regression analysis. Also, only the original Weights class is considered. A newer alternative is provided by the Graph class, but it is not further discussed here. Full details can be found at https://pysal.org/libpysal/user-guide/graph/w_g_migration.html.

Arguably the easiest way to create spatial weights is to use the GeoDa software (https://geodacenter.github.io/download.html), which provides functionality to construct a wide range of contiguity as well as distance based weights through a graphical user interface. The weights information is stored as gal, gwt or kwt files. Importing these weights into PySAL is considered first.

Queen Contiguity Weights

Contiguity weights can be read into PySAL spatial weights objects using the read function, after opening the file with libpysal.io.open (here, just open). This is applied to the queen contiguity weights created by GeoDa, contained in the file infileq, after obtaining its path using get_path.

[ ]:
inpath = get_path(infileq)
wq = open(inpath).read()
wq

The result is a PySAL spatial weights object of the class libpysal.weights.weights.W. This object contains lists of neighbors and weights as well as many other attributes and methods.

It is useful to remember that the neighbors and weights are dictionaries that use an ID variable or simple sequence number as the key. A quick view of the relevant keys is obtained by converting them to a list and printing out the first few elements.

[ ]:
print(list(wq.neighbors.keys())[0:5])

This reveals that the keys are simple strings, starting at ‘1’ and not at 0 as in the usual Python indexing. The IDs of the neighbors for a given observation can be listed by specifying the key. For example, for observation with ID=’1’, this yields:

[ ]:
wq.neighbors['1']

When an inappropriate key is used, an error is generated (recall that the keys are the dictionary's ID values, not positional indices). For example, here 1 is entered as an integer, but it should have been a string, as above.

[ ]:
wq.neighbors[1]

The weights associated with each observation key are found using weights. For example, for observation with ID=’1’ this yields:

[ ]:
wq.weights['1']

At this point, all the weights are simply binary. Row-standardization is considered below.

A quick check on the number of observations, i.e., the number of rows in the weights matrix.

[ ]:
wq.n

Minimum, maximum, and average number of neighbors, and the percent non-zero (an indication of sparsity):

[ ]:
wq.min_neighbors,wq.max_neighbors,wq.mean_neighbors,wq.pct_nonzero

There is no explicit check for symmetry as such, but instead the lack of symmetry can be assessed by means of the asymmetry method, or the list of id pairs with asymmetric weights is obtained by means of the asymmetries attribute.

[ ]:
print(wq.asymmetry())
print(wq.asymmetries)

Since contiguity weights are symmetric by construction, the presence of an asymmetry would indicate some kind of error. This is not the case here.
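On a dense matrix, the same check amounts to comparing the matrix with its transpose; a minimal numpy sketch (independent of libpysal, using a toy matrix with one deliberately asymmetric pair):

```python
import numpy as np

# toy binary weights matrix; the (0,2)/(2,0) pair is deliberately asymmetric
W = np.array([[0, 1, 1],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# positions where w_ij != w_ji mark asymmetric pairs
asym = np.argwhere(W != W.T)
print(asym)  # each asymmetric pair shows up twice, once from each side
```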

K-Nearest Neighbors Weights

Similarly, the symmetric knn weights (k=6) created by GeoDa can be read from the file infileknn:

[ ]:
inpath = get_path(infileknn)
wk6s = open(inpath).read()
wk6s

Some characteristics:

[ ]:
wk6s.n
[ ]:
print(wk6s.min_neighbors,wk6s.max_neighbors,wk6s.mean_neighbors,wk6s.pct_nonzero)

Note how the operation to make the initially asymmetric k-nearest neighbor weights symmetric has resulted in many observations having more than 6 neighbors (max_neighbors is larger than 6). That is the price to pay to end up with symmetric weights, which is required for some of the estimation methods. We can list neighbors and weights in the usual way. As it turns out, the observation with key 1 is not adjusted, but observation with key 3 now has eight neighbors (up from the original six).

[ ]:
wk6s.neighbors['1']
[ ]:
wk6s.neighbors['3']
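In matrix terms, this symmetrizing step (here assumed to be the union of links, consistent with neighbor counts growing beyond k) is the element-wise maximum of a binary matrix and its transpose. A small numpy sketch of the idea:

```python
import numpy as np

# toy binary knn-style matrix: row i flags i's nearest neighbors (asymmetric)
W = np.array([[0, 1, 0],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

# union of links: j is a neighbor of i if either direction is present
Ws = np.maximum(W, W.T)
print(Ws.sum(axis=1))  # neighbor counts can now exceed the original k
```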

Kernel Weights

Triangular kernel weights based on a variable bandwidth with 10 nearest neighbors created by GeoDa are contained in the file infilekwt. The properties of kernel weights are considered in more detail in a later notebook.

The weights can be read in the usual fashion, by means of libpysal.io.open:

[24]:
inpath = get_path(infilekwt)
kwtri = open(inpath).read()

However, this does not give the desired result. The object is not recognized as kernel weights, but as a standard spatial weights object, revealed by checking the type.

[ ]:
print(type(kwtri))

The kernel weights can be checked with the usual weights attribute. However, the keys in this example are not strings, but simple integers. This is revealed by a quick check of the keys.

[ ]:
print(list(kwtri.neighbors.keys())[0:5])

Now, with the integer 1 as the key, the contents of the weights can be listed. Note the presence of the weights 1.0 (for the diagonal). All is fine, except that PySAL does not recognize the weights as kernel weights.

[ ]:
print(kwtri.weights[1])

The alternative, using the weights.Kernel.from_file method from libpysal has the same problem.

[ ]:
kwtri10f = weights.Kernel.from_file(inpath)
print(type(kwtri10f))
print(kwtri10f.weights[1])

In this particular case, a hack is to force the class of the weights object to be a kernel weight. This is generally not recommended, but since the object in question has all the characteristics of kernel weights, it is safe to do so.

It is accomplished by setting the attribute __class__ of the weights object to libpysal.weights.distance.Kernel.

[ ]:
kwtri10f.__class__ = weights.distance.Kernel
print(type(kwtri10f))

Creating Weights from a GeoDataFrame

Queen Contiguity Weights

In PySAL, the spatial weights construction is handled by libpysal.weights. The generic function is weights.<weights_type>.from_dataframe with as arguments the geodataframe and optionally the ids (recommended). For the Chicago data, the ID variable is OBJECTID. To make sure the latter is an integer (it is not in the original data frame), its type is changed by means of the astype method.

The same operation can also create a contiguity weights file from a shape file, using weights.<weights_type>.from_shapefile, but this is left as an exercise.

[ ]:
inpath = get_path(infileshp)
dfs = gpd.read_file(inpath)
dfs = dfs.astype({'OBJECTID':'int'})
wq1 = weights.Queen.from_dataframe(dfs,ids='OBJECTID')
wq1

A quick check on the keys reveals these are integers.

[ ]:
print(list(wq1.neighbors.keys())[0:5])

Again, some characteristics:

[ ]:
wq1.n
[ ]:
print(wq1.min_neighbors,wq1.max_neighbors,wq1.mean_neighbors,wq1.pct_nonzero)

The structure of the weights is identical to that from the file read from GeoDa. For example, the first set of neighbors and weights are:

[ ]:
print(wq1.neighbors[1])
print(wq1.weights[1])

Row-standardization

As created, the weights are simply 1.0 for binary weights. To turn the weights into row-standardized form, a transformation is needed, wq1.transform = 'r':

[ ]:
wq1.transform = 'r'
wq1.weights[1]
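Row-standardization simply divides each row of the binary matrix by its row sum, so that the weights in each row add up to one. A numpy sketch of the same transformation on a dense toy matrix:

```python
import numpy as np

# binary contiguity matrix for three mutually contiguous observations
W = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

# divide each row by its row sum (keepdims keeps a column shape for broadcasting)
Wr = W / W.sum(axis=1, keepdims=True)
print(Wr[0])  # [0.  0.5 0.5]
```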

Writing a Weights File

To write out the weights object to a GAL file, libpysal.io.open is used with the write method. The argument to the open command is the filename and mode='w' (for writing a file). The weights object itself is the argument to the write method.

Note that even though the weights are row-standardized, this information is lost in the output file.

[37]:
open(outfileq,mode='w').write(wq1)

A quick check using the weights.Queen.from_file operation on the just created weights file.

[ ]:
wq1a = weights.Queen.from_file(outfileq)
print(wq1a.n)
print(list(wq1a.neighbors.keys())[0:5])

Note how the type of the key has changed from integer above to character after reading from the outside file. This again stresses the importance of checking the keys before any further operations.

The weights are back to their original binary form, so the row-standardization is lost after writing the output file.

[ ]:
wq1a.weights['1']
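If integer keys are preferred after reading a file back in, the neighbor lists can be re-keyed by hand (libpysal's W class also provides a remap_ids method for this purpose). A plain-Python sketch on a toy neighbor dictionary:

```python
# toy neighbor dictionary with string keys, as read from a GAL file
neighbors = {'1': ['2', '3'], '2': ['1'], '3': ['1']}

# convert both the keys and the neighbor ids back to integers
neighbors_int = {int(k): [int(j) for j in v] for k, v in neighbors.items()}
print(neighbors_int)  # {1: [2, 3], 2: [1], 3: [1]}
```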

KNN Weights

The corresponding functionality for k-nearest neighbor weights is weights.KNN.from_dataframe. An important argument is k, the number of neighbors, with the default set to 2, which is typically not that useful. Again, it is useful to include OBJECTID as the ID variable. Initially the weights are in binary form. As before, they are row-standardized.

[ ]:
wk6 = weights.KNN.from_dataframe(dfs,k=6,ids='OBJECTID')
print(wk6.n)
print(list(wk6.neighbors.keys())[0:5])
wk6

To compare the just created weights to the symmetric form read into wk6s, the list of neighbors for observation 3 is informative. It consists of a subset of six from the list of eight from the above symmetric knn weights.

[ ]:
print(wk6.neighbors[3])

The k-nearest neighbor weights are intrinsically asymmetric. Rather than listing all the pairs that contain such asymmetries, the length of this list can be checked using the asymmetry method.

[ ]:
print(len(wk6.asymmetry()))

KNN weights have a built-in method to make them symmetric: symmetrize.

[ ]:
wk6s2 = wk6.symmetrize()
print(len(wk6.asymmetry()))
print(len(wk6s2.asymmetry()))

The entries are now the same as for the symmetric knn GAL file that was read in from GeoDa. For example, the neighbors of observation with key 3 are:

[ ]:
print(wk6s2.neighbors[3])

Finally, to make them row-standardized, the same transformation is used.

[ ]:
wk6s2.transform = 'r'
wk6s2.weights[3]

Kernel Weights

There are several ways to create the kernel weights that are used later in the course, for example to compute HAC standard errors in ordinary least squares regression. One is to create the weights in GeoDa and save them as a weights file with a kwt extension. However, currently, there is a bug in libpysal so that the proper class needs to be set explicitly.

The alternative is to compute the weights directly with PySAL. This can be implemented in a number of ways. One is to create the weights using the libpysal.weights.Kernel function, with a matrix of x-y coordinates passed. Another is to compute the weights directly from the information in a shape file, using libpysal.weights.Kernel.from_shapefile.

Each is considered in turn.

Kernel Weights Computation

Direct computation of kernel weights takes as input an array of coordinates. Typically these are the coordinates of the locations, but it is a perfectly general approach and can take any number of variables to compute general distances (or economic distances). In the example, the X and Y coordinates contained in the variables COORD_X and COORD_Y of the geodataframe dfs are used.

First, the respective columns from the data frame are turned into a numpy array.

The command to create the kernel weights is libpysal.weights.Kernel. It takes the array as the first argument, followed by a number of options. To have a variable bandwidth that follows the 10 nearest neighbors, fixed = False (the default is a fixed bandwidth) and k=10. The kernel function is selected as function="triangular" (this is also the default, but it is included here for clarity). Finally, the use of kernel weights in the HAC calculations requires the diagonals to be set to the value of one, achieved by means of diagonal=True.

[ ]:
coords = np.array(dfs[['COORD_X','COORD_Y']])
kwtri10 = weights.Kernel(coords,fixed=False,k=10,
                                   function="triangular",diagonal=True)
print(type(kwtri10))

The result is an object of class libpysal.weights.distance.Kernel. This contains several attributes, such as the kernel function used.

[ ]:
kwtri10.function

A check on the keys.

[ ]:
print(list(kwtri10.neighbors.keys())[0:5])

Note that the index starts at 0 and the keys are integers. The neighbors for the first observation:

[ ]:
kwtri10.neighbors[0]

The kernel weights for the first observations:

[ ]:
kwtri10.weights[0]

These are the same values as we obtained above from reading the kwt file, but now they are recognized as a proper kernel weights object.
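Conceptually, the triangular kernel weight of neighbor j for observation i is 1 - d_ij / h_i, where h_i is the adaptive bandwidth, here taken as the distance to the farthest of the k nearest neighbors. A pure-numpy sketch of this idea on toy distances (not libpysal's exact implementation, which differs in details such as bandwidth scaling):

```python
import numpy as np

# distances from observation i to itself and its nearest neighbors (toy values)
d = np.array([0.0, 1.0, 2.0, 4.0])   # first entry: distance to itself
h = d.max()                          # adaptive bandwidth: farthest neighbor

# triangular kernel: decays linearly from 1 at d=0 to 0 at d=h
w = 1.0 - d / h
print(w)  # [1.   0.75 0.5  0.  ]
```

Note that the self-distance of zero yields a weight of one, matching the diagonal=True behavior above.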

Kernel Weights from a Shape File

Contiguity weights, distance weights and kernel weights can also be constructed directly from a shape file, using the relevant from_shapefile methods. For kernel weights, this can be based on either point coordinates or on the coordinates of polygon centroids to compute the distances needed. The relevant function is libpysal.weights.Kernel.from_shapefile with as its main argument the file (path) name of the shape file involved. The other arguments are the same options as before. The shape file in infileshp is used as the input file.

[ ]:
inpath = get_path(infileshp)
kwtri10s = weights.Kernel.from_shapefile(inpath,
                                                 fixed=False,k=10,
                                   function="triangular",diagonal=True)
print(type(kwtri10s))

The result is of the proper type, contains the same structure as before, with matching function, neighbors and weights.

[ ]:
print(kwtri10s.function)
print(list(kwtri10s.neighbors.keys())[0:5])
print(kwtri10s.neighbors[0])
print(kwtri10s.weights[0])

Writing the Kernel Weights

We use the same method as for the queen weights to write the just constructed kernel weights to an outside kwt file. The output file is outfilek.

[55]:
open(outfilek,mode='w').write(kwtri10s)

Quick check:

[ ]:
kk = weights.Kernel.from_file(outfilek)
print(type(kk))

So, the same problem as mentioned above persists for weights files written by PySAL and the proper class needs to be set explicitly.

[ ]:
kk.__class__ = weights.distance.Kernel
print(type(kk))

Special Weights Operations

A few special weights operations will come in handy later on. One is to create spatial weights for a regular grid setup, which is very useful for simulation designs. The other is to turn a spatial weights object into a standard numpy array, which can be used in all kinds of matrix operations.

Weights for Regular Grids

The weights.lat2W operation creates rook contiguity spatial weights (the default, queen contiguity is available for rook = False) for a regular rectangular grid with the number of rows and the number of columns as the arguments. The result is a simple binary weights object, so row-standardization is typically needed as well.

For a square grid, with gridside=20 as the number of rows/columns, the result has dimension 400.

[ ]:
gridside = 20
wgrid = weights.lat2W(gridside,gridside,rook=True)
wgrid.n

Quick check on the neighbor keys.

[ ]:
print(list(wgrid.neighbors.keys())[0:5])

Since this is a square grid, the first observation, in the upper left corner, has only two neighbors, one to the right (1) and one below (20 - since the first row goes from 0 to 19).

[ ]:
wgrid.neighbors[0]
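The rook neighbor structure on a regular grid can also be worked out by hand from the row-major cell index; a small pure-Python sketch (independent of libpysal, with a hypothetical helper rook_neighbors):

```python
def rook_neighbors(i, nrows, ncols):
    """Rook (shared-edge) neighbors of cell i in a row-major nrows x ncols grid."""
    r, c = divmod(i, ncols)
    nbrs = []
    if r > 0:
        nbrs.append(i - ncols)  # cell above
    if r < nrows - 1:
        nbrs.append(i + ncols)  # cell below
    if c > 0:
        nbrs.append(i - 1)      # cell to the left
    if c < ncols - 1:
        nbrs.append(i + 1)      # cell to the right
    return nbrs

print(rook_neighbors(0, 20, 20))   # corner cell: [20, 1]
print(rook_neighbors(21, 20, 20))  # interior cell: [1, 41, 20, 22]
```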

Row-standardization yields the actual weights.

[ ]:
wgrid.transform = 'r'
wgrid.weights[0]

Any non-border cell has four neighbors, one to the left, right, up and down.

[ ]:
wgrid.weights[21]

Weights as Matrices

The weights.full operation turns a spatial weights object into a standard numpy array. The function returns a tuple, of which the first element is the actual matrix and the second consists of a list of keys. For actual matrix operations, the latter is not that useful.

It is important to remember to always extract the first element of the tuple as the matrix of interest. Otherwise, one quickly runs into trouble with array operations.

This is illustrated for the row-standardized queen weights wq1 created earlier.

[ ]:
wq1full, wqfkeys = weights.full(wq1)
print(type(wq1full),type(wqfkeys))
wq1full.shape

Spatially Lagged Variables

Spatially lagged variables are essential in the specification of spatial regression models. They are the product of a spatial weight matrix with a vector of observations and yield new values as (weighted) averages of the values observed at neighboring locations (with the neighbors defined by the spatial weights).
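For a row-standardized W, the spatial lag of observation i is thus the average of y over i's neighbors. A minimal numpy illustration with three mutually contiguous observations:

```python
import numpy as np

# row-standardized weights for three mutually contiguous observations
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
y = np.array([[2.0], [4.0], [6.0]])

# each entry of Wy is the average of the neighbors' values
Wy = W @ y
print(Wy.flatten())  # [5. 4. 3.]
```

The lagged values are less spread out than the original ones, a small-scale illustration of the smoothing discussed below.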

This is illustrated for the variable y_name extracted from the data frame. Its mean and standard deviation are listed using the standard numpy methods.

[ ]:
y = np.array(dfs[y_name])
print(y.shape)
print(y.mean())
print(y.std())

The new spatially lagged variable is created with the weights.lag_spatial command, passing the weights object wq1 and the vector of interest, y. It is important to make sure that the dimensions match. In particular, if the vector in question is not an actual column vector, but a one-dimensional array, the result will not be a vector, but an array. This may cause trouble in some applications.

[ ]:
wy = weights.lag_spatial(wq1,y)
print(wy.shape)
print(wy.mean())
print(wy.std())

The result is a column vector. The mean roughly corresponds to that of the original variable, but the spatially lagged variable has a smaller standard deviation. This illustrates the smoothing implied by the spatial lag operation.

To illustrate the problem with numpy arrays rather than vectors, the original vector is flattened and then the lag_spatial operation is applied to it. Everything works fine, except that the result is an array, and not a column vector.

[ ]:
yy = y.flatten()
print(yy.shape)
wyy = weights.lag_spatial(wq1,yy)
print(wyy.shape)
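This is the underlying numpy behavior, visible without any weights machinery: a matrix times a column vector gives a column vector, while a matrix times a one-dimensional array gives a one-dimensional array.

```python
import numpy as np

W = np.eye(3)            # 3 x 3 identity, standing in for a weights matrix
ycol = np.ones((3, 1))   # column vector
yarr = np.ones(3)        # one-dimensional array

print((W @ ycol).shape)  # (3, 1)
print((W @ yarr).shape)  # (3,)
```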

The same result can also be obtained using an explicit matrix-vector multiplication with the full matrix wq1full just created.

[ ]:
wy1 = wq1full @ y
print(wy1.shape)
print(wy1.mean())
print(wy1.std())

Practice

Experiment with various spatial weights for your own data set or for one of the PySAL sample data sets. Create a spatially lagged variable for each of the weights and compare their properties, such as the mean, standard deviation, correlation between the original variable and the spatial lag, etc.