This page was generated from notebooks/4_spatial_weights.ipynb.
Spatial Weights¶
Luc Anselin¶
09/06/2024¶
Preliminaries¶
In this notebook, basic operations pertaining to spatial weights are reviewed. Two major cases are considered: reading weights files constructed by other software, such as GeoDa, and creating weights from GeoDataFrames or spatial layers using the functionality in libpysal.weights. In addition, some special operations are covered, such as creating spatial weights for regular grids and turning a PySAL weights object into a full matrix. The computation of a spatially lagged variable is illustrated as well.
A video recording is available from the GeoDa Center YouTube channel playlist Applied Spatial Regression - Notebooks, at https://www.youtube.com/watch?v=IbmTItot0q8&list=PLzREt6r1NenmhNy-FCUwiXL17Vyty5VL6&index=4.
Modules Needed¶
The main functionality is provided by the utilities in libpysal for spatial weights, and the functionality in geopandas for data input and output. All of these rely on numpy as a dependency.
To simplify notation, the libpysal.weights module is imported as weights, and get_path and open are imported from libpysal.examples and libpysal.io, respectively.
The warnings module filters some warnings about future changes. To avoid some arguably obnoxious new features of numpy 2.0, it is necessary to include the set_printoptions command if you are using a Python 3.12 environment with numpy 2.0 or greater.
[2]:
import warnings
warnings.filterwarnings("ignore")     # filter warnings about future changes
import numpy as np
import os
os.environ['USE_PYGEOS'] = '0'        # have geopandas use shapely rather than pygeos
import geopandas as gpd
from libpysal.examples import get_path
from libpysal.io import open
import libpysal.weights as weights
np.set_printoptions(legacy="1.25")    # legacy printing to avoid numpy 2.0 scalar formatting
Functions Used¶
from numpy:
array
mean
std
flatten
@
from geopandas:
read_file
astype
from libpysal.examples:
get_path
from libpysal.io:
open
from libpysal.weights:
neighbors
weights
n
min_neighbors, max_neighbors, mean_neighbors
pct_nonzero
asymmetry, asymmetries
Kernel.from_file
Queen.from_dataframe
transform
Queen.from_file
KNN.from_dataframe
symmetrize
Kernel
Kernel.from_shapefile
lat2W
full
lag_spatial
Files and Variables¶
This notebook uses data on socio-economic correlates of health outcomes contained in the chicagoSDOH sample shape files and associated spatial weights. It is assumed that all sample files have been installed.
Chi-SDOH.shp,shx,dbf,prj: socio-economic indicators of health for 2014 in 791 Chicago tracts
Chi-SDOH_q.gal: queen contiguity spatial weights from GeoDa
Chi-SDOH_k6s.gal: k-nearest neighbor weights for k=6, made symmetric in GeoDa
Chi-SDOH_k10tri.kwt: triangular kernel weights based on a variable bandwidth with 10 nearest neighbors from GeoDa
As before, file names and variable names are specified at the top of the notebook so that this is the only part that needs to be changed for other data sets and variables.
[9]:
infileshp = "Chi-SDOH.shp" # input shape file
infileq = "Chi-SDOH_q.gal" # queen contiguity from GeoDa
infileknn = "Chi-SDOH_k6s.gal" # symmetric k-nearest neighbor weights from GeoDa
infilekwt = "Chi-SDOH_k10tri.kwt" # triangular kernel weights for a variable knn bandwidth from GeoDa
outfileq = "test_q.gal" # output file for queen weights computed with libpysal
outfilek = "test_k.kwt" # outpuf file for kernel weights computed with libpysal
y_name = ["YPLL_rate"] # variable to compute spatial lag
Spatial Weights from a File (GeoDa)¶
Spatial weights are an essential part of any spatial autocorrelation analysis and spatial regression. Functionality to create and analyze spatial weights is contained in the libpysal.weights library. The full range of functions goes well beyond the current scope and can be found at https://pysal.org/libpysal/api.html.
Only the essentials are covered here, sufficient to proceed with the spatial regression analysis. Also, only the original Weights class is considered. A newer alternative is provided by the Graph class, but it is not further discussed here. Full details can be found at https://pysal.org/libpysal/user-guide/graph/w_g_migration.html.
Arguably the easiest way to create spatial weights is to use the GeoDa software (https://geodacenter.github.io/download.html), which provides functionality to construct a wide range of contiguity as well as distance based weights through a graphical user interface. The weights information is stored as gal, gwt or kwt files. Importing these weights into PySAL is considered first.
Queen Contiguity Weights¶
Contiguity weights can be read into PySAL spatial weights objects using the read function, after opening the file with libpysal.io.open (here, just open). This is applied to the queen contiguity weights created by GeoDa, contained in the file infileq, after obtaining its path using get_path.
[ ]:
inpath = get_path(infileq)
wq = open(inpath).read()
wq
The result is a PySAL spatial weights object of the class libpysal.weights.weights.W. This object contains lists of neighbors and weights, as well as many other attributes and methods.
It is useful to remember that neighbors and weights are dictionaries that use an ID variable or simple sequence number as the key. A quick view of the relevant keys is obtained by converting them to a list and printing out the first few elements.
[ ]:
print(list(wq.neighbors.keys())[0:5])
This reveals that the keys are simple strings, starting at ‘1’ and not at 0 as in the usual Python indexing. The IDs of the neighbors for a given observation can be listed by specifying the key. For example, for observation with ID=’1’, this yields:
[ ]:
wq.neighbors['1']
When an inappropriate key is used, an error is generated (recall that dictionary keys are not positional indices, so there are no sequence numbers). For example, here 1 is entered as an integer, but it should have been the string '1', as above.
[ ]:
wq.neighbors[1]
The weights associated with each observation key are found using weights. For example, for the observation with ID='1' this yields:
[ ]:
wq.weights['1']
At this point, all the weights are simply binary. Row-standardization is considered below.
A quick check on the number of observations, i.e., the number of rows in the weights matrix.
[ ]:
wq.n
Minimum, maximum and average number of neighbors, and the percentage of non-zero weights (an indication of sparsity):
[ ]:
wq.min_neighbors,wq.max_neighbors,wq.mean_neighbors,wq.pct_nonzero
There is no explicit check for symmetry as such, but the lack of symmetry can be assessed by means of the asymmetry method, or the list of id pairs with asymmetric weights can be obtained by means of the asymmetries attribute.
[ ]:
print(wq.asymmetry())
print(wq.asymmetries)
Since contiguity weights are symmetric by construction, the presence of an asymmetry would indicate some kind of error. This is not the case here.
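As an optional check, the full distribution of neighbor cardinalities can be inspected by means of the histogram attribute of the weights object, which contains a list of (number of neighbors, count) pairs.
[ ]:
# optional check: distribution of the number of neighbors as (cardinality, count) pairs
print(wq.histogram)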
K-Nearest Neighbors Weights¶
Similarly, the symmetric knn weights (k=6) created by GeoDa can be read from the file infileknn:
[ ]:
inpath = get_path(infileknn)
wk6s = open(inpath).read()
wk6s
Some characteristics:
[ ]:
wk6s.n
[ ]:
print(wk6s.min_neighbors,wk6s.max_neighbors,wk6s.mean_neighbors,wk6s.pct_nonzero)
Note how the operation to make the initially asymmetric k-nearest neighbor weights symmetric has resulted in many observations having more than 6 neighbors (max_neighbors is larger than 6). That is the price to pay to end up with symmetric weights, which is required for some of the estimation methods. We can list neighbors and weights in the usual way. As it turns out, the observation with key 1 is not adjusted, but the observation with key 3 now has eight neighbors (up from the original six).
[ ]:
wk6s.neighbors['1']
[ ]:
wk6s.neighbors['3']
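As a quick sanity check, the asymmetry method introduced above should return an empty list for these symmetrized weights.
[ ]:
# a symmetric weights object should yield no asymmetric pairs
print(len(wk6s.asymmetry()))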
Kernel Weights¶
Triangular kernel weights based on a variable bandwidth with 10 nearest neighbors, created by GeoDa, are contained in the file infilekwt. The properties of kernel weights are considered in more detail in a later notebook.
The weights can be read in the usual fashion, by means of libpysal.io.open:
[24]:
inpath = get_path(infilekwt)
kwtri = open(inpath).read()
However, this does not give the desired result. The object is not recognized as kernel weights, but as a standard spatial weights object, as revealed by checking the type.
[ ]:
print(type(kwtri))
The kernel weights can be checked with the usual weights attribute. However, the values for the keys in this example are not characters, but simple integers. This is revealed by a quick check of the keys.
[ ]:
print(list(kwtri.neighbors.keys())[0:5])
Now, with the integer 1 as the key, the contents of the weights can be listed. Note the presence of the weights 1.0 (for the diagonal). All is fine, except that PySAL does not recognize the weights as kernel weights.
[ ]:
print(kwtri.weights[1])
The alternative, using the weights.Kernel.from_file method from libpysal, has the same problem.
[ ]:
kwtri10f = weights.Kernel.from_file(inpath)
print(type(kwtri10f))
print(kwtri10f.weights[1])
In this particular case, a hack is to force the class of the weights object to be a kernel weight. This is generally not recommended, but since the object in question has all the characteristics of kernel weights, it is safe to do so.
It is accomplished by setting the attribute __class__ of the weights object to libpysal.weights.distance.Kernel.
[ ]:
kwtri10f.__class__ = weights.distance.Kernel
print(type(kwtri10f))
Creating Weights from a GeoDataFrame¶
Queen Contiguity Weights¶
In PySAL, spatial weights construction is handled by libpysal.weights. The generic function is weights.<weights_type>.from_dataframe, which takes the geodataframe as its argument and, optionally (but recommended), the ids. For the Chicago data, the ID variable is OBJECTID. To make sure the latter is an integer (it is not in the original data frame), its type is changed by means of the astype method.
The same operation can also create a contiguity weights file from a shape file, using weights.<weights_type>.from_shapefile, but this is left as an exercise.
[ ]:
inpath = get_path(infileshp)
dfs = gpd.read_file(inpath)
dfs = dfs.astype({'OBJECTID':'int'})
wq1 = weights.Queen.from_dataframe(dfs,ids='OBJECTID')
wq1
A quick check on the keys reveals these are integers.
[ ]:
print(list(wq1.neighbors.keys())[0:5])
Again, some characteristics:
[ ]:
wq1.n
[ ]:
print(wq1.min_neighbors,wq1.max_neighbors,wq1.mean_neighbors,wq1.pct_nonzero)
The structure of the weights is identical to that of the file read from GeoDa. For example, the first set of neighbors and weights are:
[ ]:
print(wq1.neighbors[1])
print(wq1.weights[1])
Row-standardization¶
As created, the weights are simply binary, with a value of 1.0 for each neighbor. To turn them into row-standardized form, a transformation is needed, wq1.transform = 'r':
[ ]:
wq1.transform = 'r'
wq1.weights[1]
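A quick check confirms that the row-standardized weights for an observation now sum to one.
[ ]:
# row-standardized weights sum to 1 for each observation
print(sum(wq1.weights[1]))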
Writing a Weights File¶
To write out the weights object to a GAL file, libpysal.io.open is used with the write method. The argument to the open command is the filename, with mode='w' (for writing a file). The weights object itself is the argument to the write method.
Note that even though the weights are row-standardized, this information is lost in the output file.
[37]:
open(outfileq,mode='w').write(wq1)
A quick check using the weights.Queen.from_file operation on the just created weights file.
[ ]:
wq1a = weights.Queen.from_file(outfileq)
print(wq1a.n)
print(list(wq1a.neighbors.keys())[0:5])
Note how the type of the key has changed from integer above to character after reading from the outside file. This again stresses the importance of checking the keys before any further operations.
The weights are back to their original binary form, so the row-standardization is lost after writing the output file.
[ ]:
wq1a.weights['1']
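If row-standardized weights are needed after reading the file back in, the transformation simply has to be applied again. Note the string key.
[ ]:
# re-apply row-standardization after reading the GAL file
wq1a.transform = 'r'
wq1a.weights['1']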
KNN Weights¶
The corresponding functionality for k-nearest neighbor weights is weights.KNN.from_dataframe. An important argument is k, the number of neighbors, with the default set to 2, which is typically not that useful. Again, it is useful to include OBJECTID as the ID variable. Initially, the weights are in binary form. As before, they can be row-standardized.
[ ]:
wk6 = weights.KNN.from_dataframe(dfs,k=6,ids='OBJECTID')
print(wk6.n)
print(list(wk6.neighbors.keys())[0:5])
wk6
To compare the just created weights to the symmetric form read into wk6s, the list of neighbors for observation 3 is informative. It consists of a subset of six from the list of eight from the above symmetric knn weights.
[ ]:
print(wk6.neighbors[3])
The k-nearest neighbor weights are intrinsically asymmetric. Rather than listing all the pairs that contain such asymmetries, the length of this list can be checked using the asymmetry method.
[ ]:
print(len(wk6.asymmetry()))
KNN weights have a built-in method to make them symmetric: symmetrize
[ ]:
wk6s2 = wk6.symmetrize()
print(len(wk6.asymmetry()))
print(len(wk6s2.asymmetry()))
The entries are now the same as for the symmetric knn GAL file that was read in from GeoDa. For example, the neighbors of the observation with key 3 are:
[ ]:
print(wk6s2.neighbors[3])
Finally, to make them row-standardized, the same transformation is used.
[ ]:
wk6s2.transform = 'r'
wk6s2.weights[3]
Kernel Weights¶
There are several ways to create the kernel weights that are used later in the course, for example to compute HAC standard errors in ordinary least squares regression. One is to create the weights in GeoDa and save them as a weights file with a kwt extension. However, currently there is a bug in libpysal, so that the proper class needs to be set explicitly.
The alternative is to compute the weights directly with PySAL. This can be implemented in a number of ways. One is to create the weights using the libpysal.weights.Kernel function, passing a matrix of x-y coordinates. Another is to compute the weights directly from the information in a shape file, using libpysal.weights.Kernel.from_shapefile.
Each is considered in turn.
Kernel Weights Computation¶
Direct computation of kernel weights takes as input an array of coordinates. Typically these are the coordinates of the locations, but it is a perfectly general approach and can take any number of variables to compute general distances (or economic distances). In the example, the X and Y coordinates contained in the columns COORD_X and COORD_Y of the geodataframe dfs are used.
First, the respective columns from the data frame are turned into a numpy array.
The command to create the kernel weights is libpysal.weights.Kernel. It takes the array as the first argument, followed by a number of options. To obtain a variable bandwidth that follows the 10 nearest neighbors, fixed=False (the default is a fixed bandwidth) and k=10 are set. The kernel function is selected as function="triangular" (this is also the default, but it is included here for clarity). Finally, the use of kernel weights in the HAC calculations requires the diagonals to be set to the value of one, achieved by means of diagonal=True.
[ ]:
coords = np.array(dfs[['COORD_X','COORD_Y']])
kwtri10 = weights.Kernel(coords,fixed=False,k=10,
function="triangular",diagonal=True)
print(type(kwtri10))
The result is an object of class libpysal.weights.distance.Kernel. This contains several attributes, such as the kernel function used.
[ ]:
kwtri10.function
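Another attribute of interest is bandwidth, which for fixed=False contains the variable bandwidth used for each observation.
[ ]:
# variable bandwidths for the first few observations
print(kwtri10.bandwidth[0:5])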
A check on the keys.
[ ]:
print(list(kwtri10.neighbors.keys())[0:5])
Note that the index starts at 0 and the keys are integers. The neighbors for the first observation:
[ ]:
kwtri10.neighbors[0]
The kernel weights for the first observation:
[ ]:
kwtri10.weights[0]
These are the same values as we obtained above from reading the kwt file, but now they are recognized as a proper kernel weights object.
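As an optional side-by-side check, the file-based and computed kernel weights for the first observation can be printed together (sorted, since the order of the neighbors may differ). Recall that the file-based keys start at 1 and the computed keys at 0, so this assumes the observation order is the same in both sources.
[ ]:
# side-by-side comparison, assuming matching observation order in both sources
print(sorted(kwtri10f.weights[1]))   # read from the kwt file (keys start at 1)
print(sorted(kwtri10.weights[0]))    # computed from coordinates (keys start at 0)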
Kernel Weights from a Shape File¶
Contiguity weights, distance weights and kernel weights can also be constructed directly from a shape file, using the relevant from_shapefile methods. For kernel weights, this can be based on either point coordinates or the coordinates of polygon centroids to compute the distances needed. The relevant function is libpysal.weights.Kernel.from_shapefile, whose main argument is the file (path) name of the shape file involved. The other arguments are the same options as before. The shape file in infileshp is used as the input file.
[ ]:
inpath = get_path(infileshp)
kwtri10s = weights.Kernel.from_shapefile(inpath,
fixed=False,k=10,
function="triangular",diagonal=True)
print(type(kwtri10s))
The result is of the proper type, contains the same structure as before, with matching function, neighbors and weights.
[ ]:
print(kwtri10s.function)
print(list(kwtri10s.neighbors.keys())[0:5])
print(kwtri10s.neighbors[0])
print(kwtri10s.weights[0])
Writing the Kernel Weights¶
We use the same method as for the queen weights to write the just constructed kernel weights to an outside kwt file. The output file is outfilek.
[55]:
open(outfilek,mode='w').write(kwtri10s)
Quick check:
[ ]:
kk = weights.Kernel.from_file(outfilek)
print(type(kk))
So, the same problem as mentioned above persists for weights files written by PySAL, and the proper class needs to be set explicitly.
[ ]:
kk.__class__ = weights.distance.Kernel
print(type(kk))
Special Weights Operations¶
A few special weights operations will come in handy later on. One is to create spatial weights for a regular grid setup, which is very useful for simulation designs. The other is to turn a spatial weights object into a standard numpy array, which can be used in all kinds of matrix operations.
Weights for Regular Grids¶
The weights.lat2W operation creates rook contiguity spatial weights for a regular rectangular grid (rook is the default; queen contiguity is obtained with rook=False), with the number of rows and the number of columns as the arguments. The result is a simple binary weights object, so row-standardization is typically needed as well.
For a square grid, with gridside=20 as the number of rows/columns, the result has dimension 400.
[ ]:
gridside = 20
wgrid = weights.lat2W(gridside,gridside,rook=True)
wgrid.n
Quick check on the neighbor keys.
[ ]:
print(list(wgrid.neighbors.keys())[0:5])
The first observation, in the upper left corner of the grid, has only two neighbors under rook contiguity: one to the right (1) and one below (20, since the first row runs from 0 to 19).
[ ]:
wgrid.neighbors[0]
Row-standardization yields the actual weights.
[ ]:
wgrid.transform = 'r'
wgrid.weights[0]
Any non-border cell has four neighbors, one to the left, right, up and down.
[ ]:
wgrid.weights[21]
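The corresponding neighbor keys show which grid cells these are: for cell 21 in a 20 by 20 grid, the cell directly above (1), to the left (20), to the right (22) and below (41).
[ ]:
# neighbor keys for an interior grid cell
wgrid.neighbors[21]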
Weights as Matrices¶
The weights.full operation turns a spatial weights object into a standard numpy array. The function returns a tuple, of which the first element is the actual matrix and the second consists of a list of keys. For actual matrix operations, the latter is not that useful.
It is important to remember to always extract the first element of the tuple as the matrix of interest. Otherwise, one quickly runs into trouble with array operations.
This is illustrated for the row-standardized queen weights wq1 created earlier.
[ ]:
wq1full, wqfkeys = weights.full(wq1)
print(type(wq1full),type(wqfkeys))
wq1full.shape
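As an aside, the weights object also has a sparse attribute, which yields a scipy sparse matrix representation. For larger data sets, this is usually preferable to a full array.
[ ]:
# sparse matrix representation of the weights
wq1sp = wq1.sparse
print(type(wq1sp))
print(wq1sp.shape)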
Spatially Lagged Variables¶
Spatially lagged variables are essential in the specification of spatial regression models. They are the product of a spatial weight matrix with a vector of observations and yield new values as (weighted) averages of the values observed at neighboring locations (with the neighbors defined by the spatial weights).
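To make the mechanics concrete, consider a small synthetic example: three observations on a line, where the middle observation neighbors the two others. With row-standardized weights, the spatial lag of each observation is simply the average of its neighbors' values.
[ ]:
# small synthetic illustration of a spatial lag
w_toy = weights.W({0: [1], 1: [0, 2], 2: [1]})   # three observations on a line: 0 - 1 - 2
w_toy.transform = 'r'                            # row-standardize
y_toy = np.array([1.0, 2.0, 3.0])
print(weights.lag_spatial(w_toy, y_toy))         # average of each observation's neighbors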
This is illustrated for the variable y_name extracted from the data frame. Its mean and standard deviation are listed using the standard numpy methods.
[ ]:
y = np.array(dfs[y_name])
print(y.shape)
print(y.mean())
print(y.std())
The new spatially lagged variable is created with the weights.lag_spatial command, passing the weights object wq1 and the vector of interest, y. It is important to make sure that the dimensions match. In particular, if the vector in question is not an actual column vector, but a one-dimensional array, the result will not be a vector, but an array. This may cause trouble in some applications.
[ ]:
wy = weights.lag_spatial(wq1,y)
print(wy.shape)
print(wy.mean())
print(wy.std())
The result is a column vector. The mean roughly corresponds to that of the original variable, but the spatially lagged variable has a smaller standard deviation. This illustrates the smoothing implied by the spatial lag operation.
To illustrate the problem with numpy arrays rather than vectors, the original vector is flattened and then the lag_spatial operation is applied to it. Everything works fine, except that the result is an array, and not a column vector.
[ ]:
yy = y.flatten()
print(yy.shape)
wyy = weights.lag_spatial(wq1,yy)
print(wyy.shape)
The same result can also be obtained using an explicit matrix-vector multiplication with the full matrix wq1full just created.
[ ]:
wy1 = wq1full @ y
print(wy1.shape)
print(wy1.mean())
print(wy1.std())
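A final check confirms that the two approaches give identical results.
[ ]:
# lag_spatial and explicit matrix multiplication agree
print(np.allclose(wy, wy1))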
Practice¶
Experiment with various spatial weights for your own data set or for one of the PySAL sample data sets. Create a spatially lagged variable for each of the weights and compare their properties, such as the mean, standard deviation, correlation between the original variable and the spatial lag, etc.