This page was generated from notebooks/correlogram.ipynb. Interactive online version: Binder badge

[1]:
%load_ext watermark
%watermark -a 'eli knaap'
%load_ext autoreload
%autoreload 2

from libpysal import examples
from esda import correlogram

import geopandas as gpd
Author: eli knaap

[2]:
sac = gpd.read_file(examples.load_example("Sacramento1").get_path("sacramentot2.shp"))
[3]:
sac = sac.to_crs(sac.estimate_utm_crs())  # now in meters)
[4]:
correlogram?
Signature:
correlogram(
    geometry: geopandas.geoseries.GeoSeries,
    variable: str | list | pandas.core.series.Series | None,
    support: list | None = None,
    statistic: collections.abc.Callable | str = <class 'esda.moran.Moran'>,
    distance_type: str = 'band',
    weights_kwargs: dict = None,
    stat_kwargs: dict = None,
    select_numeric: bool = False,
    n_jobs: int = -1,
    n_bins: int | None = 50,
) -> pandas.core.frame.DataFrame
Docstring:
Generate a spatial correlogram

A spatial profile is a set of spatial autocorrelation statistics calculated for
a set of increasing distances. It is a useful exploratory tool for examining
how the relationship between spatial units changes over different notions of scale.

Parameters
----------
geometry : gpd.GeoSeries
    geodataframe holding spatial and attribute data
variable: pd.Series or list
    pandas series matching input geometries
support : list or None
    list of values at which to compute the autocorrelation statistic
statistic : callable or str
    statistic to be computed for a range of libpysal.Graph specifications.
    This should be a class with a signature like `Statistic(y,w, **kwargs)`
    where y is a  array and w is a libpysal.weights.W object
    Generally, this is a class from pysal's `esda` package
    defaults to esda.Moran, which computes the Moran's I statistic. If
    'lowess' is provided, a non-parametric correlogram is computed using
    lowess regression on the spatial-covariation model, see Notes.
distance_type : str, optional
    which concept of distance to increment. Options are {`band`, `knn`}.
    by default 'band' (for `libpysal.weights.DistanceBand` weights)
weights_kwargs : dict
    additional keyword arguments passed to the libpysal.weights.W class
stat_kwargs : dict
    additional keyword arguments passed to the `esda` autocorrelation statistic class.
    For example for faster results with no statistical inference, set the number
    of permutations to zero with stat_kwargs={permutations: 0}
select_numeric : bool
    if True, only return numeric attributes from the original class. This is useful
    e.g. to prevent lists inside a "cell" of a dataframe
n_jobs : int
    number of jobs to pass to joblib. If -1 (default), all cores will be used
n_bins : int
    number of distance bands or k-nearest neighbor values to use if
    `support` is not provided. Ignored if `support` is provided.
    by default 10. If `distance_type` is 'knn', the number of neighbors
    will be capped at n-1, where n is the number of observations. Further,
    if n-1 is not divisible by `n_bins`, the actual number of bins will be
    may be off by one bin.

Returns
-------
outputs : pandas.DataFrame
    table of autocorrelation statistics at increasing distance bandwidths

Notes
-----
The nonparametric correlogram uses a lowess regression
to estimate the spatial-covariation model:

    zi*zj = f(d_{ij}) + e_ij

where f is a smooth function of distance d_{ij} between points i and j.
This function requires the statsmodels package to be installed.

For the nonparametric correlogram, a precomputed distance matrix can
be used. To do this, set
stat_kwargs={'metric':'precomputed', 'coordinates':distance_matrix}
where `distance_matrix` is a square matrix of pairwise distances that
aligns with the `geometry` rows.
File:      ~/Dropbox/work/dev/esda/esda/correlogram.py
Type:      function
[5]:
from esda import Moran, Geary, G

Distance Bands

[6]:
# Create a liste of distances between 500 and 5000 (meters, here) in increments of 500

distances = [i+500 for i in range(0,5000, 500)]
[7]:
distances
[7]:
[500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]

The correlogram will compute an autocorrelation statistic (Moran’s \(I\) by default) at each distance threshold. Plotting this statistic against distance reveals how spatial similarity changes over distance (similar in concept to a variogram)

[ ]:
prof = correlogram(
    sac.centroid,
    sac.HH_INC,
    distances,
    Moran
)

prof is a dataframe of autocorrelation statistics indexed by distance. It includes all attributes created by the esda autocorrelation statistic class (e.g. Moran, Geary, or Geits-Ord G). The row index for each statistic is the distance at which it was computed

[9]:
prof.head()
[9]:
y w permutations n z z2ss EI VI_norm seI_norm VI_rand ... z_rand p_norm p_rand sim p_sim EI_sim seI_sim VI_sim z_sim p_z_sim
500 [52941, 51958, 32992, 54556, 50815, 60167, 490... <libpysal.weights.distance.DistanceBand object... 999 403 [0.27506115837390277, 0.21948138443207246, -0.... 1.260604e+11 -0.002488 0.497534 0.705361 0.496239 ... 0.086188 9.314058e-01 9.313166e-01 [-0.041149125017795225, 0.8047828919464444, 0.... 0.465 -0.017506 0.693436 0.480854 0.109214 4.565164e-01
1000 [52941, 51958, 32992, 54556, 50815, 60167, 490... <libpysal.weights.distance.DistanceBand object... 999 403 [0.27506115837390277, 0.21948138443207246, -0.... 1.260604e+11 -0.002488 0.014259 0.119409 0.014221 ... 4.147088 3.447621e-05 3.367309e-05 [0.01407910066509541, -0.09399265148441757, -0... 0.001 -0.007273 0.120349 0.014484 4.149132 1.668694e-05
1500 [52941, 51958, 32992, 54556, 50815, 60167, 490... <libpysal.weights.distance.DistanceBand object... 999 403 [0.27506115837390277, 0.21948138443207246, -0.... 1.260604e+11 -0.002488 0.004586 0.067719 0.004574 ... 6.763620 1.430242e-11 1.345857e-11 [-0.028776580865062164, 0.07946937157388524, 0... 0.001 -0.003352 0.067581 0.004567 6.781418 5.950106e-12
2000 [52941, 51958, 32992, 54556, 50815, 60167, 490... <libpysal.weights.distance.DistanceBand object... 999 403 [0.27506115837390277, 0.21948138443207246, -0.... 1.260604e+11 -0.002488 0.002164 0.046515 0.002158 ... 12.164298 5.846476e-34 4.815656e-34 [-0.0566559868640034, 0.06456200403436937, -0.... 0.001 -0.000508 0.047756 0.002281 11.791263 2.164994e-32
2500 [52941, 51958, 32992, 54556, 50815, 60167, 490... <libpysal.weights.distance.DistanceBand object... 999 403 [0.27506115837390277, 0.21948138443207246, -0.... 1.260604e+11 -0.002488 0.001481 0.038483 0.001477 ... 13.102771 3.974924e-39 3.174519e-39 [-0.03272028047068946, -0.025327634962963246, ... 0.001 -0.002785 0.039315 0.001546 12.816543 6.623787e-38

5 rows × 23 columns

Often, it is easiest to visualize the statistic, and the pandas plot function will plot a column against the dataframe’s index by default

[ ]:
ax = prof.I.plot()
ax.set_xlabel("distance (m)")
ax.set_ylabel("Moran's I")
<Axes: >
../_images/notebooks_correlogram_13_1.png

Autocorrelation statistics differ in concept, so the shape of the spatial correlogram statistic will vary considerably, based on which statistic is created. For example, we can also plot Geary’s \(C\)

[ ]:
prof = correlogram(
    sac.centroid,
    sac.HH_INC,
    distances,
    statistic=Geary
)
[ ]:
ax = prof.C.plot()
ax.set_xlabel("distance (m)")
ax.set_ylabel("Geary's C")
<Axes: >
../_images/notebooks_correlogram_16_1.png

…Or Getis-Ord \(G\)

[ ]:
prof = correlogram(
    sac.centroid,
    sac.HH_INC,
    distances,
    statistic=G
)
[ ]:
ax = prof.G.plot()
ax.set_xlabel("distance (m)")
ax.set_ylabel("Getis-Ord G")
<Axes: >
../_images/notebooks_correlogram_19_1.png

You can also leave the distance thresholds unspecified, and the correlogram will estimate over a reasonable range of distances:

[ ]:
prof = correlogram(
    sac.centroid,
    sac.HH_INC,
    statistic=G
)
ax = prof.G.plot()
ax.set_xlabel("distance (m)")
ax.set_ylabel("Getis-Ord G")
<Axes: >
../_images/notebooks_correlogram_21_1.png

KNN Distance

It is also possible to consider different concepts of distance. For example, rather than stepping through increments of Euclidean distance/length at each interval, we could instead step through increments of nearest-neighbors.

Instead of adding neighbors using sequential distances of 500 meters, here we will step through the 50 nearest neighbors, adding one neighbor at a time

[16]:
kdists = list(range(1,50))
[20]:
kcorr = correlogram(sac.centroid, sac.HH_INC, kdists, distance_type='knn')
[ ]:
ax = kcorr.I.plot()
ax.set_xlabel("number of nearest neighbors")
ax.set_ylabel("Moran's I")
<Axes: >
../_images/notebooks_correlogram_26_1.png

Nonparametric

Finally it is also possible to fit a non-parametric curve to estimate spatial autocorrelation as a function of distance. This is done using a LOWESS smoother, which fits a locally-weighted polynomial regression to the data based on the following regression:

\[z_iz_j = f(d_{ij}) + u_{ij}\]

where \(z_i\) and \(z_j\) are standardized values of the variable of interest at locations \(i\) and \(j\), \(d_{ij}\) is the distance between locations \(i\) and \(j\), and \(u_{ij}\) is an error term. The function \(f\) is estimated using a LOWESS smoother, which fits a locally-weighted polynomial regression to the data.

This can be interpreted as the correlation between observations separated by approximately the distance on the horizontal axis.

[26]:
nonparametric = correlogram(sac.centroid, sac.HH_INC, statistic='lowess')
ax = nonparametric.lowess.plot()
ax.set_xlabel("Distance (m)")
ax.set_ylabel("Correlation between pairs\napproximately this separation")
[26]:
Text(0, 0.5, 'Correlation between pairs\napproximately this separation')
../_images/notebooks_correlogram_29_1.png
[ ]: