This page was generated from notebooks/ward.ipynb.

Ward

Authors: Xin Feng, James Gaboardi

This algorithm is agglomerative clustering using Ward linkage with a spatial connectivity constraint. It is a “bottom-up” approach: each zone starts as its own cluster, and at each step the pair of clusters whose merger least increases a given linkage distance is merged. Under Ward linkage, that distance is the increase in within-cluster variance caused by the merge. Because of the connectivity constraint, only spatially adjacent clusters can be merged. The Ward algorithm in pysal/spopt calls sklearn.cluster.AgglomerativeClustering with the linkage criterion set to ward.

[1]:
%config InlineBackend.figure_format = "retina"
%load_ext watermark
%watermark
Last updated: 2023-12-10T14:13:53.699494-05:00

Python implementation: CPython
Python version       : 3.12.0
IPython version      : 8.18.0

Compiler    : Clang 15.0.7
OS          : Darwin
Release     : 23.1.0
Machine     : x86_64
Processor   : i386
CPU cores   : 8
Architecture: 64bit

[2]:
import geopandas
import libpysal
from libpysal.examples import load_example
import spopt
from spopt.region import WardSpatial

%matplotlib inline
%watermark -w
%watermark -iv
Watermark: 2.4.3

geopandas: 0.14.1
libpysal : 4.9.2
spopt    : 0.5.1.dev53+g5cadae7

Airbnb Spots Clustering in Chicago

To illustrate Ward, we use data on Airbnb spots in Chicago, which can be downloaded from libpysal.examples.

We can first explore the data by plotting the number of Airbnb spots in each community in the sample, using a quintile classification:

[3]:
load_example("AirBnB")
[3]:
<libpysal.examples.base.Example at 0x15beb81d0>
[4]:
pth = libpysal.examples.get_path("airbnb_Chicago 2015.shp")
chicago = geopandas.read_file(pth)
chicago.plot(
    figsize=(7, 14),
    column="num_spots",
    scheme="Quantiles",
    cmap="GnBu",
    edgecolor="grey",
    legend=True
).axis("off");
../_images/notebooks_ward_5_0.png

Regionalization

With Ward, we can aggregate these 77 communities into 5 clusters. At each merge, the pair of neighboring clusters whose union least increases the within-cluster variance is combined.
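As a small worked example of the quantity being minimized, the sketch below computes the within-cluster sum of squared deviations (the variance multiplied by the cluster size) before and after two candidate merges. The counts are made up for illustration and are not taken from the Chicago data; sse and merge_increase are hypothetical helper names.

```python
def sse(values):
    """Sum of squared deviations from the mean of one cluster."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def merge_increase(a, b):
    """Increase in total within-cluster SSE if clusters a and b are merged."""
    return sse(a + b) - sse(a) - sse(b)


cluster = [10, 12]   # hypothetical num_spots values of an existing cluster
similar = [11]       # a neighbor with a similar count
different = [40]     # a neighbor with a very different count

print(merge_increase(cluster, similar))    # 0.0
print(merge_increase(cluster, different))  # about 560.67
```

Merging with the similar neighbor adds nothing to the within-cluster spread, so Ward linkage would prefer that merge over the second one.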

We first define the variable that will be used to measure the variance of clusters. The variable is the number of Airbnb spots in each community in this case.

[5]:
attrs_name = ["num_spots"]

Next, we specify a number of other parameters that will serve as input to the Ward model.

A spatial weights object describes the spatial connectivity of the spatial objects:

[6]:
w = libpysal.weights.Queen.from_dataframe(chicago, use_index=False)
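For intuition about what queen contiguity encodes, the following sketch builds the neighbor lists for a hypothetical 3 x 3 lattice in plain Python: two cells are neighbors if they share an edge or a corner. It does not use libpysal, and queen_neighbors is an illustrative helper, not a library function.

```python
def queen_neighbors(n_rows, n_cols):
    """Map each cell id (row-major) to the ids of cells sharing an edge or corner."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            neighbors[cell] = [
                rr * n_cols + cc
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            ]
    return neighbors


grid = queen_neighbors(3, 3)
print(grid[4])  # center cell: [0, 1, 2, 3, 5, 6, 7, 8]
print(grid[0])  # corner cell: [1, 3, 4]
```

For irregular polygons such as the Chicago communities, the same idea applies: polygons that share at least one boundary point are treated as neighbors.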

The number of clusters that we would like to group these communities into:

[7]:
n_clusters = 5

sklearn.cluster.AgglomerativeClustering also accepts optional clustering parameters, which can be passed to the Ward function as a dictionary. This example uses only the default settings; define other parameters as needed.

The model can then be solved:

[8]:
model = WardSpatial(chicago, w, attrs_name, n_clusters)
model.solve()
[9]:
chicago["ward_new"] = model.labels_
[10]:
chicago["number"] = 1
chicago[["ward_new", "number"]].groupby(by="ward_new").count()
[10]:
          number
ward_new
0              3
1              2
2              3
3             62
4              7
[11]:
chicago.plot(
    figsize=(7, 14), column="ward_new", categorical=True, edgecolor="w"
).axis("off");
../_images/notebooks_ward_19_0.png

The model solution contains five clusters: two with three communities each, one with two, one with seven, and one with sixty-two.