This page was generated from notebooks/ward.ipynb.
Ward
Authors: Xin Feng, James Gaboardi
This algorithm performs agglomerative clustering using Ward linkage with a spatial connectivity constraint. It is a “bottom-up” approach: each zone starts as its own cluster, and at each step the pair of clusters whose merger least increases a given linkage distance is merged. With Ward linkage, that distance is the increase in total within-cluster variance (the sum of squared deviations from each cluster mean) that the merge would cause, and the spatial constraint allows only neighboring clusters to merge. Ward in pysal/spopt (WardSpatial) wraps sklearn.cluster.AgglomerativeClustering with the linkage criterion set to ward and the spatial weights used as the connectivity constraint.
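The merge loop can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the spopt or scikit-learn implementation: zones are given as attribute tuples plus an adjacency dictionary, and spatial_ward and ward_cost are hypothetical names. The Ward merge cost for clusters A and B is n_A n_B / (n_A + n_B) times the squared distance between their means.

```python
def ward_cost(n_a, mean_a, n_b, mean_b):
    """Increase in total within-cluster sum of squares if clusters A and B merge."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))
    return n_a * n_b / (n_a + n_b) * sq_dist


def spatial_ward(values, neighbors, n_clusters):
    """Greedy bottom-up Ward clustering in which only adjacent clusters may merge.

    values:     list of attribute tuples, one per zone
    neighbors:  dict mapping each zone index to its adjacent zone indices
    n_clusters: number of clusters at which merging stops
    Returns one integer label per zone.
    """
    # each zone starts as its own cluster: id -> (member zones, size, mean vector)
    clusters = {i: ([i], 1, tuple(v)) for i, v in enumerate(values)}
    # cluster adjacency starts out as zone adjacency
    adj = {i: set(neighbors[i]) for i in clusters}
    while len(clusters) > n_clusters:
        # spatial constraint: only pairs that touch are merge candidates
        candidates = [
            (ward_cost(clusters[a][1], clusters[a][2], clusters[b][1], clusters[b][2]), a, b)
            for a in adj
            for b in adj[a]
            if a < b
        ]
        if not candidates:
            break  # disconnected graph: no contiguous merge is left
        _, a, b = min(candidates)  # cheapest merge in terms of added variance
        members_a, n_a, mean_a = clusters.pop(a)
        members_b, n_b, mean_b = clusters.pop(b)
        n = n_a + n_b
        mean = tuple((n_a * x + n_b * y) / n for x, y in zip(mean_a, mean_b))
        clusters[a] = (members_a + members_b, n, mean)  # merged cluster keeps id a
        # the merged cluster touches everything either part touched
        merged = (adj.pop(a) | adj.pop(b)) - {a, b}
        for c in merged:
            adj[c].discard(b)
            adj[c].add(a)
        adj[a] = merged
    labels = [0] * len(values)
    for label, (members, _, _) in enumerate(clusters.values()):
        for zone in members:
            labels[zone] = label
    return labels


# hypothetical toy data: three zones in a row (0-1-2); zones 0 and 2 are identical
labels = spatial_ward([(0.0,), (10.0,), (0.0,)], {0: [1], 1: [0, 2], 2: [1]}, 2)
print(labels)  # zones 0 and 2 end up apart because they are not adjacent
```

Rescanning every adjacent pair at each step keeps the sketch short; an efficient implementation would keep candidate merges in a priority queue instead.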
[1]:
%config InlineBackend.figure_format = "retina"
%load_ext watermark
%watermark
Last updated: 2023-12-10T14:13:53.699494-05:00
Python implementation: CPython
Python version : 3.12.0
IPython version : 8.18.0
Compiler : Clang 15.0.7
OS : Darwin
Release : 23.1.0
Machine : x86_64
Processor : i386
CPU cores : 8
Architecture: 64bit
[2]:
import geopandas
import libpysal
from libpysal.examples import load_example
import spopt
from spopt.region import WardSpatial
%matplotlib inline
%watermark -w
%watermark -iv
Watermark: 2.4.3
geopandas: 0.14.1
libpysal : 4.9.2
spopt : 0.5.1.dev53+g5cadae7
Airbnb Spots Clustering in Chicago
To illustrate Ward, we use data on Airbnb spots in Chicago, which can be downloaded from libpysal.examples.
We can first explore the data by plotting the number of Airbnb spots in each community in the sample, using a quintile classification:
[3]:
load_example("AirBnB")
[3]:
<libpysal.examples.base.Example at 0x15beb81d0>
[4]:
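# path to the Chicago Airbnb shapefile bundled with the AirBnB example
pth = libpysal.examples.get_path("airbnb_Chicago 2015.shp")
chicago = geopandas.read_file(pth)
# quintile choropleth of the number of Airbnb spots per community
chicago.plot(
    figsize=(7, 14),
    column="num_spots",
    scheme="Quantiles",  # classified with mapclassify; five classes by default
    cmap="GnBu",
    edgecolor="grey",
    legend=True,
).axis("off");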
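The quintile map assigns each community to one of five classes holding roughly equal numbers of communities. A rough stand-alone sketch of that idea using only the standard library follows; it assumes linearly interpolated percentiles and class bounds that are closed on the right, and mapclassify (which geopandas uses for scheme="Quantiles") may differ in such details. The function name quintile_classes is hypothetical.

```python
import bisect
import statistics


def quintile_classes(values):
    """Assign each value to one of five quantile classes (0 = lowest fifth)."""
    # four interior cut points at the 20th, 40th, 60th and 80th percentiles,
    # linearly interpolated between sorted data points
    breaks = statistics.quantiles(values, n=5, method="inclusive")
    # a value falls in class i if it is <= breaks[i]; values above the last break go to class 4
    return breaks, [bisect.bisect_left(breaks, v) for v in values]


breaks, classes = quintile_classes(list(range(1, 11)))
print(breaks)   # [2.8, 4.6, 6.4, 8.2]
print(classes)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```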
Regionalization
With Ward, we can aggregate these 77 communities into 5 clusters. Each merge is chosen so that the total within-cluster variance grows as little as possible.
We first define the variable used to measure that variance: in this case, the number of Airbnb spots in each community.
[5]:
attrs_name = ["num_spots"]
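Why variance? Merging clusters A and B raises the total within-cluster sum of squared deviations by exactly n_A n_B (mean_A − mean_B)² / (n_A + n_B), which is the quantity Ward linkage keeps as small as possible at each step. The short check below confirms this identity on hypothetical spot counts; sse and ward_increase are illustrative helpers, not spopt functions.

```python
def sse(values):
    """Sum of squared deviations from the mean (cluster size times its variance)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def ward_increase(a, b):
    """Closed-form increase in total SSE when clusters a and b are merged."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    return n_a * n_b * (mean_a - mean_b) ** 2 / (n_a + n_b)


a = [2, 4]  # hypothetical spot counts in a two-community cluster
b = [9]     # hypothetical spot count in a neighboring single community
direct = sse(a + b) - sse(a) - sse(b)  # 26 - 2 - 0
print(direct, ward_increase(a, b))     # 24.0 24.0
```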
Next, we specify the other inputs to the Ward model.
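A spatial weights object describes which communities are connected. Here we use Queen contiguity, under which two communities are neighbors if they share at least one boundary point (an edge or a vertex):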
[6]:
w = libpysal.weights.Queen.from_dataframe(chicago, use_index=False)
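A simplified sketch of the Queen rule, next to the stricter Rook rule, assuming polygons are given as lists of exactly matching vertex coordinates. The helper contiguity is hypothetical; libpysal's Queen builder works on real geometries such as those in chicago and should be used in practice.

```python
def contiguity(polygons, rule="queen"):
    """Build a neighbor dict from polygons given as lists of (x, y) vertices.

    queen: polygons sharing at least one vertex are neighbors
    rook:  polygons sharing at least one edge (two consecutive vertices) are neighbors
    """
    def edges(poly):
        # undirected edges between consecutive vertices, closing the ring
        return {frozenset((poly[i], poly[(i + 1) % len(poly)])) for i in range(len(poly))}

    keys = [set(p) if rule == "queen" else edges(p) for p in polygons]
    neighbors = {i: [] for i in range(len(polygons))}
    for i in range(len(polygons)):
        for j in range(i + 1, len(polygons)):
            if keys[i] & keys[j]:  # any shared vertex (queen) or edge (rook)
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def square(x, y):
    """Unit square with lower-left corner at (x, y)."""
    return [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]


# hypothetical 2 x 2 grid: squares 0 and 1 on top, 2 and 3 below
grid = [square(0, 1), square(1, 1), square(0, 0), square(1, 0)]
print(contiguity(grid, "queen"))  # {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
print(contiguity(grid, "rook"))   # {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
```

Diagonal squares touch only at a corner, so they are Queen neighbors but not Rook neighbors.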
The number of clusters into which we would like to group these communities:
[7]:
n_clusters = 5
sklearn.cluster.AgglomerativeClustering also accepts optional clustering parameters, which can be passed to Ward as a dictionary (the clustering_kwds argument of WardSpatial). In this example we use only the default settings, but you can set them as needed.
The model can then be solved:
[8]:
model = WardSpatial(chicago, w, attrs_name, n_clusters)
model.solve()
[9]:
chicago["ward_new"] = model.labels_
[10]:
chicago["number"] = 1
chicago[["ward_new", "number"]].groupby(by="ward_new").count()
[10]:
ward_new | number |
---|---|
0 | 3 |
1 | 2 |
2 | 3 |
3 | 62 |
4 | 7 |
[11]:
chicago.plot(
figsize=(7, 14), column="ward_new", categorical=True, edgecolor="w"
).axis("off");
The model solution yields five clusters: two with three communities each, one with two, one with seven, and one with sixty-two.
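Because only neighboring clusters may merge, each cluster should form a single connected patch of communities when the Queen graph is connected (if the graph has several components, scikit-learn may add links between them to finish the tree). The hypothetical helper below checks contiguity with a breadth-first search over a neighbor dictionary; for this notebook one could pass w.neighbors and model.labels_, assuming the weights ids are the positional indices 0 to 76 produced by use_index=False.

```python
from collections import deque


def clusters_are_contiguous(neighbors, labels):
    """Return True if every cluster forms one connected piece of the neighbor graph."""
    for label in set(labels):
        members = {i for i, lab in enumerate(labels) if lab == label}
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        # breadth-first search restricted to zones in the same cluster
        while queue:
            zone = queue.popleft()
            for nb in neighbors[zone]:
                if nb in members and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if seen != members:
            return False  # some member is unreachable from the others
    return True


# hypothetical path of four zones 0-1-2-3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(clusters_are_contiguous(path, [0, 0, 1, 1]))  # True
print(clusters_are_contiguous(path, [0, 1, 1, 0]))  # False: zones 0 and 3 are not adjacent
```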