This page was generated from notebooks/ward.ipynb.


Author: Xin Feng

This algorithm performs agglomerative clustering using Ward linkage under a spatial connectivity constraint. It is a “bottom-up” approach: each zone starts as its own cluster, and at each step the pair of spatially adjacent clusters whose merger least increases the linkage distance is merged. Under Ward linkage, that distance is the increase in total within-cluster variance caused by the merge. In pysal/spopt, the Ward algorithm is implemented by calling sklearn.cluster.AgglomerativeClustering with the linkage criterion set to ward.
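The bottom-up procedure can be sketched without any libraries. The following is a minimal illustration for a single attribute, not spopt's or scikit-learn's actual implementation; the names `merge_cost` and `constrained_ward`, the toy values, and the chain-shaped adjacency are all assumptions made for this example.

```python
def merge_cost(a, b):
    """Ward cost: increase in within-cluster sum of squares if a and b merge."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return len(a) * len(b) / (len(a) + len(b)) * (mean_a - mean_b) ** 2


def constrained_ward(values, neighbors, n_clusters):
    """Greedily merge spatially adjacent clusters until n_clusters remain."""
    members = {i: [i] for i in range(len(values))}       # cluster id -> zone ids
    adjacency = {i: set(neighbors[i]) for i in members}  # cluster id -> adjacent cluster ids

    def cost_of(pair):
        a = [values[z] for z in members[pair[0]]]
        b = [values[z] for z in members[pair[1]]]
        return merge_cost(a, b)

    while len(members) > n_clusters:
        # Only adjacent clusters are merge candidates (the spatial constraint).
        candidates = [(i, j) for i in adjacency for j in adjacency[i] if i < j]
        if not candidates:
            break  # no adjacent pairs left, e.g. a disconnected graph
        i, j = min(candidates, key=cost_of)
        members[i] += members.pop(j)
        adjacency[i] |= adjacency.pop(j)
        adjacency[i] -= {i, j}
        for k in adjacency:  # redirect links that pointed at the absorbed cluster
            if j in adjacency[k]:
                adjacency[k].discard(j)
                adjacency[k].add(i)

    labels = [None] * len(values)
    for label, cluster_id in enumerate(sorted(members)):
        for zone in members[cluster_id]:
            labels[zone] = label
    return labels


# Six hypothetical zones in a row; only consecutive zones are adjacent.
values = [1.0, 2.0, 1.5, 9.0, 10.5, 2.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
labels = constrained_ward(values, neighbors, n_clusters=3)
print(labels)
```

In this toy run, zones 0 to 2 end up in one cluster and zones 3 and 4 in another. Zone 5 stays alone even though its value resembles the first group, because its only neighbor is zone 4; this is the effect of the connectivity constraint.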

import sys
from spopt.region import WardSpatial
import warnings
import geopandas as gpd
import libpysal
from libpysal.examples import load_example
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [12, 8]

Airbnb Spots Clustering in Chicago

To illustrate Ward we utilize data on Airbnb spots in Chicago, which can be downloaded from libpysal.examples.

We can first explore the data by plotting the number of Airbnb spots in each community in the sample, using a quintile classification:

load_example('AirBnB')
<libpysal.examples.base.Example at 0x1606592e0>
pth = libpysal.examples.get_path('airbnb_Chicago 2015.shp')
chicago = gpd.read_file(pth)
chicago.plot(column='num_spots', scheme='Quantiles', cmap='GnBu', edgecolor='grey', legend=True)


With Ward, we can aggregate these 77 communities into 5 clusters. Each merge is chosen so that the total within-cluster variance increases as little as possible.
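To make "increases as little as possible" concrete: for one attribute, merging clusters A and B raises the total within-cluster sum of squares by |A||B| / (|A| + |B|) times the squared difference of their means. The sketch below checks that closed form against a direct before-and-after computation. The function names and the values are hypothetical and are not taken from the Chicago data.

```python
def ward_merge_cost(a, b):
    """Closed-form increase in within-cluster sum of squares for 1-D clusters."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return len(a) * len(b) / (len(a) + len(b)) * (mean_a - mean_b) ** 2


def sum_of_squares(values):
    """Sum of squared deviations from the mean of a 1-D cluster."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Two hypothetical clusters of attribute values.
a = [1.0, 3.0]
b = [10.0]

cost = ward_merge_cost(a, b)
# Direct computation: sum of squares after the merge minus the sum before it.
direct = sum_of_squares(a + b) - sum_of_squares(a) - sum_of_squares(b)
print(cost, direct)
```

Both numbers agree up to floating-point rounding, which is why a Ward step can rank candidate merges from cluster sizes and means alone.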

We first define the variable that will be used to measure the variance of clusters. The variable is the number of Airbnb spots in each community in this case.

attrs_name = ['num_spots']

Next, we specify a number of other parameters that will serve as input to the Ward model.

A spatial weights object describes the spatial connectivity of the spatial objects:

w = libpysal.weights.Queen.from_dataframe(chicago)
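Queen contiguity treats two polygons as neighbors when they share an edge or a corner. The sketch below reproduces that idea on a regular grid without libpysal, purely to show the kind of neighbor structure the weights object holds. `queen_neighbors` is a hypothetical helper for illustration, with cells numbered row by row.

```python
def queen_neighbors(n_rows, n_cols):
    """Map each grid cell id to the ids of cells sharing an edge or a corner."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            ids = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbor
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        ids.append(rr * n_cols + cc)
            neighbors[r * n_cols + c] = ids
    return neighbors


grid = queen_neighbors(3, 3)
print(grid[4])  # center cell: all eight surrounding cells
print(grid[0])  # corner cell: three neighbors
```

For the real data, the analogous mapping is available as `w.neighbors` on the libpysal weights object.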

The number of clusters that we would like to group these communities into:

n_clusters = 5

sklearn.cluster.AgglomerativeClustering also accepts some optional clustering parameters, which can be passed to WardSpatial as a dictionary. In this example we use only the default settings; you can define others as needed.

The model can then be solved:

model = WardSpatial(chicago, w, attrs_name, n_clusters)
model.solve()  # run the clustering; labels_ is only available after solving
chicago['ward_new'] = model.labels_
chicago['number'] = 1
chicago[['ward_new', 'number']].groupby(by='ward_new').count()
          number
ward_new
0              3
1              2
2              3
3             62
4              7
chicago.plot(column='ward_new', categorical=True, edgecolor='w')

The model solution results in five clusters, two of which have three communities, one with two, one with seven, and one with sixty-two communities.