This page was generated from notebooks/skater.ipynb.

Skater

Authors: Xin Feng

Skater is a constrained spatial regionalization algorithm based on spanning tree pruning. A prespecified number of edges is cut from a connected spanning tree to group spatial units into contiguous regions.

The first step of Skater is to create a connectivity graph that captures the neighbourhood relationships between the spatial objects. The cost of each edge in the graph reflects the dissimilarity between the two objects it joins, so similar neighbours are linked by cheap edges. This graph is then reduced to a minimum spanning tree (MST): a tree that connects every spatial object without any cycles while minimizing the total edge cost. The next step is to partition the MST by successively removing edges that link dissimilar regions. The final result is a division of the spatial objects into connected regions that are as internally homogeneous as possible. More detail can be found in (Assunção et al., 2006).
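The graph-to-tree step can be sketched in pure Python with Kruskal's algorithm. This is a minimal illustration under stated assumptions: the six units, their attribute values, the neighbour pairs, and the use of the absolute attribute difference as the edge cost are all hypothetical, and spopt may build its tree differently.

```python
# Hypothetical toy data: 6 spatial units with one attribute each,
# and a contiguity graph given as a list of neighbour pairs.
values = {0: 1.0, 1: 2.0, 2: 8.0, 3: 9.0, 4: 1.5, 5: 8.5}
neighbours = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (1, 4), (2, 5), (3, 5)]


def edge_cost(i, j):
    """Assumed dissimilarity between two units: absolute attribute difference."""
    return abs(values[i] - values[j])


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm: keep the cheapest edges that do not close a cycle."""
    parent = {n: n for n in nodes}
    tree = []
    for i, j in sorted(edges, key=lambda e: edge_cost(*e)):
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:  # i and j are in different components, so no cycle forms
            parent[ri] = rj
            tree.append((i, j))
    return tree


mst = minimum_spanning_tree(values.keys(), neighbours)
print(mst)  # [(0, 4), (1, 4), (2, 5), (3, 5), (1, 2)]
print(sum(edge_cost(i, j) for i, j in mst))  # 8.0
```

In this toy tree the most expensive edge, (1, 2), is the one linking the low-valued units {0, 1, 4} to the high-valued units {2, 3, 5}, which is the kind of edge the pruning step is meant to remove.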

[1]:
import warnings
warnings.filterwarnings('ignore')
[2]:
import sys
sys.path.append("../")
[3]:
from spopt.region.skater import Skater
import geopandas as gpd
import libpysal
from libpysal.examples import load_example
import numpy as np
from sklearn.metrics import pairwise as skm
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [12, 8]

Airbnb Spots Clustering in Chicago

To illustrate Skater we use data on Airbnb spots in Chicago, which can be downloaded through libpysal.examples.

We can first explore the data by plotting the number of Airbnb spots in each community in the sample, using a quintile classification:

[4]:
load_example('AirBnB')
[4]:
<libpysal.examples.base.Example at 0x1643a1f10>
[5]:
pth = libpysal.examples.get_path('airbnb_Chicago 2015.shp')
chicago = gpd.read_file(pth)
chicago.plot(column='num_spots', scheme='Quantiles', cmap='GnBu', edgecolor='grey', legend=True)
[5]:
[Figure: number of Airbnb spots per community (num_spots), quintile classification]

Regionalization

With Skater, we can cluster these 77 communities into 5 regions such that each region consists of at least 5 communities, while keeping the number of Airbnb spots per community as homogeneous as possible within each region.
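Some quick arithmetic makes this setup concrete. Assuming the Queen contiguity graph of the 77 communities is connected (no islands), its spanning tree has one edge fewer than the number of communities, and each removed edge splits one region into two:

```latex
|E_{\mathrm{MST}}| = n - 1 = 77 - 1 = 76, \qquad
\text{edges removed} = k - 1 = 5 - 1 = 4, \qquad
k \times \text{floor} = 5 \times 5 = 25 \le 77
```

The last inequality only shows that the floor is not ruled out by counting alone; it does not guarantee that a contiguous partition meeting the floor exists.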

We first define the variable that will be used to measure regional homogeneity, which is the number of Airbnb spots in this case.

[6]:
attrs_name = ['num_spots']

Next, we specify the other inputs to the Skater model: the spatial weights (which describe the relationships between the spatial objects), the number of regions, the minimum number of spatial objects in each region, and so on.

A spatial weights object describes the spatial connectivity of the spatial objects:

[7]:
w = libpysal.weights.Queen.from_dataframe(chicago)
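Queen contiguity treats two polygons as neighbours when they share an edge or a corner. On an idealized grid of square cells (a hypothetical stand-in for real community boundaries) this reduces to 8-neighbour adjacency, which the sketch below builds as a plain dictionary. The result has the same shape as the `neighbors` mapping of a libpysal `W` object (unit id to list of neighbour ids), but this is not how libpysal computes it from polygons.

```python
def queen_neighbours(n_rows, n_cols):
    """Queen contiguity for a regular grid of square cells.

    Two cells are neighbours if they share an edge or a corner.
    Cells are numbered row by row: id = row * n_cols + col.
    """
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            neighbours[cell] = [
                rr * n_cols + cc
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if (rr, cc) != (r, c) and 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return neighbours


w_grid = queen_neighbours(3, 3)
print(w_grid[4])  # centre cell touches all 8 others: [0, 1, 2, 3, 5, 6, 7, 8]
print(w_grid[0])  # corner cell has 3 neighbours: [1, 3, 4]
```

Rook contiguity would keep only edge-sharing cells, so the corner cell above would have 2 neighbours instead of 3.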

The number of contiguous regions that we would like to group spatial units into:

[8]:
n_clusters = 5

The minimum number of spatial objects in each region:

[9]:
floor = 5
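To see how n_clusters and floor interact, here is a brute-force greedy pruning sketch on a hypothetical six-unit tree. At each step it tries every remaining tree edge, discards cuts that would leave a region smaller than floor, and keeps the cut with the lowest total absolute deviation from each region's mean. That objective is an assumption chosen to echo the Manhattan distance and np.mean settings used in this notebook; it is an illustration in the spirit of Skater's pruning, not spopt's implementation.

```python
# Hypothetical toy input: a spanning tree over 6 units and one attribute.
values = [1.0, 2.0, 8.0, 9.0, 1.5, 8.5]
tree = [(0, 4), (1, 4), (2, 5), (3, 5), (1, 2)]


def components(n, edges):
    """Label each of n nodes with the index of its connected component."""
    adjacency = {i: [] for i in range(n)}
    for i, j in edges:
        adjacency[i].append(j)
        adjacency[j].append(i)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for nb in adjacency[node]:
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels


def score(labels):
    """Assumed objective: total absolute deviation from each region's mean."""
    total = 0.0
    for region in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == region]
        centre = sum(members) / len(members)
        total += sum(abs(v - centre) for v in members)
    return total


def prune(tree, n_clusters, floor):
    """Greedily cut tree edges until n_clusters regions remain."""
    edges = list(tree)
    labels = components(len(values), edges)
    while len(set(labels)) < n_clusters:
        best = None
        for edge in edges:
            candidate = components(len(values), [e for e in edges if e != edge])
            sizes = [candidate.count(k) for k in set(candidate)]
            if min(sizes) < floor:
                continue  # this cut would leave a region below the floor
            if best is None or score(candidate) < best[0]:
                best = (score(candidate), edge, candidate)
        if best is None:
            raise ValueError("no cut satisfies the floor constraint")
        edges.remove(best[1])
        labels = best[2]
    return labels


print(prune(tree, n_clusters=2, floor=2))  # [0, 0, 1, 1, 0, 1]
```

With floor=2, asking for three regions raises ValueError, because every remaining cut would isolate a single unit; lowering floor to 1 makes a third region possible.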

trace is a boolean denoting whether to store the intermediate labelings as the tree is pruned:

[10]:
trace = False

islands is a string describing what to do with islands (spatial units that have no neighbours). If “ignore”, Skater will discover n_clusters regions, treating islands as their own regions. If “increase”, it will discover n_clusters regions, treating islands as separate from n_clusters, so islands do not count toward n_clusters.

[11]:
islands = "increase"

We can also pass keyword arguments to the spanning forest algorithm, including:

dissimilarity : a callable distance metric

affinity : a callable affinity metric with values between 0 and 1. It will be inverted to provide a dissimilarity metric.

reduction : the reduction applied over all clusters to provide the map score.

center : the function used to compute the center of each region in attribute space.
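The sketch below shows one plausible way these four pieces could combine into a map score: each region's members are compared to the region centre with the dissimilarity function, the distances are reduced per region, and the per-region results are reduced again. The pure-Python `manhattan`, `sum`, and `mean_centre` stand in for skm.manhattan_distances, np.sum, and np.mean; `map_score` is a hypothetical name and this is an assumption about the composition, not spopt's score method. Lower scores mean more homogeneous regions.

```python
def manhattan(a, b):
    """L1 distance between two attribute vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_centre(rows):
    """Column-wise mean of a list of attribute vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def map_score(data, labels, dissimilarity=manhattan, reduction=sum, center=mean_centre):
    """Reduce, over all regions, the dissimilarities of members to their region centre."""
    part_scores = []
    for region in sorted(set(labels)):
        rows = [row for row, lab in zip(data, labels) if lab == region]
        c = center(rows)
        part_scores.append(reduction([dissimilarity(row, c) for row in rows]))
    return reduction(part_scores)


# Hypothetical single-attribute data for six units.
data = [[1.0], [2.0], [1.5], [8.0], [9.0], [8.5]]
print(map_score(data, [0, 0, 0, 1, 1, 1]))  # homogeneous regions -> 2.0
print(map_score(data, [0, 0, 1, 1, 1, 1]))  # mixed second region -> 11.5
```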

[12]:
spanning_forest_kwds = dict(
    dissimilarity=skm.manhattan_distances, affinity=None, reduction=np.sum, center=np.mean
)

The model can then be instantiated and solved:

[13]:
model = Skater(chicago, w, attrs_name, n_clusters=n_clusters, floor=floor,
               trace=trace, islands=islands, spanning_forest_kwds=spanning_forest_kwds)
model.solve()
[14]:
chicago['skater_new'] = model.labels_
chicago['number'] = 1
[15]:
chicago[['skater_new','number']].groupby(by='skater_new').count()
[15]:
skater_new  number
0           5
1           54
2           7
3           5
4           6
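Region sizes can also be checked against floor without the helper column. The sketch below uses collections.Counter on a label list rebuilt only from the sizes reported above (the ordering of communities is hypothetical); in the notebook itself the same check could be run on model.labels_, and pandas' `chicago['skater_new'].value_counts().sort_index()` gives the same counts as the groupby.

```python
from collections import Counter

# Labels rebuilt from the reported region sizes (5, 54, 7, 5, 6);
# which community sits in which position is hypothetical.
labels = [0] * 5 + [1] * 54 + [2] * 7 + [3] * 5 + [4] * 6
floor = 5

sizes = Counter(labels)
print(sorted(sizes.items()))                      # [(0, 5), (1, 54), (2, 7), (3, 5), (4, 6)]
print(sum(sizes.values()))                        # 77 communities in total
print(all(n >= floor for n in sizes.values()))    # True: every region meets the floor
```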
[16]:
chicago.plot(column='skater_new', categorical=True, figsize=(12,8), edgecolor='w')
[16]:
[Figure: the five Skater regions (skater_new) across Chicago's 77 communities]

The model solution results in five regions, two of which have five communities, one with six, one with seven, and one with fifty-four communities.