This page was generated from notebooks/skater.ipynb.
Skater is a constrained spatial regionalization algorithm based on spanning tree pruning. A prespecified number of edges is cut from a connected spanning tree to group spatial units into contiguous regions.
The first step of Skater is to create a connectivity graph that captures the neighbourhood relationship between the spatial objects. The cost of each edge in the graph is inversely proportional to the similarity between the regions it joins. The neighbourhood is then structured by a minimum spanning tree (MST), a connected tree with no circuits. The next step is to partition the MST by successively removing edges that link dissimilar regions. The final result is a division of the spatial objects into connected regions with maximum internal homogeneity. More detail can be found in Assunção et al. (2006).
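The steps above can be illustrated with a toy sketch (not the spopt implementation): build a contiguity graph weighted by attribute dissimilarity, take its minimum spanning tree, and prune the heaviest edge to split the units into two contiguous regions.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

# Four units on a chain with attribute values [1, 2, 10, 11].
values = np.array([1.0, 2.0, 10.0, 11.0])
# Contiguity edges of the chain: 0-1, 1-2, 2-3, weighted by dissimilarity.
rows, cols = [0, 1, 2], [1, 2, 3]
weights = np.abs(values[rows] - values[cols])  # [1, 8, 1]
graph = csr_matrix((weights, (rows, cols)), shape=(4, 4))

# Build the MST (here the chain graph is already a tree) and prune its
# heaviest edge, which links the two most dissimilar neighbors.
mst = minimum_spanning_tree(graph).toarray()
heaviest = np.unravel_index(np.argmax(mst), mst.shape)
mst[heaviest] = 0

# The remaining components are the contiguous regions.
n_regions, labels = connected_components(csr_matrix(mst), directed=False)
print(n_regions, labels)  # → 2 [0 0 1 1]
```

Cutting the edge between units 1 and 2 separates the low-valued pair from the high-valued pair, which is exactly the kind of split Skater searches for at scale.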
import warnings
warnings.filterwarnings('ignore')
import sys
sys.path.append("../")
from spopt.region.skater import Skater
import geopandas as gpd
import libpysal
from libpysal.examples import load_example
import numpy as np
from sklearn.metrics import pairwise as skm
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [12, 8]
Airbnb Spots Clustering in Chicago
To demonstrate Skater, we utilize data on Airbnb spots in Chicago, which can be downloaded from libpysal.examples.
We can first explore the data by plotting the number of Airbnb spots in each community in the sample, using a quintile classification:
pth = libpysal.examples.get_path('airbnb_Chicago 2015.shp')
chicago = gpd.read_file(pth)
chicago.plot(column='num_spots', scheme='Quantiles', cmap='GnBu',
             edgecolor='grey', legend=True)
With Skater, we can cluster these 77 communities into 5 regions such that each region consists of at least 5 communities, while the homogeneity of the number of Airbnb spots per community within each region is maximized.
We first define the variable that will be used to measure regional homogeneity, which is the number of Airbnb spots in this case.
attrs_name = ['num_spots']
Next, we specify a number of other parameters that will serve as input to the Skater model, including the spatial weights (describing the relationships between the spatial objects), the number of regions, the minimum number of spatial objects per region, etc.
A spatial weights object describes the spatial connectivity of the spatial objects:
w = libpysal.weights.Queen.from_dataframe(chicago)
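To make the queen contiguity criterion concrete, here is a toy sketch on a small grid of hypothetical cell coordinates (libpysal derives the same relation from the polygon geometries themselves): two cells are neighbors if they share an edge or a corner.

```python
# Hypothetical (row, col) coordinates for a 2x2 grid of cells.
cells = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}

def queen_neighbors(i):
    """Cells adjacent to cell i by an edge or a corner (queen contiguity)."""
    r, c = cells[i]
    return sorted(j for j, (r2, c2) in cells.items()
                  if j != i and abs(r - r2) <= 1 and abs(c - c2) <= 1)

print({i: queen_neighbors(i) for i in cells})
# → {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
```

On a 2x2 grid every cell touches every other cell, either by an edge or by the shared center corner, so each cell has three queen neighbors.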
The number of contiguous regions that we would like to group spatial units into:
n_clusters = 5
The minimum number of spatial objects in each region:
floor = 5
trace is a bool denoting whether to store intermediate labelings as the tree gets pruned.
trace = False
We can set islands to a string describing how islands should be handled. If "ignore", the model will discover n_clusters regions, treating islands as their own regions. If "increase", it will discover n_clusters regions, treating islands as separate from n_clusters.
islands = "increase"
We can also specify some keywords as input to the spanning forest algorithm, including:
dissimilarity: a callable distance metric.
affinity: a callable affinity metric with values between 0 and 1; it will be inverted to provide a dissimilarity metric.
reduction: the reduction applied over all clusters to provide the map score.
center: the way to compute the center of each region in attribute space.
spanning_forest_kwds = dict(
    dissimilarity=skm.manhattan_distances,
    affinity=None,
    reduction=np.sum,
    center=np.mean
)
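A simplified sketch of how these keywords combine (assumed behavior, not the actual spopt internals): for one candidate region, the center is computed in attribute space, dissimilarities of each unit to that center are measured, and the reduction collapses them into a single score.

```python
import numpy as np

# num_spots for three hypothetical units in one candidate region.
region_attrs = np.array([[4.0], [6.0], [8.0]])

center = np.mean(region_attrs, axis=0)              # center=np.mean -> [6.]
dissim = np.abs(region_attrs - center).sum(axis=1)  # manhattan (L1) distances
score = np.sum(dissim)                              # reduction=np.sum

print(score)  # → 4.0
```

A lower score means a more homogeneous region; Skater evaluates candidate cuts by how much they improve this kind of score.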
The model can then be instantiated and solved:
model = Skater(chicago, w, attrs_name, n_clusters, floor, trace, islands, spanning_forest_kwds)
model.solve()
chicago['skater_new'] = model.labels_
chicago['number'] = 1
chicago.plot(column='skater_new', categorical=True, figsize=(12,8), edgecolor='w')
The model solution results in five regions, two of which have five communities, one with six, one with seven, and one with fifty-four communities.
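The per-region counts can be read off the labels array with a simple frequency count; a self-contained sketch with illustrative labels (the actual model.labels_ has 77 entries):

```python
import numpy as np

# Illustrative labels for 7 units grouped into 3 regions.
labels = np.array([0, 0, 1, 1, 1, 2, 2])

# Region sizes: how many units fall in each region.
print(np.bincount(labels))  # → [2 3 2]
```

Applying the same idea to model.labels_ yields the region sizes reported above.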