spopt.region.Skater

class spopt.region.Skater(gdf, w, attrs_name, n_clusters=5, floor=-inf, trace=False, islands='increase', spanning_forest_kwds={})

Skater is a spatial regionalization algorithm based on spanning-tree pruning, introduced in [ANCdCF06].
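The general idea of spanning-tree pruning can be illustrated with a toy, pure-Python sketch. This is not spopt's implementation: the helper names (`minimum_spanning_tree`, `components`, `score`, `prune`) and the six-node graph are hypothetical. It assumes a single numeric attribute and scores regions by the sum of absolute deviations from the region mean, and it omits the `floor` and `islands` handling entirely.

```python
def minimum_spanning_tree(n, edges, values):
    """Kruskal's algorithm; an edge costs the attribute difference of its ends."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    tree = []
    for a, b in sorted(edges, key=lambda e: abs(values[e[0]] - values[e[1]])):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


def components(n, tree):
    """Label the connected components of a forest with 0, 1, 2, ..."""
    neighbors = {i: [] for i in range(n)}
    for a, b in tree:
        neighbors[a].append(b)
        neighbors[b].append(a)
    labels = [None] * n
    next_label = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            node = stack.pop()
            for other in neighbors[node]:
                if labels[other] is None:
                    labels[other] = next_label
                    stack.append(other)
        next_label += 1
    return labels


def score(labels, values):
    """Total absolute deviation of each observation from its region mean."""
    total = 0.0
    for region in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == region]
        center = sum(members) / len(members)
        total += sum(abs(v - center) for v in members)
    return total


def prune(n, edges, values, n_clusters):
    """Cut tree edges one at a time until n_clusters components remain."""
    tree = minimum_spanning_tree(n, edges, values)
    labels = components(n, tree)
    while len(set(labels)) < n_clusters and tree:
        # Try every possible cut and keep the one with the lowest total score.
        def score_after_cut(cut):
            remaining = [edge for edge in tree if edge != cut]
            return score(components(n, remaining), values)

        tree.remove(min(tree, key=score_after_cut))
        labels = components(n, tree)
    return labels


# Hypothetical data: six observations on a chain, plus one extra edge (0, 2).
values = [1.0, 1.2, 0.9, 5.0, 5.1, 9.0]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 2)]
labels = prune(len(values), edges, values, n_clusters=3)
print(labels)  # [0, 0, 0, 1, 1, 2]
```

Because cuts are only ever made along tree edges, every resulting region stays connected in the original neighbor graph, which is what makes the approach suitable for regionalization.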

Parameters:

gdf (geopandas.GeoDataFrame): A GeoDataFrame containing the original data. The attribute data are taken from the attrs_name columns of gdf.
w (libpysal.weights.W): A PySAL weights object created from the given data, expressing the neighbor relationships between observations. It must be symmetric and binary, for example: Queen/Rook, DistanceBand, or a symmetrized KNN.
attrs_name (list): Strings giving the attribute names (columns of gdf).
n_clusters (int, default 5): The number of clusters to form.
floor (int or float, default -numpy.inf): The floor on the size of regions.
trace (bool, default False): Flag denoting whether to store intermediate labelings as the tree gets pruned.
islands (str, default 'increase'): What to do with islands. If 'ignore', the algorithm will discover n_clusters regions, treating islands as their own regions. If 'increase', the algorithm will discover n_clusters regions, treating islands as separate from n_clusters.
spanning_forest_kwds (dict, default dict()): Keyword arguments passed to SpanningForest, including dissimilarity, affinity, reduction, and center. See the spopt.region.skater.SpanningForest docstring for details.
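The requirement that w be symmetric can be illustrated without any PySAL objects. The sketch below works on a plain neighbor dictionary, and `is_symmetric` and `symmetrize` are hypothetical helpers, not libpysal functions. A k-nearest-neighbor relation is a typical asymmetric case: unit 2 may list unit 1 as its nearest neighbor without unit 1 listing unit 2.

```python
def is_symmetric(neighbors):
    """True if every link i -> j is matched by a link j -> i."""
    return all(i in neighbors[j] for i, js in neighbors.items() for j in js)


def symmetrize(neighbors):
    """Add the missing reverse links so that the relation becomes symmetric."""
    result = {i: set(js) for i, js in neighbors.items()}
    for i, js in neighbors.items():
        for j in js:
            result[j].add(i)
    return {i: sorted(js) for i, js in result.items()}


# Hypothetical 1-nearest-neighbor relation on four units.
knn = {0: [1], 1: [0], 2: [1], 3: [2]}
print(is_symmetric(knn))  # False: 2 -> 1 has no matching 1 -> 2

sym = symmetrize(knn)
print(sym)                # {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(is_symmetric(sym))  # True
```

Storing only neighbor membership, with no distances attached, corresponds to the binary part of the requirement.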

Examples

>>> from spopt.region import Skater
>>> import geopandas
>>> import libpysal
>>> import numpy
>>> from sklearn.metrics import pairwise as skm

Read the data.

>>> pth = libpysal.examples.get_path('airbnb_Chicago 2015.shp')
>>> chicago = geopandas.read_file(pth)

Initialize the parameters.

>>> w = libpysal.weights.Queen.from_dataframe(chicago)
>>> attrs_name = ['num_spots']
>>> n_clusters = 10
>>> floor = 3
>>> trace = False
>>> islands = 'increase'
>>> spanning_forest_kwds = dict(
...     dissimilarity=skm.manhattan_distances,
...     affinity=None,
...     reduction=numpy.sum,
...     center=numpy.mean
... )

Run the skater algorithm.

>>> model = Skater(
...     chicago, w,
...     attrs_name,
...     n_clusters,
...     floor,
...     trace,
...     islands,
...     spanning_forest_kwds
... )
>>> model.solve()

Get the region IDs for unit areas.

>>> model.labels_

Show the clustering results.

>>> chicago['skater_new'] = model.labels_
>>> chicago.plot(
...     column='skater_new', categorical=True, figsize=(12,8), edgecolor='w'
... )
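After solving, it can be useful to check how many observations fall in each region. The sketch below uses a short hypothetical label list in place of model.labels_, and it assumes that the size of a region means its number of observations; `region_sizes` and `regions_below_floor` are illustrative helpers, not part of spopt.

```python
from collections import Counter


def region_sizes(labels):
    """Count the observations assigned to each region ID."""
    return Counter(int(label) for label in labels)


def regions_below_floor(labels, floor):
    """Return the IDs of regions with fewer than `floor` observations."""
    sizes = region_sizes(labels)
    return sorted(region for region, size in sizes.items() if size < floor)


# Hypothetical labels standing in for model.labels_.
toy_labels = [0, 0, 0, 1, 1, 1, 1, 2, 2]
sizes = region_sizes(toy_labels)
small = regions_below_floor(toy_labels, floor=3)
print(dict(sizes))  # {0: 3, 1: 4, 2: 2}
print(small)        # [2]
```

With a fitted model, the same helpers could be called as region_sizes(model.labels_).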

Attributes:

labels_ (numpy.array): Region IDs for observations.

__init__(gdf, w, attrs_name, n_clusters=5, floor=-inf, trace=False, islands='increase', spanning_forest_kwds={})

Methods

`__init__`(gdf, w, attrs_name[, n_clusters, ...])
`solve`()	Solve the optimization model.

solve(): Solve the optimization model.