spopt.region.Skater

class spopt.region.Skater(gdf, w, attrs_name, n_clusters=5, floor=-inf, trace=False, islands='increase', spanning_forest_kwds={})

Skater is a spatial regionalization algorithm based on spanning tree pruning introduced in [ANCdCF06].
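The idea of spanning tree pruning can be sketched on a toy graph. The block below is a simplified illustration in plain Python, not spopt's implementation: the data, the edge list, and the helper names (minimum_spanning_tree, components, within_score) are all hypothetical, and the score used here (sum of absolute deviations from the region mean) is an assumption chosen for brevity.

```python
# Hypothetical toy data: one attribute value per observation.
values = {0: 1.0, 1: 1.2, 2: 0.9, 3: 5.0, 4: 5.3, 5: 4.8}
# Hypothetical contiguity graph given as undirected edges.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]


def minimum_spanning_tree(nodes, edges, cost):
    """Kruskal's algorithm with a small union-find structure."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for a, b in sorted(edges, key=cost):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


def components(nodes, tree_edges):
    """Return the connected components of a forest as sorted lists."""
    adjacency = {n: set() for n in nodes}
    for a, b in tree_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, parts = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, part = [start], []
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            part.append(n)
            stack.extend(adjacency[n] - seen)
        parts.append(sorted(part))
    return parts


def within_score(parts):
    """Sum of absolute deviations from each part's mean (assumed score)."""
    total = 0.0
    for part in parts:
        mean = sum(values[n] for n in part) / len(part)
        total += sum(abs(values[n] - mean) for n in part)
    return total


nodes = list(values)
tree = minimum_spanning_tree(
    nodes, edges, cost=lambda e: abs(values[e[0]] - values[e[1]])
)

# Prune once: drop the tree edge whose removal yields the lowest total score.
best_cut = min(
    tree,
    key=lambda e: within_score(components(nodes, [t for t in tree if t != e])),
)
regions = components(nodes, [t for t in tree if t != best_cut])
print(best_cut, regions)
```

Repeating the pruning step on the resulting forest would produce further regions; each cut splits one connected region into two, so contiguity is preserved by construction.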

Parameters:
gdf : geopandas.GeoDataFrame

A GeoDataFrame containing the original data. The attribute data used for regionalization is taken from the attrs_name columns of gdf.

w : libpysal.weights.W

A PySAL weights object created from given data expressing the neighbor relationships between observations. It must be symmetric and binary, for example: Queen/Rook, DistanceBand, or a symmetrized KNN.
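To make the "symmetric and binary" requirement concrete, the following plain-Python sketch checks it on dictionaries shaped like the neighbors and weights attributes of a W object. The helper names is_symmetric_binary and symmetrize are hypothetical and exist only for this illustration; for real weights objects, libpysal's own tools are the better choice.

```python
def is_symmetric_binary(neighbors, weights):
    """Check neighbor and weight dicts shaped like W.neighbors and W.weights."""
    for i, nbrs in neighbors.items():
        for j, w_ij in zip(nbrs, weights[i]):
            if w_ij != 1:  # binary: every stored weight equals 1
                return False
            if i not in neighbors.get(j, []):  # symmetric: j must list i back
                return False
    return True


def symmetrize(neighbors):
    """Add missing reverse links so that i ~ j implies j ~ i."""
    out = {i: list(nbrs) for i, nbrs in neighbors.items()}
    for i, nbrs in neighbors.items():
        for j in nbrs:
            out.setdefault(j, [])
            if i not in out[j]:
                out[j].append(i)
    return out


# Hypothetical nearest-neighbor lists with k=1: observation 2 points at 1,
# but 1 points at 0, so the relation is not symmetric.
knn_neighbors = {0: [1], 1: [0], 2: [1]}
knn_weights = {0: [1.0], 1: [1.0], 2: [1.0]}

sym_neighbors = symmetrize(knn_neighbors)
sym_weights = {i: [1.0] * len(nbrs) for i, nbrs in sym_neighbors.items()}

print(is_symmetric_binary(knn_neighbors, knn_weights))
print(is_symmetric_binary(sym_neighbors, sym_weights))
```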

attrs_name : list

Names of the attribute columns of gdf to use, given as strings.

n_clusters : int (default 5)

The number of clusters to form.

floor : int, float (default -numpy.inf)

The floor (minimum) on the size of regions. The default of -numpy.inf imposes no minimum.
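As a rough illustration of a size floor, the sketch below checks whether a labeling respects it. It assumes that "size" means the number of observations in a region and that the bound is inclusive; neither detail is confirmed by this page, and satisfies_floor is a hypothetical helper.

```python
from collections import Counter


def satisfies_floor(labels, floor):
    """True if every region holds at least `floor` observations (assumed meaning)."""
    sizes = Counter(labels)
    return min(sizes.values()) >= floor


# Hypothetical labeling: region 0 has 3 members, region 1 has 4, region 2 has 2.
labels = [0, 0, 0, 1, 1, 1, 1, 2, 2]

print(satisfies_floor(labels, 2))
print(satisfies_floor(labels, 3))
print(satisfies_floor(labels, float("-inf")))  # no floor: always satisfied
```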

trace : bool (default False)

Flag denoting whether to store intermediate labelings as the tree gets pruned.

islands : str (default 'increase')

Description of what to do with islands. If 'ignore', the algorithm will discover n_clusters regions, treating islands as their own regions. If 'increase', the algorithm will discover n_clusters regions, treating islands as separate from n_clusters, so islands do not count toward n_clusters.
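Before choosing an option, it can help to see which observations are islands, meaning observations with no neighbors. The sketch below finds them in a plain dictionary shaped like the neighbors attribute of a W object; find_islands is a hypothetical helper, and a libpysal W object exposes a comparable list through its islands attribute.

```python
def find_islands(neighbors):
    """Return the sorted ids of observations that have no neighbors."""
    return sorted(i for i, nbrs in neighbors.items() if len(nbrs) == 0)


# Hypothetical neighbor lists: observations 3 and 4 are disconnected.
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: [], 4: []}

island_ids = find_islands(neighbors)
print(island_ids)
```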

spanning_forest_kwds : dict (default dict())

Keyword arguments to be passed to SpanningForest including dissimilarity, affinity, reduction, and center. See spopt.region.skater.SpanningForest for docstrings.
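One way to picture how dissimilarity, center, and reduction could fit together is the plain-Python sketch below. It assumes that a region is scored by reducing the dissimilarities between each observation and the region's center; this composition is an assumption made for illustration, affinity is left out, and the authoritative behavior is described in the SpanningForest docstrings. The names manhattan, column_mean, and region_score are hypothetical.

```python
def manhattan(row, other):
    """Sum of absolute coordinate differences between two rows."""
    return sum(abs(a - b) for a, b in zip(row, other))


def column_mean(rows):
    """Mean of each attribute column."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def region_score(rows, dissimilarity=manhattan, center=column_mean, reduction=sum):
    """Reduce the dissimilarities between each row and the region center."""
    middle = center(rows)
    return reduction(dissimilarity(row, middle) for row in rows)


# Hypothetical region with two observations and two attributes.
rows = [[1.0, 2.0], [3.0, 4.0]]

total = region_score(rows)                 # sum of distances to the mean
worst = region_score(rows, reduction=max)  # largest single distance
print(total, worst)
```

Swapping the reduction changes what "a good region" means: a sum penalizes overall spread, while a maximum penalizes the single most atypical observation.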

Examples

>>> from spopt.region import Skater
>>> import geopandas
>>> import libpysal
>>> import numpy
>>> from sklearn.metrics import pairwise as skm

Read the data.

>>> pth = libpysal.examples.get_path('airbnb_Chicago 2015.shp')
>>> chicago = geopandas.read_file(pth)

Initialize the parameters.

>>> w = libpysal.weights.Queen.from_dataframe(chicago)
>>> attrs_name = ['num_spots']
>>> n_clusters = 10
>>> floor = 3
>>> trace = False
>>> islands = 'increase'
>>> spanning_forest_kwds = dict(
...     dissimilarity=skm.manhattan_distances,
...     affinity=None,
...     reduction=numpy.sum,
...     center=numpy.mean
... )

Run the skater algorithm.

>>> model = Skater(
...     chicago, w,
...     attrs_name,
...     n_clusters,
...     floor,
...     trace,
...     islands,
...     spanning_forest_kwds
... )
>>> model.solve()

Get the region IDs for unit areas.

>>> model.labels_

Show the clustering results.

>>> chicago['skater_new'] = model.labels_
>>> chicago.plot(
...     column='skater_new', categorical=True, figsize=(12,8), edgecolor='w'
... )

Attributes:
labels_ : numpy.array

Region IDs for observations.
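Since labels_ holds one region ID per observation, a common follow-up is to group observation positions by region. The sketch below does this for a hypothetical label sequence standing in for model.labels_.

```python
from collections import defaultdict

# Hypothetical labels, one per observation, standing in for model.labels_.
labels = [2, 0, 0, 1, 2, 1]

members = defaultdict(list)
for position, region in enumerate(labels):
    members[region].append(position)
members = dict(members)

print(members)
```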

__init__(gdf, w, attrs_name, n_clusters=5, floor=-inf, trace=False, islands='increase', spanning_forest_kwds={})

Methods

__init__(gdf, w, attrs_name[, n_clusters, ...])

solve()

Solve the optimization model.
