spopt.region.Skater
- class spopt.region.Skater(gdf, w, attrs_name, n_clusters=5, floor=-inf, trace=False, islands='increase', spanning_forest_kwds={})
Skater is a spatial regionalization algorithm based on spanning tree pruning introduced in [ANCdCF06].
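The idea of spanning tree pruning can be illustrated with a toy sketch in plain Python. This is not the spopt implementation: the helper names (`minimum_spanning_tree`, `components`, `within_ssd`, `prune_once`), the one-dimensional data, and the sum-of-squared-deviations score are all assumptions chosen for illustration. It builds a minimum spanning tree over a contiguity graph and then removes the one edge whose removal gives the most homogeneous pair of regions.

```python
def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm; edges are (cost, i, j) tuples."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for cost, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree


def components(n, tree_edges):
    """Label each node with the index of its connected component."""
    adj = {i: [] for i in range(n)}
    for i, j in tree_edges:
        adj[i].append(j)
        adj[j].append(i)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels


def within_ssd(values, labels):
    """Sum of squared deviations from each region's mean."""
    total = 0.0
    for region in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == region]
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total


def prune_once(n, tree, values):
    """Try removing each tree edge; keep the cut with the lowest score."""
    best = None
    for edge in tree:
        remaining = [e for e in tree if e != edge]
        labels = components(n, remaining)
        score = within_ssd(values, labels)
        if best is None or score < best[0]:
            best = (score, edge, labels)
    return best


# Toy data: six units on a line; each unit neighbors the next one.
values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
contiguity = [(i, i + 1) for i in range(5)]
edges = [(abs(values[i] - values[j]), i, j) for i, j in contiguity]

tree = minimum_spanning_tree(len(values), edges)
score, cut_edge, labels = prune_once(len(values), tree, values)
print(cut_edge, labels)
```

Repeating the pruning step on the resulting subtrees would yield more regions; the sketch stops after one cut to keep the mechanics visible.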
- Parameters:
- gdf
geopandas.GeoDataFrame
A GeoDataFrame containing the original data. The data attribute is derived from gdf as the attrs_name columns.
- w
libpysal.weights.W
A PySAL weights object created from the given data, expressing the neighbor relationships between observations. It must be symmetric and binary, for example: Queen/Rook, DistanceBand, or a symmetrized KNN.
- attrs_name
list
Strings for attribute names (columns of geopandas.GeoDataFrame).
- n_clusters
int
(default 5) The number of clusters to form.
- floor
int, float
(default -numpy.inf) The floor on the size of regions.
- trace
bool
(default False) Flag denoting whether to store intermediate labelings as the tree gets pruned.
- islands
str
(default 'increase') Description of what to do with islands. If 'ignore', the algorithm will discover n_clusters regions, treating islands as their own regions. If 'increase', the algorithm will discover n_clusters regions, treating islands as separate from n_clusters.
- spanning_forest_kwds
dict
(default dict()) Keyword arguments to be passed to SpanningForest, including dissimilarity, affinity, reduction, and center. See spopt.region.skater.SpanningForest for docstrings.
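The requirement that w be symmetric and binary can be checked before fitting. The sketch below uses plain dictionaries as stand-ins for the neighbor and weight mappings of a weights object; the helper names (`is_symmetric`, `is_binary`, `symmetrize`) are hypothetical and not part of libpysal or spopt. It shows why a raw k-nearest-neighbor relation usually fails the symmetry test and how adding the reverse links repairs it.

```python
def is_symmetric(neighbors):
    """True if every link i -> j has a matching link j -> i."""
    return all(
        i in neighbors.get(j, ())
        for i, js in neighbors.items()
        for j in js
    )


def is_binary(weights):
    """True if every stored weight equals 1."""
    return all(v == 1 for ws in weights.values() for v in ws)


def symmetrize(neighbors):
    """Return a copy of the neighbor mapping with all reverse links added."""
    sym = {i: set(js) for i, js in neighbors.items()}
    for i, js in neighbors.items():
        for j in js:
            sym.setdefault(j, set()).add(i)
    return {i: sorted(js) for i, js in sym.items()}


# Hypothetical 1-nearest-neighbor relation: unit 2 points at unit 1,
# but unit 1's nearest neighbor is unit 0, so the relation is one-sided.
knn = {0: [1], 1: [0], 2: [1]}
knn_sym = symmetrize(knn)

# Binary weights: one weight of 1 per neighbor.
binary_weights = {i: [1] * len(js) for i, js in knn_sym.items()}

print(is_symmetric(knn), is_symmetric(knn_sym), is_binary(binary_weights))
```

Contiguity relations such as Queen or Rook are symmetric by construction, which is why only the KNN case needs this extra step.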
Examples
>>> from spopt.region import Skater
>>> import geopandas
>>> import libpysal
>>> import numpy
>>> from sklearn.metrics import pairwise as skm
Read the data.
>>> pth = libpysal.examples.get_path('airbnb_Chicago 2015.shp')
>>> chicago = geopandas.read_file(pth)
Initialize the parameters.
>>> w = libpysal.weights.Queen.from_dataframe(chicago)
>>> attrs_name = ['num_spots']
>>> n_clusters = 10
>>> floor = 3
>>> trace = False
>>> islands = 'increase'
>>> spanning_forest_kwds = dict(
...     dissimilarity=skm.manhattan_distances,
...     affinity=None,
...     reduction=numpy.sum,
...     center=numpy.mean
... )
Run the skater algorithm.
>>> model = Skater(
...     chicago, w,
...     attrs_name,
...     n_clusters,
...     floor,
...     trace,
...     islands,
...     spanning_forest_kwds
... )
>>> model.solve()
Get the region IDs for unit areas.
>>> model.labels_
Show the clustering results.
>>> chicago['skater_new'] = model.labels_
>>> chicago.plot(
...     column='skater_new', categorical=True, figsize=(12,8), edgecolor='w'
... )
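After solving, it can be useful to tabulate how many observations fall in each region. The sketch below is a plain-Python illustration with a hypothetical label list standing in for model.labels_; the helper names are not part of spopt, and it assumes that floor is read as a minimum number of observations per region.

```python
from collections import Counter


def region_sizes(labels):
    """Map each region ID to the number of observations assigned to it."""
    return dict(Counter(labels))


def regions_below_floor(labels, floor):
    """Region IDs with fewer than `floor` observations (assumed meaning of floor)."""
    return sorted(r for r, size in Counter(labels).items() if size < floor)


# Hypothetical labels; with a fitted model one might pass list(model.labels_).
labels = [0, 0, 0, 1, 1, 1, 1, 2, 2]

sizes = region_sizes(labels)
small = regions_below_floor(labels, floor=3)
print(sizes, small)
```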
- Attributes:
- labels_
numpy.array
Region IDs for observations.
- __init__(gdf, w, attrs_name, n_clusters=5, floor=-inf, trace=False, islands='increase', spanning_forest_kwds={})
Methods
__init__(gdf, w, attrs_name[, n_clusters, ...])
solve()
Solve the optimization model.