spopt.region.Spenc¶
- class spopt.region.Spenc(gdf, w, attrs_name, n_clusters=5, random_state=None, gamma=1, eigen_solver=None, n_init=10, affinity='rbf', n_neighbors=10, eigen_tol=1e-09, assign_labels='discretize', degree=3, coef0=1, kernel_params=None, n_jobs=1)[source]¶
Spatially encouraged spectral clustering found in [Wol18].
Apply clustering to a projection of the normalized laplacian, using spatial information to constrain the clustering. In practice Spectral Clustering is very useful when the structure of the individual clusters is highly non-convex or more generally when a measure of the center and spread of the cluster is not a suitable description of the complete cluster. For instance when clusters are nested circles on the 2D plan.
Spatially-Encouraged Spectral Clustering (SPENC) is useful for when there may be highly non-convex clusters or clusters with irregular topology in a geographic context. If a binary weights matrix is provided during fit, this method can be used to find weighted normalized graph cuts.
When calling
fit
, an affinity matrix is constructed using either kernel function such the Gaussian (aka RBF) kernel of the euclidean distanced \(d(X, X)\):numpy.exp(-gamma * d(X,X) ** 2)
or a \(k\)-nearest neighbors connectivity matrix. Alternatively, using
precomputed
, a user-provided affinity matrix can be used. Read more in thescikit-learn
user guide on spectral clustering.- __init__(gdf, w, attrs_name, n_clusters=5, random_state=None, gamma=1, eigen_solver=None, n_init=10, affinity='rbf', n_neighbors=10, eigen_tol=1e-09, assign_labels='discretize', degree=3, coef0=1, kernel_params=None, n_jobs=1)[source]¶
- Parameters:
- gdf
geopandas.GeoDataFrame
Input data.
- w
libpywal.weights.W
Spatial weights matrix.
- attrs_name
list
Strings for attribute names from columns in
gdf
.- n_clusters
int
(default 5) The number of clusters to form.
- random_state
int
ornumpy.random.RandomState
(defaultNone
) A pseudo random number generator used for the initialization of the lobpcg eigen vectors decomposition when
eigen_solver='amg'
and by the \(k\)-Means initialization. Ifint
,random_state
is the seed used by the random number generator; Ifnumpy.random.RandomState
,random_state
is the random number generator; IfNone
, the random number generator is the numpy.random.RandomState instance used bynumpy.random
.- gamma
int
,float
(default 1) Kernel coefficient for rbf, poly, sigmoid, laplacian and chi2 kernels. Ignored for
affinity='nearest_neighbors'
.- eigen_solver
str
(defaultNone
) The eigenvalue decomposition strategy to use. Valid values include
{'arpack', 'lobpcg', 'amg'}
. AMG requirespyamg
to be installed, which may be faster on very large, sparse problems, but may also lead to instabilities. Note –eigen_solver
is ignored unless fitting using thebreakme
flag in the.fit()
method (so do not use then).- n_init
int
(default 10) The number of times the \(k\)-means algorithm will be run with different centroid seeds. The final results will be the best output of
n_init
consecutive runs in terms of inertia.- affinity
str
, array_like,callable()
(default ‘rbf’) If a
str
, valid values include{'nearest_neighbors', 'precomputed', 'rbf'}
or one of the kernels supported bysklearn.metrics.pairwise_kernels
. Only kernels that produce similarity scores (non-negative values that increase with similarity) should be used. This property is not checked by the clustering algorithm.- n_neighbors
int
(default 10) The number of neighbors to use when constructing the affinity matrix using the nearest neighbors method. Ignored for
affinity='rbf'
.- eigen_tol
float
(default 1e-7) Stopping criterion for eigen decomposition of the Laplacian matrix when using
'arpack'
as theeigen_solver
.- assign_labels
str
(default ‘discretize’) The strategy to use to assign labels in the embedding space. There are three ways to assign labels after the laplacian embedding:
{'kmeans', 'discretize', 'hierarchical'}
:'kmeans'
can be applied and is a popular choice. But it can also be sensitive to initialization.'discretize'
is another approach which is less sensitive to random initialization, and which usually finds better clusters.'hierarchical'
decomposition repeatedly bi-partitions the graph, instead of finding the decomposition all at once, as suggested in [SM00].
- degree
float
(default 3) Degree of the polynomial affinity kernel. Ignored by other kernels.
- coef0
float
(default 1) Zero coefficient for polynomial and sigmoid affinity kernels. Ignored by other kernels.
- kernel_params
dict
(defaultNone
) Parameters (keyword arguments) and values for affinity kernel passed as callable object. Ignored by other affinity kernels.
- n_jobs
int
(default 1) The number of parallel jobs to run for the nearest-neighbors affinity kernel, if used. If
-1
, then the number of jobs is set to the number of CPU cores.
- gdf
Notes
If you have an affinity matrix, such as a distance matrix, for which
0
means identical elements, and high values mean very dissimilar elements, it can be transformed in a similarity matrix that is well suited for the algorithm by applying the Gaussian (RBF, heat) kernel:numpy.exp(-dist_matrix ** 2 / (2. * delta ** 2))
Where
delta
is a free parameter representing the width of the Gaussian kernel.Another alternative is to take a symmetric version of the \(k\)-nearest neighbors connectivity matrix of the points/areas.
References
[SM00] Normalized cuts and image segmentation, 2000 Jianbo Shi, Jitendra Malik – https://doi.org/10.1109/34.868688
[vonLuxburg07] A Tutorial on Spectral Clustering, 2007 Ulrike von Luxburg – https://doi.org/10.1007/s11222-007-9033-z
[YS03] Multiclass spectral clustering, 2003 Stella X. Yu, Jianbo Shi – https://doi.org/10.1109/ICCV.2003.1238361
- Attributes:
- affinity_matrix_array_like
Affinity matrix used for clustering in the shape of
(n_samples, n_samples)
. Available only if after callingfit
.- labels_
list
Cluster labels of each point or area.
Methods
__init__
(gdf, w, attrs_name[, n_clusters, ...])- Parameters:
solve
([fit_kwargs])Solve the spenc.