esda.boundary_silhouette
- esda.boundary_silhouette(data, labels, W, metric=<function euclidean_distances>, drop_islands=True)
Compute the observation-level boundary silhouette score [WKR19].
- Parameters:
- data : (N_obs, P) numpy array
An array of covariates to analyze. Each row should be one observation, and each column should be one feature.
- labels : (N_obs,) array of labels
The labels corresponding to the group each observation is assigned to.
- W : libpysal.weights.W | libpysal.graph.Graph
A spatial weights object containing the connectivity structure for the data.
- metric : callable() or array
A function that takes an argument (data) and returns the all-pairs distances/dissimilarities between observations.
- drop_islands : bool (default True)
Whether to drop islands from the adjacency list. By default, observations with no neighbors do not appear in the adjacency list. If islands are kept, they are coded as self-neighbors with zero weight. See libpysal.weights.to_adjlist().
- Returns:
- (N_obs,) array of boundary silhouette values for each observation
Notes
The boundary silhouette is the silhouette score using only spatially-proximate clusters as candidates for the next-best-fit distance function (the b(i) function in [Rou87]). This restricts the next-best-fit cluster to the set of clusters that an observation borders. So, instead of considering all clusters when finding the next-best-fit cluster, only clusters that i borders are considered. This models the fact that, in spatially-constrained clustering, observation i can only be reassigned from cluster c to cluster k if some observation j neighbors i and also resides in k.
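As a sketch of the restriction described above, the score can be written in the notation of [Rou87], where a(i) is the mean dissimilarity between i and the other members of its own cluster c. The symbols K_i (the set of clusters that i borders), \bar{d}(i, k) (the mean dissimilarity between i and the members of cluster k), and w_{ij} (the entries of W) are notation assumed here for illustration and do not come from the esda documentation.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},
\qquad
b(i) = \min_{k \in K_i} \bar{d}(i, k),
\qquad
K_i = \{\, k \neq c : \exists\, j \text{ with } w_{ij} \neq 0 \text{ and } j \in k \,\}
```

The ordinary silhouette takes the minimum in b(i) over every cluster other than c; the only change here is that the minimum runs over K_i.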
- If an observation only neighbors its own cluster, i.e. it is not on the boundary of a cluster, its boundary silhouette is zero.
- If a cluster has exactly one observation, the boundary silhouette of that observation is zero.
- If an observation is on the boundary of more than one cluster, then the best candidate is chosen from the set of clusters that the observation borders.
- metric is either a callable that maps (N, P) data to an (N, N) distance matrix, or an (N, N) distance matrix that has already been computed.
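The rules in these notes can be illustrated with a minimal pure-Python sketch. It is not the esda implementation: the function name `boundary_silhouette_sketch`, the `neighbors` dictionary standing in for W, and the fixed Euclidean distance are all assumptions made for illustration, and a(i) and b(i) follow the standard silhouette definitions from [Rou87].

```python
import math


def boundary_silhouette_sketch(data, labels, neighbors):
    """Hypothetical illustration of the boundary silhouette; not esda's code.

    data      : list of coordinate tuples, one per observation
    labels    : list of cluster labels, one per observation
    neighbors : dict mapping an observation index to its neighbor indices
                (a stand-in for the spatial weights object W)
    """
    # Group observation indices by cluster label.
    members = {}
    for idx, lab in enumerate(labels):
        members.setdefault(lab, []).append(idx)

    scores = []
    for i in range(len(data)):
        own = labels[i]
        # Clusters that i borders, excluding its own cluster.
        candidates = {labels[j] for j in neighbors.get(i, [])} - {own}

        # Interior observations and singleton clusters score zero.
        if not candidates or len(members[own]) == 1:
            scores.append(0.0)
            continue

        # a(i): mean distance to the other members of i's own cluster.
        a = sum(
            math.dist(data[i], data[j]) for j in members[own] if j != i
        ) / (len(members[own]) - 1)

        # b(i): smallest mean distance to any cluster that i borders.
        b = min(
            sum(math.dist(data[i], data[j]) for j in members[k]) / len(members[k])
            for k in candidates
        )

        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Four observations on a line, two clusters, chain-shaped connectivity.
data = [(0.0,), (1.0,), (5.0,), (6.0,)]
labels = [0, 0, 1, 1]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

scores = boundary_silhouette_sketch(data, labels, neighbors)
print(scores)
```

In this toy example only observations 1 and 2 sit on a cluster boundary, so observations 0 and 3 score zero. For observation 1, a(i) = 1 and b(i) = 4.5, which gives (4.5 - 1) / 4.5, roughly 0.78.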