esda.boundary_silhouette

esda.boundary_silhouette(data, labels, W, metric=<function euclidean_distances>, drop_islands=True)

Compute the observation-level boundary silhouette score [WKR19].

Parameters:

data : (N_obs, P) numpy array. An array of covariates to analyze. Each row should be one observation, and each column should be one feature.
labels : (N_obs,) array of labels. The label of the group to which each observation is assigned.
W : libpysal.weights.W | libpysal.graph.Graph. A spatial weights object containing the connectivity structure for the data.
metric : callable or array. Either a function that takes one argument (data) and returns the all-pairs distances/dissimilarities between observations, or an (N_obs, N_obs) distance matrix that has already been computed.
drop_islands : bool (default True). Whether to drop islands (observations with no neighbors) from the adjacency list. By default, islands do not appear in the adjacency list. If islands are kept (drop_islands=False), they are coded as self-neighbors with zero weight. See libpysal.weights.to_adjlist().
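As a rough illustration of the drop_islands behavior described above, the following pure-Python sketch builds an adjacency list from a neighbor mapping. The helper name `to_adjlist_sketch`, the dict-based input, and the binary weight of 1.0 are assumptions made for this example; it is not libpysal's implementation.

```python
def to_adjlist_sketch(neighbors, drop_islands=True):
    """Hypothetical helper: turn {focal: [neighbor, ...]} into (focal, neighbor, weight) rows."""
    rows = []
    for focal, neighs in neighbors.items():
        if neighs:
            # Assumed binary weights: every listed neighbor gets weight 1.0.
            rows.extend((focal, j, 1.0) for j in neighs)
        elif not drop_islands:
            # A kept island is coded as its own neighbor with zero weight.
            rows.append((focal, focal, 0.0))
    return rows


# Observation 2 has no neighbors, so it is an island.
neighbors = {0: [1], 1: [0], 2: []}
dropped = to_adjlist_sketch(neighbors)                    # island omitted
kept = to_adjlist_sketch(neighbors, drop_islands=False)   # island kept as (2, 2, 0.0)
```

With drop_islands=True the island has no row at all, so any per-observation result built from the adjacency list needs a separate rule for it.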

Returns:

(N_obs,) array of boundary silhouette values for each observation

Notes

The boundary silhouette is the silhouette score using only spatially-proximate clusters as candidates for the next-best-fit distance function (the b(i) function in [Rou87]). This restricts the next-best-fit cluster to the set of clusters that an observation neighbors. So, instead of considering all clusters when finding the next-best-fit cluster, only clusters that i borders are considered. This models the fact that, in spatially-constrained clustering, observation i can only be reassigned from cluster c to cluster k if some observation j neighbors i and also resides in k.

If an observation only neighbors its own cluster, i.e. it is not on the boundary of a cluster, its boundary silhouette is zero.

If a cluster has exactly one observation, this value is zero.

If an observation is on the boundary of more than one cluster, then the best candidate is chosen from the set of clusters that the observation borders.
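The rules above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not esda's implementation: the function name `boundary_silhouette_sketch` is hypothetical, neighbors are given as a plain dict rather than a weights object, distances are Euclidean, and the score uses the standard silhouette form (b - a) / max(a, b) with b restricted to bordering clusters.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def boundary_silhouette_sketch(data, labels, neighbors):
    """Hypothetical helper: `neighbors` maps an index to its neighbor indices."""
    # Group observation indices by cluster label.
    members = {}
    for i, lab in enumerate(labels):
        members.setdefault(lab, []).append(i)

    scores = []
    for i in range(len(data)):
        own = labels[i]
        # Clusters that i borders: labels of i's neighbors, excluding its own.
        candidates = {labels[j] for j in neighbors.get(i, ())} - {own}
        # Interior observations and singleton clusters score zero.
        if not candidates or len(members[own]) == 1:
            scores.append(0.0)
            continue
        # a(i): mean distance to the other members of i's own cluster.
        others = [j for j in members[own] if j != i]
        a = sum(dist(data[i], data[j]) for j in others) / len(others)
        # b(i): smallest mean distance to any cluster that i borders.
        b = min(
            sum(dist(data[i], data[j]) for j in members[k]) / len(members[k])
            for k in candidates
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Four points on a line, two clusters, chain connectivity 0-1-2-3.
data = [(0.0,), (1.0,), (5.0,), (6.0,)]
labels = ["a", "a", "b", "b"]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = boundary_silhouette_sketch(data, labels, neighbors)
```

In this example only observations 1 and 2 sit on the boundary between the two clusters, so only they receive nonzero scores; observations 0 and 3 neighbor only their own cluster and score zero.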

metric is either a callable mapping an (N,P) data array to an (N,N) distance matrix, or an (N,N) distance matrix that has already been computed.
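One way to picture the two accepted forms of metric is the sketch below. The helper `resolve_distances` and the `manhattan` metric are hypothetical names introduced for illustration, and nested lists stand in for arrays; how esda handles this internally is not shown in this page.

```python
def resolve_distances(data, metric):
    """Hypothetical helper: return an (N, N) distance matrix from either form of `metric`."""
    if callable(metric):
        # Callable form: compute all-pairs distances from the (N, P) data.
        return metric(data)
    # Precomputed form: check that the matrix is square and matches the data.
    n = len(data)
    if len(metric) != n or any(len(row) != n for row in metric):
        raise ValueError("precomputed distances must have shape (N, N)")
    return metric


def manhattan(data):
    """Example callable: all-pairs Manhattan (city-block) distances."""
    return [[sum(abs(x - y) for x, y in zip(p, q)) for q in data] for p in data]


data = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
from_callable = resolve_distances(data, manhattan)
from_matrix = resolve_distances(data, from_callable)  # same matrix, passed precomputed
```

Passing a precomputed matrix avoids recomputing distances when the same data is scored against several candidate labelings.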