gwlearn.search.BandwidthSearch
class gwlearn.search.BandwidthSearch(model, *, fixed=False, kernel='bisquare', coplanar='raise', n_jobs=-1, search_method='golden_section', criterion=None, metrics=None, minimize=True, min_bandwidth=None, max_bandwidth=None, interval=None, max_iterations=100, tolerance=0.01, verbose=False, **kwargs)
Optimal bandwidth search for geographically weighted estimators.
Reports scores from multiple models with varying bandwidth and identifies the optimal one. When using golden section search, it minimizes (or maximizes) the chosen criterion.
The search supports two broad families of models:
- Linear / logistic models (GWLinearRegression, GWLogisticRegression): information criteria "aicc", "aic", "bic" are valid and recommended. They are included in metrics_ automatically.
- Non-linear models (random forest, gradient boosting, …): information criteria are not valid (no closed-form log-likelihood or hat matrix). Use "rmse" / "mae" for regression or "log_loss" combined with "prediction_rate" for classification instead.
When using classification models with a defined min_proportion, keep in mind that some locations may be excluded from the final model. In such a case, even the valid information criteria are not comparable across bandwidths and "log_loss" should be preferred.
- Parameters:
- model : type
A geographically weighted estimator class (e.g. gwlearn.linear_model.GWLogisticRegression) that can be instantiated as model(bandwidth=..., fixed=..., kernel=..., n_jobs=..., ...).
- fixed : bool, optional
True for a distance-based bandwidth and False for an adaptive (nearest neighbor) bandwidth, by default False.
- kernel : str | Callable, optional
Type of kernel function used to weight observations, by default "bisquare".
- coplanar : {"raise", "jitter", "clique"}, optional
Method for handling coplanar points with adaptive kernels. Options are "raise" (raise an exception when coplanar points are present), "jitter" (randomly displace coplanar points to make them unique), and "clique" (induce fully-connected sub-cliques for coplanar points). By default "raise".
- n_jobs : int, optional
The number of jobs to run in parallel. -1 means using all processors, by default -1.
- search_method : {"golden_section", "interval"}, optional
Method used to search for the optimal bandwidth. When using "golden_section", golden section optimization is used to find the optimal bandwidth while attempting to minimize or maximize criterion. When using "interval", all models within the specified bandwidth range are fitted at a set interval without any attempt to optimize the selection. By default "golden_section".
- criterion : str, optional
Criterion used to select the optimal bandwidth.
Built-in special values:
  - "aicc", "aic", "bic": information criteria; only valid for linear / logistic models.
  - "log_loss": cross-entropy loss; for classifiers only.
  - "prediction_rate": proportion of fitted locations; for classifiers.
  - "rmse": root mean squared error of focal residuals; for regressors.
  - "mae": mean absolute error of focal residuals; for regressors.
Any other string m is interpreted as an attribute name and retrieved from the fitted model as getattr(model, m + "_"). By default (None), "aicc" is used for linear models and "rmse" for non-linear models.
- metrics : list[str] | None, optional
Additional metrics to report for each bandwidth. Follow the same conventions as criterion. By default None.
- minimize : bool, optional
Minimize or maximize the criterion. For information criteria and error metrics the optimum is the lowest value; for "prediction_rate" or accuracy-like metrics it is the highest. By default True.
- min_bandwidth : int | float | None, optional
Minimum bandwidth to consider, by default None.
- max_bandwidth : int | float | None, optional
Maximum bandwidth to consider, by default None.
- interval : int | float | None, optional
Interval for bandwidth search when using the "interval" method, by default None.
- max_iterations : int, optional
Maximum number of iterations for golden section search, by default 100.
- tolerance : float, optional
Tolerance for convergence in golden section search, by default 1e-2.
- verbose : bool | int, optional
Verbosity level, by default False.
- **kwargs
Additional keyword arguments passed to model initialization.
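The golden section option can be pictured with a small, library-independent sketch. The function below is not gwlearn's implementation: golden_section_search and its score callable are hypothetical names, the bandwidth is treated as continuous, and tolerance is applied to the width of the bracket. All of these are assumptions made for illustration. It only shows how minimize, min_bandwidth, max_bandwidth, max_iterations and tolerance could interact.

```python
import math


def golden_section_search(score, lower, upper, *, minimize=True,
                          tolerance=0.01, max_iterations=100):
    """Search [lower, upper] for the bandwidth with the best score.

    `score` is a hypothetical callable mapping a bandwidth to a number and
    is assumed to be unimodal on the bracket.
    """
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    sign = 1.0 if minimize else -1.0  # maximizing = minimizing the negative
    evaluated = {}  # bandwidth -> raw score, one entry per fitted candidate

    def f(bw):
        if bw not in evaluated:
            evaluated[bw] = score(bw)
        return sign * evaluated[bw]

    a, b = float(lower), float(upper)
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    for _ in range(max_iterations):
        if abs(b - a) <= tolerance:  # assumption: tolerance on bracket width
            break
        if f(c) < f(d):
            # Optimum lies in [a, d]; the old c becomes the new d.
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            # Optimum lies in [c, b]; the old d becomes the new c.
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2, evaluated


# Toy criterion with its minimum at a bandwidth of 42.
best, evaluated = golden_section_search(lambda bw: (bw - 42.0) ** 2, 20, 80)
```

Each iteration reuses one of the two interior points, so only one new candidate is scored per step. With an adaptive bandwidth (fixed=False) the candidates are neighbor counts, so a real search would need to work with integers; the sketch ignores that.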
- scores_
Series of criterion scores for each bandwidth tested (index is bandwidth).
- Type:
pd.Series
- metrics_
DataFrame of additional metrics for each bandwidth tested. For linear/logistic models, columns "aicc", "aic", "bic" are always present; they are omitted for non-linear models.
- Type:
pd.DataFrame
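The attribute-name convention for custom criteria and metrics can be illustrated without fitting anything. In this sketch, FittedStub, its coverage_ and mean_score_ attributes, and collect_metrics are all hypothetical; only the getattr(model, m + "_") lookup is taken from the description of criterion.

```python
class FittedStub:
    """Hypothetical stand-in for a fitted geographically weighted model."""

    def __init__(self, coverage, mean_score):
        # Trailing underscores mark values that exist only after fitting.
        self.coverage_ = coverage
        self.mean_score_ = mean_score


def collect_metrics(model, names):
    """Look up each name as the attribute `name + "_"` on the fitted model."""
    return {name: getattr(model, name + "_") for name in names}


# One stub per candidate bandwidth; the numbers are made up for illustration.
fitted = {30: FittedStub(0.91, 0.42), 50: FittedStub(0.97, 0.47)}
rows = {
    bw: collect_metrics(model, ["coverage", "mean_score"])
    for bw, model in fitted.items()
}
```

Here rows holds one dictionary of metric values per bandwidth, which mirrors the shape described for metrics_ (one row per bandwidth, one column per metric). In this plain-Python sketch, a name with no matching attribute raises AttributeError.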
Examples
Interval search over a small set of candidate bandwidths:
>>> import geopandas as gpd
>>> from geodatasets import get_path
>>> from gwlearn.linear_model import GWLogisticRegression
>>> from gwlearn.search import BandwidthSearch

>>> gdf = gpd.read_file(get_path('geoda.guerry'))
>>> X = gdf[['Crm_prp', 'Litercy', 'Donatns', 'Lottery']]
>>> y = gdf["Region"] == 'E'

>>> search = BandwidthSearch(
...     GWLogisticRegression,
...     fixed=False,
...     search_method="interval",
...     criterion="aicc",
...     min_bandwidth=20,
...     max_bandwidth=80,
...     interval=10,
...     max_iter=200,  # not a search parameter; forwarded to the model via **kwargs
... ).fit(X, y, geometry=gdf.representative_point())
>>> search.optimal_bandwidth_
40
Methods
__init__(model, *[, fixed, kernel, ...])
fit(X, y, geometry): Fit the searcher by evaluating candidate bandwidths on the provided data.
- fit(X, y, geometry)
Fit the searcher by evaluating candidate bandwidths on the provided data.
- Parameters:
- X : pd.DataFrame
Feature matrix used to evaluate candidate bandwidths (rows are samples).
- y : pd.Series
Target values corresponding to X.
- geometry : gpd.GeoSeries
Geographic location of the observations in the sample. Used to determine the spatial interaction weights as specified by the bandwidth, fixed, kernel, and include_focal keywords.
- Returns:
The fitted instance.
- Return type:
self
Notes
The optimal bandwidth is selected as the index of the minimum score if
minimize=True, otherwise as the index of the maximum score.
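The selection rule amounts to an arg-min or arg-max over the tested bandwidths. A minimal sketch, with a plain dictionary standing in for scores_; select_optimal and the score values are illustrative and not library output:

```python
def select_optimal(scores, minimize=True):
    """Return the bandwidth whose score is lowest (or highest)."""
    pick = min if minimize else max
    return pick(scores, key=scores.get)


# Bandwidth -> criterion score; made-up numbers for illustration only.
scores = {20: 118.4, 30: 112.9, 40: 110.3, 50: 111.7}

lowest = select_optimal(scores)                   # bandwidth with the minimum score
highest = select_optimal(scores, minimize=False)  # bandwidth with the maximum score
```

Since scores_ is a pd.Series indexed by bandwidth, the same rule would presumably correspond to scores_.idxmin() when minimize=True and scores_.idxmax() otherwise.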