gwlearn.search.BandwidthSearch

class gwlearn.search.BandwidthSearch(model, *, fixed=False, kernel='bisquare', coplanar='raise', n_jobs=-1, search_method='golden_section', criterion=None, metrics=None, minimize=True, min_bandwidth=None, max_bandwidth=None, interval=None, max_iterations=100, tolerance=0.01, verbose=False, **kwargs)

Optimal bandwidth search for geographically weighted estimators.

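Fits the model at a range of candidate bandwidths, reports the resulting scores, and identifies the optimal bandwidth. With golden section search, candidate bandwidths are chosen iteratively to minimize (or maximize) the chosen criterion. With interval search, every bandwidth on a regular grid is evaluated.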

The search supports two broad families of models:

  • Linear / logistic models (GWLinearRegression, GWLogisticRegression): information criteria "aicc", "aic", "bic" are valid and recommended. They are included in metrics_ automatically.

  • Non-linear models (random forest, gradient boosting, …): information criteria are not valid (no closed-form log-likelihood or hat matrix). Use "rmse" / "mae" for regression or "log_loss" combined with "prediction_rate" for classification instead.

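When using classification models with a defined min_proportion, keep in mind that some locations may be excluded from the final model. Because the set of excluded locations can change with the bandwidth, the criteria are then computed over different sets of locations. In that case even the valid information criteria are not comparable across bandwidths, and "log_loss" should be preferred.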
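The error-based criteria named above can be pinned down with a short NumPy sketch. It assumes the standard definitions (RMSE and MAE of focal residuals, binary cross-entropy for "log_loss", share of fitted locations for "prediction_rate") and assumes that locations excluded by min_proportion show up as NaN predictions. The function names are hypothetical helpers, not gwlearn API, and gwlearn's own computation may differ in details such as probability clipping.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error of focal residuals."""
    resid = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(resid**2)))


def mae(y_true, y_pred):
    """Mean absolute error of focal residuals."""
    resid = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(resid)))


def classification_scores(y_true, proba, eps=1e-15):
    """Return (log_loss, prediction_rate) for a binary classifier.

    Assumption: locations excluded from the model (e.g. by min_proportion)
    carry NaN probabilities. They lower prediction_rate and are left out
    of log_loss.
    """
    y_true = np.asarray(y_true, dtype=float)
    proba = np.asarray(proba, dtype=float)
    fitted = ~np.isnan(proba)
    prediction_rate = float(fitted.mean())
    p = np.clip(proba[fitted], eps, 1 - eps)  # avoid log(0)
    y = y_true[fitted]
    log_loss = float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return log_loss, prediction_rate


# Toy values for illustration only
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3)
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # 2/3
print(classification_scores([1, 0, 1, 0], [0.9, 0.2, np.nan, 0.4]))
```

Because excluded locations are simply dropped from "log_loss", it stays defined when the fitted set shifts, while "prediction_rate" tracks how many locations were lost.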

Parameters:
model : type

A geographically weighted estimator class (e.g. gwlearn.linear_model.GWLogisticRegression) that can be instantiated as model(bandwidth=..., fixed=..., kernel=..., n_jobs=..., ...).

fixed : bool, optional

True for a distance-based bandwidth and False for an adaptive (nearest-neighbor) bandwidth, by default False

kernel : str | Callable, optional

Type of kernel function used to weight observations, by default "bisquare"

coplanar : {"raise", "jitter", "clique"}, optional

Method for handling coplanar points with adaptive kernels. Options are "raise" (raise an exception when coplanar points are present), "jitter" (randomly displace coplanar points to make them unique), and "clique" (induce fully connected sub-cliques for coplanar points). By default "raise".

n_jobs : int, optional

The number of jobs to run in parallel (-1 uses all processors), by default -1

search_method : {"golden_section", "interval"}, optional

Method used to search for the optimal bandwidth. With "golden_section", golden section optimization searches for the bandwidth that minimizes or maximizes the criterion (see minimize). With "interval", a model is fitted at every bandwidth at a regular interval between min_bandwidth and max_bandwidth, without any attempt to optimize the selection; the optimal bandwidth is then the best of the evaluated candidates. By default "golden_section".

criterion : str, optional

Criterion used to select the optimal bandwidth.

Built-in special values:

  • "aicc", "aic", "bic" — information criteria; only valid for linear / logistic models.

  • "log_loss" — cross-entropy loss; for classifiers only.

  • "prediction_rate" — proportion of fitted locations; classifiers.

  • "rmse" — root mean squared error of focal residuals; regressors.

  • "mae" — mean absolute error of focal residuals; regressors.

Any other string m is interpreted as an attribute name and retrieved from the fitted model as getattr(model, m + "_"). If None (the default), "aicc" is used for linear models and "rmse" for non-linear models.

metrics : list[str] | None, optional

Additional metrics to report for each bandwidth, following the same naming conventions as criterion. By default None.

minimize : bool, optional

Minimize or maximize the criterion. For information criteria and error metrics the optimum is the lowest value; for "prediction_rate" or accuracy-like metrics it is the highest. By default True.

min_bandwidth : int | float | None, optional

Minimum bandwidth to consider, expressed as a number of nearest neighbors if fixed=False or as a distance if fixed=True, by default None

max_bandwidth : int | float | None, optional

Maximum bandwidth to consider, expressed as a number of nearest neighbors if fixed=False or as a distance if fixed=True, by default None

interval : int | float | None, optional

Step between consecutive candidate bandwidths when search_method="interval", by default None

max_iterations : int, optional

Maximum number of iterations for golden section search, by default 100

tolerance : float, optional

Tolerance for convergence in golden section search, by default 1e-2

verbose : bool | int, optional

Verbosity level, by default False

**kwargs

Additional keyword arguments passed to the model constructor
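The search_method="golden_section" option, together with min_bandwidth, max_bandwidth, max_iterations, tolerance and minimize, corresponds to a classic golden section search. The sketch below is a generic version of that technique, not gwlearn's implementation: score is a hypothetical callable standing in for "fit the model at this bandwidth and read the criterion", the stopping rule (bracket narrower than tolerance) is an assumption, and bandwidths are treated as continuous even though adaptive bandwidths count neighbors. Every evaluated bandwidth is cached, which mirrors how scores_ records all bandwidths tested.

```python
import math


def golden_section_search(score, min_bw, max_bw, *, minimize=True,
                          max_iterations=100, tolerance=1e-2):
    """Generic golden section search over [min_bw, max_bw].

    `score(bw)` stands in for fitting a model at bandwidth `bw` and
    returning the criterion. Returns (best_bandwidth, {bandwidth: score}).
    """
    sign = 1.0 if minimize else -1.0  # maximize by minimizing -score
    invphi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618
    cache = {}  # every bandwidth evaluated so far -> raw score

    def f(bw):
        if bw not in cache:
            cache[bw] = score(bw)
        return sign * cache[bw]

    a, b = min_bw, max_bw
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    for _ in range(max_iterations):
        if abs(b - a) < tolerance:
            break
        if f(c) < f(d):
            b, d = d, c  # optimum lies in [a, d]; old c becomes new d
            c = b - invphi * (b - a)
        else:
            a, c = c, d  # optimum lies in [c, b]; old d becomes new c
            d = a + invphi * (b - a)
    if not cache:  # bracket was already narrower than tolerance
        f((a + b) / 2)
    best = min(cache, key=lambda bw: sign * cache[bw])
    return best, cache


# Toy criterion with a known minimum at bandwidth 42
best, tested = golden_section_search(lambda bw: (bw - 42) ** 2, 20, 80)
print(round(best, 1), len(tested))
```

Golden section search reuses one interior point per iteration, so each step costs a single new model fit, but it assumes the criterion is unimodal over the bracket. When that is doubtful, an interval search over a coarse grid shows the full score profile.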

scores_

Series of criterion scores for each bandwidth tested (index is bandwidth).

Type:

pd.Series

metrics_

DataFrame of additional metrics for each bandwidth tested. For linear/logistic models, columns "aicc", "aic", "bic" are always present; they are omitted for non-linear models.

Type:

pd.DataFrame

optimal_bandwidth_

The optimal bandwidth found by the search method.

Type:

int | float

Examples

Interval search over a small set of candidate bandwidths:

>>> import geopandas as gpd
>>> from geodatasets import get_path
>>> from gwlearn.linear_model import GWLogisticRegression
>>> from gwlearn.search import BandwidthSearch
>>> gdf = gpd.read_file(get_path('geoda.guerry'))
>>> X = gdf[['Crm_prp', 'Litercy', 'Donatns', 'Lottery']]
>>> y = gdf["Region"] == 'E'
>>> search = BandwidthSearch(
...     GWLogisticRegression,
...     fixed=False,
...     search_method="interval",
...     criterion="aicc",
...     min_bandwidth=20,
...     max_bandwidth=80,
...     interval=10,
...     max_iter=200,
... ).fit(X, y, geometry=gdf.representative_point())
>>> search.optimal_bandwidth_
40

Methods

__init__(model, *[, fixed, kernel, ...])

fit(X, y, geometry)

Fit the searcher by evaluating candidate bandwidths on the provided data.


fit(X, y, geometry)

Fit the searcher by evaluating candidate bandwidths on the provided data.

Parameters:
X : pd.DataFrame

Feature matrix used to evaluate candidate bandwidths (rows are samples).

y : pd.Series

Target values corresponding to X.

geometry : gpd.GeoSeries

Geographic locations of the observations in the sample. Used to build the spatial interaction weights for each candidate bandwidth, according to the fixed and kernel settings and any weighting keywords (such as include_focal) passed to the model.

Returns:

The fitted instance.

Return type:

self
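For search_method="interval", fit evaluates every candidate on a regular grid. The sketch below shows that loop together with the getattr(model, m + "_") convention for criterion and metrics. ToyModel is a hypothetical stand-in for a gwlearn estimator whose aicc_ and bic_ values are made-up functions of the bandwidth, interval_search is a hypothetical helper, and treating max_bandwidth as inclusive is an assumption. Built-in metrics such as "rmse" are computed by gwlearn itself and are not covered here.

```python
import numpy as np
import pandas as pd


class ToyModel:
    """Hypothetical stand-in for a geographically weighted estimator."""

    def __init__(self, bandwidth, **kwargs):
        self.bandwidth = bandwidth
        self.kwargs = kwargs  # e.g. fixed, kernel, n_jobs in a real estimator

    def fit(self, X, y, geometry=None):
        # Made-up fitted attributes that follow the trailing-underscore convention
        self.aicc_ = (self.bandwidth - 40) ** 2 + 100.0
        self.bic_ = self.aicc_ + 5.0
        return self


def interval_search(model, X, y, geometry, min_bw, max_bw, interval,
                    criterion="aicc", metrics=(), **kwargs):
    """Fit `model` at every bandwidth on a regular grid and collect scores."""
    # Grid min_bw, min_bw + interval, ..., treating max_bw as inclusive
    n_steps = int((max_bw - min_bw) / interval + 1e-9)
    grid = min_bw + interval * np.arange(n_steps + 1)
    scores, rows = {}, {}
    for bw in grid:
        bw = bw.item()  # NumPy scalar -> plain Python number for the index
        fitted = model(bandwidth=bw, **kwargs).fit(X, y, geometry=geometry)
        # A criterion or metric name m is read from the fitted model as m + "_"
        scores[bw] = getattr(fitted, criterion + "_")
        rows[bw] = {m: getattr(fitted, m + "_") for m in metrics}
    scores_ = pd.Series(scores, name=criterion)
    metrics_ = pd.DataFrame.from_dict(rows, orient="index")
    return scores_, metrics_


scores_, metrics_ = interval_search(
    ToyModel, X=None, y=None, geometry=None,
    min_bw=20, max_bw=80, interval=10, metrics=["bic"], fixed=False,
)
print(scores_)
print(metrics_)
```

Both results are indexed by bandwidth, so a row of metrics_ can be looked up with the same label that identifies a score in scores_.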

Notes

The optimal bandwidth is selected as the index label (the bandwidth) of the minimum value in scores_ if minimize=True, and of the maximum value otherwise.
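In pandas terms, that selection rule amounts to an idxmin or idxmax lookup on scores_. A minimal sketch with made-up scores follows; select_optimal is a hypothetical helper, and whether gwlearn uses these exact pandas calls is an assumption. Note that idxmin and idxmax return the first matching label, so with tied scores the bandwidth listed first wins.

```python
import pandas as pd


def select_optimal(scores: pd.Series, minimize: bool = True):
    """Return the index label (bandwidth) of the best score."""
    return scores.idxmin() if minimize else scores.idxmax()


# Made-up scores indexed by bandwidth, for illustration only
scores = pd.Series({20: 310.2, 30: 295.4, 40: 291.8, 50: 299.0})
print(select_optimal(scores))                  # 40 (lowest, as for AICc or RMSE)
print(select_optimal(scores, minimize=False))  # 20 (highest)
```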