gwlearn.search.BandwidthSearch

class gwlearn.search.BandwidthSearch(model, *, geometry, fixed=False, kernel='bisquare', n_jobs=-1, search_method='golden_section', criterion='aicc', metrics=None, minimize=True, min_bandwidth=None, max_bandwidth=None, interval=None, max_iterations=100, tolerance=0.01, verbose=False, **kwargs)

Optimal bandwidth search for geographically weighted estimators.

Reports information criteria and (optionally) other scores from multiple models with varying bandwidth. When using golden section search, it minimizes (or maximizes) the chosen criterion.

When using classification models with a defined min_proportion, keep in mind that some locations may be excluded from the final model. In that case, the information criteria are typically not comparable across models with different bandwidths and should not be used to determine the optimal one.

Parameters:
model : type

A geographically weighted estimator class (e.g. gwlearn.linear_model.GWLogisticRegression) that can be instantiated as model(bandwidth=..., geometry=..., fixed=..., kernel=..., n_jobs=..., ...) and exposes information criteria attributes like aicc_/aic_/bic_.

geometry : gpd.GeoSeries

Geographic location of the observations in the sample. Used to determine the spatial interaction weight based on specification by bandwidth, fixed, kernel, and include_focal keywords.

fixed : bool, optional

True for distance-based bandwidth and False for adaptive (nearest neighbor) bandwidth, by default False.

kernel : str | Callable, optional

Type of kernel function used to weight observations, by default "bisquare"

n_jobs : int, optional

The number of jobs to run in parallel. -1 means using all processors, by default -1

search_method : {"golden_section", "interval"}, optional

Method used to search for the optimal bandwidth. With "golden_section", golden section optimization searches for the bandwidth that minimizes or maximizes criterion. With "interval", a model is fitted at every bandwidth between min_bandwidth and max_bandwidth in steps of interval, with no attempt to optimize the selection. By default "golden_section".

criterion : str, optional

Criterion used to select the optimal bandwidth.

Supported values include {"aicc", "aic", "bic", "prediction_rate", "log_loss"}. If you pass another string, it is interpreted as an attribute name m and retrieved from the fitted model as getattr(model, m + "_"). By default "aicc".

metrics : list[str] | None, optional

Additional metrics to report for each bandwidth. Metrics follow the same conventions as criterion (including special cases "log_loss" and "prediction_rate"). By default None.

minimize : bool, optional

Minimize or maximize the criterion. For information criteria such as AICc, the optimal solution is the lowest value. For other metrics, the optimal solution may be the highest value. By default True, assuming lower is better.

min_bandwidth : int | float | None, optional

Minimum bandwidth to consider, by default None

max_bandwidth : int | float | None, optional

Maximum bandwidth to consider, by default None

interval : int | float | None, optional

Interval for bandwidth search when using the "interval" method, by default None

max_iterations : int, optional

Maximum number of iterations for golden section search, by default 100

tolerance : float, optional

Tolerance for convergence in golden section search, by default 0.01

verbose : bool | int, optional

Verbosity level, by default False

**kwargs

Additional keyword arguments passed to model initialization

scores_

Series of criterion scores for each bandwidth tested (index is bandwidth).

Type:

pd.Series

metrics_

DataFrame of additional metrics for each bandwidth tested.

Type:

pd.DataFrame

optimal_bandwidth_

The optimal bandwidth found by the search method.

Type:

int | float

Examples

Interval search over a small set of candidate bandwidths:

>>> import geopandas as gpd
>>> from geodatasets import get_path
>>> from gwlearn.linear_model import GWLogisticRegression
>>> from gwlearn.search import BandwidthSearch
>>> gdf = gpd.read_file(get_path('geoda.guerry'))
>>> X = gdf[['Crm_prp', 'Litercy', 'Donatns', 'Lottery']]
>>> y = gdf["Region"] == 'E'
>>> search = BandwidthSearch(
...     GWLogisticRegression,
...     geometry=gdf.representative_point(),
...     fixed=False,
...     search_method="interval",
...     criterion="aicc",
...     min_bandwidth=20,
...     max_bandwidth=80,
...     interval=10,
...     max_iter=200,
... ).fit(X, y)
>>> search.optimal_bandwidth_
np.int64(40)
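
The "golden_section" search method can be illustrated with a generic golden section minimizer. This is a minimal sketch of the general technique, not gwlearn's implementation: the function name golden_section_minimize and the toy score function are hypothetical, and the real search fits a model at each candidate bandwidth instead of evaluating a closed-form function. The tolerance and max_iterations arguments mirror the parameters documented above.

```python
import math


def golden_section_minimize(score, lower, upper, tolerance=0.01, max_iterations=100):
    """Minimize a unimodal function `score` on [lower, upper] (hypothetical helper)."""
    inv_phi = (math.sqrt(5) - 1) / 2  # inverse golden ratio, about 0.618
    a, b = lower, upper
    # Two interior probe points placed at golden-ratio positions.
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = score(c), score(d)
    for _ in range(max_iterations):
        if abs(b - a) <= tolerance:
            break
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = score(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = score(d)
    return (a + b) / 2


# Toy stand-in for a criterion such as AICc, with its minimum at bandwidth 42.
def toy_score(bandwidth):
    return (bandwidth - 42.0) ** 2


best = golden_section_minimize(toy_score, 20, 80)
```

Each iteration reuses one of the two previous evaluations, so only one new score is computed per step. This matters when every evaluation means fitting a full geographically weighted model.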

Methods

__init__(model, *, geometry[, fixed, ...])

fit(X, y)

Fit the searcher by evaluating candidate bandwidths on the provided data.

fit(X, y)

Fit the searcher by evaluating candidate bandwidths on the provided data.

Parameters:
X : pd.DataFrame

Feature matrix used to evaluate candidate bandwidths (rows are samples).

y : pd.Series

Target values corresponding to X.

Returns:

The fitted instance.

Return type:

self

Notes

The optimal bandwidth is selected as the index of the minimum score if minimize=True, otherwise as the index of the maximum score.
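
The selection rule in the note above can be sketched with plain pandas. The scores below are invented for illustration and the helper select_bandwidth is hypothetical; the sketch only assumes that scores are stored as a Series indexed by bandwidth, as described for scores_.

```python
import pandas as pd

# Hypothetical criterion scores indexed by bandwidth (values are made up).
scores = pd.Series([310.2, 295.7, 301.4], index=[20, 30, 40])


def select_bandwidth(scores, minimize=True):
    """Return the index label of the lowest (or highest) score."""
    return scores.idxmin() if minimize else scores.idxmax()


best_low = select_bandwidth(scores, minimize=True)    # lower is better, e.g. AICc
best_high = select_bandwidth(scores, minimize=False)  # higher is better
```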