gwlearn.search.BandwidthSearch

class gwlearn.search.BandwidthSearch(model, *, fixed=False, kernel='bisquare', coplanar='raise', n_jobs=-1, search_method='golden_section', criterion=None, metrics=None, minimize=True, min_bandwidth=None, max_bandwidth=None, interval=None, max_iterations=100, tolerance=0.01, verbose=False, **kwargs)

Optimal bandwidth search for geographically weighted estimators.

Reports scores from multiple models with varying bandwidth and identifies the optimal one. When using golden section search, it minimizes (or maximizes) the chosen criterion.

The search supports two broad families of models:

Linear / logistic models (GWLinearRegression, GWLogisticRegression): information criteria "aicc", "aic", "bic" are valid and recommended. They are included in metrics_ automatically.
Non-linear models (random forest, gradient boosting, …): information criteria are not valid (no closed-form log-likelihood or hat matrix). Use "rmse" / "mae" for regression or "log_loss" combined with "prediction_rate" for classification instead.

When using classification models with a defined min_proportion, keep in mind that some locations may be excluded from the final model. In such a case, even the valid information criteria are not comparable across bandwidths and "log_loss" should be preferred.

Parameters:

model : type

A geographically weighted estimator class (e.g. gwlearn.linear_model.GWLogisticRegression) that can be instantiated as model(bandwidth=..., fixed=..., kernel=..., n_jobs=..., ...).

fixed : bool, optional

True for a distance-based bandwidth and False for an adaptive (nearest neighbor) bandwidth, by default False

kernel : str | Callable, optional

Type of kernel function used to weight observations, by default "bisquare"

coplanar : {"raise", "jitter", "clique"}, optional

Method for handling coplanar points with adaptive kernels. Options are "raise" (raise an exception when coplanar points are present), "jitter" (randomly displace coplanar points to make them unique), and "clique" (induce fully-connected sub-cliques for coplanar points). By default "raise".

n_jobs : int, optional

The number of jobs to run in parallel. -1 means using all processors, by default -1

search_method : {"golden_section", "interval"}, optional

Method used to search for the optimal bandwidth. With "golden_section", golden section optimization is used to find the bandwidth that minimizes or maximizes criterion. With "interval", a model is fitted at each bandwidth in the specified range at a set interval, without any iterative optimization of the selection. By default "golden_section".

criterion : str, optional

Criterion used to select the optimal bandwidth.

Built-in special values:

"aicc", "aic", "bic": information criteria; only valid for linear / logistic models.
"log_loss": cross-entropy loss; for classifiers only.
"prediction_rate": proportion of fitted locations; for classifiers.
"rmse": root mean squared error of focal residuals; for regressors.
"mae": mean absolute error of focal residuals; for regressors.

Any other string m is interpreted as an attribute name and retrieved from the fitted model as getattr(model, m + "_"). By default None, which resolves to "aicc" for linear / logistic models and "rmse" for non-linear models.

metrics : list[str] | None, optional

Additional metrics to report for each bandwidth. Follow the same conventions as criterion. By default None.

minimize : bool, optional

Minimize or maximize the criterion. For information criteria and error metrics the optimum is the lowest value; for "prediction_rate" or accuracy-like metrics it is the highest. By default True.

min_bandwidth : int | float | None, optional

Minimum bandwidth to consider, by default None

max_bandwidth : int | float | None, optional

Maximum bandwidth to consider, by default None

interval : int | float | None, optional

Interval for bandwidth search when using "interval" method, by default None

max_iterations : int, optional

Maximum number of iterations for golden section search, by default 100

tolerance : float, optional

Tolerance for convergence in golden section search, by default 1e-2

verbose : bool | int, optional

Verbosity level, by default False

**kwargs

Additional keyword arguments passed to model initialization
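
The attribute lookup described for criterion and metrics can be illustrated with a minimal sketch. ToyFittedModel and resolve_metric are hypothetical names introduced here for illustration only; they are not part of gwlearn, and the library's own lookup may handle the built-in names differently.

```python
class ToyFittedModel:
    """Hypothetical stand-in for a fitted estimator (not a gwlearn class)."""

    def __init__(self):
        # Fitted attributes carry a trailing underscore by convention.
        self.aicc_ = 123.4
        self.my_score_ = 0.87


def resolve_metric(fitted_model, name):
    """Return the fitted attribute called ``name`` plus a trailing underscore."""
    return getattr(fitted_model, name + "_")


model = ToyFittedModel()
value = resolve_metric(model, "my_score")  # reads model.my_score_
print(value)
```

A name with no matching fitted attribute raises AttributeError in this sketch, so a custom criterion string is only usable if the model exposes the corresponding attribute after fitting.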

scores_

Series of criterion scores for each bandwidth tested (index is bandwidth).

Type: pd.Series

metrics_

DataFrame of additional metrics for each bandwidth tested. For linear/logistic models, columns "aicc", "aic", "bic" are always present; they are omitted for non-linear models.

Type: pd.DataFrame

optimal_bandwidth_

The optimal bandwidth found by the search method.

Type: int | float

Examples

Interval search over a small set of candidate bandwidths:

>>> import geopandas as gpd
>>> from geodatasets import get_path
>>> from gwlearn.linear_model import GWLogisticRegression
>>> from gwlearn.search import BandwidthSearch

>>> gdf = gpd.read_file(get_path('geoda.guerry'))
>>> X = gdf[['Crm_prp', 'Litercy', 'Donatns', 'Lottery']]
>>> y = gdf["Region"] == 'E'

>>> search = BandwidthSearch(
...     GWLogisticRegression,
...     fixed=False,
...     search_method="interval",
...     criterion="aicc",
...     min_bandwidth=20,
...     max_bandwidth=80,
...     interval=10,
...     max_iter=200,
... ).fit(X, y, geometry=gdf.representative_point())
>>> search.optimal_bandwidth_
40
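
The example above uses interval search. For the default "golden_section" method, the following is a generic, library-independent sketch of golden section search over a bandwidth range. The function golden_section and the toy score functions are assumptions made for illustration; the stopping rule used by gwlearn (for example, whether tolerance applies to the bandwidth or to the score, and how adaptive bandwidths are rounded to integers) may differ.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # inverse golden ratio, about 0.618


def golden_section(score, lo, hi, *, minimize=True, tolerance=0.01, max_iterations=100):
    """Locate the bandwidth in [lo, hi] that optimizes a unimodal score."""
    sign = 1.0 if minimize else -1.0  # maximizing equals minimizing the negated score

    def f(bw):
        return sign * score(bw)

    # Two interior probe points placed by the golden ratio.
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    for _ in range(max_iterations):
        if abs(hi - lo) <= tolerance:  # assumption: tolerance on bracket width
            break
        if fc < fd:
            # The optimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # The optimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2


# Toy scores standing in for a fitted model's criterion at bandwidth bw.
best_min = golden_section(lambda bw: (bw - 50.0) ** 2 + 3.0, 20.0, 80.0)
best_max = golden_section(lambda bw: -((bw - 30.0) ** 2), 20.0, 80.0, minimize=False)
```

Each iteration reuses one of the two previous evaluations, so only one new model fit is needed per step, which is the main appeal of this method when each fit is expensive.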

Methods

`__init__`(model, *[, fixed, kernel, ...])
`fit`(X, y, geometry)	Fit the searcher by evaluating candidate bandwidths on the provided data.

fit(X, y, geometry)

Fit the searcher by evaluating candidate bandwidths on the provided data.

Parameters:

X : pd.DataFrame

Feature matrix used to evaluate candidate bandwidths (rows are samples).

y : pd.Series

Target values corresponding to X.

geometry : gpd.GeoSeries

Geographic location of the observations in the sample. Used to determine the spatial interaction weight based on specification by bandwidth, fixed, kernel, and include_focal keywords.

Returns:

The fitted instance.

Return type:

self

Notes

The optimal bandwidth is selected as the index of the minimum score if minimize=True, otherwise as the index of the maximum score.
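
The selection rule can be sketched for the "interval" method with plain Python. The helper interval_search and the toy score are hypothetical; treating max_bandwidth as an inclusive upper bound is an assumption of this sketch, not a documented guarantee.

```python
def interval_search(score, min_bandwidth, max_bandwidth, interval, *, minimize=True):
    """Score every candidate bandwidth and pick the best one."""
    scores = {}
    bw = min_bandwidth
    while bw <= max_bandwidth:  # assumption: the upper bound is inclusive
        scores[bw] = score(bw)
        bw += interval
    # Index (bandwidth) of the minimum score, or of the maximum if minimize=False.
    pick = min if minimize else max
    optimal = pick(scores, key=scores.get)
    return optimal, scores


def toy_score(bw):
    """Toy criterion with its lowest value at bandwidth 40."""
    return (bw - 40) ** 2 + 100.0


optimal, scores = interval_search(toy_score, 20, 80, 10)
optimal_max, _ = interval_search(toy_score, 20, 80, 10, minimize=False)
```

With scores held in a pd.Series indexed by bandwidth, as scores_ is described above, the same selection corresponds to idxmin() or idxmax().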