gwlearn.search.BandwidthSearch
class gwlearn.search.BandwidthSearch(model, *, geometry, fixed=False, kernel='bisquare', n_jobs=-1, search_method='golden_section', criterion='aicc', metrics=None, minimize=True, min_bandwidth=None, max_bandwidth=None, interval=None, max_iterations=100, tolerance=0.01, verbose=False, **kwargs)

Optimal bandwidth search for geographically weighted estimators.
Reports information criteria and, optionally, other scores from multiple models fitted with varying bandwidths. When using golden-section search, it minimizes (or maximizes) the chosen criterion.

When using classification models with a defined min_proportion, keep in mind that some locations may be excluded from the final model. In such a case, the information criteria are typically not comparable across models with different bandwidths and should not be used to determine the optimal one.

Parameters:
- model : type
A geographically weighted estimator class (e.g. gwlearn.linear_model.GWLogisticRegression) that can be instantiated as model(bandwidth=..., geometry=..., fixed=..., kernel=..., n_jobs=..., ...) and exposes information criteria attributes such as aicc_, aic_, or bic_.
- geometry : gpd.GeoSeries
Geographic location of the observations in the sample. Used to determine the spatial interaction weight based on the specification given by the bandwidth, fixed, kernel, and include_focal keywords.
- fixed : bool, optional
True for distance-based bandwidth and False for adaptive (nearest-neighbor) bandwidth, by default False.
- kernel : str | Callable, optional
Type of kernel function used to weight observations, by default "bisquare".
- n_jobs : int, optional
The number of jobs to run in parallel. -1 means using all processors, by default -1.
- search_method : {"golden_section", "interval"}, optional
Method used to search for the optimal bandwidth. With "golden_section", golden-section optimization is used to find the optimal bandwidth by minimizing or maximizing criterion. With "interval", all models within the specified bandwidth range are fitted at a set interval, without any attempt to optimize the selection. By default "golden_section".
- criterion : str, optional
Criterion used to select the optimal bandwidth. Supported values include {"aicc", "aic", "bic", "prediction_rate", "log_loss"}. If you pass another string m, it is interpreted as an attribute name and retrieved from the fitted model as getattr(model, m + "_"). By default "aicc".
- metrics : list[str] | None, optional
Additional metrics to report for each bandwidth. Metrics follow the same conventions as criterion (including the special cases "log_loss" and "prediction_rate"). By default None.
- minimize : bool, optional
Whether to minimize or maximize the criterion. For information criteria such as AICc, the optimal solution is the lowest value. For other metrics, the optimum may be the highest value. By default True, assuming lower is better.
- min_bandwidth : int | float | None, optional
Minimum bandwidth to consider, by default None.
- max_bandwidth : int | float | None, optional
Maximum bandwidth to consider, by default None.
- interval : int | float | None, optional
Interval for bandwidth search when using the "interval" method, by default None.
- max_iterations : int, optional
Maximum number of iterations for golden-section search, by default 100.
- tolerance : float, optional
Tolerance for convergence in golden-section search, by default 1e-2.
Verbosity level, by default False.
- **kwargs
Additional keyword arguments passed to model initialization.
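The fixed and kernel parameters together decide how strongly each neighbor contributes to a local model. The following is a standalone NumPy sketch of the common bisquare weighting under both bandwidth types; it is not gwlearn's internal code. The helper names (bisquare, fixed_weights, adaptive_weights) are hypothetical, and the adaptive variant assumes each location's bandwidth is the distance to its k-th nearest neighbor with the focal point excluded from the count, a convention that may differ from the library's.

```python
import numpy as np


def bisquare(d, h):
    """Bisquare kernel: (1 - (d/h)^2)^2 inside the bandwidth, 0 outside."""
    u = d / h
    return np.where(u < 1, (1 - u**2) ** 2, 0.0)


def fixed_weights(dists, bandwidth):
    """fixed=True (hypothetical helper): one distance bandwidth for every location."""
    return bisquare(dists, bandwidth)


def adaptive_weights(dists, k):
    """fixed=False (hypothetical helper): per-row bandwidth = distance to k-th nearest neighbor.

    Assumes dists[i, i] == 0, so after sorting column 0 is the focal point itself.
    Under this convention the k-th neighbor sits exactly on the edge and gets weight 0.
    """
    h = np.sort(dists, axis=1)[:, k]
    return bisquare(dists, h[:, None])


# Five points on a line and their pairwise distance matrix
coords = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
dists = np.abs(coords[:, None] - coords[None, :])
print(fixed_weights(dists, 2.0)[0])   # weights around point 0, bandwidth = 2 distance units
print(adaptive_weights(dists, 3)[2])  # weights around point 2, bandwidth = 3 neighbors
```

With an adaptive bandwidth, the kernel narrows where observations are dense and widens where they are sparse, which is why fixed=False is often a reasonable default for unevenly distributed data.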
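The two search_method options can be illustrated in plain Python. In this sketch, interval_candidates builds an evenly spaced grid from min_bandwidth to max_bandwidth, and golden_section_search shrinks a bracket around the optimum of a criterion function until it is narrower than tolerance or max_iterations is reached. Both functions are hypothetical illustrations of the textbook techniques: whether gwlearn includes max_bandwidth in the grid, treats tolerance as an absolute width, or rounds adaptive (integer) bandwidths during the search is assumed here, not documented.

```python
import math


def interval_candidates(min_bw, max_bw, interval):
    """search_method="interval" (sketch): evenly spaced bandwidths, max_bw included if on the grid."""
    n_steps = math.floor((max_bw - min_bw) / interval + 1e-9)  # guard against float round-off
    return [min_bw + i * interval for i in range(n_steps + 1)]


def golden_section_search(f, lo, hi, tolerance=1e-2, max_iterations=100, minimize=True):
    """search_method="golden_section" (sketch): shrink [lo, hi] around the optimum of f.

    Assumes f is unimodal on [lo, hi]. Returns the midpoint of the final bracket and a
    dict of every evaluated bandwidth -> raw score (loosely analogous to scores_).
    """
    scores = {}

    def objective(x):
        value = f(x)
        scores[x] = value
        # Flip the sign when maximizing so the loop below always minimizes
        return value if minimize else -value

    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = objective(c), objective(d)
    for _ in range(max_iterations):
        if b - a < tolerance:
            break
        if fc < fd:
            # Optimum lies in [a, d]; the old c becomes the new d, so only one new fit is needed
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = objective(c)
        else:
            # Optimum lies in [c, b]; the old d becomes the new c
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = objective(d)
    return (a + b) / 2, scores


# Toy criterion with its minimum at 37.2, standing in for AICc as a function of bandwidth
print(interval_candidates(20, 80, 10))
best, evaluated = golden_section_search(lambda bw: (bw - 37.2) ** 2, 20, 80)
print(round(best, 1), len(evaluated))
```

The design trade-off is visible in the evaluation counts: the interval grid fits one model per candidate regardless of the criterion's shape, while golden-section search reuses one interior point per step and needs only about 20 fits to narrow a range of 60 down to 0.01, at the cost of assuming the criterion has a single optimum in the range.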
Attributes:
- scores_ : pd.Series
Series of criterion scores for each bandwidth tested, indexed by bandwidth.
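Because scores_ is described as a pandas Series indexed by bandwidth, it can be inspected or re-ranked directly, for example to look at near-optimal candidates rather than a single winner. The numbers below are invented purely for illustration and are not gwlearn output; the pick_optimal helper is hypothetical and simply mirrors what the minimize flag describes. Whether the library derives its own optimum in exactly this way is an assumption.

```python
import pandas as pd

# Illustrative made-up scores, NOT output from gwlearn: index = bandwidth, values = AICc
scores = pd.Series([412.3, 398.7, 395.1, 401.9], index=[20, 30, 40, 50], name="aicc")


def pick_optimal(scores, minimize=True):
    """Hypothetical helper: return the bandwidth (index label) with the best score."""
    return scores.idxmin() if minimize else scores.idxmax()


print(pick_optimal(scores))          # bandwidth with the lowest AICc
print(scores.sort_values().head(3))  # the three best candidates, best first
```

Keep in mind the min_proportion caveat above: if some locations are excluded at certain bandwidths, ranking information criteria in this way can be misleading.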
Examples
Interval search over a small set of candidate bandwidths:
>>> import geopandas as gpd
>>> from geodatasets import get_path
>>> from gwlearn.linear_model import GWLogisticRegression
>>> from gwlearn.search import BandwidthSearch
>>> gdf = gpd.read_file(get_path('geoda.guerry'))
>>> X = gdf[['Crm_prp', 'Litercy', 'Donatns', 'Lottery']]
>>> y = gdf["Region"] == 'E'
>>> search = BandwidthSearch(
...     GWLogisticRegression,
...     geometry=gdf.representative_point(),
...     fixed=False,
...     search_method="interval",
...     criterion="aicc",
...     min_bandwidth=20,
...     max_bandwidth=80,
...     interval=10,
...     max_iter=200,  # forwarded to GWLogisticRegression via **kwargs
... ).fit(X, y)
>>> search.optimal_bandwidth_
np.int64(40)

Methods
- __init__(model, *, geometry[, fixed, ...])
- fit(X, y): Fit the searcher by evaluating candidate bandwidths on the provided data.
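Conceptually, evaluating one candidate bandwidth means instantiating the model class with that bandwidth, fitting it, and reading the criterion using the getattr(model, m + "_") rule from the criterion description. The sketch below shows that loop with a hypothetical DummyGWModel whose criterion surface is invented; a real gwlearn estimator would also receive geometry, fixed, kernel, and n_jobs as described under model. The special cases "log_loss" and "prediction_rate" are deliberately not handled, since how the library computes them is not described here.

```python
class DummyGWModel:
    """Hypothetical stand-in for a gwlearn estimator class (not a real API)."""

    def __init__(self, bandwidth, **kwargs):
        self.bandwidth = bandwidth
        self.kwargs = kwargs  # a real estimator would use geometry, fixed, kernel, ...

    def fit(self, X, y):
        # Made-up criterion surface with its minimum at bandwidth 40
        self.aicc_ = (self.bandwidth - 40) ** 2 + 100.0
        self.bic_ = self.aicc_ + 5.0
        return self


def get_score(fitted, name):
    """Generic rule from the criterion docs: read the attribute name + "_"."""
    return getattr(fitted, name + "_")


def evaluate(model_cls, candidates, X, y, criterion="aicc", **kwargs):
    """Fit one model per candidate bandwidth and collect its criterion value."""
    scores = {}
    for bw in candidates:
        fitted = model_cls(bandwidth=bw, **kwargs).fit(X, y)
        scores[bw] = get_score(fitted, criterion)
    return scores


scores = evaluate(DummyGWModel, [20, 30, 40, 50, 60], X=None, y=None)
print(min(scores, key=scores.get))  # bandwidth with the lowest AICc
```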
Attributes