gwlearn.base.BaseRegressor

class gwlearn.base.BaseRegressor(model, *, bandwidth=None, fixed=False, kernel='bisquare', include_focal=False, geometry=None, graph=None, n_jobs=-1, fit_global_model=True, strict=False, keep_models=False, temp_folder=None, batch_size=None, verbose=False, **kwargs)

Generic geographically weighted regression meta-estimator.

This class wraps a scikit-learn-compatible regressor class and fits one local model per focal observation using spatially varying sample weights.
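
The idea of one weighted local fit per focal observation can be sketched with NumPy alone. This is an illustration, not a reproduction of gwlearn's internals: the grid coordinates, the synthetic data, the bisquare helper, and the fixed bandwidth of 3.0 are all assumptions, and plain weighted least squares stands in for whichever regressor class is wrapped.

```python
import numpy as np

# Hypothetical data: a 7 x 7 grid of points and a noise-free linear target.
gx, gy = np.meshgrid(np.arange(7.0), np.arange(7.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
x = np.sin(np.arange(len(coords)))
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])  # intercept + one feature


def bisquare(d, bandwidth):
    """Bisquare kernel: (1 - (d / b)^2)^2 inside the bandwidth, 0 outside."""
    u = d / bandwidth
    return np.where(u < 1, (1 - u**2) ** 2, 0.0)


local_coefs = np.empty((len(y), 2))
for i in range(len(y)):
    # Weights decay with distance from the focal observation i.
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = bisquare(d, bandwidth=3.0)
    # Weighted least squares: scale rows by sqrt(w), then solve as usual.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    local_coefs[i] = beta
```

Because the synthetic target is exactly linear, every row of local_coefs recovers the same intercept and slope; with real data the local coefficients would vary across space.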

The fitted object exposes focal predictions (pred_, in-sample if include_focal=True) and local goodness-of-fit summaries.

Prediction for new (out-of-sample) observations is not currently implemented for regressors.

Notes

  • Only point geometries are supported.

Parameters:
model : RegressorMixin

Class implementing the scikit-learn regressor API (e.g. sklearn.linear_model.LinearRegression). The class (not an instance) is instantiated internally for each local model.

bandwidth : float | int | None

Bandwidth for defining neighborhoods.

  • If fixed=True, this is a distance threshold.

  • If fixed=False, this is the number of nearest neighbors used to form the local neighborhood.

If graph is provided, bandwidth is ignored.
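
The two bandwidth modes can be illustrated with a small NumPy sketch. It assumes the bisquare kernel and assumes that an adaptive bandwidth of k uses the distance to the k-th nearest neighbor as the kernel width. The helper neighborhood_weights is hypothetical and not part of gwlearn; the library's handling of ties and of the focal point may differ.

```python
import numpy as np


def bisquare(distances, bandwidth):
    """Bisquare kernel: (1 - (d / b)^2)^2 inside the bandwidth, 0 outside."""
    u = np.asarray(distances, dtype=float) / bandwidth
    return np.where(u < 1, (1 - u**2) ** 2, 0.0)


def neighborhood_weights(distances, bandwidth, fixed):
    """Weights of all observations relative to one focal point (hypothetical helper)."""
    d = np.asarray(distances, dtype=float)
    if fixed:
        # Fixed: bandwidth is a distance threshold shared by every focal point.
        return bisquare(d, bandwidth)
    # Adaptive (assumed): kernel width is the distance to the k-th nearest neighbor.
    k = int(bandwidth)
    kth_distance = np.sort(d)[k - 1]
    return bisquare(d, kth_distance)


# Distances from one focal point to five other observations.
distances = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
w_fixed = neighborhood_weights(distances, bandwidth=5.0, fixed=True)
w_adaptive = neighborhood_weights(distances, bandwidth=3, fixed=False)
```

With a fixed threshold the number of contributing neighbors depends on local point density, whereas the adaptive mode adjusts the kernel width so that each neighborhood is built from the same number of neighbors.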

fixed : bool, optional

True for a distance-based bandwidth and False for an adaptive (nearest-neighbor) bandwidth. By default False.

kernel : str | Callable, optional

Type of kernel function used to weight observations, by default "bisquare".

include_focal : bool, optional

Include the focal observation when training its local model. Excluding it allows geographically weighted metrics to be assessed on unseen data without a train/test split, so every sample receives an out-of-sample value. This is needed for further spatial analysis of model performance (and generalises to models that do not support OOB scoring). However, it leaves out the most representative sample. By default False.
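
The effect of excluding the focal observation can be sketched as follows. This is a minimal illustration under stated assumptions, not gwlearn's implementation: the data are synthetic, focal_prediction is a hypothetical helper, and bisquare-weighted least squares stands in for the wrapped regressor.

```python
import numpy as np

# Hypothetical data: 7 x 7 grid, linear signal plus a deterministic wiggle.
gx, gy = np.meshgrid(np.arange(7.0), np.arange(7.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
idx = np.arange(len(coords))
x = np.sin(idx)
y = 1.0 + 2.0 * x + 0.5 * np.cos(3 * idx)
X = np.column_stack([np.ones_like(x), x])


def focal_prediction(i, include_focal, bandwidth=3.0):
    """Fit a bisquare-weighted least-squares model around i and predict y[i]."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    u = d / bandwidth
    w = np.where(u < 1, (1 - u**2) ** 2, 0.0)
    if not include_focal:
        w[i] = 0.0  # the focal observation does not influence its own model
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return X[i] @ beta


pred_in = np.array([focal_prediction(i, True) for i in idx])
pred_out = np.array([focal_prediction(i, False) for i in idx])
resid_in, resid_out = y - pred_in, y - pred_out
```

For a weighted least-squares fit, the leave-out residual is never smaller in magnitude than the in-sample residual, which is why excluding the focal observation gives a more conservative picture of local performance.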

geometry : gpd.GeoSeries, optional

Geographic location of the observations in the sample. Used to determine the spatial interaction weights as specified by the bandwidth, fixed, kernel, and include_focal keywords. Either geometry or graph needs to be specified. Specifying geometry is required to allow prediction.

graph : Graph, optional

Custom libpysal.graph.Graph object encoding the spatial interaction between observations in the sample. If given, it is used directly and the bandwidth, fixed, kernel, and include_focal keywords are ignored. Either geometry or graph needs to be specified. Specifying geometry is required to allow prediction. Both can be specified, in which case graph encodes the spatial interaction between the observations in geometry.

n_jobs : int, optional

The number of jobs to run in parallel. -1 means using all processors. By default -1.

fit_global_model : bool, optional

Whether to fit a global baseline model alongside the geographically weighted models, by default True.

strict : bool | None, optional

Do not fit any models if at least one neighborhood has invariant y, by default False. None is treated as False but issues a warning if there are invariant models.

keep_models : bool | str | Path, optional

Keep all local models (required for prediction), by default False. Note that for some models, like random forests, the objects can be large. If a string or Path is provided, the local models are not held in memory but serialized to disk, from which they are loaded for prediction.

temp_folder : str | None, optional

Folder to be used by the pool for memmapping large arrays for sharing memory with worker processes, e.g., /tmp. Passed to joblib.Parallel, by default None

batch_size : int | None, optional

Number of models to process in each batch. Specify batch_size if your models do not fit into memory. By default None

verbose : bool, optional

Whether to print progress information, by default False

**kwargs

Additional keyword arguments passed to model initialisation

pred_

Focal predictions for each location.

Type:

pd.Series

resid_

Residuals for each location (y - pred_).

Type:

pd.Series

RSS_

Residual sum of squares for each location.

Type:

pd.Series

TSS_

Total sum of squares for each location.

Type:

pd.Series

y_bar_

Weighted mean of y for each location.

Type:

pd.Series

local_r2_

Local R² for each location.

Type:

pd.Series
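
One common way to relate y_bar_, TSS_, RSS_, and local_r2_ is to weight squared deviations by the spatial weights of each focal location and take local R² as 1 - RSS / TSS. The sketch below shows that relationship as an assumption, not as gwlearn's confirmed formulas; the weight matrix W, the target y, and the predictions pred are made-up values.

```python
import numpy as np

# Hypothetical spatial weights: row i holds the weights around focal location i.
W = np.array(
    [
        [1.0, 0.5, 0.0],
        [0.5, 1.0, 0.5],
        [0.0, 0.5, 1.0],
    ]
)
y = np.array([1.0, 2.0, 4.0])
pred = np.array([1.5, 2.0, 3.5])  # stand-in for focal predictions

# Weighted mean of y around each focal location.
y_bar = (W @ y) / W.sum(axis=1)
# Weighted total sum of squares around each local mean.
TSS = (W * (y[None, :] - y_bar[:, None]) ** 2).sum(axis=1)
# Weighted residual sum of squares.
RSS = W @ (y - pred) ** 2
local_r2 = 1 - RSS / TSS
```

For the first location, the weighted mean is 4/3, TSS is 1/3, and RSS is 0.25, giving a local R² of 0.25.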

hat_values_

Hat values for each location (diagonal elements of hat matrix).

Type:

pd.Series

effective_df_

Effective degrees of freedom (sum of hat values).

Type:

float

log_likelihood_

Global log likelihood of the model.

Type:

float

aic_

Akaike information criterion of the model.

Type:

float

aicc_

Corrected Akaike information criterion to account for model complexity (smaller bandwidths).

Type:

float

bic_

Bayesian information criterion.

Type:

float
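
The textbook relationships between a log likelihood, a parameter count, and the three information criteria are sketched below. Treating the effective degrees of freedom as the parameter count k is an assumption, as are the input values; gwlearn's exact definitions may differ (for example in whether the error variance is counted as a parameter), and information_criteria is a hypothetical helper.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Textbook AIC, AICc, and BIC for k parameters and n observations."""
    aic = -2 * log_likelihood + 2 * k
    # Small-sample correction grows as k approaches n.
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = -2 * log_likelihood + k * math.log(n)
    return aic, aicc, bic


# Made-up inputs: 85 observations and 8.5 effective degrees of freedom.
aic, aicc, bic = information_criteria(log_likelihood=-120.0, k=8.5, n=85)
```

Smaller bandwidths generally raise the effective degrees of freedom, so the correction term in AICc penalises them more heavily than plain AIC does.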

Examples

>>> import geopandas as gpd
>>> from geodatasets import get_path
>>> from sklearn.linear_model import LinearRegression
>>> from gwlearn.base import BaseRegressor
>>> gdf = gpd.read_file(get_path('geoda.guerry'))
>>> X = gdf[['Crm_prp', 'Litercy', 'Donatns', 'Lottery']]
>>> y = gdf["Suicids"]
>>> gwr = BaseRegressor(
...     LinearRegression,
...     bandwidth=30,
...     fixed=False,
...     include_focal=True,
...     geometry=gdf.representative_point(),
... ).fit(X, y)
>>> gwr.local_r2_.head()
0    0.614715
1    0.488495
2    0.599862
3    0.662435
4    0.662276
dtype: float64

Methods

__init__(model, *[, bandwidth, fixed, ...])

fit(X, y)

Fit geographically weighted local regression models.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

score(X, y[, sample_weight])

Return coefficient of determination on test data.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, sample_weight])

Configure whether metadata should be requested to be passed to the score method.

fit(X, y)

Fit geographically weighted local regression models.

Fits one local model per focal observation and stores focal (in-sample if include_focal=True) predictions in pred_.

Parameters:
X : pandas.DataFrame

Feature matrix.

y : pandas.Series

Target values.

Returns:

Fitted estimator.

Return type:

self

Notes

The neighborhood definition comes from either self.graph or from self.geometry + (bandwidth, fixed, kernel, include_focal).

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
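
The <component>__<parameter> convention is inherited from scikit-learn and can be demonstrated on a plain scikit-learn Pipeline. The step names "scaler" and "ridge" are arbitrary choices for this sketch, and it deliberately uses scikit-learn objects only rather than BaseRegressor.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A nested estimator: each step is addressable by its name.
pipe = Pipeline([("scaler", StandardScaler()), ("ridge", Ridge(alpha=1.0))])

# Update a parameter of the nested Ridge step; set_params returns the estimator.
returned = pipe.set_params(ridge__alpha=0.5)

# deep=True exposes nested parameters, deep=False only the top-level ones.
deep_params = pipe.get_params(deep=True)
shallow_params = pipe.get_params(deep=False)
```

Because set_params returns the estimator itself, calls can be chained, which is what tools such as grid search rely on when they clone and reconfigure estimators.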