gwlearn.linear_model.GWLinearRegression

class gwlearn.linear_model.GWLinearRegression(bandwidth=None, fixed=False, kernel='bisquare', include_focal=True, geometry=None, graph=None, n_jobs=-1, fit_global_model=True, strict=False, keep_models=False, temp_folder=None, batch_size=None, verbose=False, **kwargs)

Geographically weighted linear regression

Fits one sklearn.linear_model.LinearRegression per focal observation using spatially varying sample weights.

The fitted object exposes focal predictions (pred_, in-sample if include_focal=True) and local goodness-of-fit summaries.

Prediction for new (out-of-sample) observations is not currently implemented for regressors.
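
The idea of one weighted LinearRegression per focal observation can be sketched with scikit-learn alone. This is a conceptual illustration, not gwlearn's implementation: the synthetic data, the `bisquare` helper, and the fixed bandwidth of 4.0 are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data (assumed): 50 locations, one feature, slope drifting west to east.
n = 50
coords = rng.uniform(0, 10, size=(n, 2))
X = rng.normal(size=(n, 1))
slope = 1.0 + 0.2 * coords[:, 0]
y = slope * X[:, 0] + rng.normal(scale=0.1, size=n)


def bisquare(distance, bandwidth):
    """Illustrative bisquare kernel: weight falls to zero at the bandwidth."""
    u = np.clip(distance / bandwidth, 0.0, 1.0)
    return (1.0 - u**2) ** 2


bandwidth = 4.0  # hypothetical fixed distance threshold
focal_pred = np.empty(n)
local_slope = np.empty(n)

for i in range(n):
    # Weight every observation by its distance to the focal location i.
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = bisquare(d, bandwidth)
    keep = w > 0  # observations outside the bandwidth do not contribute
    model = LinearRegression().fit(X[keep], y[keep], sample_weight=w[keep])
    # The focal point is part of its own neighbourhood here (weight 1),
    # which mirrors the in-sample case described for include_focal=True.
    focal_pred[i] = model.predict(X[i : i + 1])[0]
    local_slope[i] = model.coef_[0]
```

Each iteration yields one focal prediction and one set of local coefficients, which is the shape of information that pred_, local_coef_, and local_intercept_ are documented to hold.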

Parameters:

bandwidth : float | int | None

Bandwidth for defining neighborhoods.

If fixed=True, this is a distance threshold.
If fixed=False, this is the number of nearest neighbors used to form the local neighborhood.

If graph is provided, bandwidth is ignored.

fixed : bool, optional

True for distance-based bandwidth and False for adaptive (nearest neighbor) bandwidth, by default False

kernel : str | Callable, optional

Type of kernel function used to weight observations, by default "bisquare"

include_focal : bool, optional

Whether to include the focal observation when training its local model. Excluding it allows geographically weighted metrics to be assessed on unseen data without a train/test split, so a value is available for every sample. This is needed for further spatial analysis of the model performance (and generalises to models that do not support OOB scoring). However, it leaves out the most representative sample. By default True

geometry : gpd.GeoSeries, optional

Geographic locations of the observations in the sample. Used to determine the spatial interaction weights as specified by the bandwidth, fixed, kernel, and include_focal keywords. Either geometry or graph needs to be specified. To allow prediction, geometry must be specified.

graph : Graph, optional

Custom libpysal.graph.Graph object encoding the spatial interaction between observations in the sample. If given, it is used directly and the bandwidth, fixed, kernel, and include_focal keywords are ignored. Either geometry or graph needs to be specified. To allow prediction, geometry must be specified. Both can be specified, in which case graph encodes the spatial interaction between the observations in geometry.

n_jobs : int, optional

The number of jobs to run in parallel. -1 means using all processors, by default -1

fit_global_model : bool, optional

Determines whether the global baseline model is fitted alongside the geographically weighted models, by default True

strict : bool | None, optional

Do not fit any models if at least one neighborhood has invariant y, by default False. None is treated as False but issues a warning if there are invariant models.

keep_models : bool | str | Path, optional

Keep all local models (required for prediction), by default False. Note that for some models, like random forests, the objects can be large. If a string or Path is provided, the local models are not held in memory but serialized to disk, from which they are loaded for prediction.

temp_folder : str | None, optional

Folder to be used by the pool for memmapping large arrays for sharing memory with worker processes, e.g., /tmp. Passed to joblib.Parallel, by default None

batch_size : int | None, optional

Number of models to process in each batch. Specify batch_size if your models do not fit into memory. By default None

verbose : bool, optional

Whether to print progress information, by default False

**kwargs

Additional keyword arguments passed to sklearn.linear_model.LinearRegression initialisation
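
The difference between a fixed and an adaptive bandwidth can be illustrated with plain NumPy. The bisquare formula below is the standard one; how gwlearn derives the distance threshold from the k-th neighbour is not stated here, so this sketch simply assumes the k-th nearest distance is used as the threshold (which gives that neighbour a weight of zero under bisquare). The distances and bandwidth values are made up for the example.

```python
import numpy as np


def bisquare(distance, bandwidth):
    """Standard bisquare kernel: (1 - (d/b)^2)^2 inside the bandwidth, 0 outside."""
    u = np.clip(distance / bandwidth, 0.0, 1.0)
    return (1.0 - u**2) ** 2


# Distances from one focal point to six other observations (illustrative values).
distances = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 8.0])

# fixed=True analogue: bandwidth is a distance threshold shared by all focal points.
fixed_weights = bisquare(distances, bandwidth=3.5)

# fixed=False analogue: bandwidth is a neighbour count, so the distance
# threshold adapts to how dense the data are around the focal point.
k = 3
kth_distance = np.sort(distances)[k - 1]  # assumption: threshold = k-th nearest distance
adaptive_weights = bisquare(distances, bandwidth=kth_distance)

print(np.round(fixed_weights, 4))
print(np.round(adaptive_weights, 4))
```

With a fixed bandwidth the number of contributing neighbours varies from place to place, while an adaptive bandwidth holds the neighbour count steady and lets the distance threshold vary.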

pred_

Focal predictions for each location.

Type: pd.Series

resid_

Residuals for each location (y - pred_).

Type: pd.Series

RSS_

Residual sum of squares for each location.

Type: pd.Series

TSS_

Total sum of squares for each location.

Type: pd.Series

y_bar_

Weighted mean of y for each location.

Type: pd.Series

local_r2_

Local R² for each location.

Type: pd.Series

hat_values_

Hat values for each location (diagonal elements of the hat matrix).

Type: pd.Series

effective_df_

Effective degrees of freedom (sum of hat values).

Type: float

log_likelihood_

Global log likelihood of the model.

Type: float

aic_

Akaike information criterion of the model.

Type: float

aicc_

Corrected Akaike information criterion to account for model complexity (smaller bandwidths).

Type: float

bic_

Bayesian information criterion.

Type: float

local_coef_

Local coefficients for each feature at each location.

Type: pd.DataFrame

local_intercept_

Local intercept values at each location.

Type: pd.Series
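
How the diagnostic attributes above relate to one another can be sketched in NumPy. The hat-matrix row and the sum-of-hat-values definition of effective degrees of freedom follow the descriptions above. The weighted RSS, TSS, and local R² = 1 - RSS/TSS formulas are the usual GWR convention and are an assumption here, since the exact formulas gwlearn uses are not stated. Data, kernel helper, and bandwidth are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
coords = rng.uniform(0, 10, size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one feature
y = 2.0 + (1.0 + 0.1 * coords[:, 1]) * X[:, 1] + rng.normal(scale=0.2, size=n)


def bisquare(distance, bandwidth):
    u = np.clip(distance / bandwidth, 0.0, 1.0)
    return (1.0 - u**2) ** 2


# W[i, j]: kernel weight of observation j in the local model at focal i.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
W = bisquare(dist, bandwidth=8.0)  # hypothetical fixed bandwidth

pred = np.empty(n)
hat_values = np.empty(n)
for i in range(n):
    XtW = X.T * W[i]                    # X' W_i, shape (2, n)
    A = np.linalg.solve(XtW @ X, XtW)   # (X' W_i X)^-1 X' W_i
    row = X[i] @ A                      # i-th row of the hat matrix
    pred[i] = row @ y                   # focal prediction
    hat_values[i] = row[i]              # diagonal element of the hat matrix

effective_df = hat_values.sum()         # sum of hat values

# Assumed GWR-style local goodness of fit, weighted by the same kernel.
resid = y - pred
y_bar = (W @ y) / W.sum(axis=1)                              # weighted mean of y
TSS = (W * (y[None, :] - y_bar[:, None]) ** 2).sum(axis=1)   # weighted total SS
RSS = (W * resid[None, :] ** 2).sum(axis=1)                  # weighted residual SS
local_r2 = 1.0 - RSS / TSS
```

Smaller bandwidths concentrate weight on fewer observations, which raises the hat values and therefore the effective degrees of freedom; this is the model complexity that the corrected AIC is described as accounting for.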

Examples

>>> import geopandas as gpd
>>> from geodatasets import get_path
>>> from gwlearn.linear_model import GWLinearRegression

>>> gdf = gpd.read_file(get_path('geoda.guerry'))
>>> X = gdf[['Crm_prp', 'Litercy', 'Donatns', 'Lottery']]
>>> y = gdf["Suicids"]

>>> gwr = GWLinearRegression(
...     bandwidth=30,
...     fixed=False,
...     geometry=gdf.representative_point(),
... ).fit(X, y)
>>> gwr.local_r2_.head()
0    0.614715
1    0.488495
2    0.599862
3    0.662435
4    0.662276
dtype: float64

Methods

`__init__`([bandwidth, fixed, kernel, ...])
`fit`(X, y)	Fit geographically weighted local regression models.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`score`(X, y[, sample_weight])	Return coefficient of determination on test data.
`set_params`(**params)	Set the parameters of this estimator.
`set_score_request`(*[, sample_weight])	Configure whether metadata should be requested to be passed to the `score` method.

fit(X, y)

Fit geographically weighted local regression models.

Fits one local model per focal observation and stores focal (in-sample if include_focal=True) predictions in pred_.

Parameters:

X : pandas.DataFrame

Feature matrix.

y : pandas.Series

Target values.

Returns:

Fitted estimator.

Return type:

self

Notes

The neighborhood definition comes from either self.graph or from self.geometry + (bandwidth, fixed, kernel, include_focal).

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict

Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
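
The <component>__<parameter> form described for set_params, and the effect of deep in get_params, can be illustrated with a plain scikit-learn Pipeline. The step names "scale" and "reg" are arbitrary labels chosen for this sketch; it does not show anything specific to GWLinearRegression.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("reg", LinearRegression())])

# Nested parameters are addressed as <component>__<parameter>.
# set_params returns the estimator itself, so calls can be chained.
returned = pipe.set_params(reg__fit_intercept=False)

# deep=True also lists the parameters of contained estimators;
# deep=False lists only the pipeline's own parameters.
params = pipe.get_params(deep=True)
shallow = pipe.get_params(deep=False)

print(params["reg__fit_intercept"])
print("reg__fit_intercept" in shallow)
```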