gwlearn.base.BaseRegressor
class gwlearn.base.BaseRegressor(model, *, bandwidth=None, fixed=False, kernel='bisquare', include_focal=False, graph=None, n_jobs=-1, fit_global_model=True, strict=False, keep_models=False, temp_folder=None, batch_size=None, random_state=None, verbose=False, **kwargs)
Generic geographically weighted regression meta-estimator.
This class wraps a scikit-learn-compatible regressor class and fits one local model per focal observation using spatially varying sample weights.
The fitted object exposes focal predictions (pred_, in-sample if include_focal=True) and local goodness-of-fit summaries.
Prediction for new (out-of-sample) observations is not currently implemented for regressors.
Notes
Only point geometries are supported.
- Parameters:
- model : RegressorMixin
Class implementing the scikit-learn regressor API (e.g. sklearn.linear_model.LinearRegression). The class (not an instance) is instantiated internally for each local model.
- bandwidth : float | int | None
Bandwidth for defining neighborhoods.
If fixed=True, this is a distance threshold.
If fixed=False, this is the number of nearest neighbors used to form the local neighborhood.
If graph is provided, bandwidth is ignored.
- fixed : bool, optional
True for distance-based bandwidth and False for adaptive (nearest neighbor) bandwidth, by default False
- kernel : str | Callable, optional
Type of kernel function used to weight observations, by default “bisquare”
- include_focal : bool, optional
Include the focal observation in the local model training. Excluding it allows assessment of geographically weighted metrics on unseen data without a need for a train/test split, hence providing a value for all samples. This is needed for further spatial analysis of the model performance (and generalises to models that do not support OOB scoring). However, it leaves out the most representative sample. By default False
- graph : Graph, optional
Custom libpysal.graph.Graph object encoding the spatial interaction between observations in the sample. If given, it is used directly and the bandwidth, fixed, kernel, and include_focal keywords are ignored. Either geometry or graph needs to be specified. To allow prediction, it is required to specify geometry. Potentially, both can be specified, where graph encodes spatial interaction between observations in geometry.
- n_jobs : int, optional
The number of jobs to run in parallel. -1 means using all processors, by default -1
- fit_global_model : bool, optional
Determines if the global baseline model shall be fitted alongside the geographically weighted one, by default True
- strict : bool | None, optional
Do not fit any models if at least one neighborhood has invariant y, by default False. None is treated as False but provides a warning if there are invariant models.
- keep_models : bool | str | Path, optional
Keep all local models (required for prediction), by default False. Note that for some models, like random forests, the objects can be large. If a string or Path is provided, the local models are not held in memory but serialized to disk, from which they are loaded in prediction.
- temp_folder : str | None, optional
Folder to be used by the pool for memmapping large arrays for sharing memory with worker processes, e.g., /tmp. Passed to joblib.Parallel, by default None
- batch_size : int | None, optional
Number of models to process in each batch. Specify batch_size if your models do not fit into memory. By default None
- random_state : int | None, optional
Random seed for reproducibility, by default None
- verbose : bool, optional
Whether to print progress information, by default False
- **kwargs
Additional keyword arguments passed to model initialisation
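To illustrate how bandwidth, fixed, kernel and include_focal interact, the following is a minimal pure-Python sketch of the weights computed for a single focal observation. It is not gwlearn's implementation: the helpers bisquare and local_weights are hypothetical names, and using the distance to the k-th nearest neighbor as the adaptive threshold is an assumption made for illustration.

```python
import math


def bisquare(distance, threshold):
    """Bisquare kernel: (1 - (d / h)^2)^2 inside the threshold h, 0 outside."""
    if distance >= threshold:
        return 0.0
    return (1.0 - (distance / threshold) ** 2) ** 2


def local_weights(points, focal_index, bandwidth, fixed=False, include_focal=False):
    """Hypothetical helper: weights of all points for one focal observation."""
    focal = points[focal_index]
    distances = [math.dist(focal, p) for p in points]
    if fixed:
        # fixed=True: bandwidth is a distance threshold
        threshold = float(bandwidth)
    else:
        # fixed=False: bandwidth is a neighbor count. ASSUMPTION: the distance
        # to the k-th nearest neighbor serves as the local threshold.
        neighbor_distances = sorted(
            d for i, d in enumerate(distances) if i != focal_index
        )
        threshold = neighbor_distances[int(bandwidth) - 1]
    weights = [bisquare(d, threshold) for d in distances]
    if not include_focal:
        # leave the focal observation out of its own local model
        weights[focal_index] = 0.0
    return weights


points = [(0, 0), (1, 0), (2, 0), (4, 0)]
fixed_weights = local_weights(points, 0, bandwidth=2, fixed=True)
adaptive_weights = local_weights(points, 0, bandwidth=3, fixed=False)
```

With a fixed bandwidth the same distance threshold applies everywhere, whereas the adaptive variant widens or narrows the threshold depending on how densely observations surround the focal point.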
- hat_values_
Hat values for each location (diagonal elements of hat matrix).
- Type:
pd.Series
- aicc_
Corrected Akaike information criterion to account for model complexity (smaller bandwidths).
Examples
>>> import geopandas as gpd
>>> from geodatasets import get_path
>>> from sklearn.linear_model import LinearRegression
>>> from gwlearn.base import BaseRegressor

>>> gdf = gpd.read_file(get_path('geoda.guerry'))
>>> X = gdf[['Crm_prp', 'Litercy', 'Donatns', 'Lottery']]
>>> y = gdf["Suicids"]

>>> gwr = BaseRegressor(
...     LinearRegression,
...     bandwidth=30,
...     fixed=False,
...     include_focal=True,
... ).fit(X, y, geometry=gdf.representative_point())
>>> gwr.local_r2_.head()
0    0.614715
1    0.488495
2    0.599862
3    0.662435
4    0.662276
dtype: float64
Methods
- __init__(model, *[, bandwidth, fixed, ...])
- fit(X, y[, geometry])
Fit geographically weighted local regression models.
- get_metadata_routing()
Get metadata routing of this object.
- get_params([deep])
Get parameters for this estimator.
- predict(X, geometry[, bandwidth, ...])
Predict target values for new observations.
- score(X, y, geometry[, bandwidth, ...])
Return the coefficient of determination R^2 of the prediction.
- set_fit_request(*[, geometry])
Configure whether metadata should be requested to be passed to the fit method.
- set_params(**params)
Set the parameters of this estimator.
- set_predict_request(*[, bandwidth, ...])
Configure whether metadata should be requested to be passed to the predict method.
- set_score_request(*[, bandwidth, geometry, ...])
Configure whether metadata should be requested to be passed to the score method.
- fit(X, y, geometry=None)
Fit geographically weighted local regression models.
Fits one local model per focal observation and stores focal (in-sample if include_focal=True) predictions in pred_.
- Parameters:
- X : pandas.DataFrame
Feature matrix.
- y : pandas.Series
Target values.
- geometry : geopandas.GeoSeries | None
Geographic location of the observations in the sample. Used to determine the spatial interaction weight based on the specification given by the bandwidth, fixed, kernel, and include_focal keywords. If None, a precomputed graph needs to be specified. To allow prediction, it is required to specify geometry. If both graph and geometry are specified, graph is used at fit time, while geometry is used for prediction.
- Returns:
Fitted estimator.
- Return type:
self
Notes
The neighborhood definition comes from either self.graph or from geometry + (bandwidth, fixed, kernel, include_focal).
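As a rough illustration of what fitting one local model per focal observation means, the sketch below runs a one-feature weighted least-squares fit around every observation and collects the focal predictions. It is a simplified stand-in, not gwlearn's code: weighted_line and focal_predictions are hypothetical helpers, a fixed-distance bisquare kernel is assumed, and the toy data are made up.

```python
import math


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def focal_predictions(coords, x, y, bandwidth, include_focal=False):
    """Fit one local line per observation and predict at that observation."""
    preds = []
    for i, focal in enumerate(coords):
        weights = []
        for j, other in enumerate(coords):
            d = math.dist(focal, other)
            # bisquare weight within a fixed distance bandwidth (assumption)
            w = (1 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
            if j == i and not include_focal:
                w = 0.0  # the focal observation is unseen by its own model
            weights.append(w)
        intercept, slope = weighted_line(x, y, weights)
        preds.append(intercept + slope * x[i])
    return preds


# toy data: locations on a line, target is an exact linear function of x
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
x = [1.0, 2.0, 4.0, 3.0, 5.0]
y = [2 * xi + 1 for xi in x]
pred = focal_predictions(coords, x, y, bandwidth=3)
```

Because the toy target is exactly linear, every local model recovers it and the focal predictions match y even though each focal observation is excluded from its own fit.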
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- predict(X, geometry, bandwidth='nearest', global_model_weight=0)
Predict target values for new observations.
Prediction can be retrieved either from the nearest local model or based on the ensemble of local models. In the latter case, the prediction process works as follows:
1. For a new location on which you want a prediction, identify local models within the bandwidth used to train the model.
2. Apply the kernel function used to train the model to derive the weight of each of the local models.
3. Make a prediction using each of the local models in the bandwidth.
4. Compute the weighted average of the predictions based on the kernel weights.
The results from the nearest and ensemble predictions are typically similar, with the ensemble being significantly slower due to the required number of inference calls.
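The two prediction modes can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the library's implementation: local models are represented by plain callables, predict_nearest and predict_ensemble are hypothetical helpers, the kernel is assumed to be bisquare with a distance bandwidth, and returning NaN when no model lies within the bandwidth is a choice made for this sketch only.

```python
import math


def bisquare(distance, bandwidth):
    """Bisquare kernel weight; zero at and beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def predict_nearest(location, model_locations, models, features):
    """Use the single local model fitted closest to the new location."""
    nearest = min(
        range(len(models)),
        key=lambda i: math.dist(location, model_locations[i]),
    )
    return models[nearest](features)


def predict_ensemble(location, model_locations, models, features, bandwidth):
    """Kernel-weighted average of the local models within the bandwidth."""
    weighted_sum = 0.0
    weight_total = 0.0
    for model_location, model in zip(model_locations, models):
        # identify models in range and derive their kernel weights
        weight = bisquare(math.dist(location, model_location), bandwidth)
        if weight > 0.0:
            # one inference call per local model in range
            weighted_sum += weight * model(features)
            weight_total += weight
    if weight_total == 0.0:
        return float("nan")  # no local model in range (sketch-only behavior)
    return weighted_sum / weight_total


# three made-up local models, each a simple function of the features
model_locations = [(0, 0), (2, 0), (10, 0)]
models = [lambda f: 10.0 * f[0], lambda f: 20.0 * f[0], lambda f: 100.0 * f[0]]
features = [1.0]

nearest_value = predict_nearest((0.5, 0), model_locations, models, features)
ensemble_value = predict_ensemble((1, 0), model_locations, models, features, bandwidth=2)
```

The ensemble needs one inference call per local model in range, which is where its extra cost relative to the nearest-model mode comes from.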
Further, the prediction can be a result of a fusion of local and global models when global_model_weight is set to a non-zero value, following Georganos et al. [2021].
- Parameters:
- X : pandas.DataFrame
Feature matrix for new observations.
- geometry : geopandas.GeoSeries
Point geometries for new observations.
- bandwidth : "nearest", float or None
Prediction method. "nearest" uses the nearest location available at fit time and predicts using its single model. When set to a numeric value, uses an ensemble of local models available within the bandwidth, with predictions from individual models being weighted based on the distance and a set kernel. When None, uses the bandwidth set at fit time.
- global_model_weight : float
Weight of the prediction from the global model. When non-zero, the resulting prediction is a weighted average of the values from the local model(s) and from the global model, where the local prediction has a weight of 1 and the global model has a weight equal to global_model_weight.
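The fusion described for global_model_weight amounts to a two-term weighted average. A minimal sketch, assuming the weights are normalized by their sum (1 + global_model_weight), which the description implies but does not spell out; fuse_predictions is a hypothetical helper, not part of the library:

```python
def fuse_predictions(local_prediction, global_prediction, global_model_weight=0.0):
    """Weighted average with weight 1 for the local prediction and
    global_model_weight for the global prediction."""
    total = 1.0 + global_model_weight
    return (local_prediction + global_model_weight * global_prediction) / total


local_only = fuse_predictions(10.0, 20.0)  # weight 0 ignores the global model
equal_mix = fuse_predictions(10.0, 20.0, global_model_weight=1.0)
quarter_mix = fuse_predictions(10.0, 20.0, global_model_weight=0.25)
```

Under this reading, a weight of 1 gives the local and global predictions equal influence, and smaller weights pull the result toward the local prediction.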
- Returns:
Predicted values.
- Return type:
pandas.Series
Notes
Requires the estimator to have been fit with keep_models=True (or a Path) so local models can be used at prediction time.
- score(X, y, geometry, bandwidth='nearest', global_model_weight=0)
Return the coefficient of determination R^2 of the prediction.
- Parameters:
- X : pandas.DataFrame
Feature matrix for new observations.
- y : pandas.Series
True values for X.
- geometry : geopandas.GeoSeries
Point geometries for new observations.
- bandwidth : "nearest", float or None
Prediction method. See predict().
- global_model_weight : float
Weight of the prediction from the global model.
- Returns:
R^2 of self.predict(X, geometry).
- Return type:
float
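For reference, the coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares. The helper below is a hypothetical plain-Python sketch of that standard formula, not the code path used by score():

```python
def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (undefined when y_true is constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


perfect = coefficient_of_determination([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = coefficient_of_determination([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
partial = coefficient_of_determination([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

A value of 1 indicates perfect predictions, 0 matches a model that always predicts the mean of y, and negative values are possible for predictions worse than that baseline.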
- set_fit_request(*, geometry='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- set_predict_request(*, bandwidth='$UNCHANGED$', geometry='$UNCHANGED$', global_model_weight='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- bandwidth : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for bandwidth parameter in predict.
- geometry : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for geometry parameter in predict.
- global_model_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for global_model_weight parameter in predict.
- Returns:
self – The updated object.
- Return type:
- set_score_request(*, bandwidth='$UNCHANGED$', geometry='$UNCHANGED$', global_model_weight='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- bandwidth : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for bandwidth parameter in score.
- geometry : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for geometry parameter in score.
- global_model_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for global_model_weight parameter in score.
- Returns:
self – The updated object.
- Return type: