gwlearn.base.BaseClassifier

class gwlearn.base.BaseClassifier(model, *, bandwidth=None, fixed=False, kernel='bisquare', include_focal=False, geometry=None, graph=None, n_jobs=-1, fit_global_model=True, measure_performance=True, strict=False, keep_models=False, temp_folder=None, batch_size=None, min_proportion=0.2, undersample=False, leave_out=None, random_state=None, verbose=False, **kwargs)

Generic geographically weighted classification meta-class

Parameters:
model : model class

Scikit-learn model class

bandwidth : int | float

Bandwidth value: a distance when fixed is True, or a number of nearest neighbors when fixed is False

fixed : bool, optional

True for a distance-based bandwidth and False for an adaptive (nearest neighbor) bandwidth, by default False

kernel : str | Callable, optional

Type of kernel function used to weight observations, by default “bisquare”
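How bandwidth, fixed and kernel interact can be illustrated with a minimal standard-library sketch. It assumes the textbook bisquare kernel, w = (1 - (d/b)^2)^2 for d < b and 0 otherwise, and treats an adaptive bandwidth as the distance to the N-th nearest neighbor. The helper names are hypothetical and gwlearn's internal implementation may differ in detail.

```python
import math


def bisquare(distance, bandwidth):
    """Textbook bisquare kernel: the weight decays to 0 at the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1 - (distance / bandwidth) ** 2) ** 2


def local_weights(focal, points, bandwidth, fixed=False):
    """Weights of `points` relative to `focal` (hypothetical helper).

    The focal point itself is not part of `points`, which mirrors
    include_focal=False.
    """
    distances = [math.dist(focal, p) for p in points]
    if fixed:
        # Fixed: the bandwidth is a distance threshold.
        threshold = bandwidth
    else:
        # Adaptive (assumption): the threshold is the distance
        # to the N-th nearest neighbor.
        threshold = sorted(distances)[int(bandwidth) - 1]
    return [bisquare(d, threshold) for d in distances]


points = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, 10.0)]
fixed_w = local_weights((0.0, 0.0), points, bandwidth=4.0, fixed=True)
adaptive_w = local_weights((0.0, 0.0), points, bandwidth=3, fixed=False)
```

In this sketch the adaptive threshold shrinks or grows with the local point density, while the fixed threshold stays the same everywhere.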

include_focal : bool, optional

Include the focal observation in the training of its local model. Excluding it allows geographically weighted metrics to be assessed on unseen data without a train/test split, hence providing a value for every sample. This is needed for further spatial analysis of the model performance (and generalises to models that do not support OOB scoring). However, it leaves out the most representative sample. By default False

geometry : gpd.GeoSeries, optional

Geographic location of the observations in the sample. Used to determine the spatial interaction weights as specified by the bandwidth, fixed, kernel, and include_focal keywords. Either geometry or graph needs to be specified. Prediction requires geometry.

graph : Graph, optional

Custom libpysal.graph.Graph object encoding the spatial interaction between observations in the sample. If given, it is used directly and the bandwidth, fixed, kernel, and include_focal keywords are ignored. Either geometry or graph needs to be specified. Prediction requires geometry. Both can be specified, in which case graph encodes the spatial interaction between the observations in geometry.

n_jobs : int, optional

The number of jobs to run in parallel. -1 means using all processors. By default -1

fit_global_model : bool, optional

Determines whether a global baseline model is fitted alongside the geographically weighted model, by default True

measure_performance : bool | list, optional

Calculate performance metrics for the model. If True, measures accuracy score, precision, recall, balanced accuracy, F1 scores and log loss. A subset of these can be specified by passing a list of strings. By default True

strict : bool | None, optional

Do not fit any models if at least one neighborhood has invariant y, by default False. None is treated as False but issues a warning if there are invariant models.

keep_models : bool | str | Path, optional

Keep all local models (required for prediction), by default False. Note that for some models, like random forests, the objects can be large. If string or Path is provided, the local models are not held in memory but serialized to the disk from which they are loaded in prediction.

temp_folder : str | None, optional

Folder to be used by the pool for memmapping large arrays for sharing memory with worker processes, e.g., /tmp. Passed to joblib.Parallel, by default None

batch_size : int | None, optional

Number of models to process in each batch. Specify batch_size if your models do not fit into memory. By default None
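As a rough illustration of what batching means here, the sketch below splits the indices of the local models into consecutive chunks of at most batch_size. The generator name is hypothetical and the way gwlearn actually schedules batches is an assumption.

```python
def iter_batches(n_models, batch_size=None):
    """Yield ranges of local-model indices to process together (hypothetical helper)."""
    if batch_size is None:
        # No batching: a single batch holds every local model.
        yield range(n_models)
        return
    for start in range(0, n_models, batch_size):
        # The last batch may be shorter than batch_size.
        yield range(start, min(start + batch_size, n_models))


batches = [list(b) for b in iter_batches(10, batch_size=4)]
single = [list(b) for b in iter_batches(3)]
```

A smaller batch_size keeps fewer neighborhoods in memory at once, at the cost of more scheduling rounds.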

min_proportion : float, optional

Minimum proportion of minority class for a model to be fitted, by default 0.2
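The check implied by min_proportion can be sketched as follows. This is only an illustration: the helper names are hypothetical, the labels are made up, and whether gwlearn uses an inclusive comparison or unweighted counts is an assumption. The last line mirrors how the share of fitted models relates to prediction_rate_.

```python
from collections import Counter


def minority_proportion(labels):
    """Share of the least frequent class in a neighborhood (hypothetical helper)."""
    counts = Counter(labels)
    if len(counts) < 2:
        # Invariant neighborhood: no minority class is present at all.
        return 0.0
    return min(counts.values()) / len(labels)


def should_fit(labels, min_proportion=0.2):
    """Assumed rule: fit only if the minority class is frequent enough."""
    return minority_proportion(labels) >= min_proportion


neighborhoods = {
    "balanced": [0, 1, 0, 1, 1],
    "imbalanced": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "invariant": [1, 1, 1, 1],
}
fitted = {name: should_fit(y) for name, y in neighborhoods.items()}
# Proportion of neighborhoods for which a local model would be fitted.
prediction_rate = sum(fitted.values()) / len(fitted)
```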

undersample : bool, optional

Whether to apply random undersampling to balance classes, by default False

leave_out : float | int, optional

Leave out a fraction (when float) or a set number (when int) of random observations from each local model to be used to measure out-of-sample log loss based on pooled samples from all the models. This is useful for bandwidth selection for cases where some local models are not fitted due to local invariance and resulting information criteria are not comparable.
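The float versus int interpretation of leave_out can be sketched with the standard library. The function name, the truncating rounding rule and the use of random.Random are assumptions made for illustration, not gwlearn's implementation.

```python
import random


def split_leave_out(indices, leave_out, seed=None):
    """Split neighborhood indices into (train, held_out) (hypothetical helper)."""
    rng = random.Random(seed)
    if isinstance(leave_out, float):
        # Float: a fraction of the neighborhood (truncation is an assumption).
        n_out = int(len(indices) * leave_out)
    else:
        # Int: a set number of observations.
        n_out = int(leave_out)
    held_out = set(rng.sample(indices, n_out))
    train = [i for i in indices if i not in held_out]
    return train, sorted(held_out)


neighborhood = list(range(20))
train_frac, out_frac = split_leave_out(neighborhood, 0.2, seed=0)
train_n, out_n = split_leave_out(neighborhood, 3, seed=0)
```

The held-out observations from every neighborhood would then be pooled to compute a single out-of-sample log loss, as described for oos_log_loss_.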

random_state : int | None, optional

Random seed for reproducibility, by default None

verbose : bool, optional

Whether to print progress information, by default False

**kwargs

Additional keyword arguments passed to model initialisation

Attributes:
proba_ : pd.DataFrame

Probability predictions for focal locations based on a local model trained around the point itself.

pred_ : pd.Series

Binary predictions for focal locations based on a local model trained around the location itself.

hat_values_ : pd.Series

Hat values for each location (diagonal elements of hat matrix)

effective_df_ : float

Effective degrees of freedom (sum of hat values)

score_ : float

Accuracy score of the model based on pred_.

precision_ : float

Precision score of the model based on pred_.

recall_ : float

Recall score of the model based on pred_.

balanced_accuracy_ : float

Balanced accuracy score of the model based on pred_.

f1_macro_ : float

F1 score with macro averaging based on pred_.

f1_micro_ : float

F1 score with micro averaging based on pred_.

f1_weighted_ : float

F1 score with weighted averaging based on pred_.

log_loss_ : float

Log loss of the model based on proba_.

log_likelihood_ : float

Global log likelihood of the model

aic_ : float

Akaike information criterion of the model

aicc_ : float

Corrected Akaike information criterion to account for model complexity (smaller bandwidths)

bic_ : float

Bayesian information criterion
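For orientation, the relationship between these attributes can be sketched with the standard textbook definitions, taking the effective degrees of freedom (sum of hat values) as the parameter count k: AIC = 2k - 2 ln L, AICc = AIC + 2k(k + 1) / (n - k - 1), BIC = k ln n - 2 ln L. Whether gwlearn uses exactly these expressions is an assumption, and the numbers below are made up.

```python
import math


def information_criteria(log_likelihood, effective_df, n_obs):
    """Textbook AIC, AICc and BIC (assumed forms, not taken from gwlearn)."""
    k = effective_df
    aic = 2 * k - 2 * log_likelihood
    # The small-sample correction grows as k approaches n_obs,
    # which happens with smaller bandwidths.
    aicc = aic + (2 * k * (k + 1)) / (n_obs - k - 1)
    bic = k * math.log(n_obs) - 2 * log_likelihood
    return {"aic": aic, "aicc": aicc, "bic": bic}


hat_values = [0.25] * 20          # made-up hat values, one per location
effective_df = sum(hat_values)    # effective degrees of freedom
criteria = information_criteria(
    log_likelihood=-10.0, effective_df=effective_df, n_obs=len(hat_values)
)
```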

prediction_rate_ : float

Proportion of models that are fitted, where the rest are skipped due to not fulfilling min_proportion.

oos_log_loss_ : float

Out-of-sample log loss of the model. It is based on pooled data of randomly left out observations from training of local models. Log loss is measured as weighted using the set bandwidth and a kernel. Available only when leave_out is not None.

__init__(model, *, bandwidth=None, fixed=False, kernel='bisquare', include_focal=False, geometry=None, graph=None, n_jobs=-1, fit_global_model=True, measure_performance=True, strict=False, keep_models=False, temp_folder=None, batch_size=None, min_proportion=0.2, undersample=False, leave_out=None, random_state=None, verbose=False, **kwargs)

Methods

__init__(model, *[, bandwidth, fixed, ...])

fit(X, y)

Fit the geographically weighted model

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X, geometry)

predict_proba(X, geometry)

Predict probabilities using the ensemble of local models

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_fit_request(*[, geometry])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_predict_proba_request(*[, geometry])

Request metadata passed to the predict_proba method.

set_predict_request(*[, geometry])

Request metadata passed to the predict method.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

fit(X, y)

Fit the geographically weighted model

Parameters:
X : pd.DataFrame

Independent variables

y : pd.Series

Dependent variable

predict(X, geometry)

predict_proba(X, geometry)

Predict probabilities using the ensemble of local models

set_predict_proba_request(*, geometry='$UNCHANGED$')

Request metadata passed to the predict_proba method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict_proba.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
geometry : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for geometry parameter in predict_proba.

Returns:
self : object

The updated object.
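The four request options listed above can be summarised in a small pure-Python sketch. It is a simplified model of the semantics only, not scikit-learn's routing code: the function name is hypothetical and ValueError stands in for the error scikit-learn would raise.

```python
def route_metadata(request, name, provided):
    """Return the kwargs a meta-estimator would forward for one parameter.

    request  -- True, False, None or an alias string
    name     -- parameter name expected by the method, e.g. "geometry"
    provided -- metadata the user passed to the meta-estimator
    """
    if request is True:
        # Requested: forwarded if provided, silently ignored otherwise.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested: providing it anyway is an error.
        if name in provided:
            raise ValueError(f"{name!r} was provided but not requested")
        return {}
    if isinstance(request, str):
        # Alias: the user passes `request`, the method receives `name`.
        return {name: provided[request]} if request in provided else {}
    raise TypeError(f"unsupported request value: {request!r}")


forwarded = route_metadata(True, "geometry", {"geometry": "points"})
aliased = route_metadata("coords", "geometry", {"coords": "points"})
```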

set_predict_request(*, geometry='$UNCHANGED$')

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
geometry : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for geometry parameter in predict.

Returns:
self : object

The updated object.

set_score_request(*, sample_weight='$UNCHANGED$')

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self : object

The updated object.