gwlearn.linear_model.GWLogisticRegression
- class gwlearn.linear_model.GWLogisticRegression(bandwidth=None, fixed=False, kernel='bisquare', include_focal=True, geometry=None, graph=None, n_jobs=-1, fit_global_model=True, measure_performance=True, strict=False, keep_models=False, temp_folder=None, batch_size=None, undersample=False, leave_out=None, **kwargs)
Geographically weighted logistic regression
- Parameters:
- bandwidth
int | float Bandwidth value consisting of either a distance or N nearest neighbors
- fixed
bool, optional True for distance-based bandwidth and False for adaptive (nearest neighbor) bandwidth, by default False
- kernel
str | Callable, optional Type of kernel function used to weight observations, by default “bisquare”
- include_focal
bool, optional Include the focal observation when training its local model. Excluding it allows geographically weighted metrics to be assessed on unseen data without a train/test split, providing a value for every sample. This is needed for further spatial analysis of the model performance (and generalises to models that do not support OOB scoring). However, it leaves out the most representative sample. By default True
- geometry
gpd.GeoSeries, optional Geographic location of the observations in the sample. Used to determine the spatial interaction weight based on the specification given by the bandwidth, fixed, kernel, and include_focal keywords. Either geometry or graph needs to be specified. To allow prediction, geometry must be specified.
- graph
Graph, optional Custom libpysal.graph.Graph object encoding the spatial interaction between observations in the sample. If given, it is used directly and the bandwidth, fixed, kernel, and include_focal keywords are ignored. Either geometry or graph needs to be specified. To allow prediction, geometry must be specified. Both can be specified, in which case graph encodes the spatial interaction between observations in geometry.
- n_jobs
int, optional The number of jobs to run in parallel. -1 means using all processors, by default -1
- fit_global_model
bool, optional Determines whether the global baseline model is fitted alongside the geographically weighted one, by default True
- measure_performance
bool | list, optional Calculate performance metrics for the model. If True, measures accuracy score, precision, recall, balanced accuracy, and F1 scores. A subset of these can be specified by passing a list of strings. By default True
- strict
bool | None, optional Do not fit any models if at least one neighborhood has invariant y, by default False. None is treated as False but provides a warning if there are invariant models.
- keep_models
bool | str | Path, optional Keep all local models (required for prediction), by default False. Note that for some models, like random forests, the objects can be large. If a string or Path is provided, the local models are not held in memory but serialized to disk, from which they are loaded for prediction.
- temp_folder
str | None, optional Folder to be used by the pool for memmapping large arrays for sharing memory with worker processes, e.g., /tmp. Passed to joblib.Parallel, by default None
- batch_size
int | None, optional Number of models to process in each batch. Specify batch_size if your models do not fit into memory. By default None
- min_proportion
float, optional Minimum proportion of minority class for a model to be fitted, by default 0.2
- undersample
bool, optional Whether to apply random undersampling to balance classes, by default False
- leave_out
float | int, optional Leave out a fraction (when float) or a set number (when int) of random observations from each local model to be used to measure out-of-sample log loss based on pooled samples from all the models. This is useful for bandwidth selection in cases where some local models are not fitted due to local invariance and the resulting information criteria are not comparable.
- random_state
int | None, optional Random seed for reproducibility, by default None
- verbose
bool, optional Whether to print progress information, by default False
- **kwargs
Additional keyword arguments passed to model initialisation
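The interplay of bandwidth, fixed, kernel, and include_focal can be illustrated with a minimal, standard-library sketch. It assumes the usual bisquare definition, (1 - (d/b)^2)^2 inside the bandwidth and 0 outside, and treats an adaptive bandwidth as the distance to the k-th nearest neighbour. The helper names bisquare and local_weights are hypothetical and are not part of gwlearn, whose own weighting may differ in detail.

```python
import math


def bisquare(distance, bandwidth):
    """Bisquare kernel: (1 - (d/b)^2)^2 inside the bandwidth, 0 outside."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def local_weights(points, focal, bandwidth, fixed=False, include_focal=True):
    """Kernel weights of every point relative to points[focal] (hypothetical helper)."""
    fx, fy = points[focal]
    dists = [math.hypot(x - fx, y - fy) for x, y in points]
    if fixed:
        # Fixed bandwidth: the same distance threshold at every location.
        b = float(bandwidth)
    else:
        # Adaptive bandwidth (assumption): distance to the k-th nearest
        # neighbour of the focal point, not counting the focal point itself.
        neighbours = sorted(d for i, d in enumerate(dists) if i != focal)
        b = neighbours[int(bandwidth) - 1]
    weights = [bisquare(d, b) for d in dists]
    if not include_focal:
        # Drop the focal observation from its own local model.
        weights[focal] = 0.0
    return weights


points = [(0, 0), (1, 0), (0, 2), (3, 0), (5, 5)]
fixed_w = local_weights(points, focal=0, bandwidth=2.5, fixed=True)
adaptive_w = local_weights(points, focal=0, bandwidth=3, fixed=False)
no_focal_w = local_weights(points, focal=0, bandwidth=3, include_focal=False)
```

In this sketch the k-th neighbour sits exactly on the adaptive bandwidth and therefore receives zero weight; whether a real implementation behaves the same way at that boundary is not something the sketch tries to match.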
- Attributes:
- proba_
pd.DataFrame Probability predictions for focal locations based on a local model trained around the point itself.
- pred_
pd.Series Binary predictions for focal locations based on a local model trained around the location itself.
- hat_values_
pd.Series Hat values for each location (diagonal elements of hat matrix)
- effective_df_
float Effective degrees of freedom (sum of hat values)
- score_
float Accuracy score of the model based on pred_.
- precision_
float Precision score of the model based on pred_.
- recall_
float Recall score of the model based on pred_.
- balanced_accuracy_
float Balanced accuracy score of the model based on pred_.
- f1_macro_
float F1 score with macro averaging based on pred_.
- f1_micro_
float F1 score with micro averaging based on pred_.
- f1_weighted_
float F1 score with weighted averaging based on pred_.
- log_loss_
float Log loss of the model based on pred_.
- log_likelihood_
float Global log likelihood of the model
- aic_
float Akaike information criterion of the model
- aicc_
float Corrected Akaike information criterion to account for model complexity (smaller bandwidths)
- bic_
float Bayesian information criterion
- local_coef_
pd.DataFrame Local coefficient of the features in the decision function for each feature at each location
- local_intercept_
pd.Series Local intercept values at each location
- pooled_score_
float Accuracy score of pooled predictions from local models
- pooled_precision_
float Precision score of pooled predictions from local models
- pooled_recall_
float Recall score of pooled predictions from local models
- pooled_balanced_accuracy_
float Balanced accuracy score of pooled predictions from local models
- pooled_f1_macro_
float F1 score with macro averaging for pooled predictions from local models
- pooled_f1_micro_
float F1 score with micro averaging for pooled predictions from local models
- pooled_f1_weighted_
float F1 score with weighted averaging for pooled predictions from local models
- local_pooled_score_
pd.Series Local accuracy scores for each location based on all samples used in each local model
- local_pooled_precision_
pd.Series Local precision scores for each location based on all samples used in each local model
- local_pooled_recall_
pd.Series Local recall scores for each location based on all samples used in each local model
- local_pooled_balanced_accuracy_
pd.Series Local balanced accuracy scores for each location based on all samples used in each local model
- local_pooled_f1_macro_
pd.Series Local F1 scores with macro averaging for each location based on all samples used in each local model
- local_pooled_f1_micro_
pd.Series Local F1 scores with micro averaging for each location based on all samples used in each local model
- local_pooled_f1_weighted_
pd.Series Local F1 scores with weighted averaging for each location based on all samples used in each local model
- prediction_rate_
float Proportion of models that are fitted, where the rest are skipped due to not fulfilling min_proportion.
- oos_log_loss_
float Out-of-sample log loss of the model. It is based on pooled data of randomly left-out observations from the training of local models. Log loss is weighted using the set bandwidth and kernel. Available only when leave_out is not None.
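As an illustration of what a kernel-weighted log loss means, the following standard-library sketch computes the weighted mean of the per-observation binary log loss. The function name weighted_log_loss and the toy numbers are assumptions made for this example; how gwlearn pools the left-out observations and derives their weights is not shown here.

```python
import math


def weighted_log_loss(y_true, proba, weights, eps=1e-15):
    """Weighted mean of -[y*log(p) + (1-y)*log(1-p)] (hypothetical helper)."""
    total = 0.0
    for y, p, w in zip(y_true, proba, weights):
        # Clip probabilities so that log(0) never occurs.
        p = min(max(p, eps), 1.0 - eps)
        total += w * -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)


# Toy left-out observations: true class, predicted probability of class 1,
# and a kernel weight that decays with distance from the focal location.
y_true = [1, 0, 1, 0]
proba = [0.9, 0.2, 0.6, 0.5]
weights = [1.0, 0.8, 0.3, 0.1]

loss = weighted_log_loss(y_true, proba, weights)
unweighted = weighted_log_loss(y_true, proba, [1.0] * len(y_true))
```

Because the nearby, confidently predicted observations carry the largest weights in this toy data, the weighted loss comes out lower than the unweighted one.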
- __init__(bandwidth=None, fixed=False, kernel='bisquare', include_focal=True, geometry=None, graph=None, n_jobs=-1, fit_global_model=True, measure_performance=True, strict=False, keep_models=False, temp_folder=None, batch_size=None, undersample=False, leave_out=None, **kwargs)
Methods
- __init__([bandwidth, fixed, kernel, ...])
- fit(X, y)
Fit the geographically weighted model
- get_metadata_routing()
Get metadata routing of this object.
- get_params([deep])
Get parameters for this estimator.
- predict(X, geometry)
- predict_proba(X, geometry)
Predict probabilities using the ensemble of local models
- score(X, y[, sample_weight])
Return accuracy on provided data and labels.
- set_fit_request(*[, geometry])
Configure whether metadata should be requested to be passed to the fit method.
- set_params(**params)
Set the parameters of this estimator.
- set_predict_proba_request(*[, geometry])
Configure whether metadata should be requested to be passed to the predict_proba method.
- set_predict_request(*[, geometry])
Configure whether metadata should be requested to be passed to the predict method.
- set_score_request(*[, sample_weight])
Configure whether metadata should be requested to be passed to the score method.
- fit(X, y)
Fit the geographically weighted model
- Parameters:
- X
pd.DataFrame Independent variables
- y
pd.Series Dependent variable
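Conceptually, fitting a geographically weighted logistic regression means fitting one logistic model per focal location, with nearby observations weighted more heavily than distant ones. The sketch below shows a single such local fit using plain gradient descent on the weighted log loss, standard library only. It is not gwlearn's implementation: fit_weighted_logistic, the toy data, and the hand-picked weights are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_weighted_logistic(X, y, weights, lr=0.5, n_iter=2000):
    """Minimise the weighted log loss by gradient descent (hypothetical helper).

    Returns (intercept, coefficients).
    """
    n_features = len(X[0])
    intercept = 0.0
    coefs = [0.0] * n_features
    total_w = sum(weights)
    for _ in range(n_iter):
        grad_intercept = 0.0
        grad_coefs = [0.0] * n_features
        for row, target, w in zip(X, y, weights):
            p = sigmoid(intercept + sum(c * v for c, v in zip(coefs, row)))
            err = w * (p - target)  # weighted residual
            grad_intercept += err
            for j, v in enumerate(row):
                grad_coefs[j] += err * v
        intercept -= lr * grad_intercept / total_w
        coefs = [c - lr * g / total_w for c, g in zip(coefs, grad_coefs)]
    return intercept, coefs


# Six observations near the focal location (positive relationship) and four
# distant ones (negative relationship), one feature each.
X = [[-2], [-1], [-0.5], [0.5], [1], [2], [-2], [-1], [1], [2]]
y = [0, 0, 1, 0, 1, 1, 1, 1, 0, 0]
kernel_w = [1, 1, 1, 1, 1, 1, 0.05, 0.05, 0.05, 0.05]  # assumed kernel weights

local_intercept, local_coef = fit_weighted_logistic(X, y, kernel_w)
global_intercept, global_coef = fit_weighted_logistic(X, y, [1.0] * len(y))
```

With these toy numbers the locally weighted coefficient is positive while the equally weighted (global) coefficient is negative, which is the kind of spatial variation that local_coef_ and local_intercept_ are meant to expose.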
- set_predict_proba_request(*, geometry='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the predict_proba method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- set_predict_request(*, geometry='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- set_score_request(*, sample_weight='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.