gwlearn.ensemble.GWGradientBoostingClassifier
- class gwlearn.ensemble.GWGradientBoostingClassifier(bandwidth, fixed=False, kernel='bisquare', include_focal=False, graph=None, n_jobs=-1, fit_global_model=True, measure_performance=True, strict=False, keep_models=False, temp_folder=None, batch_size=None, **kwargs)
Geographically weighted gradient boosting classifier
- Parameters:
- bandwidth
int | float
Bandwidth value, given either as a distance or as a number of nearest neighbors
- fixed
bool, optional
True for a distance-based bandwidth and False for an adaptive (nearest neighbor) bandwidth, by default False
- kernel
str | Callable, optional
Type of kernel function used to weight observations, by default "bisquare"
- include_focal
bool, optional
Include the focal observation in the local model training. Excluding it allows geographically weighted metrics to be assessed on unseen data without a train/test split, and hence provides a value for every sample. This is needed for further spatial analysis of the model performance (and generalises to models that do not support OOB scoring). However, it leaves out the most representative sample. By default False
- graph
Graph, optional
Custom libpysal.graph.Graph object encoding the spatial interaction between observations. If given, it is used directly and the bandwidth, fixed, kernel, and include_focal keywords are ignored.
- n_jobs
int, optional
The number of jobs to run in parallel. -1 means using all processors, by default -1
- fit_global_model
bool, optional
Determines whether the global baseline model is fitted alongside the geographically weighted one, by default True
- measure_performance
bool, optional
Calculate performance metrics for the model, by default True
- strict
bool | None, optional
Do not fit any models if at least one neighborhood has invariant y, by default False. None is treated as False but issues a warning if there are invariant models.
- keep_models
bool | str | Path, optional
Keep all local models (required for prediction), by default False. Note that for some models, like random forests, the objects can be large. If a string or Path is provided, the local models are not held in memory but serialized to disk, from which they are loaded for prediction.
- temp_folder
str | None, optional
Folder to be used by the pool for memmapping large arrays for sharing memory with worker processes, e.g., /tmp. Passed to joblib.Parallel, by default None
- batch_size
int | None, optional
Number of models to process in each batch. Specify batch_size if your models do not fit into memory. By default None
- min_proportion
float, optional
Minimum proportion of minority class for a model to be fitted, by default 0.2
- undersample
bool, optional
Whether to apply random undersampling to balance classes, by default False
- random_state
int | None, optional
Random seed for reproducibility, by default None
- verbose
bool, optional
Whether to print progress information, by default False
- **kwargs
Additional keyword arguments passed to model initialisation
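The interaction of bandwidth, fixed, kernel, and include_focal can be illustrated with a small pure-Python sketch. It is not gwlearn's implementation: the helper names bisquare and local_weights, the sample points, and the choice of the distance to the k-th nearest neighbor as the adaptive kernel width are assumptions made for illustration.

```python
import math


def bisquare(distance, bandwidth):
    """Bisquare kernel: (1 - (d/b)^2)^2 inside the bandwidth, 0 at or beyond it."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def local_weights(points, focal_index, bandwidth, fixed=False, include_focal=False):
    """Return one weight per point for the local model centred on the focal point."""
    focal = points[focal_index]
    distances = [math.dist(focal, p) for p in points]
    if fixed:
        # Fixed bandwidth: the value is interpreted as a distance.
        width = float(bandwidth)
    else:
        # Adaptive bandwidth: the value is a neighbor count. Using the distance
        # to the k-th nearest other point as the kernel width is an assumption.
        others = sorted(d for i, d in enumerate(distances) if i != focal_index)
        width = others[int(bandwidth) - 1]
    weights = [bisquare(d, width) for d in distances]
    if not include_focal:
        # Leave the focal observation out of its own local model.
        weights[focal_index] = 0.0
    return weights


# Hypothetical coordinates; distances from the first point are 0, 1, 2, 3, 4.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, 4.0)]
fixed_w = local_weights(points, 0, bandwidth=2.5, fixed=True)
adaptive_w = local_weights(points, 0, bandwidth=3, fixed=False)
```

With these points, the fixed bandwidth of 2.5 gives non-zero weight only to the two neighbors closer than 2.5 units. The adaptive bandwidth of 3 stretches the kernel to the third nearest neighbor, which sits on the kernel edge and receives zero weight in this sketch. In both cases the focal point gets weight 0 because include_focal is False.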
- Attributes:
- proba_
pd.DataFrame
Probability predictions for focal locations based on a local model trained around the point itself.
- pred_
pd.Series
Binary predictions for focal locations based on a local model trained around the location itself.
- hat_values_
pd.Series
Hat values for each location (diagonal elements of hat matrix)
- effective_df_
float
Effective degrees of freedom (sum of hat values)
- score_
float
Accuracy score of the model based on pred_.
- precision_
float
Precision score of the model based on pred_.
- recall_
float
Recall score of the model based on pred_.
- balanced_accuracy_
float
Balanced accuracy score of the model based on pred_.
- f1_macro_
float
F1 score with macro averaging based on pred_.
- f1_micro_
float
F1 score with micro averaging based on pred_.
- f1_weighted_
float
F1 score with weighted averaging based on pred_.
- log_likelihood_
float
Global log likelihood of the model
- aic_
float
Akaike information criterion of the model
- aicc_
float
Corrected Akaike information criterion to account for model complexity (smaller bandwidths)
- bic_
float
Bayesian information criterion
- feature_importances_
pd.DataFrame
Feature importance values for each local model
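The relationship between hat_values_, effective_df_, and the information criteria can be sketched with the textbook definitions below. This page does not state the exact formulas gwlearn uses, so treat the sketch as an assumption: the function name information_criteria and all numbers are hypothetical, and the effective degrees of freedom stand in for the parameter count.

```python
import math


def information_criteria(log_likelihood, effective_df, n_samples):
    """Textbook AIC, AICc and BIC, with effective_df used as the parameter count."""
    k = effective_df
    aic = 2.0 * k - 2.0 * log_likelihood
    # Small-sample correction; it grows as k approaches n_samples,
    # which is what happens with smaller bandwidths.
    aicc = aic + (2.0 * k * (k + 1.0)) / (n_samples - k - 1.0)
    bic = k * math.log(n_samples) - 2.0 * log_likelihood
    return {"aic": aic, "aicc": aicc, "bic": bic}


# Made-up hat values for five locations; their sum is the effective
# degrees of freedom, as described for effective_df_ above.
hat_values = [0.5, 0.25, 0.25, 0.5, 0.5]
effective_df = sum(hat_values)

ic = information_criteria(log_likelihood=-10.0, effective_df=effective_df, n_samples=5)
```

Under these definitions, lower values indicate a better trade-off between fit and complexity, so the criteria can be compared across candidate bandwidths fitted to the same data.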
- __init__(bandwidth, fixed=False, kernel='bisquare', include_focal=False, graph=None, n_jobs=-1, fit_global_model=True, measure_performance=True, strict=False, keep_models=False, temp_folder=None, batch_size=None, **kwargs)
Methods
| Method | Description |
| --- | --- |
| __init__(bandwidth[, fixed, kernel, ...]) | |
| fit(X, y, geometry) | Fit the geographically weighted model |
| get_metadata_routing() | Get metadata routing of this object. |
| get_params([deep]) | Get parameters for this estimator. |
| predict(X, geometry) | |
| predict_proba(X, geometry) | Predict probabilities using the ensemble of local models |
| score(X, y[, sample_weight]) | Return the mean accuracy on the given test data and labels. |
| set_fit_request(*[, geometry]) | Request metadata passed to the fit method. |
| set_params(**params) | Set the parameters of this estimator. |
| set_predict_proba_request(*[, geometry]) | Request metadata passed to the predict_proba method. |
| set_predict_request(*[, geometry]) | Request metadata passed to the predict method. |
| set_score_request(*[, sample_weight]) | Request metadata passed to the score method. |

- fit(X, y, geometry)
Fit the geographically weighted model
- Parameters:
- X
pd.DataFrame
Independent variables
- y
pd.Series
Dependent variable
- geometry
gpd.GeoSeries
Geographic location
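A minimal usage sketch, based only on the signatures documented on this page. The helper name fit_gw_gradient_boosting and the default bandwidth of 150 are hypothetical. The inputs are assumed to be a pd.DataFrame, a binary pd.Series, and a gpd.GeoSeries describing the same observations.

```python
def fit_gw_gradient_boosting(X, y, geometry, bandwidth=150):
    """Fit the classifier and return it with its focal predictions.

    X is a pd.DataFrame of independent variables, y a pd.Series with the
    dependent variable, and geometry a gpd.GeoSeries of locations.
    The bandwidth default is an arbitrary value chosen for illustration.
    """
    # Imported lazily so this helper can be defined without gwlearn installed.
    from gwlearn.ensemble import GWGradientBoostingClassifier

    model = GWGradientBoostingClassifier(
        bandwidth=bandwidth,  # a neighbor count, because fixed=False
        fixed=False,
        kernel="bisquare",
        keep_models=True,  # documented as required for prediction
    )
    model.fit(X, y, geometry)
    # proba_ and pred_ hold the predictions for the focal locations.
    return model, model.proba_, model.pred_
```

Because keep_models=True, the returned model should also be usable with predict(X, geometry) and predict_proba(X, geometry) on new locations. Passing a string or Path instead stores the local models on disk, as described under keep_models.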
- set_fit_request(*, geometry='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to fit.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
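A short sketch of how the request methods might be used to have a meta-estimator forward geometry. The helper name request_geometry_routing is hypothetical. It assumes scikit-learn 1.3 or later and an estimator that exposes the set_fit_request, set_predict_request, and set_predict_proba_request methods listed on this page.

```python
def request_geometry_routing(estimator):
    """Ask meta-estimators to pass `geometry` to fit, predict and predict_proba."""
    # Imported lazily so this helper can be defined without scikit-learn installed.
    import sklearn

    # The set_*_request methods are only usable once routing is enabled.
    # Note that set_config changes a global scikit-learn setting.
    sklearn.set_config(enable_metadata_routing=True)

    # True means: request the metadata and pass it on if the caller provides it.
    estimator.set_fit_request(geometry=True)
    estimator.set_predict_request(geometry=True)
    estimator.set_predict_proba_request(geometry=True)
    return estimator
```

Outside a meta-estimator such as a Pipeline, these requests have no effect, and geometry is passed directly to fit, predict, or predict_proba.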
- set_predict_proba_request(*, geometry='$UNCHANGED$')
Request metadata passed to the predict_proba method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_predict_request(*, geometry='$UNCHANGED$')
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to predict.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight='$UNCHANGED$')
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to score.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.