gwlearn.ensemble.GWGradientBoostingClassifier
class gwlearn.ensemble.GWGradientBoostingClassifier(*, bandwidth=None, fixed=False, kernel='bisquare', include_focal=False, geometry=None, graph=None, n_jobs=-1, fit_global_model=True, strict=False, keep_models=False, temp_folder=None, batch_size=None, min_proportion=0.2, undersample=False, random_state=None, verbose=False, **kwargs)
Geographically weighted gradient boosting classifier.
Fits one sklearn.ensemble.GradientBoostingClassifier per focal observation using spatially varying sample weights.
The spatial interaction is defined either by (a) geometry + bandwidth/kernel settings or (b) a precomputed libpysal.graph.Graph passed via graph.
Notes
y must be binary ({0, 1} or boolean).
To enable prediction on new data via predict()/predict_proba(), you must set keep_models=True (store in memory) or keep_models=Path(...) (serialize to disk).
Only point geometries are supported.
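The "spatially varying sample weights" mentioned above can be illustrated with a small standard-library sketch. It is not gwlearn's implementation: the helper names `bisquare` and `local_weights` are hypothetical, the bisquare formula is the textbook one, and treating the adaptive bandwidth as the distance to the k-th nearest neighbor is an assumption that the library may handle differently.

```python
import math


def bisquare(distance, bandwidth):
    """Textbook bisquare kernel: (1 - (d/b)^2)^2 inside the bandwidth, 0 outside."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def local_weights(points, focal_index, bandwidth, fixed=False, include_focal=False):
    """Return {index: weight} for the neighborhood of one focal point.

    Assumption: with fixed=False the kernel radius is the distance to the
    k-th nearest neighbor (k = bandwidth). The library may differ in detail.
    """
    fx, fy = points[focal_index]
    dists = {
        i: math.hypot(x - fx, y - fy)
        for i, (x, y) in enumerate(points)
        if include_focal or i != focal_index
    }
    if fixed:
        # Fixed bandwidth: a plain distance threshold.
        radius = float(bandwidth)
    else:
        # Adaptive bandwidth: radius reaches the k-th nearest neighbor.
        neighbor_dists = sorted(d for i, d in dists.items() if i != focal_index)
        radius = neighbor_dists[int(bandwidth) - 1]
    weights = {i: bisquare(d, radius) for i, d in dists.items()}
    # Observations with zero weight do not take part in the local model.
    return {i: w for i, w in weights.items() if w > 0.0}


points = [(0, 0), (1, 0), (0, 2), (3, 0), (10, 10)]
adaptive = local_weights(points, focal_index=0, bandwidth=3)
fixed = local_weights(points, focal_index=0, bandwidth=3.5, fixed=True)
```

In a geographically weighted model, weights like these would be passed as `sample_weight` when fitting the local estimator for that focal point. Note that under this assumption the k-th neighbor itself sits exactly on the kernel edge and receives zero weight.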
- Parameters:
- bandwidth : float | int | None
Bandwidth for defining neighborhoods.
If fixed=True, this is a distance threshold.
If fixed=False, this is the number of nearest neighbors used to form the local neighborhood.
If graph is provided, bandwidth is ignored.
- fixed : bool, optional
True for distance-based bandwidth and False for adaptive (nearest neighbor) bandwidth, by default False
- kernel : str | Callable, optional
Type of kernel function used to weight observations, by default "bisquare"
- include_focal : bool, optional
Include the focal observation in the local model training. Excluding it allows geographically weighted metrics to be assessed on unseen data without a train/test split, hence providing a value for every sample. This is needed for further spatial analysis of the model performance (and generalises to models that do not support OOB scoring). However, it leaves out the most representative sample. By default False
- geometry : gpd.GeoSeries, optional
Geographic location of the observations in the sample. Used to determine the spatial interaction weight based on the specification given by the bandwidth, fixed, kernel, and include_focal keywords. Either geometry or graph must be specified. To allow prediction, geometry is required.
- graph : Graph, optional
Custom libpysal.graph.Graph object encoding the spatial interaction between observations in the sample. If given, it is used directly and the bandwidth, fixed, kernel, and include_focal keywords are ignored. Either geometry or graph must be specified. To allow prediction, geometry is required. Both can be specified, in which case graph encodes the spatial interaction between the observations in geometry.
- n_jobs : int, optional
The number of jobs to run in parallel. -1 means using all processors, by default -1
- fit_global_model : bool, optional
Determines whether the global baseline model is fitted alongside the geographically weighted models, by default True
- strict : bool | None, optional
Do not fit any models if at least one neighborhood has invariant y, by default False. None is treated as False but issues a warning if there are invariant models.
- keep_models : bool | str | Path, optional
Keep all local models (required for prediction), by default False. Note that for some models, like random forests, the objects can be large. If a string or Path is provided, the local models are not held in memory but serialized to disk, from which they are loaded at prediction time.
- temp_folder : str | None, optional
Folder to be used by the pool for memmapping large arrays for sharing memory with worker processes, e.g., /tmp. Passed to joblib.Parallel, by default None
- batch_size : int | None, optional
Number of models to process in each batch. Specify batch_size if your models do not fit into memory. By default None
- min_proportion : float, optional
Minimum proportion of minority class for a model to be fitted, by default 0.2
- undersample : bool | float, optional
Whether to apply random undersampling to balance classes.
If True, undersample the majority class to match the minority class (i.e., minority/majority ratio = 1.0).
If a float alpha > 0, target a minority/majority ratio of alpha after resampling, i.e. alpha = N_min / N_resampled_majority. By default False
- random_state : int | None, optional
Random seed for reproducibility, by default None
- verbose : bool, optional
Whether to print progress information, by default False
- **kwargs
Additional keyword arguments passed to model initialisation
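The ratio alpha = N_min / N_resampled_majority described for undersample can be made concrete with a short sketch. It only illustrates the arithmetic: `undersample_indices` is a hypothetical helper, and sampling the majority class without replacement while keeping the minority class whole is an assumption, not necessarily what the library's resampler does.

```python
import random


def undersample_indices(y, undersample=True, random_state=None):
    """Pick row indices so that N_min / N_resampled_majority hits the target ratio.

    Assumption: the minority class is kept whole and the majority class is
    sampled without replacement. The library's resampler may differ.
    """
    if undersample is False:
        return list(range(len(y)))
    # True means a 1:1 ratio; a float is used as the target ratio alpha.
    alpha = 1.0 if undersample is True else float(undersample)
    positives = [i for i, label in enumerate(y) if label]
    negatives = [i for i, label in enumerate(y) if not label]
    minority, majority = sorted((positives, negatives), key=len)
    # alpha = N_min / N_resampled_majority  =>  N_resampled_majority = N_min / alpha
    n_majority = min(len(majority), round(len(minority) / alpha))
    rng = random.Random(random_state)
    kept_majority = rng.sample(majority, n_majority)
    return sorted(minority + kept_majority)


y = [True] * 4 + [False] * 16
balanced = undersample_indices(y, undersample=True, random_state=0)
half = undersample_indices(y, undersample=0.5, random_state=0)
```

With 4 minority and 16 majority samples, `undersample=True` keeps 4 majority samples, while `undersample=0.5` keeps 8, so a smaller alpha discards fewer observations.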
- proba_
Probability predictions for focal locations based on a local model trained around the point itself.
- Type:
pd.DataFrame
- pred_
Binary predictions for focal locations based on a local model trained around the location itself.
- Type:
pd.Series
- aicc_
Corrected Akaike information criterion, which accounts for model complexity (smaller bandwidths)
- prediction_rate_
Proportion of models that are fitted, where the rest are skipped due to not fulfilling min_proportion.
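How min_proportion and prediction_rate_ relate can be sketched in plain Python. The helper names `minority_proportion` and `prediction_rate` are hypothetical, and the sketch assumes unweighted class counts and an inclusive threshold, neither of which is confirmed by the documentation above.

```python
def minority_proportion(labels):
    """Share of the less frequent class among binary labels."""
    positives = sum(1 for label in labels if label)
    return min(positives, len(labels) - positives) / len(labels)


def prediction_rate(neighborhoods, min_proportion=0.2):
    """Fraction of neighborhoods whose minority share reaches min_proportion.

    Assumption: unweighted counts and an inclusive (>=) threshold.
    """
    fitted = [
        minority_proportion(labels) >= min_proportion for labels in neighborhoods
    ]
    return sum(fitted) / len(fitted)


neighborhoods = [
    [True, True, False, False, False],  # minority share 0.4, model fitted
    [False] * 5,                        # minority share 0.0, model skipped
    [True, True, True, False, False],   # minority share 0.4, model fitted
    [True] + [False] * 9,               # minority share 0.1, model skipped
]
rate = prediction_rate(neighborhoods)
```

Here two of the four neighborhoods qualify, so the rate is 0.5. A low rate on real data suggests that many focal locations lack enough minority-class neighbors at the chosen bandwidth.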
Examples
>>> import geopandas as gpd
>>> from geodatasets import get_path
>>> from gwlearn.ensemble import GWGradientBoostingClassifier
>>> gdf = gpd.read_file(get_path('geoda.guerry'))
>>> X = gdf[['Crm_prp', 'Litercy', 'Donatns', 'Lottery']]
>>> y = gdf["Region"] == 'E'
>>> gw = GWGradientBoostingClassifier(
...     bandwidth=30,
...     fixed=False,
...     geometry=gdf.representative_point(),
...     random_state=0,
... ).fit(X, y)
>>> gw.pred_.head()
0    False
1    False
2    False
3     True
4     True
dtype: boolean
Methods
- __init__(*[, bandwidth, fixed, kernel, ...])
- fit(X, y)
Fit geographically weighted gradient boosting classifiers.
- get_metadata_routing()
Get metadata routing of this object.
- get_params([deep])
Get parameters for this estimator.
- local_metric(func, *args, **kwargs)
Compute a metric per fitted local model.
- predict(X, geometry)
Predict classes for new observations.
- predict_proba(X, geometry)
Predict class probabilities for new observations.
- score(X, y[, sample_weight])
Return accuracy on provided data and labels.
- set_params(**params)
Set the parameters of this estimator.
- set_predict_proba_request(*[, geometry])
Configure whether metadata should be requested to be passed to the predict_proba method.
- set_predict_request(*[, geometry])
Configure whether metadata should be requested to be passed to the predict method.
- set_score_request(*[, sample_weight])
Configure whether metadata should be requested to be passed to the score method.
- fit(X, y)
Fit geographically weighted gradient boosting classifiers.
Notes
Populates feature_importances_ from the fitted local models.
- get_metadata_routing()
Get metadata routing of this object.
See the User Guide for how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- predict(X, geometry)
Predict classes for new observations.
This is equivalent to predict_proba(...).idxmax(axis=1).
- Parameters:
- X : pandas.DataFrame
Feature matrix for new observations.
- geometry : geopandas.GeoSeries
Point geometries for new observations.
- Returns:
Predicted class.
- Return type:
pandas.Series
Notes
Requires the estimator to have been fit with keep_models=True (or a Path) so local models can be used at prediction time.
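The equivalence with predict_proba(...).idxmax(axis=1) means each prediction is simply the class column holding the largest probability. A plain-Python analogue, with probability rows written as dictionaries and a hypothetical helper `predict_from_proba`, might look like this (the values are made up for illustration):

```python
def predict_from_proba(proba_rows):
    """Return, for each row, the class label with the highest probability.

    Plain-Python analogue of DataFrame.idxmax(axis=1): on a tie, the first
    column in order wins.
    """
    return [max(row, key=row.get) for row in proba_rows]


# Illustrative probabilities keyed by class label, one dict per observation.
proba = [
    {False: 0.9, True: 0.1},
    {False: 0.35, True: 0.65},
    {False: 0.5, True: 0.5},
]
labels = predict_from_proba(proba)
```

The tie in the last row resolves to the first class, which mirrors how idxmax returns the first occurrence of the maximum.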
- predict_proba(X, geometry)
Predict class probabilities for new observations.
- Parameters:
- X : pandas.DataFrame
Feature matrix for new observations.
- geometry : geopandas.GeoSeries
Point geometries for new observations.
- Returns:
Predicted probabilities with columns equal to the global classes observed during fit.
- Return type:
pandas.DataFrame
Notes
Requires the estimator to have been fit with keep_models=True (or a Path) so local models can be used at prediction time.