Comparison with MGWR

gwlearn is a modern, flexible alternative to traditional geographically weighted regression (GWR) libraries such as mgwr. While both enable spatial regression analysis, they follow fundamentally different design philosophies: gwlearn is designed to provide a flexible framework for a wide variety of models, whereas mgwr is designed to provide a performant implementation of (multiscale) geographically weighted linear regression.
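Both libraries build on the same core idea: for every location, fit a linear regression in which nearby observations receive larger weights. The sketch below is a minimal NumPy illustration of that idea, not the code of either library. It assumes an adaptive bisquare kernel whose bandwidth is the distance to the k-th nearest neighbour, which is our reading of what `fixed=False` with an integer bandwidth means in the calls further down; the function name `gwr_fit` is hypothetical.

```python
import numpy as np


def gwr_fit(coords, X, y, k):
    """Minimal geographically weighted linear regression (illustrative sketch).

    coords: (n, 2) point coordinates, X: (n, p) predictors, y: (n,) target,
    k: adaptive bandwidth as a number of nearest neighbours (assumed kernel:
    bisquare, bandwidth = distance to the k-th nearest point incl. itself).
    Returns local coefficients (n, p + 1), intercept first, and fitted values.
    """
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])  # prepend an intercept column
    betas = np.empty((n, Xc.shape[1]))
    fitted = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)  # distances from point i
        h = np.sort(d)[k - 1]  # adaptive bandwidth for location i
        w = np.where(d < h, (1 - (d / h) ** 2) ** 2, 0.0)  # bisquare weights
        XtW = Xc.T * w  # X^T W without building an n x n diagonal matrix
        betas[i] = np.linalg.solve(XtW @ Xc, XtW @ y)  # local weighted least squares
        fitted[i] = Xc[i] @ betas[i]  # prediction at the calibration point
    return betas, fitted
```

With data that are exactly linear, every local fit recovers the global coefficients, which makes a convenient sanity check for a sketch like this.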

import time

import geopandas as gpd
import numpy as np
import pandas as pd
from geodatasets import get_path
from mgwr.gwr import GWR
from mgwr.sel_bw import Sel_BW
from sklearn import metrics

from gwlearn.linear_model import GWLinearRegression
from gwlearn.search import BandwidthSearch

Let’s compare the two on a simple example: predicting suicide rates in the Guerry dataset.

gdf = gpd.read_file(get_path("geoda.guerry"))
X = gdf[["Crm_prp", "Litercy", "Donatns", "Lottery"]]
y = gdf["Suicids"]
geometry = gdf.representative_point()

API difference

The first difference you notice is the API design.

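The gwlearn API follows scikit-learn conventions, and its estimators inherit from scikit-learn base classes, which makes them sklearn-compatible. Note that `fit()` also takes the geometry as a third argument.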

%%time
gw = GWLinearRegression(bandwidth=25, fixed=False)
gw.fit(X, y, geometry)

print(f"gwlearn R²: {metrics.r2_score(y, gw.pred_):.4f}  AICc: {gw.aicc_:.1f}")

gwlearn R²: 0.7251  AICc: 2007.4
CPU times: user 104 ms, sys: 9 ms, total: 113 ms
Wall time: 1.93 s

The mgwr API uses custom logic that is similar in design but does not follow scikit-learn conventions. It also requires NumPy arrays of specific shapes as input: coordinates as an (n, 2) array, the target as an (n, 1) column vector, and the predictors as an (n, k) array.

%%time
mg = GWR(
    geometry.get_coordinates().values,
    y.values.reshape(-1, 1),
    X.values,
    bw=25,
    fixed=False,
).fit()

print(f"mgwr R²: {metrics.r2_score(y, mg.predy):.4f}  AICc: {mg.aicc:.1f}")

mgwr R²: 0.7251  AICc: 2011.2
CPU times: user 28.5 ms, sys: 835 μs, total: 29.4 ms
Wall time: 58.7 ms
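The shape requirements of the mgwr call above can be made explicit with a small helper. `to_mgwr_arrays` is a hypothetical name used only for illustration; it mirrors the conversions done inline above (`get_coordinates().values`, `reshape(-1, 1)`, `.values`) and checks that the row counts agree. It accepts array-likes, so coordinates should be passed as `geometry.get_coordinates()` rather than as the geometry itself.

```python
import numpy as np


def to_mgwr_arrays(coords, y, X):
    """Hypothetical helper: coerce inputs to the shapes used in the GWR call above."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)  # target as an (n, 1) column
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # a single predictor becomes (n, 1)
    n = y.shape[0]
    if coords.shape != (n, 2):
        raise ValueError(f"coords must have shape ({n}, 2), got {coords.shape}")
    if X.shape[0] != n:
        raise ValueError(f"X must have {n} rows, got {X.shape[0]}")
    return coords, y, X


coords, y_col, X_arr = to_mgwr_arrays([[0, 0], [1, 1], [2, 0]], [1, 2, 3], [4, 5, 6])
print(coords.shape, y_col.shape, X_arr.shape)  # (3, 2) (3, 1) (3, 1)
```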

Performance

For linear regression, mgwr is typically faster than gwlearn thanks to its optimised code, as the timings above show (58.7 ms versus 1.93 s wall time in this example). However, this speed comes at the cost of robustness: with mgwr you may encounter singular matrices and similar numerical problems in some local models, which gwlearn typically resolves on its own.
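To see how a local model can break, consider a neighbourhood in which a predictor is constant: together with the intercept, the weighted design matrix is then rank deficient and XᵀWX is singular. The sketch below detects this with `np.linalg.matrix_rank` and falls back to NumPy's SVD-based `np.linalg.lstsq`, which returns the minimum-norm solution. This is one possible remedy chosen for illustration, not a description of how gwlearn actually handles the case, and `local_wls` is a hypothetical name.

```python
import numpy as np


def local_wls(X, y, w):
    """Weighted least squares for one local model (illustrative sketch).

    X: (m, p) local design matrix including the intercept column,
    y: (m,) target, w: (m,) non-negative kernel weights.
    Returns the coefficients and the rank of the weighted design.
    """
    sw = np.sqrt(w)
    Xw = X * sw[:, None]  # rows scaled by sqrt(weight)
    yw = y * sw
    rank = np.linalg.matrix_rank(Xw)
    if rank == X.shape[1]:
        # full rank: the normal equations X'WX b = X'Wy have a unique solution
        beta = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
    else:
        # rank deficient: X'WX is singular, so use the minimum-norm lstsq solution
        beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta, rank


# A neighbourhood where the predictor is constant (3.0 everywhere)
X_local = np.column_stack([np.ones(10), np.full(10, 3.0)])
y_local = np.arange(10.0)
beta, rank = local_wls(X_local, y_local, np.ones(10))
# rank is 1 and the fitted values are (numerically) the mean of y, 4.5
print(rank, X_local @ beta)
```

Solving the normal equations directly would either fail or return numerically meaningless coefficients here; the fallback still yields finite, well-defined fitted values.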

Results

The resulting models should match one to one between mgwr and gwlearn, with only minor differences in some diagnostics.

print(
    f"R²:   gwlearn={metrics.r2_score(y, gw.pred_):.4f},  mgwr={metrics.r2_score(y, mg.predy):.4f}"
)
print(f"AIC:  gwlearn={gw.aic_:.2f}, mgwr={mg.aic:.2f}")
print(f"BIC:  gwlearn={gw.bic_:.2f}, mgwr={mg.bic:.2f}")
print(f"AICc: gwlearn={gw.aicc_:.2f}, mgwr={mg.aicc:.2f}")

R²:   gwlearn=0.7251,  mgwr=0.7251
AIC:  gwlearn=1960.75, mgwr=1960.75
BIC:  gwlearn=2045.63, mgwr=2045.63
AICc: gwlearn=2007.43, mgwr=2011.20

gwlearn and mgwr produce the same predictions and fitted values, but AICc can differ slightly. This is because each library uses a different method to count effective model parameters. The difference reflects how model complexity is measured, not a difference in model fit.
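The effect can be reproduced with arithmetic alone. The sketch uses the standard small-sample correction AICc = AIC + 2p(p + 1) / (n - p - 1), where p is the (effective) number of parameters, and applies it to the same AIC with two parameter counts that differ by one, for instance counting the error variance as an extra parameter or not. The values of n and p are illustrative assumptions, and which count each library actually uses is not shown here.

```python
def aicc(aic, n, p):
    """Small-sample corrected AIC: AIC + 2p(p + 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("AICc is undefined when n - p - 1 <= 0")
    return aic + 2 * p * (p + 1) / (n - p - 1)


aic = 1960.75  # the AIC both libraries report above, i.e. the same fit
n = 85         # illustrative number of observations
p = 30.0       # hypothetical effective number of parameters, e.g. tr(S)

print(round(aicc(aic, n, p), 2))      # 1995.19
print(round(aicc(aic, n, p + 1), 2))  # 1998.18
```

The fit term is identical in both lines; only the penalty changes, and it grows quickly as p approaches n.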

Bandwidth search

The API for bandwidth search also differs, and so does the implementation. While mgwr fits only a lightweight model for each tested bandwidth, gwlearn fits a complete model, as if you were testing bandwidths manually within a loop.
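Testing bandwidths manually in a loop looks roughly like the sketch below, which is also what an interval (grid) search does: evaluate a criterion for each candidate bandwidth and keep the one with the lowest score. `interval_search` is a hypothetical helper, and `criterion` stands for any callable that fits a model with the given bandwidth and returns a score to minimise, such as AICc. Here a toy quadratic with its minimum at 70 replaces the model fit so the example runs on its own.

```python
def interval_search(criterion, start, stop, step):
    """Evaluate criterion(bw) for bw in start, start + step, ..., stop.

    In the gwlearn-style approach, every call to criterion is a full model fit.
    Returns the bandwidth with the lowest score and all scores.
    """
    scores = {bw: criterion(bw) for bw in range(start, stop + 1, step)}
    best = min(scores, key=scores.get)
    return best, scores


def toy_criterion(bw):
    # stand-in for "fit a model with this bandwidth and return its AICc"
    return (bw - 70) ** 2


best, scores = interval_search(toy_criterion, 20, 120, 10)
print(best)  # 70
```

The cost is one full fit per candidate, which helps explain why the gwlearn search takes longer than the mgwr one in the timings below.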

%%time
mgwr_selector = Sel_BW(
    geometry.get_coordinates().values, y.values.reshape(-1, 1), X.values, fixed=False
)
mgwr_bw = mgwr_selector.search(criterion="AICc")

print(f"mgwr optimal bandwidth: {mgwr_bw}")

mgwr optimal bandwidth: 70.0
CPU times: user 153 ms, sys: 16.3 ms, total: 170 ms
Wall time: 199 ms

To find the optimal bandwidth, gwlearn provides the BandwidthSearch class, which trains models over a range of bandwidths and selects the best one according to the chosen criterion (here AICc). The selection strategies (interval search and golden section search) are the same as in mgwr.

%%time
gwlearn_search = BandwidthSearch(
    GWLinearRegression,
    fixed=False,
    criterion="aicc",
)
gwlearn_search.fit(X, y, geometry)

print(f"gwlearn optimal bandwidth: {gwlearn_search.optimal_bandwidth_}")

gwlearn optimal bandwidth: 70
CPU times: user 554 ms, sys: 17.6 ms, total: 572 ms
Wall time: 877 ms
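Golden section search needs far fewer fits than a fine grid when the criterion is unimodal in the bandwidth. The sketch below is the textbook algorithm, not the exact implementation in either library: it keeps a bracket [a, b] with two interior points, discards the part of the bracket that cannot contain the minimum, and reuses one interior point per iteration, so each step costs a single new evaluation. As before, a quadratic toy criterion stands in for a model fit; for an adaptive bandwidth (a neighbour count), the result would be rounded to an integer.

```python
import math


def golden_section_search(criterion, lower, upper, tol=0.5):
    """Minimise a unimodal criterion(bw) on [lower, upper] (textbook version)."""
    invphi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618
    a, b = float(lower), float(upper)
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = criterion(c), criterion(d)
    while b - a > tol:
        if fc < fd:
            # minimum lies in [a, d]; the old c becomes the new d
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = criterion(c)
        else:
            # minimum lies in [c, b]; the old d becomes the new c
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = criterion(d)
    return (a + b) / 2


calls = []


def toy_criterion(bw):
    calls.append(bw)  # record each "model fit"
    return (bw - 70) ** 2


best = golden_section_search(toy_criterion, 20, 200)
print(round(best))  # 70
```

On this toy problem the search needs fewer than twenty evaluations, whereas a grid with step 1 over the same range would need 181.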

So which one?

Generally, gwlearn offers much more flexibility in terms of supported models, neighbourhood definitions, and bandwidth selection metrics. On top of that, it offers richer prediction tooling, which mgwr currently lacks. If you are interested only in linear regression and performance is important, use mgwr. The same applies if you need a deeper statistical assessment of the outcome, such as the significance of local beta coefficients, or multiscale models.

If you need more flexibility, other models, or compatibility with scikit-learn, gwlearn is the better choice.