Comparison with MGWR¶
SpML is a modern, flexible alternative to traditional geographically weighted regression libraries like mgwr. While both enable spatial regression analysis, they have fundamentally different design philosophies. Where spml is designed to provide a flexible framework for a wide variety of models, mgwr is designed to provide a performant implementation of (multiscale) geographically weighted linear regression.
import time
import geopandas as gpd
import numpy as np
import pandas as pd
from geodatasets import get_path
from mgwr.gwr import GWR
from mgwr.sel_bw import Sel_BW
from sklearn import metrics
from spml.linear_model import GWLinearRegression
from spml.search import BandwidthSearch
Let’s compare the two on a simple example: predicting suicide rates in the Guerry dataset.
gdf = gpd.read_file(get_path("geoda.guerry"))
X = gdf[["Crm_prp", "Litercy", "Donatns", "Lottery"]]
y = gdf["Suicids"]
geometry = gdf.representative_point()
API difference¶
The first difference you notice is the API design.
The spml API follows the scikit-learn API and inherits from it, which makes the estimators sklearn-compatible.
%%time
gw = GWLinearRegression(bandwidth=25, fixed=False)
gw.fit(X, y, geometry)
print(f"spml R²: {metrics.r2_score(y, gw.pred_):.4f} AICc: {gw.aicc_:.1f}")
spml R²: 0.7251 AICc: 2011.2
CPU times: user 132 ms, sys: 11.4 ms, total: 143 ms
Wall time: 2.64 s
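The mgwr API is similar in design but uses its own conventions rather than following scikit-learn principles. It also requires specifically shaped NumPy arrays as input: a 2D array of coordinates, the target as a column vector (hence the reshape(-1, 1) call), and the predictors as a 2D array.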
%%time
mg = GWR(
    geometry.get_coordinates().values,
    y.values.reshape(-1, 1),
    X.values,
    bw=25,
    fixed=False,
).fit()
print(f"mgwr R²: {metrics.r2_score(y, mg.predy):.4f} AICc: {mg.aicc:.1f}")
mgwr R²: 0.7251 AICc: 2011.2
CPU times: user 32.2 ms, sys: 196 μs, total: 32.4 ms
Wall time: 57.2 ms
Performance¶
For linear regression, mgwr is typically faster than spml thanks to its optimised code. However, that speed comes at the cost of lower robustness: for some local models you may run into singular-matrix and similar errors, which spml typically resolves on its own.
Results¶
The fitted models should match between mgwr and spml, apart from minor numerical differences.
print(
    f"R²: spml={metrics.r2_score(y, gw.pred_):.4f}, mgwr={metrics.r2_score(y, mg.predy):.4f}"
)
print(f"AIC: spml={gw.aic_:.2f}, mgwr={mg.aic:.2f}")
print(f"BIC: spml={gw.bic_:.2f}, mgwr={mg.bic:.2f}")
print(f"AICc: spml={gw.aicc_:.2f}, mgwr={mg.aicc:.2f}")
R²: spml=0.7251, mgwr=0.7251
AIC: spml=1960.75, mgwr=1960.75
BIC: spml=2045.63, mgwr=2045.63
AICc: spml=2011.20, mgwr=2011.20
spml and mgwr produce the same predictions and fitted values. The information criteria coincide in this example, but AICc can differ slightly in other cases. This is because each library uses a different method to count effective model parameters. Any such difference reflects how model complexity is measured, not a difference in model fit.
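To see why the parameter count alone can move AICc, here is a minimal sketch using the textbook small-sample correction, AIC + 2k(k + 1) / (n - k - 1). It is not claimed to be the exact formula used by either spml or mgwr; the function names and the numbers are illustrative assumptions. The log-likelihood is held fixed and only the effective number of parameters k changes.

```python
def aic(log_likelihood: float, k: float) -> float:
    """Akaike information criterion for k (effective) parameters."""
    return -2.0 * log_likelihood + 2.0 * k


def aicc(log_likelihood: float, k: float, n: int) -> float:
    """Textbook small-sample correction: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for the correction to be defined")
    return aic(log_likelihood, k) + 2.0 * k * (k + 1.0) / (n - k - 1.0)


# Illustrative numbers only; they are NOT taken from either library's output.
llf, n = -950.0, 85

# Same fit (same log-likelihood), two slightly different effective
# parameter counts, as could result from two ways of counting them.
aicc_a = aicc(llf, k=30.0, n=n)
aicc_b = aicc(llf, k=30.5, n=n)

print(f"k=30.0 -> AICc={aicc_a:.2f}")
print(f"k=30.5 -> AICc={aicc_b:.2f}")
```

The fit itself is identical in both calls, so any gap between the two printed values comes purely from how model complexity is counted.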
Bandwidth search¶
The bandwidth search API also differs, and so does the implementation. While mgwr fits only a lightweight model for each tested bandwidth, spml fits a complete model each time, as if you were testing bandwidths manually in a loop.
%%time
mgwr_selector = Sel_BW(
    geometry.get_coordinates().values, y.values.reshape(-1, 1), X.values, fixed=False
)
mgwr_bw = mgwr_selector.search(criterion="AICc")
print(f"mgwr optimal bandwidth: {mgwr_bw}")
mgwr optimal bandwidth: 70.0
CPU times: user 163 ms, sys: 18.9 ms, total: 182 ms
Wall time: 212 ms
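For intuition, "testing bandwidths manually within a loop" can be sketched as a plain interval search: evaluate a score for each candidate bandwidth and keep the lowest. This is a sketch under stated assumptions, not the implementation of either library. The helper interval_search and the toy_score function are hypothetical; in a real loop the score would come from fitting a full model for each bandwidth and reading its AICc.

```python
from typing import Callable, Iterable, Optional, Tuple


def interval_search(
    score: Callable[[int], float], candidates: Iterable[int]
) -> Tuple[int, float]:
    """Evaluate every candidate bandwidth and return the lowest-scoring one."""
    best_bw: Optional[int] = None
    best_score = float("inf")
    for bw in candidates:
        # In a real search this step would fit a complete model with
        # bandwidth `bw` and read an information criterion such as AICc.
        s = score(bw)
        if s < best_score:
            best_bw, best_score = bw, s
    if best_bw is None:
        raise ValueError("no candidate bandwidths supplied")
    return best_bw, best_score


def toy_score(bw: int) -> float:
    """Hypothetical stand-in for 'fit a model and return its AICc'."""
    return (bw - 40) ** 2 + 100.0


# Candidate bandwidths from 20 to 80 in steps of 5.
best_bw, best_score = interval_search(toy_score, range(20, 81, 5))
print(f"best bandwidth: {best_bw}, score: {best_score}")
```

Each iteration is independent, so the cost of this approach grows linearly with the number of candidates and with the cost of a single full fit.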
To find the optimal bandwidth, spml provides a BandwidthSearch class, which fits models over a range of bandwidths and selects the best one according to the chosen criterion. The selection strategies (interval search and golden section search) are the same as in mgwr.
%%time
spml_search = BandwidthSearch(
    GWLinearRegression,
    fixed=False,
    criterion="aicc",
)
spml_search.fit(X, y, geometry)
print(f"spml optimal bandwidth: {spml_search.optimal_bandwidth_}")
spml optimal bandwidth: 70
CPU times: user 651 ms, sys: 15.1 ms, total: 666 ms
Wall time: 1.06 s
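Golden section search, one of the two strategies mentioned above, can be illustrated with a generic sketch. It assumes the criterion is unimodal over the search interval and treats the bandwidth as a continuous value; how either library handles integer neighbour counts, bounds, or tolerances is not shown here. The function golden_section_min and the toy criterion are hypothetical stand-ins.

```python
import math
from typing import Callable


def golden_section_min(
    f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-4
) -> float:
    """Approximate the minimiser of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)  # left interior point
    d = a + inv_phi * (b - a)  # right interior point
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def toy_criterion(bw: float) -> float:
    """Hypothetical stand-in for 'fit a model and return its AICc'."""
    return (bw - 70.0) ** 2 + 2000.0


best = golden_section_min(toy_criterion, 20.0, 85.0)
print(f"approximate optimal bandwidth: {best:.3f}")
```

Only one new criterion evaluation is needed per iteration, which is why this strategy usually needs far fewer model fits than scanning a full grid of bandwidths.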
So which one?¶
Generally, spml offers much more flexibility in terms of supported models, neighbourhood definitions, and bandwidth selection metrics. On top of that, it offers more extensive prediction tooling, which mgwr currently lacks. If you are interested only in linear regression and performance is important, use mgwr. The same applies if you want a deeper statistical assessment of the outcome, such as the significance of local beta coefficients, or if you need multiscale models.
If you need more flexibility, other models, or compatibility with scikit-learn, spml is going to be a better choice.