Comparison with MGWR¶

spatialml is a modern, flexible alternative to traditional geographically weighted regression libraries such as mgwr. Both enable spatial regression analysis, but their design philosophies differ fundamentally: spatialml aims to provide a flexible framework for a wide variety of models, whereas mgwr aims to provide a performant implementation of (multiscale) geographically weighted linear regression.

import time

import geopandas as gpd
import numpy as np
import pandas as pd
from geodatasets import get_path
from mgwr.gwr import GWR
from mgwr.sel_bw import Sel_BW
from sklearn import metrics

from spatialml.linear_model import GWLinearRegression
from spatialml.search import BandwidthSearch

Let’s compare the two on a simple example: predicting suicide rates in the Guerry dataset.

gdf = gpd.read_file(get_path("geoda.guerry"))
X = gdf[["Crm_prp", "Litercy", "Donatns", "Lottery"]]
y = gdf["Suicids"]
geometry = gdf.representative_point()

API difference¶

The first difference you will notice is the API design.

The API of spatialml follows that of scikit-learn and inherits from it, which makes its estimators sklearn-compatible.

%%time
gw = GWLinearRegression(bandwidth=25, fixed=False)
gw.fit(X, y, geometry)

print(f"spatialml R²: {metrics.r2_score(y, gw.pred_):.4f}  AICc: {gw.aicc_:.1f}")

spatialml R²: 0.7251  AICc: 2011.2
CPU times: user 145 ms, sys: 9.03 ms, total: 154 ms
Wall time: 2.69 s

The API of mgwr uses custom logic that is similar in design but does not follow scikit-learn principles. It also requires specifically shaped NumPy arrays as input.

%%time
mg = GWR(
    geometry.get_coordinates().values,
    y.values.reshape(-1, 1),
    X.values,
    bw=25,
    fixed=False,
).fit()

print(f"mgwr R²: {metrics.r2_score(y, mg.predy):.4f}  AICc: {mg.aicc:.1f}")

mgwr R²: 0.7251  AICc: 2011.2
CPU times: user 30.5 ms, sys: 2.23 ms, total: 32.7 ms
Wall time: 59.9 ms

Performance¶

For linear regression, mgwr typically outperforms spatialml thanks to its optimised code. However, this comes at the cost of lower robustness: with mgwr, some local models may fail with singular-matrix or similar errors, which spatialml typically resolves on its own.
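To illustrate what a singular local model looks like, here is a minimal NumPy sketch. It is not taken from either library: the data, the weights, and the fallback strategy (a minimum-norm least-squares solve) are assumptions chosen for illustration, and spatialml may handle such cases differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical local neighbourhood of six observations. The second predictor
# is an exact multiple of the first, so the design matrix is rank deficient.
x1 = rng.normal(size=6)
X_local = np.column_stack([np.ones(6), x1, 2.0 * x1])
y_local = 1.0 + 3.0 * x1 + rng.normal(scale=0.1, size=6)
w = np.array([1.0, 0.9, 0.7, 0.5, 0.3, 0.1])  # made-up kernel weights

# Weighted least squares via square-root weights: minimise ||sqrt(W)(y - Xb)||.
sqrt_w = np.sqrt(w)
Xw = sqrt_w[:, None] * X_local
yw = sqrt_w * y_local

# lstsq reports the numerical rank and returns the minimum-norm solution,
# so it still yields fitted values where inverting X'WX would break down.
beta, _, rank, _ = np.linalg.lstsq(Xw, yw, rcond=1e-10)
fitted = X_local @ beta

print(f"columns: {X_local.shape[1]}, numerical rank: {rank}")
```

The coefficients of the two collinear predictors are not individually identifiable in this situation, but the fitted values are, which is why a fallback of this kind can keep a local model usable for prediction.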

Results¶

The resulting models should match between mgwr and spatialml, apart from minor numerical differences.

print(
    f"R²:   spatialml={metrics.r2_score(y, gw.pred_):.4f},  mgwr={metrics.r2_score(y, mg.predy):.4f}"
)
print(f"AIC:  spatialml={gw.aic_:.2f}, mgwr={mg.aic:.2f}")
print(f"BIC:  spatialml={gw.bic_:.2f}, mgwr={mg.bic:.2f}")
print(f"AICc: spatialml={gw.aicc_:.2f}, mgwr={mg.aicc:.2f}")

R²:   spatialml=0.7251,  mgwr=0.7251
AIC:  spatialml=1960.75, mgwr=1960.75
BIC:  spatialml=2045.63, mgwr=2045.63
AICc: spatialml=2011.20, mgwr=2011.20

spatialml and mgwr produce the same predictions and fitted values, and in this example the information criteria also agree to two decimal places. In other cases AICc can differ slightly, because each library uses a different method to count effective model parameters. Such a difference reflects how model complexity is measured, not a difference in model fit.
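As a rough illustration of how the count of effective parameters can depend on its definition, the sketch below builds the hat matrix S of a small geographically weighted regression (so that the fitted values are S @ y) and compares two definitions commonly used in the GWR literature: tr(S) and 2 tr(S) - tr(SᵀS). The synthetic data, the fixed Gaussian kernel, and the helper `gwr_hat_matrix` are assumptions made for this example; which definition each library uses is not stated here.

```python
import numpy as np


def gwr_hat_matrix(coords, X, bandwidth):
    """Hat matrix S of a fixed-bandwidth, Gaussian-kernel GWR (y_hat = S @ y)."""
    n = X.shape[0]
    S = np.empty((n, n))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        XtW = X.T * w  # X' W_i, shape (k, n)
        # Row i of S is x_i (X' W_i X)^-1 X' W_i
        S[i] = X[i] @ np.linalg.solve(XtW @ X, XtW)
    return S


rng = np.random.default_rng(42)
n = 60
coords = rng.uniform(size=(n, 2))  # synthetic locations in the unit square
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
k = X.shape[1]

results = {}
for label, bw in [("near-global", 1e6), ("local", 0.2)]:
    S = gwr_hat_matrix(coords, X, bw)
    enp_trace = np.trace(S)
    enp_alt = 2 * np.trace(S) - np.trace(S.T @ S)
    results[label] = (enp_trace, enp_alt)
    print(f"{label:>11}: tr(S)={enp_trace:.3f}  2tr(S)-tr(S'S)={enp_alt:.3f}")
```

With a very large bandwidth the model collapses to ordinary least squares and both definitions return the number of columns in X. With a local bandwidth they generally diverge, which changes the penalty term of AICc even though the fitted values are identical.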

Bandwidth search¶

The API for bandwidth search also differs, and so does the implementation. While mgwr fits only a lightweight model for each tested bandwidth, spatialml fits a complete model, as if you were testing bandwidths manually in a loop.

%%time
mgwr_selector = Sel_BW(
    geometry.get_coordinates().values, y.values.reshape(-1, 1), X.values, fixed=False
)
mgwr_bw = mgwr_selector.search(criterion="AICc")

print(f"mgwr optimal bandwidth: {mgwr_bw}")

mgwr optimal bandwidth: 70.0
CPU times: user 168 ms, sys: 17.1 ms, total: 185 ms
Wall time: 193 ms

To find the optimal bandwidth, spatialml provides a BandwidthSearch class, which trains models on a range of bandwidths and selects the best one. The selection strategies (interval search and golden section search) are the same as in mgwr.

%%time
spatialml_search = BandwidthSearch(
    GWLinearRegression,
    fixed=False,
    criterion="aicc",
)
spatialml_search.fit(X, y, geometry)

print(f"spatialml optimal bandwidth: {spatialml_search.optimal_bandwidth_}")

spatialml optimal bandwidth: 70
CPU times: user 679 ms, sys: 24.6 ms, total: 704 ms
Wall time: 1.11 s
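The two selection strategies mentioned above can be sketched in plain Python. This is a generic illustration, not the code of either library: `interval_search`, `golden_section_search`, and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for a criterion such as AICc that happens to be minimised at a bandwidth of 70.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # inverse golden ratio, about 0.618


def interval_search(score, lower, upper, step):
    """Score every bandwidth on a regular grid and return the best one."""
    return min(range(lower, upper + 1, step), key=score)


def golden_section_search(score, lower, upper, tol=1e-4, max_iter=200):
    """Minimise a unimodal score on [lower, upper] by shrinking the bracket."""
    a, b = float(lower), float(upper)
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = score(c), score(d)
    for _ in range(max_iter):
        if b - a <= tol:
            break
        if fc < fd:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = score(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = score(d)
    return (a + b) / 2


def toy_score(bw):
    """Hypothetical criterion curve with its minimum at bw = 70."""
    return (bw - 70.0) ** 2


grid_best = interval_search(toy_score, lower=10, upper=85, step=5)
golden_best = golden_section_search(toy_score, lower=10, upper=85)
print(f"interval search: {grid_best}, golden section: {golden_best:.2f}")
```

In the setting of this page, `score` would fit a model at the given bandwidth and return its criterion value. Interval search evaluates every candidate, whereas golden section search reuses one of its two interior evaluations per iteration, so it needs only one new model fit per step. For adaptive bandwidths, which are neighbour counts, the continuous result would need to be rounded to an integer.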

So which one?¶

Generally, spatialml offers much more flexibility in terms of supported models, neighbourhood definitions, and bandwidth selection metrics. On top of that, it offers complex prediction tooling, which mgwr currently lacks. If you are interested only in linear regression and performance is important, use mgwr. The same applies if you want a deeper statistical assessment of the outcome, such as the significance of local beta coefficients, or if you need multiscale models.

If you need more flexibility, other models, or compatibility with scikit-learn, spatialml is going to be a better choice.