Using custom graphs¶

gwlearn’s geographically weighted modelling framework is built on top of libpysal.graph.Graph objects, which are typically generated on-the-fly based on the bandwidth and kernel specification. However, you can easily derive the graph in other way and use it wihtin the model, instead of relying on its limited distance and KNN builders.

Below is a basic example using linear regression and a graph redived from travel cost.

import geodatasets
import geopandas as gpd
import pandarm
from libpysal.graph import Graph

from gwlearn.linear_model import GWLinearRegression

Get some data you want to model. This dataset is assumed to contain both dependent and independent variables.

df = gpd.read_file(geodatasets.get_path("geoda Cincinnati"))

Generate a pandarm.Network based on the extent of the dataset. This will automatically pull the data for street network from OpenStretMap and prepare a network that can be later queried. All points will be linked to this network and accessibility will be measured alongside its edges.

network = pandarm.Network.from_gdf(df)

/Users/martin/dev/pandarm/pandarm/loaders/osm.py:68: UserWarning: GeoDataFrame is stored in coordinate system PROJCS["Lambert_Conformal_Conic",GEOGCS["GCS_GRS 1980(IUGG, 1980)",DATUM["unknown",SPHEROID["GRS80",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["latitude_of_origin",38],PARAMETER["central_meridian",-82.5],PARAMETER["standard_parallel_1",40.0333333333333],PARAMETER["standard_parallel_2",38.7333333333333],PARAMETER["false_easting",1968500],PARAMETER["false_northing",0],UNIT["US survey foot",0.304800609601219,AUTHORITY["EPSG","9003"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]] so the pandana.Network will also be stored in this system
  warn(

Generating contraction hierarchies with 8 threads.
Setting CH node vector of size 6004
Setting CH edge vector of size 17060
Range graph removed 17492 edges of 34120
. 10% . 20% . 30% . 40% . 50% . 60% . 70% . 80% . 90% . 100%

Use the network to build a graph with a set distance threshold and a kernel transforming the actual distance to a distance decay weight.

G = Graph.build_travel_cost(
    df.set_geometry(df.representative_point()), network, threshold=1500, kernel="bisquare"
)

Check the properties of the resulting graph. You can see that on average, you have approximately 130 neighbors.

G.summary()

Graph Summary Statistics
Number of nodes:	457
Number of edges:	60055
Number of connected components:	1
Number of isolates:	0
Number of non-zero edges:	60055
Percentage of non-zero edges:	28.76%
Number of asymmetries:	NA

Cardinalities
Mean:	131	25%:	80
Standard deviation:	60	50%	137
Min:	13	75%:	182
Max:	237

Weights
Mean:	0	25%:	0
Standard deviation:	0	50%	0
Min:	0	75%:	1
Max:	1

Sum of weights and Traces
S0:	20238	GG:	11847
S1:	23694	G'G:	11847
S3:	4546775	G'G + GG:	23694

Graph indexed by: [0, 1, 2, 3, 4, ...]

Fit the regression.

gwlr = GWLinearRegression(
    graph=G
)
gwlr.fit(
    X=df[
        [
            "AGE_0_5",
            "AGE_5_9",
            "AGE_10_14",
            "AGE_15_19",
            "AGE_20_24",
            "AGE_25_34",
            "AGE_35_44",
            "AGE_45_54",
            "AGE_55_59",
            "AGE_60_64",
            "AGE_65_74",
            "AGE_75_84",
            "AGE_85",
        ]
    ],
    y=df["WHITE"],
)

GWLinearRegression(graph=<Graph of 457 nodes and 60055 nonzero edges indexed by
 [0, 1, 2, 3, 4, ...]>)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Done. Now you can extract whatever the object contain. Like prediction.

gwlr.pred_

    384.603824
     17.493468
     17.012252
     17.465274
     46.531855
          ...    
    5.448992
    0.418611
    0.304021
    1.147532
    2.536903
Length: 457, dtype: float64

Or local R2.

gwlr.local_r2_

    0.951705
    0.986416
    0.988294
    0.974153
    0.983152
         ...   
  0.792927
  0.674607
  0.715204
  0.873481
  0.856795
Length: 457, dtype: float64

That can be plotted.

df.plot(gwlr.local_r2_, legend=True, vmin=0, vmax=1).set_axis_off()

_images/819f038f35220f8e831b1906fc5c489b2ee45c2b01a1f06cc8409cacc94dcbdf.png

	bandwidth	None
	fixed	False
	kernel	'bisquare'
	include_focal	True
	geometry	None
	graph	<Graph of 457...2, 3, 4, ...]>
	n_jobs	-1
	fit_global_model	True
	keep_models	False
	temp_folder	None
	batch_size	None