Generating h3 Hexgrids from GeoDataFrames¶

[1]:

import geopandas  as gpd
import matplotlib.pyplot as plt
import pandas
import libpysal
from tobler.util import h3fy
from tobler.area_weighted import area_interpolate
%load_ext watermark
%watermark -v -a "author: eli knaap" -d -u -p tobler,cenpy,geopandas

Author: author: eli knaap

Last updated: 2021-01-18

Python implementation: CPython
Python version       : 3.8.6
IPython version      : 7.19.0

tobler   : 0.5.4
cenpy    : 1.0.0.post4
geopandas: 0.8.1

Note: This notebook relies on functionality from the `contextily <https://contextily.readthedocs.io/en/latest/>`__ package that provides convenient basemaps for geospatial plots, and the `cenpy <https://cenpy-devs.github.io/cenpy/>`__ package that provide a convenient interface to the U.S. Census API. These can be installed with

pip install contextily cenpy
or
conda install contextily cenpy -c conda-forge

[2]:

import contextily as ctx
from cenpy import products

Getting data from CenPy¶

To begin with, we will fetch some data from the 2017 ACS

[3]:

acs = products.ACS(2017)

We’re looking for median home value, so first we will filter the ACS tables by those containing “value” in the description so we can find the correct variable code

[4]:

acs.filter_tables('VALUE', by='description')

[4]:

	description	columns
table_name
B25075	VALUE	[B25075_001E, B25075_002E, B25075_003E, B25075...
B25076	LOWER VALUE QUARTILE (DOLLARS)	[B25076_001E]
B25077	MEDIAN VALUE (DOLLARS)	[B25077_001E]
B25078	UPPER VALUE QUARTILE (DOLLARS)	[B25078_001E]
B25079	AGGREGATE VALUE (DOLLARS) BY AGE OF HOUSEHOLDER	[B25079_001E, B25079_002E, B25079_003E, B25079...
B25080	AGGREGATE VALUE (DOLLARS) BY UNITS IN STRUCTURE	[B25080_001E, B25080_002E, B25080_003E, B25080...
B25082	AGGREGATE VALUE (DOLLARS) BY MORTGAGE STATUS	[B25082_001E, B25082_002E, B25082_003E]
B25083	MEDIAN VALUE (DOLLARS) FOR MOBILE HOMES	[B25083_001E]
B25096	MORTGAGE STATUS BY VALUE	[B25096_001E, B25096_002E, B25096_003E, B25096...
B25097	MORTGAGE STATUS BY MEDIAN VALUE (DOLLARS)	[B25097_001E, B25097_002E, B25097_003E]
B25100	MORTGAGE STATUS BY RATIO OF VALUE TO HOUSEHOLD...	[B25100_001E, B25100_002E, B25100_003E, B25100...
B25107	MEDIAN VALUE BY YEAR STRUCTURE BUILT	[B25107_001E, B25107_002E, B25107_003E, B25107...
B25108	AGGREGATE VALUE (DOLLARS) BY YEAR STRUCTURE BUILT	[B25108_001E, B25108_002E, B25108_003E, B25108...
B25109	MEDIAN VALUE BY YEAR HOUSEHOLDER MOVED INTO UNIT	[B25109_001E, B25109_002E, B25109_003E, B25109...
B25110	AGGREGATE VALUE (DOLLARS) BY YEAR HOUSEHOLDER ...	[B25110_001E, B25110_002E, B25110_003E, B25110...
B25121	HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 201...	[B25121_001E, B25121_002E, B25121_003E, B25121...
B992519	ALLOCATION OF VALUE	[B992519_001E, B992519_002E, B992519_003E]

The variable we’re looking for is B25077_001E, the median home value of each. Lets collect that data for the Washington DC metropolitan region. The next cell can take a minute or two to run, depending on the speed of your connection.

[5]:

dc = acs.from_msa('Washington-Arlington', variables=['B25077_001E'])

/Users/knaaptime/anaconda3/envs/tobler/lib/python3.8/site-packages/pyproj/crs/crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))

[6]:

dc.head()

[6]:

	GEOID	geometry	B25077_001E	NAME	state	county	tract
0	51047930501	POLYGON ((-8697543.430 4642311.970, -8697525.0...	277000.0	Census Tract 9305.01, Culpeper County, Virginia	51	047	930501
1	51043010100	POLYGON ((-8692606.860 4745175.960, -8692597.2...	319600.0	Census Tract 101, Clarke County, Virginia	51	043	010100
2	51107610701	POLYGON ((-8656718.120 4739943.880, -8656659.1...	461500.0	Census Tract 6107.01, Loudoun County, Virginia	51	107	610701
3	51177020408	POLYGON ((-8654861.430 4622579.010, -8654714.2...	241500.0	Census Tract 204.08, Spotsylvania County, Virg...	51	177	020408
4	51061930402	POLYGON ((-8657404.180 4678713.200, -8657386.4...	402000.0	Census Tract 9304.02, Fauquier County, Virginia	51	061	930402

Creating Hexgrids with the `h3fy` function¶

Using the h3fy function from the tobler.util module, we can easily generate a hexgrid covering the face of the DC Metropolitan region

[7]:

dc_hex = h3fy(dc)

[8]:

fig, axs = plt.subplots(1,2, figsize=(18,10))
axs=axs.flatten()

dc.plot(ax=axs[0], alpha=0.4, linewidth=1.6, edgecolor='white')
dc_hex.plot(ax=axs[1], alpha=0.4, linewidth=1.6, edgecolor='white')

axs[0].set_title('Original Tract Data')
axs[1].set_title('Hex Grid')

for i,_ in enumerate(axs):
    ctx.add_basemap(axs[i], source=ctx.providers.Stamen.TonerLite)
    axs[i].axis('off')

../_images/notebooks_census_to_hexgrid_15_0.png

By altering the resolution parameter, we can generate grids using hexes of various sizes

[9]:

dc_hex_large = h3fy(dc, resolution=5)
dc_hex_small = h3fy(dc, resolution=7)

[10]:

fig, axs = plt.subplots(1,2, figsize=(18,10))

dc_hex_large.plot(ax=axs[0], alpha=0.4, linewidth=1.6, edgecolor='white')
dc_hex_small.plot(ax=axs[1], alpha=0.4, linewidth=1.6, edgecolor='white')

for ax in axs:
    ctx.add_basemap(ax=ax, source=ctx.providers.Stamen.TonerLite)
    ax.axis('off')

../_images/notebooks_census_to_hexgrid_18_0.png

and by using the clip parameter, we can ensure that the hexgrid is does not extend beyond the borders of the input geodataframe

[11]:

dc_hex_clipped = h3fy(dc, resolution=5, clip=True)

[12]:

fig, ax = plt.subplots(figsize=(10,10))

dc_hex_clipped.plot(ax=ax, alpha=0.4, linewidth=1.6, edgecolor='white')
ctx.add_basemap(ax=ax, source=ctx.providers.Stamen.TonerLite)
ax.axis('off')

[12]:

(-8738395.66, -8484578.799999999, 4575391.257202434, 4837245.453466552)

../_images/notebooks_census_to_hexgrid_21_1.png

Interpolating to a hexgrid¶

Thus, in just a few lines of code, we can estimate the value of census variables represented by a regular hexgrid

here, we will estimate the median home value of each hex in the DC region using simple areal interpolation

[13]:

dc_hex_interpolated = area_interpolate(source_df=dc, target_df=dc_hex, intensive_variables=['B25077_001E'])

/Users/knaaptime/Dropbox/projects/tobler/tobler/util/util.py:32: UserWarning: nan values in variable: B25077_001E, replacing with 0
  warn(f"nan values in variable: {column}, replacing with 0")

[14]:

fig, axs = plt.subplots(1,2, figsize=(20,10))
dc.plot('B25077_001E', scheme='quantiles', alpha=0.5, ax=axs[0])

dc_hex_interpolated.plot('B25077_001E', scheme='quantiles', alpha=0.5, ax=axs[1])

axs[0].set_title('Original Data')
axs[1].set_title('Interpolated Data')

for ax in axs:
    ctx.add_basemap(ax=ax, source=ctx.providers.Stamen.TonerLite)
    ax.axis('off')
plt.suptitle('Spatial Interpolation with the PySAL $\mathtt{tobler}$ package', fontsize=16)

[14]:

Text(0.5, 0.98, 'Spatial Interpolation with the PySAL $\\mathtt{tobler}$ package')

../_images/notebooks_census_to_hexgrid_26_1.png

Generating h3 Hexgrids from GeoDataFrames¶

Getting data from CenPy¶

Creating Hexgrids with the h3fy function¶

Interpolating to a hexgrid¶

Creating Hexgrids with the `h3fy` function¶