Implementing Binary Dasymetric Interpolation¶
Following an NHGIS-like Methodology¶
[1]:
import geopandas as gpd
import pandas as pd
import osmnx as ox
import contextily as ctx
import numpy as np
import matplotlib.pyplot as plt
from tobler.area_weighted import area_interpolate
from tobler.dasymetric import extract_raster_features
from tobler.util import h3fy
%load_ext watermark
%watermark -a 'eli knaap' -v -d -u -p tobler,geopandas,rasterio
Author: eli knaap
Last updated: 2021-01-27
Python implementation: CPython
Python version : 3.8.6
IPython version : 7.19.0
tobler : 0.6.0
geopandas: 0.8.1
rasterio : 1.1.5
NHGIS uses a two step process, in which they first apply a binary dasymetric filter, then apply target density weighting. Here we will follow a similar strategy, relplicating their BD filtering (albeit with slightly different open data and replicable, open source tooling) before carrying out a simpler areal interpolation.
In NHGIS’s BD model for 2000 block data, the inhabited zone consists of all areas that are at least 5% developed impervious surface (within each 30-meter square cell of NLCD 2001 data) and lie within 300 feet of a residential road center line2 but not in a water body, using road definitions and water polygons from the 2010 TIGER/Line Shapefiles.
so we’ll start by grabbing residential streets from OSM, buffering them out 300 feet, then using that geometry to select the impervious surfaces within
[2]:
rside_streets = ox.geometries_from_place('Riverside County, California',tags={'highway':'residential'})
[3]:
rside_streets = rside_streets.to_crs(2230)
/Users/knaaptime/anaconda3/envs/tobler/lib/python3.8/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
[4]:
rside_streets = rside_streets.buffer(300)
[5]:
rside_streets = gpd.GeoDataFrame(geometry=rside_streets)
[6]:
rside_streets.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 49402 entries, 0 to 49401
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 geometry 49402 non-null geometry
dtypes: geometry(1)
memory usage: 771.9 KB
Now we use the street buffers to extract any cells between 5 and 100% impervious surface
First, download NLCD data from here: https://s3-us-west-2.amazonaws.com/mrlc/NLCD_2016_Impervious_L48_20190405.zip, then unzip the archive and pass the path to the img
file to tobler’s extract_raster_features
function
[7]:
nlcd_path = "/Users/knaaptime/Downloads/NLCD_2016_Impervious_L48_20190405/NLCD_2016_Impervious_L48_20190405.img"
[8]:
urban_rside = extract_raster_features(rside_streets, nlcd_path, pixel_values=list(range(5,101)), collapse_values=True)
[9]:
urban_rside.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 24280 entries, 0 to 24279
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 geometry 24280 non-null geometry
dtypes: geometry(1)
memory usage: 189.8 KB
/Users/knaaptime/anaconda3/envs/tobler/lib/python3.8/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
[10]:
urban_rside = urban_rside.to_crs(2230)
[11]:
fig, ax = plt.subplots(figsize=(20,14))
urban_rside.plot(ax=ax)
[11]:
<AxesSubplot:>
This geodataframe represents the areas that meet both criteria NHGIS suggests
Now we need some census data to transfer geometries. Lets grab tract-level population counts using cenpy
[12]:
from cenpy import products
/Users/knaaptime/anaconda3/envs/tobler/lib/python3.8/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
[13]:
acs = products.ACS(2017)
/Users/knaaptime/anaconda3/envs/tobler/lib/python3.8/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
[14]:
rside_census = acs.from_county('Riverside County', variables=['B25077_001E'])
/Users/knaaptime/anaconda3/envs/tobler/lib/python3.8/site-packages/pyproj/crs/crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
return _prepare_from_string(" ".join(pjargs))
[15]:
rside_census = rside_census.to_crs(2230)
/Users/knaaptime/anaconda3/envs/tobler/lib/python3.8/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
[16]:
rside_census.plot()
[16]:
<AxesSubplot:>
And lets generate some synthetic zones to transfer this population data into using h3 hexagons
[17]:
rside_hex = h3fy(rside_census, resolution=8)
/Users/knaaptime/anaconda3/envs/tobler/lib/python3.8/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
[18]:
rside_hex.plot()
[18]:
<AxesSubplot:>
Ok, now all that’s left is to clip our input census data, “restricting” it to the locations NLCD defines as 5-100% impervious. To do so, we just need to clip the census data using our extracted urban features as a mask
The commented cell below takes hours and hours because geopandas current clip function is very inefficient for these data–dont uncomment it!
[19]:
#rside_dasy = gpd.clip(rside_census, urban_rside)
/Users/knaaptime/anaconda3/envs/tobler/lib/python3.8/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
Instead, lets define a more efficient clip operation using the spatial index
[20]:
def iterclip(source, mask):
tree = mask.sindex
features = []
for i, row in source.iterrows():
query = tree.query(source.geometry.iloc[i], predicate='intersects').tolist()
submask = mask[mask.index.isin(query)]
single = gpd.GeoDataFrame(row.to_frame().T, crs=source.crs)
features.append(gpd.clip(single, submask))
frame = gpd.GeoDataFrame(pd.concat(features), crs=source.crs)
return frame
[21]:
dasy = iterclip(rside_census, urban_rside)
Clipping census geometries by the urban features leaves the census attributes intact but reshapes them. If a census tract had 2000 people in it and covered 1 sqmi, but only a quarter mile of the tract was urbanized, the new geodataframe effectively shows these 2000 people occupying the quarter mile. By passing these new data to the areal_interpolate
function, we’re still assuming that population density is constant across our new geometries (a condition that may not be true in reality) but
that assumption is much more plausible than when using original census geometries.
[22]:
dasy.plot()
[22]:
<AxesSubplot:>
[23]:
dasy.B25077_001E = dasy.B25077_001E.fillna(0)
/Users/knaaptime/anaconda3/envs/tobler/lib/python3.8/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
[24]:
interp = area_interpolate(dasy, rside_hex, extensive_variables=['B25077_001E'])
[25]:
interp['plot'] = interp.B25077_001E.apply(np.log1p)
/Users/knaaptime/anaconda3/envs/tobler/lib/python3.8/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
[26]:
fig, ax = plt.subplots(figsize=(20,20))
interp.to_crs(3857).plot('plot', ax=ax, alpha=0.4)
ctx.add_basemap(source=ctx.providers.Stamen.TonerLite, ax=ax)
And there we have it. Riverside’s 2017 population estimated at a regular hexgrid
[ ]: