%matplotlib inline

import pandas as pd
import geopandas as gpd
import libpysal as lp
import esda
import numpy as np
import matplotlib.pyplot as plt

Case Study: Gini in a bottle: Income Inequality and the Trump Vote

Read in the table and show the first three rows

pres = gpd.read_file("zip://../data/uspres.zip")
pres.head(3)

	name	state_name	stfips	cofips	fipsno	dem_2008	dem_2012	dem_2016	gini_2015	state	county	fips	geometry
0	Delta	Michigan	26	41	26041	0.532316	0.466327	0.366585	0.4268	26	041	26041	POLYGON ((-86.45813751220703 45.76276779174805...
1	Lipscomb	Texas	48	295	48295	0.124199	0.102322	0.104328	0.4437	48	295	48295	POLYGON ((-100.0068664550781 36.49387741088867...
2	Walker	Alabama	1	127	1127	0.263663	0.232437	0.156339	0.4594	01	127	1127	POLYGON ((-87.41892242431641 33.60782241821289...

Set the Coordinate Reference System and reproject it into a suitable projection for mapping the contiguous US

Hint: a useful EPSG code here is 5070, an Albers equal-area conic projection for the contiguous US.

pres = pres.set_crs(epsg=4326)  # raw coordinates are longitude/latitude (WGS 84)
pres = pres.to_crs(epsg=5070)

Plot each year’s vote against each other year’s vote

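In this instance, it also helps to include the line $y=x$ on each plot, so that the direction in which the aggregate vote moved is clearer: points below the line are counties where the Democratic share on the vertical axis is lower than the share on the horizontal axis.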

import seaborn as sns
facets = sns.pairplot(data=pres.filter(like='dem_'))
facets.map_offdiag(lambda *arg, **kw: plt.plot((0,1),(0,1), color='k'))

<seaborn.axisgrid.PairGrid at 0x7fcaf577d0f0>

png

Show the relationship between the dem two-party vote and the Gini coefficient by county.

import seaborn as sns
facets = sns.pairplot(x_vars=pres.filter(like='dem_').columns,
                      y_vars=['gini_2015'], data=pres)

png

Compute the change in vote share between successive elections, and over the full period from 2008 to 2016

pres['swing_2012'] = pres.eval("dem_2012 - dem_2008")
pres['swing_2016'] = pres.eval("dem_2016 - dem_2012")
pres['swing_full'] = pres.eval("dem_2016 - dem_2008")

Negative swing means the Democratic vote share in 2016 (what Clinton won) is lower than the Democratic vote share in 2008 (what Obama won). So, counties where swing is negative are counties where Obama “outperformed” Clinton. Equivalently, because these are two-party shares, they are counties where Trump (in 2016) “beat” McCain’s electoral performance in 2008.

Positive swing in a county means that Clinton (in 2016) outperformed Obama (in 2008), or equivalently that McCain (in 2008) did better than Trump (in 2016).

The national average swing was around -9 percentage points from 2008 to 2016. Further, swing does not directly record who “won” the county, only which direction the county “moved.”
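As a quick illustration of the sign convention, here is a minimal sketch using three made-up counties (the shares below are hypothetical, not taken from the `pres` table). It shows that a county can swing toward one party while still being won by the other.

```python
# Hypothetical two-party Democratic vote shares for three made-up counties.
counties = {
    "A": {"dem_2008": 0.55, "dem_2016": 0.40},
    "B": {"dem_2008": 0.30, "dem_2016": 0.35},
    "C": {"dem_2008": 0.70, "dem_2016": 0.60},
}

def swing(shares, start="dem_2008", end="dem_2016"):
    """Change in Democratic two-party share between two elections."""
    return shares[end] - shares[start]

def describe(shares):
    """Return (rounded swing, direction of movement, 2016 winner)."""
    s = swing(shares)
    direction = "toward the Republican" if s < 0 else "toward the Democrat"
    winner = "Democrat" if shares["dem_2016"] > 0.5 else "Republican"
    return round(s, 2), direction, winner

summary = {name: describe(shares) for name, shares in counties.items()}
for name, (s, direction, winner) in summary.items():
    print(f"{name}: swing={s:+.2f} ({direction}), 2016 winner: {winner}")
```

County B moves toward the Democrat but is still won by the Republican, and county C moves toward the Republican but is still won by the Democrat.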

Map the change in vote from 2008 to 2016 alongside the votes in 2008 and 2016:

f,ax = plt.subplots(3,1,
                    subplot_kw=dict(aspect='equal', 
                                    frameon=False),
                    figsize=(60,15))
pres.plot('dem_2008', ax=ax[0], cmap='RdYlBu')
pres.plot('swing_full', ax=ax[1], cmap='bwr_r')
pres.plot('dem_2016', ax=ax[2], cmap='RdYlBu')
for ax_ in ax:
    ax_.set_xticks([])
    ax_.set_yticks([])

png

Build a spatial weights object to model the spatial relationships between US counties

import libpysal as lp
w = lp.weights.Rook.from_dataframe(pres)

Note that this is just one of many valid solutions. But, all the remaining exercises are predicated on using this weight. If you choose a different weight structure, your results may differ.
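To make the idea of Rook contiguity concrete, here is a small sketch that builds the same kind of neighbor structure by hand for a regular grid, where two cells are neighbors only if they share an edge. The `rook_neighbors` helper is hypothetical and written just for this illustration; it is not how libpysal builds weights from county polygons.

```python
def rook_neighbors(nrows, ncols):
    """Map each grid cell id to the ids of the cells sharing an edge with it."""
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            cell = r * ncols + c
            adjacent = []
            # Only up, down, left, right: diagonal (corner-only) contact does not count.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    adjacent.append(rr * ncols + cc)
            neighbors[cell] = sorted(adjacent)
    return neighbors

grid = rook_neighbors(3, 3)
print(grid[4])  # centre cell of the 3x3 grid
print(grid[0])  # corner cell
```

A Queen rule would also count cells that touch only at a corner, which is one example of the alternative weight structures mentioned above.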

Is swing “contagious?” Do nearby counties tend to swing together?

import esda
np.random.seed(1)
moran = esda.moran.Moran(pres.swing_full, w)
print(moran.I)

0.6930802468425128
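For intuition about what this statistic measures, the following is a minimal sketch of Moran's $I$ computed by hand on a four-cell toy example. It assumes row-standardized weights (which, to my understanding, is the default transformation in `esda`) and that every observation has at least one neighbor; `morans_i` and the toy data are hypothetical.

```python
def morans_i(values, neighbors):
    """Moran's I with row-standardized weights.

    With row-standardized weights the statistic reduces to
    sum(z_i * lag_i) / sum(z_i ** 2), where z is the mean-centred variable
    and lag_i is the average of z over the neighbors of i.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    lag = [sum(z[j] for j in neighbors[i]) / len(neighbors[i]) for i in range(n)]
    numerator = sum(zi * li for zi, li in zip(z, lag))
    denominator = sum(zi * zi for zi in z)
    return numerator / denominator

# Four cells in a row: 0 - 1 - 2 - 3
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

smooth = morans_i([1, 2, 3, 4], line)       # similar values sit next to each other
alternating = morans_i([1, 2, 1, 2], line)  # neighbors always differ
print(smooth, alternating)
```

Positive values indicate that similar values cluster together, as with the county swing above; negative values indicate a checkerboard-like pattern.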

Visually show the relationship between places’ swing and their surrounding swing, like in a scatterplot.

f = plt.figure(figsize=(6,6))
plt.scatter(pres.swing_full, lp.weights.lag_spatial(w, pres.swing_full))
plt.plot((-.3,.1),(-.3,.1), color='k')
plt.title(r'$I = {:.3f} \ \ (p < {:.3f})$'.format(moran.I, moran.p_sim))

Text(0.5, 1.0, '$I = 0.693 \\ \\ (p < 0.001)$')

png

Are there any outliers or clusters in swing using a Local Moran’s $I$?

np.random.seed(11)
lmos = esda.moran.Moran_Local(pres.swing_full, w, 
                              permutations=70000)  # enough permutations for p_sim to reach the Bonferroni threshold
(lmos.p_sim <= (.05/len(pres))).sum()
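The choice of 70,000 permutations can be checked with a little arithmetic. This sketch assumes the pseudo p-value is computed as (count + 1) / (permutations + 1), so that its smallest possible value is 1 / (permutations + 1), and it uses the 3,082 counties reported in the weights matrix later in this notebook. `min_permutations` is a hypothetical helper.

```python
from fractions import Fraction
from math import ceil

def min_permutations(n_tests, alpha=Fraction(1, 20)):
    """Smallest number of permutations for which 1 / (permutations + 1)
    can be at or below the Bonferroni threshold alpha / n_tests."""
    threshold = alpha / n_tests
    return ceil(1 / threshold) - 1

n_counties = 3082  # assumed from the 3082x3082 connectivity matrix
needed = min_permutations(n_counties)
print(needed)
```

Exact fractions are used instead of floats so that the boundary case is not affected by rounding. Anything below this count could never flag a county as significant at the Bonferroni-corrected level, no matter how extreme its local statistic.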

Where are these outliers or clusters?

f = plt.figure(figsize=(10,4))
ax = plt.gca()
ax.set_aspect('equal')
is_weird = lmos.p_sim <= (.05/len(pres))
pres.plot(color='lightgrey', ax=ax)
pres.assign(quads=lmos.q)[is_weird].plot('quads', 
                                         legend=True, 
                                         k=4, categorical=True,
                                         cmap='bwr_r', ax=ax)

<matplotlib.axes._subplots.AxesSubplot at 0x7fcaf24ff668>

png

Can you focus the map in on the regions which are outliers?

f = plt.figure(figsize=(10,4))
ax = plt.gca()
ax.set_aspect('equal')
is_weird = lmos.p_sim <= (.05/len(pres))
pres.assign(quads=lmos.q)[is_weird].plot('quads', 
                                         legend=True,
                                         k=4, categorical=True,
                                         cmap='bwr_r', ax=ax)
bounds = ax.axis()
pres.plot(color='lightgrey', ax=ax, zorder=-1)
ax.axis(bounds)

(300221.12947796297, 1721965.997477191, 1183473.8523896334, 2010122.0749399317)

png

Group 3 (low-low) is a cluster that moves surprisingly strongly from Obama to Trump, together with its surroundings, and group 1 (high-high) is a cluster that moves strongly from Obama to Hillary, together with its surroundings.

Group 4 (high-low) moves surprisingly away from Trump while its area moves towards Trump. Group 2 (low-high) moves surprisingly towards Trump while its area moves towards Hillary.
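The group numbers are the quadrant codes stored in `lmos.q`. As a rough sketch of how those codes arise, the hypothetical helpers below classify a county from its own swing and the average swing of its neighbors, both measured relative to the overall mean. The numbers are made up, and the real computation in `esda` works on standardized values, so treat this as an illustration of the logic only.

```python
QUADRANTS = {1: "HH", 2: "LH", 3: "LL", 4: "HL"}

def quadrant(z, lag):
    """Quadrant code from a centred value z and the centred neighbor average lag."""
    if z > 0 and lag > 0:
        return 1  # high value, high surroundings
    if z <= 0 and lag > 0:
        return 2  # low value, high surroundings
    if z <= 0 and lag <= 0:
        return 3  # low value, low surroundings
    return 4      # high value, low surroundings

def classify(swing, neighbor_swing, mean_swing):
    """Centre both quantities on the overall mean swing, then look up the quadrant."""
    code = quadrant(swing - mean_swing, neighbor_swing - mean_swing)
    return code, QUADRANTS[code]

mean_swing = -0.09  # roughly the national average swing quoted earlier
print(classify(-0.25, -0.20, mean_swing))  # strong move to Trump, as do its neighbors
print(classify(0.02, -0.20, mean_swing))   # moves to Clinton while its neighbors move to Trump
print(classify(-0.05, -0.02, mean_swing))  # below-average move to Trump still counts as "high"
```

Because "high" and "low" are relative to the mean swing, a county can fall in group 1 even if its Democratic share declined, as long as it declined less than average.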

Relaxing the significance a bit, where do we see significant spatial outliers?

pres.assign(local_score = lmos.Is, 
            pval = lmos.p_sim,
            quad = lmos.q)\
    .sort_values('local_score')\
    .query('pval < 1e-3 & local_score < 0')[['name','state_name','dem_2008','dem_2016',
                                             'local_score','pval', 'quad']]

	name	state_name	dem_2008	dem_2016	local_score	pval	quad
2700	Washington	Ohio	0.067810	0.282640	-5.992520	0.000229	4
298	Monroe	Indiana	0.662479	0.624804	-1.069071	0.000586	4
441	San Juan	Utah	0.477002	0.355410	-0.432591	0.000843	2
172	Eau Claire	Wisconsin	0.612624	0.539251	-0.329336	0.000386	4
1112	Sangamon	Illinois	0.522164	0.449798	-0.303292	0.000843	4
2746	Monongalia	West Virginia	0.519568	0.443524	-0.299415	0.000343	4
2920	Cabell	West Virginia	0.448643	0.365252	-0.134028	0.000829	4
2648	Tippecanoe	Indiana	0.558866	0.469750	-0.024366	0.000057	4

These outliers fall mainly in Ohio, Indiana, and West Virginia.

What about when comparing the voting behavior from 2012 to 2016?

np.random.seed(21)
lmos16 = esda.moran.Moran_Local(pres.swing_2016, w, 
                              permutations=70000)  # enough permutations for p_sim to reach the Bonferroni threshold
(lmos16.p_sim <= (.05/len(pres))).sum()
pres.assign(local_score = lmos16.Is, 
            pval = lmos16.p_sim,
            quad = lmos16.q)\
    .sort_values('local_score')\
    .query('pval < 1e-3 & local_score < 0')[['name','state_name','dem_2008','dem_2016',
                                             'local_score','pval', 'quad']]

	name	state_name	dem_2008	dem_2016	local_score	pval	quad
172	Eau Claire	Wisconsin	0.612624	0.539251	-0.680028	0.000557	4
665	Lake	California	0.599217	0.520347	-0.175233	0.000471	2
58	Carbon	Utah	0.458791	0.248662	-0.158747	0.000400	2
1239	McDonough	Illinois	0.528353	0.437391	-0.153826	0.000014	4
3058	Ohio	West Virginia	0.445533	0.329845	-0.107886	0.000643	4

What is the relationship between the Gini coefficient and partisan swing?

sns.regplot(x=pres.gini_2015,
            y=pres.swing_full)

<matplotlib.axes._subplots.AxesSubplot at 0x7fcaf119f668>

png

Hillary tended to do better than Obama in counties with higher income inequality. In contrast, Trump tended to improve on McCain in counties with lower income inequality. If you’re further interested in the sometimes-counterintuitive relationship between income, voting, and geographic context, check out Andrew Gelman’s Red State, Blue State, Rich State, Poor State.

Find 8 geographical regions in the US two-party vote since 2008

from sklearn import cluster

clusterer = cluster.AgglomerativeClustering(n_clusters=8, connectivity=w.sparse)
clusterer.fit(pres.filter(like='dem_').values)

AgglomerativeClustering(affinity='euclidean', compute_full_tree='auto',
            connectivity=<3082x3082 sparse matrix of type '<class 'numpy.float64'>'
	with 17166 stored elements in Compressed Sparse Row format>,
            linkage='ward', memory=None, n_clusters=8,
            pooling_func='deprecated')

pres.assign(cluster = clusterer.labels_).plot('cluster', categorical=True)

<matplotlib.axes._subplots.AxesSubplot at 0x7fcaf073ab38>

png

Find 10 geographical clusters in the change in US presidential two-party vote since 2008

clusterer = cluster.AgglomerativeClustering(n_clusters=10, connectivity=w.sparse)
clusterer.fit(pres.filter(like='swing_').values)

AgglomerativeClustering(affinity='euclidean', compute_full_tree='auto',
            connectivity=<3082x3082 sparse matrix of type '<class 'numpy.float64'>'
	with 17166 stored elements in Compressed Sparse Row format>,
            linkage='ward', memory=None, n_clusters=10,
            pooling_func='deprecated')

pres.assign(cluster = clusterer.labels_).plot('cluster', categorical=True)

<matplotlib.axes._subplots.AxesSubplot at 0x7fcaf06e7dd8>

png