Table of Contents
This is a notebook that explains a step-by-step procedure to perform decomposition on comparative segregation measures.
First, let's import all the needed libraries.
import pandas as pd
import pickle
import numpy as np
import matplotlib.pyplot as plt
import segregation
from segregation.decomposition import DecomposeSegregation
In this example, we are going to use census data that the user must download its own copy, following similar guidelines explained in https://github.com/spatialucr/geosnap/blob/master/examples/01_getting_started.ipynb where you should download the full type file of 2010. The zipped file download will have a name that looks like LTDB_Std_All_fullcount.zip
. After extracting the zipped content, the filepath of the data should looks like this:
#filepath = '~/LTDB_Std_2010_fullcount.csv'
Then, we read the data:
df = pd.read_csv(filepath, encoding = "ISO-8859-1", sep = ",")
We are going to work with the variable of the nonhispanic black people (nhblk10
) and the total population of each unit (pop10
). So, let's read the map of all census tracts of US and select some specific columns for the analysis:
# This file can be download here: https://drive.google.com/open?id=1gWF0OCn6xuR_WrEj7Ot2jY6KI2t6taIm
with open('data/tracts_US.pkl', 'rb') as input:
map_gpd = pickle.load(input)
map_gpd['INTGEOID10'] = pd.to_numeric(map_gpd["GEOID10"])
gdf_pre = map_gpd.merge(df, left_on = 'INTGEOID10', right_on = 'tractid')
gdf = gdf_pre[['GEOID10', 'geometry', 'pop10', 'nhblk10']]
In this notebook, we use the Metropolitan Statistical Area (MSA) of US (we're also using the word 'cities' here to refer them). So, let's read the correspondence table that relates the tract id with the corresponding Metropolitan area...
# You can download this file here: https://drive.google.com/open?id=10HUUJSy9dkZS6m4vCVZ-8GiwH0EXqIau
with open('data/tract_metro_corresp.pkl', 'rb') as input:
tract_metro_corresp = pickle.load(input).drop_duplicates()
..and merge them with the previous data.
merged_gdf = gdf.merge(tract_metro_corresp, left_on = 'GEOID10', right_on = 'geoid10')
We now build the composition variable (compo
) which is the division of the frequency of the chosen group and total population. Let's inspect the first rows of the data.
merged_gdf['compo'] = np.where(merged_gdf['pop10'] == 0, 0, merged_gdf['nhblk10'] / merged_gdf['pop10'])
merged_gdf.head()
Now, we chose two different metropolitan areas to compare the degree of segregation.
la_2010 = merged_gdf.loc[(merged_gdf.name == "Los Angeles-Long Beach-Anaheim, CA")]
la_2010.plot(column = 'compo', figsize = (10, 10), cmap = 'OrRd', legend = True)
plt.axis('off')
ny_2010 = merged_gdf.loc[(merged_gdf.name == 'New York-Newark-Jersey City, NY-NJ-PA')]
ny_2010.plot(column = 'compo', figsize = (20, 10), cmap = 'OrRd', legend = True)
plt.axis('off')
We first compare the Gini index of both cities. Let's import the Gini_Seg
class from segregation
, fit both indexes and check the difference in point estimation.
from segregation.aspatial import GiniSeg
G_la = GiniSeg(la_2010, 'nhblk10', 'pop10')
G_ny = GiniSeg(ny_2010, 'nhblk10', 'pop10')
G_la.statistic - G_ny.statistic
Let's decompose these difference according to Rey, S. et al "Comparative Spatial Segregation Analytics". Forthcoming. You can check the options available in this decomposition below:
help(DecomposeSegregation)
The difference of -0.10653 fitted previously, can be decomposed into two components. The Spatial component and the attribute component. Let's estimate both, respectively.
DS_composition = DecomposeSegregation(G_la, G_ny)
DS_composition.c_s
DS_composition.c_a
So, the first thing to notice is that attribute component, i.e., given by a difference in the population structure (in this case, the composition) plays a more important role in the difference, since it has a higher absolute value.
The difference in the composition can be inspected in the plotting method with the type cdfs
:
DS_composition.plot(plot_type = 'cdfs')
If your data is a GeoDataFrame, it is also possible to visualize the counterfactual compositions with the argument plot_type = 'maps'
The first and second contexts are Los Angeles and New York, respectively.
DS_composition.plot(plot_type = 'maps')
Note that in all plotting methods, the title presents each component of the decomposition performed.
The share approach takes into consideration the share of each group in each city. Since this approach takes into consideration the focus group and the complementary group share to build the "counterfactual" total population of each unit, it is of interest to inspect all these four cdf's.
ps.: The share is the population frequency of each group in each unit over the total population of that respectively group.
DS_share = DecomposeSegregation(G_la, G_ny, counterfactual_approach = 'share')
DS_share.plot(plot_type = 'cdfs')
We can see that curve between the contexts are closer to each other which represent a drop in the importance of the population structure (attribute component) to -0.062. However, this attribute still overcomes the spatial component (-0.045) in terms of importance due to both absolute magnitudes.
DS_share.plot(plot_type = 'maps')
We can see that the counterfactual maps of the composition (outside of the main diagonal), in this case, are slightly different from the previous approach.
The dual_composition
approach is similar to the composition approach. However, it uses also the counterfactual composition of the cdf of the complementary group.
DS_dual = DecomposeSegregation(G_la, G_ny, counterfactual_approach = 'dual_composition')
DS_dual.plot(plot_type = 'cdfs')
It is possible to see that the component values are very similar with slight changes from the composition
approach.
DS_dual.plot(plot_type = 'maps')
The counterfactual distributions are virtually the same (but not equal) as the one from the composition
approach.
from segregation.spatial import RelativeConcentration
RCO_la = RelativeConcentration(la_2010, 'nhblk10', 'pop10')
RCO_ny = RelativeConcentration(ny_2010, 'nhblk10', 'pop10')
RCO_la.statistic - RCO_ny.statistic
RCO_DS_composition = DecomposeSegregation(RCO_la, RCO_ny)
RCO_DS_composition.c_s
RCO_DS_composition.c_a
It is possible to note that, in this case, the spatial component is playing a much more relevant role in the decomposition.