Segregation Index Decomposition¶
Decomposition framework of the PySAL segregation module
This notebook explains a step-by-step procedure for performing decomposition of comparative segregation measures.
First, let’s import all the needed libraries.
[1]:
import pandas as pd
import pickle
import numpy as np
import matplotlib.pyplot as plt
import segregation
from segregation.decomposition import DecomposeSegregation
In this example, we are going to use census data for which the user must download their own copy, following guidelines similar to those explained in https://github.com/spatialucr/geosnap/blob/master/examples/01_getting_started.ipynb, where you should download the 2010 full-count file. The zipped download will have a name that looks like LTDB_Std_All_fullcount.zip. After extracting the zipped content, the filepath of the data should look like this:
[2]:
#filepath = '~/LTDB_Std_2010_fullcount.csv'
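As a hedged sketch (the archive layout is assumed: adjust both paths to wherever you saved the download, and note that the extracted 2010 file is assumed to keep the LTDB_Std_2010_fullcount.csv name used above), extracting the archive and setting filepath might look like this:
import zipfile

# Assumed paths: the archive sits in the working directory and is extracted
# into data/ltdb; change both to match your own setup.
with zipfile.ZipFile('LTDB_Std_All_fullcount.zip') as archive:
    archive.extractall('data/ltdb')

# Assumed name of the extracted 2010 full-count file.
filepath = 'data/ltdb/LTDB_Std_2010_fullcount.csv'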
Then, we read the data:
[4]:
df = pd.read_csv(filepath, encoding = "ISO-8859-1", sep = ",")
We are going to work with the variable for the non-Hispanic Black population (nhblk10) and the total population of each unit (pop10). So, let's read the map of all US census tracts and select specific columns for the analysis:
[5]:
# This file can be download here: https://drive.google.com/open?id=1gWF0OCn6xuR_WrEj7Ot2jY6KI2t6taIm
with open('data/tracts_US.pkl', 'rb') as input:
    map_gpd = pickle.load(input)
map_gpd['INTGEOID10'] = pd.to_numeric(map_gpd["GEOID10"])
gdf_pre = map_gpd.merge(df, left_on = 'INTGEOID10', right_on = 'tractid')
gdf = gdf_pre[['GEOID10', 'geometry', 'pop10', 'nhblk10']]
In this notebook, we use US Metropolitan Statistical Areas (MSAs); we also use the word 'cities' here to refer to them. So, let's read the correspondence table that relates each tract id to its corresponding Metropolitan area…
[6]:
# You can download this file here: https://drive.google.com/open?id=10HUUJSy9dkZS6m4vCVZ-8GiwH0EXqIau
with open('data/tract_metro_corresp.pkl', 'rb') as input:
    tract_metro_corresp = pickle.load(input).drop_duplicates()
…and merge it with the previous data.
[7]:
merged_gdf = gdf.merge(tract_metro_corresp, left_on = 'GEOID10', right_on = 'geoid10')
We now build the composition variable (compo), which is the ratio of the chosen group's population to the total population of each unit. Let's inspect the first rows of the data.
[8]:
merged_gdf['compo'] = np.where(merged_gdf['pop10'] == 0, 0, merged_gdf['nhblk10'] / merged_gdf['pop10'])
merged_gdf.head()
[8]:
|   | GEOID10 | geometry | pop10 | nhblk10 | geoid10 | metro_id | numeric_id | geoid | name | compo |
|---|---------|----------|-------|---------|---------|----------|------------|-------|------|-------|
| 0 | 01001020801 | POLYGON ((-86.456273 32.405837, -86.4570349999... | 3081 | 293 | 01001020801 | 33860 | 33860 | 33860 | Montgomery, AL | 0.095099 |
| 1 | 01001020802 | POLYGON ((-86.412497 32.589422, -86.412442 32.... | 10435 | 1420 | 01001020802 | 33860 | 33860 | 33860 | Montgomery, AL | 0.136080 |
| 2 | 01001020200 | POLYGON ((-86.467354 32.459308, -86.46764 32.4... | 2170 | 1226 | 01001020200 | 33860 | 33860 | 33860 | Montgomery, AL | 0.564977 |
| 3 | 01001020700 | POLYGON ((-86.46106999999999 32.42709, -86.461... | 2891 | 452 | 01001020700 | 33860 | 33860 | 33860 | Montgomery, AL | 0.156347 |
| 4 | 01001020600 | POLYGON ((-86.470524 32.456117, -86.4700469999... | 3668 | 776 | 01001020600 | 33860 | 33860 | 33860 | Montgomery, AL | 0.211559 |
Now, we choose two different metropolitan areas to compare their degree of segregation.
Map of the composition of the Metropolitan area of Los Angeles¶
[9]:
la_2010 = merged_gdf.loc[(merged_gdf.name == "Los Angeles-Long Beach-Anaheim, CA")]
la_2010.plot(column = 'compo', figsize = (10, 10), cmap = 'OrRd', legend = True)
plt.axis('off')
[9]:
(-119.02865769999998, -117.3360503, 32.6463769, 34.9269651)
Map of the composition of the Metropolitan area of New York¶
[10]:
ny_2010 = merged_gdf.loc[(merged_gdf.name == 'New York-Newark-Jersey City, NY-NJ-PA')]
ny_2010.plot(column = 'compo', figsize = (20, 10), cmap = 'OrRd', legend = True)
plt.axis('off')
[10]:
(-75.5381038, -71.59841419999998, 39.36886419999999, 41.70820779999999)
We first compare the Gini index of both cities. Let's import the GiniSeg class from segregation, fit both indexes, and check the difference in the point estimates.
[11]:
from segregation.aspatial import GiniSeg
G_la = GiniSeg(la_2010, 'nhblk10', 'pop10')
G_ny = GiniSeg(ny_2010, 'nhblk10', 'pop10')
G_la.statistic - G_ny.statistic
[11]:
-0.10652888790131243
Let's decompose this difference according to Rey, S. et al., "Comparative Spatial Segregation Analytics" (forthcoming). You can check the options available in this decomposition below:
[12]:
help(DecomposeSegregation)
Help on class DecomposeSegregation in module segregation.decomposition.decompose_segregation:
class DecomposeSegregation(builtins.object)
| Decompose segregation differences into spatial and attribute components.
|
| Given two segregation indices of the same type, use Shapley decomposition
| to measure whether the differences between index measures arise from
| differences in spatial structure or population structure
|
| Parameters
| ----------
| index1 : segregation.SegIndex class
| First SegIndex class to compare.
| index2 : segregation.SegIndex class
| Second SegIndex class to compare.
| counterfactual_approach : str, one of
| ["composition", "share", "dual_composition"]
| The technique used to generate the counterfactual population
| distributions.
|
| Attributes
| ----------
|
| c_s : float
| Shapley's Spatial Component of the decomposition
|
| c_a : float
| Shapley's Attribute Component of the decomposition
|
| Methods
| ----------
|
| plot : Visualize features of the Decomposition performed
| plot_type : str, one of ['cdfs', 'maps']
|
| 'cdfs' : visualize the cumulative distribution functions of the compositions/shares
| 'maps' : visualize the spatial distributions for original data and counterfactuals generated and Shapley's components (only available for GeoDataFrames)
|
| Examples
| --------
| Several examples can be found at https://github.com/pysal/segregation/blob/master/notebooks/decomposition_wrapper_example.ipynb.
|
| Methods defined here:
|
| __init__(self, index1, index2, counterfactual_approach='composition')
| Initialize self. See help(type(self)) for accurate signature.
|
| plot(self, plot_type='cdfs')
| Plot the Segregation Decomposition Profile
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
Composition Approach (default)¶
The difference of -0.10653 fitted previously can be decomposed into two components: the spatial component and the attribute component. Let's estimate each in turn.
[13]:
DS_composition = DecomposeSegregation(G_la, G_ny)
DS_composition.c_s
[13]:
0.029575766160051364
[14]:
DS_composition.c_a
[14]:
-0.1361046540613638
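As a quick sanity check (a minimal sketch reusing the objects fitted above), the Shapley decomposition is additive, so the two components should sum to the overall difference in the Gini index: 0.029576 + (-0.136105) ≈ -0.106529.
# The spatial and attribute components should sum, up to floating-point error,
# to the overall index difference computed earlier.
total_difference = G_la.statistic - G_ny.statistic
np.isclose(DS_composition.c_s + DS_composition.c_a, total_difference)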
So, the first thing to notice is that the attribute component, i.e., the part given by a difference in the population structure (in this case, the composition), plays the more important role in the difference, since it has the higher absolute value.
The difference in the composition can be inspected with the plotting method using the type cdfs:
[15]:
DS_composition.plot(plot_type = 'cdfs')
If your data is a GeoDataFrame, it is also possible to visualize the counterfactual compositions with the argument plot_type = 'maps'. The first and second contexts are Los Angeles and New York, respectively.
[16]:
DS_composition.plot(plot_type = 'maps')
Note that in all plotting methods, the title presents each component of the decomposition performed.
Dual Composition Approach¶
The dual_composition approach is similar to the composition approach. However, it also uses the counterfactual composition of the CDF of the complementary group.
[19]:
DS_dual = DecomposeSegregation(G_la, G_ny, counterfactual_approach = 'dual_composition')
DS_dual.plot(plot_type = 'cdfs')
It is possible to see that the component values are very similar, with only slight changes from the composition approach.
[20]:
DS_dual.plot(plot_type = 'maps')
The counterfactual distributions are virtually the same as (but not equal to) those from the composition approach.
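The help output above also lists a third counterfactual approach, 'share', which is not demonstrated in this notebook. A minimal sketch of fitting and plotting it (reusing the Gini indexes fitted above) would be:
# Decompose the same Gini difference using the 'share' counterfactual approach.
DS_share = DecomposeSegregation(G_la, G_ny, counterfactual_approach = 'share')
DS_share.plot(plot_type = 'cdfs')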
Inspecting a different index: Relative Concentration¶
[21]:
from segregation.spatial import RelativeConcentration
RCO_la = RelativeConcentration(la_2010, 'nhblk10', 'pop10')
RCO_ny = RelativeConcentration(ny_2010, 'nhblk10', 'pop10')
RCO_la.statistic - RCO_ny.statistic
[21]:
-0.4252237137424809
[22]:
RCO_DS_composition = DecomposeSegregation(RCO_la, RCO_ny)
RCO_DS_composition.c_s
[22]:
-0.37586237172215886
[23]:
RCO_DS_composition.c_a
[23]:
-0.049361342020322
It is possible to note that, in this case, the spatial component plays a much more relevant role in the decomposition.
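As with the Gini example (a minimal sketch reusing the objects fitted above), this can be confirmed by comparing the absolute values of the two components:
# The spatial component dominates when its absolute value exceeds that of the
# attribute component.
abs(RCO_DS_composition.c_s) > abs(RCO_DS_composition.c_a)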