Single-group Segregation Indices

[1]:
%load_ext watermark
%watermark -a 'eli knaap' -v -d -u -p segregation,geopandas,libpysal,pandana
Author: eli knaap

Last updated: 2021-05-11

Python implementation: CPython
Python version       : 3.9.2
IPython version      : 7.23.1

segregation: 2.0.0
geopandas  : 0.9.0
libpysal   : 4.3.0
pandana    : 0.6.1

Single-group indices are calculated using the classes in the singlegroup module.

Data Prep

[2]:
import geopandas as gpd
import matplotlib.pyplot as plt
from libpysal.examples import load_example
[3]:
# read the Sacramento tract data from libpysal and reproject it into an appropriate (estimated UTM) CRS
sacramento = gpd.read_file(load_example("Sacramento1").get_path("sacramentot2.shp"))
sacramento = sacramento.to_crs(sacramento.estimate_utm_crs())
[4]:
sacramento.head()
[4]:
FIPS MSA TOT_POP POP_16 POP_65 WHITE BLACK ASIAN HISP MULTI_RA ... EMP_FEM OCC_MAN OCC_OFF1 OCC_INFO HH_INC POV_POP POV_TOT HSG_VAL POLYID geometry
0 06061022001 Sacramento 5501 1077 518 4961 29 82 336 31 ... 1187 117 663.0 42 52941 5461 470 225900 1 POLYGON ((740409.853 4338451.728, 740199.864 4...
1 06061020106 Sacramento 2072 396 109 1603 0 28 391 41 ... 522 38 229.0 19 51958 2052 160 249300 2 POLYGON ((753400.378 4347151.080, 753395.816 4...
2 06061020107 Sacramento 3633 911 126 1624 9 0 1918 41 ... 698 86 197.0 0 32992 3604 668 175900 3 POLYGON ((758318.262 4352123.456, 758319.774 4...
3 06061020105 Sacramento 1683 281 154 1564 0 55 60 4 ... 519 5 256.0 6 54556 1683 116 302300 4 POLYGON ((750839.595 4342678.807, 750805.840 4...
4 06061020200 Sacramento 5794 1278 830 5185 17 13 251 229 ... 1260 155 506.0 59 50815 5771 342 167300 5 POLYGON ((670062.020 4311030.409, 670133.819 4...

5 rows × 31 columns

[5]:
sacramento.plot('BLACK')
[5]:
<AxesSubplot:>
[Figure: map of the Sacramento tracts shaded by the BLACK column]

Aspatial Segregation Indices

To compute an aspatial segregation index, pass a dataframe, the name of the group population column, and the name of the total population column to the index’s class.

[6]:
from segregation.singlegroup import Dissim
[7]:
dissim = Dissim(sacramento, group_pop_var='BLACK',
                total_pop_var='TOT_POP')

The statistic attribute holds the value of the segregation index, and the data attribute holds the data used to calculate it.

[8]:
dissim.statistic
[8]:
0.4883394024705785
[9]:
dissim.data.head()
[9]:
BLACK TOT_POP geometry
0 29 5501 POLYGON ((740409.853 4338451.728, 740199.864 4...
1 0 2072 POLYGON ((753400.378 4347151.080, 753395.816 4...
2 9 3633 POLYGON ((758318.262 4352123.456, 758319.774 4...
3 0 1683 POLYGON ((750839.595 4342678.807, 750805.840 4...
4 17 5794 POLYGON ((670062.020 4311030.409, 670133.819 4...
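To make the statistic concrete, here is a minimal sketch of the textbook two-group dissimilarity formula, D = 0.5 * sum(|x_i/X - y_i/Y|), applied to the five tracts shown above. The function name `dissimilarity` is hypothetical and this is not the package's own implementation; because only five tracts are used, the result will not match the full-sample value of `dissim.statistic`.

```python
def dissimilarity(group_counts, total_counts):
    """Textbook two-group dissimilarity index (illustrative, not the package's code)."""
    # Everyone who is not in the group of interest forms the comparison group
    other_counts = [t - g for g, t in zip(group_counts, total_counts)]
    group_total = sum(group_counts)
    other_total = sum(other_counts)
    # Half the summed absolute difference between each unit's share of the two groups
    return 0.5 * sum(
        abs(g / group_total - o / other_total)
        for g, o in zip(group_counts, other_counts)
    )


# BLACK and TOT_POP values for the first five tracts in the table above
black = [29, 0, 9, 0, 17]
tot_pop = [5501, 2072, 3633, 1683, 5794]

d_five_tracts = dissimilarity(black, tot_pop)
print(round(d_five_tracts, 4))
```

The index is 0 when every unit has the same group share and 1 when the two groups never share a unit.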

Spatial Segregation Indices

For calculating spatial segregation indices, the package implements two classes of indices: spatially-explicit and spatially-implicit.

Spatially-explicit indices are those for which space was a formal consideration in the index’s original formulation, whereas spatially-implicit indices are developed using the logic of Reardon and O’Sullivan.

For the latter (otherwise called generalized spatial segregation indices), the package can incorporate spatial relationships represented by either a `libpysal.W <https://pysal.org/libpysal/api.html>`__ weights object or a `pandana.Network <http://udst.github.io/pandana/network.html>`__ network object. Generalized spatial segregation indices can therefore be computed according to many different spatial relationships, including contiguity, distance, or network connectivity. This flexibility is particularly useful for specifying appropriate “neighborhood” definitions for different types of input data (e.g., housing units, census tracts, or counties).

Spatially-explicit indices can be called like any other index, though some accept additional arguments:

[10]:
from segregation.singlegroup import AbsoluteCentralization, Gini
[11]:
cent = AbsoluteCentralization(sacramento, group_pop_var='BLACK',
                              total_pop_var='TOT_POP')
[12]:
cent.statistic
[12]:
0.8491771822066525

Euclidean distance-based measures

For generalized spatial indices, a distance parameter can be passed to the index of choice. Under the hood, the input data will be passed through a kernel function with the distance parameter as the kernel bandwidth.

(Note: because the CRS of the sacramento dataframe is UTM, the distance units are meters.)
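The kernel step can be pictured with a small sketch. It assumes a triangular (linearly decaying) kernel and point locations in meters; the function name `triangular_kernel_sum` is hypothetical, and the kernel the package actually applies may differ.

```python
import math


def triangular_kernel_sum(coords, values, bandwidth):
    """Replace each unit's value with a distance-weighted sum over nearby units.

    Illustrative only: weights fall linearly from 1 at distance 0
    to 0 at the bandwidth, and units beyond the bandwidth are ignored.
    """
    smoothed = []
    for xi, yi in coords:
        total = 0.0
        for (xj, yj), value in zip(coords, values):
            distance = math.hypot(xi - xj, yi - yj)
            weight = max(0.0, 1.0 - distance / bandwidth)
            total += weight * value
        smoothed.append(total)
    return smoothed


# Three hypothetical units on a line, with coordinates in meters
coords = [(0.0, 0.0), (1000.0, 0.0), (5000.0, 0.0)]
counts = [10, 0, 4]

smoothed = triangular_kernel_sum(coords, counts, bandwidth=2000)
print(smoothed)  # [10.0, 5.0, 4.0]
```

The second unit has no group members of its own but picks up half of its neighbor's count, while the third unit lies beyond the bandwidth of the others and is unchanged.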

[13]:
# aspatial gini index
aspatial_gini = Gini(sacramento, group_pop_var='BLACK',
                     total_pop_var='TOT_POP')
[14]:
# generalized spatial gini index
gen_spatialgini = Gini(sacramento, group_pop_var='BLACK',
                       total_pop_var='TOT_POP', distance=2000)
[15]:
gen_spatialgini.statistic
[15]:
0.5368102768280784
[16]:
aspatial_gini.statistic
[16]:
0.6361755332635235
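For reference, the aspatial Gini segregation index is commonly written as G = sum_i sum_j t_i t_j |p_i - p_j| / (2 T^2 P (1 - P)), where t_i is a unit's total population, p_i its group share, T the overall population, and P the overall group share. The sketch below implements that textbook formula directly; `gini_segregation` is a hypothetical name, and the package's implementation may be organized differently.

```python
def gini_segregation(group_counts, total_counts):
    """Textbook Gini segregation index (illustrative, not the package's code)."""
    overall_total = sum(total_counts)
    overall_share = sum(group_counts) / overall_total
    # Group share within each unit
    shares = [g / t for g, t in zip(group_counts, total_counts)]
    # Population-weighted absolute differences between every pair of units
    weighted_diff = sum(
        ti * tj * abs(pi - pj)
        for ti, pi in zip(total_counts, shares)
        for tj, pj in zip(total_counts, shares)
    )
    return weighted_diff / (2 * overall_total**2 * overall_share * (1 - overall_share))


# Two hypothetical units: identical group shares versus complete separation
print(gini_segregation([10, 10], [20, 20]))  # 0.0
print(gini_segregation([10, 0], [10, 10]))   # 1.0
```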

Examining the data attribute of the fitted index shows how the input data are transformed.

[17]:
# kernelized data
gen_spatialgini.data.plot('BLACK')
[17]:
<AxesSubplot:>
[Figure: map of the kernelized BLACK values from gen_spatialgini.data]
[18]:
# original data
sacramento.plot('BLACK')
[18]:
<AxesSubplot:>
[Figure: map of the original BLACK values in the sacramento dataframe]

Network distance-based measures

Instead of a Euclidean distance-based kernel, each generalized spatial segregation index can be calculated using accessibility analysis on a transportation network. Since people can’t fly, measuring spatial distance along a travel network is more faithful to the spirit of segregation indices.
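The difference between the two distance notions can be illustrated on a toy street network. The sketch below uses a plain Dijkstra search over a hypothetical three-node graph; it is not how pandana computes accessibility, only a demonstration that travel distance along a network is at least as long as the straight-line distance.

```python
import heapq
import math


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm on an adjacency list of (neighbor, length) pairs."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, length in graph[node]:
            candidate = dist + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf


# Hypothetical L-shaped street: a to b, then b to c, with no direct a to c link
positions = {"a": (0, 0), "b": (1000, 0), "c": (1000, 1000)}
graph = {
    "a": [("b", 1000.0)],
    "b": [("a", 1000.0), ("c", 1000.0)],
    "c": [("b", 1000.0)],
}

network_d = shortest_path_length(graph, "a", "c")
euclid_d = math.dist(positions["a"], positions["c"])
print(network_d, round(euclid_d, 1))  # 2000.0 1414.2
```

With a 2000 m bandwidth and a linear decay, node c would receive a weight of roughly 0.29 from node a under straight-line distance but a weight of 0 under network distance, so the two approaches can define quite different neighborhoods.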

[19]:
import pandana as pdna

A network can be created using the urbanaccess package or the built-in get_osm_network function from the segregation.util module. Alternatively, metropolitan-scale networks from OpenStreetMap are available in the CGS quilt bucket (named by CBSA FIPS code).

[20]:
net = pdna.Network.from_hdf5('../40900.h5')
[21]:
network_spatialgini = Gini(sacramento, group_pop_var='BLACK',
                           total_pop_var='TOT_POP', distance=2000,
                           network=net, decay='linear')

Comparing spatial Gini indices based on straight-line distance versus network distance:

[22]:
network_spatialgini.statistic
[22]:
0.5848616778202473
[23]:
gen_spatialgini.statistic
[23]:
0.5368102768280784

The segregation statistic computed with network distance to construct neighborhoods is higher than the one computed with unrestricted Euclidean distance.

Batch-Computing Single-Group Measures

To compute all single-group indices in one go, the package provides a wrapper function in the batch module.

[24]:
from segregation.batch import batch_compute_singlegroup
[25]:
all_singlegroup = batch_compute_singlegroup(sacramento, group_pop_var='BLACK', total_pop_var='TOT_POP')
[26]:
all_singlegroup
[26]:
Statistic
Name
AbsoluteCentralization 0.849177
AbsoluteClustering 0.117545
AbsoluteConcentration 0.981443
Atkinson 0.365947
BiasCorrectedDissim 0.487694
BoundarySpatialDissim 0.450074
ConProf 0.112752
CorrelationR 0.101027
Delta 0.907277
DensityCorrectedDissim 0.335178
Dissim 0.488339
DistanceDecayInteraction 0.841137
DistanceDecayIsolation 0.160247
Entropy 0.112068
Gini 0.636176
Interaction 0.837925
Isolation 0.162075
MinMax 0.656220
ModifiedDissim 0.476238
ModifiedGini 0.623809
PARDissim 0.481833
RelativeCentralization 0.076906
RelativeClustering 1.626559
RelativeConcentration 0.778755
SpatialDissim 0.446272
SpatialProxProf 0.115984
SpatialProximity 1.106990
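Conceptually, a batch wrapper just applies every index to the same inputs and collects the results by name. The sketch below shows that pattern with two textbook exposure indices; `batch_compute`, `isolation`, and `interaction` are hypothetical helpers written for illustration, not the package's functions.

```python
def isolation(group_counts, total_counts):
    """Textbook isolation index: average group share experienced by group members."""
    group_total = sum(group_counts)
    return sum((g / group_total) * (g / t) for g, t in zip(group_counts, total_counts))


def interaction(group_counts, total_counts):
    """Textbook interaction index: average non-group share experienced by group members."""
    group_total = sum(group_counts)
    return sum(
        (g / group_total) * ((t - g) / t) for g, t in zip(group_counts, total_counts)
    )


def batch_compute(index_functions, group_counts, total_counts):
    """Apply each index function to the same data and return results keyed by name."""
    return {
        name: function(group_counts, total_counts)
        for name, function in sorted(index_functions.items())
    }


# BLACK and TOT_POP values for the first five tracts of the sacramento dataframe
black = [29, 0, 9, 0, 17]
tot_pop = [5501, 2072, 3633, 1683, 5794]

results = batch_compute(
    {"Isolation": isolation, "Interaction": interaction}, black, tot_pop
)
for name, value in results.items():
    print(f"{name:<12} {value:.6f}")
```

With only two groups, isolation and interaction sum to 1, which is also visible in the Interaction and Isolation rows of the table above.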