{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Table of Contents\n", "* [Decomposition framework of the PySAL *segregation* module](#Decomposition-framework-of-the-PySAL-*segregation*-module)\n", "\t* [Map of the composition of the Metropolitan area of Los Angeles](#Map-of-the-composition-of-the-Metropolitan-area-of-Los-Angeles)\n", "\t* [Map of the composition of the Metropolitan area of New York](#Map-of-the-composition-of-the-Metropolitan-area-of-New-York)\n", "\t* [Composition Approach (default)](#Composition-Approach-%28default%29)\n", "\t* [Share Approach](#Share-Approach)\n", "\t* [Dual Composition Approach](#Dual-Composition-Approach)\n", "\t* [Inspecting a different index: Relative Concentration](#Inspecting-a-different-index:-Relative-Concentration)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Decomposition framework of the PySAL *segregation* module" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a notebook that explains a step-by-step procedure to perform decomposition on comparative segregation measures.\n", "\n", "First, let's import all the needed libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import pickle\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "import segregation\n", "from segregation.decomposition import DecomposeSegregation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, we are going to use census data that the user must download its own copy, following similar guidelines explained in https://github.com/spatialucr/geosnap/blob/master/examples/01_getting_started.ipynb where you should download the full type file of 2010. The zipped file download will have a name that looks like `LTDB_Std_All_fullcount.zip`. 
After extracting the zipped content, the filepath of the data should look like this:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#filepath = '~/LTDB_Std_2010_fullcount.csv'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we read the data:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(filepath, encoding = \"ISO-8859-1\", sep = \",\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are going to work with the count of the nonhispanic black population (`nhblk10`) and the total population of each unit (`pop10`). So, let's read the map of all US census tracts and select the columns needed for the analysis:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# This file can be downloaded here: https://drive.google.com/open?id=1gWF0OCn6xuR_WrEj7Ot2jY6KI2t6taIm\n", "with open('data/tracts_US.pkl', 'rb') as f:\n", " map_gpd = pickle.load(f)\n", " \n", "map_gpd['INTGEOID10'] = pd.to_numeric(map_gpd[\"GEOID10\"])\n", "gdf_pre = map_gpd.merge(df, left_on = 'INTGEOID10', right_on = 'tractid')\n", "gdf = gdf_pre[['GEOID10', 'geometry', 'pop10', 'nhblk10']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we use the Metropolitan Statistical Areas (MSAs) of the US (we also use the word 'cities' to refer to them). So, let's read the correspondence table that relates each tract id to its Metropolitan area..." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# You can download this file here: https://drive.google.com/open?id=10HUUJSy9dkZS6m4vCVZ-8GiwH0EXqIau\n", "with open('data/tract_metro_corresp.pkl', 'rb') as f:\n", " tract_metro_corresp = pickle.load(f).drop_duplicates()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...and merge it with the previous data." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "merged_gdf = gdf.merge(tract_metro_corresp, left_on = 'GEOID10', right_on = 'geoid10')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now build the composition variable (`compo`) which is the division of the frequency of the chosen group and total population. Let's inspect the first rows of the data." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | GEOID10 | \n", "geometry | \n", "pop10 | \n", "nhblk10 | \n", "geoid10 | \n", "metro_id | \n", "numeric_id | \n", "geoid | \n", "name | \n", "compo | \n", "
---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "01001020801 | \n", "POLYGON ((-86.456273 32.405837, -86.4570349999... | \n", "3081 | \n", "293 | \n", "01001020801 | \n", "33860 | \n", "33860 | \n", "33860 | \n", "Montgomery, AL | \n", "0.095099 | \n", "
1 | \n", "01001020802 | \n", "POLYGON ((-86.412497 32.589422, -86.412442 32.... | \n", "10435 | \n", "1420 | \n", "01001020802 | \n", "33860 | \n", "33860 | \n", "33860 | \n", "Montgomery, AL | \n", "0.136080 | \n", "
2 | \n", "01001020200 | \n", "POLYGON ((-86.467354 32.459308, -86.46764 32.4... | \n", "2170 | \n", "1226 | \n", "01001020200 | \n", "33860 | \n", "33860 | \n", "33860 | \n", "Montgomery, AL | \n", "0.564977 | \n", "
3 | \n", "01001020700 | \n", "POLYGON ((-86.46106999999999 32.42709, -86.461... | \n", "2891 | \n", "452 | \n", "01001020700 | \n", "33860 | \n", "33860 | \n", "33860 | \n", "Montgomery, AL | \n", "0.156347 | \n", "
4 | \n", "01001020600 | \n", "POLYGON ((-86.470524 32.456117, -86.4700469999... | \n", "3668 | \n", "776 | \n", "01001020600 | \n", "33860 | \n", "33860 | \n", "33860 | \n", "Montgomery, AL | \n", "0.211559 | \n", "