This page was generated from user-guide/viz/schutz.ipynb.

Lorenz Curves and Schutz Lines

Lorenz curves and Schutz diagrams are tools for visualizing economic inequality, particularly in the distribution of income or wealth within a population. The Lorenz curve plots the cumulative share of total income received by the bottom x% of the population. The further the curve bows away from the 45-degree line of equality, the more unequal the distribution. A Schutz diagram complements this by marking the point where the vertical gap between the line of equality and the Lorenz curve is largest; the size of that gap is the Schutz distance. This measure is related to, but distinct from, the Gini coefficient, which is based on the whole area between the Lorenz curve and the line of equality. Both visualizations highlight disparities in resource allocation, making them useful in policy discussions on equity and social justice.
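As a rough illustration of the Lorenz curve itself, the sketch below builds the curve's points with NumPy and derives the Gini coefficient from the area under it. The helper names `lorenz_points` and `gini_from_lorenz` are hypothetical and are not part of the `inequality` package; the input is the same toy income vector used later on this page.

```python
import numpy as np


def lorenz_points(values):
    """Return cumulative population and income shares, starting at (0, 0)."""
    y = np.sort(np.asarray(values, dtype=float))
    # Each unit counts equally, so population shares are evenly spaced.
    pop = np.arange(len(y) + 1) / len(y)
    inc = np.concatenate([[0.0], np.cumsum(y)]) / y.sum()
    return pop, inc


def gini_from_lorenz(pop, inc):
    """Gini = 1 - 2 * (area under the Lorenz curve), using the trapezoid rule."""
    area = np.sum((inc[1:] + inc[:-1]) * np.diff(pop)) / 2.0
    return 1.0 - 2.0 * area


pop, inc = lorenz_points([1000, 2000, 1500, 3000, 2500])
gini = gini_from_lorenz(pop, inc)
print(inc)   # cumulative income shares along the Lorenz curve
print(gini)  # area-based inequality measure
```

The Gini coefficient here summarizes the whole curve, whereas the Schutz measures discussed next focus on a single point of it.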

Schutz Measures of Inequality

The Schutz class calculates measures of inequality in an income distribution.

It calculates the Schutz distance, which is the maximum vertical distance between the line of perfect equality and the Lorenz curve. It also reports the intersection point, that is, the cumulative population share on the line of perfect equality at which the Schutz distance occurs, and the original Schutz coefficient.
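The definition above can be sketched without the library. The helper `schutz_distance` below is hypothetical and only mirrors the description on this page (sort the values, accumulate shares, take the largest gap); it is not the package's implementation. When several points tie for the largest gap, this sketch simply returns the first one NumPy finds, which may differ from what the library reports.

```python
import numpy as np


def schutz_distance(values):
    """Largest gap between the equality line and the Lorenz curve.

    Returns (distance, position), where position is the cumulative
    share of units at which the largest gap is found.
    """
    y = np.sort(np.asarray(values, dtype=float))
    n = len(y)
    ucpct = np.arange(1, n + 1) / n   # cumulative share of units
    ycpct = np.cumsum(y) / y.sum()    # cumulative share of income
    gap = ucpct - ycpct               # vertical distance at each unit
    k = int(np.argmax(gap))           # first index of the largest gap
    return gap[k], ucpct[k]


distance, position = schutz_distance([1000, 2000, 1500, 3000, 2500])
print(distance, position)
```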

[1]:
from inequality.schutz import Schutz
[2]:
import numpy as np
[3]:
import pandas as pd
[4]:
df = pd.DataFrame(data=np.array([1000, 2000, 1500, 3000, 2500]), columns=["GDP"])
[5]:
s = Schutz(df, "GDP")
[6]:
s.distance
[6]:
np.float64(0.15000000000000008)
[7]:
s.intersection_point
[7]:
np.float64(0.6000000000000001)
[8]:
s.df_processed
[8]:
    GDP  unit  upct  ypct  ucpct  ycpct  distance  slope  coefficient
0  1000     1   0.2  0.10    0.2   0.10      0.10   0.50         -5.0
1  1500     1   0.2  0.15    0.4   0.25      0.15   0.75         -2.5
2  2000     1   0.2  0.20    0.6   0.45      0.15   1.00          0.0
3  2500     1   0.2  0.25    0.8   0.70      0.10   1.25          2.5
4  3000     1   0.2  0.30    1.0   1.00      0.00   1.50          5.0
[9]:
s.coefficient
[9]:
np.float64(7.499999999999998)
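The table above suggests how the coefficient is assembled: each row's `coefficient` equals 10 × (`slope` − 1), where `slope` is `ypct / upct`, and the reported `s.coefficient` matches the sum of the positive row values. This relationship is inferred from the printed output, not taken from the library source, so treat the sketch below as an assumption that happens to reproduce the numbers shown.

```python
import numpy as np

y = np.sort(np.array([1000, 2000, 1500, 3000, 2500], dtype=float))
n = len(y)

upct = np.full(n, 1.0 / n)   # each unit's share of the population
ypct = y / y.sum()           # each unit's share of total income
slope = ypct / upct          # slope of the Lorenz curve segment

# Assumed from the table: per-row coefficient = 10 * (slope - 1)
row_coefficient = 10.0 * (slope - 1.0)

# Assumed from the output: overall coefficient = sum of positive rows
coefficient = row_coefficient[row_coefficient > 0].sum()
print(row_coefficient)
print(coefficient)
```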
[10]:
s.plot()
../../_images/user-guide_viz_schutz_10_0.png

Increase the inequality

[11]:
y = np.array([20, 50, 80, 100, 100, 100, 100, 120, 150, 180])
[12]:
df = pd.DataFrame(data=y, columns=["GDP"])
[13]:
s = Schutz(df, "GDP")
[14]:
s.coefficient
[14]:
np.float64(14.999999999999996)
[15]:
s.df_processed
[15]:
   GDP  unit  upct  ypct  ucpct  ycpct      distance  slope  coefficient
0   20     1   0.1  0.02    0.1   0.02  8.000000e-02    0.2         -8.0
1   50     1   0.1  0.05    0.2   0.07  1.300000e-01    0.5         -5.0
2   80     1   0.1  0.08    0.3   0.15  1.500000e-01    0.8         -2.0
3  100     1   0.1  0.10    0.4   0.25  1.500000e-01    1.0          0.0
4  100     1   0.1  0.10    0.5   0.35  1.500000e-01    1.0          0.0
5  100     1   0.1  0.10    0.6   0.45  1.500000e-01    1.0          0.0
6  100     1   0.1  0.10    0.7   0.55  1.500000e-01    1.0          0.0
7  120     1   0.1  0.12    0.8   0.67  1.300000e-01    1.2          2.0
8  150     1   0.1  0.15    0.9   0.82  8.000000e-02    1.5          5.0
9  180     1   0.1  0.18    1.0   1.00 -1.110223e-16    1.8          8.0
[16]:
s.distance
[16]:
np.float64(0.15000000000000002)
[17]:
s.intersection_point
[17]:
np.float64(0.30000000000000004)
[18]:
s.plot()
../../_images/user-guide_viz_schutz_19_0.png
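Comparing the two toy examples, the Schutz distance is about 0.15 in both, while the coefficient doubles from 7.5 to 15 as the number of units goes from 5 to 10. The sketch below recomputes both quantities by hand to make that visible. The helper `schutz_summary` is hypothetical, and the coefficient rule it uses (10 × (slope − 1), summed over positive rows) is inferred from the tables on this page rather than from the library source.

```python
import numpy as np


def schutz_summary(values):
    """Return (n, distance, coefficient) for equally weighted units."""
    y = np.sort(np.asarray(values, dtype=float))
    n = len(y)
    ypct = y / y.sum()
    # Gap between the equality line and the Lorenz curve at each unit.
    gap = np.arange(1, n + 1) / n - np.cumsum(ypct)
    # With equal weights upct = 1/n, so slope = ypct / upct = ypct * n.
    row_coefficient = 10.0 * (ypct * n - 1.0)
    return n, gap.max(), row_coefficient[row_coefficient > 0].sum()


small = schutz_summary([1000, 2000, 1500, 3000, 2500])
large = schutz_summary([20, 50, 80, 100, 100, 100, 100, 120, 150, 180])

for n, distance, coefficient in (small, large):
    # Under the assumed rule, coefficient works out to 10 * n * distance.
    print(n, distance, coefficient, 10 * n * distance)
```

If the inferred rule holds, the coefficient scales with the number of units, so the distance is the safer quantity to compare across data sets of different sizes.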

Visualizing Mexican State Income Inequality

[19]:
import geopandas as gpd

gdf = gpd.read_file("weighted.shp")
[20]:
gdf.head()
[20]:
POLY_ID AREA CODE NAME PERIMETER ACRES HECTARES PCGDP1940 PCGDP1950 PCGDP1960 ... TEST Name_1 Population NAMEp populati_1 Y2000 y2000_1 p State geometry
0 5 5.467030e+09 MX01 Aguascalientes 313895.530 1.350927e+06 546702.985 10384.0 6234.0 8714.0 ... 5.0 Aguascalientes 944285 Aguascalientes 944285 2.623413e+10 27782.0 0.009647 Aguascalientes POLYGON ((-101.8462 22.01176, -101.9653 21.883...
1 1 7.252751e+10 MX02 Baja California Norte 2040312.385 1.792187e+07 7252751.376 22361.0 20977.0 17865.0 ... 1.0 Querétaro de Arteaga 1404306 Baja California Norte 1404306 4.192556e+10 29855.0 0.014347 Baja California Norte MULTIPOLYGON (((-113.13972 29.01778, -113.2405...
2 2 7.225988e+10 MX03 Baja California Sur 2912880.772 1.785573e+07 7225987.769 9573.0 16013.0 16707.0 ... 2.0 Baja California Sur 424041 Baja California Sur 424041 1.106874e+10 26103.0 0.004332 Baja California Sur MULTIPOLYGON (((-111.20612 25.80278, -111.2302...
3 15 5.016584e+10 MX04 Campeche 1575361.146 1.239620e+07 5016583.723 3758.0 4929.0 5925.0 ... 15.0 Campeche 690689 Campeche 690689 2.497739e+10 36163.0 0.007056 Campeche MULTIPOLYGON (((-91.83446 18.63805, -91.84195 ...
4 22 7.339157e+10 MX05 Chiapas 1477195.199 1.813538e+07 7339157.376 2934.0 4138.0 5280.0 ... 22.0 Chiapas 3920892 Chiapas 3920892 3.404903e+10 8684.0 0.040057 Chiapas POLYGON ((-91.4375 17.24111, -91.35278 17.1763...

5 rows × 43 columns

[21]:
gdf.columns
[21]:
Index(['POLY_ID', 'AREA', 'CODE', 'NAME', 'PERIMETER', 'ACRES', 'HECTARES',
       'PCGDP1940', 'PCGDP1950', 'PCGDP1960', 'PCGDP1970', 'PCGDP1980',
       'PCGDP1990', 'PCGDP2000', 'HANSON03', 'HANSON98', 'ESQUIVEL99', 'INEGI',
       'INEGI2', 'MAXP', 'GR4000', 'GR5000', 'GR6000', 'GR7000', 'GR8000',
       'GR9000', 'LPCGDP40', 'LPCGDP50', 'LPCGDP60', 'LPCGDP70', 'LPCGDP80',
       'LPCGDP90', 'LPCGDP00', 'TEST', 'Name_1', 'Population', 'NAMEp',
       'populati_1', 'Y2000', 'y2000_1', 'p', 'State', 'geometry'],
      dtype='object')
[38]:
s1960 = Schutz(gdf, "PCGDP1960")
[39]:
s1960.plot(xlabel="State Percentile Rank 1960")
../../_images/user-guide_viz_schutz_25_0.png
[40]:
s2000 = Schutz(gdf, "PCGDP2000")
[41]:
s2000.plot(xlabel="State Percentile Rank 2000")
../../_images/user-guide_viz_schutz_27_0.png
[36]:
s1960.coefficient
[36]:
np.float64(58.09565999182672)
[37]:
s2000.coefficient
[37]:
np.float64(62.413744551459686)
[43]:
s1960.calculate_schutz_distance()
[43]:
np.float64(0.1815489374744585)
[44]:
s2000.calculate_schutz_distance()
[44]:
np.float64(0.1950429517233115)

The location and height of the Schutz line, which is derived from the Lorenz curve, offer insight into the degree of inequality in a distribution. In the case above, the Schutz line for 1960 is positioned to the right of, and is shorter than, the line for 2000. The Schutz distance rises from about 0.182 in 1960 to about 0.195 in 2000, which indicates that the 1960 distribution exhibits less inequality than the 2000 distribution.

Here’s how to interpret this:

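  1. Position of the Schutz line (to the right): The line sits at the cumulative population share where the gap between the line of perfect equality and the Lorenz curve is largest. This is where the slope of the Lorenz curve equals 1, so states to the left of the line have below-average per capita GDP and states to the right have above-average per capita GDP. A line further to the right means that above-average values are concentrated in a smaller share of states. Position shows where the gap peaks, so it is best read together with the height rather than as a measure of inequality on its own.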

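  2. Height of the Schutz line (shorter): The height is the Schutz distance, the maximum vertical gap between the line of perfect equality (the 45-degree line) and the Lorenz curve. Treating each state as one unit, it can be read as the share of the total that would have to move from above-average to below-average states to reach perfect equality. A shorter Schutz line therefore indicates lower inequality in 1960 than in 2000. The Gini coefficient is a related but different measure, based on the whole area between the Lorenz curve and the line of perfect equality rather than on the largest gap.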

In summary, the 1960 distribution is more equal than the 2000 distribution, whose higher inequality is reflected in its taller Schutz line, which also sits further to the left.
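One way to build intuition for the height of the Schutz line is the redistribution reading: with equally weighted units, the maximum gap between the line of equality and the Lorenz curve equals the total excess above the mean divided by the overall total. The sketch below checks this on the toy data from the first example. It uses plain NumPy rather than the `inequality` package, and the variable names are illustrative only.

```python
import numpy as np

y = np.array([1000, 2000, 1500, 3000, 2500], dtype=float)

# Amount held above the mean by the above-average units.
excess = np.clip(y - y.mean(), 0.0, None).sum()
share_to_redistribute = excess / y.sum()

# Maximum vertical gap between the equality line and the Lorenz curve.
y_sorted = np.sort(y)
n = len(y_sorted)
gap = np.arange(1, n + 1) / n - np.cumsum(y_sorted) / y_sorted.sum()
max_gap = gap.max()

print(share_to_redistribute, max_gap)
```

Both quantities agree for this example, which is why a taller Schutz line can be read as a larger share needing to be transferred to reach equality.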