This page was generated from user-guide/viz/schutz.ipynb.
Lorenz Curves and Schutz Lines
Lorenz curves and Schutz diagrams are useful tools for visualizing economic inequality, particularly in the distribution of income or wealth within a population. The Lorenz curve plots, for each x, the cumulative share of total income received by the poorest x% of the population. The further the curve bows away from the 45-degree line of equality, the more unequal the distribution. Schutz diagrams complement this by marking the Schutz line: the largest vertical distance between the Lorenz curve and the line of equality. The Gini coefficient, a related numerical measure of inequality, is instead based on the whole area between the two curves (it equals twice that area). Both visualizations help highlight disparities in resource allocation, making them useful in policy discussions on equity and social justice.
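As a minimal sketch in plain NumPy (not part of the `inequality` package; the helper names `lorenz_points` and `gini_from_lorenz` are hypothetical), the Lorenz curve can be built from sorted incomes, and the Gini coefficient obtained as one minus twice the area under the curve, which is the same as twice the area between the curve and the line of equality. Equal weights per unit are assumed.

```python
import numpy as np


def lorenz_points(incomes):
    """Return (population share, income share) points of the Lorenz curve.

    Assumes every unit has equal weight; the curve starts at (0, 0).
    """
    x = np.sort(np.asarray(incomes, dtype=float))  # poorest unit first
    n = x.size
    pop_share = np.arange(0, n + 1) / n             # 0, 1/n, ..., 1
    income_share = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    return pop_share, income_share


def gini_from_lorenz(incomes):
    """Gini = 1 - 2 * (area under the Lorenz curve), using the trapezoid rule."""
    p, L = lorenz_points(incomes)
    area = np.sum((p[1:] - p[:-1]) * (L[1:] + L[:-1]) / 2.0)
    return 1.0 - 2.0 * area


gdp = [1000, 2000, 1500, 3000, 2500]
p, L = lorenz_points(gdp)
print(L)                                 # cumulative income shares, approx. [0, 0.1, 0.25, 0.45, 0.7, 1]
print(round(gini_from_lorenz(gdp), 4))   # 0.2
```

The trapezoid rule is exact here because the Lorenz curve of a discrete distribution is piecewise linear between these points.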
Schutz Measures of Inequality
The Schutz class calculates measures of inequality in an income distribution.
It calculates the Schutz distance, which is the maximum vertical distance between the line of perfect equality and the Lorenz curve. It also computes the intersection point (the population share at which the Schutz distance occurs, where the Schutz line meets the line of perfect equality) and the original Schutz coefficient.
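In symbols, as a sketch that assumes n equally weighted units sorted from poorest to richest (consistent with the `unit` column of 1 in the tables below; the notation is ours, not the package's): U_k is the cumulative population share (`ucpct`), Y_k the cumulative income share (`ycpct`), s_k the slope (`slope`), mu the mean income, and D the Schutz distance.

```latex
U_k = \frac{k}{n}, \qquad
Y_k = \frac{\sum_{i=1}^{k} x_{(i)}}{\sum_{i=1}^{n} x_{(i)}}, \qquad
s_k = \frac{x_{(k)}}{\mu}, \qquad
D = \max_{1 \le k \le n} \left( U_k - Y_k \right)

(U_k - Y_k) - (U_{k-1} - Y_{k-1}) = \frac{1}{n} - \frac{x_{(k)}}{n\mu} = \frac{1 - s_k}{n}
```

Each step changes the gap by (1 - s_k)/n, so the gap grows while s_k < 1 and shrinks once s_k > 1. D is therefore reached where the slope crosses 1, and U_k at that point is the intersection point. For the GDP example below (mu = 2000) the slopes are 0.5, 0.75, 1, 1.25, 1.5, and D = 0.4 - 0.25 = 0.6 - 0.45 = 0.15. Because s_3 = 1, the maximum is attained at both U = 0.4 and U = 0.6; the package reports 0.6.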
[1]:
from inequality.schutz import Schutz
[2]:
import numpy as np
[3]:
import pandas as pd
[4]:
df = pd.DataFrame(data=np.array([1000, 2000, 1500, 3000, 2500]), columns=["GDP"])
[5]:
s = Schutz(df, "GDP")
[6]:
s.distance
[6]:
np.float64(0.15000000000000008)
[7]:
s.intersection_point
[7]:
np.float64(0.6000000000000001)
[8]:
s.df_processed
[8]:
| | GDP | unit | upct | ypct | ucpct | ycpct | distance | slope | coefficient |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000 | 1 | 0.2 | 0.10 | 0.2 | 0.10 | 0.10 | 0.50 | -5.0 |
| 1 | 1500 | 1 | 0.2 | 0.15 | 0.4 | 0.25 | 0.15 | 0.75 | -2.5 |
| 2 | 2000 | 1 | 0.2 | 0.20 | 0.6 | 0.45 | 0.15 | 1.00 | 0.0 |
| 3 | 2500 | 1 | 0.2 | 0.25 | 0.8 | 0.70 | 0.10 | 1.25 | 2.5 |
| 4 | 3000 | 1 | 0.2 | 0.30 | 1.0 | 1.00 | 0.00 | 1.50 | 5.0 |
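The columns of `df_processed` can be reproduced with plain pandas. This is a sketch inferred from the values printed above, not the package's source: it assumes equal unit weights, sorts by income, and takes `distance` as `ucpct - ycpct` and `slope` as `ypct / upct`. The function name `schutz_table` is hypothetical, and the `coefficient` column is left out of this sketch.

```python
import numpy as np
import pandas as pd


def schutz_table(df, column):
    """Rebuild the share columns shown in df_processed (equal unit weights assumed)."""
    out = df[[column]].sort_values(column).reset_index(drop=True)
    out["unit"] = 1                                   # each row counts as one unit
    out["upct"] = out["unit"] / out["unit"].sum()     # population share per unit
    out["ypct"] = out[column] / out[column].sum()     # income share per unit
    out["ucpct"] = out["upct"].cumsum()               # cumulative population share
    out["ycpct"] = out["ypct"].cumsum()               # cumulative income share (Lorenz curve)
    out["distance"] = out["ucpct"] - out["ycpct"]     # vertical gap to the equality line
    out["slope"] = out["ypct"] / out["upct"]          # income relative to the mean
    return out


df = pd.DataFrame({"GDP": [1000, 2000, 1500, 3000, 2500]})
table = schutz_table(df, "GDP")
print(table.round(2))
print(table["distance"].max())                        # about 0.15
```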
[9]:
s.coefficient
[9]:
np.float64(7.499999999999998)
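Reading the `df_processed` table above, each `coefficient` entry equals 10 × (`slope` − 1), and `s.coefficient` is the sum of the positive entries (2.5 + 5.0 = 7.5). A short check of that reading; the factor 10 and the helper name `schutz_coefficient` are inferred from the printed values, not taken from the package's source:

```python
import numpy as np


def schutz_coefficient(incomes):
    """Sum of the positive entries of 10 * (slope - 1), with slope = income / mean.

    Assumption: the factor 10 is read off the df_processed tables on this page.
    Equal unit weights are assumed.
    """
    x = np.asarray(incomes, dtype=float)
    slope = x / x.mean()                 # the slope column: income relative to the mean
    coef_col = 10.0 * (slope - 1.0)      # the coefficient column
    return coef_col[coef_col > 0].sum()  # only above-mean units contribute


print(schutz_coefficient([1000, 2000, 1500, 3000, 2500]))                   # 7.5
print(schutz_coefficient([20, 50, 80, 100, 100, 100, 100, 120, 150, 180]))  # about 15
```

With equal weights, the positive entries sum to 10 × n × D, where D is the Schutz distance, so the coefficient grows with the number of units n. The outputs on this page fit that relation: 10 × 5 × 0.15 = 7.5, 10 × 10 × 0.15 = 15, and, assuming the Mexico shapefile holds 32 states, 10 × 32 × 0.18155 ≈ 58.10 and 10 × 32 × 0.19504 ≈ 62.41. Coefficients are therefore compared like for like only when n is the same.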
[10]:
s.plot()
Increasing the inequality
[11]:
y = np.array([20, 50, 80, 100, 100, 100, 100, 120, 150, 180])
[12]:
df = pd.DataFrame(data=y, columns=["GDP"])
[13]:
s = Schutz(df, "GDP")
[14]:
s.coefficient
[14]:
np.float64(14.999999999999996)
[15]:
s.df_processed
[15]:
| | GDP | unit | upct | ypct | ucpct | ycpct | distance | slope | coefficient |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 | 1 | 0.1 | 0.02 | 0.1 | 0.02 | 8.000000e-02 | 0.2 | -8.0 |
| 1 | 50 | 1 | 0.1 | 0.05 | 0.2 | 0.07 | 1.300000e-01 | 0.5 | -5.0 |
| 2 | 80 | 1 | 0.1 | 0.08 | 0.3 | 0.15 | 1.500000e-01 | 0.8 | -2.0 |
| 3 | 100 | 1 | 0.1 | 0.10 | 0.4 | 0.25 | 1.500000e-01 | 1.0 | 0.0 |
| 4 | 100 | 1 | 0.1 | 0.10 | 0.5 | 0.35 | 1.500000e-01 | 1.0 | 0.0 |
| 5 | 100 | 1 | 0.1 | 0.10 | 0.6 | 0.45 | 1.500000e-01 | 1.0 | 0.0 |
| 6 | 100 | 1 | 0.1 | 0.10 | 0.7 | 0.55 | 1.500000e-01 | 1.0 | 0.0 |
| 7 | 120 | 1 | 0.1 | 0.12 | 0.8 | 0.67 | 1.300000e-01 | 1.2 | 2.0 |
| 8 | 150 | 1 | 0.1 | 0.15 | 0.9 | 0.82 | 8.000000e-02 | 1.5 | 5.0 |
| 9 | 180 | 1 | 0.1 | 0.18 | 1.0 | 1.00 | -1.110223e-16 | 1.8 | 8.0 |
[16]:
s.distance
[16]:
np.float64(0.15000000000000002)
[17]:
s.intersection_point
[17]:
np.float64(0.30000000000000004)
[18]:
s.plot()
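In this second example `s.distance` is 0.15, the same as in the first example, while `s.coefficient` doubles from 7.5 to 15.0 (the second array also has twice as many units). As an independent cross-check, a sketch in plain NumPy (hypothetical helper names) compares the Schutz distance with the Gini coefficient, computed here from the mean absolute difference over all pairs, a different technique from the area-based formula:

```python
import numpy as np


def schutz_distance(incomes):
    """Largest gap between the equality line and the Lorenz curve (equal unit weights)."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    pop_cum = np.arange(1, n + 1) / n          # cumulative population share
    inc_cum = np.cumsum(x) / x.sum()           # cumulative income share
    return np.max(pop_cum - inc_cum)


def gini_mad(incomes):
    """Gini = mean absolute difference over all ordered pairs / (2 * mean)."""
    x = np.asarray(incomes, dtype=float)
    mad = np.abs(x[:, None] - x[None, :]).mean()  # n x n matrix of pairwise differences
    return mad / (2.0 * x.mean())


first = [1000, 2000, 1500, 3000, 2500]
second = [20, 50, 80, 100, 100, 100, 100, 120, 150, 180]
for name, y in [("first", first), ("second", second)]:
    print(name, round(schutz_distance(y), 4), round(gini_mad(y), 4))
# first 0.15 0.2
# second 0.15 0.234
```

The Schutz distance only measures the single largest gap, so two Lorenz curves can share it while differing elsewhere; the Gini coefficient, which uses the whole curve, rises from 0.2 to 0.234.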
Visualizing Mexican State Income Inequality
[19]:
import geopandas as gpd
gdf = gpd.read_file("weighted.shp")
[20]:
gdf.head()
[20]:
| | POLY_ID | AREA | CODE | NAME | PERIMETER | ACRES | HECTARES | PCGDP1940 | PCGDP1950 | PCGDP1960 | ... | TEST | Name_1 | Population | NAMEp | populati_1 | Y2000 | y2000_1 | p | State | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 5.467030e+09 | MX01 | Aguascalientes | 313895.530 | 1.350927e+06 | 546702.985 | 10384.0 | 6234.0 | 8714.0 | ... | 5.0 | Aguascalientes | 944285 | Aguascalientes | 944285 | 2.623413e+10 | 27782.0 | 0.009647 | Aguascalientes | POLYGON ((-101.8462 22.01176, -101.9653 21.883... |
| 1 | 1 | 7.252751e+10 | MX02 | Baja California Norte | 2040312.385 | 1.792187e+07 | 7252751.376 | 22361.0 | 20977.0 | 17865.0 | ... | 1.0 | Querétaro de Arteaga | 1404306 | Baja California Norte | 1404306 | 4.192556e+10 | 29855.0 | 0.014347 | Baja California Norte | MULTIPOLYGON (((-113.13972 29.01778, -113.2405... |
| 2 | 2 | 7.225988e+10 | MX03 | Baja California Sur | 2912880.772 | 1.785573e+07 | 7225987.769 | 9573.0 | 16013.0 | 16707.0 | ... | 2.0 | Baja California Sur | 424041 | Baja California Sur | 424041 | 1.106874e+10 | 26103.0 | 0.004332 | Baja California Sur | MULTIPOLYGON (((-111.20612 25.80278, -111.2302... |
| 3 | 15 | 5.016584e+10 | MX04 | Campeche | 1575361.146 | 1.239620e+07 | 5016583.723 | 3758.0 | 4929.0 | 5925.0 | ... | 15.0 | Campeche | 690689 | Campeche | 690689 | 2.497739e+10 | 36163.0 | 0.007056 | Campeche | MULTIPOLYGON (((-91.83446 18.63805, -91.84195 ... |
| 4 | 22 | 7.339157e+10 | MX05 | Chiapas | 1477195.199 | 1.813538e+07 | 7339157.376 | 2934.0 | 4138.0 | 5280.0 | ... | 22.0 | Chiapas | 3920892 | Chiapas | 3920892 | 3.404903e+10 | 8684.0 | 0.040057 | Chiapas | POLYGON ((-91.4375 17.24111, -91.35278 17.1763... |
5 rows × 43 columns
[21]:
gdf.columns
[21]:
Index(['POLY_ID', 'AREA', 'CODE', 'NAME', 'PERIMETER', 'ACRES', 'HECTARES',
'PCGDP1940', 'PCGDP1950', 'PCGDP1960', 'PCGDP1970', 'PCGDP1980',
'PCGDP1990', 'PCGDP2000', 'HANSON03', 'HANSON98', 'ESQUIVEL99', 'INEGI',
'INEGI2', 'MAXP', 'GR4000', 'GR5000', 'GR6000', 'GR7000', 'GR8000',
'GR9000', 'LPCGDP40', 'LPCGDP50', 'LPCGDP60', 'LPCGDP70', 'LPCGDP80',
'LPCGDP90', 'LPCGDP00', 'TEST', 'Name_1', 'Population', 'NAMEp',
'populati_1', 'Y2000', 'y2000_1', 'p', 'State', 'geometry'],
dtype='object')
[38]:
s1960 = Schutz(gdf, "PCGDP1960")
[39]:
s1960.plot(xlabel="State Percentile Rank 1960")
[40]:
s2000 = Schutz(gdf, "PCGDP2000")
[41]:
s2000.plot(xlabel="State Percentile Rank 2000")
[36]:
s1960.coefficient
[36]:
np.float64(58.09565999182672)
[37]:
s2000.coefficient
[37]:
np.float64(62.413744551459686)
[43]:
s1960.calculate_schutz_distance()
[43]:
np.float64(0.1815489374744585)
[44]:
s2000.calculate_schutz_distance()
[44]:
np.float64(0.1950429517233115)
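The comparison above covers two years. As a sketch, the Schutz distance can be tabulated for every `PCGDP` year column at once with pandas. Here `schutz_distance` is a hypothetical NumPy reimplementation that assumes equal unit weights (it is not the package's code), and the frame is a synthetic stand-in so the snippet runs without `weighted.shp`:

```python
import numpy as np
import pandas as pd


def schutz_distance(values):
    """Max gap between cumulative population share and cumulative income share."""
    x = np.sort(np.asarray(values, dtype=float))
    pop_cum = np.arange(1, x.size + 1) / x.size
    inc_cum = np.cumsum(x) / x.sum()
    return float(np.max(pop_cum - inc_cum))


# Synthetic stand-in for gdf: one row per region, one column per year.
frame = pd.DataFrame({
    "NAME": ["a", "b", "c", "d", "e"],
    "PCGDP1960": [1000, 2000, 1500, 3000, 2500],
    "PCGDP2000": [1000, 1000, 1000, 1000, 6000],
})

# Keep only columns named PCGDP followed by four digits, then apply per column.
by_year = frame.filter(regex=r"^PCGDP\d{4}$").apply(schutz_distance)
print(by_year)
```

With the shapefile loaded, the analogous call would be `gdf.filter(regex=r"^PCGDP\d{4}$").apply(schutz_distance)`; the anchored pattern skips the log columns such as `LPCGDP40`.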
The location and height of the Schutz line (which is derived from the Lorenz curve) offer insight into the degree of inequality in a distribution. In the plots above, the Schutz line for 1960 is positioned further to the right and is shorter than the line for 2000. The shorter line matches the smaller Schutz distance (0.1815 in 1960 versus 0.1950 in 2000) and the smaller Schutz coefficient (58.10 versus 62.41), which indicates that per capita GDP was distributed less unequally across states in 1960 than in 2000.
Here’s how to interpret this:
Position of the Schutz line (to the right): The Schutz line is drawn where the slope of the Lorenz curve reaches 1, that is, at the boundary between units with below-mean and above-mean income (in the df_processed tables above, slope is each unit's income divided by the mean). A line further to the right therefore means that a larger share of states has below-mean per capita GDP. Position on its own is not a measure of inequality, so it should be read together with the height.
Height of the Schutz line (shorter): The height of the Schutz line is the Schutz distance, the largest vertical gap between the Lorenz curve and the line of perfect equality (the 45-degree line). It equals the share of total income that would have to be transferred from above-mean to below-mean units to give every unit the mean, so a shorter line indicates lower inequality. It is related to, but not the same as, the Gini coefficient, which is based on the whole area between the two curves; the two measures usually move together but need not.
In summary, by both the Schutz distance and the Schutz coefficient, the 1960 distribution is more equal, with per capita GDP spread more evenly across the states, while the 2000 distribution is more unequal, as reflected in its taller Schutz line.
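The height of the Schutz line can also be read as a transfer: for equally weighted units, the Schutz distance equals the share of total income that would have to move from above-mean units to below-mean units to bring every unit to the mean (this quantity is also known as the Hoover index). A minimal check with the two toy arrays from earlier on this page, whose reported `s.distance` values are both 0.15; `transfer_share` is a hypothetical helper:

```python
import numpy as np


def transfer_share(incomes):
    """Share of total income that must be moved so every unit ends up at the mean.

    Hypothetical helper; assumes equally weighted units.
    """
    x = np.asarray(incomes, dtype=float)
    shortfall = np.clip(x.mean() - x, 0, None).sum()  # total missing below the mean
    return shortfall / x.sum()


print(transfer_share([1000, 2000, 1500, 3000, 2500]))                   # 0.15 (1500 of 10000)
print(transfer_share([20, 50, 80, 100, 100, 100, 100, 120, 150, 180]))  # 0.15 (150 of 1000)
```

Because the total surplus above the mean always equals the total shortfall below it, counting only the shortfall is enough.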