This page was generated from user-guide/viz/pengram.ipynb.

Pen’s Parade and Pengrams#

This notebook explores Pen’s Parade and the Pengram, graphical methods for depicting income inequality among individuals and across regions. Pen’s Parade, named after the economist Jan Pen, illustrates the income distribution by depicting individuals as figures whose heights are proportional to their incomes. This notebook adapts Pen’s Parade to spatial data by treating regions as the units and showing how income levels are distributed across areas. This lets us view income inequality both geographically and statistically, giving a clearer picture of how wealth is concentrated across regions. The notebook also demonstrates how to create a weighted Pen’s Parade, which adjusts the visual representation for factors such as population size.

Although we illustrate these methods using income, the Pen’s Parade and Pengram apply equally to other variables, such as wealth, education levels, or health metrics.

Data#

To illustrate the concepts of Pen’s Parade and Pengrams, this notebook utilizes data on per capita Gross Domestic Product (GDP) for Mexico’s 32 states, covering the decades from 1940 to 2000. Per capita GDP is calculated by dividing the total GDP of a region by its population, providing a measure of the average economic output per person. This metric serves as a useful indicator of the standard of living and economic development across different states. By analyzing this data, we can effectively visualize and assess income inequality and economic disparities both over time and across various regions within Mexico.
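The per capita calculation described above can be sketched with pandas. This is a minimal illustration, not part of the original notebook: the two rows copy the `Y2000` and `Population` figures for Aguascalientes and Chiapas from the table preview in this section, the reading of `Y2000` as total GDP is an assumption, and the column names `total_gdp`, `population`, and `pc_gdp` are hypothetical.

```python
import pandas as pd

# Toy table with two states. The figures are copied from the Y2000 and
# Population columns of the preview; treating Y2000 as total GDP is an
# assumption made for this illustration.
toy = pd.DataFrame(
    {
        "NAME": ["Aguascalientes", "Chiapas"],
        "total_gdp": [2.623413e10, 3.404903e10],  # hypothetical column name
        "population": [944285, 3920892],  # hypothetical column name
    }
)

# Per capita GDP = total GDP of the region / population of the region
toy["pc_gdp"] = toy["total_gdp"] / toy["population"]

print(toy[["NAME", "pc_gdp"]].round(0))
```

In this toy table Chiapas has the larger total output but the much smaller per capita value, which is why the per capita measure is the one used to compare states.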

[17]:
import geopandas as gpd
[18]:
gdf = gpd.read_file("weighted.shp")
[19]:
gdf.head()
[19]:
POLY_ID AREA CODE NAME PERIMETER ACRES HECTARES PCGDP1940 PCGDP1950 PCGDP1960 ... TEST Name_1 Population NAMEp populati_1 Y2000 y2000_1 p State geometry
0 5 5.467030e+09 MX01 Aguascalientes 313895.530 1.350927e+06 546702.985 10384.0 6234.0 8714.0 ... 5.0 Aguascalientes 944285 Aguascalientes 944285 2.623413e+10 27782.0 0.009647 Aguascalientes POLYGON ((-101.8462 22.01176, -101.9653 21.883...
1 1 7.252751e+10 MX02 Baja California Norte 2040312.385 1.792187e+07 7252751.376 22361.0 20977.0 17865.0 ... 1.0 Querétaro de Arteaga 1404306 Baja California Norte 1404306 4.192556e+10 29855.0 0.014347 Baja California Norte MULTIPOLYGON (((-113.13972 29.01778, -113.2405...
2 2 7.225988e+10 MX03 Baja California Sur 2912880.772 1.785573e+07 7225987.769 9573.0 16013.0 16707.0 ... 2.0 Baja California Sur 424041 Baja California Sur 424041 1.106874e+10 26103.0 0.004332 Baja California Sur MULTIPOLYGON (((-111.20612 25.80278, -111.2302...
3 15 5.016584e+10 MX04 Campeche 1575361.146 1.239620e+07 5016583.723 3758.0 4929.0 5925.0 ... 15.0 Campeche 690689 Campeche 690689 2.497739e+10 36163.0 0.007056 Campeche MULTIPOLYGON (((-91.83446 18.63805, -91.84195 ...
4 22 7.339157e+10 MX05 Chiapas 1477195.199 1.813538e+07 7339157.376 2934.0 4138.0 5280.0 ... 22.0 Chiapas 3920892 Chiapas 3920892 3.404903e+10 8684.0 0.040057 Chiapas POLYGON ((-91.4375 17.24111, -91.35278 17.1763...

5 rows × 43 columns

Pen’s Parade#

The parade proceeds from the poorest individuals to the wealthiest, where those with the lowest incomes appear as very short figures, while the wealthiest individuals tower above the rest. This visualization allows us to see the stark disparities in income distribution, often revealing that a large portion of the population has much lower incomes, with only a small percentage earning exceptionally high amounts. As the parade progresses, there is usually a gradual increase in height, but the final segment often shows a dramatic spike, symbolizing the extreme concentration of wealth in the hands of a few.

This method is powerful because it pairs a vivid, intuitive picture with a clear ordering of incomes. Pen’s Parade makes inequality tangible by showing not only that it exists but also its scale and extremity. It emphasizes the vast differences between individuals at the lower and upper ends of the income spectrum and makes clear how much of the total is held by the top earners. In discussions of income inequality, the picture conveys the imbalance in a way that is more immediate and relatable than abstract statistical measures, such as the Gini coefficient, on their own.

For spatial analysis, the regions take the place of individuals, as the focus is on the spatial distribution of incomes, rather than the distribution of individual incomes.
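The idea behind the parade can be sketched in a few lines of plain Matplotlib: sort the units by income and draw one bar per unit, with height proportional to income. This is only an illustration of the concept with hypothetical regions and values; it is not how the `pen` function used in this notebook is implemented.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical regions and incomes, for illustration only
regions = pd.DataFrame(
    {"region": ["A", "B", "C", "D", "E"], "income": [12, 3, 7, 40, 5]}
)

# Order the marchers from poorest to richest
parade = regions.sort_values("income").reset_index(drop=True)

# One bar per region; bar height is proportional to income
fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(parade["region"], parade["income"])
ax.set_xlabel("Region (poorest to richest)")
ax.set_ylabel("Income")
```

Even in this small example the last bar towers over the others, which is the characteristic shape of a parade for an unequal distribution.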

[20]:
from inequality.pen import pengram, pen
[21]:
col = "PCGDP2000"
weight = "p"
name = "NAME"
figsize = (16, 9)
[22]:
p = pen(gdf, col, name)
../../_images/user-guide_viz_pengram_8_0.png

As this Pen’s Parade for 2000 shows, the sharp rise at the end signifies a concentration of income among the highest-income states, suggesting high income inequality.
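A visual reading like this can be set beside a summary statistic such as the Gini coefficient mentioned earlier. The sketch below is a minimal NumPy version using the standard rank-based formula; the helper name `gini` and the input values are hypothetical, and this is not presented as the implementation used by the inequality package.

```python
import numpy as np


def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    # Rank-based formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    return 2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n


equal = gini([10, 10, 10, 10])  # everyone has the same income
spiky = gini([0, 0, 0, 1])  # one unit holds everything

print(equal, spiky)
```

A flat parade corresponds to a coefficient of zero, while a parade that is flat until a final spike yields a value close to the maximum for that number of units.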

Pen’s Parade Options#

[23]:
p = pen(gdf, col, name, xticks=False)
../../_images/user-guide_viz_pengram_11_0.png
[24]:
p = pen(gdf, col, name, ascending=False)
../../_images/user-guide_viz_pengram_12_0.png

Weighted Pen’s Parade#

Weighting the Pen’s Parade by population size provides a more nuanced understanding of inequality. Without weighting, every state counts equally in the parade regardless of how many people live there, which can distort how we perceive inequality on a per-person basis.

[25]:
wp = pen(gdf, col, name, weight=weight, figsize=figsize)
../../_images/user-guide_viz_pengram_14_0.png

The weighted Pen’s Parade adds nuance but has inherent limitations. A key drawback is that it assumes no inequality among individuals within a state, because everyone in the state is assigned the same income. It follows that, when two states with different per capita incomes are compared, even the poorest individuals in the wealthier state are treated as wealthier than the richest individuals in the poorer state. This assumption conflicts with real-world data (Rey, 2024).
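One way to picture the weighting is to give each state a bar whose width equals its population share, so that the horizontal axis represents people rather than states. The sketch below assumes this construction and uses hypothetical states and values; it is an illustration of the idea, not the code behind `pen(..., weight=...)`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical states: per capita income and population
toy = pd.DataFrame(
    {
        "state": ["A", "B", "C"],
        "pc_income": [30.0, 10.0, 20.0],
        "population": [1_000, 6_000, 3_000],
    }
)

# Order from poorest to richest, as in the unweighted parade
toy = toy.sort_values("pc_income").reset_index(drop=True)

# Bar width is the state's population share; left edges are cumulative shares
toy["width"] = toy["population"] / toy["population"].sum()
toy["left"] = toy["width"].cumsum() - toy["width"]

fig, ax = plt.subplots(figsize=(6, 3))
# Every person in a state gets the same height: the state's per capita income
ax.bar(toy["left"], toy["pc_income"], width=toy["width"], align="edge", edgecolor="black")
ax.set_xlabel("Cumulative population share")
ax.set_ylabel("Per capita income")
```

The flat top of each bar is the limitation discussed above made visible: within a bar, no variation among individuals is represented.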

Pengram#

While the Pen’s Parade effectively shows where individual states fall within the overall income distribution, it overlooks the geographic dimension of income inequality. The Pengram addresses this by extending the Pen’s Parade to incorporate a spatial component, combining the income distribution with a choropleth map. This dual approach allows us to visualize not only the statistical distribution of incomes but also how these incomes are geographically distributed across different regions. By blending both perspectives, the Pengram provides a more comprehensive view of inequality, linking spatial and statistical elements in a single visualization.

[26]:
pengram(gdf, col, name, figsize=figsize);
../../_images/user-guide_viz_pengram_17_0.png

By using a Pengram, we can identify regions that may be economically lagging and those where income is concentrated. Here, wealthier regions are concentrated primarily in the northern and central states, while southern states have markedly lower income levels, which points to a regional economic disparity.

This dual analysis of statistical and spatial distribution allows policymakers and researchers to target interventions more effectively.
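A simple way to think about how the parade and the map can be tied together is through a shared classification: each region is assigned to a class, and the class determines its color in both views. The sketch below assumes quantile classes computed with NumPy, hypothetical income values, and an arbitrary three-color palette; it is a conceptual illustration and is not taken from the `pengram` implementation.

```python
import numpy as np

# Hypothetical per capita incomes for six regions
values = np.array([4.0, 9.0, 15.0, 22.0, 30.0, 55.0])
k = 3  # number of classes

# Interior quantile break points (terciles when k = 3)
breaks = np.quantile(values, [i / k for i in range(1, k)])

# Class index 0..k-1 for each region; right=True places a value equal to a
# break point in the lower class
classes = np.digitize(values, breaks, right=True)

# One color per class, light to dark; the same list could color bars and polygons
palette = ["#deebf7", "#9ecae1", "#3182bd"]
colors = [palette[c] for c in classes]

print(classes.tolist())
```

Changing the classification scheme or the number of classes changes which regions share a color, and therefore how the income gradient reads in both views.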

[27]:
pengram(gdf, col, name, figsize=figsize, leg_pos="lower left");
../../_images/user-guide_viz_pengram_19_0.png
[28]:
pengram(gdf, col, name, figsize=figsize, leg_pos="lower left", scheme="FisherJenks");
../../_images/user-guide_viz_pengram_20_0.png
[29]:
pengram(
    gdf, col, name, figsize=figsize, leg_pos="lower left", scheme="FisherJenks", k=3
);
../../_images/user-guide_viz_pengram_21_0.png

Queries#

The Pengram lets users query specific subsets of areas and highlights them both on the map and in the parade. This makes it easier to focus on regions of interest and to analyze economic disparities in targeted areas. For instance, Chiapas and Campeche appear far apart in the Pen’s Parade because their incomes differ greatly, yet they are contiguous on the map, so querying them together highlights an abrupt spatial disparity. This illustrates how neighboring regions can have vastly different income levels, underscoring the complexity of geographic inequality.

[30]:
pengram(
    gdf,
    col,
    name,
    figsize=figsize,
    leg_pos="lower left",
    scheme="FisherJenks",
    k=3,
    query=["Chiapas", "Campeche"],
);
../../_images/user-guide_viz_pengram_23_0.png
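The logic of such a query can be sketched with pandas: rank the regions by value, then flag the rows whose names appear in the query list. The five rows below reuse the names and the `y2000_1` figures from the table preview purely as illustrative values, and the column names `value`, `position`, and `highlight` are hypothetical; how `pengram` handles the `query` argument internally is not shown here.

```python
import pandas as pd

# Illustrative values copied from the y2000_1 column of the table preview
toy = pd.DataFrame(
    {
        "NAME": [
            "Aguascalientes",
            "Baja California Norte",
            "Baja California Sur",
            "Campeche",
            "Chiapas",
        ],
        "value": [27782.0, 29855.0, 26103.0, 36163.0, 8684.0],
    }
)
query = ["Chiapas", "Campeche"]

# Position in the parade: 1 is the poorest region
ranked = toy.sort_values("value").reset_index(drop=True)
ranked["position"] = ranked.index + 1

# Flag the queried regions so they can be highlighted
ranked["highlight"] = ranked["NAME"].isin(query)
selected = ranked[ranked["highlight"]]

print(selected[["NAME", "value", "position"]])
```

Among these five states the two queried regions sit at opposite ends of the ordering, which mirrors the gap visible in the full parade.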

Subplotting#

Both the Pen’s Parade and the Pengram are drawn with Matplotlib, and both functions accept an ax argument, so they can be placed on existing axes alongside other plots. By combining visualizations such as KDE plots, choropleth maps, and the Pen’s Parade, we can examine multiple aspects of inequality in a single view. For example, we can compare the distribution of per capita GDP across states while observing how it corresponds to their geographic locations. Because the plots share one figure, they are easy to include in reports or papers.

[32]:
import matplotlib.pyplot as plt
import seaborn as sns

# 2 x 2 grid of axes with equal row heights
fig, axes = plt.subplots(2, 2, figsize=(10, 8), gridspec_kw={"height_ratios": [1, 1]})
ax0, ax1, ax2, ax3 = axes[0, 0], axes[0, 1], axes[1, 0], axes[1, 1]

# Top row: kernel density estimate and choropleth map of per capita GDP
sns.kdeplot(data=gdf, x="PCGDP2000", ax=ax0)
gdf.plot("PCGDP2000", ax=ax1)
# Bottom row: Pen's Parade and Pengram drawn on the supplied axes
pen(gdf, col, name, ax=ax2)
pengram(gdf, col, name, xticks=False, legend=False, ax=ax3);
../../_images/user-guide_viz_pengram_26_0.png