This page was generated from notebooks/topo.ipynb.

Topological Measures of Geographical Surfaces

Levi John Wolf

[1]:
import warnings

import geopandas
import matplotlib.pyplot as plt
import numpy
from libpysal import weights

from esda import topo

This notebook explains the usage and meaning of the topo functions, prominence and isolation. Both of these concepts arise from topography, the study of the forms and features of a surface.

In this sense, topographical measures focus on the shape of a surface. In physical geography, prominence and isolation are both topographical measures that describe how high (or how “dominant”) a mountain is relative to its landscape. Mathematically, if we represent “elevation” with a surface (like a Digital Elevation Model, or DEM), then this becomes a way to characterize the local maxima, minima, and/or saddle points of that surface. In this way, prominence and isolation are closely connected to the local extrema and concavity/convexity of a surface, and provide useful ways to think about the structure of a geographical distribution.

In a physical sense, the isolation of a mountain peak is the distance you have to travel to find a higher point on the surface. The common example (given, say, on Wikipedia) is that Mount Everest has an “infinite” isolation, since it is the highest point on Earth’s surface. K2, the second-highest mountain on Earth, is relatively close to Everest, so K2 has quite a low isolation. Aconcagua, the tallest peak in the Western Hemisphere, is more than 1,600 meters shorter than K2 but far more isolated, since the distance from it to the nearest higher point is very large. So, isolation is a measure of horizontal distance.

In contrast, prominence is a measure of vertical distance, describing the gap between the peak’s elevation and the “highest” low point between it and its parent peak. Conceptually, imagine that the whole world is flooded, all the way to the top of Mount Everest. Let’s denote the height of mountain \(i\) as \(h_i\), and the water level as \(w\). So, starting with \(w = \max_i h_i\), we lower the water. When \(w = h_i\), peak \(i\) “emerges” from the water. Imagine two peaks of height \(h_i\) and \(h_j\) (such that \(h_i > h_j\)). Then, imagine that \(w\) falls to a point \(w^* < h_j < h_i\) where peaks \(i\) and \(j\) become “connected” together as part of a single landmass. Then, the prominence of peak \(j\) is computed as \(p_j = h_j - w^*\), the relative height between peak \(j\) and the point at which \(j\) and \(i\) first connect. In this model, it makes sense to then consider \(j\) and \(i\) the same peak \((ij)\), and measure the next peak’s prominence relative to the point at which \(k\) first connects to \((ij)\). We use prominence because it tells you how high a peak (a local maximum) is relative to its surroundings.
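
As a worked illustration with made-up heights (these numbers are hypothetical, not taken from any dataset), suppose \(h_i = 100\), \(h_j = 60\), and the two peaks first connect when the water falls to \(w^* = 40\). Suppose further that a third peak \(k\) with \(h_k = 50\) joins the merged peak \((ij)\) only when the water reaches \(w^{**} = 10\). Measuring each peak against the level at which it joins a higher one gives:

\[
p_j = h_j - w^* = 60 - 40 = 20, \qquad p_k = h_k - w^{**} = 50 - 10 = 40.
\]

So \(k\) is lower than \(j\) but more prominent, because the water must fall much further before \(k\) joins a higher landmass.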

Things are complicated somewhat when we’re dealing with irregular or discontinuous relief. When dealing with mountains and elevation, we know that elevation is a surface that changes (relatively) smoothly. In contrast, most of our vector data does not change smoothly. Despite this, we can still use these notions of isolation and prominence to measure the relative contrasts between parts of a geographical surface.

So, for what happens next, we’ll refer to the Natural Earth countries dataset. We’ll use the population estimate (POP_EST) as a kind of “elevation,” a la @undertheraedar’s excellent population topography:

Figure: global population drawn as if it were a surface.

Isolation and Prominence of country populations

Let’s start with a natural example: isolation of population of countries.

[2]:
natearth = geopandas.read_file(
    "https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip"
)

To start, we know that China, India, and the US are the three most populous countries in the world:

[3]:
natearth.sort_values("POP_EST", ascending=False).head()
[3]:
featurecla scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF LEVEL TYPE TLC ADMIN ... FCLASS_TR FCLASS_ID FCLASS_PL FCLASS_GR FCLASS_IT FCLASS_NL FCLASS_SE FCLASS_BD FCLASS_UA geometry
139 Admin-0 country 1 2 China CH1 1 2 Country 1 China ... None None None None None None None None None MULTIPOLYGON (((109.47521 18.1977, 108.65521 1...
98 Admin-0 country 1 2 India IND 0 2 Sovereign country 1 India ... None None None None None None None None None POLYGON ((97.32711 28.26158, 97.40256 27.88254...
4 Admin-0 country 1 2 United States of America US1 1 2 Country 1 United States of America ... None None None None None None None None None MULTIPOLYGON (((-122.84 49, -120 49, -117.0312...
8 Admin-0 country 1 2 Indonesia IDN 0 2 Sovereign country 1 Indonesia ... None None None None None None None None None MULTIPOLYGON (((141.00021 -2.60015, 141.01706 ...
102 Admin-0 country 1 2 Pakistan PAK 0 2 Sovereign country 1 Pakistan ... None None None None None None None None None POLYGON ((77.83745 35.49401, 76.87172 34.65354...

5 rows × 169 columns

This is a perfect dataset to think about isolation/prominence, since we have big gaps between the highest population points in each continent.

[4]:
f, ax = plt.subplots(1, 1, figsize=(12, 6))
natearth.plot("POP_EST", ax=ax)
[4]:
<Axes: >
../_images/notebooks_topo_10_1.png

To simplify things, we’ll just consider the Euclidean distances between country centroids, but we could (in theory) use any kind of distance we want in this function by specifying the metric argument.
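
Because the centroids used here are longitude/latitude pairs, Euclidean distances between them come out in degrees rather than kilometers. As a rough illustration of what a different distance would measure, the sketch below compares the two for a pair of made-up points. Note that `haversine_km` is a hypothetical helper written for this example, not part of esda, and the 6371 km Earth radius is an assumed mean value.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


# Two illustrative pairs, each 10 degrees of longitude apart (made-up points).
equator = ((0.0, 0.0), (10.0, 0.0))
north = ((0.0, 60.0), (10.0, 60.0))

for (lon1, lat1), (lon2, lat2) in (equator, north):
    euclid_deg = math.hypot(lon2 - lon1, lat2 - lat1)
    great_circle = haversine_km(lon1, lat1, lon2, lat2)
    print(f"lat {lat1:>4}: {euclid_deg:.1f} degrees, {great_circle:.0f} km")
```

Both pairs are 10 degrees apart in Euclidean terms, but the pair at 60 degrees north is only about half as far apart on the ground, which is why isolation values computed from raw centroids are best read as relative rather than physical distances.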

[5]:
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore",
        category=UserWarning,
        message="Geometry is in a geographic CRS",
    )

    coordinates = numpy.column_stack((natearth.centroid.x, natearth.centroid.y))

To get a bit more information on the result, we can use return_all to get the rank and distance for the isolated peak. This also returns the gap between the peak and its nearest higher peak.

[6]:
iso = topo.isolation(natearth["POP_EST"], coordinates, return_all=True)

Under the hood, this algorithm uses rtree and numpy.sort to insert observations into a SpatialIndex() in rank-order. Iterating down the ranks, the nearest higher observation is found in the SpatialIndex(), and that becomes the parent of that observation. As we iterate, POP_EST gets smaller and smaller, and more and more entries are added to the rtree.
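
To make the idea concrete, here is a minimal brute-force sketch of the same computation. It is not the rtree-based routine described above: it simply checks every pair, which is fine for a toy example but slow for large data. The function name `brute_force_isolation` and the toy heights and points are made up for illustration, and the sketch reports the top observation's isolation as infinity rather than NaN.

```python
import math


def brute_force_isolation(values, coordinates):
    """For each observation, find the nearest observation with a strictly larger value.

    Returns a list of (parent_index, distance) pairs. The global maximum has no
    parent, so it gets (None, math.inf). This is an O(n^2) illustration only.
    """
    results = []
    for value, point in zip(values, coordinates):
        parent, best = None, math.inf
        for j, (other_value, other_point) in enumerate(zip(values, coordinates)):
            if other_value > value:
                d = math.dist(point, other_point)  # Euclidean distance
                if d < best:
                    parent, best = j, d
        results.append((parent, best))
    return results


# Hypothetical toy data: four "peaks" along a line.
heights = [10, 7, 9, 3]
points = [(0, 0), (1, 0), (5, 0), (6, 0)]
for i, (parent, dist) in enumerate(brute_force_isolation(heights, points)):
    print(i, parent, dist)
```

Here the peak of height 9 has an isolation of 5, since the only higher peak sits 5 units away, while the peak of height 3 has an isolation of 1 because its higher neighbor is adjacent.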

Merging this back to the data we have will let us visualize it.

[7]:
natearth = natearth.merge(iso, left_index=True, right_index=True)

So, you can see that the isolation for China is infinite (reported as NaN in the table below), since there is no country with a larger population. Because the coordinates are longitude/latitude centroids, the distances are in degrees. The isolation for India is only the distance from India to China (about 27.9). The isolation for the US, however, is much larger (about 193.5), and reflects the distance from the US to India (which just happens to be closer than China). The fourth-most populous country, Indonesia, is closer to China than to India, so its parent_rank is 0, and its isolation is the distance from Indonesia to China. Pakistan, the fifth-most populous country, has India as its parent, with an isolation of about 12.4. Brazil, the sixth-most populous and just outside the five rows shown, has the US as its parent, with an isolation distance of about 82.

[8]:
natearth[
    ["CONTINENT", "NAME", "rank", "POP_EST", "parent_rank", "isolation", "gap"]
].sort_values("POP_EST", ascending=False).head()
[8]:
CONTINENT NAME rank POP_EST parent_rank isolation gap
139 Asia China 0.0 1.397715e+09 NaN NaN NaN
98 Asia India 1.0 1.366418e+09 0.0 27.852795 3.129725e+07
4 North America United States of America 2.0 3.282395e+08 1.0 193.538522 1.038178e+09
8 Asia Indonesia 3.0 2.706256e+08 0.0 41.072699 1.127089e+09
102 Asia Pakistan 4.0 2.165653e+08 1.0 12.381725 1.149852e+09

Visualizing this below, we get the following:

[9]:
f, ax = plt.subplots(2, 1, figsize=(12, 12))
for i in range(2):
    natearth.plot(color="grey", ax=ax[i])
natearth.plot("isolation", ax=ax[0])
natearth.plot("gap", ax=ax[1])
ax[0].set_title("how far do you have to go to find a country with more people?")
ax[1].set_title("how many extra people does this place need to swap with its parent?")
plt.show()
../_images/notebooks_topo_20_0.png

Next, to see prominence, we’ll use the same arguments. However, prominence can also take a graph/libpysal.weights.W object that specifies the connections between sampled locations. If no pre-existing graph or surface is provided, the Delaunay Triangulation is used between the coordinates provided, which is a common triangulation in digital elevation models. Here, I’ll use the Rook contiguity graph, just for ease of interpretation. This means that some islands will automatically be considered as “peaks” because they don’t share any borders with any higher observations.

[10]:
w_rook = weights.Rook.from_dataframe(natearth, use_index=False, silence_warnings=True)
prominence = topo.prominence(natearth["POP_EST"], w_rook, return_all=True)

This algorithm may take a while on large datasets, so you can use the progressbar option to track its progress.

The actual algorithm entails sorting the data and determining whether we “connect” peaks together each time we introduce a new observation as equal to the declining sea level \(w\). We avoid re-computing scipy.sparse.csgraph.connected_components each iteration by creating a set of neighbors for each “peak.” When two (or more) peaks merge, we replace the independent peaks \(i\) and \(j\) (where \(h_i > h_j\)) with peak \((ij)\). At each merge, the new peak gets the union of the sub-peaks’ neighbors, and the merged peaks are treated as a single “peak” from then on.
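
The falling-water logic can be sketched in a few lines. This is a simplified illustration under stated assumptions, not esda's implementation: it uses a union-find structure instead of the per-peak neighbor sets described above, the name `sketch_prominence` and the toy graph are made up, and peaks that never merge into a higher peak are simply assigned their full height.

```python
def sketch_prominence(heights, neighbors):
    """Toy prominence via a falling 'water level' and union-find.

    heights: dict node -> value; neighbors: dict node -> set of adjacent nodes.
    Returns dict peak -> prominence. Peaks that never merge into a higher
    peak keep their full height (an assumption of this sketch).
    """
    parent = {}  # union-find pointers for nodes that have emerged
    summit = {}  # component root -> highest node in that component
    prominence = {}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Lower the water level: visit nodes from highest to lowest.
    for node in sorted(heights, key=heights.get, reverse=True):
        parent[node] = node
        summit[node] = node
        roots = {find(n) for n in neighbors.get(node, ()) if n in parent}
        if not roots:
            continue  # nothing adjacent has emerged yet: a new peak
        # Order the touching components by the height of their summits.
        ranked = sorted(roots, key=lambda r: heights[summit[r]], reverse=True)
        top = ranked[0]
        for root in ranked[1:]:
            # 'node' is the key col joining this lower peak to a higher one.
            peak = summit[root]
            prominence[peak] = heights[peak] - heights[node]
            parent[root] = top
        parent[node] = top  # the new node is a slope of the merged landmass

    # Summits never absorbed by a higher peak keep their full height.
    for node in heights:
        if find(node) == node:
            prominence[summit[node]] = heights[summit[node]]
    return prominence


# Hypothetical toy graph: a chain A-B-C-D-E plus an island F with no neighbors.
heights = {"A": 10, "B": 4, "C": 8, "D": 2, "E": 6, "F": 5}
neighbors = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
    "D": {"C", "E"}, "E": {"D"}, "F": set(),
}
print(sketch_prominence(heights, neighbors))
```

In this toy graph, B and D act as key cols: C merges into A's landmass when B emerges, and E merges when D emerges. The island F never connects to anything, so it remains its own peak, much like island countries under a contiguity graph.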

We merge it back onto our data in the same fashion as before:

[11]:
natearth = natearth.merge(
    prominence, left_index=True, right_index=True, suffixes=("", "_prom")
)

This gives us the following table. I’m going to show a few more rows than usual so we can interpret it more clearly.

[12]:
natearth[
    [
        "CONTINENT",
        "NAME",
        "POP_EST",
        "rank",
        "parent_rank",
        "isolation",
        "dominating_peak",
        "keycol",
        "prominence",
    ]
].sort_values("POP_EST", ascending=False).head(20)
[12]:
CONTINENT NAME POP_EST rank parent_rank isolation dominating_peak keycol prominence
139 Asia China 1.397715e+09 0.0 NaN NaN 139.0 107.0 1.314801e+09
98 Asia India 1.366418e+09 1.0 0.0 27.852795 139.0 -1.0 NaN
4 North America United States of America 3.282395e+08 2.0 1.0 193.538522 139.0 33.0 3.239931e+08
8 Asia Indonesia 2.706256e+08 3.0 0.0 41.072699 4.0 148.0 2.386758e+08
102 Asia Pakistan 2.165653e+08 4.0 1.0 12.381725 139.0 -1.0 NaN
29 South America Brazil 2.110495e+08 5.0 2.0 82.093057 8.0 43.0 1.439896e+08
56 Africa Nigeria 2.009636e+08 6.0 5.0 64.353456 29.0 55.0 1.776529e+08
99 Asia Bangladesh 1.630462e+08 7.0 1.0 10.713323 139.0 -1.0 NaN
18 Europe Russia 1.443735e+08 8.0 0.0 26.374718 139.0 -1.0 NaN
27 North America Mexico 1.275755e+08 9.0 2.0 23.966775 4.0 -1.0 NaN
155 Asia Japan 1.262649e+08 10.0 0.0 34.199305 56.0 -1.0 1.262648e+08
165 Africa Ethiopia 1.120787e+08 11.0 6.0 31.568798 155.0 13.0 5.950476e+07
147 Asia Philippines 1.081166e+08 12.0 3.0 15.020572 165.0 -1.0 1.081165e+08
163 Africa Egypt 1.003881e+08 13.0 11.0 20.320874 147.0 14.0 5.757484e+07
94 Asia Vietnam 9.646211e+07 14.0 12.0 17.322577 139.0 -1.0 NaN
11 Africa Dem. Rep. Congo 8.679057e+07 15.0 11.0 19.680827 163.0 13.0 3.421659e+07
124 Asia Turkey 8.342962e+07 16.0 13.0 13.623370 11.0 107.0 5.157090e+05
121 Europe Germany 8.313280e+07 17.0 16.0 27.604762 124.0 43.0 1.607291e+07
107 Asia Iran 8.291391e+07 18.0 4.0 15.341195 124.0 -1.0 0.000000e+00
91 Asia Thailand 6.962558e+07 19.0 14.0 5.528840 121.0 93.0 1.558016e+07

These are the top 20 countries in terms of population. Walking down the rows, you see that China has an isolation of NaN, since no country is more populous. Its prominence in this table, about 1.31 billion, is its population minus that of the country recorded as its key col (index 107, Iran). India shares a border with China. So, when we “lower the sea level” to where India emerges from the water, it’s already connected to the highest peak, China. So, it has no prominence of its own (NaN in the table): it’s considered a “slope” of the “China” mountain, and its dominating_peak is 139, the index for “China.” However, the US’s prominence is (confusingly) computed as the height relative to the population of Panama, since Panama connects it to South America, where French Guiana (part of France in this dataset) then connects NA/SA to Eurasia in this graph. Indonesia, likewise, has a prominence measured starting from Malaysia, which connects it to Asia, where it encounters the China-based subgraph.

To visualize what the “key cols” are, consider the points at which disconnected countries become connected in the following sequence of maps. In each map, only the countries outlined in red are above the “water line” \(w\).

[13]:
f, ax = plt.subplots(3, 2, figsize=(20, 20), sharex=True, sharey=True)
ax = ax.flatten()
for ix, rank in enumerate(range(5, 31, 5)):
    natearth.plot("classification", ax=ax[ix], cmap="Accent")
    natearth.sort_values("POP_EST", ascending=False).iloc[0:rank].boundary.plot(
        color="salmon", ax=ax[ix]
    )
    ax[ix].set_title(f"# of countries included: {rank}")
f.tight_layout()
plt.show()
../_images/notebooks_topo_28_0.png

You can see that the first “key col” country, France, is included around the 25-country mark, whereas before that most included countries are “peaks,” disconnected from one another. Below, you can see the full map of classifications, alongside the prominence of each country.

[14]:
f, ax = plt.subplots(2, 1, figsize=(12, 12))
natearth.plot("classification", ax=ax[0], legend=True, cmap="Accent")
natearth.plot(color="grey", ax=ax[1])
natearth.plot("prominence", ax=ax[1])
ax[0].set_ylabel("Classification")
ax[1].set_ylabel("Prominence")
plt.show()
../_images/notebooks_topo_30_0.png