This page was generated from notebooks/3_basic_mapping.ipynb. Interactive online version: Binder badge

Basic Mapping

Luc Anselin

09/06/2024

Preliminaries

There are many ways to create beautiful maps in Python using packages such as folium or plotly. In this notebook, the plot functionality of geopandas is illustrated, which is sufficient for most of our purposes. The functionality will be illustrated with the Police sample data set that contains police expenditure data for Mississippi counties. It is assumed that this data has been installed using libpysal.examples.load_example("Police").

A video recording is available from the GeoDa Center YouTube channel playlist Applied Spatial Regression - Notebooks, at https://www.youtube.com/watch?v=rZ1Mw-hZcMY&list=PLzREt6r1NenmhNy-FCUwiXL17Vyty5VL6&index=3.

Modules Needed

As before, the main modules are geopandas and libpysal. Specifically, libpysal.examples is used to get the path to the sample data. In addition, to save the maps to a file, matplotlib.pyplot is needed.

The full set of imports is shown below.

[44]:
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
import matplotlib.pyplot as plt
from libpysal.examples import get_path

Functions Used

  • from geopandas:

    • read_file

    • plot

  • from libpysal:

    • get_path

  • from matplotlib.pyplot:

    • savefig

Files

The mapping functionality will be illustrated with the same Police sample data set as used in the previous notebook. The following files will be used:

  • police.shp,shx,dbf,prj: shape file (four files) for 82 counties

[3]:
infileshp = "police.shp"   # input shape file
inpath = get_path(infileshp)
dfs = gpd.read_file(inpath)

Getting Started

Default Map

Before delving into customization, the default choropleth map created by the plot function applied to a GeoDataFrame is illustrated. A bare bones implementation only requires the variable (column) to be mapped and the argument legend = True. Without the latter, there will still be a map, but it will not have a legend, so there will be no guide as to what the colors mean.

[ ]:
dfs.plot('POLICE',legend=True)

Not exactly the prettiest thing in the world. A continuous color ramp as seen here is not recommended by cartographers. Also, the classification is such that too many observations have seemingly the same color. Finally, there is also this strange mention of <Axes: >.

There are two important types of modifications that can be considered. One pertains to the fundamental characteristics of a choropleth map, the other to the way matplotlib constructs visualizations under the hood. The geopandas library relies on matplotlib so there is no need to import the latter explicitly, except when one wants to save the maps to a file. In any case, it helps to understand the matplotlib logic. This is considered first.

Matplotlib Logic

The matplotlib library is extremely powerful and allows just about any type of customized visualization. It starts by setting up the basic parameters and then builds a graphic representation layer by layer. The terminology may seem a bit strange at first, but after a while, it becomes more familiar.

A plot is initialized by assigning some parameters to the tuple fig , ax. It is important to realize that fig is about the figure makeup and ax is about the actual plots. For example, fig is used to specify how many subplots there need to be, how they are arranged and what their size is. Since the examples used here and in later notebooks will only produce a single plot, the fig aspect can be ignored and only ax is needed. In fact, for simple plots such as the maps in our applications, the specification of ax as such is not needed and the plot function can be applied directly to the GeoDataFrame. However, it remains important that the plot object is referred to as ax in many operations.

An alternative way to set up the default map just shown is to explicitly assign it to an object ax, as ax = dfs.plot( ) with the same arguments as before. To remove the x-y coordinates and box around the map, the method set_axis_off() is applied to the ax object. Using this setup also removes the <Axes: > listing. Otherwise, everything is still the same as before.

[ ]:
ax = dfs.plot('POLICE',legend = True)
ax.set_axis_off()

Note that the same result can be obtained without the explicit assignment to ax by simply applying the method to the plot object, as in the example below. Typically, the more explicit assignment is considered to be more readable, but it is mostly a matter of preference.

[ ]:
dfs.plot('POLICE',legend=True).set_axis_off()

Map Design Characteristics

The purpose of a choropleth or thematic map is to visualize the spatial distribution of a variable over areal units. Choropleth comes from the Greek choros, which stands for region, so it is a map for regions. For our purposes, the proper design of a map has three important characteristics, which each translate into arguments to the plot function:

  • classification

  • color

  • legend

Classification

Arguably the most important characteristic is the classification used, i.e., how the continuous distribution of a given variable gets translated into a small number of discrete categories, or bins. This is exactly the same issue encountered in the design of histogram bins.

The assignment of observations to distinct bins is done by the mapclassify library, which is part of the PySAL family. However, this is done under the hood by geopandas so that no separate import statement is needed.

The classification is set by means of the scheme argument. Common options are Quantiles (for a quantile map), EqualInterval (for an equal intervals map), NaturalBreaks (for a natural breaks map), StdMean (for a standard deviational map), and BoxPlot (for a box map). All but the last two classifications require an additional argument for the number of bins, k. This is not needed for the standard deviational map and the box map, for which the breakpoints are derived from the data, respectively the standard deviation and the quartiles/hinge.

The default hinge for the box map is 1.5 times the interquartile range. Other values for the hinge can be specified by setting a different value for the argument hinge, but this is typically not necessary. However, to pass this to the geopandas plot function it cannot just be set as hinge = 3.0 as in mapclassify. In geopandas it is necessary to pass this in a classification_kwds dictionary, where the relevant parameters are set. For example, this would be classification_kwds = {"hinge": 3.0} for a hinge of 3 times the interquartile range.

The default for the standard deviational map is to show all observations within one standard deviation below and above the mean as one category. To separate observations below and above the mean can be accomplished by setting the argument anchor to True. Again, this is done by means of the classification_kwds dictionary.

Full details on all the classifications available through mapclassify and their use in geopandas can be found at https://geopandas.org/en/stable/docs/user_guide/mapping.html# and https://pysal.org/mapclassify/api.html.

Each of the five cases is illustrated in turn. Note that the column argument is used to designate the variable to be mapped.

The placement of the legend is managed by means of the legend_kwds argument (similar to classification_kwds). This is a dictionary that specifies aspects such as the location of the legend and how it is positioned relative to its anchor point. It also makes it possible to set a title for the legend, e.g., to set it to the variable that is being mapped.

In the examples, the following arguments are used: legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": "<variable name>"}. This is not totally intuitive, but it works. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html#matplotlib.axes.Axes.legend for details about the various legend customizations.

Also note that the map uses the default color map. More appropriate color maps will be considered below.

A simple six category quantile map is illustrated by setting scheme = "Quantiles" and k=6. The legend arguments now also include a title. In addition, two ax methods are used for a minor customization: ax.set_title to give the map a title and, as before, ax.set_axis_off to get rid of the box with x-y coordinates.

[ ]:
ax = dfs.plot(
    column = 'POLICE',
    scheme = 'Quantiles',
    k = 6,
    legend = True,
    legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title" : "Police"}
)
ax.set_title("Quantiles")
ax.set_axis_off()

Rather than repeating the single command for each type of map that needs the argument k, a small loop is constructed that creates each in turn. This is accomplished by putting the name for the respective scheme in a list and using that same name as the map title. The three types are Quantiles, EqualInterval and NaturalBreaks.

[ ]:
schemek = ["Quantiles","EqualInterval","NaturalBreaks"]
for i in schemek:
    ax = dfs.plot(
    column = 'POLICE',
    scheme = i,
    k = 6,
    legend = True,
    legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": "Police"}
    )
    ax.set_title(i)
    ax.set_axis_off()

Note the contrast in the visualization of the spatial distribution between the different classifications. It is important to keep in mind that each has pros and cons. For example, the quantile map yields an equal number of observations in each category, but the range of the categories can vary subtantially, resulting in the grouping of very disparate observations. In the example, this is the case for the top category, which ranges from 1,275 to 10,972.

On the other hand, the range in an equal intervals map is the same for all categories, but as a result some bins may have very few or very many observations, as is the case here for the lowest bin.

Finally, a natural breaks map uses an optimization criterion (essentially equivalent to k-means on one variable) to determine the grouping of observations. Both the number of observations in each bin and the range of the bins is variable.

The standard deviational map and box map have a pre-set number of bins, depending on, respectively, standard deviational units and quantiles/interquantile range. Again, they are illustrated using a small loop.

[ ]:
schemenok = ["StdMean","BoxPlot"]
for i in schemenok:
    ax = dfs.plot(
    column = 'POLICE',
    scheme = i,
    legend = True,
    legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
    )
    ax.set_title(i)
    ax.set_axis_off()

Both types of maps are designed to highlight outliers. In the standard deviational map, these are observations more than two standard deviations away from the mean, in the box map, the outliers are outside the hinge (1.5 times the interquartile range from the median). This can be customized by setting a different value for the hinge through the classification_kwds argument. For example, selecting only the most extreme observations is achieved by setting classification_kwds = {"hinge": 3.0}, as illustrated below.

[ ]:
ax = dfs.plot(
    column = 'POLICE',
    scheme = 'BoxPlot',
    k = 6,
    classification_kwds = {'hinge': 3.0},
    legend = True,
    legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": "Police"}
)
ax.set_title("Box Map")
ax.set_axis_off()

A standard deviational map with the categories below and above the mean shown is implemented with classification_kwds = {"anchor" : True}, as shown below.

[ ]:
ax = dfs.plot(
    column = 'POLICE',
    scheme = 'StdMean',
    k = 6,
    classification_kwds = {'anchor': True},
    legend = True,
    legend_kwds = {"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Standard Deviational Map")
ax.set_axis_off()

Whereas the first three types of classifications have a color scheme that suggests a progression from low to high values, a so-called sequential legend, the standard deviational map and box map focus on differences from a central value. This requires a color map that highlights the move away from the center, a so-called diverging legend. In the examples shown so far, the categories were shown with the default sequential color map, which is not appropriate. The needed customizations are considered next.

Color Map

The color scheme for the map is set by means of the cmap argument. This refers to a matplotlib color map, i.e., a pre-determined range of colors optimized for a particular purpose. For example, this allows for a different color map to represent a sequential vs. a diverging legend.

The full range of color maps can be found at https://matplotlib.org/stable/users/explain/colors/colormaps.html.

For our purposes, a good sequential color map uses a gradation that goes from light to dark, either in the same color, such as cmap="Blues", or moving between colors, such as cmap="YlOrRd". For a diverging legend, going from one extreme color to another is preferred, e.g., dark blue to light blue and then to light red and dark red, as in cmap="bwr", or even more extreme, as in cmap="seismic".

Some examples are shown below.

[ ]:
ax = dfs.plot(
    column = 'POLICE',
    scheme = 'Quantiles',
    k = 6,
    cmap = 'Blues',
    legend = True,
    legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Quantiles")
ax.set_axis_off()
[ ]:
ax = dfs.plot(
    column = 'POLICE',
    scheme = 'Quantiles',
    k = 6,
    cmap = 'YlOrRd',
    legend = True,
    legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Quantiles")
ax.set_axis_off()
[ ]:
ax = dfs.plot(
    column = 'POLICE',
    scheme = 'BoxPlot',
    cmap = 'seismic',
    legend = True,
    legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Box Map")
ax.set_axis_off()

But notice when this is applied to the standard deviational map with cmap = bwr.

[ ]:
ax = dfs.plot(
    column = 'POLICE',
    scheme = 'StdMean',
    cmap = 'bwr',
    legend = True,
    legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Standard Deviational Map")
ax.set_axis_off()

What happened? Many of the counties are invisible. The reason is that there is no borderline specified for the map. This final customization is considered next.

Final Customization

As mentioned, the full range of matplotlib customizations is available to manipulate legends, colors and placement. For our purposes, one more map-specific element is of interest. As seen in the previous examples, the border between polygons is not clear or even non-existent.

This can be fixed by setting the edgecolor and associated linewidth attributes. For example, with edgecolor = "Black", the standard deviational map becomes more meaningful.

[ ]:
ax = dfs.plot(
    column = 'POLICE',
    scheme = 'StdMean',
    cmap = 'bwr',
    edgecolor = "Black",
    legend = True,
    legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Standard Deviational Map")
ax.set_axis_off()

So far, the maps are generated in the notebook, but are not separately available. To save a specific map to a file, the matplotlib.pyplot.savefig command is used. For example, to save the standard deviational map (or any other map) to a png format file, only the filename needs to be specified as an argument to plt.savefig. Optionally, to get higher quality figures, the number of dots per inch can be set by means of dpi.

This is illustrated for the standard deviational map where a more subtle border line is obtained by setting the thickness with linewidth = 0.2. The quality is set to dpi = 600.

The file will be in the current working directory.

[ ]:
ax = dfs.plot(
    column = 'POLICE',
    scheme = 'StdMean',
    cmap = 'bwr',
    edgecolor = "Black",
    linewidth = 0.2,
    legend = True,
    legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Standard Deviational Map")
ax.set_axis_off()
plt.savefig("police_stdmean.png",dpi=600)

Finally, a map with just the county borders is obtained with the boundary.plot command, where the color of the border line is controlled by edgecolor and the line thickness by linewidth, as before.

[ ]:
ax = dfs.boundary.plot(
    edgecolor = "Black",
    linewidth = 0.2,
)
ax.set_title("Map of County Boundaries")
ax.set_axis_off()

Practice

Use your own data set or one of the PySAL sample data sets to load a spatial data frame and experiment with various map types, color schemes and other customizations. Save each map to a file for inclusion in papers, reports, etc.