This page was generated from notebooks/3_basic_mapping.ipynb. Interactive online version:
Basic Mapping¶
Luc Anselin¶
09/06/2024¶
Preliminaries¶
There are many ways to create beautiful maps in Python using packages such as folium or plotly. In this notebook, the plot
functionality of geopandas is illustrated, which is sufficient for most of our purposes. The functionality will be illustrated with the Police sample data set that contains police expenditure data for Mississippi counties. It is assumed that this data has been installed using libpysal.examples.load_example("Police")
.
A video recording is available from the GeoDa Center YouTube channel playlist Applied Spatial Regression - Notebooks, at https://www.youtube.com/watch?v=rZ1Mw-hZcMY&list=PLzREt6r1NenmhNy-FCUwiXL17Vyty5VL6&index=3.
Modules Needed¶
As before, the main modules are geopandas and libpysal. Specifically, libpysal.examples is used to get the path to the sample data. In addition, to save the maps to a file, matplotlib.pyplot is needed.
The full set of imports is shown below.
[44]:
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
import matplotlib.pyplot as plt
from libpysal.examples import get_path
Functions Used¶
from geopandas:
read_file
plot
from libpysal:
get_path
from matplotlib.pyplot:
savefig
Files¶
The mapping functionality will be illustrated with the same Police sample data set as used in the previous notebook. The following files will be used:
police.shp,shx,dbf,prj: shape file (four files) for 82 counties
[3]:
infileshp = "police.shp" # input shape file
inpath = get_path(infileshp)
dfs = gpd.read_file(inpath)
Getting Started¶
Default Map¶
Before delving into customization, the default choropleth map created by the plot
function applied to a GeoDataFrame is illustrated. A bare bones implementation only requires the variable (column) to be mapped and the argument legend = True
. Without the latter, there will still be a map, but it will not have a legend, so there will be no guide as to what the colors mean.
[ ]:
dfs.plot('POLICE',legend=True)
Not exactly the prettiest thing in the world. A continuous color ramp as seen here is not recommended by cartographers. Also, the classification is such that too many observations have seemingly the same color. Finally, there is also this strange mention of <Axes: >.
There are two important types of modifications that can be considered. One pertains to the fundamental characteristics of a choropleth map, the other to the way matplotlib constructs visualizations under the hood. The geopandas library relies on matplotlib so there is no need to import
the latter explicitly, except when one wants to save the maps to a file. In any case, it helps to understand the matplotlib logic. This is considered first.
Matplotlib Logic¶
The matplotlib library is extremely powerful and allows just about any type of customized visualization. It starts by setting up the basic parameters and then builds a graphic representation layer by layer. The terminology may seem a bit strange at first, but after a while, it becomes more familiar.
A plot is initialized by assigning some parameters to the tuple fig , ax
. It is important to realize that fig
is about the figure makeup and ax
is about the actual plots. For example, fig
is used to specify how many subplots there need to be, how they are arranged and what their size is. Since the examples used here and in later notebooks will only produce a single plot, the fig
aspect can be ignored and only ax
is needed. In fact, for simple plots such as the maps in our
applications, the specification of ax
as such is not needed and the plot
function can be applied directly to the GeoDataFrame. However, it remains important that the plot object is referred to as ax
in many operations.
An alternative way to set up the default map just shown is to explicitly assign it to an object ax
, as ax = dfs.plot( )
with the same arguments as before. To remove the x-y coordinates and box around the map, the method set_axis_off()
is applied to the ax
object. Using this setup also removes the <Axes: > listing. Otherwise, everything is still the same as before.
[ ]:
ax = dfs.plot('POLICE',legend = True)
ax.set_axis_off()
Note that the same result can be obtained without the explicit assignment to ax
by simply applying the method to the plot
object, as in the example below. Typically, the more explicit assignment is considered to be more readable, but it is mostly a matter of preference.
[ ]:
dfs.plot('POLICE',legend=True).set_axis_off()
Map Design Characteristics¶
The purpose of a choropleth or thematic map is to visualize the spatial distribution of a variable over areal units. Choropleth comes from the Greek choros, which stands for region, so it is a map for regions. For our purposes, the proper design of a map has three important characteristics, which each translate into arguments to the plot
function:
classification
color
legend
Classification¶
Arguably the most important characteristic is the classification used, i.e., how the continuous distribution of a given variable gets translated into a small number of discrete categories, or bins. This is exactly the same issue encountered in the design of histogram bins.
The assignment of observations to distinct bins is done by the mapclassify library, which is part of the PySAL family. However, this is done under the hood by geopandas so that no separate import
statement is needed.
The classification is set by means of the scheme
argument. Common options are Quantiles
(for a quantile map), EqualInterval
(for an equal intervals map), NaturalBreaks
(for a natural breaks map), StdMean
(for a standard deviational map), and BoxPlot
(for a box map). All but the last two classifications require an additional argument for the number of bins, k
. This is not needed for the standard deviational map and the box map, for which the breakpoints are derived
from the data, respectively the standard deviation and the quartiles/hinge.
The default hinge for the box map is 1.5 times the interquartile range. Other values for the hinge can be specified by setting a different value for the argument hinge
, but this is typically not necessary. However, to pass this to the geopandas plot
function it cannot just be set as hinge = 3.0
as in mapclassify. In geopandas it is necessary to pass this in a classification_kwds
dictionary, where the relevant parameters are set. For example, this would be
classification_kwds = {"hinge": 3.0}
for a hinge of 3 times the interquartile range.
The default for the standard deviational map is to show all observations within one standard deviation below and above the mean as one category. To separate observations below and above the mean can be accomplished by setting the argument anchor
to True
. Again, this is done by means of the classification_kwds
dictionary.
Full details on all the classifications available through mapclassify and their use in geopandas can be found at https://geopandas.org/en/stable/docs/user_guide/mapping.html# and https://pysal.org/mapclassify/api.html.
Each of the five cases is illustrated in turn. Note that the column
argument is used to designate the variable to be mapped.
The placement of the legend is managed by means of the legend_kwds
argument (similar to classification_kwds
). This is a dictionary that specifies aspects such as the location of the legend and how it is positioned relative to its anchor point. It also makes it possible to set a title
for the legend, e.g., to set it to the variable that is being mapped.
In the examples, the following arguments are used: legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": "<variable name>"}
. This is not totally intuitive, but it works. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html#matplotlib.axes.Axes.legend for details about the various legend customizations.
Also note that the map uses the default color map. More appropriate color maps will be considered below.
A simple six category quantile map is illustrated by setting scheme = "Quantiles"
and k=6
. The legend
arguments now also include a title
. In addition, two ax
methods are used for a minor customization: ax.set_title
to give the map a title and, as before, ax.set_axis_off
to get rid of the box with x-y coordinates.
[ ]:
ax = dfs.plot(
column = 'POLICE',
scheme = 'Quantiles',
k = 6,
legend = True,
legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title" : "Police"}
)
ax.set_title("Quantiles")
ax.set_axis_off()
Rather than repeating the single command for each type of map that needs the argument k
, a small loop is constructed that creates each in turn. This is accomplished by putting the name for the respective scheme
in a list and using that same name as the map title. The three types are Quantiles
, EqualInterval
and NaturalBreaks
.
[ ]:
schemek = ["Quantiles","EqualInterval","NaturalBreaks"]
for i in schemek:
ax = dfs.plot(
column = 'POLICE',
scheme = i,
k = 6,
legend = True,
legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": "Police"}
)
ax.set_title(i)
ax.set_axis_off()
Note the contrast in the visualization of the spatial distribution between the different classifications. It is important to keep in mind that each has pros and cons. For example, the quantile map yields an equal number of observations in each category, but the range of the categories can vary subtantially, resulting in the grouping of very disparate observations. In the example, this is the case for the top category, which ranges from 1,275 to 10,972.
On the other hand, the range in an equal intervals map is the same for all categories, but as a result some bins may have very few or very many observations, as is the case here for the lowest bin.
Finally, a natural breaks map uses an optimization criterion (essentially equivalent to k-means on one variable) to determine the grouping of observations. Both the number of observations in each bin and the range of the bins is variable.
The standard deviational map and box map have a pre-set number of bins, depending on, respectively, standard deviational units and quantiles/interquantile range. Again, they are illustrated using a small loop.
[ ]:
schemenok = ["StdMean","BoxPlot"]
for i in schemenok:
ax = dfs.plot(
column = 'POLICE',
scheme = i,
legend = True,
legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title(i)
ax.set_axis_off()
Both types of maps are designed to highlight outliers. In the standard deviational map, these are observations more than two standard deviations away from the mean, in the box map, the outliers are outside the hinge (1.5 times the interquartile range from the median). This can be customized by setting a different value for the hinge through the classification_kwds
argument. For example, selecting only the most extreme observations is achieved by setting
classification_kwds = {"hinge": 3.0}
, as illustrated below.
[ ]:
ax = dfs.plot(
column = 'POLICE',
scheme = 'BoxPlot',
k = 6,
classification_kwds = {'hinge': 3.0},
legend = True,
legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": "Police"}
)
ax.set_title("Box Map")
ax.set_axis_off()
A standard deviational map with the categories below and above the mean shown is implemented with classification_kwds = {"anchor" : True}
, as shown below.
[ ]:
ax = dfs.plot(
column = 'POLICE',
scheme = 'StdMean',
k = 6,
classification_kwds = {'anchor': True},
legend = True,
legend_kwds = {"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Standard Deviational Map")
ax.set_axis_off()
Whereas the first three types of classifications have a color scheme that suggests a progression from low to high values, a so-called sequential legend, the standard deviational map and box map focus on differences from a central value. This requires a color map that highlights the move away from the center, a so-called diverging legend. In the examples shown so far, the categories were shown with the default sequential color map, which is not appropriate. The needed customizations are considered next.
Color Map¶
The color scheme for the map is set by means of the cmap
argument. This refers to a matplotlib color map, i.e., a pre-determined range of colors optimized for a particular purpose. For example, this allows for a different color map to represent a sequential vs. a diverging legend.
The full range of color maps can be found at https://matplotlib.org/stable/users/explain/colors/colormaps.html.
For our purposes, a good sequential color map uses a gradation that goes from light to dark, either in the same color, such as cmap="Blues"
, or moving between colors, such as cmap="YlOrRd"
. For a diverging legend, going from one extreme color to another is preferred, e.g., dark blue to light blue and then to light red and dark red, as in cmap="bwr"
, or even more extreme, as in cmap="seismic"
.
Some examples are shown below.
[ ]:
ax = dfs.plot(
column = 'POLICE',
scheme = 'Quantiles',
k = 6,
cmap = 'Blues',
legend = True,
legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Quantiles")
ax.set_axis_off()
[ ]:
ax = dfs.plot(
column = 'POLICE',
scheme = 'Quantiles',
k = 6,
cmap = 'YlOrRd',
legend = True,
legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Quantiles")
ax.set_axis_off()
[ ]:
ax = dfs.plot(
column = 'POLICE',
scheme = 'BoxPlot',
cmap = 'seismic',
legend = True,
legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Box Map")
ax.set_axis_off()
But notice when this is applied to the standard deviational map with cmap = bwr
.
[ ]:
ax = dfs.plot(
column = 'POLICE',
scheme = 'StdMean',
cmap = 'bwr',
legend = True,
legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Standard Deviational Map")
ax.set_axis_off()
What happened? Many of the counties are invisible. The reason is that there is no borderline specified for the map. This final customization is considered next.
Final Customization¶
As mentioned, the full range of matplotlib customizations is available to manipulate legends, colors and placement. For our purposes, one more map-specific element is of interest. As seen in the previous examples, the border between polygons is not clear or even non-existent.
This can be fixed by setting the edgecolor
and associated linewidth
attributes. For example, with edgecolor = "Black"
, the standard deviational map becomes more meaningful.
[ ]:
ax = dfs.plot(
column = 'POLICE',
scheme = 'StdMean',
cmap = 'bwr',
edgecolor = "Black",
legend = True,
legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Standard Deviational Map")
ax.set_axis_off()
So far, the maps are generated in the notebook, but are not separately available. To save a specific map to a file, the matplotlib.pyplot.savefig
command is used. For example, to save the standard deviational map (or any other map) to a png format file, only the filename needs to be specified as an argument to plt.savefig
. Optionally, to get higher quality figures, the number of dots per inch can be set by means of dpi
.
This is illustrated for the standard deviational map where a more subtle border line is obtained by setting the thickness with linewidth = 0.2
. The quality is set to dpi = 600
.
The file will be in the current working directory.
[ ]:
ax = dfs.plot(
column = 'POLICE',
scheme = 'StdMean',
cmap = 'bwr',
edgecolor = "Black",
linewidth = 0.2,
legend = True,
legend_kwds={"loc":"center left","bbox_to_anchor":(1,0.5), "title": 'Police'}
)
ax.set_title("Standard Deviational Map")
ax.set_axis_off()
plt.savefig("police_stdmean.png",dpi=600)
Finally, a map with just the county borders is obtained with the boundary.plot
command, where the color of the border line is controlled by edgecolor
and the line thickness by linewidth
, as before.
[ ]:
ax = dfs.boundary.plot(
edgecolor = "Black",
linewidth = 0.2,
)
ax.set_title("Map of County Boundaries")
ax.set_axis_off()
Practice¶
Use your own data set or one of the PySAL sample data sets to load a spatial data frame and experiment with various map types, color schemes and other customizations. Save each map to a file for inclusion in papers, reports, etc.