{ "cells": [ { "cell_type": "markdown", "id": "6c2d40bf", "metadata": {}, "source": [ "# Basic Mapping\n", "\n", "### Luc Anselin\n", "\n", "### 09/06/2024" ] }, { "cell_type": "markdown", "id": "f411eb3f", "metadata": {}, "source": [ "## Preliminaries" ] }, { "cell_type": "markdown", "id": "3c764aed", "metadata": {}, "source": [ "There are many ways to create beautiful maps in Python using packages such as *folium* or *plotly*. In this notebook, the `plot` functionality of *geopandas* is illustrated, which is sufficient for most of our purposes. The functionality will be illustrated with the **Police** sample data set that contains police expenditure data for Mississippi counties. It is assumed that this data has been installed using `libpysal.examples.load_example(\"Police\")`.\n", "\n", "A video recording is available from the GeoDa Center YouTube channel playlist *Applied Spatial Regression - Notebooks*, at https://www.youtube.com/watch?v=rZ1Mw-hZcMY&list=PLzREt6r1NenmhNy-FCUwiXL17Vyty5VL6&index=3." ] }, { "cell_type": "markdown", "id": "936a0938", "metadata": {}, "source": [ "### Modules Needed\n", "\n", "As before, the main modules are *geopandas* and *libpysal*. Specifically, *libpysal.examples* is used to get the path to the sample data. In addition, to save the maps to a file, *matplotlib.pyplot* is needed.\n", "\n", "The full set of imports is shown below." ] }, { "cell_type": "code", "execution_count": 44, "id": "db02c49b", "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ['USE_PYGEOS'] = '0'\n", "import geopandas as gpd\n", "import matplotlib.pyplot as plt\n", "from libpysal.examples import get_path" ] }, { "cell_type": "markdown", "id": "203e4930", "metadata": {}, "source": [ "### Functions Used\n", "\n", "- from geopandas:\n", " - read_file\n", " - plot\n", "\n", "- from libpysal:\n", " - get_path\n", "\n", "- from matplotlib.pyplot:\n", " - savefig" ] }, { "cell_type": "markdown", "id": "2c3416ff", "metadata": {}, "source": [ "### Files\n", "\n", "The mapping functionality will be illustrated with the same **Police** sample data set as used in the previous notebook. The following files will be used:\n", "\n", "- **police.shp,shx,dbf,prj**: shape file (four files) for 82 counties\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "d82e2963", "metadata": {}, "outputs": [], "source": [ "infileshp = \"police.shp\" # input shape file\n", "inpath = get_path(infileshp)\n", "dfs = gpd.read_file(inpath)" ] }, { "cell_type": "markdown", "id": "2ac636ba", "metadata": {}, "source": [ "## Getting Started" ] }, { "cell_type": "markdown", "id": "7c30a057", "metadata": {}, "source": [ "### Default Map" ] }, { "cell_type": "markdown", "id": "52cf67ae", "metadata": {}, "source": [ "Before delving into customization, the default choropleth map created by the `plot` function applied to a **GeoDataFrame** is illustrated. A bare bones implementation only requires the variable (column) to be mapped and the argument `legend = True`. Without the latter, there will still be a map, but it will not have a legend, so there will be no guide as to what the colors mean." ] }, { "cell_type": "code", "execution_count": null, "id": "b2f412ce", "metadata": {}, "outputs": [], "source": [ "dfs.plot('POLICE',legend=True)" ] }, { "cell_type": "markdown", "id": "01ce3b94", "metadata": {}, "source": [ "Not exactly the prettiest thing in the world. A continuous color ramp as seen here is not recommended by cartographers. Also, the classification is such that too many observations have seemingly the same color. Finally, there is also this strange mention of ****.\n", "\n", "There are two important types of modifications that can be considered. One pertains to the fundamental characteristics of a choropleth map, the other to the way *matplotlib* constructs visualizations under the hood. The *geopandas* library relies on *matplotlib* so there is no need to `import` the latter explicitly, except when one wants to save the maps to a file. In any case, it helps to understand the *matplotlib* logic. This is considered first." ] }, { "cell_type": "markdown", "id": "c12c0184", "metadata": {}, "source": [ "### Matplotlib Logic" ] }, { "cell_type": "markdown", "id": "78e5979e", "metadata": {}, "source": [ "The *matplotlib* library is extremely powerful and allows just about any type of customized visualization. It starts by setting up the basic parameters and then builds a graphic representation layer by layer. The terminology may seem a bit strange at first, but after a while, it becomes more familiar.\n", "\n", "A plot is initialized by assigning some parameters to the tuple `fig , ax`. It is important to realize that `fig` is about the figure makeup and `ax` is about the actual plots. For example, `fig` is used to specify how many subplots there need to be, how they are arranged and what their size is. Since the examples used here and in later notebooks will only produce a single plot, the `fig` aspect can be ignored and only `ax` is needed. In fact, for simple plots such as the maps in our applications, the specification of `ax` as such is not needed and the `plot` function can be applied directly to the GeoDataFrame. However, it remains important that the plot object is referred to as `ax` in many operations." ] }, { "cell_type": "markdown", "id": "579ab0f0", "metadata": {}, "source": [ "An alternative way to set up the default map just shown is to explicitly assign it to an object `ax`, as `ax = dfs.plot( )` with the same arguments as before. To remove the x-y coordinates and box around the map, the method `set_axis_off()` is applied to the `ax` object. Using this setup also removes the **** listing. Otherwise, everything is still the same as before." ] }, { "cell_type": "code", "execution_count": null, "id": "d50f07df", "metadata": {}, "outputs": [], "source": [ "ax = dfs.plot('POLICE',legend = True)\n", "ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "0d881ef0", "metadata": {}, "source": [ "Note that the same result can be obtained without the explicit assignment to `ax` by simply applying the method to the `plot` object, as in the example below. Typically, the more explicit assignment is considered to be more readable, but it is mostly a matter of preference." ] }, { "cell_type": "code", "execution_count": null, "id": "09d54b24", "metadata": {}, "outputs": [], "source": [ "dfs.plot('POLICE',legend=True).set_axis_off()" ] }, { "cell_type": "markdown", "id": "392346c2", "metadata": {}, "source": [ "## Map Design Characteristics" ] }, { "cell_type": "markdown", "id": "a505e9bf", "metadata": {}, "source": [ "The purpose of a choropleth or thematic map is to visualize the spatial distribution of a variable over areal units. Choropleth comes from the Greek choros, which stands for region, so it is a map for regions. For our purposes, the proper design of a map has three important characteristics, which each translate into arguments to the `plot` function:\n", "\n", "- classification\n", "\n", "- color\n", "\n", "- legend" ] }, { "cell_type": "markdown", "id": "2c3619b7", "metadata": {}, "source": [ "### Classification" ] }, { "cell_type": "markdown", "id": "a3db6d84", "metadata": {}, "source": [ "Arguably the most important characteristic is the classification used, i.e., how the continuous distribution of a given variable gets translated into a small number of discrete categories, or bins. This is exactly the same issue encountered in the design of histogram bins.\n", "\n", "The assignment of observations to distinct bins is done by the *mapclassify* library, which is part of the *PySAL* family. However, this is done under the hood by *geopandas* so that no separate `import` statement is needed.\n", "\n", "The classification is set by means of the `scheme` argument. Common options are `Quantiles` (for a quantile map), `EqualInterval` (for an equal intervals map), `NaturalBreaks` (for a natural breaks map), `StdMean` (for a standard deviational map), and `BoxPlot` (for a box map). All but the last two classifications require an additional argument for the number of bins, `k`. This is not needed for the standard deviational map and the box map, for which the breakpoints are derived from the data, respectively the standard deviation and the quartiles/hinge.\n", "\n", "The default hinge for the box map is 1.5 times the interquartile range. Other values for the hinge can be specified by setting a different value for the argument `hinge`, but this is typically not necessary. However, to pass this to the *geopandas* `plot` function it cannot just be set as `hinge = 3.0` as in *mapclassify*. In *geopandas* it is necessary to pass this in a `classification_kwds` dictionary, where the relevant parameters are set. For example, this would be `classification_kwds = {\"hinge\": 3.0}` for a hinge of 3 times the interquartile range.\n", "\n", "The default for the standard deviational map is to show all observations within one standard deviation below and above the mean as one category. To separate observations below and above the mean can be accomplished by setting the argument `anchor` to `True`. Again, this is done by means of the `classification_kwds` dictionary.\n", "\n", "Full details on all the classifications available through *mapclassify* and their use in *geopandas* can be found at https://geopandas.org/en/stable/docs/user_guide/mapping.html# and https://pysal.org/mapclassify/api.html.\n", "\n", "Each of the five cases is illustrated in turn. Note that the `column` argument is used to designate the variable to be mapped.\n", "\n", "The placement of the legend is managed by means of the `legend_kwds` argument (similar to `classification_kwds`). This is a dictionary that specifies aspects such as the location of the legend and how it is positioned relative to its anchor point. It also makes it possible to set a `title` for the legend, e.g., to set it to the variable that is being mapped.\n", "\n", "In the examples, the following arguments are used: `legend_kwds={\"loc\":\"center left\",\"bbox_to_anchor\":(1,0.5), \"title\": \"\"}`. This is not totally intuitive, but it works. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html#matplotlib.axes.Axes.legend for details about the various legend customizations.\n", "\n", "Also note that the map uses the default color map. More appropriate color maps will be considered below." ] }, { "cell_type": "markdown", "id": "bcae4d0a", "metadata": {}, "source": [ "#### Quantile Map\n", "\n", "A simple six category quantile map is illustrated by setting `scheme = \"Quantiles\"` and `k=6`. The `legend` arguments now also include a `title`. In addition, two `ax` methods are used for a minor customization: `ax.set_title` to give the map a title and, as before, `ax.set_axis_off` to get rid of the box with x-y coordinates." ] }, { "cell_type": "code", "execution_count": null, "id": "a7724f4c", "metadata": {}, "outputs": [], "source": [ "ax = dfs.plot(\n", " column = 'POLICE',\n", " scheme = 'Quantiles',\n", " k = 6,\n", " legend = True,\n", " legend_kwds={\"loc\":\"center left\",\"bbox_to_anchor\":(1,0.5), \"title\" : \"Police\"}\n", ")\n", "ax.set_title(\"Quantiles\")\n", "ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "734148b7", "metadata": {}, "source": [ "#### Maps with Set Number of Bins" ] }, { "cell_type": "markdown", "id": "a96e9a4b", "metadata": {}, "source": [ "Rather than repeating the single command for each type of map that needs the argument `k`, a small loop is constructed that creates each in turn. This is accomplished by putting the name for the respective `scheme` in a list and using that same name as the map title. The three types are `Quantiles`, `EqualInterval` and `NaturalBreaks`." ] }, { "cell_type": "code", "execution_count": null, "id": "ca52a103", "metadata": {}, "outputs": [], "source": [ "schemek = [\"Quantiles\",\"EqualInterval\",\"NaturalBreaks\"]\n", "for i in schemek:\n", " ax = dfs.plot(\n", " column = 'POLICE',\n", " scheme = i,\n", " k = 6,\n", " legend = True,\n", " legend_kwds={\"loc\":\"center left\",\"bbox_to_anchor\":(1,0.5), \"title\": \"Police\"}\n", " )\n", " ax.set_title(i)\n", " ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "6fbc2acd", "metadata": {}, "source": [ "Note the contrast in the visualization of the spatial distribution between the different classifications. It is important to keep in mind that each has pros and cons. For example, the quantile map yields an equal number of observations in each category, but the range of the categories can vary subtantially, resulting in the grouping of very disparate observations. In the example, this is the case for the top category, which ranges from 1,275 to 10,972.\n", "\n", "On the other hand, the range in an equal intervals map is the same for all categories, but as a result some bins may have very few or very many observations, as is the case here for the lowest bin.\n", "\n", "Finally, a natural breaks map uses an optimization criterion (essentially equivalent to k-means on one variable) to determine the grouping of observations. Both the number of observations in each bin and the range of the bins is variable." ] }, { "cell_type": "markdown", "id": "62839ace", "metadata": {}, "source": [ "#### Maps with Predetermined number of Bins\n", "\n", "The standard deviational map and box map have a pre-set number of bins, depending on, respectively, standard deviational units and quantiles/interquantile range. Again, they are illustrated using a small loop." ] }, { "cell_type": "code", "execution_count": null, "id": "3b56d7d8", "metadata": {}, "outputs": [], "source": [ "schemenok = [\"StdMean\",\"BoxPlot\"]\n", "for i in schemenok:\n", " ax = dfs.plot(\n", " column = 'POLICE',\n", " scheme = i,\n", " legend = True,\n", " legend_kwds={\"loc\":\"center left\",\"bbox_to_anchor\":(1,0.5), \"title\": 'Police'}\n", " )\n", " ax.set_title(i)\n", " ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "c2b0cd96", "metadata": {}, "source": [ "Both types of maps are designed to highlight outliers. In the standard deviational map, these are observations more than two standard deviations away from the mean, in the box map, the outliers are outside the hinge (1.5 times the interquartile range from the median). This can be customized by setting a different value for the hinge through the `classification_kwds` argument. For example, selecting only the most extreme observations is achieved by setting `classification_kwds = {\"hinge\": 3.0}`, as illustrated below." ] }, { "cell_type": "code", "execution_count": null, "id": "5de45eaa", "metadata": {}, "outputs": [], "source": [ "ax = dfs.plot(\n", " column = 'POLICE',\n", " scheme = 'BoxPlot',\n", " k = 6,\n", " classification_kwds = {'hinge': 3.0},\n", " legend = True,\n", " legend_kwds={\"loc\":\"center left\",\"bbox_to_anchor\":(1,0.5), \"title\": \"Police\"}\n", ")\n", "ax.set_title(\"Box Map\")\n", "ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "df595929", "metadata": {}, "source": [ "A standard deviational map with the categories below and above the mean shown is implemented with `classification_kwds = {\"anchor\" : True}`, as shown below." ] }, { "cell_type": "code", "execution_count": null, "id": "7e17b901", "metadata": {}, "outputs": [], "source": [ "ax = dfs.plot(\n", " column = 'POLICE',\n", " scheme = 'StdMean',\n", " k = 6,\n", " classification_kwds = {'anchor': True},\n", " legend = True,\n", " legend_kwds = {\"loc\":\"center left\",\"bbox_to_anchor\":(1,0.5), \"title\": 'Police'}\n", ")\n", "ax.set_title(\"Standard Deviational Map\")\n", "ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "0a4e8ede", "metadata": {}, "source": [ "Whereas the first three types of classifications have a color scheme that suggests a progression from low to high values, a so-called *sequential* legend, the standard deviational map and box map focus on differences from a central value. This requires a color map that highlights the move away from the center, a so-called *diverging* legend. In the examples shown so far, the categories were shown with the default sequential color map, which is not appropriate. The needed customizations are considered next." ] }, { "cell_type": "markdown", "id": "934e2cab", "metadata": {}, "source": [ "### Color Map" ] }, { "cell_type": "markdown", "id": "4ee73f79", "metadata": {}, "source": [ "The color scheme for the map is set by means of the `cmap` argument. This refers to a *matplotlib* color map, i.e., a pre-determined range of colors optimized for a particular purpose. For example, this allows for a different color map to represent a sequential vs. a diverging legend.\n", "\n", "The full range of color maps can be found at https://matplotlib.org/stable/users/explain/colors/colormaps.html.\n", "\n", "For our purposes, a good sequential color map uses a gradation that goes from light to dark, either in the same color, such as `cmap=\"Blues\"`, or moving between colors, such as `cmap=\"YlOrRd\"`. For a diverging legend, going from one extreme color to another is preferred, e.g., dark blue to light blue and then to light red and dark red, as in `cmap=\"bwr\"`, or even more extreme, as in `cmap=\"seismic\"`.\n", "\n", "Some examples are shown below." ] }, { "cell_type": "code", "execution_count": null, "id": "d4eaf9c1", "metadata": {}, "outputs": [], "source": [ "ax = dfs.plot(\n", " column = 'POLICE',\n", " scheme = 'Quantiles',\n", " k = 6,\n", " cmap = 'Blues',\n", " legend = True,\n", " legend_kwds={\"loc\":\"center left\",\"bbox_to_anchor\":(1,0.5), \"title\": 'Police'}\n", ")\n", "ax.set_title(\"Quantiles\")\n", "ax.set_axis_off()" ] }, { "cell_type": "code", "execution_count": null, "id": "e523a1f3", "metadata": {}, "outputs": [], "source": [ "ax = dfs.plot(\n", " column = 'POLICE',\n", " scheme = 'Quantiles',\n", " k = 6,\n", " cmap = 'YlOrRd',\n", " legend = True,\n", " legend_kwds={\"loc\":\"center left\",\"bbox_to_anchor\":(1,0.5), \"title\": 'Police'}\n", ")\n", "ax.set_title(\"Quantiles\")\n", "ax.set_axis_off()" ] }, { "cell_type": "code", "execution_count": null, "id": "bfd213e8", "metadata": {}, "outputs": [], "source": [ "ax = dfs.plot(\n", " column = 'POLICE',\n", " scheme = 'BoxPlot',\n", " cmap = 'seismic',\n", " legend = True,\n", " legend_kwds={\"loc\":\"center left\",\"bbox_to_anchor\":(1,0.5), \"title\": 'Police'}\n", ")\n", "ax.set_title(\"Box Map\")\n", "ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "bc43ce36", "metadata": {}, "source": [ "But notice when this is applied to the standard deviational map with `cmap = bwr`." ] }, { "cell_type": "code", "execution_count": null, "id": "c658bd19", "metadata": {}, "outputs": [], "source": [ "ax = dfs.plot(\n", " column = 'POLICE',\n", " scheme = 'StdMean',\n", " cmap = 'bwr',\n", " legend = True,\n", " legend_kwds={\"loc\":\"center left\",\"bbox_to_anchor\":(1,0.5), \"title\": 'Police'}\n", ")\n", "ax.set_title(\"Standard Deviational Map\")\n", "ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "75ec5a5a", "metadata": {}, "source": [ "What happened? Many of the counties are invisible. The reason is that there is no borderline specified for the map. This final customization is considered next." ] }, { "cell_type": "markdown", "id": "bb66c005", "metadata": {}, "source": [ "### Final Customization" ] }, { "cell_type": "markdown", "id": "da719cbc", "metadata": {}, "source": [ "As mentioned, the full range of *matplotlib* customizations is available to manipulate legends, colors and placement. For our purposes, one more map-specific element is of interest. As seen in the previous examples, the border between polygons is not clear or even non-existent. \n", "\n", "This can be fixed by setting the `edgecolor` and associated `linewidth` attributes. For example, with `edgecolor = \"Black\"`, the standard deviational map becomes more meaningful." ] }, { "cell_type": "code", "execution_count": null, "id": "a6392b64", "metadata": {}, "outputs": [], "source": [ "ax = dfs.plot(\n", " column = 'POLICE',\n", " scheme = 'StdMean',\n", " cmap = 'bwr',\n", " edgecolor = \"Black\",\n", " legend = True,\n", " legend_kwds={\"loc\":\"center left\",\"bbox_to_anchor\":(1,0.5), \"title\": 'Police'}\n", ")\n", "ax.set_title(\"Standard Deviational Map\")\n", "ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "0fd86ac4", "metadata": {}, "source": [ "#### Saving the Map to a File" ] }, { "cell_type": "markdown", "id": "2f317018", "metadata": {}, "source": [ "So far, the maps are generated in the notebook, but are not separately available. To save a specific map to a file, the `matplotlib.pyplot.savefig` command is used. For example, to save the standard deviational map (or any other map) to a png format file, only the filename needs to be specified as an argument to `plt.savefig`. Optionally, to get higher quality figures, the number of dots per inch can be set by means of `dpi`. \n", "\n", "This is illustrated for the standard deviational map where a more subtle border line is obtained by setting the thickness with `linewidth = 0.2`. The quality is set to `dpi = 600`.\n", "\n", "The file will be in the current working directory." ] }, { "cell_type": "code", "execution_count": null, "id": "61991782", "metadata": {}, "outputs": [], "source": [ "ax = dfs.plot(\n", " column = 'POLICE',\n", " scheme = 'StdMean',\n", " cmap = 'bwr',\n", " edgecolor = \"Black\",\n", " linewidth = 0.2,\n", " legend = True,\n", " legend_kwds={\"loc\":\"center left\",\"bbox_to_anchor\":(1,0.5), \"title\": 'Police'}\n", ")\n", "ax.set_title(\"Standard Deviational Map\")\n", "ax.set_axis_off()\n", "plt.savefig(\"police_stdmean.png\",dpi=600)" ] }, { "cell_type": "markdown", "id": "27a262bc", "metadata": {}, "source": [ "Finally, a map with just the county borders is obtained with the `boundary.plot` command, where the color of the border line is controlled by `edgecolor` and the line thickness by `linewidth`, as before." ] }, { "cell_type": "code", "execution_count": null, "id": "98d3c839", "metadata": {}, "outputs": [], "source": [ "ax = dfs.boundary.plot(\n", " edgecolor = \"Black\",\n", " linewidth = 0.2,\n", ")\n", "ax.set_title(\"Map of County Boundaries\")\n", "ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "1ad019ff", "metadata": {}, "source": [ "## Practice" ] }, { "cell_type": "markdown", "id": "72ca1664", "metadata": {}, "source": [ "Use your own data set or one of the PySAL sample data sets to load a spatial data frame and experiment with various map types, color schemes and other customizations. Save each map to a file for inclusion in papers, reports, etc." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 5 }