This page was generated from notebooks/azp.ipynb. Interactive online version: Binder badge

Automatic Zoning Procedure (AZP) algorithm

Authors: Xin Feng, James Gaboardi

AZP can work with different types of objective functions, which are very sensitive to aggregating data from a large number of zones into a pre-designated smaller number of regions. AZP was originally formulated in Openshaw, 1977 and then extended in Openshaw, S. and Rao, L. (1995).

%config InlineBackend.figure_format = "retina"
%load_ext watermark
Last updated: 2023-12-10T13:26:50.972099-05:00

Python implementation: CPython
Python version       : 3.12.0
IPython version      : 8.18.0

Compiler    : Clang 15.0.7
OS          : Darwin
Release     : 23.1.0
Machine     : x86_64
Processor   : i386
CPU cores   : 8
Architecture: 64bit

import warnings

import geopandas
import libpysal
import spopt
from spopt.region import AZP

%matplotlib inline
%watermark -w
%watermark -iv
Watermark: 2.4.3

spopt    : 0.5.1.dev53+g5cadae7
libpysal : 4.9.2
geopandas: 0.14.1

Mexican State Regional Income Clustering

To illustrate azp we utilize data on regional incomes for Mexican states over the period 1940-2000, originally used in Rey and Sastré-Gutiérrez (2010).

We can first explore the data by plotting the per capital gross regional domestic product (in constant USD 2000 dollars) for each year in the sample, using a quintile classification:

pth = libpysal.examples.get_path("mexicojoin.shp")
mexico = geopandas.read_file(pth)
for year in range(1940, 2010, 10):
    base = mexico.plot(
        figsize=(8, 5),


First, we specify a number of parameters that will serve as input to the azp model.

The variables in the dataframe that will be used to measure regional dissimilarity:

attrs_name = [f"PCGDP{year}" for year in range(1950, 2010, 10)]
['PCGDP1950', 'PCGDP1960', 'PCGDP1970', 'PCGDP1980', 'PCGDP1990', 'PCGDP2000']

A spatial weights object expresses the spatial connectivity of the zones:

with warnings.catch_warnings(record=True):
    w = libpysal.weights.Queen.from_dataframe(mexico)

The number of regions that we would like to aggregate these zones into:

n_clusters = 5

There are four optional parameters. In this example, we only use the default settings, you can define them as needed.

allow_move_strategy: For a different behavior for allowing moves, an AllowMoveStrategy instance can be passed as argument.

class: AllowMoveStrategy or None, default: None

random_state: Random seed.

None, int, str, bytes, or bytearray, default: None

initial_labels: One-dimensional array of labels at the beginning of the algorithm.

class: numpy.ndarray or None, default: None
If None, then a random initial clustering will be generated.

objective_func: the objective function to use.

class: spopt.region.objective_function.ObjectiveFunction, default: ObjectiveFunctionPairwise()

The model can then be solved:

model = AZP(mexico, w, attrs_name, n_clusters)
mexico["azp_new"] = model.labels_
mexico["number"] = 1
mexico[["azp_new", "number"]].groupby(by="azp_new").count()
0.0 8
1.0 10
2.0 5
3.0 4
4.0 5
mexico.plot(figsize=(8, 5), column="azp_new", categorical=True, ec="w").axis("off");

The model solution results in five regions, two of which have five states, one with four, one with eight, and one with ten states.

Year-by-Year Regionalization (n_clusters = 5 regions)

for year in attrs_name:

    model = AZP(mexico, w, year, 5)
    lab = year + "labels_"
    mexico[lab] = model.labels_
    base = mexico.plot(figsize=(8, 5), column=lab, categorical=True, edgecolor="w")