Introduction to mapclassify

mapclassify implements a family of classification schemes for choropleth maps. It focuses on determining the number of classes and on assigning observations to those classes. It is intended for use with upstream mapping and geovisualization packages (see geopandas and geoplot for examples) that handle the rendering of the maps.

In this notebook, the basic functionality of mapclassify is presented.

[28]:
import mapclassify as mc
mc.__version__
[28]:
'2.3.0'

Example data

mapclassify contains a built-in dataset for employment density for the 58 California counties.

[29]:
y = mc.load_example()

Basic Functionality

All classifiers in mapclassify share a common interface and offer similar functionality. We illustrate these using the MaximumBreaks classifier. MaximumBreaks requires that the user specify the number of classes k. Given k, the classifier sorts the observations in ascending order and computes the difference between each pair of rank-adjacent values. The class boundaries are placed at the \(k-1\) largest of these rank-adjacent differences.
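The logic just described can be sketched in plain Python. This is a minimal illustration, not mapclassify's implementation: the function name `maximum_breaks_sketch` and the toy data are hypothetical, and placing each boundary at the midpoint of its gap, with the maximum value as the last upper bound, is an assumption of the sketch.

```python
def maximum_breaks_sketch(values, k):
    """Return k upper bounds: the k-1 widest gaps, cut at their midpoints, plus the max."""
    xs = sorted(values)
    # Pair each rank-adjacent difference with the index of its lower neighbour.
    gaps = [(xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)]
    # Keep the k-1 largest gaps.
    largest = sorted(gaps, reverse=True)[: k - 1]
    # Assumption: each class boundary sits halfway across its gap.
    breaks = [xs[i] + gap / 2.0 for gap, i in largest]
    return sorted(breaks) + [xs[-1]]


toy = [1, 2, 3, 10, 11, 30]          # made-up values, not the example dataset
toy_bins = maximum_breaks_sketch(toy, k=3)
print(toy_bins)                      # [6.5, 20.5, 30]
```

The two widest gaps in the toy data are 11 to 30 and 3 to 10, so the sketch cuts at 20.5 and 6.5.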

[30]:
mc.MaximumBreaks(y, k=4)
[30]:
MaximumBreaks

     Interval        Count
--------------------------
[   0.13,  228.49] |    52
( 228.49,  546.67] |     4
( 546.67, 2417.15] |     1
(2417.15, 4111.45] |     1

The classifier returns an instance of MaximumBreaks that reports the resulting intervals and counts. The first class has closed lower and upper bounds: [   0.13,  228.49], with 0.13 being the minimum value in the dataset:

[31]:
y.min()
[31]:
0.13

Subsequent intervals are open on the lower bound and closed on the upper bound. The fourth class has the maximum value as its closed upper bound:

[32]:
y.max()
[32]:
4111.45
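The interval convention can be made concrete with a small formatter that builds the labels from the minimum value and a list of upper bounds. This is only a sketch: `interval_labels` is a hypothetical helper, not part of mapclassify, and the bounds used are made-up toy values.

```python
def interval_labels(minimum, upper_bounds):
    """Format intervals: the first is closed on both ends, the rest are open on the left."""
    labels = []
    lower = minimum
    for i, upper in enumerate(upper_bounds):
        left = "[" if i == 0 else "("   # only the first class includes its lower bound
        labels.append(f"{left}{lower:.2f}, {upper:.2f}]")
        lower = upper                   # the next class starts where this one ends
    return labels


toy_labels = interval_labels(1, [6.5, 20.5, 30])
for label in toy_labels:
    print(label)
```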

Assigning the classifier to an object lets us inspect other aspects of the classifier:

[33]:
mb4 = mc.MaximumBreaks(y, k=4)
[34]:
mb4
[34]:
MaximumBreaks

     Interval        Count
--------------------------
[   0.13,  228.49] |    52
( 228.49,  546.67] |     4
( 546.67, 2417.15] |     1
(2417.15, 4111.45] |     1

The bins attribute has the upper bounds of the intervals:

[35]:
mb4.bins
[35]:
array([ 228.49 ,  546.675, 2417.15 , 4111.45 ])

and counts reports the number of values falling in each bin:

[36]:
mb4.counts
[36]:
array([52,  4,  1,  1])

The specific bin (i.e. label) for each observation can be found in the yb attribute:

[37]:
mb4.yb
[37]:
array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 1, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
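How bins, yb, and counts relate to one another can be sketched with the standard library. The upper bounds below are copied from the mb4.bins output above. The sample values and the helper name `assign_bins` are hypothetical, and the lookup rule (first bin whose upper bound is greater than or equal to the value) is an assumption that matches the closed upper bounds described earlier, not a copy of the library's code.

```python
from bisect import bisect_left
from collections import Counter


def assign_bins(values, upper_bounds):
    """Label each value with the index of the first bin whose upper bound is >= the value."""
    # Note: a value above the last bound would get index len(upper_bounds).
    return [bisect_left(upper_bounds, v) for v in values]


bins = [228.49, 546.675, 2417.15, 4111.45]                # from mb4.bins above
sample = [0.13, 228.49, 300.0, 546.675, 1000.0, 4111.45]  # made-up values

labels = assign_bins(sample, bins)
tally = Counter(labels)
counts = [tally.get(i, 0) for i in range(len(bins))]

print(labels)   # [0, 0, 1, 1, 2, 3]
print(counts)   # [2, 2, 1, 1]
```

A value equal to an upper bound, such as 228.49, stays in the lower class because the intervals are closed on the right.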

Changing the number of classes

Staying with the same classifier, the user can apply the same classification rule with a different number of classes:

[47]:
mc.MaximumBreaks(y, k=7)
[47]:
MaximumBreaks

     Interval        Count
--------------------------
[   0.13,  146.00] |    50
( 146.00,  228.49] |     2
( 228.49,  291.02] |     1
( 291.02,  350.21] |     2
( 350.21,  546.67] |     1
( 546.67, 2417.15] |     1
(2417.15, 4111.45] |     1
[48]:
mb7 = mc.MaximumBreaks(y, k=7)
[49]:
mb7.bins
[49]:
array([ 146.005,  228.49 ,  291.02 ,  350.21 ,  546.675, 2417.15 ,
       4111.45 ])
[50]:
mb7.counts
[50]:
array([50,  2,  1,  2,  1,  1,  1])
[51]:
mb7.yb
[51]:
array([3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 1, 0, 0, 0, 6, 0, 0, 3, 0, 2, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

One additional attribute to mention here is the adcm attribute:

[43]:
mb7.adcm
[43]:
727.3200000000002

adcm is a measure of fit, defined as the absolute deviation around the class median: the absolute deviations of each observation from the median of its class, summed over all classes.

[44]:
mb4.adcm
[44]:
1181.4900000000002

The adcm can be expected to decrease as \(k\) increases for a given classifier. Thus, when used as a measure of fit, the adcm should only be used to compare classifiers defined on the same number of classes.
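The following sketch shows one way such a measure can be computed and why it shrinks as classes are added. It assumes the absolute deviations are summed (not averaged) within each class and then across classes. The function name `adcm_sketch`, the toy values, and the class labels are all hypothetical.

```python
from statistics import median


def adcm_sketch(values, labels):
    """Sum of absolute deviations of each value from the median of its class."""
    classes = {}
    for value, label in zip(values, labels):
        classes.setdefault(label, []).append(value)
    total = 0.0
    for members in classes.values():
        class_median = median(members)
        total += sum(abs(v - class_median) for v in members)
    return total


values = [1, 2, 3, 10, 11, 30]                           # made-up values
two_classes = adcm_sketch(values, [0, 0, 0, 0, 0, 1])    # one break, at the widest gap
three_classes = adcm_sketch(values, [0, 0, 0, 1, 1, 2])  # two breaks

print(two_classes)    # 18.0
print(three_classes)  # 3.0
```

Splitting the first class in two brings each observation closer to its class median, so the total drops from 18.0 to 3.0 even though the data are unchanged. This is why the comparison is only meaningful at a fixed k.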

Next Steps

MaximumBreaks is but one of many classifiers in mapclassify:

[53]:
mc.classifiers.CLASSIFIERS
[53]:
('BoxPlot',
 'EqualInterval',
 'FisherJenks',
 'FisherJenksSampled',
 'HeadTailBreaks',
 'JenksCaspall',
 'JenksCaspallForced',
 'JenksCaspallSampled',
 'MaxP',
 'MaximumBreaks',
 'NaturalBreaks',
 'Quantiles',
 'Percentiles',
 'StdMean',
 'UserDefined')

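To learn more about an individual classifier, introspection is available. The ? suffix shown here works in IPython and Jupyter; in a plain Python session, help(mc.MaximumBreaks) displays the same docstring: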

[54]:
mc.MaximumBreaks?
Init signature: mc.MaximumBreaks(y, k=5, mindiff=0)
Docstring:
Maximum Breaks Map Classification

Parameters
----------
y  : array
     (n, 1), values to classify

k  : int
     number of classes required

mindiff : float
          The minimum difference between class breaks

Attributes
----------
yb : array
     (n, 1), bin ids for observations
bins : array
       (k, 1), the upper bounds of each class
k    : int
       the number of classes
counts : array
         (k, 1), the number of observations falling in each class (numpy
         array k x 1)

Examples
--------
>>> import mapclassify as mc
>>> cal = mc.load_example()
>>> mb = mc.MaximumBreaks(cal, k = 5)
>>> mb.k
5
>>> mb.bins
array([ 146.005,  228.49 ,  546.675, 2417.15 , 4111.45 ])
>>> mb.counts
array([50,  2,  4,  1,  1])
File:           ~/Dropbox/p/pysal/src/subpackages/mapclassify/mapclassify/classifiers.py
Type:           type
Subclasses:

For more comprehensive applications of mapclassify, the interested reader is directed to the chapter on choropleth mapping in Rey, Arribas-Bel, and Wolf (2020) “Geographic Data Science with PySAL and the PyData Stack”.