This page was generated from notebooks/01_maximum_breaks.ipynb.
Introduction to mapclassify
mapclassify implements a family of classification schemes for choropleth maps. Its focus is on the determination of the number of classes, and the assignment of observations to those classes. It is intended for use with upstream mapping and geovisualization packages (see geopandas and geoplot for examples) that handle the rendering of the maps.
In this notebook, the basic functionality of mapclassify is presented.
[1]:
import mapclassify as mc
mc.__version__
[1]:
'2.4.2+78.gc62d2d7.dirty'
Example data
mapclassify contains a built-in dataset of employment density for the 58 California counties.
[2]:
y = mc.load_example()
Basic Functionality
All classifiers in mapclassify have a common interface and afford similar functionality. We illustrate these using the MaximumBreaks classifier.
MaximumBreaks requires that the user specify the number of classes k. Given this, the classifier sorts the observations in ascending order and computes the difference between rank-adjacent values. The \(k-1\) largest of these rank-adjacent gaps define the class boundaries, with each boundary placed at the midpoint of its gap.
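The logic just described can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name maximum_breaks_sketch and the toy data are hypothetical, and the sketch assumes that each boundary sits at the midpoint of its gap and that the maximum value closes the last class.

```python
def maximum_breaks_sketch(values, k):
    """Illustrative only: upper class bounds from the k-1 largest gaps."""
    if k < 2:
        raise ValueError("k must be at least 2 for this sketch")
    xs = sorted(values)
    # Gap between each value and its successor in sorted order,
    # stored together with the lower value of the pair.
    gaps = [(hi - lo, lo) for lo, hi in zip(xs, xs[1:])]
    # Keep the k-1 widest gaps (tie handling here is simplistic).
    widest = sorted(gaps, reverse=True)[: k - 1]
    # Assumption: each boundary sits halfway across its gap.
    cuts = [lo + gap / 2.0 for gap, lo in widest]
    # The maximum value closes the last class.
    return sorted(cuts) + [xs[-1]]


toy = [1.0, 2.0, 3.0, 10.0, 11.0, 30.0]  # hypothetical data
toy_bins = maximum_breaks_sketch(toy, k=3)
print(toy_bins)  # [6.5, 20.5, 30.0]
```

In the toy data the two widest gaps lie between 11 and 30 and between 3 and 10, so the two interior boundaries fall at 20.5 and 6.5.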
[3]:
mc.MaximumBreaks(y, k=4)
[3]:
MaximumBreaks
Interval Count
--------------------------
[ 0.13, 228.49] | 52
( 228.49, 546.67] | 4
( 546.67, 2417.15] | 1
(2417.15, 4111.45] | 1
The classifier returns an instance of MaximumBreaks that reports the resulting intervals and counts. The first class has closed lower and upper bounds:
[ 0.13, 228.49]
with 0.13 being the minimum value in the dataset:
[4]:
y.min()
[4]:
0.13
Subsequent intervals are open on the lower bound and closed on the upper bound. The fourth class has the maximum value as its closed upper bound:
[5]:
y.max()
[5]:
4111.45
Assigning the classifier to a variable lets us inspect other aspects of the classifier:
[6]:
mb4 = mc.MaximumBreaks(y, k=4)
mb4
[6]:
MaximumBreaks
Interval Count
--------------------------
[ 0.13, 228.49] | 52
( 228.49, 546.67] | 4
( 546.67, 2417.15] | 1
(2417.15, 4111.45] | 1
The bins attribute holds the upper bounds of the intervals:
[7]:
mb4.bins
[7]:
array([ 228.49 , 546.675, 2417.15 , 4111.45 ])
The counts attribute reports the number of values falling in each bin:
[8]:
mb4.counts
[8]:
array([52, 4, 1, 1])
The specific bin (i.e. label) for each observation can be found in the yb attribute:
[9]:
mb4.yb
[9]:
array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 1, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
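The relationship between the upper bounds, the per-observation labels, and the per-class counts can be sketched with the standard library alone. The data and variable names below are hypothetical, and the sketch only mirrors the interval convention described above (closed on the upper bound); it makes no claim about how mapclassify computes these attributes internally.

```python
from bisect import bisect_left
from collections import Counter

toy = [1.0, 2.0, 3.0, 10.0, 11.0, 30.0]  # hypothetical observations
toy_bins = [6.5, 20.5, 30.0]             # hypothetical upper bounds

# bisect_left returns the index of the first bound that is >= the value,
# which matches intervals that are closed on the upper bound.
labels = [bisect_left(toy_bins, v) for v in toy]

# Count how many observations received each label, keeping empty classes.
tally = Counter(labels)
counts = [tally[i] for i in range(len(toy_bins))]

print(labels)  # [0, 0, 0, 1, 1, 2]
print(counts)  # [3, 2, 1]
```

A value that coincides exactly with an upper bound, such as 6.5 here, would be assigned to the class that the bound closes rather than to the next one.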
Changing the number of classes
Staying with the same classifier, the user can apply the same classification rule, but for a different number of classes:
[10]:
mb7 = mc.MaximumBreaks(y, k=7)
mb7
[10]:
MaximumBreaks
Interval Count
--------------------------
[ 0.13, 146.00] | 50
( 146.00, 228.49] | 2
( 228.49, 291.02] | 1
( 291.02, 350.21] | 2
( 350.21, 546.67] | 1
( 546.67, 2417.15] | 1
(2417.15, 4111.45] | 1
[11]:
mb7.bins
[11]:
array([ 146.005, 228.49 , 291.02 , 350.21 , 546.675, 2417.15 ,
4111.45 ])
[12]:
mb7.counts
[12]:
array([50, 2, 1, 2, 1, 1, 1])
[13]:
mb7.yb
[13]:
array([3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 1, 0, 0, 0, 6, 0, 0, 3, 0, 2, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
One additional attribute to mention here is the adcm attribute:
[14]:
mb7.adcm
[14]:
727.3200000000002
adcm is a measure of fit, defined as the sum of the absolute deviations of each observation around its class median.
[15]:
mb4.adcm
[15]:
1181.4900000000002
The adcm can be expected to decrease as \(k\) increases for a given classifier. Thus, when used as a measure of fit, the adcm should only be used to compare classifiers defined on the same number of classes.
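The effect of \(k\) on this measure can be illustrated with a small sketch. The function adcm_sketch, the toy data, and the two label assignments are hypothetical. The sketch assumes that absolute deviations around each class median are summed over all observations, so it is an approximation of the idea, not a copy of the library's code.

```python
from statistics import median


def adcm_sketch(values, labels):
    """Illustrative only: summed absolute deviation around class medians."""
    # Group the observations by their class label.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    # Assumption: deviations are summed across classes, not averaged.
    total = 0.0
    for members in groups.values():
        center = median(members)
        total += sum(abs(v - center) for v in members)
    return total


toy = [1.0, 2.0, 3.0, 10.0, 11.0, 30.0]  # hypothetical data
labels_k2 = [0, 0, 0, 0, 0, 1]           # hypothetical two-class assignment
labels_k3 = [0, 0, 0, 1, 1, 2]           # hypothetical three-class assignment

fit_k2 = adcm_sketch(toy, labels_k2)
fit_k3 = adcm_sketch(toy, labels_k3)
print(fit_k2, fit_k3)  # 18.0 3.0
```

Splitting the first class lowers the total from 18.0 to 3.0 simply because there are more classes, which is why comparisons across different values of \(k\) are not informative.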
Next Steps
MaximumBreaks is but one of many classifiers in mapclassify:
[16]:
mc.classifiers.CLASSIFIERS
[16]:
('BoxPlot',
'EqualInterval',
'FisherJenks',
'FisherJenksSampled',
'HeadTailBreaks',
'JenksCaspall',
'JenksCaspallForced',
'JenksCaspallSampled',
'MaxP',
'MaximumBreaks',
'NaturalBreaks',
'Quantiles',
'Percentiles',
'StdMean',
'UserDefined')
To learn more about an individual classifier, introspection is available:
[17]:
mc.MaximumBreaks?
For more comprehensive applications of mapclassify, the interested reader is directed to the chapter on choropleth mapping in Rey, Arribas-Bel, and Wolf (2020) “Geographic Data Science with PySAL and the PyData Stack”.