This page was generated from notebooks/04_pooled.ipynb.

Pooled Classification

A common workflow with longitudinal spatial data is to apply the same classification scheme to an attribute over different time periods. More specifically, one would like to keep the class breaks the same over each period and examine how the mass of the distribution changes over these classes in the different periods.

The Pooled classifier supports this workflow.

[1]:
import mapclassify
import numpy

mapclassify.__version__
[1]:
'2.4.2+78.gc62d2d7.dirty'

Sample Data

We construct a synthetic dataset composed of 20 cross-sectional units at three time points. Here the mean of the series is increasing over time.

[2]:
n = 20
data = numpy.array([numpy.arange(n) + i * n for i in range(1, 4)]).T
data.shape
[2]:
(20, 3)
[3]:
data
[3]:
array([[20, 40, 60],
       [21, 41, 61],
       [22, 42, 62],
       [23, 43, 63],
       [24, 44, 64],
       [25, 45, 65],
       [26, 46, 66],
       [27, 47, 67],
       [28, 48, 68],
       [29, 49, 69],
       [30, 50, 70],
       [31, 51, 71],
       [32, 52, 72],
       [33, 53, 73],
       [34, 54, 74],
       [35, 55, 75],
       [36, 56, 76],
       [37, 57, 77],
       [38, 58, 78],
       [39, 59, 79]])
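
As a quick check on the claim that the mean rises over time, the period means can be computed directly. The sketch below rebuilds the same matrix with plain Python lists (a choice made here only so the snippet runs without NumPy) and averages each period.

```python
from statistics import mean

n = 20
# Rows are cross-sectional units, columns are the three time periods,
# mirroring numpy.arange(n) + i * n for i in 1..3.
data = [[unit + period * n for period in range(1, 4)] for unit in range(n)]

# Transpose to get one sequence per period, then average each period.
period_means = [mean(column) for column in zip(*data)]
print(period_means)
```

Each period's mean is 20 higher than the previous one (29.5, 49.5, 69.5), which is what drives the shift across classes in the sections that follow.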

Default: Quintiles

By default, Pooled applies a vec operator to the data matrix, stacking its columns into a single collection of observations, and computes the quintiles (five quantile classes) of the pooled data. The resulting class breaks are then applied to each period.

[4]:
res = mapclassify.Pooled(data)
res
[4]:
Pooled Classifier

Pooled Quantiles

   Interval      Count
----------------------
[20.00, 31.80] |    12
(31.80, 43.60] |     8
(43.60, 55.40] |     0
(55.40, 67.20] |     0
(67.20, 79.00] |     0

Pooled Quantiles

   Interval      Count
----------------------
[20.00, 31.80] |     0
(31.80, 43.60] |     4
(43.60, 55.40] |    12
(55.40, 67.20] |     4
(67.20, 79.00] |     0

Pooled Quantiles

   Interval      Count
----------------------
[20.00, 31.80] |     0
(31.80, 43.60] |     0
(43.60, 55.40] |     0
(55.40, 67.20] |     8
(67.20, 79.00] |    12

Note that the class definitions are constant across the periods.
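
The pooling logic can be sketched with the standard library alone. This is an illustration under stated assumptions, not mapclassify's implementation: it assumes linearly interpolated quantiles (`statistics.quantiles` with `method="inclusive"`) and right-closed classes, and the helper name `class_counts` is hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles

n = 20
# One list per period: 20..39, 40..59, 60..79.
columns = [[unit + period * n for unit in range(n)] for period in range(1, 4)]

# "vec" step: stack all periods into one pooled collection.
pooled = [value for column in columns for value in column]

# Quintile cut points (linear interpolation), plus the pooled maximum
# as the upper bound of the last class.
bins = quantiles(pooled, n=5, method="inclusive") + [max(pooled)]


def class_counts(values, bins):
    """Count values per class, where class j covers (bins[j-1], bins[j]]."""
    counts = [0] * len(bins)
    for value in values:
        counts[bisect_left(bins, value)] += 1
    return counts


# The same bins are applied to every period.
counts_by_period = [class_counts(column, bins) for column in columns]
for counts in counts_by_period:
    print(counts)
```

With these assumptions the breaks and per-period counts agree with the tables above, and summing the counts over the three periods gives 12 observations in each pooled class.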

[5]:
res = mapclassify.Pooled(data, k=4)
[6]:
res.col_classifiers[0].counts
[6]:
array([15,  5,  0,  0])
[7]:
res.col_classifiers[-1].counts
[7]:
array([ 0,  0,  5, 15])
[8]:
res.global_classifier.counts
[8]:
array([15, 15, 15, 15])
[9]:
res
[9]:
Pooled Classifier

Pooled Quantiles

   Interval      Count
----------------------
[20.00, 34.75] |    15
(34.75, 49.50] |     5
(49.50, 64.25] |     0
(64.25, 79.00] |     0

Pooled Quantiles

   Interval      Count
----------------------
[20.00, 34.75] |     0
(34.75, 49.50] |    10
(49.50, 64.25] |    10
(64.25, 79.00] |     0

Pooled Quantiles

   Interval      Count
----------------------
[20.00, 34.75] |     0
(34.75, 49.50] |     0
(49.50, 64.25] |     5
(64.25, 79.00] |    15

Extract the pooled classification objects for each column.

[10]:
c0, c1, c2 = res.col_classifiers
c0
[10]:
Pooled Quantiles

   Interval      Count
----------------------
[20.00, 34.75] |    15
(34.75, 49.50] |     5
(49.50, 64.25] |     0
(64.25, 79.00] |     0

Compare to the unrestricted classifier for the first column…

[11]:
mapclassify.Quantiles(c0.y, k=4)
[11]:
Quantiles

   Interval      Count
----------------------
[20.00, 24.75] |     5
(24.75, 29.50] |     5
(29.50, 34.25] |     5
(34.25, 39.00] |     5

… and make the same comparison for the last column.

[12]:
c2
[12]:
Pooled Quantiles

   Interval      Count
----------------------
[20.00, 34.75] |     0
(34.75, 49.50] |     0
(49.50, 64.25] |     5
(64.25, 79.00] |    15
[13]:
mapclassify.Quantiles(c2.y, k=4)
[13]:
Quantiles

   Interval      Count
----------------------
[60.00, 64.75] |     5
(64.75, 69.50] |     5
(69.50, 74.25] |     5
(74.25, 79.00] |     5
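
The practical difference between the two approaches shows up in the class labels of individual units. The following sketch tracks the class index of the first and last unit in each period, once with pooled quartile breaks and once with breaks recomputed per period. It is a standard-library illustration with hypothetical helper names (`quartile_bins`, `labels_over_time`), assuming linearly interpolated quantiles and right-closed classes.

```python
from bisect import bisect_left
from statistics import quantiles

n = 20
columns = [[unit + period * n for unit in range(n)] for period in range(1, 4)]
pooled = [value for column in columns for value in column]


def quartile_bins(values):
    """Upper class bounds for four quantile classes (linear interpolation)."""
    return quantiles(values, n=4, method="inclusive") + [max(values)]


pooled_bins = quartile_bins(pooled)


def labels_over_time(unit, pooled_breaks):
    """Class index of one unit in each period (0 is the lowest class)."""
    labels = []
    for column in columns:
        bins = pooled_bins if pooled_breaks else quartile_bins(column)
        labels.append(bisect_left(bins, column[unit]))
    return labels


for unit in (0, n - 1):
    print(unit, labels_over_time(unit, True), labels_over_time(unit, False))
```

With per-period breaks each unit keeps the same label in every period, so the upward drift of the series is invisible. With pooled breaks the labels rise over time, which is the comparison the pooled classifier is designed to support.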

Non-default classifier: BoxPlot

[14]:
res = mapclassify.Pooled(data, classifier="BoxPlot", hinge=1.5)
res
[14]:
Pooled Classifier

Pooled BoxPlot

    Interval       Count
------------------------
(  -inf,  -9.50] |     0
( -9.50,  34.75] |    15
( 34.75,  49.50] |     5
( 49.50,  64.25] |     0
( 64.25, 108.50] |     0

Pooled BoxPlot

    Interval       Count
------------------------
(  -inf,  -9.50] |     0
( -9.50,  34.75] |     0
( 34.75,  49.50] |    10
( 49.50,  64.25] |    10
( 64.25, 108.50] |     0

Pooled BoxPlot

    Interval       Count
------------------------
(  -inf,  -9.50] |     0
( -9.50,  34.75] |     0
( 34.75,  49.50] |     0
( 49.50,  64.25] |     5
( 64.25, 108.50] |    15
[15]:
res.col_classifiers[0].bins
[15]:
array([ -9.5 ,  34.75,  49.5 ,  64.25, 108.5 ])
[16]:
c0, c1, c2 = res.col_classifiers
[17]:
c0.yb
[17]:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
[18]:
c00 = mapclassify.BoxPlot(c0.y, hinge=3)
[19]:
c00.yb
[19]:
array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4])
[20]:
c00
[20]:
BoxPlot

   Interval      Count
----------------------
( -inf, -3.75] |     0
(-3.75, 24.75] |     5
(24.75, 29.50] |     5
(29.50, 34.25] |     5
(34.25, 62.75] |     5
[21]:
c0
[21]:
Pooled BoxPlot

    Interval       Count
------------------------
(  -inf,  -9.50] |     0
( -9.50,  34.75] |    15
( 34.75,  49.50] |     5
( 49.50,  64.25] |     0
( 64.25, 108.50] |     0
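
The bins printed above are consistent with the usual box plot construction: the three quartiles, bracketed by fences placed `hinge` interquartile ranges below the first quartile and above the third. The sketch below reproduces them with the standard library. The function name `box_plot_bins` is hypothetical, and the sketch assumes no observation lies above the upper fence, so it does not handle an extra class for high outliers.

```python
from statistics import quantiles

n = 20
columns = [[unit + period * n for unit in range(n)] for period in range(1, 4)]
pooled = [value for column in columns for value in column]


def box_plot_bins(values, hinge=1.5):
    """Quartile-based bins: lower fence, Q1, median, Q3, upper fence."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [q1 - hinge * iqr, q1, median, q3, q3 + hinge * iqr]


# Pooled series with the hinge used for the pooled classifier.
pooled_box_bins = box_plot_bins(pooled, hinge=1.5)
# First period alone with the hinge used for the standalone comparison.
first_period_box_bins = box_plot_bins(columns[0], hinge=3)
print(pooled_box_bins)
print(first_period_box_bins)
```

Note that the standalone comparison in this section uses `hinge=3` while the pooled classifier uses `hinge=1.5`, so the outer fences differ for two reasons: the data being classified and the hinge value.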

Non-default classifier: FisherJenks

[22]:
res = mapclassify.Pooled(data, classifier="FisherJenks", k=5)
res
[22]:
Pooled Classifier

Pooled FisherJenks

   Interval      Count
----------------------
[20.00, 31.00] |    12
(31.00, 43.00] |     8
(43.00, 55.00] |     0
(55.00, 67.00] |     0
(67.00, 79.00] |     0

Pooled FisherJenks

   Interval      Count
----------------------
[20.00, 31.00] |     0
(31.00, 43.00] |     4
(43.00, 55.00] |    12
(55.00, 67.00] |     4
(67.00, 79.00] |     0

Pooled FisherJenks

   Interval      Count
----------------------
[20.00, 31.00] |     0
(31.00, 43.00] |     0
(43.00, 55.00] |     0
(55.00, 67.00] |     8
(67.00, 79.00] |    12
[23]:
c0, c1, c2 = res.col_classifiers
mapclassify.FisherJenks(c0.y, k=5)
[23]:
FisherJenks

   Interval      Count
----------------------
[20.00, 23.00] |     4
(23.00, 27.00] |     4
(27.00, 31.00] |     4
(31.00, 35.00] |     4
(35.00, 39.00] |     4

Non-default classifier: MaximumBreaks

[24]:
data[1, 0] = 10
data[1, 1] = 10
data[1, 2] = 10
data[9, 2] = 10
data
[24]:
array([[20, 40, 60],
       [10, 10, 10],
       [22, 42, 62],
       [23, 43, 63],
       [24, 44, 64],
       [25, 45, 65],
       [26, 46, 66],
       [27, 47, 67],
       [28, 48, 68],
       [29, 49, 10],
       [30, 50, 70],
       [31, 51, 71],
       [32, 52, 72],
       [33, 53, 73],
       [34, 54, 74],
       [35, 55, 75],
       [36, 56, 76],
       [37, 57, 77],
       [38, 58, 78],
       [39, 59, 79]])
[25]:
res = mapclassify.Pooled(data, classifier="MaximumBreaks", k=5)
res
[25]:
Pooled Classifier

Pooled MaximumBreaks

   Interval      Count
----------------------
[10.00, 15.00] |     1
(15.00, 21.00] |     1
(21.00, 41.00] |    18
(41.00, 61.00] |     0
(61.00, 79.00] |     0

Pooled MaximumBreaks

   Interval      Count
----------------------
[10.00, 15.00] |     1
(15.00, 21.00] |     0
(21.00, 41.00] |     1
(41.00, 61.00] |    18
(61.00, 79.00] |     0

Pooled MaximumBreaks

   Interval      Count
----------------------
[10.00, 15.00] |     2
(15.00, 21.00] |     0
(21.00, 41.00] |     0
(41.00, 61.00] |     1
(61.00, 79.00] |    17
[26]:
c0, c1, c2 = res.col_classifiers
c0
[26]:
Pooled MaximumBreaks

   Interval      Count
----------------------
[10.00, 15.00] |     1
(15.00, 21.00] |     1
(21.00, 41.00] |    18
(41.00, 61.00] |     0
(61.00, 79.00] |     0
[27]:
c0.y
[27]:
array([20, 10, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
       37, 38, 39])
[28]:
import warnings
[29]:
with warnings.catch_warnings():
    warnings.filterwarnings("error")
    try:
        mapclassify.MaximumBreaks(c0.y, k=5)
    except UserWarning as e:
        print(e)
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore")
            mapclassify.MaximumBreaks(c0.y, k=5)
Insufficient number of unique diffs. Breaks are random.
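
The pooled bins above coincide with the midpoints of the widest gaps between consecutive sorted values, and the warning for the single column can be read against the same idea. The sketch below is a standard-library illustration, not mapclassify's code: the helper names are hypothetical, and the rule that ties between equally wide gaps go to the lower midpoint is an assumption of this sketch.

```python
n = 20
columns = [[unit + period * n for unit in range(n)] for period in range(1, 4)]
# Apply the same edits as the notebook: unit 1 is set to 10 in every
# period, and unit 9 is set to 10 in the last period.
for column in columns:
    column[1] = 10
columns[2][9] = 10
pooled = [value for column in columns for value in column]


def gap_sizes_and_midpoints(values):
    """Gaps between consecutive distinct sorted values, with midpoints."""
    xs = sorted(set(values))
    return [(b - a, (a + b) / 2) for a, b in zip(xs, xs[1:])]


def maximum_breaks_bins(values, k):
    """Midpoints of the k - 1 widest gaps, plus the maximum as top bound.

    Ties between equally wide gaps go to the lower midpoint; that rule
    is an assumption of this sketch.
    """
    gaps = gap_sizes_and_midpoints(values)
    widest = sorted(gaps, key=lambda gap: (-gap[0], gap[1]))[: k - 1]
    return sorted(mid for _, mid in widest) + [max(values)]


pooled_mb_bins = maximum_breaks_bins(pooled, k=5)
print(pooled_mb_bins)

# The first period alone has only three distinct gap sizes, fewer than
# the k - 1 = 4 breaks requested.
distinct_gaps = sorted({size for size, _ in gap_sizes_and_midpoints(columns[0])})
print(distinct_gaps)
```

In the first period the distinct gap sizes are 1, 2 and 10, so a fourth break cannot be chosen by gap width alone. That is one plausible reading of the "Insufficient number of unique diffs" message.
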
[30]:
res = mapclassify.Pooled(
    data, classifier="UserDefined", bins=mapclassify.Quantiles(data[:, -1]).bins
)
res
[30]:
Pooled Classifier

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |    20
(62.80, 66.60] |     0
(66.60, 71.40] |     0
(71.40, 75.20] |     0
(75.20, 79.00] |     0

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |    20
(62.80, 66.60] |     0
(66.60, 71.40] |     0
(71.40, 75.20] |     0
(75.20, 79.00] |     0

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |     4
(62.80, 66.60] |     4
(66.60, 71.40] |     4
(71.40, 75.20] |     4
(75.20, 79.00] |     4
[31]:
mapclassify.Quantiles(data[:, -1])
[31]:
Quantiles

   Interval      Count
----------------------
[10.00, 62.80] |     4
(62.80, 66.60] |     4
(66.60, 71.40] |     4
(71.40, 75.20] |     4
(75.20, 79.00] |     4
[32]:
data[:, -1]
[32]:
array([60, 10, 62, 63, 64, 65, 66, 67, 68, 10, 70, 71, 72, 73, 74, 75, 76,
       77, 78, 79])

Pinning the pooling

Another option is to pin the class definitions to a particular subperiod, so that the breaks computed for that period are applied to all periods in the pooling.

Pinning to the last period

As an example, we can use the quintiles from the third period to define the pooled classifier:

[33]:
pinned = mapclassify.Pooled(
    data, classifier="UserDefined", bins=mapclassify.Quantiles(data[:, -1]).bins
)
pinned
[33]:
Pooled Classifier

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |    20
(62.80, 66.60] |     0
(66.60, 71.40] |     0
(71.40, 75.20] |     0
(75.20, 79.00] |     0

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |    20
(62.80, 66.60] |     0
(66.60, 71.40] |     0
(71.40, 75.20] |     0
(75.20, 79.00] |     0

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |     4
(62.80, 66.60] |     4
(66.60, 71.40] |     4
(71.40, 75.20] |     4
(75.20, 79.00] |     4
[34]:
pinned.global_classifier
[34]:
UserDefined

   Interval      Count
----------------------
[10.00, 62.80] |    44
(62.80, 66.60] |     4
(66.60, 71.40] |     4
(71.40, 75.20] |     4
(75.20, 79.00] |     4

Pinning to the first period

[35]:
pinned = mapclassify.Pooled(
    data, classifier="UserDefined", bins=mapclassify.Quantiles(data[:, 0]).bins
)
pinned
[35]:
Pooled Classifier

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 23.80] |     4
(23.80, 27.60] |     4
(27.60, 31.40] |     4
(31.40, 35.20] |     4
(35.20, 39.00] |     4
(39.00, 79.00] |     0

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 23.80] |     1
(23.80, 27.60] |     0
(27.60, 31.40] |     0
(31.40, 35.20] |     0
(35.20, 39.00] |     0
(39.00, 79.00] |    19

Pooled UserDefined

   Interval      Count
----------------------
[10.00, 23.80] |     2
(23.80, 27.60] |     0
(27.60, 31.40] |     0
(31.40, 35.20] |     0
(35.20, 39.00] |     0
(39.00, 79.00] |    18

Note that the quintiles for the first period, by definition, contain all the values from that period, but they do not bound the larger values in subsequent periods. Following the mapclassify policy, an additional class is added so that the classes contain all values in the pooled series.
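
The effect of pinning, including the extra class, can be sketched with the standard library. This is an illustration rather than mapclassify's implementation: `pinned_bins` and `class_counts` are hypothetical helpers, and the sketch assumes linearly interpolated quantiles, right-closed classes, and a catch-all class appended whenever the pooled maximum exceeds the top bin of the reference period.

```python
from bisect import bisect_left
from statistics import quantiles

n = 20
columns = [[unit + period * n for unit in range(n)] for period in range(1, 4)]
# Same edits as the notebook's modified data.
for column in columns:
    column[1] = 10
columns[2][9] = 10
pooled = [value for column in columns for value in column]


def pinned_bins(reference, pooled, k=5):
    """Quantile bins of the reference period, extended to cover the pool."""
    bins = quantiles(reference, n=k, method="inclusive") + [max(reference)]
    if max(pooled) > bins[-1]:
        # Add a catch-all class for values above the reference maximum.
        bins.append(max(pooled))
    return bins


def class_counts(values, bins):
    """Count values per class, where class j covers (bins[j-1], bins[j]]."""
    counts = [0] * len(bins)
    for value in values:
        counts[bisect_left(bins, value)] += 1
    return counts


first = pinned_bins(columns[0], pooled)
last = pinned_bins(columns[-1], pooled)
first_counts = [class_counts(column, first) for column in columns]
last_counts = [class_counts(column, last) for column in columns]
print(len(first), first_counts)
print(len(last), last_counts)
```

Pinning to the last period needs no extra class because its maximum (79) is also the pooled maximum, so five classes suffice. Pinning to the first period yields six classes, with the sixth absorbing nearly all observations from the later periods.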