{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pooled Classification\n", "\n", "A common workflow with longitudinal spatial data is to apply the same classification scheme to an attribute over different time periods. More specifically, one would like to keep the class breaks the same over each period and examine how the mass of the distribution changes over these classes in the different periods.\n", "\n", "The `Pooled` classifier supports this workflow." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:41.941529Z", "start_time": "2022-11-05T19:18:40.603589Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:26.436870Z", "iopub.status.busy": "2025-07-11T20:09:26.436634Z", "iopub.status.idle": "2025-07-11T20:09:27.889059Z", "shell.execute_reply": "2025-07-11T20:09:27.888847Z", "shell.execute_reply.started": "2025-07-11T20:09:26.436845Z" } }, "outputs": [ { "data": { "text/plain": [ "'2.9.1.dev9+gde74d6f.d20250614'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy\n", "\n", "import mapclassify\n", "\n", "mapclassify.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sample Data\n", "We construct a synthetic dataset composed of 20 cross-sectional units at three time points. Here the mean of the series is increasing over time." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:41.949728Z", "start_time": "2022-11-05T19:18:41.945010Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.889460Z", "iopub.status.busy": "2025-07-11T20:09:27.889336Z", "iopub.status.idle": "2025-07-11T20:09:27.891484Z", "shell.execute_reply": "2025-07-11T20:09:27.891297Z", "shell.execute_reply.started": "2025-07-11T20:09:27.889453Z" } }, "outputs": [ { "data": { "text/plain": [ "(20, 3)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n = 20\n", "data = numpy.array([numpy.arange(n) + i * n for i in range(1, 4)]).T\n", "data.shape" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:41.955945Z", "start_time": "2022-11-05T19:18:41.951635Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.891751Z", "iopub.status.busy": "2025-07-11T20:09:27.891690Z", "iopub.status.idle": "2025-07-11T20:09:27.893577Z", "shell.execute_reply": "2025-07-11T20:09:27.893379Z", "shell.execute_reply.started": "2025-07-11T20:09:27.891745Z" } }, "outputs": [ { "data": { "text/plain": [ "array([[20, 40, 60],\n", " [21, 41, 61],\n", " [22, 42, 62],\n", " [23, 43, 63],\n", " [24, 44, 64],\n", " [25, 45, 65],\n", " [26, 46, 66],\n", " [27, 47, 67],\n", " [28, 48, 68],\n", " [29, 49, 69],\n", " [30, 50, 70],\n", " [31, 51, 71],\n", " [32, 52, 72],\n", " [33, 53, 73],\n", " [34, 54, 74],\n", " [35, 55, 75],\n", " [36, 56, 76],\n", " [37, 57, 77],\n", " [38, 58, 78],\n", " [39, 59, 79]])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Default: Quintiles\n", "The default is to apply a [vec](https://en.wikipedia.org/wiki/Vectorization_(mathematics)) operator to the data matrix and treat the observations as a single collection. 
Here the quantiles of the pooled data are obtained." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:41.965023Z", "start_time": "2022-11-05T19:18:41.957991Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.893896Z", "iopub.status.busy": "2025-07-11T20:09:27.893838Z", "iopub.status.idle": "2025-07-11T20:09:27.896375Z", "shell.execute_reply": "2025-07-11T20:09:27.896222Z", "shell.execute_reply.started": "2025-07-11T20:09:27.893889Z" } }, "outputs": [ { "data": { "text/plain": [ "Pooled Classifier\n", "\n", "Pooled Quantiles\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 31.80] | 12\n", "(31.80, 43.60] | 8\n", "(43.60, 55.40] | 0\n", "(55.40, 67.20] | 0\n", "(67.20, 79.00] | 0\n", "\n", "Pooled Quantiles\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 31.80] | 0\n", "(31.80, 43.60] | 4\n", "(43.60, 55.40] | 12\n", "(55.40, 67.20] | 4\n", "(67.20, 79.00] | 0\n", "\n", "Pooled Quantiles\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 31.80] | 0\n", "(31.80, 43.60] | 0\n", "(43.60, 55.40] | 0\n", "(55.40, 67.20] | 8\n", "(67.20, 79.00] | 12" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res = mapclassify.Pooled(data)\n", "res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the class definitions are constant across the periods." 
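, "\n",
"\n",
"A minimal check of this property, offered as a sketch rather than a definitive test. It assumes that each entry of `col_classifiers` and the `global_classifier` expose their upper class bounds as `bins`, and it uses the illustrative name `pooled` so that `res` is left untouched. If the breaks are shared, every comparison should evaluate to true:\n",
"\n",
"```python\n",
"import numpy\n",
"import mapclassify\n",
"\n",
"# Rebuild the sample data so the snippet is self-contained.\n",
"n = 20\n",
"data = numpy.array([numpy.arange(n) + i * n for i in range(1, 4)]).T\n",
"pooled = mapclassify.Pooled(data)\n",
"\n",
"# Compare each period's break points with those of the pooled series.\n",
"for col in pooled.col_classifiers:\n",
"    print(numpy.allclose(col.bins, pooled.global_classifier.bins))\n",
"```"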
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:41.971895Z", "start_time": "2022-11-05T19:18:41.967042Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.896709Z", "iopub.status.busy": "2025-07-11T20:09:27.896648Z", "iopub.status.idle": "2025-07-11T20:09:27.898438Z", "shell.execute_reply": "2025-07-11T20:09:27.898270Z", "shell.execute_reply.started": "2025-07-11T20:09:27.896702Z" } }, "outputs": [], "source": [ "res = mapclassify.Pooled(data, k=4)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:41.978331Z", "start_time": "2022-11-05T19:18:41.974160Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.899550Z", "iopub.status.busy": "2025-07-11T20:09:27.899475Z", "iopub.status.idle": "2025-07-11T20:09:27.901214Z", "shell.execute_reply": "2025-07-11T20:09:27.901053Z", "shell.execute_reply.started": "2025-07-11T20:09:27.899543Z" } }, "outputs": [ { "data": { "text/plain": [ "array([15, 5, 0, 0])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res.col_classifiers[0].counts" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:41.984738Z", "start_time": "2022-11-05T19:18:41.980393Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.901580Z", "iopub.status.busy": "2025-07-11T20:09:27.901470Z", "iopub.status.idle": "2025-07-11T20:09:27.903176Z", "shell.execute_reply": "2025-07-11T20:09:27.903011Z", "shell.execute_reply.started": "2025-07-11T20:09:27.901574Z" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0, 0, 5, 15])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res.col_classifiers[-1].counts" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:41.990334Z", "start_time": 
"2022-11-05T19:18:41.986702Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.903491Z", "iopub.status.busy": "2025-07-11T20:09:27.903432Z", "iopub.status.idle": "2025-07-11T20:09:27.905163Z", "shell.execute_reply": "2025-07-11T20:09:27.904974Z", "shell.execute_reply.started": "2025-07-11T20:09:27.903484Z" } }, "outputs": [ { "data": { "text/plain": [ "array([15, 15, 15, 15])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res.global_classifier.counts" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.001119Z", "start_time": "2022-11-05T19:18:41.997311Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.905412Z", "iopub.status.busy": "2025-07-11T20:09:27.905355Z", "iopub.status.idle": "2025-07-11T20:09:27.907365Z", "shell.execute_reply": "2025-07-11T20:09:27.907172Z", "shell.execute_reply.started": "2025-07-11T20:09:27.905405Z" } }, "outputs": [ { "data": { "text/plain": [ "Pooled Classifier\n", "\n", "Pooled Quantiles\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 34.75] | 15\n", "(34.75, 49.50] | 5\n", "(49.50, 64.25] | 0\n", "(64.25, 79.00] | 0\n", "\n", "Pooled Quantiles\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 34.75] | 0\n", "(34.75, 49.50] | 10\n", "(49.50, 64.25] | 10\n", "(64.25, 79.00] | 0\n", "\n", "Pooled Quantiles\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 34.75] | 0\n", "(34.75, 49.50] | 0\n", "(49.50, 64.25] | 5\n", "(64.25, 79.00] | 15" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extract the pooled classification objects for each column." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.007476Z", "start_time": "2022-11-05T19:18:42.003714Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.907807Z", "iopub.status.busy": "2025-07-11T20:09:27.907737Z", "iopub.status.idle": "2025-07-11T20:09:27.910340Z", "shell.execute_reply": "2025-07-11T20:09:27.910143Z", "shell.execute_reply.started": "2025-07-11T20:09:27.907800Z" } }, "outputs": [ { "data": { "text/plain": [ "Pooled Quantiles\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 34.75] | 15\n", "(34.75, 49.50] | 5\n", "(49.50, 64.25] | 0\n", "(64.25, 79.00] | 0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c0, c1, c2 = res.col_classifiers\n", "c0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compare to the unrestricted classifier for the first column..." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.016243Z", "start_time": "2022-11-05T19:18:42.010510Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.910677Z", "iopub.status.busy": "2025-07-11T20:09:27.910620Z", "iopub.status.idle": "2025-07-11T20:09:27.912837Z", "shell.execute_reply": "2025-07-11T20:09:27.912653Z", "shell.execute_reply.started": "2025-07-11T20:09:27.910671Z" } }, "outputs": [ { "data": { "text/plain": [ "Quantiles\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 24.75] | 5\n", "(24.75, 29.50] | 5\n", "(29.50, 34.25] | 5\n", "(34.25, 39.00] | 5" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mapclassify.Quantiles(c0.y, k=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... and the same comparison for the last column." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.024230Z", "start_time": "2022-11-05T19:18:42.018980Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.913187Z", "iopub.status.busy": "2025-07-11T20:09:27.913134Z", "iopub.status.idle": "2025-07-11T20:09:27.915033Z", "shell.execute_reply": "2025-07-11T20:09:27.914840Z", "shell.execute_reply.started": "2025-07-11T20:09:27.913181Z" } }, "outputs": [ { "data": { "text/plain": [ "Pooled Quantiles\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 34.75] | 0\n", "(34.75, 49.50] | 0\n", "(49.50, 64.25] | 5\n", "(64.25, 79.00] | 15" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c2" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.034362Z", "start_time": "2022-11-05T19:18:42.027227Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.915302Z", "iopub.status.busy": "2025-07-11T20:09:27.915242Z", "iopub.status.idle": "2025-07-11T20:09:27.917452Z", "shell.execute_reply": "2025-07-11T20:09:27.917276Z", "shell.execute_reply.started": "2025-07-11T20:09:27.915296Z" } }, "outputs": [ { "data": { "text/plain": [ "Quantiles\n", "\n", " Interval Count\n", "----------------------\n", "[60.00, 64.75] | 5\n", "(64.75, 69.50] | 5\n", "(69.50, 74.25] | 5\n", "(74.25, 79.00] | 5" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mapclassify.Quantiles(c2.y, k=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Non-default classifier: BoxPlot" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.046090Z", "start_time": "2022-11-05T19:18:42.037414Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.917751Z", "iopub.status.busy": "2025-07-11T20:09:27.917691Z", "iopub.status.idle": 
"2025-07-11T20:09:27.920192Z", "shell.execute_reply": "2025-07-11T20:09:27.920005Z", "shell.execute_reply.started": "2025-07-11T20:09:27.917744Z" } }, "outputs": [ { "data": { "text/plain": [ "Pooled Classifier\n", "\n", "Pooled BoxPlot\n", "\n", " Interval Count\n", "------------------------\n", "( -inf, -9.50] | 0\n", "( -9.50, 34.75] | 15\n", "( 34.75, 49.50] | 5\n", "( 49.50, 64.25] | 0\n", "( 64.25, 108.50] | 0\n", "\n", "Pooled BoxPlot\n", "\n", " Interval Count\n", "------------------------\n", "( -inf, -9.50] | 0\n", "( -9.50, 34.75] | 0\n", "( 34.75, 49.50] | 10\n", "( 49.50, 64.25] | 10\n", "( 64.25, 108.50] | 0\n", "\n", "Pooled BoxPlot\n", "\n", " Interval Count\n", "------------------------\n", "( -inf, -9.50] | 0\n", "( -9.50, 34.75] | 0\n", "( 34.75, 49.50] | 0\n", "( 49.50, 64.25] | 5\n", "( 64.25, 108.50] | 15" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res = mapclassify.Pooled(data, classifier=\"BoxPlot\", hinge=1.5)\n", "res" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.054064Z", "start_time": "2022-11-05T19:18:42.048325Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.920544Z", "iopub.status.busy": "2025-07-11T20:09:27.920476Z", "iopub.status.idle": "2025-07-11T20:09:27.922486Z", "shell.execute_reply": "2025-07-11T20:09:27.922281Z", "shell.execute_reply.started": "2025-07-11T20:09:27.920537Z" } }, "outputs": [ { "data": { "text/plain": [ "array([ -9.5 , 34.75, 49.5 , 64.25, 108.5 ])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res.col_classifiers[0].bins" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.060428Z", "start_time": "2022-11-05T19:18:42.056889Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.922812Z", "iopub.status.busy": "2025-07-11T20:09:27.922756Z", "iopub.status.idle": 
"2025-07-11T20:09:27.924242Z", "shell.execute_reply": "2025-07-11T20:09:27.924055Z", "shell.execute_reply.started": "2025-07-11T20:09:27.922806Z" } }, "outputs": [], "source": [ "c0, c1, c2 = res.col_classifiers" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.067574Z", "start_time": "2022-11-05T19:18:42.062869Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.924636Z", "iopub.status.busy": "2025-07-11T20:09:27.924571Z", "iopub.status.idle": "2025-07-11T20:09:27.926389Z", "shell.execute_reply": "2025-07-11T20:09:27.926225Z", "shell.execute_reply.started": "2025-07-11T20:09:27.924630Z" } }, "outputs": [ { "data": { "text/plain": [ "array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c0.yb" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.076215Z", "start_time": "2022-11-05T19:18:42.070735Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.926767Z", "iopub.status.busy": "2025-07-11T20:09:27.926702Z", "iopub.status.idle": "2025-07-11T20:09:27.928352Z", "shell.execute_reply": "2025-07-11T20:09:27.928166Z", "shell.execute_reply.started": "2025-07-11T20:09:27.926761Z" } }, "outputs": [], "source": [ "c00 = mapclassify.BoxPlot(c0.y, hinge=3)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.085022Z", "start_time": "2022-11-05T19:18:42.078938Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.928735Z", "iopub.status.busy": "2025-07-11T20:09:27.928618Z", "iopub.status.idle": "2025-07-11T20:09:27.930507Z", "shell.execute_reply": "2025-07-11T20:09:27.930321Z", "shell.execute_reply.started": "2025-07-11T20:09:27.928728Z" } }, "outputs": [ { "data": { "text/plain": [ "array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 
4])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c00.yb" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.093521Z", "start_time": "2022-11-05T19:18:42.088235Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.930763Z", "iopub.status.busy": "2025-07-11T20:09:27.930695Z", "iopub.status.idle": "2025-07-11T20:09:27.933175Z", "shell.execute_reply": "2025-07-11T20:09:27.932986Z", "shell.execute_reply.started": "2025-07-11T20:09:27.930755Z" } }, "outputs": [ { "data": { "text/plain": [ "BoxPlot\n", "\n", " Interval Count\n", "----------------------\n", "( -inf, -3.75] | 0\n", "(-3.75, 24.75] | 5\n", "(24.75, 29.50] | 5\n", "(29.50, 34.25] | 5\n", "(34.25, 62.75] | 5" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c00" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.100363Z", "start_time": "2022-11-05T19:18:42.095608Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.935748Z", "iopub.status.busy": "2025-07-11T20:09:27.935658Z", "iopub.status.idle": "2025-07-11T20:09:27.938444Z", "shell.execute_reply": "2025-07-11T20:09:27.938219Z", "shell.execute_reply.started": "2025-07-11T20:09:27.935741Z" } }, "outputs": [ { "data": { "text/plain": [ "Pooled BoxPlot\n", "\n", " Interval Count\n", "------------------------\n", "( -inf, -9.50] | 0\n", "( -9.50, 34.75] | 15\n", "( 34.75, 49.50] | 5\n", "( 49.50, 64.25] | 0\n", "( 64.25, 108.50] | 0" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Non-default classifier: FisherJenks" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.111872Z", "start_time": "2022-11-05T19:18:42.103537Z" }, "execution": { 
"iopub.execute_input": "2025-07-11T20:09:27.938976Z", "iopub.status.busy": "2025-07-11T20:09:27.938786Z", "iopub.status.idle": "2025-07-11T20:09:27.941939Z", "shell.execute_reply": "2025-07-11T20:09:27.941747Z", "shell.execute_reply.started": "2025-07-11T20:09:27.938965Z" } }, "outputs": [ { "data": { "text/plain": [ "Pooled Classifier\n", "\n", "Pooled FisherJenks\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 31.00] | 12\n", "(31.00, 43.00] | 8\n", "(43.00, 55.00] | 0\n", "(55.00, 67.00] | 0\n", "(67.00, 79.00] | 0\n", "\n", "Pooled FisherJenks\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 31.00] | 0\n", "(31.00, 43.00] | 4\n", "(43.00, 55.00] | 12\n", "(55.00, 67.00] | 4\n", "(67.00, 79.00] | 0\n", "\n", "Pooled FisherJenks\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 31.00] | 0\n", "(31.00, 43.00] | 0\n", "(43.00, 55.00] | 0\n", "(55.00, 67.00] | 8\n", "(67.00, 79.00] | 12" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res = mapclassify.Pooled(data, classifier=\"FisherJenks\", k=5)\n", "res" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.119626Z", "start_time": "2022-11-05T19:18:42.113905Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.942365Z", "iopub.status.busy": "2025-07-11T20:09:27.942291Z", "iopub.status.idle": "2025-07-11T20:09:27.944501Z", "shell.execute_reply": "2025-07-11T20:09:27.944319Z", "shell.execute_reply.started": "2025-07-11T20:09:27.942358Z" } }, "outputs": [ { "data": { "text/plain": [ "FisherJenks\n", "\n", " Interval Count\n", "----------------------\n", "[20.00, 23.00] | 4\n", "(23.00, 27.00] | 4\n", "(27.00, 31.00] | 4\n", "(31.00, 35.00] | 4\n", "(35.00, 39.00] | 4" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c0, c1, c2 = res.col_classifiers\n", "mapclassify.FisherJenks(c0.y, k=5)" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "## Non-default classifier: MaximumBreaks\n", "\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.127177Z", "start_time": "2022-11-05T19:18:42.121621Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.944780Z", "iopub.status.busy": "2025-07-11T20:09:27.944718Z", "iopub.status.idle": "2025-07-11T20:09:27.946888Z", "shell.execute_reply": "2025-07-11T20:09:27.946706Z", "shell.execute_reply.started": "2025-07-11T20:09:27.944773Z" } }, "outputs": [ { "data": { "text/plain": [ "array([[20, 40, 60],\n", " [10, 10, 10],\n", " [22, 42, 62],\n", " [23, 43, 63],\n", " [24, 44, 64],\n", " [25, 45, 65],\n", " [26, 46, 66],\n", " [27, 47, 67],\n", " [28, 48, 68],\n", " [29, 49, 10],\n", " [30, 50, 70],\n", " [31, 51, 71],\n", " [32, 52, 72],\n", " [33, 53, 73],\n", " [34, 54, 74],\n", " [35, 55, 75],\n", " [36, 56, 76],\n", " [37, 57, 77],\n", " [38, 58, 78],\n", " [39, 59, 79]])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[1, 0] = 10\n", "data[1, 1] = 10\n", "data[1, 2] = 10\n", "data[9, 2] = 10\n", "data" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.136098Z", "start_time": "2022-11-05T19:18:42.128885Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.947183Z", "iopub.status.busy": "2025-07-11T20:09:27.947120Z", "iopub.status.idle": "2025-07-11T20:09:27.949682Z", "shell.execute_reply": "2025-07-11T20:09:27.949473Z", "shell.execute_reply.started": "2025-07-11T20:09:27.947177Z" } }, "outputs": [ { "data": { "text/plain": [ "Pooled Classifier\n", "\n", "Pooled MaximumBreaks\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 15.00] | 1\n", "(15.00, 21.00] | 1\n", "(21.00, 41.00] | 18\n", "(41.00, 61.00] | 0\n", "(61.00, 79.00] | 0\n", "\n", "Pooled MaximumBreaks\n", "\n", " Interval Count\n", 
"----------------------\n", "[10.00, 15.00] | 1\n", "(15.00, 21.00] | 0\n", "(21.00, 41.00] | 1\n", "(41.00, 61.00] | 18\n", "(61.00, 79.00] | 0\n", "\n", "Pooled MaximumBreaks\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 15.00] | 2\n", "(15.00, 21.00] | 0\n", "(21.00, 41.00] | 0\n", "(41.00, 61.00] | 1\n", "(61.00, 79.00] | 17" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res = mapclassify.Pooled(data, classifier=\"MaximumBreaks\", k=5)\n", "res" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.143133Z", "start_time": "2022-11-05T19:18:42.138961Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.950163Z", "iopub.status.busy": "2025-07-11T20:09:27.950044Z", "iopub.status.idle": "2025-07-11T20:09:27.951905Z", "shell.execute_reply": "2025-07-11T20:09:27.951722Z", "shell.execute_reply.started": "2025-07-11T20:09:27.950156Z" } }, "outputs": [ { "data": { "text/plain": [ "Pooled MaximumBreaks\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 15.00] | 1\n", "(15.00, 21.00] | 1\n", "(21.00, 41.00] | 18\n", "(41.00, 61.00] | 0\n", "(61.00, 79.00] | 0" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c0, c1, c2 = res.col_classifiers\n", "c0" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.149335Z", "start_time": "2022-11-05T19:18:42.144915Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.952286Z", "iopub.status.busy": "2025-07-11T20:09:27.952216Z", "iopub.status.idle": "2025-07-11T20:09:27.953962Z", "shell.execute_reply": "2025-07-11T20:09:27.953796Z", "shell.execute_reply.started": "2025-07-11T20:09:27.952279Z" } }, "outputs": [ { "data": { "text/plain": [ "array([20, 10, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,\n", " 37, 38, 39])" ] }, "execution_count": 27, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "c0.y" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.154525Z", "start_time": "2022-11-05T19:18:42.151983Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.954242Z", "iopub.status.busy": "2025-07-11T20:09:27.954185Z", "iopub.status.idle": "2025-07-11T20:09:27.955631Z", "shell.execute_reply": "2025-07-11T20:09:27.955443Z", "shell.execute_reply.started": "2025-07-11T20:09:27.954235Z" } }, "outputs": [], "source": [ "import warnings" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.163717Z", "start_time": "2022-11-05T19:18:42.156794Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.955900Z", "iopub.status.busy": "2025-07-11T20:09:27.955844Z", "iopub.status.idle": "2025-07-11T20:09:27.958018Z", "shell.execute_reply": "2025-07-11T20:09:27.957842Z", "shell.execute_reply.started": "2025-07-11T20:09:27.955893Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Insufficient number of unique diffs. 
Breaks are random.\n" ] } ], "source": [ "with warnings.catch_warnings():\n", "    warnings.filterwarnings(\"error\")\n", "    try:\n", "        mapclassify.MaximumBreaks(c0.y, k=5)\n", "    except UserWarning as e:\n", "        print(e)\n", "        with warnings.catch_warnings():\n", "            warnings.filterwarnings(\"ignore\")\n", "            mapclassify.MaximumBreaks(c0.y, k=5)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.174401Z", "start_time": "2022-11-05T19:18:42.166025Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.958377Z", "iopub.status.busy": "2025-07-11T20:09:27.958313Z", "iopub.status.idle": "2025-07-11T20:09:27.961102Z", "shell.execute_reply": "2025-07-11T20:09:27.960907Z", "shell.execute_reply.started": "2025-07-11T20:09:27.958370Z" } }, "outputs": [ { "data": { "text/plain": [ "Pooled Classifier\n", "\n", "Pooled UserDefined\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 62.80] | 20\n", "(62.80, 66.60] | 0\n", "(66.60, 71.40] | 0\n", "(71.40, 75.20] | 0\n", "(75.20, 79.00] | 0\n", "\n", "Pooled UserDefined\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 62.80] | 20\n", "(62.80, 66.60] | 0\n", "(66.60, 71.40] | 0\n", "(71.40, 75.20] | 0\n", "(75.20, 79.00] | 0\n", "\n", "Pooled UserDefined\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 62.80] | 4\n", "(62.80, 66.60] | 4\n", "(66.60, 71.40] | 4\n", "(71.40, 75.20] | 4\n", "(75.20, 79.00] | 4" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res = mapclassify.Pooled(\n", "    data, classifier=\"UserDefined\", bins=mapclassify.Quantiles(data[:, -1]).bins\n", ")\n", "res" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.181788Z", "start_time": "2022-11-05T19:18:42.176573Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.961595Z", "iopub.status.busy": "2025-07-11T20:09:27.961527Z", 
"iopub.status.idle": "2025-07-11T20:09:27.964196Z", "shell.execute_reply": "2025-07-11T20:09:27.964003Z", "shell.execute_reply.started": "2025-07-11T20:09:27.961587Z" } }, "outputs": [ { "data": { "text/plain": [ "Quantiles\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 62.80] | 4\n", "(62.80, 66.60] | 4\n", "(66.60, 71.40] | 4\n", "(71.40, 75.20] | 4\n", "(75.20, 79.00] | 4" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mapclassify.Quantiles(data[:, -1])" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.188254Z", "start_time": "2022-11-05T19:18:42.183388Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.964521Z", "iopub.status.busy": "2025-07-11T20:09:27.964460Z", "iopub.status.idle": "2025-07-11T20:09:27.966457Z", "shell.execute_reply": "2025-07-11T20:09:27.966222Z", "shell.execute_reply.started": "2025-07-11T20:09:27.964514Z" } }, "outputs": [ { "data": { "text/plain": [ "array([60, 10, 62, 63, 64, 65, 66, 67, 68, 10, 70, 71, 72, 73, 74, 75, 76,\n", " 77, 78, 79])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[:, -1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pinning the pooling\n", "\n", "Another option is to pin the class definitions to a single subperiod and apply them to the pooled series.\n", "\n", "### Pinning to the last period\n", "\n", "As an example, we can use the quintiles from the third period to define the pooled classifier:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.198019Z", "start_time": "2022-11-05T19:18:42.190998Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.966763Z", "iopub.status.busy": "2025-07-11T20:09:27.966702Z", "iopub.status.idle": "2025-07-11T20:09:27.969472Z", "shell.execute_reply": "2025-07-11T20:09:27.969276Z", 
"shell.execute_reply.started": "2025-07-11T20:09:27.966756Z" } }, "outputs": [ { "data": { "text/plain": [ "Pooled Classifier\n", "\n", "Pooled UserDefined\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 62.80] | 20\n", "(62.80, 66.60] | 0\n", "(66.60, 71.40] | 0\n", "(71.40, 75.20] | 0\n", "(75.20, 79.00] | 0\n", "\n", "Pooled UserDefined\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 62.80] | 20\n", "(62.80, 66.60] | 0\n", "(66.60, 71.40] | 0\n", "(71.40, 75.20] | 0\n", "(75.20, 79.00] | 0\n", "\n", "Pooled UserDefined\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 62.80] | 4\n", "(62.80, 66.60] | 4\n", "(66.60, 71.40] | 4\n", "(71.40, 75.20] | 4\n", "(75.20, 79.00] | 4" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pinned = mapclassify.Pooled(\n", "    data, classifier=\"UserDefined\", bins=mapclassify.Quantiles(data[:, -1]).bins\n", ")\n", "pinned" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "ExecuteTime": { "end_time": "2022-11-05T19:18:42.206186Z", "start_time": "2022-11-05T19:18:42.200535Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.969874Z", "iopub.status.busy": "2025-07-11T20:09:27.969800Z", "iopub.status.idle": "2025-07-11T20:09:27.971653Z", "shell.execute_reply": "2025-07-11T20:09:27.971467Z", "shell.execute_reply.started": "2025-07-11T20:09:27.969867Z" } }, "outputs": [ { "data": { "text/plain": [ "UserDefined\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 62.80] | 44\n", "(62.80, 66.60] | 4\n", "(66.60, 71.40] | 4\n", "(71.40, 75.20] | 4\n", "(75.20, 79.00] | 4" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pinned.global_classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pinning to the first period" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "ExecuteTime": { "end_time": 
"2022-11-05T19:18:42.215909Z", "start_time": "2022-11-05T19:18:42.207832Z" }, "execution": { "iopub.execute_input": "2025-07-11T20:09:27.972019Z", "iopub.status.busy": "2025-07-11T20:09:27.971958Z", "iopub.status.idle": "2025-07-11T20:09:27.974710Z", "shell.execute_reply": "2025-07-11T20:09:27.974519Z", "shell.execute_reply.started": "2025-07-11T20:09:27.972012Z" } }, "outputs": [ { "data": { "text/plain": [ "Pooled Classifier\n", "\n", "Pooled UserDefined\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 23.80] | 4\n", "(23.80, 27.60] | 4\n", "(27.60, 31.40] | 4\n", "(31.40, 35.20] | 4\n", "(35.20, 39.00] | 4\n", "(39.00, 79.00] | 0\n", "\n", "Pooled UserDefined\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 23.80] | 1\n", "(23.80, 27.60] | 0\n", "(27.60, 31.40] | 0\n", "(31.40, 35.20] | 0\n", "(35.20, 39.00] | 0\n", "(39.00, 79.00] | 19\n", "\n", "Pooled UserDefined\n", "\n", " Interval Count\n", "----------------------\n", "[10.00, 23.80] | 2\n", "(23.80, 27.60] | 0\n", "(27.60, 31.40] | 0\n", "(31.40, 35.20] | 0\n", "(35.20, 39.00] | 0\n", "(39.00, 79.00] | 18" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pinned = mapclassify.Pooled(\n", "    data, classifier=\"UserDefined\", bins=mapclassify.Quantiles(data[:, 0]).bins\n", ")\n", "pinned" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the quintiles for the first period, by definition, contain all the values from that period, but they do not bound the larger values in subsequent periods. Following the [mapclassify policy](https://github.com/pysal/mapclassify/blob/a7770fb98bf945dad3c62ccf2c0f8b53abb1774a/mapclassify/classifiers.py#L589), an additional class is added so that the classes contain all values in the pooled series." 
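, "\n",
"\n",
"A small sketch of that policy in isolation, using a hypothetical series and hypothetical bins chosen only for illustration. It assumes `UserDefined` extends `bins` with the series maximum whenever the last supplied upper bound falls below it, as the linked source describes:\n",
"\n",
"```python\n",
"import numpy\n",
"import mapclassify\n",
"\n",
"# Hypothetical series whose maximum (79) exceeds the last supplied bin (39).\n",
"y = numpy.arange(10, 80)\n",
"ud = mapclassify.UserDefined(y, bins=[23.8, 39.0])\n",
"\n",
"# If the policy applies, `bins` gains one extra upper bound and `k` grows by one.\n",
"print(ud.bins, ud.k)\n",
"```"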
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 4 }