Overview of the mapclassify API

There are two ways to access the functionality in mapclassify: the original class-based API and the `classify` function added in version 2.4.0.

We first load the example dataset that we have seen earlier.

[1]:
from libpysal import examples
import geopandas as gpd
from mapclassify import classify
[2]:
pth = examples.get_path('columbus.shp')
gdf = gpd.read_file(pth)
y = gdf.HOVAL
gdf.head()
[2]:
AREA PERIMETER COLUMBUS_ COLUMBUS_I POLYID NEIG HOVAL INC CRIME OPEN ... DISCBD X Y NSA NSB EW CP THOUS NEIGNO geometry
0 0.309441 2.440629 2 5 1 5 80.467003 19.531 15.725980 2.850747 ... 5.03 38.799999 44.070000 1.0 1.0 1.0 0.0 1000.0 1005.0 POLYGON ((8.62413 14.23698, 8.55970 14.74245, ...
1 0.259329 2.236939 3 1 2 1 44.567001 21.232 18.801754 5.296720 ... 4.27 35.619999 42.380001 1.0 1.0 0.0 0.0 1000.0 1001.0 POLYGON ((8.25279 14.23694, 8.28276 14.22994, ...
2 0.192468 2.187547 4 6 3 6 26.350000 15.956 30.626781 4.534649 ... 3.89 39.820000 41.180000 1.0 1.0 1.0 0.0 1000.0 1006.0 POLYGON ((8.65331 14.00809, 8.81814 14.00205, ...
3 0.083841 1.427635 5 2 4 2 33.200001 4.477 32.387760 0.394427 ... 3.70 36.500000 40.520000 1.0 1.0 0.0 0.0 1000.0 1002.0 POLYGON ((8.45950 13.82035, 8.47341 13.83227, ...
4 0.488888 2.997133 6 7 5 7 23.225000 11.252 50.731510 0.405664 ... 2.83 40.009998 38.000000 1.0 1.0 1.0 0.0 1000.0 1007.0 POLYGON ((8.68527 13.63952, 8.67758 13.72221, ...

5 rows × 21 columns

Original API (< 2.4.0)

[3]:
import mapclassify

bp = mapclassify.BoxPlot(y)
bp
[3]:
BoxPlot

   Interval      Count
----------------------
( -inf, -0.70] |     0
(-0.70, 25.70] |    13
(25.70, 33.50] |    12
(33.50, 43.30] |    12
(43.30, 69.70] |     7
(69.70, 96.40] |     5

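To make the six intervals above concrete, here is a minimal NumPy sketch of the box-plot rule. It is written for illustration and is not mapclassify's source; it assumes linear-interpolation percentiles (the `np.percentile` default) and assigns each value to the first class whose upper bound is greater than or equal to it. The helper name `box_plot_sketch` is hypothetical. With `hinge=1.5` the fences are Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, which matches the printed breaks: 25.70 - 1.5 × (43.30 - 25.70) = -0.70 and 43.30 + 26.40 = 69.70.

```python
import numpy as np


def box_plot_sketch(y, hinge=1.5):
    """Illustrative box-plot classification (hypothetical helper, not mapclassify's code).

    Returns (bins, yb, counts): class upper bounds, class index for each value,
    and the number of values in each class.
    """
    y = np.asarray(y, dtype=float)
    q1, q2, q3, y_max = np.percentile(y, [25, 50, 75, 100])
    iqr = q3 - q1
    left_fence = q1 - hinge * iqr
    right_fence = q3 + hinge * iqr
    if right_fence < y_max:
        # Values above the upper fence get their own (upper outlier) class.
        bins = np.array([left_fence, q1, q2, q3, right_fence, y_max])
    else:
        # No upper outliers: the fence itself becomes the last upper bound.
        bins = np.array([left_fence, q1, q2, q3, right_fence])
    # The first class, (-inf, left_fence], collects any low outliers.
    # searchsorted(side="left") returns the index of the first bound >= each value.
    yb = np.searchsorted(bins, y, side="left")
    counts = np.bincount(yb, minlength=len(bins))
    return bins, yb, counts


y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
bins, yb, counts = box_plot_sketch(y)
print(bins.tolist())    # [-3.5, 3.25, 5.5, 7.75, 14.5, 100.0]
print(counts.tolist())  # [0, 3, 2, 2, 2, 1]
```

Passing a larger `hinge` (as in `classify(x, 'box_plot', hinge=2)`) widens both fences, so fewer values fall into the outlier classes.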
Extended API (>= 2.4.0)

Note that the original API is still available, so this extension preserves backward compatibility.

[4]:
bp = classify(y, 'box_plot')
bp
[4]:
BoxPlot

   Interval      Count
----------------------
( -inf, -0.70] |     0
(-0.70, 25.70] |    13
(25.70, 33.50] |    12
(33.50, 43.30] |    12
(43.30, 69.70] |     7
(69.70, 96.40] |     5
[5]:
type(bp)
[5]:
mapclassify.classifiers.BoxPlot
[6]:
q5 = classify(y, 'quantiles', k=5)
q5
[6]:
Quantiles

   Interval      Count
----------------------
[17.90, 23.08] |    10
(23.08, 30.48] |    10
(30.48, 39.10] |     9
(39.10, 45.83] |    10
(45.83, 96.40] |    10
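With `k=5`, the quantile scheme uses the 20th, 40th, 60th, 80th and 100th percentiles as class upper bounds. The counts above sum to 49, which does not divide evenly by 5, so one class ends up with 9 observations. The sketch below illustrates the idea with NumPy, assuming linear-interpolation percentiles; the helper `quantiles_sketch` is hypothetical, and mapclassify's own handling of ties and duplicate breaks may differ.

```python
import numpy as np


def quantiles_sketch(y, k=5):
    """Illustrative quantile classification (hypothetical helper, not mapclassify's code)."""
    y = np.asarray(y, dtype=float)
    # Percentiles 100/k, 2*100/k, ..., 100 give k class upper bounds.
    pct = np.linspace(100 / k, 100, k)
    # np.unique drops duplicate breaks, which can appear when y has many ties.
    bins = np.unique(np.percentile(y, pct))
    # Index of the first upper bound >= each value; the minimum lands in class 0.
    yb = np.searchsorted(bins, y, side="left")
    counts = np.bincount(yb, minlength=len(bins))
    return bins, yb, counts


bins, yb, counts = quantiles_sketch(range(1, 11), k=5)
print(counts.tolist())  # [2, 2, 2, 2, 2]
```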

Robustness of the scheme argument

[7]:
classify(y, 'boxPlot')
[7]:
BoxPlot

   Interval      Count
----------------------
( -inf, -0.70] |     0
(-0.70, 25.70] |    13
(25.70, 33.50] |    12
(33.50, 43.30] |    12
(43.30, 69.70] |     7
(69.70, 96.40] |     5
[8]:
classify(y, 'Boxplot')
[8]:
BoxPlot

   Interval      Count
----------------------
( -inf, -0.70] |     0
(-0.70, 25.70] |    13
(25.70, 33.50] |    12
(33.50, 43.30] |    12
(43.30, 69.70] |     7
(69.70, 96.40] |     5
[9]:
classify(y, 'Box_plot')
[9]:
BoxPlot

   Interval      Count
----------------------
( -inf, -0.70] |     0
(-0.70, 25.70] |    13
(25.70, 33.50] |    12
(33.50, 43.30] |    12
(43.30, 69.70] |     7
(69.70, 96.40] |     5
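All of these spellings select the same `BoxPlot` classifier. One way to get this tolerance is to normalize the scheme string before looking it up, for example by lowercasing it and removing underscores. The sketch below uses a hypothetical `normalize_scheme` helper and a small hypothetical lookup table that maps to classifier names as strings; it illustrates the idea and is not mapclassify's actual lookup code.

```python
# Hypothetical registry: normalized scheme name -> classifier class name.
_SCHEMES = {
    "boxplot": "BoxPlot",
    "equalinterval": "EqualInterval",
    "fisherjenks": "FisherJenks",
    "quantiles": "Quantiles",
    "userdefined": "UserDefined",
}


def normalize_scheme(scheme: str) -> str:
    """Lowercase the name and drop underscores (hypothetical helper)."""
    return scheme.lower().replace("_", "")


def resolve_scheme(scheme: str) -> str:
    """Map a user-supplied scheme string to a classifier name, or raise ValueError."""
    key = normalize_scheme(scheme)
    if key not in _SCHEMES:
        raise ValueError(f"Unknown scheme {scheme!r}; expected one of {sorted(_SCHEMES)}")
    return _SCHEMES[key]


for name in ["box_plot", "boxPlot", "Boxplot", "Box_plot"]:
    print(name, "->", resolve_scheme(name))  # each line ends with "-> BoxPlot"
```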
[10]:
classify?
Signature:
classify(
    y,
    scheme,
    k=5,
    pct=[1, 10, 50, 90, 99, 100],
    pct_sampled=0.1,
    truncate=True,
    hinge=1.5,
    multiples=[-2, -1, 1, 2],
    mindiff=0,
    initial=100,
    bins=None,
)
Docstring:
Classify your data with `mapclassify.classify`
Note: Input parameters are dependent on classifier used.

Parameters
----------
y : array
    (n,1), values to classify
scheme : str
    pysal.mapclassify classification scheme
k : int, optional
    The number of classes. Default=5.
pct  : array, optional
    Percentiles used for classification with `percentiles`.
    Default=[1,10,50,90,99,100]
pct_sampled : float, optional
    The percentage of n that should form the sample
    (JenksCaspallSampled, FisherJenksSampled)
    If pct is specified such that n*pct > 1000, then pct = 1000./n
truncate : boolean, optional
    truncate pct_sampled in cases where pct * n > 1000., (Default True)
hinge : float, optional
    Multiplier for IQR when `BoxPlot` classifier used.
    Default=1.5.
multiples : array, optional
    The multiples of the standard deviation to add/subtract from
    the sample mean to define the bins using `std_mean`.
    Default=[-2,-1,1,2].
mindiff : float, optional
    The minimum difference between class breaks
    if using `maximum_breaks` classifier. Default=0.
initial : int
    Number of initial solutions to generate or number of runs
    when using `natural_breaks` or `max_p_classifier`.
    Default =100.
    Note: setting initial to 0 will result in the quickest
    calculation of bins.
bins : array, optional
    (k,1), upper bounds of classes (have to be monotonically
    increasing) if using `user_defined` classifier.
    Default =None, Example =[20, max(y)].

Returns
-------
classifier : pysal.mapclassify.classifier instance
        Object containing bin ids for each observation (.yb),
        upper bounds of each class (.bins), number of classes (.k)
        and number of observations falling in each class (.counts)

Note: Supported classifiers include: quantiles, box_plot, equal_interval,
    fisher_jenks, headtail_breaks, jenks_caspall, jenks_caspall_forced,
    max_p_classifier, maximum_breaks, natural_breaks, percentiles, std_mean,
    user_defined


Examples
--------
Imports

>>> from libpysal import examples
>>> import geopandas as gpd
>>> from mapclassify import classify

Load Example Data

>>> link_to_data = examples.get_path('columbus.shp')
>>> gdf = gpd.read_file(link_to_data)
>>> x = gdf['HOVAL'].values

Classify values by quantiles

>>> quantiles = classify(x, 'quantiles')

Classify values by box_plot and set hinge to 2

>>> box_plot = classify(x, 'box_plot', hinge=2)
Type:      function
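The `bins` parameter of the `user_defined` scheme takes monotonically increasing class upper bounds, as in the docstring example `[20, max(y)]`. The NumPy sketch below shows one way such bounds can be applied. The helper `user_defined_sketch` is hypothetical, and appending `max(y)` when the supplied bounds fall short of it is an assumption made here so that every value receives a class.

```python
import numpy as np


def user_defined_sketch(y, bins):
    """Classify y with user-supplied class upper bounds (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    bins = np.asarray(bins, dtype=float)
    if np.any(np.diff(bins) <= 0):
        raise ValueError("bins must be strictly increasing")
    if bins[-1] < y.max():
        # Assumption: add max(y) as a final upper bound so no value is left unclassified.
        bins = np.append(bins, y.max())
    # Index of the first upper bound >= each value.
    yb = np.searchsorted(bins, y, side="left")
    counts = np.bincount(yb, minlength=len(bins))
    return bins, yb, counts


y = [12.0, 18.5, 20.0, 27.3, 44.1]
bins, yb, counts = user_defined_sketch(y, [20, max(y)])
print(bins.tolist(), counts.tolist())  # [20.0, 44.1] [3, 2]
```

A value equal to a bound (here 20.0) falls into the class that the bound closes, consistent with the right-closed intervals printed by the classifiers above.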
