mapclassify.JenksCaspallSampled

class mapclassify.JenksCaspallSampled(y, k=5, pct=0.1)

Jenks Caspall Map Classification using a random sample.

Parameters:
y : numpy.array

\((n,1)\), values to classify.

k : int (default 5)

The number of classes required.

pct : float (default 0.10)

The fraction of \(n\) that should form the sample (0.10 means 10%). If pct is specified such that \(n \cdot pct > 1000\), then pct is reset to \(1000/n\), which caps the sample at about 1000 observations.
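The cap on the sample is easy to misread, so here is a minimal sketch of the rule as documented above. The helper name `effective_pct` and its `cap` argument are hypothetical and exist only for illustration; they are not part of mapclassify.

```python
def effective_pct(n, pct=0.1, cap=1000):
    """Return the sampling fraction after applying the documented cap."""
    if n * pct > cap:
        return cap / n
    return pct


# With the 100000 values used in the Examples section, 10% would be
# 10000 observations, so the fraction is reduced to 1000 / 100000.
print(effective_pct(100_000))  # 0.01
print(effective_pct(5_000))    # 0.1
```

For inputs of more than 10000 values, the default pct therefore no longer controls the sample size; the cap does.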

Attributes:
yb : numpy.array

\((n,1)\), bin IDs for observations.

bins : numpy.array

\((k,1)\), the upper bounds of each class.

k : int

The number of classes.

counts : numpy.array

\((k,1)\), the number of observations falling in each class.

Notes

This is intended for problems with large \(n\). JenksCaspall is applied to a random subset of \(y\), and the complete vector \(y\) is then binned using the class bounds obtained from that subset. This trades some accuracy for a gain in speed.
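The sample-then-bin logic described in these notes can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function name `classify_via_sample` and the `seed` argument are hypothetical, and quantile cut points stand in for the Jenks-Caspall step so that the sketch needs only the standard library.

```python
import bisect
import random
import statistics


def classify_via_sample(y, k=5, pct=0.1, seed=0):
    """Derive class bounds from a random sample, then bin all of y."""
    n = len(y)
    # Documented cap: limit the sample to about 1000 observations.
    if n * pct > 1000:
        pct = 1000.0 / n
    rng = random.Random(seed)
    sample = rng.sample(y, max(k, int(n * pct)))
    # Stand-in classifier (assumption): quantile cut points on the sample.
    # JenksCaspallSampled would run Jenks-Caspall on the sample instead.
    bins = statistics.quantiles(sample, n=k)  # k - 1 interior cut points
    bins.append(max(y))  # the top bound must cover the full data
    # Bin the complete vector: class i holds values <= bins[i].
    yb = [bisect.bisect_left(bins, v) for v in y]
    counts = [yb.count(i) for i in range(k)]
    return bins, yb, counts


y = [i / 9999 for i in range(10000)]
bins, yb, counts = classify_via_sample(y)
```

The expensive step only ever sees the sample, while the final pass over the full data is a cheap lookup against k upper bounds. That is where the speed gain comes from, and also why the bounds can differ from those of a full-data fit.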

Examples

>>> import mapclassify
>>> import numpy
>>> cal = mapclassify.load_example()
>>> numpy.random.seed(0)
>>> x = numpy.random.random(100000)
>>> jc = mapclassify.JenksCaspall(x)
>>> jcs = mapclassify.JenksCaspallSampled(x)
>>> jc.bins
array([0.20108144, 0.4025151 , 0.60396127, 0.80302249, 0.99997795])
>>> jcs.bins
array([0.19978245, 0.40793025, 0.59253555, 0.78241472, 0.99997795])
>>> jc.counts.tolist()
[20286, 19951, 20310, 19708, 19745]
>>> jcs.counts.tolist()
[20147, 20633, 18591, 18857, 21772]

# not for testing since we get different times on different hardware
# just included for documentation of likely speed gains
#>>> import time
#>>> t1 = time.time(); jc = mapclassify.JenksCaspall(x); t2 = time.time()
#>>> t1s = time.time(); jcs = mapclassify.JenksCaspallSampled(x); t2s = time.time()
#>>> t2 - t1; t2s - t1s
#1.8292930126190186
#0.061631917953491211

__init__(y, k=5, pct=0.1)

Methods

__init__(y[, k, pct])

find_bin(x)

Sort input or inputs according to the current bin estimate.

get_adcm()

Absolute deviation around class median (ADCM).

get_fmt()

get_gadf()

Goodness of absolute deviation of fit.

get_legend_classes([fmt])

Format the strings for the classes on the legend.

get_tss()

Returns sum of squares over all class means.

make(*args, **kwargs)

Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.

plot(gdf[, border_color, border_width, ...])

Plot a mapclassifier object.

plot_histogram([color, linecolor, ...])

Plot a histogram of y with the bin values superimposed.

set_fmt(fmt)

table()

update([y, inplace])

Add data or change classification parameters.

Attributes

fmt

update(y=None, inplace=False, **kwargs)

Add data or change classification parameters.

Parameters:
y : numpy.array (default None)

\((n,1)\), array of data to classify.

inplace : bool (default False)

Whether to conduct the update in place or to return a copy estimated from the additional specifications.

**kwargs : dict

Additional parameters that are passed to the __init__ function of the class. For documentation, check the class constructor.
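The difference between updating in place and returning a copy can be shown with a toy class. `ToyClassifier` below is a hypothetical stand-in written only to illustrate the semantics described for `inplace`; it is not mapclassify code, it performs no classification, and its return value for the in-place case is an assumption.

```python
import copy


class ToyClassifier:
    """Hypothetical stand-in used only to illustrate update semantics."""

    def __init__(self, y, k=5):
        self.y = list(y)
        self.k = k

    def update(self, y=None, inplace=False, **kwargs):
        # Work on self when inplace=True, otherwise on a deep copy.
        target = self if inplace else copy.deepcopy(self)
        if y is not None:
            target.y = target.y + list(y)  # add the new observations
        # Remaining keyword arguments override constructor parameters.
        target.k = kwargs.pop("k", target.k)
        if not inplace:
            return target


base = ToyClassifier([1, 2, 3], k=3)
bigger = base.update(y=[4, 5], k=2)   # returns a new object, base unchanged
base.update(y=[6], inplace=True)      # mutates base, returns None
```

With `inplace=False` the original object can still be compared against the updated one, at the cost of a copy.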