# mapclassify.JenksCaspallSampled

class mapclassify.JenksCaspallSampled(y, k=5, pct=0.1)

Jenks Caspall Map Classification using a random sample.

Parameters:
y : numpy.array

$$(n,1)$$, values to classify.

k : int (default 5)

The number of classes required.

pct : float (default 0.10)

The fraction of $$n$$ that should form the sample (for example, 0.10 for 10%). If pct is specified such that $$n*pct > 1000$$, then pct is reset to $$pct = 1000./n$$, so the sample never exceeds 1000 observations.
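The capping rule can be checked with a little arithmetic. The sketch below is not the library's code: `effective_pct` and its `cap` argument are hypothetical names that only restate the rule given above.

```python
def effective_pct(n, pct=0.1, cap=1000):
    """Hypothetical helper restating the documented rule.

    If n * pct would exceed `cap` observations, shrink pct to cap / n.
    """
    if n * pct > cap:
        pct = float(cap) / n
    return pct


# With the 100,000 values used in the Examples, the default 10% would
# request 10,000 observations, so pct is reduced to 1000 / 100000.
print(effective_pct(100_000))   # 0.01
# With 5,000 values, 10% is only 500 observations, so pct is unchanged.
print(effective_pct(5_000))     # 0.1
```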

Notes

This is intended for large $$n$$ problems. The logic is to apply JenksCaspall to a random subset of $$y$$ and then bin the complete vector $$y$$ using the bins obtained from the subset. This trades some “accuracy” for a gain in speed.
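The sample-then-bin idea can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `sampled_classify` is a hypothetical name, the breaks come from simple equal-count cuts of the sample rather than the Jenks-Caspall optimizer, and the top bound is assumed to be the maximum of the full data so that every value is covered.

```python
import bisect
import random


def sampled_classify(y, k=5, pct=0.1, seed=0):
    """Hypothetical sketch: derive breaks from a sample, then bin all of y.

    `y` is expected to be a list (or other sequence) of numbers.
    """
    n = len(y)
    if n * pct > 1000:          # cap the sample at 1000 observations
        pct = 1000.0 / n
    size = min(n, max(k, int(round(n * pct))))
    sample = sorted(random.Random(seed).sample(y, size))

    # Stand-in for Jenks-Caspall (NOT the real optimizer): equal-count
    # upper bounds taken from the sorted sample.
    bins = [sample[len(sample) * (i + 1) // k - 1] for i in range(k)]
    # Assumption: the top bound is the maximum of the full data, so that
    # every value in y falls into some class.
    bins[-1] = max(y)

    # Bin the complete vector: class i holds bins[i-1] < value <= bins[i].
    yb = [bisect.bisect_left(bins, v) for v in y]
    counts = [yb.count(i) for i in range(k)]
    return bins, yb, counts


data_rng = random.Random(42)
y = [data_rng.random() for _ in range(20000)]
bins, yb, counts = sampled_classify(y)
print(bins)
print(counts)
```

Only the break-finding step sees the sample; the final assignment still touches every observation, which is why the counts always sum to $$n$$.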

Examples

>>> import mapclassify
>>> import numpy
>>> numpy.random.seed(0)
>>> x = numpy.random.random(100000)
>>> jc = mapclassify.JenksCaspall(x)
>>> jcs = mapclassify.JenksCaspallSampled(x)
>>> jc.bins
array([0.20108144, 0.4025151 , 0.60396127, 0.80302249, 0.99997795])
>>> jcs.bins
array([0.19978245, 0.40793025, 0.59253555, 0.78241472, 0.99997795])
>>> jc.counts.tolist()
[20286, 19951, 20310, 19708, 19745]
>>> jcs.counts.tolist()
[20147, 20633, 18591, 18857, 21772]

Not for testing, since timings differ across hardware; included only to document the likely speed gains:

#>>> import time
#>>> t1 = time.time(); jc = mapclassify.JenksCaspall(x); t2 = time.time()
#>>> t1s = time.time(); jcs = mapclassify.JenksCaspallSampled(x); t2s = time.time()
#>>> t2 - t1; t2s - t1s
#1.8292930126190186
#0.061631917953491211

Attributes:
yb : numpy.array

$$(n,1)$$, bin IDs for observations.

bins : numpy.array

$$(k,1)$$, the upper bounds of each class.

k : int

The number of classes.

counts : numpy.array

$$(k,1)$$, the number of observations falling in each class.
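How the three array attributes relate can be seen on a toy case. The values below are made up, and the rule that an observation equal to an upper bound belongs to that class is an assumption of this sketch, not something the text above states.

```python
import bisect

bins = [0.2, 0.4, 0.6, 0.8, 1.0]          # upper bound of each class
y = [0.05, 0.2, 0.35, 0.61, 0.99, 1.0]    # made-up observations

# Assumed rule: the first class whose upper bound is >= the value.
yb = [bisect.bisect_left(bins, v) for v in y]
# counts is simply the tally of bin IDs in yb.
counts = [yb.count(i) for i in range(len(bins))]

print(yb)      # [0, 0, 1, 3, 4, 4]
print(counts)  # [2, 1, 0, 1, 2]
```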

__init__(y, k=5, pct=0.1)

Methods

| Method | Description |
| --- | --- |
| `__init__(y[, k, pct])` | |
| `find_bin(x)` | Sort input or inputs according to the current bin estimate. |
| `get_adcm()` | Absolute deviation around class median (ADCM). |
| `get_fmt()` | |
| `get_gadf()` | Goodness of absolute deviation of fit. |
| `get_legend_classes([fmt])` | Format the strings for the classes on the legend. |
| `get_tss()` | Returns sum of squares over all class means. |
| `make(*args, **kwargs)` | Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function. |
| `plot(gdf[, border_color, border_width, ...])` | Plot a mapclassifier object. |
| `plot_histogram([color, linecolor, ...])` | Plot histogram of y with bin values superimposed. |
| `set_fmt(fmt)` | |
| `table()` | |
| `update([y, inplace])` | Add data or change classification parameters. |

Attributes

 fmt
update(y=None, inplace=False, **kwargs)

Add data or change classification parameters.

Parameters:
y : numpy.array (default None)

$$(n,1)$$, array of data to classify.

inplace : bool (default False)

Whether to conduct the update in place or to return a copy estimated from the additional specifications.

**kwargs : dict

Additional parameters that are passed to the __init__ function of the class. For documentation, check the class constructor.