mapclassify.JenksCaspallSampled

class mapclassify.JenksCaspallSampled(y, k=5, pct=0.1)

Jenks Caspall Map Classification using a random sample.

Parameters:

y (numpy.array): \((n,1)\), values to classify.
k (int, default 5): The number of classes required.
pct (float, default 0.10): The fraction of \(n\) that should form the sample (0.10 means 10%). If pct is specified such that \(n*pct > 1000\), then \(pct = 1000./n\), so the sample holds at most 1000 observations.
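
The cap on `pct` described above can be worked through with a small sketch. The helper name `effective_pct` and its `cap` argument are hypothetical and used only for illustration; they are not part of the mapclassify API.

```python
def effective_pct(n, pct=0.1, cap=1000):
    """Return the sampling fraction after applying the documented cap.

    `effective_pct` and `cap` are hypothetical names for this sketch,
    not mapclassify functions or parameters.
    """
    if n * pct > cap:
        # The requested sample would exceed the cap, so shrink the fraction.
        return cap / n
    return pct


n = 100_000
p = effective_pct(n)            # 0.1 * 100000 = 10000 > 1000, so the cap applies
sample_size = round(n * p)      # number of observations that would be sampled
print(p, sample_size)
```

For a small input such as `n = 5000`, the product `n * pct` is 500, which is below the cap, so `pct` is left unchanged.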

Attributes:

yb (numpy.array): \((n,1)\), bin IDs for observations.
bins (numpy.array): \((k,1)\), the upper bounds of each class.
k (int): The number of classes.
counts (numpy.array): \((k,1)\), the number of observations falling in each class.

Notes

This classifier is intended for large \(n\) problems. The logic is to apply JenksCaspall to a random subset of \(y\) and then bin the complete vector \(y\) using the class bounds obtained from that subset. This trades some accuracy for a gain in speed.
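
The sample-then-bin logic can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: `quantile_breaks` is a hypothetical stand-in for the Jenks-Caspall step (it computes equal-count breaks instead), `classify_sampled` is a hypothetical name, and forcing the maximum of `y` into the sample is a choice made here so that every value falls into some class.

```python
import bisect
import random


def quantile_breaks(values, k):
    """Stand-in classifier: upper bounds of k roughly equal-count classes.

    This is NOT Jenks-Caspall; it only plays the role of "a classifier
    fitted on the sample" so the sketch needs no third-party packages.
    """
    s = sorted(values)
    n = len(s)
    return [s[max(0, (i * n) // k - 1)] for i in range(1, k + 1)]


def classify_sampled(y, k=5, pct=0.1, seed=0):
    """Fit class bounds on a random sample of y, then bin all of y."""
    rng = random.Random(seed)
    n = len(y)
    m = min(n, max(k, int(n * pct)))     # sample size, at least k values
    sample = rng.sample(y, m)
    sample[0] = max(y)                   # assumption of this sketch: keep the top bound
    bins = quantile_breaks(sample, k)    # upper bound of each class
    # bisect_left returns the index of the first upper bound >= value.
    yb = [bisect.bisect_left(bins, v) for v in y]
    counts = [yb.count(c) for c in range(k)]
    return bins, yb, counts


data_rng = random.Random(42)
y = [data_rng.random() for _ in range(10_000)]
bins, yb, counts = classify_sampled(y, k=5, pct=0.1)
print(bins)
print(counts)
```

Only the step that computes the bounds sees the sample; the assignment of bin IDs and the counts always cover the full vector, which is why `counts` sums to \(n\) even though the bounds come from a subset.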

Examples

>>> import mapclassify
>>> import numpy
>>> cal = mapclassify.load_example()
>>> numpy.random.seed(0)
>>> x = numpy.random.random(100000)
>>> jc = mapclassify.JenksCaspall(x)
>>> jcs = mapclassify.JenksCaspallSampled(x)
>>> jc.bins
array([0.20108144, 0.4025151 , 0.60396127, 0.80302249, 0.99997795])

>>> jcs.bins
array([0.19978245, 0.40793025, 0.59253555, 0.78241472, 0.99997795])

>>> jc.counts.tolist()
[20286, 19951, 20310, 19708, 19745]

>>> jcs.counts.tolist()
[20147, 20633, 18591, 18857, 21772]

# not for testing since we get different times on different hardware
# just included for documentation of likely speed gains
#>>> t1 = time.time(); jc = mapclassify.JenksCaspall(x); t2 = time.time()
#>>> t1s = time.time(); jcs = mapclassify.JenksCaspallSampled(x); t2s = time.time()
#>>> t2 - t1; t2s - t1s
#1.8292930126190186
#0.061631917953491211

__init__(y, k=5, pct=0.1)

Methods

`__init__`(y[, k, pct])
`find_bin`(x)	Sort input or inputs according to the current bin estimate.
`get_adcm`()	Absolute deviation around class median (ADCM).
`get_fmt`()
`get_gadf`()	Goodness of absolute deviation of fit.
`get_legend_classes`([fmt])	Format the strings for the classes on the legend.
`get_tss`()	Returns sum of squares over all class means.
`make`(*args, **kwargs)	Configure and create a classifier that will consume data and produce classifications, given the configuration options specified by this function.
`plot`(gdf[, border_color, border_width, ...])	Plot a mapclassifier object.
`plot_histogram`([color, linecolor, ...])	Plot a histogram of y with bin values superimposed.
`plot_legendgram`(*[, ax, cmap, bins, inset, ...])	Plot a legendgram, which is a histogram with classification breaks.
`set_fmt`(fmt)
`table`()
`update`([y, inplace])	Add data or change classification parameters.

Attributes

fmt

update(y=None, inplace=False, **kwargs)

Add data or change classification parameters.

Parameters:

y (numpy.array, default None): \((n,1)\), array of data to classify.
inplace (bool, default False): Whether to conduct the update in place or to return a copy estimated from the additional specifications.
**kwargs (dict): Additional parameters that are passed to the __init__ function of the class. For documentation, check the class constructor.