pointpattern

Planar Point Patterns in PySAL

Author: Serge Rey sjsrey@gmail.com and Wei Kang weikang9009@gmail.com

Introduction

This notebook introduces the basic PointPattern class in PySAL and covers the following:

What is a point pattern?
Creating Point Patterns
Atributes of Point Patterns
Intensity Estimates
Next steps

What is a point pattern?

We introduce basic terminology here and point the interested reader to more detailed references on the underlying theory of the statistical analysis of point patterns.

Points and Event Points

To start we consider a series of point locations, $(s_1, s_2, \ldots, s_n)$ in a study region $\Re$. We limit our focus here to a two-dimensional space so that $s_j = (x_j, y_j)$ is the spatial coordinate pair for point location $j$.

We will be interested in two different types of points.

Event Points

Event Points are locations where something of interest has occurred. The term event is very general here and could be used to represent a wide variety of phenomena. Some examples include:

among many others.

It is important to recognize that in the statistical analysis of point patterns the interest extends beyond the observed point pattern at hand. The observed patterns are viewed as realizations from some underlying spatial stochastic process.

Arbitrary Points

The second type of point we consider are those locations where the phenomena of interest has not been observed. These go by various names such as "empty space" or "regular" points, and at first glance might seem less interesting to a spatial analayst. However, these types of points play a central role in a class of point pattern methods that we explore below.

Point Pattern Analysis

The analysis of event points focuses on a number of different characteristics of the collective spatial pattern that is observed. Often the pattern is jugded against the hypothesis of complete spatial randomness (CSR). That is, one assumes that the point events arise independently of one another and with constant probability across $\Re$, loosely speaking.

Of course, many of the empirical point patterns we encounter do not appear to be generated from such a simple stochastic process. The depatures from CSR can be due to two types of effects.

First order effects

For a point process, the first-order properties pertain to the intensity of the process across space. Whether and how the intensity of the point pattern varies within our study region are questions that assume center stage. Such variation in the itensity of the pattern of, say, addresses of individuals with a certain type of non-infectious disease may reflect the underlying population density. In other words, although the point pattern of disease cases may display variation in intensity in our study region, and thus violate the constant probability of an event condition, that spatial drift in the pattern intensity could be driven by an underlying covariate.

Second order effects

The second channel by which departures from CSR can arise is through interaction and dependence between events in space. The canonical example being contagious diseases whereby the presence of an infected individual increases the probability of subsequent additional cases nearby.

When a pattern departs from expectation under CSR, this is suggestive that the underlying process may have some spatial structure that merits further investigation. Thus methods for detection of deviations from CSR and testing for alternative processes have given rise to a large literature in point pattern statistics.

Methods of Point Pattern Analysis in PySAL

The points module in PySAL implements basic methods of point pattern analysis organized into the following groups:

Point Processing
Centrography and Visualization
Quadrat Based Methods
Distance Based Methods

In the remainder of this notebook we shall focus on point processing.

import libpysal as ps
import numpy as np
from pointpats import PointPattern

Creating Point Patterns

From lists

We can build a point pattern by using Python lists of coordinate pairs $(s_0, s_1,\ldots, s_m)$ as follows:

points = [[66.22, 32.54], [22.52, 22.39], [31.01, 81.21],
          [9.47, 31.02],  [30.78, 60.10], [75.21, 58.93],
          [79.26,  7.68], [8.23, 39.93],  [98.73, 77.17],
          [89.78, 42.53], [65.19, 92.08], [54.46, 8.48]]
p1 = PointPattern(points)

p1.mbb

array([ 8.23,  7.68, 98.73, 92.08])

Thus $s_0 = (66.22, 32.54), \ s_{11}=(54.46, 8.48)$.

p1.summary()

Point Pattern
12 points
Bounding rectangle [(8.23,7.68), (98.73,92.08)]
Area of window: 7638.200000000002
Intensity estimate for window: 0.0015710507711240865
       x      y
0  66.22  32.54
1  22.52  22.39
2  31.01  81.21
3   9.47  31.02
4  30.78  60.10

type(p1.points)

pandas.core.frame.DataFrame

np.asarray(p1.points)

array([[66.22, 32.54],
       [22.52, 22.39],
       [31.01, 81.21],
       [ 9.47, 31.02],
       [30.78, 60.1 ],
       [75.21, 58.93],
       [79.26,  7.68],
       [ 8.23, 39.93],
       [98.73, 77.17],
       [89.78, 42.53],
       [65.19, 92.08],
       [54.46,  8.48]])

p1.mbb

array([ 8.23,  7.68, 98.73, 92.08])

From numpy arrays

points = np.asarray(points)
points

array([[66.22, 32.54],
       [22.52, 22.39],
       [31.01, 81.21],
       [ 9.47, 31.02],
       [30.78, 60.1 ],
       [75.21, 58.93],
       [79.26,  7.68],
       [ 8.23, 39.93],
       [98.73, 77.17],
       [89.78, 42.53],
       [65.19, 92.08],
       [54.46,  8.48]])

p1_np = PointPattern(points)
p1_np.summary()

Point Pattern
12 points
Bounding rectangle [(8.23,7.68), (98.73,92.08)]
Area of window: 7638.200000000002
Intensity estimate for window: 0.0015710507711240865
       x      y
0  66.22  32.54
1  22.52  22.39
2  31.01  81.21
3   9.47  31.02
4  30.78  60.10

From shapefiles

This example uses 200 randomly distributed points within the counties of Virginia. Coordinates are for UTM zone 17 N.

f = ps.examples.get_path('vautm17n_points.shp')
fo = ps.io.open(f)
pp_va = PointPattern(np.asarray([pnt for pnt in fo]))
fo.close()
pp_va.summary()

Point Pattern
200 points
Bounding rectangle [(273959.664381352,4049220.903414295), (972595.9895779632,4359604.85977962)]
Area of window: 216845506675.0557
Intensity estimate for window: 9.223156295311261e-10
               x             y
0  865322.486181  4.150317e+06
1  774479.213103  4.258993e+06
2  308048.692232  4.054700e+06
3  670711.529980  4.258864e+06
4  666254.475614  4.256514e+06

Attributes of PySAL Point Patterns

pp_va.summary()

Point Pattern
200 points
Bounding rectangle [(273959.664381352,4049220.903414295), (972595.9895779632,4359604.85977962)]
Area of window: 216845506675.0557
Intensity estimate for window: 9.223156295311261e-10
               x             y
0  865322.486181  4.150317e+06
1  774479.213103  4.258993e+06
2  308048.692232  4.054700e+06
3  670711.529980  4.258864e+06
4  666254.475614  4.256514e+06

pp_va.points

	x	y
0	865322.486181	4.150317e+06
1	774479.213103	4.258993e+06
2	308048.692232	4.054700e+06
3	670711.529980	4.258864e+06
4	666254.475614	4.256514e+06
5	664464.571678	4.061242e+06
6	784718.209785	4.076109e+06
7	972595.989578	4.183781e+06
8	657798.357403	4.253278e+06
9	682259.020242	4.282441e+06
10	727004.821077	4.068344e+06
11	705895.874812	4.266602e+06
12	828584.046576	4.065666e+06
13	713905.086059	4.316151e+06
14	881552.803340	4.091455e+06
15	469051.359337	4.117702e+06
16	316765.769715	4.074300e+06
17	697788.542435	4.060254e+06
18	735806.711384	4.169688e+06
19	857188.061626	4.069335e+06
20	840068.036835	4.157035e+06
21	554658.507423	4.056777e+06
22	273959.664381	4.057244e+06
23	751755.354691	4.212530e+06
24	862508.493456	4.068196e+06
25	608082.366460	4.137227e+06
26	783720.206483	4.131364e+06
27	648766.829060	4.193105e+06
28	560753.141222	4.059971e+06
29	659157.093262	4.157386e+06
...	...	...
170	693186.966524	4.139941e+06
171	845699.719699	4.231892e+06
172	796797.110375	4.313534e+06
173	691583.213674	4.074581e+06
174	752905.895923	4.166523e+06
175	963207.941343	4.165624e+06
176	611691.334171	4.049221e+06
177	777399.041143	4.170244e+06
178	781453.204801	4.124116e+06
179	675900.040876	4.059235e+06
180	530691.417350	4.087626e+06
181	555641.288115	4.122360e+06
182	532600.970765	4.051876e+06
183	800528.454702	4.335969e+06
184	516747.058864	4.104977e+06
185	647291.961412	4.223991e+06
186	673236.038854	4.292326e+06
187	534897.641241	4.129232e+06
188	789507.980935	4.240825e+06
189	701276.258284	4.199411e+06
190	669424.422196	4.276723e+06
191	602477.348747	4.146360e+06
192	872333.052082	4.156737e+06
193	773967.535489	4.145192e+06
194	803387.940279	4.173642e+06
195	876485.065262	4.148120e+06
196	621600.111400	4.177462e+06
197	450246.610116	4.106031e+06
198	740919.375814	4.359605e+06
199	797522.610898	4.208606e+06

200 rows × 2 columns

pp_va.head()

	x	y
0	865322.486181	4.150317e+06
1	774479.213103	4.258993e+06
2	308048.692232	4.054700e+06
3	670711.529980	4.258864e+06
4	666254.475614	4.256514e+06

pp_va.tail()

	x	y
195	876485.065262	4.148120e+06
196	621600.111400	4.177462e+06
197	450246.610116	4.106031e+06
198	740919.375814	4.359605e+06
199	797522.610898	4.208606e+06

Intensity Estimates

The intensity of a point process at point $s_i$ can be defined as:

$$\lambda(s_j) = \lim \limits_{|\mathbf{A}s_j| \to 0} \left \{ \frac{E(Y(\mathbf{A}s_j)}{|\mathbf{A}s_j|} \right \} $$

where $\mathbf{A}s_j$ is a small region surrounding location $s_j$ with area $|\mathbf{A}s_j|$, and $E(Y(\mathbf{A}s_j)$ is the expected number of event points in $\mathbf{A}s_j$.

The intensity is the mean number of event points per unit of area at point $s_j$.

Recall that one of the implications of CSR is that the intensity of the point process is constant in our study area $\Re$. In other words $\lambda(s_j) = \lambda(s_{j+1}) = \ldots = \lambda(s_n) = \lambda \ \forall s_j \in \Re$. Thus, if the area of $\Re$ = $|\Re|$ the expected number of event points in the study region is: $E(Y(\Re)) = \lambda |\Re|.$

In PySAL, the intensity is estimated by using a geometric object to encode the study region. We refer to this as the window, $W$. The reason for distinguishing between $\Re$ and $W$ is that the latter permits alternative definitions of the bounding object.

Intensity estimates are based on the following: $$\hat{\lambda} = \frac{n}{|W|}$$

where $n$ is the number of points in the window $W$, and $|W|$ is the area of $W$.

Intensity based on minimum bounding box: $$\hat{\lambda}_{mbb} = \frac{n}{|W_{mbb}|}$$

where $W_{mbb}$ is the minimum bounding box for the point pattern.

pp_va.lambda_mbb

9.223156295311263e-10

Intensity based on convex hull: $$\hat{\lambda}_{hull} = \frac{n}{|W_{hull}|}$$

where $W_{hull}$ is the convex hull for the point pattern.

pp_va.lambda_hull

1.5973789098179388e-09

Next steps

There is more to learn about point patterns in PySAL.

The centrographic notebook illustrates a number of spatial descriptive statistics and visualization of point patterns.

Clearly the window chosen will impact the intensity estimate. For more on windows see the window notebook.

To test if your point pattern departs from complete spatial randomness see the distance statistics notebook and quadrat statistics notebook.

To simulate different types of point processes in various windows see process notebook.

If you have point pattern data with additional attributes associated with each point see how to handle this in the marks notebook.