This page was generated from /home/runner/work/esda/esda/docs/notebooks/LOSH.ipynb. Interactive online version:

# Assessing local patterns of spatial heteroskedasticity¶

In the following notebook we review the local spatial heteroskedasticity (LOSH) statistic (\(H_i\)) put forward by Ord and Getis (2012). LOSH is meant as an accompainment to the local statistics that analyze the mean level of a spatial process. LOSH focuses on analyzing the variance of the spatial process.

As outlined by Ord and Getis, consider a 10 x 10 grid of property values. Within this grid, there is a central high-rent district (identified by cells that are 1) and a surrounding lower-value area (identified by cells that are 0. We can visualize this spatial arrangement as:

0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |

0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |

0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |

0 |
0 |
0 |
1 |
1 |
1 |
1 |
0 |
0 |
0 |

0 |
0 |
0 |
1 |
1 |
1 |
1 |
0 |
0 |
0 |

0 |
0 |
0 |
1 |
1 |
1 |
1 |
0 |
0 |
0 |

0 |
0 |
0 |
1 |
1 |
1 |
1 |
0 |
0 |
0 |

0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |

0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |

0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |

While we can see that the inner core of the high-rent district have similar values to their neigbhors (i.e. a cells of 1 surrounded by other cells of 1), this is less true when we consider the outer rim of the high-rent district. This rim represents a possible transition region between the stable inner high-rent area and the stable outer lower value areas. Ord and Getis seek to use the LOSH statistic (\(H_i\)) to identify this transitional region between the two areas.

# Understanding the LOSH statistic¶

Ord and Getis begin by defining a local mean, \(\bar{x}_i (d)\), as a re-scaled form of their \(G^*_i\) statistic. This local mean is:

Eq. 1

With a local mean defined, we can understand local residuals as the value of a local unit minus the local mean, or:

Eq. 2

These local residuals may be incorporated into Eq. 1 to form a final local dispersion statistic called \(H_i\):

Eq. 3

Note the addition of a new variable, \(a\). This handles the interpretation of the residuals. As explaiend by Ord and Getis:

When a = 1, we have an absolute deviations measure, Hi 1, and when a = 2 a variance measure, Hi2. Clearly, other choices are possible, along with various robust forms to avoid outliers. In order to produce a standard measure, we should divide by the mean absolute deviation or variance for the whole data set.

The default settings of the `losh()`

function set \(a=2\).

# Interpreting the LOSH statistic¶

Ord and Getis suggest that the \(H_i\) statistic may benefit from interpretation with the \(G^*_i\) statistic. The logic of this combined interpretation is that a the \(G^*_i\) value speaks to the local mean of the phenomenon of interest, whereas the \(H_i\) speaks to the local heterogeneity. Ord and Getis provide the following table as a simplified guide for interpretation:

Mean:nbsphinx-math:variance |
High \(H_i\) |
Low \(H_i\) |
---|---|---|

Large \(|G^*_i|\) |
A hot spot with heterogeneous local conditions |
A hot spot with similar surrounding areas; the map would indicate whether the affected region is larger than the single ‘cell’ |

Small \(|G^*_i|\) |
Heterogeneous local conditions but at a low average level (an unlikely event) |
Homogeneous local conditions and a low average level |

# Inference on the LOSH statistic¶

The current inference in the PySAL implementation of the LOSH statistic uses a \(\chi^2\) distribution. For each unit, we calculate a Z-score as:

with \(2/V_i\) degrees of freedom. \(V_i\) is the local variance of the unit, calculated as:

It is worthwhile to note that alternative methods for inference have been proposed in Xu et al (2014). While these methods are not yet implemented in PySAL, they are available in the `R`

`spdep`

package as `LOSH.mc`

. The PySAL function is comparable to the `spdep`

`LOSH`

and `LOSH.cs`

functions.

# Applying the LOSH statistic on a dataset¶

As a point of comparison, we now demonstrate the PySAL `losh`

function on the Boston Housing Dataset, which is also used as the docexample in `R`

`spdep`

`LOSH.cs`

. We are interested in the variable `NOX`

, which is the ‘…vector of nitric oxides concentration (parts per 10 million) per town’.

We first load the `Bostonhsg`

example dataset from `libpysal`

:

```
[1]:
```

```
import libpysal
import geopandas
boston = libpysal.examples.load_example('Bostonhsg')
boston_ds = geopandas.read_file(boston.get_path('boston.shp'))
```

```
/home/serge/anaconda3/envs/dev/lib/python3.8/site-packages/geopandas/_compat.py:84: UserWarning: The Shapely GEOS version (3.8.0-CAPI-1.13.1 ) is incompatible with the GEOS version PyGEOS was compiled with (3.8.1-CAPI-1.13.3). Conversions between both will be slow.
warnings.warn(
```

Then we construct a Queen weight structure:

```
[2]:
```

```
w = libpysal.weights.Queen.from_dataframe(boston_ds)
```

We can now import the `losh()`

function. Note that it is in the form of a scikit-learn type estimator, so we pass both a series of arguments and then call `.fit()`

.

```
[3]:
```

```
from esda.losh import LOSH
ls = LOSH(connectivity=w, inference="chi-square").fit(boston_ds['NOX'])
```

We can now examine the LOSH (\(H_i\)) values and their significance.

```
[4]:
```

```
ls.Hi[0:10]
```

```
[4]:
```

```
array([0.19690679, 0.51765774, 0.80382881, 0.80854441, 0.530667 ,
0.525579 , 0.83291425, 0.84215733, 0.48875154, 0.41955327])
```

```
[5]:
```

```
ls.pval[0:10]
```

```
[5]:
```

```
array([0.86292242, 0.61157688, 0.45697742, 0.34426167, 0.57934554,
0.55430556, 0.4135546 , 0.40999792, 0.54025022, 0.57801529])
```

If we want to map the LOSH (\(H_i\)) values, we need to add them back to the `boston_ds`

dataframe.

```
[6]:
```

```
boston_ds['Hi'] = ls.Hi
boston_ds['Hi_pval'] = ls.pval
```

We can now map the LOSH (\(H_i\)) values. Note that we choose a Quantile break scheme here for visualization purposes.

```
[7]:
```

```
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12,10), subplot_kw={'aspect':'equal'})
boston_ds.plot(column='Hi', scheme='Quantiles',
k=5, cmap='GnBu',
legend=True, ax=ax)
```

```
[7]:
```

```
<AxesSubplot:>
```

We can also examine significance of the values. We use cutoffs of 0.01, 0.05, 0.10, and above 0.10.

```
[8]:
```

```
fig, ax = plt.subplots(figsize=(12,10), subplot_kw={'aspect':'equal'})
boston_ds.plot(column='Hi_pval', cmap='RdBu',
legend=True, ax=ax,
scheme='user_defined',
classification_kwds={'bins':[0.01, 0.05, 0.10, 1]})
```

```
[8]:
```

```
<AxesSubplot:>
```