# Using the sampler

`spvcm`

is a generic gibbs sampling framework for spatially-correlated variance components models. The current supported models are:

`spvcm.both`

contains specifications with correlated errors in both levels, with the first statement`se/sma`

describing the lower level and the second statement`se/sma`

describing the upper level. In addition,`MVCM`

, the multilevel variance components model with no spatial correlation, is in the`both`

namespace.`spvcm.lower`

contains two specifications,`se/sma`

, that can be used for a variance components model with correlated lower-level errors.`spvcm.upper`

contains two specifications,`se/sma`

that can be used for a variance components model with correlated upper-level errors.

### Specification

These derive from a variance components specification:

Where:

- $\beta$, called
`Betas`

in code, is the marginal effect parameter. In this implementation, any region-level covariates $Z$ get appended to the end of $X$. So, if $X$ is $n \times p$ ($n$ observations of $p$ covariates) and $Z$ is $J \times p’$ ($p’$ covariates observed for $J$ regions), then the model’s $X$ matrix is $n \times (p + p’)$ and $\beta$ is $p + p’ \times 1$. - $\Psi_1$ is the covariance function for the response-level model. In the software, a separable covariance is assumed, so that $\Psi_1(\rho, \sigma^2) = \Psi_1(\rho) * I \sigma^2)$, where $I$ is the $n \times n$ covariance matrix. Thus, $\rho$ is the spatial autoregressive parameter and $\sigma^2$ is the variance parameter. In the software, $\Psi_1$ takes any of the following forms:
- Spatial Error (
`SE`

): $\Psi_1(\rho) = [(I - \rho \mathbf{W})’(I - \rho \mathbf{W})]^{-1} \sigma^2$ - Spatial Moving Average (
`SMA`

): $\Psi_1(\rho) = (I + \rho \mathbf{W})(I + \lambda \mathbf{W})’$ - Identity: $\Psi_1(\rho) = I$

- Spatial Error (
- $\Psi_2$ is the region-level covariance function, with region-level autoregressive parameter $\lambda$ and region-level variance $\tau^2$. It has the same potential forms as $\Psi_1$.
- $\alpha$, called
`Alphas`

in code, is the region-level random effect. In a variance components model, this is interpreted as a random effect for the upper-level. For a Varying-intercept format, this random component should be added to a region-level fixed effect to provide the varying intercept. This may also make it more difficult to identify the spatial parameter.

## Softare implementation

All of the possible combinations of Spatial Moving Average and Spatial Error processes are contained in the following classes. I will walk through estimating one below, and talk about the various features of the package.

First, the API of the package is defined by the `spvcm.api`

submodule. To load it, use `import spvcm.api as spvcm`

:

```
import spvcm as spvcm #package API
spvcm.both_levels.Generic # abstract customizable class, ignores rho/lambda, equivalent to MVCM
spvcm.both_levels.MVCM # no spatial effect
spvcm.both_levels.SESE # both spatial error (SE)
spvcm.both_levels.SESMA # response-level SE, region-level spatial moving average
spvcm.both_levels.SMASE # response-level SMA, region-level SE
spvcm.both_levels.SMASMA # both levels SMA
spvcm.upper_level.Upper_SE # response-level uncorrelated, region-level SE
spvcm.upper_level.Upper_SMA # response-level uncorrelated, region-level SMA
spvcm.lower_level.Lower_SE # response-level SE, region-level uncorrelated
spvcm.lower_level.Lower_SMA # response-level SMA, region-level uncorrelated
```

```
spvcm.lower_level.sma.model.Lower_SMA
```

Depending on the structure of the model, you need at least:

`X`

, data at the response (lower) level`Y`

, system response in the lower level`membership`

or`Delta`

, the membership vector relating each observation to its group or the “dummy variable” matrix encoding the same information.

Then, if spatial correlation is desired, `M`

is the “upper-level” weights matrix and `W`

the lower-level weights matrix.

Any upper-level data should be passed in $Z$, and have $J$ rows. To fit a varying-intercept model, include an identity matrix in $Z$. You can include state-level and response-level intercept terms simultaneously.

Finally, there are many configuration and tuning options that can be passed in at the start, or assigned after the model is initialized.

First, though, let’s set up some data for a model on southern counties predicting `HR90`

, the Homicide Rate in the US South in 1990, using the the percent of the labor force that is unemployed (`UE90`

), a principal component expressing the population structure (`PS90`

), and a principal component expressing resource deprivation.

We will also use the state-level average percentage of families below the poverty line and the average Gini coefficient at the state level for a $Z$ variable.

```
#seaborn is required for the traceplots
import pysal as ps
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import geopandas as gpd
%matplotlib inline
```

Reading in the data, we’ll extract these values we need from the dataframe.

```
data = ps.pdio.read_files(ps.examples.get_path('south.shp'))
gdf = gpd.read_file(ps.examples.get_path('south.shp'))
data = data[data.STATE_NAME != 'District of Columbia']
X = data[['UE90', 'PS90', 'RD90']].values
N = X.shape[0]
Z = data.groupby('STATE_NAME')[['FP89', 'GI89']].mean().values
J = Z.shape[0]
Y = data.HR90.values.reshape(-1,1)
```

Then, we’ll construct some queen contiguity weights from the files to show how to run a model.

```
W2 = ps.queen_from_shapefile(ps.examples.get_path('us48.shp'),
idVariable='STATE_NAME')
W2 = ps.w_subset(W2, ids=data.STATE_NAME.unique().tolist()) #only keep what's in the data
W1 = ps.queen_from_shapefile(ps.examples.get_path('south.shp'),
idVariable='FIPS')
W1 = ps.w_subset(W1, ids=data.FIPS.tolist()) #again, only keep what's in the data
W1.transform = 'r'
W2.transform = 'r'
```

With the data, upper-level weights, and lower-level weights, we can construct a membership vector *or* a dummy data matrix. For now, I’ll create the membership vector.

```
membership = data.STATE_NAME.apply(lambda x: W2.id_order.index(x)).values
```

But, we could also build the dummy variable matrix using `pandas`

, if we have a suitable categorical variable:

```
Delta_frame = pd.get_dummies(data.STATE_NAME)
Delta = Delta_frame.values
```

Every call to the sampler is of the following form:

`sampler(Y, X, W, M, Z, membership, Delta, n_samples, **configuration)`

Where `W`

, `M`

are passed if appropriate, `Z`

is passed if used, and only one of `membership`

or `Delta`

is required. In the end, `Z`

is appended to `X`

, so the effects pertaining to the upper level will be at the tail end of the $\beta$ effects vector. If both `Delta`

and `membership`

are supplied, they’re verified against each other to ensure that they agree before they are used in the model.

For all models, the membership vector or an equivalent dummy variable matrix is required. For models with correlation in the upper level, only the upper-level weights matrix $\mathbf{M}$ is needed. For lower level models, the lower-level weights matrix $\mathbf{W}$ is required. For models with correlation in both levels, both $\mathbf{W}$ and $\mathbf{M}$ are required.

Every sampler uses, either in whole or in part, `spvcm.both.generic`

, which implements the full generic sampler discussed in the working paper. For efficiency, the upper-level samplers modify this runtime to avoid processing the full lower-level covariance matrix.

Like many of the `R`

packages dedicated to bayesian models, configuration occurs by passing the correct dictionary to the model call. In addition, you can “setup” the model, configure it, and then run samples in separate steps.

The most common way to call the sampler is something like:

```
vcsma = spvcm.upper_level.Upper_SMA(Y, X, M=W2, Z=Z,
membership=membership,
n_samples=5000,
configs=dict(tuning=1000,
adapt_step=1.01))
```

This model, `spvcm.upper_level.Upper_SMA`

, is a variance components/varying intercept model with a state-level SMA-correlated error.

Thus, there are only five parameters in this model, since $\rho$, the lower-level autoregressive parameter, is constrained to zero:

```
vcsma.trace.varnames
```

```
['Alphas', 'Betas', 'Sigma2', 'Tau2', 'Lambda']
```

The results and state of the sampler are stored within the `vcsma`

object. I’ll step through the most important parts of this object.

`trace`

The quickest way to get information out of the model is via the trace object. This is where the results of the tracked parameters are stored each iteration. Any variable in the sampler state can be added to the tracked params. Trace objects are essentially dictionaries with the keys being the name of the tracked parameter and the values being a list of each iteration’s sampler output.

```
vcsma.trace.varnames
```

```
['Alphas', 'Betas', 'Sigma2', 'Tau2', 'Lambda']
```

In this case, `Lambda`

is the upper-level moving average parameter, `Alphas`

is the vector of correlated group-level random effects, `Tau2`

is the upper-level variance, `Betas`

are the marginal effects, and `Sigma2`

is the lower-level error variance.

I’ve written two helper functions for working with traces. First is to just dump all the output into a pandas dataframe, which makes it super easy to do work on the samples, or write them out to `csv`

and assess convergence in R’s `coda`

package.

```
trace_dataframe = vcsma.trace.to_df()
```

the dataframe will have columns containing the elements of the parameters and each row is a single iteration of the sampler:

```
trace_dataframe.head()
```

Sigma2 | Tau2 | Lambda | Alphas_0 | Alphas_1 | Alphas_2 | Alphas_3 | Alphas_4 | Alphas_5 | Alphas_6 | ... | Alphas_12 | Alphas_13 | Alphas_14 | Alphas_15 | Betas_0 | Betas_1 | Betas_2 | Betas_3 | Betas_4 | Betas_5 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

0 | 31.233393 | 0.923435 | -0.047215 | -1.261656 | -0.736935 | -0.399375 | -0.034331 | -2.236596 | -1.213818 | -0.568231 | ... | 0.106951 | 1.516595 | -1.364607 | 2.041536 | 10.531876 | -0.483666 | 2.109549 | 4.480676 | 0.003791 | -0.540474 |

1 | 34.099349 | 2.240627 | -0.047215 | -1.842016 | -0.847182 | -0.389802 | -0.111497 | -2.765864 | -2.446336 | -0.447575 | ... | 0.798257 | 1.826385 | 0.221681 | 2.458930 | 5.712048 | -0.236144 | 2.160096 | 4.133212 | -0.054797 | 10.323832 |

2 | 32.481750 | 2.785222 | -0.047215 | -2.489083 | 0.720136 | -1.188099 | -0.501012 | -3.213930 | -2.258950 | -1.216194 | ... | 0.859106 | 2.192760 | -1.669444 | 3.091403 | 12.897608 | -0.215380 | 1.741864 | 3.757071 | -0.011310 | -9.057812 |

3 | 34.457911 | 1.190100 | -0.047215 | -1.201301 | -1.893444 | -0.217819 | -0.681230 | -2.466085 | -1.134824 | -1.307211 | ... | 0.386158 | 1.437928 | -0.062408 | 2.459207 | 4.788274 | -0.172863 | 1.685140 | 3.760042 | -0.049314 | 11.777028 |

4 | 32.185869 | 2.298772 | -0.047215 | -1.554065 | -0.884244 | 1.800027 | -0.027299 | -2.574601 | -1.991742 | -0.659893 | ... | -0.148268 | 1.202387 | -0.822042 | 1.747517 | 3.763874 | -0.237071 | 2.108391 | 3.923432 | -0.041574 | 15.185345 |

5 rows × 25 columns

You can write this out to a csv or analyze it in memory like a typical pandas dataframes:

```
trace_dataframe.mean()
```

```
Sigma2 32.191602
Tau2 1.875002
Lambda 0.590718
Alphas_0 -1.511708
Alphas_1 -0.091938
Alphas_2 -0.034680
Alphas_3 0.119041
Alphas_4 -2.194331
Alphas_5 -0.865278
Alphas_6 -0.313610
Alphas_7 1.154932
Alphas_8 0.196744
Alphas_9 -0.106956
Alphas_10 1.071748
Alphas_11 0.372573
Alphas_12 0.264176
Alphas_13 1.943292
Alphas_14 -0.562918
Alphas_15 2.167571
Betas_0 8.178978
Betas_1 -0.261140
Betas_2 1.793932
Betas_3 3.974848
Betas_4 -0.006224
Betas_5 2.013807
dtype: float64
```

The second is a method to plot the traces:

```
fig, ax = vcsma.trace.plot()
plt.show()
```

The trace object can be sliced by (chain, parameter, index) tuples, or any subset thereof.

```
vcsma.trace['Lambda',-4:] #last 4 draws of lambda
```

```
array([0.72462283, 0.72462283, 0.34900417, 0.34900417])
```

```
vcsma.trace[['Tau2', 'Sigma2'], 0:2] #the first 2 variance parameters
```

```
{'Tau2': [0.923435131177113, 2.240627234434746],
'Sigma2': [31.23339280347621, 34.09934860150135]}
```

We only ran a single chain, so the first index is assumed to be zero. You can run more than one chain in parallel, using the builtin python `multiprocessing`

library:

```
vcsma_p = spvcm.upper_level.Upper_SMA(Y, X, M=W2, Z=Z,
membership=membership,
#run 3 chains
n_samples=5000, n_jobs=3,
configs=dict(tuning=500,
adapt_step=1.01))
```

```
vcsma_p.trace[0, 'Betas', -1] #the last draw of Beta on the first chain.
```

```
array([[ 8.34392377],
[-0.29268161],
[ 1.85086248],
[ 4.0572153 ],
[ 0.12809572],
[-2.34258149]])
```

```
vcsma_p.trace[1, 'Betas', -1] #the last draw of Beta on the second chain
```

```
array([[ 8.12849684],
[-0.26135619],
[ 1.43630595],
[ 4.01361537],
[ 0.15468895],
[-4.25558513]])
```

and the chain plotting works also for the multi-chain traces. In addition, there are quite a few traceplot options, and all the plots are returned by the methods as matplotlib objects, so they can also be saved using `plt.savefig()`

.

```
vcsma_p.trace.plot(burn=1000, thin=10)
plt.suptitle('SMA of Homicide Rate in Southern US Counties', y=0, fontsize=20)
#plt.savefig('trace.png') #saves to a file called "trace.png"
plt.show()
```

```
vcsma_p.trace.plot(burn=-100, varnames='Lambda') #A negative burn-in works like negative indexing in Python & R
plt.suptitle('First 100 iterations of $\lambda$', fontsize=20, y=.02)
plt.show() #so this plots Lambda in the first 100 iterations.
```

To get stuff like posterior quantiles, you can use the attendant pandas dataframe functionality, like `describe`

.

```
df = vcsma.trace.to_df()
```

```
df.describe()
```

Sigma2 | Tau2 | Lambda | Alphas_0 | Alphas_1 | Alphas_2 | Alphas_3 | Alphas_4 | Alphas_5 | Alphas_6 | ... | Alphas_12 | Alphas_13 | Alphas_14 | Alphas_15 | Betas_0 | Betas_1 | Betas_2 | Betas_3 | Betas_4 | Betas_5 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

count | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | ... | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 |

mean | 32.191602 | 1.875002 | 0.590718 | -1.511708 | -0.091938 | -0.034680 | 0.119041 | -2.194331 | -0.865278 | -0.313610 | ... | 0.264176 | 1.943292 | -0.562918 | 2.167571 | 8.178978 | -0.261140 | 1.793932 | 3.974848 | -0.006224 | 2.013807 |

std | 1.215098 | 1.109614 | 0.310978 | 0.851727 | 1.316852 | 0.994013 | 0.741269 | 0.711876 | 0.824202 | 0.845083 | ... | 0.766177 | 0.721764 | 0.841160 | 0.834297 | 3.238247 | 0.078861 | 0.202619 | 0.229363 | 0.086044 | 9.466125 |

min | 28.402067 | 0.191395 | -0.559725 | -4.727676 | -5.917587 | -3.671202 | -3.125425 | -4.846071 | -4.800286 | -3.748046 | ... | -3.088506 | -0.282006 | -3.737387 | -0.651108 | -1.523421 | -0.567038 | 1.083104 | 3.093189 | -0.329770 | -29.672208 |

25% | 31.350144 | 1.131752 | 0.396936 | -2.059361 | -0.923023 | -0.692784 | -0.382492 | -2.664186 | -1.400970 | -0.882462 | ... | -0.243498 | 1.435780 | -1.129984 | 1.588522 | 5.942375 | -0.313919 | 1.658467 | 3.822273 | -0.063075 | -4.347806 |

50% | 32.154740 | 1.613698 | 0.652983 | -1.480018 | -0.099072 | -0.078723 | 0.084165 | -2.191222 | -0.870115 | -0.329271 | ... | 0.245405 | 1.902598 | -0.552477 | 2.149598 | 8.184837 | -0.260328 | 1.794327 | 3.974249 | -0.004559 | 2.020761 |

75% | 33.023853 | 2.324573 | 0.856933 | -0.939373 | 0.718350 | 0.599587 | 0.583585 | -1.722762 | -0.313782 | 0.226628 | ... | 0.760613 | 2.404884 | -0.009692 | 2.697399 | 10.406652 | -0.209999 | 1.929081 | 4.129607 | 0.049533 | 8.396030 |

max | 36.857707 | 16.142846 | 0.996376 | 1.885945 | 5.544127 | 4.107627 | 2.975758 | 0.513466 | 1.952173 | 3.301225 | ... | 3.775299 | 4.693092 | 2.795567 | 5.473443 | 21.099968 | 0.063420 | 2.660798 | 4.763580 | 0.346991 | 31.900158 |

8 rows × 25 columns

There is also a `trace.summarize`

function that will compute various things contained in `spvcm.diagnostics`

on the chain. It takes a while for large chains, because the `statsmodels.tsa.AR`

estimator is much slower than the `ar`

estimator in `R`

. If you have rpy2 installed *and* `CODA`

installed in your R environment, I attempt to use R directly.

```
vcsma.trace.summarize()
```

mean | HPD_low | median | HPD_high | std | N_iters | N_effective | AR_loss | ||
---|---|---|---|---|---|---|---|---|---|

Chain_0 | Sigma2 | 32.191602 | 29.907175 | 32.154740 | 34.637546 | 1.215098 | 5000 | 4788 | 0.0424 |

Tau2 | 1.875002 | 0.363706 | 1.613698 | 3.967562 | 1.109614 | 5000 | 1156 | 0.7688 | |

Lambda | 0.590718 | -0.010077 | 0.652983 | 0.988580 | 0.310978 | 5000 | 243 | 0.9514 | |

Alphas_0 | -1.511708 | -3.261126 | -1.480018 | 0.112208 | 0.851727 | 5000 | 558 | 0.8884 | |

Alphas_1 | -0.091938 | -2.717617 | -0.099072 | 2.510040 | 1.316852 | 5000 | 2415 | 0.5170 | |

Alphas_2 | -0.034680 | -1.939546 | -0.078723 | 1.956138 | 0.994013 | 5000 | 871 | 0.8258 | |

Alphas_3 | 0.119041 | -1.240382 | 0.084165 | 1.660025 | 0.741269 | 5000 | 419 | 0.9162 | |

Alphas_4 | -2.194331 | -3.625861 | -2.191222 | -0.832241 | 0.711876 | 5000 | 419 | 0.9162 | |

Alphas_5 | -0.865278 | -2.595160 | -0.870115 | 0.644972 | 0.824202 | 5000 | 398 | 0.9204 | |

Alphas_6 | -0.313610 | -1.913203 | -0.329271 | 1.459151 | 0.845083 | 5000 | 303 | 0.9394 | |

Alphas_7 | 1.154932 | -0.462574 | 1.116507 | 2.936222 | 0.874826 | 5000 | 484 | 0.9032 | |

Alphas_8 | 0.196744 | -1.471783 | 0.157301 | 1.876040 | 0.875341 | 5000 | 237 | 0.9526 | |

Alphas_9 | -0.106956 | -1.671481 | -0.119013 | 1.397663 | 0.780549 | 5000 | 447 | 0.9106 | |

Alphas_10 | 1.071748 | -0.629166 | 1.058539 | 2.726753 | 0.861175 | 5000 | 497 | 0.9006 | |

Alphas_11 | 0.372573 | -1.162294 | 0.366974 | 1.930433 | 0.799864 | 5000 | 409 | 0.9182 | |

Alphas_12 | 0.264176 | -1.130617 | 0.245405 | 1.871470 | 0.766177 | 5000 | 484 | 0.9032 | |

Alphas_13 | 1.943292 | 0.589564 | 1.902598 | 3.395755 | 0.721764 | 5000 | 249 | 0.9502 | |

Alphas_14 | -0.562918 | -2.205367 | -0.552477 | 1.125969 | 0.841160 | 5000 | 510 | 0.8980 | |

Alphas_15 | 2.167571 | 0.555624 | 2.149598 | 3.804761 | 0.834297 | 5000 | 409 | 0.9182 | |

Betas_0 | 8.178978 | 1.900055 | 8.184837 | 14.474289 | 3.238247 | 5000 | 2668 | 0.4664 | |

Betas_1 | -0.261140 | -0.410947 | -0.260328 | -0.101738 | 0.078861 | 5000 | 2513 | 0.4974 | |

Betas_2 | 1.793932 | 1.384019 | 1.794327 | 2.176764 | 0.202619 | 5000 | 4310 | 0.1380 | |

Betas_3 | 3.974848 | 3.508665 | 3.974249 | 4.408589 | 0.229363 | 5000 | 3408 | 0.3184 | |

Betas_4 | -0.006224 | -0.178816 | -0.004559 | 0.162295 | 0.086044 | 5000 | 418 | 0.9164 | |

Betas_5 | 2.013807 | -16.324642 | 2.020761 | 20.862976 | 9.466125 | 5000 | 4064 | 0.1872 |

So, 5000 iterations, but many parameters have an effective sample size that’s much less than this. There’s debate about whether it’s necesasry to thin these samples in accordance with the effective size, and I think you should thin your sample to the effective size and see if it affects your HPD/Standard Errorrs.

The existing python packages for MCMC diagnostics were incorrect. So, I’ve implemented many of the diagnostics from `CODA`

, and have verified that the diagnostics comport with `CODA`

diagnostics. One can also use `numpy`

& `statsmodels`

functions. I’ll show some types of analysis.

```
from statsmodels.api import tsa
#if you don't have it, try removing the comment and:
#! pip install statsmodels
```

For example, a plot of the partial autocorrelation in $\lambda$, the upper-level spatial moving average parameter, over the last half of the chain is:

```
plt.plot(tsa.pacf(vcsma.trace['Lambda', -2500:]))
```

```
[<matplotlib.lines.Line2D at 0x7fcfe122d7f0>]
```

So, the chain is close-to-first order:

```
tsa.pacf(df.Lambda)[0:3]
```

```
array([1. , 0.90178845, 0.03398422])
```

We could do this for many parameters, too. An Autocorrelation/Partial Autocorrelation plot can be made of the marginal effects by:

```
betas = [c for c in df.columns if c.startswith('Beta')]
f,ax = plt.subplots(len(betas), 2, figsize=(10,8))
for i, col in enumerate(betas):
ax[i,0].plot(tsa.acf(df[col].values))
ax[i,1].plot(tsa.pacf(df[col].values)) #the pacf plots take a while
ax[i,0].set_title(col +' (ACF)')
ax[i,1].set_title('(PACF)')
f.tight_layout()
plt.show()
```

As far as the builtin diagnostics for convergence and simulation quality, the `diagnostics`

module exposes a few things:

Geweke statistics for differences in means between chain components:

```
gstats = spvcm.diagnostics.geweke(vcsma, varnames='Tau2') #takes a while
print(gstats)
```

```
[{'Tau2': array([-0.70525936, -0.86465127, -0.66491713, -1.10260496, -0.85742401,
-0.57409951, -0.41226321, -1.20085802, -1.45556743, -1.50305192,
-1.36917624, -1.7651641 , -1.62251801, -1.91653818, -2.95959606,
-3.25223318, -3.00471879, -1.58320811, -1.49846186, -1.84009018,
-1.94378006, -1.55312447, -1.74487948, -1.77123135, -1.25633365,
-1.75786086, -0.88990747, -0.68509239, -0.66391378, 0.32228814,
0.98220296, 1.03257125, 0.93549226, 0.88218052, 0.41084557,
0.80814834, -0.23443594, 0.2413701 , -0.73956236, -0.77123872,
-0.74954084, 0.1017357 , -0.141298 , -0.11990577, 1.05561834,
1.04085492, 1.58169489, 1.69289289, 2.08680592, 1.98418018])}]
```

Typically, this means the chain is converged at the given “bin” count if the line stays within $\pm2$. The geweke statistic is a test of differences in means between the given chunk of the chain and the remaining chain. If it’s outside of +/- 2 in the early part of the chain, you should discard observations early in the chain. If you get extreme values of these statistics throughout, you need to keep running the chain.

```
plt.plot(gstats[0]['Tau2'][:-1])
```

```
[<matplotlib.lines.Line2D at 0x7fcfe12a9eb8>]
```

We can also compute Monte Carlo Standard Errors like in the `mcse`

R package, which represent the intrinsic error contained in the estimate:

```
spvcm.diagnostics.mcse(vcsma, varnames=['Tau2', 'Sigma2'])
```

```
{'Sigma2': 0.01727439579905615, 'Tau2': 0.031562453419661775}
```

Another handy statistic is the Partial Scale Reduction factor, which measures of how likely a set of chains run in parallel have converged to the same stationary distribution. It provides the difference in variance between between chains vs. within chains.

If these are significantly larger than one (say, 1.5), the chain probably has not converged. Being marginally below $1$ is fine, too.

```
spvcm.diagnostics.psrf(vcsma_p, varnames=['Tau2', 'Sigma2'])
```

```
{'Tau2': 1.0077401382404925, 'Sigma2': 0.9998141761817744}
```

Highest posterior density intervals provide a kind of interval estimate for parameters in Bayesian models:

```
spvcm.diagnostics.hpd_interval(vcsma, varnames=['Betas', 'Lambda', 'Sigma2'])
```

```
{'Betas': [(1.9000548460393478, 14.474288717664574),
(-0.41094673834079987, -0.10173775264997412),
(1.384018831652215, 2.1767635687081506),
(3.508665394133376, 4.408589008319321),
(-0.17881562058023562, 0.16229519966707567),
(-16.32464174373274, 20.862975934687753)],
'Sigma2': (29.9071748836272, 34.63754603742357),
'Lambda': (-0.010077055754050324, 0.9885795446636961)}
```

Sometimes, you want to apply arbitrary functions to each parameter trace. To do this, I’ve written a `map`

function that works like the python builtin `map`

. For example, if you wanted to get arbitrary percentiles from the chain:

```
vcsma.trace.map(np.percentile,
varnames=['Lambda', 'Tau2', 'Sigma2'],
#arguments to pass to the function go last
q=[25, 50, 75])
```

```
[{'Lambda': array([0.39693634, 0.65298256, 0.85693311]),
'Tau2': array([1.13175212, 1.61369846, 2.32457265]),
'Sigma2': array([31.35014374, 32.15473988, 33.02385302])}]
```

In addition, you can pop the trace results pretty simply to a `.csv`

file and analyze it elsewhere, like if you want to use use the `coda`

Bayesian Diagnostics package in `R`

.

To write out a model to a csv, you can use:

```
vcsma.trace.to_csv('./model_run.csv')
```

And, you can even load traces from csvs:

```
tr = spvcm.abstracts.Trace.from_csv('./model_run.csv')
print(tr.varnames)
tr.plot(varnames=['Tau2'])
```

```
['Lambda', 'Sigma2', 'Tau2', 'Alphas', 'Betas']
```

```
(<Figure size 576x144 with 2 Axes>,
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7fcfe12fd5f8>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7fcfe12c5198>]],
dtype=object))
```

# Working with models: `draw`

and `sample`

These two functions are used to call the underlying Gibbs sampler. They take no arguments, and operate on the sampler in place. `draw`

provides a single new sample:

```
vcsma.draw()
```

And sample steps forward an arbitrary number of times:

```
vcsma.sample(10)
```

At this point, we did 5000 initial samples and 11 extra samples. Thus:

```
vcsma.cycles
```

```
5011
```

Parallel models can suspend/resume sampling too:

```
vcsma_p.sample(10)
```

```
vcsma_p.cycles
```

```
5010
```

Under the hood, it’s the `draw`

method that actually ends up calling one run of `model._iteration`

, which is where the actual statistical code lives. Then, it updates all `model.traced_params`

by adding their current value in `model.state`

to `model.trace.`

In addition, `model._finalize`

is called the first time sampling is run, which computes some of the constants & derived quantities that save computing time.

# Working with models: `state`

This is the collection of current values in the sampler. To be efficient, Gibbs sampling must keep around some of the computations used in the simulation, since sometimes the same terms show up in different conditional posteriors. So, the current values of the sampler are stored in `state`

.

All of the following are tracked in the state:

```
print(vcsma.state.keys())
```

```
dict_keys(['X', 'Y', 'M', 'W', 'Delta', 'N', 'J', 'p', 'Sigma2_a0', 'Sigma2_b0', 'Betas_cov0', 'Betas_mean0', 'Tau2_a0', 'Tau2_b0', 'Log_Lambda0', 'Log_Rho0', 'Rho_min', 'Rho_max', 'Lambda_min', 'Lambda_max', 'Betas', 'Alphas', 'Sigma2', 'Tau2', 'Rho', 'Lambda', 'Psi_1', 'Psi_1i', 'Psi_2', 'Psi_2i', 'In', 'Ij', 'Betas_cov0i', 'Betas_covm', 'Sigma2_an', 'Tau2_an', 'PsiRhoi', 'PsiLambdai', 'XtX', 'DeltatDelta', 'DeltaAlphas', 'XBetas', 'initial_values'])
```

If you want to track how something (maybe a hyperparameter) changes over sampling, you can pass `extra_traced_params`

to the model declaration:

```
example = spvcm.upper_level.Upper_SMA(Y, X, M=W2, Z=Z,
membership=membership,
n_samples=250,
extra_traced_params = ['DeltaAlphas'],
configs=dict(tuning=500, adapt_step=1.01))
example.trace.varnames
```

```
['Alphas', 'Betas', 'Sigma2', 'Tau2', 'Lambda', 'DeltaAlphas']
```

`configs`

this is where configuration options for the various MCMC steps are stored. For multilevel variance components models, these are called $\rho$ for the lower-level error parameter and $\lambda$ for the upper-level parameter. Two exact sampling methods are implemented, Metropolis sampling & Slice sampling.

Each MCMC step has its own config:

```
vcsma.configs
```

```
{'Rho': <spvcm.steps.Metropolis at 0x7fcfe1438780>,
'Lambda': <spvcm.steps.Metropolis at 0x7fcfe1438828>}
```

Since `vcsma`

is an upper-level-only model, the `Rho`

config is skipped. But, we can look at the `Lambda`

config. The number of accepted `lambda`

draws is contained in :

```
vcsma.configs.Lambda.accepted
```

```
725
```

so, the acceptance rate is

```
vcsma.configs.Lambda.accepted / float(vcsma.cycles)
```

```
0.14468170025942925
```

Also, if you want to get verbose output from the metropolis sampler, there is a “debug” flag:

```
example = spvcm.upper_level.Upper_SMA(Y, X, M=W2, Z=Z,
membership=membership,
n_samples=500,
configs=dict(tuning=250,
adapt_step=1.01,
debug=True))
```

Which stores the information about each iteration in a list, accessible from `model.configs.<parameter>._cache`

:

```
example.configs.Lambda._cache[-1] #let's only look at the last one
```

```
{'jump': 2.1374234835161063,
'current_logp': array([-8.76422011]),
'new_logp': array([-8.76422011]),
'accepted': False}
```

Configuration of the MCMC steps is done using the `config`

options dictionary, like done in `spBayes`

in `R`

. The actual configuration classes exist in spvcm.steps:

```
from spvcm.steps import Metropolis, Slice
```

Most of the common options are:

### Metropolis

`jump`

: the starting standard deviation of the proposal distribution`tuning`

: the number of iterations to tune the scale of the proposal`ar_low`

: the lower bound of the target acceptance rate range`ar_hi`

: the upper bound of the target acceptance rate range`adapt_step`

: a number (bigger than 1) that will be used to modify the jump in order to keep the acceptance rate betwen`ar_lo`

and`ar_hi`

. Values much larger than`1`

result in much more dramatic tuning.

### Slice

`width`

: starting width of the level set`adapt`

: number of previous slices use in the weighted average for the next slice. If`0`

, the`width`

is not dynamically tuned.

```
example = spvcm.upper_level.Upper_SMA(Y, X, M=W2, Z=Z,
membership=membership,
n_samples=500,
configs=dict(tuning=250,
adapt_step=1.01,
debug=True, ar_low=.1, ar_hi=.4))
```

```
example.configs.Lambda.ar_hi, example.configs.Lambda.ar_low
```

```
(0.4, 0.1)
```

```
example_slicer = spvcm.upper_level.Upper_SMA(Y, X, M=W2, Z=Z,
membership=membership,
n_samples=500,
configs=dict(Lambda_method='slice'))
```

```
example_slicer.trace.plot(varnames='Lambda')
plt.show()
```

```
example_slicer.configs.Lambda.adapt, example_slicer.configs.Lambda.width
```

```
(0, 0.5)
```

# Working with models: customization

If you’re doing heavy customization, it makes the most sense to first initialize the class without sampling. We did this before when showing how the “extra_traced_params” option worked.

To show, let’s initialize a double-level SAR-Error variance components model, but not actually draw anything.

To do this, you pass the option `n_samples=0`

.

```
vcsese = spvcm.both_levels.SESE(Y, X, W=W1, M=W2, Z=Z,
membership=membership,
n_samples=0)
```

This sets up a two-level spatial error model with the default uninformative configuration. This means the prior precisions are all `I * .001*`

, prior means are all 0, spatial parameters are set to `-1/(n-1)`

, and prior scale factors are set arbitrarily.

### Configs

Options are set by assgning to the relevant property in `model.configs`

.

The model configuration object is another dictionary with a few special methods.

Configuration options are stored for each parameter separately:

```
vcsese.configs
```

```
{'Rho': <spvcm.steps.Metropolis at 0x7fcfe13004a8>,
'Lambda': <spvcm.steps.Metropolis at 0x7fcfe13002b0>}
```

So, for example, if we wanted to turn off adaptation in the upper-level parameter, and fix the Metrpolis jump variance to `.25`

:

```
vcsese.configs.Lambda.max_tuning = 0
vcsese.configs.Lambda.jump = .25
```

### Priors

Another thing that might be interesting (though not “bayesian”) would be to fix the prior mean of $\beta$ to the OLS estimates. One way this could be done would be to pull the `Delta`

matrix out from the state, and estimate:
using `PySAL`

:

```
Delta = vcsese.state.Delta
DeltaZ = Delta.dot(Z)
vcsese.state.Betas_mean0 = ps.spreg.OLS(Y, np.hstack((X, DeltaZ))).betas
```

### Starting Values

If you wanted to start the sampler at a given starting value, you can do so by assigning that value to the `Lambda`

value in `state`

.

```
vcsese.state.Lambda = -.25
```

Sometimes, it’s suggested that you start the beta vector randomly, rather than at zero. For the parallel sampling, the model starting values are adjusted to induce overdispersion in the start values.

You could do this manually, too:

```
vcsese.state.Betas += np.random.uniform(-10, 10, size=(vcsese.state.p,1))
```

### Spatial Priors

Changing the spatial parameter priors is also done by changing their prior in state. This prior must be a function that takes a value of the parameter and return the log of the prior probability for that value.

For example, we could assign `P(\lambda) = Beta(2,1)`

and zero if outside $(0,1)$, and asign $\rho$ a truncated $\mathcal{N}(0,.5)$ prior by first defining their functional form:

```
from scipy import stats
```

```
def Lambda_prior(val):
if (val < 0) or (val > 1):
return -np.inf
return np.log(stats.beta.pdf(val, 2,1))
```

```
def Rho_prior(val):
if (val > .5) or (val < -.5):
return -np.inf
return np.log(stats.truncnorm.pdf(val, -.5, .5, loc=0, scale=.5))
```

And then assigning to their symbols, `LogLambda0`

and `LogRho0`

in the state:

```
vcsese.state.LogLambda0 = Lambda_prior
vcsese.state.LogRho0 = Rho_prior
```

### Performance

The efficiency of the sampler is contingent on the lower-level size. If we were to estimate the draw in a dual-level SAR-Error Variance Components iteration:

```
%timeit vcsese.draw()
```

```
26 ms ± 2.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

To make it easy to work with the model, you can interrupt and resume sampling using keyboard interrupts (`ctrl-c`

or the `stop`

button in the notebook).

```
%time vcsese.sample(100)
```

```
CPU times: user 10.4 s, sys: 148 ms, total: 10.5 s
Wall time: 2.64 s
```

```
vcsese.sample(10)
```

# Under the Hood

### Package Structure

Most of the tools in the package are stored in relevant python files in the top level or a dedicated subfolder. Explaining a few:

`abstracts.py`

- the abstract class machinery to iterate over a sampling loop. This is where the classes are defined, like`Trace`

,`Sampler_Mixin`

, or`Hashmap`

.`plotting.py`

- tools for plotting output`steps.py`

- the step method definitions`verify.py`

- like`user`

checks in`pysal.spreg`

, this contains a few sanity checks.`utils.py`

- contains statistical or numerical utilities to make the computation easier, like cholesky multivariate normal sampling, more sparse utility functions, etc.`diagnostics.py`

- all the diagnostics`priors.py`

- definitions of alternative prior forms. Right now, this is pretty simple.`sqlite.py`

- functions to use a sqlite database instead of an in-memory chain are defined here.

### The implementation of a Model

The package is implemented so that every “model type” first sends off to the `spvcm.both.Base_Generic`

, which sets up the state, trace, and priors.

Models are added by writing a `model.py`

file and possibly a `sample.py`

file. The `model.py`

file defines a `Base/User`

class pair (like `spreg`

) that sets up the state and trace. It must define hyperparameters, and can precompute objects used in the sampling loop. The base class should inherit from `Sampler_Mixin`

, which defines all of the machinery of sampling.

The loop through the conditional posteriors should be defined in `model.py`

, in the `model._iteration`

function. This should update the model state in place.

The model may also define a `_finalize`

function which is run once before sampling.

So, if I write a new model, like a varying-intercept model with endogenously-lagged intercepts, I would write a `model.py`

containing something like:

```
class Base_VISAR(spvcm.generic.Base_Generic):
def __init__(self, Y, X, M, membership=None, Delta=None,
extra_traced_params=None, #record extra things in state
n_samples=1000, n_jobs=1, #sampling config
priors = None, # dict with prior values for params
configs=None, # dict with configs for MCMC steps
starting_values=None, # dict with starting values
truncation=None, # options to truncate MCMC step priors
center=False, # Whether to center the X,Z matrices
scale=False # Whether re-scale the X,Z matrices
):
super(Base_VISAR, self).__init__(self, Y, X, M, W=None,
membership=membership,
Delta=Delta,
n_samples=0, n_jobs=n_jobs,
priors=priors, configs=configs,
starting_values=starting_values,
truncation=truncation,
center=center,
scale=scale
)
self.sample(n_samples, n_jobs=n_jobs)
def _finalize(self):
# the degrees of freedom of the variance parameter is constant
self.state.Sigma2_an = self.state.N/2 + self.state.Sigma2_a0
...
def _iteration(self):
# computing the values needed to sample from the conditional posteriors
mean = spdot(X.T, spdot(self.PsiRhoi, X)) / Sigma2 + self.state.bmean0
...
...
```

I’ve organized the directories in this project into `both_levels`

, `upper_level`

, `lower_level`

, and `hierarchical`

, which contains some of the spatially-varying coefficient models & other models I’m working on that are unrelated to the multilevel variance components stuff.

Since most of the `_iteration`

loop is the same between models, most of the models share the same sampling code, but customize the structure of the covariance in each level. These covariance variables are stored in the `state.Psi_1`

, for the lower-level covariance, and `state.Psi_2`

for the upper-level covariance. Likewise, the precision functions are `state.Psi_1i`

and `state.Psi_2i`

.

For example:

```
vcsese.state.Psi_1 #lower-level covariance
```

```
<function spvcm.utils.se_covariance(param, W, sparse=False)>
```

```
vcsese.state.Psi_2 #upper-level covariance
```

```
<function spvcm.utils.se_covariance(param, W, sparse=False)>
```

```
vcsma.state.Psi_2 #upper-level covariance
```

```
<function spvcm.utils.sma_covariance(param, W, sparse=True)>
```

```
vcsma.state.Psi_2i
```

```
<function spvcm.utils.sma_precision(param, W, sparse=False)>
```

```
vcsma.state.Psi_1
```

```
<function spvcm.utils.ind_covariance(param, W, sparse=False)>
```

The functions that generate the covariance matrices are stored in `spvcm.utils`

. They can be arbitrarily overwritten for alternative covariance specifications.

Thus, if we want to sample a model with a new covariance specification, then we need to define functions for the variance and precision.