giddy.markov.Spatial_Markov¶

class giddy.markov.Spatial_Markov(y, w, k=4, m=4, permutations=0, fixed=True, discrete=False, cutoffs=None, lag_cutoffs=None, variable_name=None, fill_empty_classes=False)[source]¶

Markov transitions conditioned on the value of the spatial lag.

Parameters:

yarray: (n, t), one row per observation, one column per state of each observation, with as many columns as time periods.
wW: spatial weights object.
kinteger, optional: number of classes (quantiles) for input time series y. Default is 4. If discrete=True, k is determined endogenously.
minteger, optional: number of classes (quantiles) for the spatial lags of regional time series. Default is 4. If discrete=True, m is determined endogenously.
permutationsint, optional: number of permutations for use in randomization based inference (the default is 0).
fixedbool, optional: If true, discretization are taken over the entire n*t pooled series and cutoffs can be user-defined. If cutoffs and lag_cutoffs are not given, quantiles are used. If false, quantiles are taken each time period over n. Default is True.
discretebool, optional: If true, categorical spatial lags which are most common categories of neighboring observations serve as the conditioning and fixed is ignored; if false, weighted averages of neighboring observations are used. Default is false.
cutoffsarray, optional: users can specify the discretization cutoffs for continuous time series. Default is None, meaning that quantiles will be used for the discretization.
lag_cutoffsarray, optional: users can specify the discretization cutoffs for the spatial lags of continuous time series. Default is None, meaning that quantiles will be used for the discretization.
variable_namestring: name of variable.
fill_empty_classes: bool: If True, assign 1 to diagonal elements which fall in rows full of 0s to ensure each conditional transition probability matrix is a stochastic matrix (each row sums up to 1). In other words, the probability of staying at that state is 1.

Attributes:

class_idsarray: (n, t), discretized series if y is continuous. Otherwise it is identical to y.
classesarray: (k, 1), all different classes (bins).
lclass_idsarray: (n, t), spatial lag series.
lclassesarray: (k, 1), all different classes (bins) for spatial lags.
parray: (k, k), transition probability matrix for a-spatial Markov.
sarray: (k, ), steady state distribution for a-spatial Markov.
farray: (k, k), first mean passage times for a-spatial Markov.
transitionsarray: (k, k), counts of transitions between each state i and j for a-spatial Markov.
Tarray: (m, k, k), counts of transitions for each conditional Markov. T[0] is the matrix of transitions for observations with lags in the 0th quantile; T[m-1] is the transitions for the observations with lags in the m-1th.
Parray: (m, k, k), transition probability matrix for spatial Markov first dimension is the conditioned on the lag.
Sarraylike: (m, k), steady state distributions for spatial Markov. Each row is a conditional steady state distribution. If one (or more) spatially conditional Markov chain is reducible (having more than 1 steady state distribution), this attribute is an array of m arrays of varying dimensions.
Farray: (m, k, k),first mean passage times. First dimension is conditioned on the spatial lag.
shtestlist: (k elements), each element of the list is a tuple for a multinomial difference test between the steady state distribution from a conditional distribution versus the overall steady state distribution: first element of the tuple is the chi2 value, second its p-value and the third the degrees of freedom.
chi2list: (k elements), each element of the list is a tuple for a chi-squared test of the difference between the conditional transition matrix against the overall transition matrix: first element of the tuple is the chi2 value, second its p-value and the third the degrees of freedom.
x2float: sum of the chi2 values for each of the conditional tests. Has an asymptotic chi2 distribution with k(k-1)(k-1) degrees of freedom. Under the null that transition probabilities are spatially homogeneous. (see chi2 above)
x2_dofint: degrees of freedom for homogeneity test.
x2_pvaluefloat: pvalue for homogeneity test based on analytic. distribution
x2_rpvaluefloat: (if permutations>0) pseudo p-value for x2 based on random spatial permutations of the rows of the original transitions.
x2_realizationsarray: (permutations,1), the values of x2 for the random permutations.
Qfloat: Chi-square test of homogeneity across lag classes based on [BB03].
Q_p_valuefloat: p-value for Q.
LRfloat: Likelihood ratio statistic for homogeneity across lag classes based on [BB03].
LR_p_valuefloat: p-value for LR.
dof_homint: degrees of freedom for LR and Q, corrected for 0 cells.

Notes

Based on [Rey01].

The shtest and chi2 tests should be used with caution as they are based on classic theory assuming random transitions. The x2 based test is preferable since it simulates the randomness under the null. It is an experimental test requiring further analysis.

Examples

>>> import libpysal
>>> from giddy.markov import Spatial_Markov
>>> import numpy as np
>>> f = libpysal.io.open(libpysal.examples.get_path("usjoin.csv"))
>>> pci = np.array([f.by_col[str(y)] for y in range(1929,2010)])
>>> pci = pci.transpose()
>>> rpci = pci/(pci.mean(axis=0))
>>> w = libpysal.io.open(libpysal.examples.get_path("states48.gal")).read()
>>> w.transform = 'r'

Now we create a Spatial_Markov instance for the continuous relative per capita income time series for 48 US lower states 1929-2009. The current implementation allows users to classify the continuous incomes in a more flexible way.

(1) Global quintiles to discretize the income data (k=5), and global quintiles to discretize the spatial lags of incomes (m=5).

>>> sm = Spatial_Markov(rpci, w, fixed=True, k=5, m=5, variable_name='rpci')

We can examine the cutoffs for the incomes and cutoffs for the spatial lags

>>> sm.cutoffs
array([0.83999133, 0.94707545, 1.03242697, 1.14911154])
>>> sm.lag_cutoffs
array([0.88973386, 0.95891917, 1.01469758, 1.1183566 ])

Obviously, they are slightly different.

We now look at the estimated spatially lag conditioned transition probability matrices.

>>> for p in sm.P:
...     print(p)
[[0.96341463 0.0304878  0.00609756 0.         0.        ]
 [0.06040268 0.83221477 0.10738255 0.         0.        ]
 [0.         0.14       0.74       0.12       0.        ]
 [0.         0.03571429 0.32142857 0.57142857 0.07142857]
 [0.         0.         0.         0.16666667 0.83333333]]
[[0.79831933 0.16806723 0.03361345 0.         0.        ]
 [0.0754717  0.88207547 0.04245283 0.         0.        ]
 [0.00537634 0.06989247 0.8655914  0.05913978 0.        ]
 [0.         0.         0.06372549 0.90196078 0.03431373]
 [0.         0.         0.         0.19444444 0.80555556]]
[[0.84693878 0.15306122 0.         0.         0.        ]
 [0.08133971 0.78947368 0.1291866  0.         0.        ]
 [0.00518135 0.0984456  0.79274611 0.0984456  0.00518135]
 [0.         0.         0.09411765 0.87058824 0.03529412]
 [0.         0.         0.         0.10204082 0.89795918]]
[[0.8852459  0.09836066 0.         0.01639344 0.        ]
 [0.03875969 0.81395349 0.13953488 0.         0.00775194]
 [0.0049505  0.09405941 0.77722772 0.11881188 0.0049505 ]
 [0.         0.02339181 0.12865497 0.75438596 0.09356725]
 [0.         0.         0.         0.09661836 0.90338164]]
[[0.33333333 0.66666667 0.         0.         0.        ]
 [0.0483871  0.77419355 0.16129032 0.01612903 0.        ]
 [0.01149425 0.16091954 0.74712644 0.08045977 0.        ]
 [0.         0.01036269 0.06217617 0.89637306 0.03108808]
 [0.         0.         0.         0.02352941 0.97647059]]

The probability of a poor state remaining poor is 0.963 if their neighbors are in the 1st quintile and 0.798 if their neighbors are in the 2nd quintile. The probability of a rich economy remaining rich is 0.976 if their neighbors are in the 5th quintile, but if their neighbors are in the 4th quintile this drops to 0.903.

The global transition probability matrix is estimated:

>>> print(sm.p)
[[0.91461837 0.07503234 0.00905563 0.00129366 0.        ]
 [0.06570302 0.82654402 0.10512484 0.00131406 0.00131406]
 [0.00520833 0.10286458 0.79427083 0.09505208 0.00260417]
 [0.         0.00913838 0.09399478 0.84856397 0.04830287]
 [0.         0.         0.         0.06217617 0.93782383]]

The Q and likelihood ratio statistics are both significant indicating the dynamics are not homogeneous across the lag classes:

>>> "%.3f"%sm.LR
'170.659'
>>> "%.3f"%sm.Q
'200.624'
>>> "%.3f"%sm.LR_p_value
'0.000'
>>> "%.3f"%sm.Q_p_value
'0.000'
>>> sm.dof_hom
60

The long run distribution for states with poor (rich) neighbors has 0.435 (0.018) of the values in the first quintile, 0.263 (0.200) in the second quintile, 0.204 (0.190) in the third, 0.0684 (0.255) in the fourth and 0.029 (0.337) in the fifth quintile.

>>> sm.S.astype(float).round(8)
array([[0.43509425, 0.2635327 , 0.20363044, 0.06841983, 0.02932278],
       [0.13391287, 0.33993305, 0.25153036, 0.23343016, 0.04119356],
       [0.12124869, 0.21137444, 0.2635101 , 0.29013417, 0.1137326 ],
       [0.0776413 , 0.19748806, 0.25352636, 0.22480415, 0.24654013],
       [0.01776781, 0.19964349, 0.19009833, 0.25524697, 0.3372434 ]])

States with incomes in the first quintile with neighbors in the first quintile return to the first quartile after 2.298 years, after leaving the first quintile. They enter the fourth quintile after 80.810 years after leaving the first quintile, on average. Poor states within neighbors in the fourth quintile return to the first quintile, on average, after 12.88 years, and would enter the fourth quintile after 28.473 years.

>>> for f in sm.F:
...     print(f.round(8))
[[  2.29835259  28.95614035  46.14285714  80.80952381 279.42857143]
 [ 33.86549708   3.79459555  22.57142857  57.23809524 255.85714286]
 [ 43.60233918   9.73684211   4.91085714  34.66666667 233.28571429]
 [ 46.62865497  12.76315789   6.25714286  14.61564626 198.61904762]
 [ 52.62865497  18.76315789  12.25714286   6.          34.1031746 ]]
[[  7.46754205   9.70574606  25.76785714  74.53116883 194.23446197]
 [ 27.76691978   2.94175577  24.97142857  73.73474026 193.4380334 ]
 [ 53.57477715  28.48447637   3.97566318  48.76331169 168.46660482]
 [ 72.03631562  46.94601483  18.46153846   4.28393653 119.70329314]
 [ 77.17917276  52.08887197  23.6043956    5.14285714  24.27564033]]
[[  8.24751154   6.53333333  18.38765432  40.70864198 112.76732026]
 [ 47.35040872   4.73094099  11.85432099  34.17530864 106.23398693]
 [ 69.42288828  24.76666667   3.794921    22.32098765  94.37966594]
 [ 83.72288828  39.06666667  14.3          3.44668119  76.36702977]
 [ 93.52288828  48.86666667  24.1          9.8          8.79255406]]
[[ 12.87974382  13.34847151  19.83446328  28.47257282  55.82395142]
 [ 99.46114206   5.06359731  10.54545198  23.05133495  49.68944423]
 [117.76777159  23.03735526   3.94436301  15.0843986   43.57927247]
 [127.89752089  32.4393006   14.56853107   4.44831643  31.63099455]
 [138.24752089  42.7893006   24.91853107  10.35         4.05613474]]
[[ 56.2815534    1.5         10.57236842  27.02173913 110.54347826]
 [ 82.9223301    5.00892857   9.07236842  25.52173913 109.04347826]
 [ 97.17718447  19.53125      5.26043557  21.42391304 104.94565217]
 [127.1407767   48.74107143  33.29605263   3.91777427  83.52173913]
 [169.6407767   91.24107143  75.79605263  42.5          2.96521739]]

(2) Global quintiles to discretize the income data (k=5), and global quartiles to discretize the spatial lags of incomes (m=4).

>>> sm = Spatial_Markov(rpci, w, fixed=True, k=5, m=4, variable_name='rpci')

We can also examine the cutoffs for the incomes and cutoffs for the spatial lags:

>>> sm.cutoffs
array([0.83999133, 0.94707545, 1.03242697, 1.14911154])
>>> sm.lag_cutoffs
array([0.91440247, 0.98583079, 1.08698351])

We now look at the estimated spatially lag conditioned transition probability matrices.

>>> for p in sm.P:
...     print(p)
[[0.95708955 0.03544776 0.00746269 0.         0.        ]
 [0.05825243 0.83980583 0.10194175 0.         0.        ]
 [0.         0.1294964  0.76258993 0.10791367 0.        ]
 [0.         0.01538462 0.18461538 0.72307692 0.07692308]
 [0.         0.         0.         0.14285714 0.85714286]]
[[0.7421875  0.234375   0.0234375  0.         0.        ]
 [0.08550186 0.85130112 0.06319703 0.         0.        ]
 [0.00865801 0.06926407 0.86147186 0.05627706 0.004329  ]
 [0.         0.         0.05363985 0.92337165 0.02298851]
 [0.         0.         0.         0.13432836 0.86567164]]
[[0.95145631 0.04854369 0.         0.         0.        ]
 [0.06       0.79       0.145      0.         0.005     ]
 [0.00358423 0.10394265 0.7921147  0.09677419 0.00358423]
 [0.         0.01630435 0.13586957 0.75543478 0.0923913 ]
 [0.         0.         0.         0.10204082 0.89795918]]
[[0.16666667 0.66666667 0.         0.16666667 0.        ]
 [0.03488372 0.80232558 0.15116279 0.01162791 0.        ]
 [0.00840336 0.13445378 0.70588235 0.1512605  0.        ]
 [0.         0.01171875 0.08203125 0.87109375 0.03515625]
 [0.         0.         0.         0.03434343 0.96565657]]

We now obtain 4 (5,5) spatial lag conditioned transition probability matrices instead of 5 as in case (1).

The Q and likelihood ratio statistics are still both significant.

>>> "%.3f"%sm.LR
'172.105'
>>> "%.3f"%sm.Q
'321.128'
>>> "%.3f"%sm.LR_p_value
'0.000'
>>> "%.3f"%sm.Q_p_value
'0.000'
>>> sm.dof_hom
45

(3) We can also set the cutoffs for relative incomes and their spatial lags manually. For example, we want the defining cutoffs to be [0.8, 0.9, 1, 1.2], meaning that relative incomes:

class 0: smaller than 0.8
class 1: between 0.8 and 0.9
class 2: between 0.9 and 1.0
class 3: between 1.0 and 1.2
class 4: larger than 1.2

>>> cc = np.array([0.8, 0.9, 1, 1.2])
>>> sm = Spatial_Markov(rpci, w, cutoffs=cc, lag_cutoffs=cc, variable_name='rpci')
>>> sm.cutoffs
array([0.8, 0.9, 1. , 1.2])
>>> sm.k
5
>>> sm.lag_cutoffs
array([0.8, 0.9, 1. , 1.2])
>>> sm.m
5
>>> for p in sm.P:
...     print(p)
[[0.96703297 0.03296703 0.         0.         0.        ]
 [0.10638298 0.68085106 0.21276596 0.         0.        ]
 [0.         0.14285714 0.7755102  0.08163265 0.        ]
 [0.         0.         0.5        0.5        0.        ]
 [0.         0.         0.         0.         0.        ]]
[[0.88636364 0.10606061 0.00757576 0.         0.        ]
 [0.04402516 0.89308176 0.06289308 0.         0.        ]
 [0.         0.05882353 0.8627451  0.07843137 0.        ]
 [0.         0.         0.13846154 0.86153846 0.        ]
 [0.         0.         0.         0.         1.        ]]
[[0.78082192 0.17808219 0.02739726 0.01369863 0.        ]
 [0.03488372 0.90406977 0.05813953 0.00290698 0.        ]
 [0.         0.05919003 0.84735202 0.09034268 0.00311526]
 [0.         0.         0.05811623 0.92985972 0.01202405]
 [0.         0.         0.         0.14285714 0.85714286]]
[[0.82692308 0.15384615 0.         0.01923077 0.        ]
 [0.0703125  0.7890625  0.125      0.015625   0.        ]
 [0.00295858 0.06213018 0.82248521 0.10946746 0.00295858]
 [0.         0.00185529 0.07606679 0.88497217 0.03710575]
 [0.         0.         0.         0.07803468 0.92196532]]
[[0.         0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.        ]
 [0.         0.06666667 0.9        0.03333333 0.        ]
 [0.         0.         0.05660377 0.90566038 0.03773585]
 [0.         0.         0.         0.03932584 0.96067416]]

(3.1) As we can see from the above estimated conditional transition probability matrices, some rows are full of zeros and this violate the requirement that each row of a transition probability matrix sums to 1. We can easily adjust this assigning fill_empty_classes = True when initializing Spatial_Markov.

>>> sm = Spatial_Markov(
...     rpci, w, cutoffs=cc, lag_cutoffs=cc, fill_empty_classes=True
... )
>>> for p in sm.P:
...     print(p)
[[0.96703297 0.03296703 0.         0.         0.        ]
 [0.10638298 0.68085106 0.21276596 0.         0.        ]
 [0.         0.14285714 0.7755102  0.08163265 0.        ]
 [0.         0.         0.5        0.5        0.        ]
 [0.         0.         0.         0.         1.        ]]
[[0.88636364 0.10606061 0.00757576 0.         0.        ]
 [0.04402516 0.89308176 0.06289308 0.         0.        ]
 [0.         0.05882353 0.8627451  0.07843137 0.        ]
 [0.         0.         0.13846154 0.86153846 0.        ]
 [0.         0.         0.         0.         1.        ]]
[[0.78082192 0.17808219 0.02739726 0.01369863 0.        ]
 [0.03488372 0.90406977 0.05813953 0.00290698 0.        ]
 [0.         0.05919003 0.84735202 0.09034268 0.00311526]
 [0.         0.         0.05811623 0.92985972 0.01202405]
 [0.         0.         0.         0.14285714 0.85714286]]
[[0.82692308 0.15384615 0.         0.01923077 0.        ]
 [0.0703125  0.7890625  0.125      0.015625   0.        ]
 [0.00295858 0.06213018 0.82248521 0.10946746 0.00295858]
 [0.         0.00185529 0.07606679 0.88497217 0.03710575]
 [0.         0.         0.         0.07803468 0.92196532]]
[[1.         0.         0.         0.         0.        ]
 [0.         1.         0.         0.         0.        ]
 [0.         0.06666667 0.9        0.03333333 0.        ]
 [0.         0.         0.05660377 0.90566038 0.03773585]
 [0.         0.         0.         0.03932584 0.96067416]]
>>> sm.S[0]
array([[0.54148249, 0.16780007, 0.24991499, 0.04080245, 0.        ],
       [0.        , 0.        , 0.        , 0.        , 1.        ]])
>>> sm.S[2]
array([0.03607655, 0.22667277, 0.25883041, 0.43607249, 0.04234777])

(4) Spatial_Markov also accepts discrete time series and calculates categorical spatial lags on which several transition probability matrices are conditioned. The categorical spatial lag is defined as the most common categories of neighboring observations, weighted by their weight strengths.

Please note that occasionally there are ties for the “most common categories” in the calculation of the categorical spatial lag. To break the tie, the category of the focal observation is included with its neighbors. If this does not resolve the tie, a winner is chosen randomly.

Let’s still use the US state income time series to demonstrate. We first discretize them into categories and then pass them to Spatial_Markov.

>>> import mapclassify as mc
>>> y = mc.Quantiles(rpci.flatten(), k=5).yb.reshape(rpci.shape)
>>> np.random.seed(5)
>>> sm = Spatial_Markov(y, w, discrete=True, variable_name='discretized rpci')
>>> sm.k
5
>>> sm.m
5
>>> for p in sm.P:
...     print(p)
[[0.94787645 0.04440154 0.00772201 0.         0.        ]
 [0.08333333 0.81060606 0.10606061 0.         0.        ]
 [0.         0.12765957 0.79787234 0.07446809 0.        ]
 [0.         0.02777778 0.22222222 0.66666667 0.08333333]
 [0.         0.         0.         0.33333333 0.66666667]]
[[0.888      0.096      0.016      0.         0.        ]
 [0.06049822 0.84341637 0.09608541 0.         0.        ]
 [0.00666667 0.10666667 0.81333333 0.07333333 0.        ]
 [0.         0.         0.08527132 0.86821705 0.04651163]
 [0.         0.         0.         0.10204082 0.89795918]]
[[0.65217391 0.32608696 0.02173913 0.         0.        ]
 [0.07446809 0.80851064 0.11170213 0.         0.00531915]
 [0.01071429 0.1        0.76428571 0.11785714 0.00714286]
 [0.         0.00552486 0.09392265 0.86187845 0.03867403]
 [0.         0.         0.         0.13157895 0.86842105]]
[[0.91935484 0.06451613 0.         0.01612903 0.        ]
 [0.06796117 0.90291262 0.02912621 0.         0.        ]
 [0.         0.05755396 0.87769784 0.0647482  0.        ]
 [0.         0.02150538 0.10752688 0.80107527 0.06989247]
 [0.         0.         0.         0.08064516 0.91935484]]
[[0.81818182 0.18181818 0.         0.         0.        ]
 [0.01754386 0.70175439 0.26315789 0.01754386 0.        ]
 [0.         0.14285714 0.73333333 0.12380952 0.        ]
 [0.         0.0042735  0.06837607 0.89316239 0.03418803]
 [0.         0.         0.         0.03891051 0.96108949]]

__init__(y, w, k=4, m=4, permutations=0, fixed=True, discrete=False, cutoffs=None, lag_cutoffs=None, variable_name=None, fill_empty_classes=False)[source]¶

Methods

`__init__`(y, w[, k, m, permutations, fixed, ...])
`summary`([file_name])	A summary method to call the Markov homogeneity test to test for temporally lagged spatial dependence.

Attributes

`F`
`LR`
`LR_p_value`
`Q`
`Q_p_value`
`S`
`chi2`
`dof_hom`
`f`
`ht`
`s`
`shtest`
`x2`
`x2_dof`
`x2_pvalue`

property F¶

property LR¶

property LR_p_value¶

property Q¶

property Q_p_value¶

property S¶

property chi2¶

property dof_hom¶

property f¶

property ht¶

property s¶

property shtest¶

summary(file_name=None)[source]¶

A summary method to call the Markov homogeneity test to test for temporally lagged spatial dependence.

To learn more about the properties of the tests, refer to [RKW16] and [KR18].

property x2¶

property x2_dof¶

property x2_pvalue¶