spreg.GM_KKP
- class spreg.GM_KKP(y, x, w, full_weights=False, regimes=None, vm=False, name_y=None, name_x=None, name_w=None, name_ds=None, name_regimes=None)
GMM method for a spatial random effects panel model based on Kapoor, Kelejian and Prucha (2007) [KKP07].
- Parameters:
- y
array or pandas DataFrame
n*tx1 or nxt array for dependent variable
- x
array or pandas DataFrame
Two-dimensional array or DataFrame for the independent (exogenous) variables, with either n*t rows and k columns or n rows and k*t columns (note: must not include a constant term)
- w
spatial weights object
Spatial weights matrix, nxn
- full_weights
boolean
If True, uses different weights for each of the 6 moment conditions; if False (default), uses only 2 sets of weights, one for the first 3 and one for the last 3 moment conditions
- regimes
list
List of n values with the mapping of each observation to a regime. Assumed to be aligned with ‘y’.
- vm
bool
If True, include variance-covariance matrix in summary results
- name_y
str or list of strings
Name of dependent variable for use in output
- name_x
list of strings
Names of independent variables for use in output
- name_w
str
Name of weights matrix for use in output
- name_ds
str
Name of dataset for use in output
- name_regimes
str
Name of regime variable for use in the output
- Attributes:
- betas
array
kx1 array of estimated coefficients
- u
array
nx1 array of residuals
- e_filtered
array
nx1 array of spatially filtered residuals
- predy
array
nx1 array of predicted y values
- n
integer
Number of observations
- t
integer
Number of time periods
- k
integer
Number of variables for which coefficients are estimated (including the constant)
- y
array
nx1 array for dependent variable
- x
array
Two dimensional array with n rows and one column for each independent (exogenous) variable, including the constant
- vm
array
Variance covariance matrix (kxk)
- chow
tuple
Contains 2 elements. 1: Pair of Wald statistic and p-value for the setup of global regime stability. 2: array with Wald statistic (col 0) and its p-value (col 1) for each beta that varies across regimes. Exists only if regimes is not None.
- name_y
str
Name of dependent variable for use in output
- name_x
list of strings
Names of independent variables for use in output
- name_w
str
Name of weights matrix for use in output
- name_ds
str
Name of dataset for use in output
- name_regimes
str
Name of regime variable for use in the output
- title
str
Name of the regression method used
- Examples
- We first need to import the needed modules: numpy to convert the
- data we read into arrays that ``spreg`` understands, ``libpysal`` to read
- the data and build the spatial weights, and ``spreg`` to estimate the model.
- >>> from spreg import GM_KKP
- >>> import numpy as np
- >>> import libpysal
- Open data on NCOVR US County Homicides (3085 areas) using libpysal.io.open().
- This is the DBF associated with the NAT shapefile. Note that
- libpysal.io.open() also reads data in CSV format. GM_KKP accepts
- data as numpy arrays or pandas DataFrames, so users can read their
- data in using any method.
- >>> nat = libpysal.examples.load_example('NCOVR')
- >>> db = libpysal.io.open(nat.get_path("NAT.dbf"),'r')
- Extract the HR (homicide rates) data in the 70’s, 80’s and 90’s from the DBF file
- and make it the dependent variable for the regression. Note that the data can also
- be passed in the long format instead of wide format (i.e. a vector with n*t rows
- and a single column for the dependent variable and a matrix of dimension n*txk
- for the independent variables).
- >>> name_y = ['HR70','HR80','HR90']
- >>> y = np.array([db.by_col(name) for name in name_y]).T
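The wide and long layouts mentioned above can be related with a small reshaping sketch. The toy values below are hypothetical, and the time-major ordering (all areas for the first period, then all areas for the second, and so on) is an assumption about what the long format expects, so it is worth checking against the warning that GM_KKP prints.

```python
import numpy as np

# Hypothetical toy panel: n = 3 areas, t = 2 periods, wide format
# (one row per area, one column per time period).
y_wide = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])
n, t = y_wide.shape

# Stack the period columns on top of each other to get an (n*t) x 1 vector.
# order="F" reads the array column by column, so all areas for the first
# period come first. Assumption: this time-major order is the long layout.
y_long = y_wide.reshape((n * t, 1), order="F")

print(y_long.ravel())
```

Either layout carries the same information; the reshape only changes how the n*t values are arranged.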
- Extract RD and PS in the same time periods from the DBF to be used as
- independent variables in the regression. Note that PySAL requires this to
- be an nxk*t numpy array, where k is the number of independent variables (not
- including a constant) and t is the number of time periods. Data must be
- organized in a way that all time periods of a given variable are side-by-side
- and in the correct time order.
- By default a vector of ones will be added to the independent variables passed in.
- >>> name_x = ['RD70','RD80','RD90','PS70','PS80','PS90']
- >>> x = np.array([db.by_col(name) for name in name_x]).T
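The column ordering described above (all time periods of one variable side by side, in time order) can be illustrated with a small sketch. The toy array and the variable labels are hypothetical; the conversion to an n*t by k layout assumes a time-major long format and is not taken from the spreg source.

```python
import numpy as np

# Build the wide-format column names: variables outermost, periods innermost.
variables = ["RD", "PS"]
periods = ["70", "80", "90"]
wide_names = [v + p for v in variables for p in periods]

# Hypothetical toy data: n = 3 areas, k = 2 variables, t = 2 periods.
# Columns are ordered [A_t0, A_t1, B_t0, B_t1].
n, k, t = 3, 2, 2
x_wide = np.array([[1.0, 2.0, 100.0, 200.0],
                   [3.0, 4.0, 300.0, 400.0],
                   [5.0, 6.0, 500.0, 600.0]])

# View the columns as (area, variable, period), move the period axis first,
# then flatten to (n*t) rows and k columns (assumed time-major long format).
x_long = x_wide.reshape(n, k, t).transpose(2, 0, 1).reshape(n * t, k)

print(wide_names)
print(x_long)
```

If the columns were instead grouped by period (all variables for the first period, then all variables for the second), the reshape would silently mix variables, which is why the ordering matters.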
- Since we want to run a spatial error panel model, we need to specify the spatial
- weights matrix that brings the spatial configuration of the observations
- into the error component of the model. To do that, we can open an already
- existing gal file or create a new one. In this case, we will create one
- from ``NAT.shp``.
- >>> w = libpysal.weights.Queen.from_shapefile(libpysal.examples.get_path("NAT.shp"))
- Unless there is a good reason not to do it, the weights should be
- row-standardized so every row of the matrix sums to one. Among other
- things, this allows the spatial lag of a variable to be interpreted as the
- average value of the neighboring observations. In PySAL, this can be
- easily performed in the following way:
- >>> w.transform = 'r'
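To see why row-standardization turns the spatial lag into a neighborhood average, here is a minimal dense-matrix sketch in plain numpy. The three-area layout and the values are hypothetical, and this only mimics the effect of the transform; it is not how libpysal stores or transforms weights internally.

```python
import numpy as np

# Hypothetical binary contiguity for 3 areas arranged in a line: 0 - 1 - 2.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

# Row-standardize: divide each row by its sum so that every row sums to one.
W_r = W / W.sum(axis=1, keepdims=True)

# The spatial lag of a variable is then the mean over each area's neighbors.
z = np.array([10.0, 20.0, 60.0])
lag = W_r @ z

print(W_r.sum(axis=1))
print(lag)
```

Area 1 has two neighbors (values 10 and 60), so its lag is their average, while areas 0 and 2 each have a single neighbor and simply inherit its value.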
- We are all set with the preliminaries and ready to run the model. In this
- case, we will need the variables and the weights matrix. If we want to
- have the names of the variables printed in the output summary, we will
- have to pass them in as well, although this is optional. In this example
- we set full_weights to False (the default), indicating that we will use
- only 2 sets of moment weights, one for the first 3 and one for the last 3 moment conditions.
- >>> reg = GM_KKP(y,x,w,full_weights=False,name_y=name_y, name_x=name_x)
- Warning: Assuming time data is in wide format, i.e. y[0] refers to T0, y[1], refers to T1, etc.
Similarly, assuming x[0:k] refers to independent variables for T0, x[k+1:2k] refers to T1, etc.
- Once we have run the model, we can explore the output. We can
- either request a printout of the results with the command print(reg.summary) or
- check out the individual attributes of GM_KKP:
- >>> print(reg.summary)
- REGRESSION
----------
SUMMARY OF OUTPUT: GM SPATIAL ERROR PANEL MODEL - RANDOM EFFECTS (KKP)
----------------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :          HR                Number of Observations:        3085
Mean dependent var  :      6.4983                Number of Variables   :           3
S.D. dependent var  :      6.9529                Degrees of Freedom    :        3082
Pseudo R-squared    :      0.3248
<BLANKLINE>
------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT       6.4922156       0.1126713      57.6208690       0.0000000
                  RD       3.6244575       0.0877475      41.3055536       0.0000000
                  PS       1.3118778       0.0852516      15.3883058       0.0000000
              lambda       0.4177759
            sigma2_v      22.8190822
            sigma2_1      39.9099323
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
- >>> print(reg.name_x)
- ['CONSTANT', 'RD', 'PS', 'lambda', ' sigma2_v', 'sigma2_1']
- The attribute reg.betas contains all the coefficients: the betas, the spatial error
- coefficient lambda, sigma2_v and sigma2_1:
- >>> print(np.around(reg.betas,4))
- [[ 6.4922]
[ 3.6245]
[ 1.3119]
[ 0.4178]
[22.8191]
[39.9099]]
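Because reg.betas stacks the regression coefficients and the error-structure parameters in one column, it can help to split them apart. The sketch below uses the rounded values printed above rather than a fitted model. It also assumes the Kapoor, Kelejian and Prucha notation in which sigma2_1 = sigma2_v + T * sigma2_mu, so the implied variance of the unit-specific effect (called sigma2_mu here, a name that does not appear in the output) can be backed out; treat that step as an illustration rather than something GM_KKP reports.

```python
import numpy as np

# Rounded values copied from the printed reg.betas above.
betas = np.array([[6.4922], [3.6245], [1.3119], [0.4178], [22.8191], [39.9099]])

k = 3  # CONSTANT, RD, PS
t = 3  # time periods in the example (1970, 1980, 1990 columns)

# The first k rows are the regression coefficients.
coefs = betas[:k]

# The remaining rows are lambda, sigma2_v and sigma2_1, in that order.
lam, sigma2_v, sigma2_1 = (float(v) for v in betas[k:, 0])

# Assumption: sigma2_1 = sigma2_v + t * sigma2_mu, as in the KKP setup.
sigma2_mu = (sigma2_1 - sigma2_v) / t

print(coefs.ravel(), lam, sigma2_v, sigma2_1)
print(round(sigma2_mu, 4))
```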
- Finally, we can check the standard errors of the betas:
- >>> print(np.around(np.sqrt(reg.vm.diagonal().reshape(3,1)),4))
- [[0.1127]
[0.0877]
[0.0853]]
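The z-statistics and probabilities in the summary table follow from the coefficients and the diagonal of the variance-covariance matrix. The sketch below reproduces that arithmetic from the printed numbers. The diagonal matrix is a hypothetical stand-in for reg.vm (its off-diagonal entries are not shown in the output and are set to zero here), and the two-sided normal p-value is the usual textbook formula, assumed to match what the summary reports.

```python
import math
import numpy as np

# Coefficients and standard errors copied from the summary table above.
coefs = np.array([6.4922156, 3.6244575, 1.3118778])
printed_se = np.array([0.1126713, 0.0877475, 0.0852516])

# Hypothetical stand-in for reg.vm: only the diagonal is known from the output.
vm = np.diag(printed_se ** 2)

# Standard errors are the square roots of the diagonal of the variance matrix.
std_err = np.sqrt(np.diag(vm))

# z-statistic: coefficient divided by its standard error.
z = coefs / std_err

# Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
p_values = [math.erfc(abs(float(v)) / math.sqrt(2.0)) for v in z]

print(np.around(z, 4))
print(p_values)
```

With z-statistics this large, the p-values are far below the seven decimal places shown in the table, which is why the summary prints them as 0.0000000.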
- __init__(y, x, w, full_weights=False, regimes=None, vm=False, name_y=None, name_x=None, name_w=None, name_ds=None, name_regimes=None)
Methods
__init__(y, x, w[, full_weights, regimes, ...])
Attributes
- property mean_y
- property std_y