spreg.GM_Lag_Endog_Regimes
- class spreg.GM_Lag_Endog_Regimes(y, x, w, n_clusters=None, quorum=-1, trace=True, name_y=None, name_x=None, **kwargs)
Spatial two stage least squares (S2SLS) with endogenous regimes. Based on the function skater_reg as shown in [AA24].
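For orientation, the S2SLS estimator can be written in the notation used by the attributes listed further down (`z`, `h`, `varb`, `zthhthi`). This is the standard textbook form rather than a transcription of the spreg source, and the choice \(H = [X, WX]\) is an assumption corresponding to one order of spatial lags of the regressors used as instruments:

\[
y = \rho W y + X\beta + u, \qquad Z = [X,\; Wy], \qquad H = [X,\; WX]
\]

\[
\hat{\delta} = \left(Z'H (H'H)^{-1} H'Z\right)^{-1} Z'H (H'H)^{-1} H'y
\]

Here \(\hat{\delta}\) stacks the estimates of \(\beta\) and \(\rho\). The first factor has the form documented for `varb` and the second the form documented for `zthhthi`.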
- Parameters:
- y
numpy.ndarray or pandas.Series
nx1 array for dependent variable
- x
numpy.ndarray or pandas object
Two dimensional array with n rows and one column for each independent (exogenous) variable, excluding the constant
- w
pysal W object
Spatial weights object (required if running spatial diagnostics)
- n_clusters
int
Number of clusters to be used in the endogenous regimes. If None (default), the number of clusters will be chosen according to the function utils.optim_k using a method adapted from Mojena (1977)’s Rule Two
- quorum
int
Minimum number of observations in a cluster for it to be considered. Must be larger than the number of variables in x. Default value is 30 or 10*k, whichever is larger.
- trace
bool
Sets whether to store intermediate results of the clustering. Hard-coded to True if n_clusters is None.
- name_y
str
Name of dependent variable for use in output
- name_x
list of strings
Names of independent variables for use in output
- name_w
str
Name of weights matrix for use in output
- name_gwk
str
Name of kernel weights matrix for use in output
- name_ds
str
Name of dataset for use in output
- name_regimes
str
Name of regimes variable for use in output
- latex
bool
Specifies if summary is to be printed in latex format
- **kwargs
Additional keyword arguments depending on the specific model
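The default rule stated for `quorum` ("30 or 10*k, whichever is larger") and its constraint relative to the number of variables can be sketched as below. The helper names are hypothetical and not part of spreg, and `k` is assumed here to be the number of variables in `x`; whether the library counts the constant is not stated above.

```python
def default_quorum(k: int) -> int:
    """Return max(30, 10 * k), the documented default minimum cluster size.

    Hypothetical helper: `k` is assumed to be the number of variables.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return max(30, 10 * k)


def quorum_is_valid(quorum: int, n_vars: int) -> bool:
    """Check the documented constraint: quorum must exceed the number of variables in x."""
    return quorum > n_vars


# Three regressors, as in the Baltimore example (NROOM, AGE, SQFT).
print(default_quorum(3))       # 30, because 10 * 3 does not exceed 30
print(default_quorum(5))       # 50, because 10 * 5 exceeds 30
print(quorum_is_valid(30, 3))  # True
```

With few regressors the floor of 30 dominates; the 10*k term only matters once there are more than three variables.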
- Attributes:
- output
dataframe
regression results pandas dataframe
- summary
str
Summary of regression results and diagnostics (note: use in conjunction with the print command)
- betas
array
kx1 array of estimated coefficients
- u
array
nx1 array of residuals
- e_pred
array
nx1 array of residuals (using reduced form)
- predy
array
nx1 array of predicted y values
- predy_e
array
nx1 array of predicted y values (using reduced form)
- n
integer
Number of observations
- k
integer
Number of variables for which coefficients are estimated (including the constant). Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- kstar
integer
Number of endogenous variables. Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- y
array
nx1 array for dependent variable
- x
array
Two dimensional array with n rows and one column for each independent (exogenous) variable, including the constant. Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- yend
array
Two dimensional array with n rows and one column for each endogenous variable. Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- q
array
Two dimensional array with n rows and one column for each external exogenous variable used as instruments. Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- z
array
nxk array of variables (combination of x and yend). Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- h
array
nxl array of instruments (combination of x and q). Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- robust
str
Adjustment for robust standard errors. Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- mean_y
float
Mean of dependent variable
- std_y
float
Standard deviation of dependent variable
- vm
array
Variance covariance matrix (kxk)
- pr2
float
Pseudo R squared (squared correlation between y and predy). Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- pr2_e
float
Pseudo R squared (squared correlation between y and predy_e, using reduced form). Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- utu
float
Sum of squared residuals
- sig2
float
Sigma squared used in computations. Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- std_err
array
1xk array of standard errors of the betas. Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- z_stat
list of tuples
z statistic; each tuple contains the pair (statistic, p-value), where each is a float. Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- ak_test
tuple
Anselin-Kelejian test; tuple contains the pair (statistic, p-value). Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- cfh_test
tuple
Common Factor Hypothesis test; tuple contains the pair (statistic, p-value). Only when it applies (see specific documentation).
- name_y
str
Name of dependent variable for use in output
- name_x
list of strings
Names of independent variables for use in output
- name_yend
list of strings
Names of endogenous variables for use in output
- name_z
list of strings
Names of exogenous and endogenous variables for use in output
- name_q
list of strings
Names of external instruments
- name_h
list of strings
Names of all instruments used in output
- name_w
str
Name of weights matrix for use in output
- name_gwk
str
Name of kernel weights matrix for use in output
- name_ds
str
Name of dataset for use in output
- name_regimes
str
Name of regimes variable for use in output
- title
str
Name of the regression method used. Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- sig2n
float
Sigma squared (computed with n in the denominator)
- sig2n_k
float
Sigma squared (computed with n-k in the denominator)
- hth
float
\(H'H\). Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- hthi
float
\((H'H)^{-1}\). Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- varb
array
\((Z'H (H'H)^{-1} H'Z)^{-1}\). Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- zthhthi
array
\(Z'H(H'H)^{-1}\). Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- pfora1a2
array
n(zthhthi)’varb. Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- sp_multipliers
dict
Dictionary of spatial multipliers (if spat_impacts is not None). Only available in dictionary ‘multi’ when multiple regressions are estimated (see ‘multi’ below for details).
- regimes
list
List of n values with the mapping of each observation to a regime. Assumed to be aligned with ‘x’.
- constant_regi
string
Ignored if regimes=False. Constant option for regimes. Switcher controlling the constant term setup. It may take the following values:
‘one’: a vector of ones is appended to x and held constant across regimes.
‘many’: a vector of ones is appended to x and considered different per regime.
- cols2regi
list or ‘all’
Ignored if regimes=False. Argument indicating whether each column of x should be considered as different per regime or held constant across regimes (False). If a list, k booleans indicating for each variable the option (True if one per regime, False to be held constant). If ‘all’, all the variables vary by regime.
- regime_lag_sep
boolean
If True, the spatial parameter for spatial lag is also computed according to different regimes. If False (default), the spatial parameter is fixed across regimes.
- regime_err_sep
boolean
If True, a separate regression is run for each regime.
- kr
int
Number of variables/columns to be “regimized” or subject to change by regime. These will result in one parameter estimate by regime for each variable (i.e. nr parameters per variable)
- kf
int
Number of variables/columns to be considered fixed or global across regimes and hence only obtain one parameter estimate
- nr
int
Number of different regimes in the ‘regimes’ list
- multi
dictionary
Only available when multiple regressions are estimated, i.e. when regime_err_sep=True and no variable is fixed across regimes. Contains all attributes of each individual regression
- SSR
list
List with the total sum of squared residuals of the model across all regimes, with one entry for each number of regimes considered, starting with the 2-regime solution.
- clusters
array
Cluster (regime) label assigned to each observation in the endogenous regimes; the examples below show an array of n integer labels
- _trace
list
List of dictionaries with the clustering results for each number of clusters tested. Only available if n_clusters is None or trace=True.
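The `pr2` attribute above is described as the squared correlation between `y` and `predy`. A minimal pure-Python sketch of that definition follows; the function name `pseudo_r2` is hypothetical and this is not the spreg implementation.

```python
def pseudo_r2(y, predy):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y)
    if n != len(predy) or n < 2:
        raise ValueError("y and predy must have the same length (at least 2)")
    mean_y = sum(y) / n
    mean_p = sum(predy) / n
    # Cross-product and sums of squares of the deviations from the means.
    cov = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, predy))
    ss_y = sum((a - mean_y) ** 2 for a in y)
    ss_p = sum((b - mean_p) ** 2 for b in predy)
    if ss_y == 0 or ss_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov * cov / (ss_y * ss_p)


# Illustrative numbers only, not taken from any dataset.
y = [1.0, 2.0, 3.0, 4.0]
predy = [1.0, 3.0, 2.0, 4.0]
print(pseudo_r2(y, predy))  # 0.64 up to floating-point rounding
```

Because it is a squared correlation, this measure is unchanged by any rescaling or shifting of the predictions, which is one reason it is reported as a "pseudo" R squared rather than a share of explained variance.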
- Examples
- >>> import libpysal
- >>> import numpy as np
- >>> np.set_printoptions(legacy='1.25')  # to avoid printing issues with numpy floats
- >>> import geopandas as gpd
- >>> from spreg import GM_Lag_Endog_Regimes
- Open the Baltimore house sales price and characteristics data
- from the libpysal examples using geopandas.
- >>> db = gpd.read_file(libpysal.examples.get_path('baltim.shp'))
- We will create a weights matrix based on contiguity.
- >>> w = libpysal.weights.Queen.from_dataframe(db, use_index=True)
- >>> w.transform = "r"
- For this example, we will use the ‘PRICE’ column as the dependent variable and
- the ‘NROOM’, ‘AGE’, and ‘SQFT’ columns as independent variables.
- At this point, we will let the model choose the number of clusters.
- >>> reg = GM_Lag_Endog_Regimes(y=db['PRICE'], x=db[['NROOM','AGE','SQFT']], w=w, name_w="baltim_q.gal")
- The function `print(reg.summary)` can be used to visualize the results of the regression.
- Alternatively, we can check individual attributes:
- >>> reg.betas
- array([[ 6.20932938],
       [ 4.25581944],
       [-0.1468118 ],
       [ 0.40893082],
       [ 5.01866492],
       [ 4.84994184],
       [-0.55425337],
       [ 1.04577632],
       [ 0.05155043]])
- >>> reg.SSR
- [59784.06769835169, 56858.621800274515]
- >>> reg.clusters
- array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int32)
- We will now set the number of clusters to 2 and run the regression again.
- >>> reg = GM_Lag_Endog_Regimes(y=db['PRICE'], x=db[['NROOM','AGE','SQFT']], w=w, n_clusters=2, name_w="baltim_q.gal")
- The function `print(reg.summary)` can be used to visualize the results of the regression.
- Alternatively, we can check individual attributes as before:
- >>> reg.betas
- array([[ 6.20932938],
       [ 4.25581944],
       [-0.1468118 ],
       [ 0.40893082],
       [ 5.01866492],
       [ 4.84994184],
       [-0.55425337],
       [ 1.04577632],
       [ 0.05155043]])
- >>> reg.SSR
- [59784.06769835169]
- >>> reg.clusters
- array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int32)
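The example output above can be read with a few lines of plain Python. The sketch below copies the `SSR` values from the first run and maps them to the number of regimes, using the documented convention that the list starts at the 2-regime solution. The breakdown of the nine coefficients is an assumption for illustration: it supposes that the constant and the three regressors vary by regime (`kr = 4`), that there are two regimes (`nr = 2`), and that the spatial lag coefficient is the single fixed term (`kf = 1`, consistent with `regime_lag_sep` defaulting to False).

```python
# Values copied from the first example run (n_clusters left as None).
ssr = [59784.06769835169, 56858.621800274515]

# Entry i of the list corresponds to a solution with i + 2 regimes.
ssr_by_regimes = {i + 2: value for i, value in enumerate(ssr)}

# Relative reduction in SSR when moving from 2 to 3 regimes.
reduction = (ssr_by_regimes[2] - ssr_by_regimes[3]) / ssr_by_regimes[2]
print(f"SSR reduction from 2 to 3 regimes: {reduction:.2%}")

# Assumed layout of the 9 rows of reg.betas (illustrative, see lead-in).
nr, kr, kf = 2, 4, 1
n_betas = nr * kr + kf
print(n_betas)  # 9, matching the length of the betas array shown above
```

The third regime lowers the SSR by roughly five percent here, and both runs report only the labels 0 and 1 in `reg.clusters`.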
- output
- __init__(y, x, w, n_clusters=None, quorum=-1, trace=True, name_y=None, name_x=None, **kwargs)
Methods
GM_Lag_Regimes_Multi(y, x, w_i, w, regi_ids)
__init__(y, x, w[, n_clusters, quorum, ...])
sp_att_reg(w_i, regi_ids, wy)
Attributes
- GM_Lag_Regimes_Multi(y, x, w_i, w, regi_ids, cores=False, yend=None, q=None, w_lags=1, slx_lags=0, lag_q=True, robust=None, gwk=None, sig2n_k=False, cols2regi='all', spat_impacts=False, spat_diag=False, vm=False, name_y=None, name_x=None, name_yend=None, name_q=None, name_regimes=None, name_w=None, name_gwk=None, name_ds=None, latex=False, hard_bound=False)
- property mean_y
- property pfora1a2
- property sig2n
- property sig2n_k
- sp_att_reg(w_i, regi_ids, wy)
- property std_y
- property utu
- property vm
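The `sig2n` and `sig2n_k` properties differ only in the denominator (n versus n-k). A minimal sketch, assuming the numerator is the sum of squared residuals `utu`; the function name is hypothetical and the numbers are illustrative only:

```python
def sigma2_estimates(utu: float, n: int, k: int):
    """Return (sig2n, sig2n_k): residual variance with n and with n - k in the denominator."""
    if n <= k:
        raise ValueError("n must be larger than k")
    return utu / n, utu / (n - k)


# Illustrative values: 50 observations, 5 estimated coefficients.
sig2n, sig2n_k = sigma2_estimates(utu=120.0, n=50, k=5)
print(sig2n)    # 2.4
print(sig2n_k)  # 120 / 45, slightly larger than sig2n
```

The n-k version applies a degrees-of-freedom correction, so it is always the larger of the two; the gap shrinks as n grows relative to k.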