spreg.GM_Lag_Endog_Regimes
- class spreg.GM_Lag_Endog_Regimes(y, x, w, n_clusters=None, quorum=-1, trace=True, name_y=None, name_x=None, constant_regi='many', cols2regi='all', regime_err_sep=False, **kwargs)
- Spatial two stage least squares (S2SLS) with endogenous regimes. Based on the function skater_reg as shown in [AA24].
- Parameters:
- y: numpy.ndarray or pandas.Series
- nx1 array for dependent variable
- x: numpy.ndarray or pandas object
- Two dimensional array with n rows and one column for each independent (exogenous) variable, excluding the constant
- w: pysal W object
- Spatial weights object (required if running spatial diagnostics)
- n_clusters: int
- Number of clusters to be used in the endogenous regimes. If None (default), the number of clusters is chosen by the function utils.optim_k, using a method adapted from Mojena (1977)’s Rule Two.
- quorum: int
- Minimum number of observations in a cluster to be considered. Must be larger than the number of variables in x. Default value is 30 or 10*k, whichever is larger.
- trace: bool
- Sets whether to store intermediate results of the clustering. Hard-coded to True if n_clusters is None.
- name_y: str
- Name of dependent variable for use in output
- name_x: list of strings
- Names of independent variables for use in output
- name_w: str
- Name of weights matrix for use in output
- name_gwk: str
- Name of kernel weights matrix for use in output
- name_ds: str
- Name of dataset for use in output
- name_regimes: str
- Name of regimes variable for use in output
- latex: bool
- Specifies if summary is to be printed in latex format
- **kwargs: additional keyword arguments depending on the specific model
 
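The default for `quorum` described above can be read as a simple rule. The following is a minimal sketch, not spreg's internal code: the helper name `effective_quorum` is hypothetical, and it assumes that a negative value (such as the signature default `quorum=-1`) means "use the documented default" and that `k` is the number of variables as counted by the library.

```python
def effective_quorum(k, quorum=-1):
    """Hypothetical helper: resolve the minimum cluster size.

    Assumes a negative `quorum` requests the documented default,
    i.e. the larger of 30 and 10 * k.
    """
    if quorum < 0:
        return max(30, 10 * k)
    return quorum


print(effective_quorum(k=3))             # 30
print(effective_quorum(k=5))             # 50
print(effective_quorum(k=3, quorum=40))  # 40
```

Under this reading, a model with three explanatory variables would need at least 30 observations per cluster unless `quorum` is set explicitly.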
- Attributes:
- output: dataframe
- Regression results as a pandas dataframe
- summary: str
- Summary of regression results and diagnostics (note: use in conjunction with the print command)
- betas: array
- kx1 array of estimated coefficients
- u: array
- nx1 array of residuals
- e_pred: array
- nx1 array of residuals (using reduced form)
- predy: array
- nx1 array of predicted y values
- predy_e: array
- nx1 array of predicted y values (using reduced form)
- n: integer
- Number of observations
- k: integer
- Number of variables for which coefficients are estimated (including the constant). Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- kstar: integer
- Number of endogenous variables. Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- y: array
- nx1 array for dependent variable
- x: array
- Two dimensional array with n rows and one column for each independent (exogenous) variable, including the constant. Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- yend: array
- Two dimensional array with n rows and one column for each endogenous variable. Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- q: array
- Two dimensional array with n rows and one column for each external exogenous variable used as instruments. Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- z: array
- nxk array of variables (combination of x and yend). Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- h: array
- nxl array of instruments (combination of x and q). Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- robust: str
- Adjustment for robust standard errors. Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- mean_y: float
- Mean of dependent variable
- std_y: float
- Standard deviation of dependent variable
- vm: array
- Variance covariance matrix (kxk)
- pr2: float
- Pseudo R squared (squared correlation between y and predy). Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- pr2_e: float
- Pseudo R squared (squared correlation between y and predy_e, using reduced form). Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- utu: float
- Sum of squared residuals
- sig2: float
- Sigma squared used in computations. Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- std_err: array
- 1xk array of standard errors of the betas. Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- z_stat: list of tuples
- z statistic; each tuple contains the pair (statistic, p-value), where each is a float. Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- ak_test: tuple
- Anselin-Kelejian test; tuple contains the pair (statistic, p-value). Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- cfh_test: tuple
- Common Factor Hypothesis test; tuple contains the pair (statistic, p-value). Only when it applies (see specific documentation).
- name_y: str
- Name of dependent variable for use in output
- name_x: list of strings
- Names of independent variables for use in output
- name_yend: list of strings
- Names of endogenous variables for use in output
- name_z: list of strings
- Names of exogenous and endogenous variables for use in output
- name_q: list of strings
- Names of external instruments
- name_h: list of strings
- Names of all instruments used in output
- name_w: str
- Name of weights matrix for use in output
- name_gwk: str
- Name of kernel weights matrix for use in output
- name_ds: str
- Name of dataset for use in output
- name_regimes: str
- Name of regimes variable for use in output
- title: str
- Name of the regression method used. Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- sig2n: float
- Sigma squared (computed with n in the denominator)
- sig2n_k: float
- Sigma squared (computed with n-k in the denominator)
- hth: float
- \(H'H\). Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- hthi: float
- \((H'H)^{-1}\). Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- varb: array
- \((Z'H (H'H)^{-1} H'Z)^{-1}\). Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- zthhthi: array
- \(Z'H(H'H)^{-1}\). Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- pfora1a2: array
- n(zthhthi)'varb. Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- sp_multipliers: dict
- Dictionary of spatial multipliers (if spat_impacts is not None). Only available in dictionary ‘multi’ when multiple regressions (see ‘multi’ below for details).
- regimes: list
- List of n values with the mapping of each observation to a regime. Assumed to be aligned with ‘x’.
- constant_regi: string
- Ignored if regimes=False. Constant option for regimes. Switcher controlling the constant term setup. It may take the following values:
- ‘one’: a vector of ones is appended to x and held constant across regimes.
- ‘many’: a vector of ones is appended to x and considered different per regime.
- cols2regi: list, ‘all’
- Ignored if regimes=False. Argument indicating whether each column of x should be considered as different per regime or held constant across regimes (False). If a list, k booleans indicating for each variable the option (True if one per regime, False to be held constant). If ‘all’, all the variables vary by regime.
- regime_lag_sep: boolean
- If True, the spatial parameter for spatial lag is also computed according to different regimes. If False (default), the spatial parameter is fixed across regimes.
- regime_err_sep: boolean
- If True, a separate regression is run for each regime.
- kr: int
- Number of variables/columns to be “regimized” or subject to change by regime. These will result in one parameter estimate by regime for each variable (i.e. nr parameters per variable).
- kf: int
- Number of variables/columns to be considered fixed or global across regimes and hence only obtain one parameter estimate.
- nr: int
- Number of different regimes in the ‘regimes’ list
- multi: dictionary
- Only available when multiple regressions are estimated, i.e. when regime_err_sep=True and no variable is fixed across regimes. Contains all attributes of each individual regression.
- SSR: list
- List with the total sum of squared residuals of the model over all regimes, with one entry for each number of regimes considered, starting with the solution with 2 regimes.
- clusters: array
- Cluster (regime) label assigned to each observation in the endogenous regimes, returned as an array of n integers as in the example output.
- _trace: list
- List of dictionaries with the clustering results for each number of clusters tested. Only available if n_clusters is None or trace=True.
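The relationships among `utu`, `sig2n`, `sig2n_k` and `pr2` listed above can be illustrated with a small pure-Python sketch. The helper names `sigma2_estimates` and `pseudo_r2` and the toy numbers are hypothetical. The code only mirrors the attribute descriptions (sum of squared residuals divided by n or by n-k, and the squared correlation between y and the predicted values); it is not spreg's implementation.

```python
def sigma2_estimates(u, k):
    """Return (utu, sig2n, sig2n_k) for residuals u and k coefficients."""
    n = len(u)
    utu = sum(r * r for r in u)          # sum of squared residuals
    return utu, utu / n, utu / (n - k)   # n vs. n-k in the denominator


def pseudo_r2(y, predy):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y)
    mean_y = sum(y) / n
    mean_p = sum(predy) / n
    cov = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, predy))
    var_y = sum((a - mean_y) ** 2 for a in y)
    var_p = sum((b - mean_p) ** 2 for b in predy)
    return cov * cov / (var_y * var_p)


# Toy data, made up for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
predy = [1.5, 1.5, 3.0, 4.5, 4.5]
u = [a - b for a, b in zip(y, predy)]

utu, sig2n, sig2n_k = sigma2_estimates(u, k=2)
print(utu, sig2n)                      # 1.0 0.2
print(round(sig2n_k, 4))               # 0.3333
print(round(pseudo_r2(y, predy), 4))   # 0.9
```

Because `sig2n_k` divides by n-k rather than n, it is always at least as large as `sig2n` for the same residuals.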
- Examples
- --------
- >>> import libpysal
- >>> import numpy as np
- >>> np.set_printoptions(legacy='1.25')  # to avoid printing issues with numpy floats
- >>> import geopandas as gpd
- >>> from spreg import GM_Lag_Endog_Regimes
- Open data on Baltimore house sales prices and characteristics
- from libpysal examples using geopandas.
- >>> db = gpd.read_file(libpysal.examples.get_path('baltim.shp'))
- We will create a row-standardized weights matrix based on queen contiguity.
- >>> w = libpysal.weights.Queen.from_dataframe(db, use_index=True)
- >>> w.transform = "r"
- For this example, we will use the ‘PRICE’ column as the dependent variable, and
- the ‘NROOM’, ‘AGE’, and ‘SQFT’ columns as independent variables.
- At this point, we will let the model choose the number of clusters.
- >>> reg = GM_Lag_Endog_Regimes(y=db['PRICE'], x=db[['NROOM','AGE','SQFT']], w=w, name_w="baltim_q.gal")
- Calling `print(reg.summary)` displays the full regression results.
- Alternatively, we can check individual attributes:
- >>> reg.betas
- array([[ 6.20932938],
- [ 4.25581944],
- [-0.1468118 ],
- [ 0.40893082],
- [ 5.01866492],
- [ 4.84994184],
- [-0.55425337],
- [ 1.04577632],
- [ 0.05155043]])
- >>> reg.SSR
- [59784.06769835169, 56858.621800274515]
- >>> reg.clusters
- array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1,
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int32) 
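The output above can be read against the attribute descriptions with a short sketch. The SSR values are copied from the `reg.SSR` output shown above. The helper `n_coefficients` is hypothetical, and the split into four regime-varying coefficients (constant, NROOM, AGE, SQFT) plus one coefficient shared across regimes (assumed here to be the spatial lag term) is an interpretation of the nine rows in `reg.betas`, not something the output states.

```python
# SSR values copied from the reg.SSR output shown above.
ssr = [59784.06769835169, 56858.621800274515]

# Assumption: the first entry is the 2-regime solution, the next the
# 3-regime solution, and so on, as described for the SSR attribute.
ssr_by_regimes = dict(enumerate(ssr, start=2))


def n_coefficients(kr, nr, kf):
    """Hypothetical helper: kr variables vary over nr regimes, kf are shared."""
    return kr * nr + kf


for n_regimes, value in ssr_by_regimes.items():
    print(n_regimes, round(value, 2))

# Relative reduction in SSR when moving from 2 to 3 regimes.
drop = 1 - ssr_by_regimes[3] / ssr_by_regimes[2]
print(round(drop, 3))  # 0.049

# Assumed split for the example: 4 regime-varying terms, 2 regimes, 1 shared term.
print(n_coefficients(kr=4, nr=2, kf=1))  # 9
```

The count of nine matches the number of rows in `reg.betas`, which is consistent with the assumed split.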
- We will now set the number of clusters to 2 and run the regression again.
- >>> reg = GM_Lag_Endog_Regimes(y=db['PRICE'], x=db[['NROOM','AGE','SQFT']], w=w, n_clusters=2, name_w="baltim_q.gal")
- Calling `print(reg.summary)` displays the full regression results.
- Alternatively, we can check individual attributes as before:
- >>> reg.betas
- array([[ 6.20932938],
- [ 4.25581944],
- [-0.1468118 ],
- [ 0.40893082],
- [ 5.01866492],
- [ 4.84994184],
- [-0.55425337],
- [ 1.04577632],
- [ 0.05155043]])
- >>> reg.SSR
- [59784.06769835169]
- >>> reg.clusters
- array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1,
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int32) 
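A label array like the one above can be summarized by counting observations per regime and comparing the counts with a minimum cluster size. This is a minimal sketch on a short made-up label list. The helper names `regime_sizes` and `below_quorum` are hypothetical and not part of spreg.

```python
from collections import Counter


def regime_sizes(labels):
    """Count how many observations fall in each regime label."""
    return dict(sorted(Counter(labels).items()))


def below_quorum(labels, quorum):
    """Return the regime labels whose size is smaller than `quorum`."""
    return [r for r, size in regime_sizes(labels).items() if size < quorum]


# Toy labels, made up for illustration only.
labels = [0, 1, 1, 1, 0, 0, 1, 1]

print(regime_sizes(labels))            # {0: 3, 1: 5}
print(below_quorum(labels, quorum=4))  # [0]
```

With real results, `labels` could be obtained from the fitted model, for example with `reg.clusters.tolist()`, assuming `reg.clusters` is a NumPy array as in the output above.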
 
 - __init__(y, x, w, n_clusters=None, quorum=-1, trace=True, name_y=None, name_x=None, constant_regi='many', cols2regi='all', regime_err_sep=False, **kwargs)
 - Methods
 - GM_Lag_Regimes_Multi(y, x, w_i, w, regi_ids)
 - __init__(y, x, w[, n_clusters, quorum, ...])
 - sp_att_reg(w_i, regi_ids, wy)
 - Attributes
 - GM_Lag_Regimes_Multi(y, x, w_i, w, regi_ids, cores=False, yend=None, q=None, w_lags=1, slx_lags=0, lag_q=True, robust=None, gwk=None, sig2n_k=False, cols2regi='all', spat_impacts=False, spat_diag=False, vm=False, name_y=None, name_x=None, name_yend=None, name_q=None, name_regimes=None, name_w=None, name_gwk=None, name_ds=None, latex=False, hard_bound=False)
 - property mean_y
 - property pfora1a2
 - property sig2n
 - property sig2n_k
 - sp_att_reg(w_i, regi_ids, wy)
 - property std_y
 - property utu
 - property vm