spreg.OLS
- class spreg.OLS(y, x, w=None, robust=None, gwk=None, slx_lags=0, slx_vars='All', sig2n_k=True, nonspat_diag=True, spat_diag=False, moran=False, white_test=False, vif=False, vm=False, name_y=None, name_x=None, name_w=None, name_gwk=None, name_ds=None, latex=False)
Ordinary least squares with results and diagnostics.
- Parameters:
- y
numpy.ndarray or pandas.Series
nx1 array for dependent variable
- x
numpy.ndarray or pandas object
Two dimensional array with n rows and one column for each independent (exogenous) variable, excluding the constant
- w
pysal W object
Spatial weights object (required if running spatial diagnostics)
- robust
str
If ‘white’, then a White consistent estimator of the variance-covariance matrix is given. If ‘hac’, then a HAC consistent estimator of the variance-covariance matrix is given. Default set to None.
- gwk
pysal W object
Kernel spatial weights needed for HAC estimation. Note: matrix must have ones along the main diagonal.
- slx_lags
integer
Number of spatial lags of X to include in the model specification. If slx_lags>0, the specification becomes of the SLX type.
- slx_vars
“All” (default) or list of booleans
Selects the x variables to be lagged
- sig2n_k
bool
If True, then use n-k to estimate sigma^2. If False, use n.
- nonspat_diag
bool
If True, then compute non-spatial diagnostics on the regression.
- spat_diag
bool
If True, then compute Lagrange multiplier tests (requires w). Note: see moran for further tests.
- moran
bool
If True, compute Moran’s I on the residuals. Note: requires spat_diag=True.
- white_test
bool
If True, compute White’s specification robust test (requires nonspat_diag=True).
- vif
bool
If True, compute variance inflation factor.
- vm
bool
If True, include variance-covariance matrix in summary results
- name_y
str
Name of dependent variable for use in output
- name_x
list of strings
Names of independent variables for use in output
- name_w
str
Name of weights matrix for use in output
- name_gwk
str
Name of kernel weights matrix for use in output
- name_ds
str
Name of dataset for use in output
- latex
bool
Specifies whether the table of coefficient results and their inference is to be printed in LaTeX format
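The slx_lags and slx_vars options can be pictured as augmenting the regressor matrix with spatially lagged copies of the x variables. The following is a minimal numpy sketch of that idea for a single lag. The data, the weights matrix W, its row-standardization, and the boolean mask are all hypothetical; this illustrates an SLX-type specification in general and is not spreg's internal code.

```python
import numpy as np

# Hypothetical data: 4 observations, 2 regressors, no constant.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

# Hypothetical binary contiguity matrix for four units on a line (0-1-2-3).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Assumption: row-standardize so each lag is the average over neighbors.
W = W / W.sum(axis=1, keepdims=True)

# First-order spatial lag of every column of X.
WX = W @ X

# SLX-type design matrix: original regressors followed by their lags.
X_slx = np.hstack([X, WX])

# Lagging only some variables, in the spirit of a list of booleans.
slx_mask = np.array([True, False])
X_slx_partial = np.hstack([X, WX[:, slx_mask]])

print(X_slx.shape, X_slx_partial.shape)
```

With the full mask the design matrix doubles its column count; with the partial mask only the first regressor gains a lagged column.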
- Attributes:
- output
dataframe
regression results pandas dataframe
- summary
str
Summary of regression results and diagnostics (note: use in conjunction with the print command)
- betas
array
kx1 array of estimated coefficients
- u
array
nx1 array of residuals
- predy
array
nx1 array of predicted y values
- n
integer
Number of observations
- k
integer
Number of variables for which coefficients are estimated (including the constant)
- y
array
nx1 array for dependent variable
- x
array
Two dimensional array with n rows and one column for each independent (exogenous) variable, including the constant
- robust
str
Adjustment for robust standard errors
- mean_y
float
Mean of dependent variable
- std_y
float
Standard deviation of dependent variable
- vm
array
Variance covariance matrix (kxk)
- r2
float
R squared
- ar2
float
Adjusted R squared
- utu
float
Sum of squared residuals
- sig2
float
Sigma squared used in computations
- sig2ML
float
Sigma squared (maximum likelihood)
- f_stat
tuple
Statistic (float), p-value (float)
- logll
float
Log likelihood
- aic
float
Akaike information criterion
- schwarz
float
Schwarz information criterion
- std_err
array
1xk array of standard errors of the betas
- t_stat
list of tuples
t statistic; each tuple contains the pair (statistic, p-value), where each is a float
- mulColli
float
Multicollinearity condition number
- jarque_bera
dictionary
‘jb’: Jarque-Bera statistic (float); ‘pvalue’: p-value (float); ‘df’: degrees of freedom (int)
- breusch_pagan
dictionary
‘bp’: Breusch-Pagan statistic (float); ‘pvalue’: p-value (float); ‘df’: degrees of freedom (int)
- koenker_bassett
dictionary
‘kb’: Koenker-Bassett statistic (float); ‘pvalue’: p-value (float); ‘df’: degrees of freedom (int)
- white
dictionary
‘wh’: White statistic (float); ‘pvalue’: p-value (float); ‘df’: degrees of freedom (int)
- lm_error
tuple
Lagrange multiplier test for spatial error model; tuple contains the pair (statistic, p-value), where each is a float
- lm_lag
tuple
Lagrange multiplier test for spatial lag model; tuple contains the pair (statistic, p-value), where each is a float
- rlm_error
tuple
Robust Lagrange multiplier test for spatial error model; tuple contains the pair (statistic, p-value), where each is a float
- rlm_lag
tuple
Robust Lagrange multiplier test for spatial lag model; tuple contains the pair (statistic, p-value), where each is a float
- lm_sarma
tuple
Lagrange multiplier test for spatial SARMA model; tuple contains the pair (statistic, p-value), where each is a float
- moran_res
tuple
Moran’s I for the residuals; tuple containing the triple (Moran’s I, standardized Moran’s I, p-value)
- name_y
str
Name of dependent variable for use in output
- name_x
list of strings
Names of independent variables for use in output
- name_w
str
Name of weights matrix for use in output
- name_gwk
str
Name of kernel weights matrix for use in output
- name_ds
str
Name of dataset for use in output
- title
str
Name of the regression method used
- sig2n
float
Sigma squared (computed with n in the denominator)
- sig2n_k
float
Sigma squared (computed with n-k in the denominator)
- xtx
array
\(X'X\) (kxk)
- xtxi
array
\((X'X)^{-1}\) (kxk)
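As a rough guide to how several of the attributes above relate to one another, the sketch below reproduces them with textbook OLS formulas in plain numpy. The data are made up, the variable names simply mirror the attribute names, and this is an illustration under those assumptions rather than spreg's implementation.

```python
import numpy as np

# Made-up data: 6 observations, 2 regressors (hypothetical values).
y = np.array([[3.0], [5.0], [4.0], [8.0], [9.0], [12.0]])
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])

n = y.shape[0]
X = np.hstack([np.ones((n, 1)), x])    # constant added as the first column
k = X.shape[1]                         # number of coefficients, constant included

xtx = X.T @ X                          # k x k
xtxi = np.linalg.inv(xtx)              # k x k
betas = xtxi @ X.T @ y                 # k x 1 coefficients
predy = X @ betas                      # n x 1 predicted values
u = y - predy                          # n x 1 residuals
utu = float(np.sum(u ** 2))            # sum of squared residuals

sig2n = utu / n                        # sigma^2 with n in the denominator
sig2n_k = utu / (n - k)                # sigma^2 with n-k in the denominator
vm = sig2n_k * xtxi                    # k x k variance-covariance matrix
std_err = np.sqrt(np.diag(vm))         # standard errors of the betas
t_values = betas.ravel() / std_err     # t statistics (p-values omitted here)

tss = float(np.sum((y - y.mean()) ** 2))
r2 = 1.0 - utu / tss
ar2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)

print(betas.ravel(), r2, ar2)
```

Choosing between sig2n and sig2n_k only rescales vm, and therefore the standard errors, by the factor n/(n-k); the coefficients themselves are unaffected.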
Examples
>>> import numpy as np
>>> import libpysal
>>> from spreg import OLS
Open data on Columbus neighborhood crime (49 areas) using libpysal.io.open(). This is the DBF associated with the Columbus shapefile. Note that libpysal.io.open() also reads data in CSV format. Because the OLS class takes its data as numpy arrays, the user can read their data in using any method.
>>> db = libpysal.io.open(libpysal.examples.get_path('columbus.dbf'),'r')
Extract the HOVAL column (home values) from the DBF file and make it the dependent variable for the regression. Note that PySAL requires this to be an nx1 numpy array.
>>> hoval = db.by_col("HOVAL")
>>> y = np.array(hoval)
>>> y.shape = (len(hoval), 1)
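An equivalent way to obtain the nx1 shape is reshape(-1, 1), which avoids assigning to the shape attribute. This is a small sketch using made-up values in place of the HOVAL column.

```python
import numpy as np

# Hypothetical stand-in for db.by_col("HOVAL"): any flat sequence of numbers.
hoval = [10.0, 20.0, 30.0]

# -1 lets numpy infer the number of rows; 1 forces a single column.
y = np.array(hoval).reshape(-1, 1)

print(y.shape)  # (3, 1)
```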
Extract CRIME (crime) and INC (income) vectors from the DBF to be used as independent variables in the regression. Note that PySAL requires this to be an nxj numpy array, where j is the number of independent variables (not including a constant). spreg.OLS adds a vector of ones to the independent variables passed in.
>>> X = []
>>> X.append(db.by_col("INC"))
>>> X.append(db.by_col("CRIME"))
>>> X = np.array(X).T
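The same nxj layout can be built in one step with np.column_stack. The sketch below uses made-up values in place of the INC and CRIME columns and checks that both constructions agree.

```python
import numpy as np

# Hypothetical stand-ins for db.by_col("INC") and db.by_col("CRIME").
inc = [19.5, 21.2, 16.0]
crime = [15.7, 18.8, 30.6]

# One column per variable, one row per observation.
X = np.column_stack([inc, crime])

# Same result as appending the lists and transposing.
X_alt = np.array([inc, crime]).T

print(X.shape)  # (3, 2)
```

No column of ones is included here, since spreg.OLS adds the constant itself.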
The minimum parameters needed to run an ordinary least squares regression are the two numpy arrays containing the dependent variable and the independent variables, in that order. To make the printed results more meaningful, the user can pass in explicit names for the variables used; this is optional.
>>> ols = OLS(y, X, name_y='home value', name_x=['income','crime'], name_ds='columbus', white_test=True)
spreg.OLS computes the regression coefficients and their standard errors, t-stats and p-values. It also computes a large battery of diagnostics on the regression. In this example we also request White’s test, which is not computed by default, by passing white_test=True. All of these results can be accessed individually as attributes of the regression object created by running spreg.OLS. They can also be viewed all at once by printing the summary attribute of the regression object. In the example below, the parameter on crime is -0.4849, with a t-statistic of -2.6544 and p-value of 0.01087.
>>> ols.betas
array([[46.42818268],
       [ 0.62898397],
       [-0.48488854]])
>>> print(round(ols.t_stat[2][0],3))
-2.654
>>> print(round(ols.t_stat[2][1],3))
0.011
>>> print(round(ols.r2,3))
0.35
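Since t_stat is a list of (statistic, p-value) tuples, it can be paired with variable names to pick out significant coefficients. The sketch below hard-codes the values reported for this example instead of running the regression; the list name order and the 5% threshold are illustrative assumptions.

```python
# Values copied from the coefficient table of this example (not recomputed).
names = ["CONSTANT", "income", "crime"]
t_stat = [(3.5194844, 0.0009867),
          (1.1736736, 0.2465669),
          (-2.6544086, 0.0108745)]

alpha = 0.05  # illustrative significance level

# Keep the names whose p-value falls below the threshold.
significant = [name for name, (stat, pval) in zip(names, t_stat) if pval < alpha]

print(significant)  # ['CONSTANT', 'crime']
```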
Or we can easily obtain a full summary of all the results nicely formatted and ready to be printed:
>>> print(ols.summary)
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :    columbus
Weights matrix      :        None
Dependent Variable  :  home value                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           3
S.D. dependent var  :     18.4661                Degrees of Freedom    :          46
R-squared           :      0.3495
Adjusted R-squared  :      0.3212
Sum squared residual:       10647                F-statistic           :     12.3582
Sigma-square        :     231.457                Prob(F-statistic)     :   5.064e-05
S.E. of regression  :      15.214                Log likelihood        :    -201.368
Sigma-square ML     :     217.286                Akaike info criterion :     408.735
S.E of regression ML:     14.7406                Schwarz criterion     :     414.411

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      46.4281827      13.1917570       3.5194844       0.0009867
              income       0.6289840       0.5359104       1.1736736       0.2465669
               crime      -0.4848885       0.1826729      -2.6544086       0.0108745
------------------------------------------------------------------------------------

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER           12.538

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2       39.706         0.0000

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test                2        5.767         0.0559
Koenker-Bassett test              2        2.270         0.3214

SPECIFICATION ROBUST TEST
TEST                             DF        VALUE           PROB
White                             5        2.906         0.7145
================================ END OF REPORT =====================================
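Several figures in the summary are linked by standard identities, which makes for a quick sanity check. The sketch below plugs the rounded values printed above into textbook formulas (Gaussian log likelihood, AIC as -2 logL + 2k, Schwarz as -2 logL + k ln n). These are assumed to be the conventions in use because they match the report up to rounding; the code is not taken from spreg.

```python
import math

# Rounded values copied from the report above.
n, k = 49, 3
utu = 10647.0            # Sum squared residual
r2 = 0.3495              # R-squared
logll = -201.368         # Log likelihood

sig2 = utu / (n - k)                       # Sigma-square (n-k denominator)
sig2_ml = utu / n                          # Sigma-square ML (n denominator)
se_reg = math.sqrt(sig2)                   # S.E. of regression
ar2 = 1 - (1 - r2) * (n - 1) / (n - k)     # Adjusted R-squared
aic = -2 * logll + 2 * k                   # Akaike info criterion
schwarz = -2 * logll + k * math.log(n)     # Schwarz criterion

# Gaussian log likelihood evaluated at the ML variance estimate.
logll_check = -n / 2 * (math.log(2 * math.pi) + math.log(sig2_ml) + 1)

# t statistic for crime: coefficient divided by its standard error.
t_crime = -0.4848885 / 0.1826729

for label, value in [("sig2", sig2), ("sig2_ml", sig2_ml), ("se_reg", se_reg),
                     ("ar2", ar2), ("aic", aic), ("schwarz", schwarz),
                     ("logll_check", logll_check), ("t_crime", t_crime)]:
    print(f"{label:12s} {value:.4f}")
```

Small discrepancies in the last digit are expected because the inputs are already rounded in the printed report.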
If the optional parameters w and spat_diag are passed to spreg.OLS, spatial diagnostics will also be computed for the regression. These include Lagrange multiplier tests and, when moran=True is also passed, Moran’s I of the residuals. The w parameter is a PySAL spatial weights matrix. In this example, w is built directly from the shapefile columbus.shp, but w can also be read in from a GAL or GWT file. In this case a rook contiguity weights matrix is built, but PySAL also offers queen contiguity, distance weights and k nearest neighbor weights among others. In the example, the Moran’s I of the residuals is 0.204 with a standardized value of 2.592 and a p-value of 0.0095.
>>> w = libpysal.weights.Rook.from_shapefile(libpysal.examples.get_path("columbus.shp"))
>>> ols = OLS(y, X, w, spat_diag=True, moran=True, name_y='home value', name_x=['income','crime'], name_ds='columbus')
>>> ols.betas
array([[46.42818268],
       [ 0.62898397],
       [-0.48488854]])
>>> print(round(ols.moran_res[0],3))
0.204
>>> print(round(ols.moran_res[1],3))
2.592
>>> print(round(ols.moran_res[2],4))
0.0095
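For intuition about the first element of moran_res, the sketch below computes the basic Moran's I statistic, I = (n / S0) (u'Wu) / (u'u), on a toy example. The residual vector and the binary weights matrix are hypothetical, the residuals are assumed to have mean zero, and the standardized value and p-value that spreg also reports are not reproduced here.

```python
import numpy as np

# Hypothetical mean-zero residuals for four units on a line (0-1-2-3).
u = np.array([1.0, 1.0, -1.0, -1.0])

# Hypothetical binary rook-style contiguity: each unit neighbors the next.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

n = u.size
s0 = W.sum()                                   # sum of all weights

# Cross-products of neighboring residuals, scaled by the residual variance.
moran_i = (n / s0) * (u @ W @ u) / (u @ u)

print(round(moran_i, 4))  # 0.3333
```

A positive value indicates that neighboring residuals tend to share the same sign, which is the pattern the spatial diagnostics are designed to detect.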
- __init__(y, x, w=None, robust=None, gwk=None, slx_lags=0, slx_vars='All', sig2n_k=True, nonspat_diag=True, spat_diag=False, moran=False, white_test=False, vif=False, vm=False, name_y=None, name_x=None, name_w=None, name_gwk=None, name_ds=None, latex=False)
Methods
__init__(y, x[, w, robust, gwk, slx_lags, ...])
Attributes
- property mean_y
- property sig2n
- property sig2n_k
- property std_y
- property utu
- property vm