spreg.OLS

class spreg.OLS(y, x, w=None, robust=None, gwk=None, slx_lags=0, slx_vars='All', sig2n_k=True, nonspat_diag=True, spat_diag=False, moran=False, white_test=False, vif=False, vm=False, name_y=None, name_x=None, name_w=None, name_gwk=None, name_ds=None, latex=False)

Ordinary least squares with results and diagnostics.

Parameters:
y : numpy.ndarray or pandas.Series

nx1 array for dependent variable

x : numpy.ndarray or pandas object

Two dimensional array with n rows and one column for each independent (exogenous) variable, excluding the constant

w : pysal W object

Spatial weights object (required if running spatial diagnostics)

robust : str

If ‘white’, then a White consistent estimator of the variance-covariance matrix is given. If ‘hac’, then a HAC consistent estimator of the variance-covariance matrix is given. Default set to None.

gwk : pysal W object

Kernel spatial weights needed for HAC estimation. Note: matrix must have ones along the main diagonal.

slx_lags : integer

Number of spatial lags of X to include in the model specification. If slx_lags>0, the specification becomes of the SLX type.

slx_vars : “All” (default) or list of booleans

Selects the x variables to be lagged

sig2n_k : bool

If True, then use n-k to estimate sigma^2. If False, use n.

nonspat_diag : bool

If True, then compute non-spatial diagnostics on the regression.

spat_diag : bool

If True, then compute Lagrange multiplier tests (requires w). Note: see moran for further tests.

moran : bool

If True, compute Moran’s I on the residuals. Note: requires spat_diag=True.

white_test : bool

If True, compute White’s specification robust test (requires nonspat_diag=True).

vif : bool

If True, compute variance inflation factor.

vm : bool

If True, include variance-covariance matrix in summary results

name_y : str

Name of dependent variable for use in output

name_x : list of strings

Names of independent variables for use in output

name_w : str

Name of weights matrix for use in output

name_gwk : str

Name of kernel weights matrix for use in output

name_ds : str

Name of dataset for use in output

latex : bool

Specifies if the table with the coefficients’ results and their inference is to be printed in LaTeX format
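The slx_lags option can be pictured as appending spatially lagged copies of the regressors to X. The sketch below is a hedged illustration in plain numpy, not spreg's implementation: the 4-observation weights matrix W, the data in X, and the helper name add_slx_terms are all made up for the example, and only a single lag (as with slx_lags=1) with a boolean selection in the spirit of slx_vars is shown.

```python
import numpy as np


def add_slx_terms(X, W, lag_mask=None):
    """Append one spatial lag (W @ X) of the selected columns to X.

    Hypothetical helper for illustration only; it is not part of spreg.
    """
    X = np.asarray(X, dtype=float)
    if lag_mask is None:
        # Analogous to slx_vars='All': lag every column
        lag_mask = [True] * X.shape[1]
    mask = np.asarray(lag_mask, dtype=bool)
    WX = W @ X[:, mask]  # spatial lag of the selected columns
    return np.hstack([X, WX])


# Toy row-standardized weights for 4 areas arranged on a line: 0-1-2-3
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

X_all = add_slx_terms(X, W)                   # lag both columns
X_first = add_slx_terms(X, W, [True, False])  # lag only the first column
print(X_all.shape, X_first.shape)
```

Each lagged column holds, for every observation, the weighted average of its neighbors' values, which is why the design matrix grows by one column per lagged variable.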

Attributes:
output : dataframe

regression results pandas dataframe

summary : str

Summary of regression results and diagnostics (note: use in conjunction with the print command)

betas : array

kx1 array of estimated coefficients

u : array

nx1 array of residuals

predy : array

nx1 array of predicted y values

n : integer

Number of observations

k : integer

Number of variables for which coefficients are estimated (including the constant)

y : array

nx1 array for dependent variable

x : array

Two dimensional array with n rows and one column for each independent (exogenous) variable, including the constant

robust : str

Adjustment for robust standard errors

mean_y : float

Mean of dependent variable

std_y : float

Standard deviation of dependent variable

vm : array

Variance covariance matrix (kxk)

r2 : float

R squared

ar2 : float

Adjusted R squared

utu : float

Sum of squared residuals

sig2 : float

Sigma squared used in computations

sig2ML : float

Sigma squared (maximum likelihood)

f_stat : tuple

Statistic (float), p-value (float)

logll : float

Log likelihood

aic : float

Akaike information criterion

schwarz : float

Schwarz information criterion

std_err : array

1xk array of standard errors of the betas

t_stat : list of tuples

t statistic; each tuple contains the pair (statistic, p-value), where each is a float

mulColli : float

Multicollinearity condition number

jarque_bera : dictionary

‘jb’: Jarque-Bera statistic (float); ‘pvalue’: p-value (float); ‘df’: degrees of freedom (int)

breusch_pagan : dictionary

‘bp’: Breusch-Pagan statistic (float); ‘pvalue’: p-value (float); ‘df’: degrees of freedom (int)

koenker_bassett : dictionary

‘kb’: Koenker-Bassett statistic (float); ‘pvalue’: p-value (float); ‘df’: degrees of freedom (int)

white : dictionary

‘wh’: White statistic (float); ‘pvalue’: p-value (float); ‘df’: degrees of freedom (int)

lm_error : tuple

Lagrange multiplier test for spatial error model; tuple contains the pair (statistic, p-value), where each is a float

lm_lag : tuple

Lagrange multiplier test for spatial lag model; tuple contains the pair (statistic, p-value), where each is a float

rlm_error : tuple

Robust Lagrange multiplier test for spatial error model; tuple contains the pair (statistic, p-value), where each is a float

rlm_lag : tuple

Robust Lagrange multiplier test for spatial lag model; tuple contains the pair (statistic, p-value), where each is a float

lm_sarma : tuple

Lagrange multiplier test for spatial SARMA model; tuple contains the pair (statistic, p-value), where each is a float

moran_res : tuple

Moran’s I for the residuals; tuple containing the triple (Moran’s I, standardized Moran’s I, p-value)

name_y : str

Name of dependent variable for use in output

name_x : list of strings

Names of independent variables for use in output

name_w : str

Name of weights matrix for use in output

name_gwk : str

Name of kernel weights matrix for use in output

name_ds : str

Name of dataset for use in output

title : str

Name of the regression method used

sig2n : float

Sigma squared (computed with n in the denominator)

sig2n_k : float

Sigma squared (computed with n-k in the denominator)

xtx : float

\(X'X\)

xtxi : float

\((X'X)^{-1}\)
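Several of the attributes above correspond to textbook OLS quantities. The sketch below reproduces them with plain numpy on a tiny invented data set. It illustrates the definitions only and is not spreg's code; the variable names merely mirror the attribute names, and the non-robust form of vm shown here is an assumption for the illustration.

```python
import numpy as np

# Invented data, roughly y = 1 + 2*x plus a little noise
x_raw = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([[1.1], [2.9], [5.2], [6.8], [9.0]])

n = y.shape[0]
x = np.hstack([np.ones((n, 1)), x_raw])  # constant as the first column
k = x.shape[1]                           # number of coefficients, constant included

xtx = x.T @ x
xtxi = np.linalg.inv(xtx)
betas = xtxi @ x.T @ y                   # k x 1 coefficient estimates
predy = x @ betas                        # n x 1 fitted values
u = y - predy                            # n x 1 residuals
utu = float(np.sum(u ** 2))              # sum of squared residuals

sig2n = utu / n                          # sigma^2 with n in the denominator
sig2n_k = utu / (n - k)                  # sigma^2 with n-k in the denominator
vm = sig2n_k * xtxi                      # classical (non-robust) covariance matrix
std_err = np.sqrt(np.diag(vm))

tss = float(np.sum((y - y.mean()) ** 2))
r2 = 1.0 - utu / tss
ar2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
print(betas.ravel(), std_err, r2, ar2)
```

Because k is always at least 1, sig2n_k is never smaller than sig2n, which is the practical difference controlled by the sig2n_k parameter.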

Examples

>>> import numpy as np
>>> import libpysal
>>> from spreg import OLS

Open data on Columbus neighborhood crime (49 areas) using libpysal.io.open(). This is the DBF associated with the Columbus shapefile. Note that libpysal.io.open() also reads data in CSV format. The OLS class itself takes the data as numpy arrays, so users can read their data in using any method.

>>> db = libpysal.io.open(libpysal.examples.get_path('columbus.dbf'),'r')

Extract the HOVAL column (home values) from the DBF file and make it the dependent variable for the regression. Note that PySAL requires this to be an nx1 numpy array.

>>> hoval = db.by_col("HOVAL")
>>> y = np.array(hoval)
>>> y.shape = (len(hoval), 1)

Extract the INC (income) and CRIME (crime) vectors from the DBF to be used as independent variables in the regression. Note that PySAL requires this to be an nxj numpy array, where j is the number of independent variables (not including a constant). spreg.OLS adds a vector of ones to the independent variables passed in.

>>> X = []
>>> X.append(db.by_col("INC"))
>>> X.append(db.by_col("CRIME"))
>>> X = np.array(X).T
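The shape requirements described above (an nx1 array for y and an nxj array for X) can be met in several ways. As an alternative sketch, assuming the columns are already available as plain Python lists (the values below are made up and are not the Columbus data), reshape and np.column_stack produce the same layouts. The last step only illustrates the column of ones that spreg.OLS is described as adding; users do not need to add it themselves.

```python
import numpy as np

# Made-up columns standing in for db.by_col(...) results
hoval = [80.5, 44.6, 26.4, 33.2]
inc = [19.5, 21.2, 16.0, 4.5]
crime = [15.7, 18.8, 30.6, 32.4]

y = np.array(hoval).reshape(-1, 1)   # n x 1 dependent variable
X = np.column_stack([inc, crime])    # n x j regressors, no constant

# Illustration only: what the design matrix looks like once a constant is prepended
X_with_const = np.hstack([np.ones((X.shape[0], 1)), X])
print(y.shape, X.shape, X_with_const.shape)
```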

The minimum parameters needed to run an ordinary least squares regression are the two numpy arrays containing the dependent variable and the independent variables, respectively. To make the printed results more meaningful, the user can pass in explicit names for the variables used; this is optional.

>>> ols = OLS(y, X, name_y='home value', name_x=['income','crime'], name_ds='columbus', white_test=True)

spreg.OLS computes the regression coefficients and their standard errors, t-statistics and p-values. It also computes a large battery of diagnostics on the regression. In this example we also request White’s test, which is not computed by default, by passing white_test=True. All of these results can be accessed individually as attributes of the regression object created by running spreg.OLS, or all at once by printing its summary attribute. In the example below, the coefficient on crime is -0.4849, with a t-statistic of -2.6544 and a p-value of 0.01087.

>>> ols.betas
array([[46.42818268],
       [ 0.62898397],
       [-0.48488854]])
>>> print(round(ols.t_stat[2][0],3))
-2.654
>>> print(round(ols.t_stat[2][1],3))
0.011
>>> print(round(ols.r2,3))
0.35
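Because t_stat is a list of (statistic, p-value) tuples in the same order as the coefficients, it pairs naturally with the variable names. The sketch below uses a hand-typed stand-in for ols.t_stat, with values copied from the coefficient table of the printed summary on this page, so that it runs without spreg. The 5% threshold is an arbitrary choice for the illustration.

```python
# Stand-in for ols.t_stat: one (statistic, p-value) tuple per coefficient
names = ["CONSTANT", "income", "crime"]
t_stat = [
    (3.5194844, 0.0009867),
    (1.1736736, 0.2465669),
    (-2.6544086, 0.0108745),
]

alpha = 0.05  # illustrative significance level, not a spreg setting
for name, (stat, pval) in zip(names, t_stat):
    print(f"{name:>10s}  t = {stat:8.3f}  p = {pval:.4f}")

# Names of the coefficients whose p-value falls below alpha
significant = [name for name, (stat, pval) in zip(names, t_stat) if pval < alpha]
print(significant)
```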

Or we can easily obtain a full summary of all the results nicely formatted and ready to be printed:

>>> print(ols.summary)
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :    columbus
Weights matrix      :        None
Dependent Variable  :  home value                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           3
S.D. dependent var  :     18.4661                Degrees of Freedom    :          46
R-squared           :      0.3495
Adjusted R-squared  :      0.3212
Sum squared residual:       10647                F-statistic           :     12.3582
Sigma-square        :     231.457                Prob(F-statistic)     :   5.064e-05
S.E. of regression  :      15.214                Log likelihood        :    -201.368
Sigma-square ML     :     217.286                Akaike info criterion :     408.735
S.E of regression ML:     14.7406                Schwarz criterion     :     414.411

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      46.4281827      13.1917570       3.5194844       0.0009867
              income       0.6289840       0.5359104       1.1736736       0.2465669
               crime      -0.4848885       0.1826729      -2.6544086       0.0108745
------------------------------------------------------------------------------------

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER           12.538

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2          39.706           0.0000

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test                2           5.767           0.0559
Koenker-Bassett test              2           2.270           0.3214

SPECIFICATION ROBUST TEST
TEST                             DF        VALUE           PROB
White                             5           2.906           0.7145
================================ END OF REPORT =====================================
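Many of the figures in the report are tied together by standard textbook formulas. The check below is a hedged sketch that recomputes a few of them from other rounded values in the same report, using only the Python standard library. It is not how spreg produces the numbers, and small differences are expected because the inputs are rounded.

```python
import math

# Values read off the report above (all rounded as printed)
n, k = 49, 3
ssr = 10647.0        # Sum squared residual
r2 = 0.3495          # R-squared
logll = -201.368     # Log likelihood

sig2_n_k = ssr / (n - k)                          # report: Sigma-square 231.457
sig2_ml = ssr / n                                 # report: Sigma-square ML 217.286
se_reg = math.sqrt(sig2_n_k)                      # report: S.E. of regression 15.214
ar2 = 1 - (1 - r2) * (n - 1) / (n - k)            # report: Adjusted R-squared 0.3212
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))    # report: F-statistic 12.3582
aic = -2 * logll + 2 * k                          # report: Akaike info criterion 408.735
schwarz = -2 * logll + k * math.log(n)            # report: Schwarz criterion 414.411
t_crime = -0.4848885 / 0.1826729                  # report: t-Statistic -2.6544086

for label, value in [
    ("sigma-square", sig2_n_k), ("sigma-square ML", sig2_ml),
    ("S.E. of regression", se_reg), ("adjusted R-squared", ar2),
    ("F-statistic", f_stat), ("AIC", aic), ("Schwarz", schwarz),
    ("t (crime)", t_crime),
]:
    print(f"{label:>20s}: {value:.4f}")
```

The two sigma-square lines differ only in the denominator (n-k versus n), which matches the sig2n_k and sig2n attributes described earlier.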

If the optional parameters w and spat_diag are passed to spreg.OLS, spatial diagnostics will also be computed for the regression. These include Lagrange multiplier tests and, when moran=True is also set, Moran’s I of the residuals. The w parameter is a PySAL spatial weights matrix. In this example, w is built directly from the shapefile columbus.shp, but w can also be read in from a GAL or GWT file. In this case a rook contiguity weights matrix is built, but PySAL also offers queen contiguity, distance weights and k nearest neighbor weights, among others. In the example, the Moran’s I of the residuals is 0.204, with a standardized value of 2.592 and a p-value of 0.0095.

>>> w = libpysal.weights.Rook.from_shapefile(libpysal.examples.get_path("columbus.shp"))
>>> ols = OLS(y, X, w, spat_diag=True, moran=True, name_y='home value', name_x=['income','crime'], name_ds='columbus')
>>> ols.betas
array([[46.42818268],
       [ 0.62898397],
       [-0.48488854]])
>>> print(round(ols.moran_res[0],3))
0.204
>>> print(round(ols.moran_res[1],3))
2.592
>>> print(round(ols.moran_res[2],4))
0.0095
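The p-value reported above is consistent with a two-sided test of the standardized Moran’s I against a standard normal distribution. The check below uses only the rounded value 2.592 from the example and the Python standard library; whether spreg computes the p-value in exactly this way is an assumption here, not something stated on this page.

```python
import math

z = 2.592  # standardized Moran's I of the residuals, as printed above

# Two-sided normal p-value: 2 * (1 - Phi(|z|)), written with the
# complementary error function to avoid cancellation
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
p_one_sided = p_two_sided / 2

print(round(p_two_sided, 4), round(p_one_sided, 4))
```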
__init__(y, x, w=None, robust=None, gwk=None, slx_lags=0, slx_vars='All', sig2n_k=True, nonspat_diag=True, spat_diag=False, moran=False, white_test=False, vif=False, vm=False, name_y=None, name_x=None, name_w=None, name_gwk=None, name_ds=None, latex=False)

Methods

__init__(y, x[, w, robust, gwk, slx_lags, ...])

Attributes

mean_y

sig2n

sig2n_k

std_y

utu

vm

property mean_y
property sig2n
property sig2n_k
property std_y
property utu
property vm